cat / airbyte-ci: improve CAT container orchestration #31699

Merged

@alafanechere (Contributor) commented Oct 23, 2023

What

Closes #31703

This PR aims to improve how we orchestrate CAT (connector acceptance tests) through airbyte-ci.
We want to:

  • Reduce test cold start duration
  • Optionally enable concurrent CAT tests

How

Reducing test cold start duration

  • Make connector_container a session-scoped pytest fixture, so every test in the session reuses the same connector container instead of loading a new one.
  • Load the connector container from its Dagger container ID instead of importing the container image as a tar archive. Dropping the tar lets us remove the slow and costly docker export and Dagger import steps.
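To illustrate why a session-scoped fixture shortens cold start, here is a toy model in plain Python of how fixture scope controls how often an expensive factory runs. It is a sketch, not pytest's implementation: ScopedFixture, simulate_cat_run, load_connector_container and the test names are all hypothetical, and the real pytest declaration appears only in the docstring.

```python
from typing import Callable, List, Tuple


class ScopedFixture:
    """Toy model of how a fixture's scope controls how often it is built.

    This is NOT pytest's implementation. In a real conftest.py the
    session-scoped fixture would be declared as:

        @pytest.fixture(scope="session")
        def connector_container():
            ...
    """

    def __init__(self, factory: Callable[[], object], scope: str) -> None:
        if scope not in ("function", "session"):
            raise ValueError(f"unsupported scope: {scope!r}")
        self._factory = factory
        self._scope = scope
        self._built = False
        self._value: object = None

    def get(self) -> object:
        """Return the fixture value requested by one test."""
        if self._scope == "function":
            # Function scope: every test triggers a fresh (cold) build.
            return self._factory()
        if not self._built:
            # Session scope: build once, on the first request...
            self._value = self._factory()
            self._built = True
        # ...and hand the same object to every later test.
        return self._value


def simulate_cat_run(test_names: List[str], scope: str) -> Tuple[int, int]:
    """Return (container loads, distinct containers seen) for one test run."""
    loads = 0

    def load_connector_container() -> object:
        # Hypothetical stand-in for the slow step that loads the connector container.
        nonlocal loads
        loads += 1
        return object()

    fixture = ScopedFixture(load_connector_container, scope)
    containers = [fixture.get() for _ in test_names]
    # Every container is still referenced by the list, so id() values are unique per object.
    distinct = len({id(c) for c in containers})
    return loads, distinct


if __name__ == "__main__":
    # Illustrative test names, not CAT's actual test IDs.
    tests = ["test_spec", "test_check", "test_discover", "test_read"]
    print(simulate_cat_run(tests, "function"))  # (4, 4)
    print(simulate_cat_run(tests, "session"))  # (1, 1)
```

With function scope each of the four tests pays for a container load; with session scope only the first one does, which is the cold-start saving this change targets.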

Enabling test concurrency

Pytest has the pytest-xdist plugin, which can distribute tests across multiple CPUs.
With the --numprocesses=auto flag in the pytest command, this plugin distributes tests across all available cores.
I don't want to enable this by default for Python connectors, as I'm concerned parallel test runs could exceed the rate limits of the APIs these connectors call. So I:

  • Expose a new --concurrent-cat flag on the airbyte-ci connectors test command, defaulting to false.
  • Always enable concurrency for Java connectors: Java sources connect to databases, which are not affected by API rate limits.
  • We can eventually enable CAT concurrency by default to make Python connector CAT faster, but its effects must be investigated thoroughly first.
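The resulting decision rule can be sketched as a small helper. This is a hypothetical illustration, not airbyte-ci's actual code: cat_pytest_args, ConnectorLanguage and the base pytest arguments are assumptions; only --numprocesses=auto is the real pytest-xdist option mentioned above.

```python
from enum import Enum
from typing import List, Sequence


class ConnectorLanguage(Enum):
    """Hypothetical subset of connector languages, for illustration only."""

    PYTHON = "python"
    JAVA = "java"


def cat_pytest_args(
    base_args: Sequence[str],
    language: ConnectorLanguage,
    concurrent_cat: bool = False,
) -> List[str]:
    """Return the pytest arguments for a CAT run.

    Java connectors always run concurrently (their database sources are not
    rate-limited); other connectors only when --concurrent-cat is passed.
    """
    run_concurrently = concurrent_cat or language is ConnectorLanguage.JAVA
    args = list(base_args)  # copy so the caller's list is never mutated
    if run_concurrently:
        # pytest-xdist option: choose the worker count from the available CPUs.
        args.append("--numprocesses=auto")
    return args


if __name__ == "__main__":
    base = ["pytest", "-ra"]  # hypothetical base CAT command
    print(cat_pytest_args(base, ConnectorLanguage.PYTHON))
    # ['pytest', '-ra']
    print(cat_pytest_args(base, ConnectorLanguage.PYTHON, concurrent_cat=True))
    # ['pytest', '-ra', '--numprocesses=auto']
    print(cat_pytest_args(base, ConnectorLanguage.JAVA))
    # ['pytest', '-ra', '--numprocesses=auto']
```

Keeping the Java rule inside the helper means the --concurrent-cat flag only changes behaviour for non-Java connectors.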

Performance boost on source-postgres (with concurrency)

With this change, source-postgres CAT runs in 01mn56s, a ~2mn improvement over the current 03mn54s on master.
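A quick check of the source-postgres numbers, converting both durations to seconds:

```math
\begin{aligned}
t_{\text{master}} &= 3 \times 60 + 54 = 234\ \text{s}, \qquad t_{\text{branch}} = 1 \times 60 + 56 = 116\ \text{s} \\
\Delta t &= t_{\text{master}} - t_{\text{branch}} = 118\ \text{s} \approx 2\ \text{mn} \\
\frac{\Delta t}{t_{\text{master}}} &= \frac{118}{234} \approx 50\%
\end{aligned}
```

In other words, concurrency roughly halves source-postgres CAT duration.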

Performance boost on Python connectors (no concurrency)

I used source-google-sheets for testing.
As mentioned above, test concurrency is disabled by default for Python connectors.
We still get a ~1mn CAT boost thanks to the shorter test cold start:
  • On master: 3mn11s
  • On this branch: 2mn07s
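The same check for source-google-sheets, where the gain comes only from the shorter cold start:

```math
\begin{aligned}
t_{\text{master}} &= 3 \times 60 + 11 = 191\ \text{s}, \qquad t_{\text{branch}} = 2 \times 60 + 7 = 127\ \text{s} \\
\Delta t &= t_{\text{master}} - t_{\text{branch}} = 64\ \text{s} \approx 1\ \text{mn} \\
\frac{\Delta t}{t_{\text{master}}} &= \frac{64}{191} \approx 33.5\%
\end{aligned}
```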

Recommended reading order

  1. Changes to CAT: airbyte-integrations/bases/connector-acceptance-test/*
  2. Changes to airbyte-ci: airbyte-ci/connectors/pipelines/*

🚨 User Impact 🚨

  • Significant CAT performance boost for Java connectors thanks to concurrency (tested on source-postgres).
  • ~1mn CAT performance boost for Python connectors thanks to a shorter test start-up time (tested on source-google-sheets).

vercel bot commented Oct 23, 2023

The latest updates on your projects. Learn more about Vercel for Git.

1 Ignored Deployment
  • airbyte-docs: ⬜️ Ignored (Inspect), Visit Preview, updated Oct 24, 2023 8:02am (UTC)

Contributor Author

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

github-actions bot commented Oct 23, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete but the CI check is failing:

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@alafanechere alafanechere force-pushed the augustin/10-20-cat_session_scoped_connector_container branch from e2cb8f8 to b66416e (October 23, 2023 09:13)
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Oct 23, 2023
@alafanechere alafanechere force-pushed the augustin/10-20-cat_session_scoped_connector_container branch 3 times, most recently from d87ef2e to b3c44a4 (October 23, 2023 11:31)
@octavia-squidington-iii octavia-squidington-iii added the area/connectors Connector related issues label Oct 23, 2023
@alafanechere alafanechere marked this pull request as ready for review October 23, 2023 11:31
@alafanechere alafanechere requested a review from a team October 23, 2023 11:31
@alafanechere alafanechere requested a review from a team as a code owner October 23, 2023 11:31
@octavia-squidington-iv octavia-squidington-iv requested a review from a team October 23, 2023 11:32
@alafanechere alafanechere marked this pull request as draft October 23, 2023 11:38
@alafanechere alafanechere marked this pull request as ready for review October 23, 2023 11:59
@airbyte-oss-build-runner (Collaborator)

source-google-sheets test report (commit 7e46301bb4) - ✅

⏲️ Total pipeline duration: 01mn53s

Step Result
Build source-google-sheets docker image for platform(s) linux/amd64
Unit tests
Acceptance tests
Check our base image is used
Code format checks
Validate metadata for source-google-sheets
Connector version semver check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests only run on PRs that are ready for review. Please set your PR to draft mode to avoid flooding the CI engine and upstream services on subsequent commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool using the following command:

airbyte-ci connectors --name=source-google-sheets test

@alafanechere alafanechere force-pushed the augustin/10-20-cat_session_scoped_connector_container branch from 7e46301 to 5db9753 (October 23, 2023 12:11)
@octavia-squidington-iii octavia-squidington-iii added area/connectors Connector related issues and removed area/connectors Connector related issues labels Oct 23, 2023
@airbyte-oss-build-runner (Collaborator)

source-postgres test report (commit bbb4d89b41) - ✅

⏲️ Total pipeline duration: 19mn36s

Step Result
Build connector tar
Java Connector Unit Tests
Build source-postgres docker image for platform(s) linux/amd64
Acceptance tests
Java Connector Integration Tests
Validate metadata for source-postgres
Connector version semver check
QA checks

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

You can run the same pipeline locally on this branch with the airbyte-ci tool using the following command:

airbyte-ci connectors --name=source-postgres test

@alafanechere alafanechere force-pushed the augustin/10-20-cat_session_scoped_connector_container branch from bbb4d89 to 00184ec (October 23, 2023 13:50)
@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Oct 23, 2023
@postamar (Contributor) left a comment

LGTM for the parts that I do understand. It makes sense to me to leverage dagger in this way.

@bazarnov (Collaborator)

I still don't understand why we are not getting a performance boost for Python-based sources?

Please share, if there are more details around this. LGTM!

@alafanechere alafanechere force-pushed the augustin/10-20-cat_session_scoped_connector_container branch from 00184ec to f0ce56d (October 24, 2023 08:02)
@alafanechere (Contributor Author)

> I still don't understand why we are not getting a performance boost for Python-based sources? Please share, if there are more details around this. LGTM!

@bazarnov As Python sources make API requests that can be rate-limited, I'm afraid that running the CAT tests in parallel can easily exceed these rate limits. Feel free to try it out on Python connectors locally by running airbyte-ci connectors --name=<your-connector> test --concurrent-cat and let me know if you notice a performance boost or regression 😄

@alafanechere alafanechere enabled auto-merge (squash) October 24, 2023 08:11
@alafanechere alafanechere merged commit ff2fcf8 into master Oct 24, 2023
24 checks passed
@alafanechere alafanechere deleted the augustin/10-20-cat_session_scoped_connector_container branch October 24, 2023 08:14
sentry-io bot commented Oct 24, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ImportError: cannot import name 'PipelineContext' from 'pipelines.models.contexts' (/home/runner/.local/pipx/v... pipelines.airbyte_ci.connectors.test.steps.comm... View Issue


Development

Successfully merging this pull request may close these issues.

Lower source-postgres CAT duration
5 participants