source-mysql : chunking queries impl #29109

Merged Aug 18, 2023 (43 commits)

@akashkulk commented Aug 4, 2023

Closes #28186

Reading order and major changes:

  1. MySqlInitialLoadRecordIterator.java : New record iterator that implements the chunking logic. It repeatedly queries the database, using the configured chunk size as the limit and the highest PK value processed so far as the watermark. The logic for creating prepared statements has moved to this class from MySqlInitialLoadHandler. For example, with a chunk size of 1,800,000 rows:
Query 1 : select * from table order by pk limit 1800000
Query 2 : select * from table where pk > pk_max_1 order by pk limit 1800000
Query 3 : select * from table where pk > pk_max_2 order by pk limit 1800000
  2. MySqlQueryUtils.java : Queries the size and average row length of each table in the initial load.
  3. MySqlInitialLoadHandler.java : Logic to calculate the chunk size (the limit used in the queries above). Each chunk should correspond to roughly 1 GB of data.
  4. MySqlInitialSyncStateIterator.java, MySqlLoadGlobalStateManager.java : Adds an interface to update the primary key watermark that has been processed. This is needed to calculate pk_max_1 and pk_max_2 dynamically in the example above.
  5. MySqlInitialLoadSourceOperations.java : Small bug fix to process null values, in line with the parent class at AbstractjdbcSourceOperations.java.
  6. Tests : Added tests for the limit size calculation. The existing test suite should be unchanged, since the number of records processed and the state messages emitted should stay the same.
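
The chunk size calculation and the chunked queries described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class `ChunkingSketch`, its method names, the 1,000,000,000-byte target, and the fallback row count are all assumptions made for the example, and the statistics query is only one plausible way to read the average row length from MySQL's `information_schema.TABLES`.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkingSketch {
    // Assumption: "about 1 GB" is taken as 10^9 bytes for this sketch.
    static final long TARGET_CHUNK_BYTES = 1_000_000_000L;
    // Assumption: fallback limit when no average row length is available.
    static final long DEFAULT_CHUNK_ROWS = 1_000_000L;

    // One plausible way to fetch table size and average row length (bind schema and table name).
    static final String TABLE_STATS_QUERY =
        "SELECT DATA_LENGTH, AVG_ROW_LENGTH FROM information_schema.TABLES "
            + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?";

    /** Rows per chunk so that one chunk holds roughly TARGET_CHUNK_BYTES. */
    static long chunkSize(long avgRowLengthBytes) {
        if (avgRowLengthBytes <= 0) {
            return DEFAULT_CHUNK_ROWS;
        }
        return Math.max(1, TARGET_CHUNK_BYTES / avgRowLengthBytes);
    }

    /** Query text for the next chunk; the watermark value is bound as a parameter. */
    static String chunkQuery(String table, String pk, boolean hasWatermark, long limit) {
        String where = hasWatermark ? " WHERE " + pk + " > ?" : "";
        return "SELECT * FROM " + table + where + " ORDER BY " + pk + " LIMIT " + limit;
    }

    /** In-memory stand-in for the query loop: splits sorted PKs into chunks by watermark. */
    static List<List<Long>> readInChunks(List<Long> sortedPks, int limit) {
        List<List<Long>> chunks = new ArrayList<>();
        Long watermark = null; // no watermark before the first query
        while (true) {
            List<Long> chunk = new ArrayList<>();
            for (Long pk : sortedPks) {
                if (chunk.size() == limit) {
                    break;
                }
                if (watermark == null || pk > watermark) {
                    chunk.add(pk);
                }
            }
            if (chunk.isEmpty()) {
                break;
            }
            chunks.add(chunk);
            watermark = chunk.get(chunk.size() - 1); // highest PK seen so far
            if (chunk.size() < limit) {
                break; // a short chunk means the table is exhausted
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        long limit = chunkSize(555);
        System.out.println(chunkQuery("my_table", "pk", false, limit));
        System.out.println(chunkQuery("my_table", "pk", true, limit));
        System.out.println(readInChunks(List.of(1L, 2L, 3L, 4L, 5L), 2));
    }
}
```

With an average row length of 555 bytes, for instance, this sketch yields a limit of 1,801,801 rows per chunk. Binding the watermark as a `?` parameter keeps the query text identical from the second chunk onward.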

Some other changes :

  1. Increase the state emission interval for initial load from 10_000 to 100_000 to reduce log verbosity and relieve the pressure from large state messages. There doesn't seem to be any benefit to emitting state at such a high frequency.
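
To illustrate the effect of the less frequent state emission, the sketch below lists the points at which a state message would be emitted for a given interval. It assumes the interval is counted in records and that a final state message is emitted when the load finishes. `CheckpointSketch` and `checkpoints` are hypothetical names, not part of the connector.

```java
import java.util.ArrayList;
import java.util.List;

class CheckpointSketch {
    /** Returns the record counts at which a state message would be emitted. */
    static List<Long> checkpoints(long totalRecords, long interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Long> emittedAt = new ArrayList<>();
        for (long n = interval; n <= totalRecords; n += interval) {
            emittedAt.add(n);
        }
        // Assumption: one final state message is emitted once the load completes.
        if (emittedAt.isEmpty() || emittedAt.get(emittedAt.size() - 1) != totalRecords) {
            emittedAt.add(totalRecords);
        }
        return emittedAt;
    }

    public static void main(String[] args) {
        System.out.println(checkpoints(1_000_000, 10_000).size());
        System.out.println(checkpoints(1_000_000, 100_000).size());
    }
}
```

For a table of 1,000,000 records, this gives 100 state messages at an interval of 10_000 and 10 at an interval of 100_000.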

Sync logs from local testing :
26af7a74_0437_42a9_9b17_981b4c659fff_logs_57_txt.txt

github-actions bot commented Aug 4, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

@akashkulk marked this pull request as ready for review August 11, 2023 12:33
@akashkulk

/legacy-test connector=connectors/source-mysql

@github-actions

source-mysql test report (commit 0583a62798) - ❌

⏲️ Total pipeline duration: 20mn24s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql test

@github-actions

source-mysql test report (commit 48a7c4b386) - ❌

⏲️ Total pipeline duration: 19mn29s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@akashkulk

/legacy-test connector=connectors/source-mysql

@github-actions

source-mysql test report (commit bfbc132528) - ❌

⏲️ Total pipeline duration: 20mn56s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@github-actions

source-mysql-strict-encrypt test report (commit 12aabb2cd9) - ❌

⏲️ Total pipeline duration: 13mn37s

Step Result
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql-strict-encrypt:integrationTest
Acceptance tests

🔗 View the logs here

☁️ View runs for commit in Dagger Cloud

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

@github-actions

source-mysql test report (commit 12aabb2cd9) - ❌

⏲️ Total pipeline duration: 17mn03s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@akashkulk

/legacy-test connector=connectors/source-mysql

@github-actions

source-mysql test report (commit f6fbdb059f) - ❌

⏲️ Total pipeline duration: 20mn43s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@github-actions

source-mysql-strict-encrypt test report (commit f6fbdb059f) - ❌

⏲️ Total pipeline duration: 13mn00s

Step Result
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql-strict-encrypt:integrationTest
Acceptance tests

@github-actions

source-mysql test report (commit 9911361aee) - ❌

⏲️ Total pipeline duration: 19mn31s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@akashkulk

/legacy-test connector=connectors/source-mysql

@github-actions

source-mysql test report (commit 3b3b92df36) - ❌

⏲️ Total pipeline duration: 21mn04s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@github-actions

source-mysql test report (commit d541b4b79a) - ❌

⏲️ Total pipeline duration: 20mn44s

Step Result
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Build connector tar
Build source-mysql docker image for platform linux/x86_64
./gradlew :airbyte-integrations:connectors:source-mysql:integrationTest
Acceptance tests

@akashkulk

/approve-and-merge reason="mysql tests passing"

@octavia-approvington

All in!!

@octavia-approvington merged commit 2b18864 into master Aug 18, 2023
17 of 21 checks passed
@octavia-approvington deleted the chunking_chkpt branch August 18, 2023 19:26
harrytou pushed a commit to KYVENetwork/airbyte that referenced this pull request Sep 1, 2023
Successfully merging this pull request may close these issues.

Query data in chunks for initial snapshot
5 participants