[Source-MySQL] Enhanced Standard sync with PK initial load -> Cursor based switch over #30270

Merged
merged 14 commits into from Sep 18, 2023


@nguyenaiden nguyenaiden commented Sep 7, 2023

What

Use primary keys for the initial sync, then switch over to the user-defined cursor once the high-water mark has been found and persisted in state.

How

Standard incremental syncs now use the primary key for the initial load. At the beginning of the sync we retrieve the high-water mark and construct all of the relevant primary-key and standard-sync iterators.

This includes a SELECT MAX query that retrieves the maximum cursor value and populates the initial and final StreamState. That state provides a checkpoint so that subsequent syncs do not miss any records.
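
The flow described above can be sketched with an in-memory table. This is only an illustration under stated assumptions, not the connector's implementation: the class, record, and method names are hypothetical, the cursor is modelled as a plain long, and the incremental read uses a strict "greater than" comparison against the saved high-water mark.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory illustration; none of these names come from the connector.
final class PkThenCursorSketch {

  // A row with a primary key and a user-defined cursor column (for example, an updated_at epoch).
  record Row(int pk, long cursor) {}

  // Stand-in for "SELECT MAX(cursor)": taken once, before the initial load starts.
  static long maxCursor(List<Row> table) {
    return table.stream().mapToLong(Row::cursor).max().orElse(Long.MIN_VALUE);
  }

  // Initial load: read in primary-key order, resuming after the last checkpointed key (null = start).
  static List<Row> readByPk(List<Row> table, Integer lastPk, int limit) {
    return table.stream()
        .filter(r -> lastPk == null || r.pk() > lastPk)
        .sorted(Comparator.comparingInt(Row::pk))
        .limit(limit)
        .toList();
  }

  // Subsequent syncs: read only rows whose cursor is past the persisted high-water mark.
  static List<Row> readByCursor(List<Row> table, long highWaterMark) {
    return table.stream()
        .filter(r -> r.cursor() > highWaterMark)
        .sorted(Comparator.comparingLong(Row::cursor))
        .toList();
  }

  public static void main(String[] args) {
    List<Row> table = new ArrayList<>(List.of(new Row(2, 20L), new Row(1, 10L), new Row(3, 15L)));

    long highWaterMark = maxCursor(table);            // persisted in state up front
    List<Row> firstChunk = readByPk(table, null, 2);  // first checkpointed chunk
    List<Row> secondChunk = readByPk(table, 2, 2);    // resumes after primary key 2

    table.add(new Row(4, 30L));                       // written after the initial load
    List<Row> incremental = readByCursor(table, highWaterMark);

    System.out.println(firstChunk + " " + secondChunk + " " + incremental);
  }
}
```

Capturing the maximum cursor value before the primary-key read begins is what lets the later cursor-based sync pick up rows written while the initial load was still running.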

Note: The getIncrementalIterators method in the MysqlInitialLoadHandler class now handles the edge case where the customer de-selects a primary key column. The solution is to re-add those column names to selectedDatabaseFields so that the correct SELECT query is constructed. The re-added columns will not be part of the emitted record, because CatalogHelpers.getTopLevelFieldNames(airbyteStream) does not include de-selected fields.
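
A minimal sketch of that edge case, assuming rows are represented as maps from column name to value. The class and method names here are hypothetical and are not the ones used in MysqlInitialLoadHandler: the query column list gets the primary key re-added, while the emitted record keeps only the fields the user selected.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; the names are illustrative only.
final class DeselectedPkSketch {

  // Columns for the SELECT: the user's selection plus any primary-key column that was de-selected.
  static List<String> queryColumns(List<String> selectedFields, List<String> primaryKey) {
    Set<String> columns = new LinkedHashSet<>(selectedFields);
    columns.addAll(primaryKey); // no-op for key columns that are already selected
    return List.copyOf(columns);
  }

  // Fields for the emitted record: only what the user selected, so re-added key columns are dropped.
  static Map<String, Object> toRecord(Map<String, Object> row, List<String> selectedFields) {
    Map<String, Object> record = new LinkedHashMap<>();
    for (String field : selectedFields) {
      if (row.containsKey(field)) {
        record.put(field, row.get(field));
      }
    }
    return record;
  }

  public static void main(String[] args) {
    List<String> selected = List.of("name", "updated_at");
    List<String> primaryKey = List.of("id");

    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 1);
    row.put("name", "a");
    row.put("updated_at", 5L);

    System.out.println(queryColumns(selected, primaryKey));
    System.out.println(toRecord(row, selected));
  }
}
```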

Tests with the flag turned on are in MySqlPkJdbcSourceAcceptanceTest.java.

github-actions bot commented Sep 7, 2023

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • PR name follows PR naming conventions
  • Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • You've updated the connector's metadata.yaml file with any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • Secrets in the connector's spec are annotated with airbyte_secret
  • All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

Contributor

Why is a change to this file needed?

LOGGER.info("Querying max cursor value for {}.{}", namespace, name);
final String cursorField = cursorInfoOptional.get().getCursorField();
final String cursorBasedSyncStatusQuery = String.format(MAX_CURSOR_VALUE_QUERY,
cursorField,

Contributor

We should quote-escape cursor fields, as in this PR: https://github.com/airbytehq/airbyte/pull/30059/files
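
As a rough illustration of the concern, the sketch below wraps each identifier in backticks before formatting it into the query. MySQL quotes identifiers with backticks, and an embedded backtick is escaped by doubling it. The query template and helper names are assumptions made for this example; they are not the connector's MAX_CURSOR_VALUE_QUERY or the approach taken in the linked PR.

```java
// Hypothetical sketch; the template and method names are not taken from the connector.
final class QuoteCursorSketch {

  // Assumed shape of a max-cursor query; the real MAX_CURSOR_VALUE_QUERY may differ.
  static final String MAX_CURSOR_QUERY_TEMPLATE = "SELECT MAX(%s) FROM %s.%s";

  // Quote a MySQL identifier: wrap in backticks and double any embedded backtick.
  static String quote(String identifier) {
    return "`" + identifier.replace("`", "``") + "`";
  }

  static String maxCursorQuery(String namespace, String table, String cursorField) {
    return String.format(MAX_CURSOR_QUERY_TEMPLATE, quote(cursorField), quote(namespace), quote(table));
  }

  public static void main(String[] args) {
    // "order" is a reserved word, so the query would fail without quoting.
    System.out.println(maxCursorQuery("db", "orders", "order"));
  }
}
```

Without quoting, a cursor column named after a reserved word, or one containing spaces, would produce an invalid statement.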

this.cdcState = cdcState;
this.pairToPrimaryKeyLoadStatus = initPairToPrimaryKeyLoadStatusMap(initialLoadStreams.pairToInitialLoadStatus());
this.pairToPrimaryKeyInfo = pairToPrimaryKeyInfo;
this.streamsThatHaveCompletedSnapshot = initStreamsCompletedSnapshot(initialLoadStreams, catalog);
}

private static Set<AirbyteStreamNameNamespacePair> initStreamsCompletedSnapshot(final InitialLoadStreams initialLoadStreams,

Contributor

Why are we deleting this code?

Contributor Author

I moved this to the Utils class so I can use it as well

Contributor

Can we move this back, now that it is only being used here? Just trying to minimise the changes.

Contributor Author

I'm good with that! On it

@@ -338,6 +349,54 @@ public List<AutoCloseableIterator<AirbyteMessage>> getIncrementalIterators(final
LOGGER.info("Using PK + CDC");
return MySqlInitialReadUtil.getCdcReadIterators(database, catalog, tableNameToTable, stateManager, emittedAt, getQuoteString());
} else {
if (isAnyStreamIncrementalSyncMode(catalog)) {
final MySqlCursorBasedStateManager cursorBasedStateManager = new MySqlCursorBasedStateManager(stateManager.getRawStateMessages(), catalog);

Contributor

Can we move this to MySqlInitialReadUtil? I assume this is the code that is doing the main logic of figuring out which streams to sync via pk?

Contributor Author

Moved it! I actually eliminated the filtering based on state_type = cursor_based since anything that's not PK is implicitly cursor_based.
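
The simplification can be pictured as a two-way split. This is a hedged sketch rather than the code in MySqlInitialReadUtil: the class, record, and method names are hypothetical, the "primary_key" state type value is an assumption, and so is the rule that a stream with no saved state starts with the primary-key load. Everything that does not need the initial load lands in the cursor-based bucket without an explicit check for state_type = cursor_based.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the "primary_key" value are assumptions for illustration.
final class StreamSplitSketch {

  record Split(List<String> initialLoad, List<String> cursorBased) {}

  static Split split(List<String> configuredStreams, Map<String, String> stateTypeByStream) {
    List<String> initialLoad = new ArrayList<>();
    List<String> cursorBased = new ArrayList<>();
    for (String stream : configuredStreams) {
      String stateType = stateTypeByStream.get(stream); // null means no saved state yet
      if (stateType == null || stateType.equals("primary_key")) {
        initialLoad.add(stream); // new stream, or a primary-key load still in progress
      } else {
        cursorBased.add(stream); // anything else is implicitly cursor based
      }
    }
    return new Split(initialLoad, cursorBased);
  }

  public static void main(String[] args) {
    Split result = split(List.of("a", "b", "c"), Map.of("a", "primary_key", "b", "cursor_based"));
    System.out.println(result);
  }
}
```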

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MySqlCursorBasedStateManager extends StreamStateManager {

Contributor

This is similar to the PostgresCursorBasedStateManager class, correct? Basically just to enable writing the state_type and version keys? Can we add a comment for that?

IIRC we had a ticket to refactor this in StreamStateManager when we implemented source-mysql - do you think we should do that here?


}

public record StreamsCategorised(InitialLoadStreams initialLoadStreams,

Contributor

Can we remove this and just keep InitialLoadStreams? Anything that is not in InitialLoadStreams would implicitly be a cursor-based stream, right?

Contributor Author

Yep! Addressed above.

@nguyenaiden nguyenaiden marked this pull request as ready for review September 11, 2023 04:45
@nguyenaiden nguyenaiden requested review from a team as code owners September 11, 2023 04:45
@nguyenaiden nguyenaiden changed the title [Source-MySQL] Enhanced Standard sync with PK initial load -> Cursor based switch over [WIP] [Source-MySQL] Enhanced Standard sync with PK initial load -> Cursor based switch over Sep 11, 2023
@github-actions
Contributor

source-snowflake test report (commit 569bfc0623) - ❌

⏲️ Total pipeline duration: 13mn02s

Step Result
Java Connector Unit Tests
Build connector tar
Build source-snowflake docker image for platform linux/x86_64
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-snowflake/metadata.yaml
Connector version semver check
QA checks

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-snowflake test

github-actions bot commented Sep 11, 2023

Coverage report for source-postgres

File Coverage [87.89%] 🍏
PostgresSource.java 88.62% 🍏
PostgresUtils.java 84.9% 🍏
Total Project Coverage 70.56% 🍏

@github-actions
Contributor

source-cockroachdb test report (commit 569bfc0623) - ❌

⏲️ Total pipeline duration: 602mn36s

Step Result
Java Connector Unit Tests
Build connector tar
Build source-cockroachdb docker image for platform linux/x86_64
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-cockroachdb/metadata.yaml
Connector version semver check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-cockroachdb test

@github-actions
Contributor

source-mysql-strict-encrypt test report (commit 569bfc0623) - ❌

⏲️ Total pipeline duration: 41mn21s

Step Result
Java Connector Unit Tests
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

@github-actions
Contributor

source-oracle-strict-encrypt test report (commit 569bfc0623) - ❌

⏲️ Total pipeline duration: 21mn31s

Step Result
Java Connector Unit Tests
Build connector tar
Build source-oracle-strict-encrypt docker image for platform linux/x86_64
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-oracle-strict-encrypt/metadata.yaml
Connector version semver check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-oracle-strict-encrypt test

@github-actions
Contributor

source-postgres test report (commit 569bfc0623) - ✅

⏲️ Total pipeline duration: 29mn26s

Step Result
Java Connector Unit Tests
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test

@github-actions
Contributor

source-mysql test report (commit 76c62cea72) - ✅

⏲️ Total pipeline duration: 02mn36s

Step Result
Build connector tar
Build source-mysql docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql test

@github-actions
Contributor

source-postgres test report (commit 76c62cea72) - ❌

⏲️ Total pipeline duration: 02mn31s

Step Result
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test

@github-actions
Contributor

source-mssql test report (commit 76c62cea72) - ❌

⏲️ Total pipeline duration: 13.21s

Step Result

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mssql test

@octavia-squidington-iii octavia-squidington-iii removed the area/documentation Improvements or additions to documentation label Sep 18, 2023
@nguyenaiden
Contributor Author

/approve-and-merge reason="ci passes and features are gated behind flag". Source-postgres and mssql changes are non-functional. "

@octavia-approvington
Contributor

Jerry would be proud

@octavia-approvington octavia-approvington merged commit f226503 into master Sep 18, 2023
19 of 26 checks passed
@octavia-approvington octavia-approvington deleted the mysql-standard-pk-sync branch September 18, 2023 22:17
@github-actions
Contributor

source-mysql test report (commit b0c1071caf) - ❌

⏲️ Total pipeline duration: 30mn51s

Step Result
Build connector tar
Build source-mysql docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-mysql/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql test

@github-actions
Contributor

source-mssql test report (commit b0c1071caf) - ❌

⏲️ Total pipeline duration: 6.00s

Step Result

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mssql test

@github-actions
Contributor

source-postgres test report (commit b0c1071caf) - ❌

⏲️ Total pipeline duration: 01mn51s

Step Result
Build connector tar
Build source-postgres docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml
Connector version semver check
Connector version increment check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-postgres test

@github-actions
Contributor

source-mysql-strict-encrypt test report (commit b0c1071caf) - ❌

⏲️ Total pipeline duration: 08mn25s

Step Result
Build connector tar
Build source-mysql-strict-encrypt docker image for platform linux/x86_64
Java Connector Unit Tests
Java Connector Integration Tests
Acceptance tests
Validate airbyte-integrations/connectors/source-mysql-strict-encrypt/metadata.yaml
Connector version semver check
QA checks

You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-mysql-strict-encrypt test

6 participants