-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
postgres-source: complete implementation for for ctid and xmin sync #27302
postgres-source: complete implementation for for ctid and xmin sync #27302
Conversation
Before Merging a Connector Pull RequestWow! What a great pull request you have here! 🎉 To merge this PR, ensure the following has been done/considered for each connector added or updated:
If the checklist is complete, but the CI check is failing,
|
Affected Connector ReportNOTE
|
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb |
2.0.28 |
✅ | ✅ |
source-alloydb-strict-encrypt |
2.0.28 |
🔵 (ignored) |
🔵 (ignored) |
source-postgres-strict-encrypt |
2.0.34 |
🔵 (ignored) |
🔵 (ignored) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
(click to expand)
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch |
The version of the connector is different from its normal variant. Please bump the version of the connector. |
⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
|
Changelog | ⚠ doc not found |
The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
❌ changelog missing |
There is no chnagelog for the current version of the connector. If you are the author of the current version, please add a changelog. | |
Publish | ⚠ not in seed |
The connector is not in the cloud or oss registry, so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that you have added a metadata.yaml file and the expected registries are enabled. |
Coverage report for source-postgres
|
…nto state-structure-change-xmin-ctid
@rodireich @akashkulk please take a look at this PR, I pulled in changes from Rodi's PR #27229 but I realised it lacked things in order for me to build the logic so made such changes to unblock myself. This is still in draft cause it lacks tests but if you both are happy with the changes, I will add the tests and mark it as ready for review. |
...ource-postgres/src/main/java/io/airbyte/integrations/source/postgres/PostgresQueryUtils.java
Show resolved
Hide resolved
@@ -89,7 +90,9 @@ public static XminStatus getXminStatus(final JdbcDatabase database) throws SQLEx | |||
return new XminStatus() | |||
.withNumWraparound(result.get(NUM_WRAPAROUND_COL).asLong()) | |||
.withXminXidValue(result.get(XMIN_XID_VALUE_COL).asLong()) | |||
.withXminRawValue(result.get(XMIN_RAW_VALUE_COL).asLong()); | |||
.withXminRawValue(result.get(XMIN_RAW_VALUE_COL).asLong()) | |||
.withVersion(2L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this version be explicitly defined in a common class somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Friendly ping - I think we should explicitly define a common version constant somewhere (maybe in a version utility file)
final PostgresXminHandler handler = new PostgresXminHandler(database, sourceOperations, getQuoteString(), xminStatus, xminStateManager); | ||
return handler.getIncrementalIterators(catalog, tableNameToTable, emittedAt); | ||
final List<AirbyteStateMessage> rawStateMessages = stateManager.getRawStateMessages(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I understand at a very high-level we are :
- Identifying streams to sync via ctid (these include newly added streams and in progress streams)
- Identify streams to sync via xmin
- Build those interators
- Concatenate them
If so, can we add a quick comment indicating this is what we're doing
final List<AirbyteStateMessage> statesFromCtidSync = new ArrayList<>(); | ||
final List<AirbyteStateMessage> statesFromXminSync = new ArrayList<>(); | ||
|
||
final Set<AirbyteStreamNameNamespacePair> alreadySeenStreams = new HashSet<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add helper methods to calculate streamsStillInCtidSync
, streamsForXminSync
, etc.
@@ -73,6 +73,14 @@ public XminStatus getXminStatus(final AirbyteStreamNameNamespacePair pair) { | |||
* @return AirbyteMessage which includes information on state of records read so far | |||
*/ | |||
public static AirbyteMessage createStateMessage(final AirbyteStreamNameNamespacePair pair, final XminStatus xminStatus) { | |||
final AirbyteStateMessage stateMessage = getAirbyteStateMessage(pair, xminStatus); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just a refactor?
if (!streamsForCtidSync.isEmpty()) { | ||
final CtidStateManager ctidStateManager = new CtidStateManager(statesFromCtidSync); | ||
final PostgresCtidHandler ctidHandler = new PostgresCtidHandler(database, sourceOperations, getQuoteString(), ctidStateManager, | ||
x -> new ObjectMapper().valueToTree(xminStatus), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : IMO it's super confusing to have lambdas as function arguments here and above. can we calculate them seperately and pass them in?
new ConfiguredAirbyteCatalog().withStreams(streamsForXminSync), tableNameToTable, emittedAt)); | ||
} | ||
|
||
return Stream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : More of an extension of the discussion we had in the AM
Maybe not for this PR, but this is the issue I was talking about : https://github.com/airbytehq/airbyte/issues/27115 where ideally we want to kick off an incremental xmin sync after the initial successful ctid sync.
One way to do this is to slightly augment what you have here. Even for streams that belong to streamsForCtidSync
we can build iterators for the xmin query and merge them so that the xmin iterator runs after the ctid iterator.
When that happens, we'll have to make sure that the handlers can merge streams based on stream name
|
Step | Result |
---|---|
Validate airbyte-integrations/connectors/source-postgres/metadata.yaml | ✅ |
Connector version semver check. | ✅ |
Connector version increment check. | ❌ |
QA checks | ✅ |
Build connector tar | ✅ |
Build source-postgres docker image for platform linux/x86_64 | ✅ |
Unit tests | ✅ |
Integration tests | ✅ |
Acceptance tests | ❌ |
Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command
airbyte-ci connectors --name=source-postgres test
/legacy-test connector=connectors/source-postgres
Build FailedTest summary info:
|
@subodh1810 I believe the automated test workflow failed due to a memory problem. |
* Initial logic for xmin wraparound * Add tests * Address comments * add xmin-wraparound check * address review comments * 🤖 Auto format source-postgres code [skip ci] * missed this * 🤖 Auto format source-postgres code [skip ci] --------- Co-authored-by: Akash Kulkarni <akash@airbyte.io> Co-authored-by: Akash Kulkarni <113392464+akashkulk@users.noreply.github.com> Co-authored-by: octavia-squidington-iii <octavia-squidington-iii@users.noreply.github.com>
/approve-and-merge reason="The CI seems to have an issue + change is not going to impact any current running connector cause its for xmin which is a completely new sync mode." |
FYI @subodh1810 the integration tests passed: https://storage.googleapis.com/airbyte-ci-reports/airbyte-ci/connectors/test/manual/state-structure-change-xmin-ctid/1687417901/f0a1dc48c0dd6ad70008c55626d1e014024f2654/source-postgres/2.0.34/output.html Acceptance tests are knowingly not passing and @bnchrch will investigate soon. |
@alafanechere, @bnchrch As part of this work to sync large tables in a new way, we're also making change to the structure of our saved state. this is most likely going to need a matching change on CAT. |
Issue : #26724, #26735 and everything related to ctid + xmin syncs.
The ctid iterator comes from PR #27229 but that was test PR plus I messed up while pushing commit so closing it in favor of this PR. This PR contains end to end features for xmin and ctid compatibility syncs
I have tested this PR in my local airbyte instance as well. I set up a connector and ran it though xmin sync mode (had to edit spec to do that) and ran a couple of syncs and verified that it works as expected. The first sync went through ctid sync mode and second sync went through xmin. Here are the logs from the two syncs.
d801b200_62f7_478c_b955_c041315cc10a_logs_30_txt.txt
d801b200_62f7_478c_b955_c041315cc10a_logs_29_txt.txt
Snippet from state being saved in the database