Postgres on Resumable full refresh #37112

Merged
merged 45 commits into from
May 10, 2024

xiaohansong
Contributor

Postgres on Resumable full refresh

  • adapt to rfr cdk interface
  • create state manager for rfr (final state handling)

@rodireich
Contributor

rodireich commented May 8, 2024

An empty table saved a streamState: null, causing the next sync to fail.

Actually, even a table with a small number of records will first emit a null stream state. Not sure if that's also the case with mssql and mysql.

@rodireich
Contributor

With xmin, the final state is not saved, so the next full refresh sync will read the last 10,000-record chunk over and over.
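The problem described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the connector's code: `sync_attempt`, the `offset`/`done` state shape, and the tiny `CHUNK_SIZE` are all hypothetical stand-ins. It assumes a checkpoint is recorded at the start of each chunk and that only a separate final state marks the stream as complete.

```python
from typing import List, Optional, Tuple

CHUNK_SIZE = 4  # hypothetical; the thread mentions 10,000-record chunks


def sync_attempt(
    table: List[int], saved_state: Optional[dict], emit_final_state: bool
) -> Tuple[List[int], dict]:
    """Read `table` starting from `saved_state`.

    Returns (records read in this run, state saved afterwards).
    """
    state = dict(saved_state) if saved_state else {"offset": 0, "done": False}
    if state["done"]:
        # A saved final state means there is nothing left to read.
        return [], state

    read: List[int] = []
    offset = state["offset"]
    while offset < len(table):
        # Checkpoint: remember where the chunk being read begins.
        state["offset"] = offset
        chunk = table[offset:offset + CHUNK_SIZE]
        read.extend(chunk)
        offset += len(chunk)

    if emit_final_state:
        # Final state: record that the whole table has been read.
        state = {"offset": offset, "done": True}
    return read, state
```

In this model, running without `emit_final_state` leaves the saved offset at the start of the last chunk, so every later run re-reads that chunk. With the final state saved, the next run reads nothing.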

@xiaohansong
Contributor Author

An empty table saved a streamState: null, causing the next sync to fail.

Actually, even a table with a small number of records will first emit a null stream state. Not sure if that's also the case with mssql and mysql.

That's because in Postgres the streamState stays null until we reach the first checkpoint. Not sure why that would cause the sync to fail?
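The behavior described here can be sketched as follows. This is a minimal illustration, not the connector's implementation: `read_with_checkpoints`, `last_state`, and the `CHECKPOINT_EVERY` interval are hypothetical names and values chosen for the example. The thread does not say what actually caused the failure.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical checkpoint interval; the real value is not given in this thread.
CHECKPOINT_EVERY = 3


def read_with_checkpoints(
    rows: Iterable[str], checkpoint_every: int = CHECKPOINT_EVERY
) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (row, stream_state) pairs.

    stream_state is None until the first checkpoint is reached; after that it
    holds the number of rows covered by the latest checkpoint.
    """
    stream_state: Optional[int] = None
    for count, row in enumerate(rows, start=1):
        if count % checkpoint_every == 0:
            stream_state = count
        yield row, stream_state


def last_state(
    rows: Iterable[str], checkpoint_every: int = CHECKPOINT_EVERY
) -> Optional[int]:
    """Return the state left after reading all rows, with no final-state step."""
    state: Optional[int] = None
    for _, state in read_with_checkpoints(rows, checkpoint_every):
        pass
    return state
```

Under these assumptions, an empty table or one smaller than the checkpoint interval ends with a `None` state, so whatever consumes the saved state has to accept `None` as a valid value.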

@xiaohansong xiaohansong requested a review from a team as a code owner May 9, 2024 21:01
@xiaohansong
Contributor Author

xiaohansong commented May 9, 2024

/publish-java-cdk

🕑 https://github.com/airbytehq/airbyte/actions/runs/9023493316
✅ Successfully published Java CDK version=0.34.2!

@xiaohansong xiaohansong enabled auto-merge (squash) May 10, 2024 16:00
@xiaohansong xiaohansong merged commit 80920d1 into master May 10, 2024
33 checks passed
@xiaohansong xiaohansong deleted the xiaohan/postgres-rfr branch May 10, 2024 16:13
@Hashcode-Ankit

Hi @xiaohansong, I tried Postgres with CDC. Some streams had a ctid and some had an empty cursor field, but after the sync failed and I started it again, it ran a full refresh from the beginning. Is this normal behavior?

@xiaohansong
Contributor Author

xiaohansong commented Jul 5, 2024

@Hashcode-Ankit "Resumable full refresh" only applies across attempts within the same sync job. If the sync job has a second attempt, it picks up from the previous checkpoint of the full refresh stream. If the user kicks off a new sync job, it starts the full refresh from the beginning, regardless of the previous sync's result.

If you do not want to start from the beginning, consider using incremental refresh instead!
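The rule stated above (resume across attempts of one job, restart for a new job) can be modeled in a few lines. This is a hedged sketch only: `starting_state`, the `(job_id, checkpoint)` pair, and the example ctid value are hypothetical and do not reflect how the platform actually stores or clears state.

```python
from typing import Optional, Tuple


def starting_state(
    job_id: int, saved: Optional[Tuple[int, dict]]
) -> Optional[dict]:
    """Pick the state a full refresh stream starts from.

    `saved` is a hypothetical (job_id, checkpoint) pair left by an earlier
    attempt. The checkpoint is reused only when it belongs to the same job;
    otherwise None is returned, meaning "start from the beginning".
    """
    if saved is None:
        return None
    saved_job_id, checkpoint = saved
    return checkpoint if saved_job_id == job_id else None
```

For example, a second attempt of job 1 would resume from the checkpoint saved by its first attempt, while job 2 would ignore that checkpoint and read the stream from the start.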

@piyushsingariya

Hi @xiaohansong, I think what @Hashcode-Ankit means is that he is trying CDC with Postgres and this is the first sync. During the sync, some streams were fully loaded but their cursor fields are missing, the currently running stream has a CTID state, and I think the sync failed at that point.

When he ran the next sync with the same state, it restarted the full load for every stream, which it shouldn't have.

Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation CDK Connector Development Kit connectors/source/postgres

6 participants