Postgres intermediate state may lead to data loss for incremental sync #15427

Closed
tuliren opened this issue Aug 9, 2022 · 0 comments · Fixed by #15496
Labels
type/bug Something isn't working

tuliren commented Aug 9, 2022

Environment

  • Source Connector and version: Postgres source version 0.4.41
  • Step where error happened: Incremental sync

Current Behavior

  • Intermediate state message emission was introduced to the Postgres source in version 0.4.41, but the implementation has a flaw that can lead to data loss.
  • In this implementation, an incremental sync sorts the table by the cursor field and emits the max cursor value as an intermediate state every 10K records. The purpose is to emit states frequently, so that if a transient failure occurs during a long sync, the next run does not need to start from the beginning but can resume from the last intermediate state successfully committed on the destination. The next run will start with cursorField > cursor.
  • However, multiple records can share the same cursor value. If the intermediate state is emitted before all of these records have been synced to the destination, the remaining ones may be lost.
  • Here is an example:
Record ID | Cursor Field | Other Field
1         | F1=16        | F2="abc"
2         | F1=16        | F2="def"   <- state emission and failure
3         | F1=16        | F2="ghi"

If the intermediate state is emitted at record 2 and the sync fails immediately afterwards, the cursor value 16 is committed even though only records 1 and 2 were actually synced. The next run will then start with F1 > 16 and skip record 3.
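
The scenario above can be reproduced with a small simulation. This is only an illustrative sketch in Python, not the connector's code: the in-memory `records` list stands in for the table, and `read_incremental` is a hypothetical helper that mimics a query filtered on cursorField > cursor.

```python
# Hypothetical in-memory stand-in for the example table.
records = [
    {"id": 1, "f1": 16, "f2": "abc"},
    {"id": 2, "f1": 16, "f2": "def"},
    {"id": 3, "f1": 16, "f2": "ghi"},
]


def read_incremental(table, cursor=None):
    """Return rows sorted by f1, keeping only rows with f1 > cursor.

    A cursor of None means a first run that reads every row.
    """
    rows = sorted(table, key=lambda r: r["f1"])
    return [r for r in rows if cursor is None or r["f1"] > cursor]


# First run: the state is emitted at record 2 and the sync fails right after,
# so only records 1 and 2 reach the destination but cursor 16 is committed.
first_run = read_incremental(records)
synced_ids = [r["id"] for r in first_run[:2]]
committed_cursor = first_run[1]["f1"]

# Second run: resumes with f1 > 16, which matches nothing.
second_run = read_incremental(records, committed_cursor)
synced_ids += [r["id"] for r in second_run]

# Record 3 was never synced in either run.
missing_ids = [r["id"] for r in records if r["id"] not in synced_ids]
```

Here `second_run` is empty and `missing_ids` contains only record 3, which matches the data loss described in the example.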

Expected Behavior

Intermediate state emission should only happen when all records with the same cursor value have been synced to the destination.
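
One way to get this behavior is to defer the state emission until the cursor value changes. The following Python sketch illustrates the idea under stated assumptions: it is not the fix merged for this issue, `emit_with_safe_states` and the message tuples are hypothetical, and the rows are assumed to arrive already sorted by the cursor field.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Message = Tuple[str, Any]


def emit_with_safe_states(
    rows: Iterable[Dict[str, Any]], cursor_field: str, interval: int
) -> Iterator[Message]:
    """Yield ("record", row) and ("state", cursor) messages.

    A state is emitted only at a boundary between two different cursor
    values, once at least `interval` records have been emitted since the
    previous state. Assumes `rows` is sorted by `cursor_field`.
    """
    count_since_state = 0
    previous = None
    has_previous = False
    for row in rows:
        current = row[cursor_field]
        # The cursor value just changed, so every row carrying `previous`
        # has already been emitted and `previous` is safe to checkpoint.
        if has_previous and current != previous and count_since_state >= interval:
            yield ("state", previous)
            count_since_state = 0
        yield ("record", row)
        count_since_state += 1
        previous = current
        has_previous = True
    # All rows have been read, so the last cursor value is safe as well.
    if has_previous:
        yield ("state", previous)


# Usage with a tiny interval (the issue describes an interval of 10K records).
rows = [
    {"id": 0, "f1": 15},
    {"id": 1, "f1": 16},
    {"id": 2, "f1": 16},
    {"id": 3, "f1": 16},
    {"id": 4, "f1": 17},
]
messages = list(emit_with_safe_states(rows, "f1", interval=2))
states = [value for kind, value in messages if kind == "state"]
```

With this input the state for cursor value 16 is emitted only after records 1, 2 and 3, so a resume with F1 > 16 would not skip any of them. The trade-off is that a long run of rows sharing one cursor value delays the next checkpoint.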
