Support pglogical replication #64
This was referenced Dec 1, 2021
bobvawter added a commit that referenced this issue Dec 1, 2021

If there are multiple changes to the same row that have been staged, we only want to apply the latest one, to avoid multiple updates to the same target row when flushing a resolved timestamp. This is currently accomplished by having cdc-sink read all staged changes in reverse-MVCC order and tracking which keys have been touched.

This change moves the deduplication into CRDB to reduce the number of rows returned to cdc-sink. It is also a precursor to single-statement promotion in

The Line type is broken out with a Mutation base type that only holds the PK and data to be upserted. The timestamp information is unused by the UPSERT, so we don't want to retrieve it.

X-Ref: #64
X-Ref: #70
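The client-side deduplication that the commit message describes (read staged changes newest-first and remember which keys have been touched) can be sketched roughly as follows. This is a minimal in-memory illustration, not cdc-sink's actual code: the `StagedChange` type, its fields, and `latestPerKey` are hypothetical names, and a plain integer stands in for an MVCC timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

// StagedChange is a hypothetical stand-in for one staged row change.
type StagedChange struct {
	Key       string // serialized primary key (assumed representation)
	Timestamp int64  // stand-in for an MVCC timestamp
	Data      string // payload that would be upserted
}

// latestPerKey returns only the newest change for each key.
// It walks the changes newest-first and skips keys already seen.
func latestPerKey(changes []StagedChange) []StagedChange {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]StagedChange, len(changes))
	copy(sorted, changes)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Timestamp > sorted[j].Timestamp
	})

	seen := make(map[string]struct{}, len(sorted))
	out := make([]StagedChange, 0, len(sorted))
	for _, c := range sorted {
		if _, touched := seen[c.Key]; touched {
			continue // an older change to a key we already kept
		}
		seen[c.Key] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	staged := []StagedChange{
		{Key: "a", Timestamp: 1, Data: "old"},
		{Key: "b", Timestamp: 2, Data: "only"},
		{Key: "a", Timestamp: 3, Data: "new"},
	}
	for _, c := range latestPerKey(staged) {
		fmt.Printf("%s@%d -> %s\n", c.Key, c.Timestamp, c.Data)
	}
}
```

Doing the same filtering inside the database, as the commit proposes, avoids shipping the superseded rows to the client at all.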
bobvawter added a commit that referenced this issue Dec 6, 2021

This change defers reifying the contents of a Line until the actual upsert into a table. The goal is to improve efficiency and to make it simpler to generate batches of lines from sources other than a CockroachDB CDC feed.

Related: #64
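One way to picture deferred reification is a line that carries its payload as raw bytes and decodes it only when the upsert needs column values. The sketch below assumes a JSON payload; `LazyLine`, `NewLazyLine`, and `Reify` are hypothetical names chosen for illustration and are not taken from cdc-sink.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LazyLine is a hypothetical line that holds an undecoded payload.
type LazyLine struct {
	raw json.RawMessage
}

// NewLazyLine stores the payload without parsing it, so building a
// batch of lines costs no decoding work.
func NewLazyLine(payload string) LazyLine {
	return LazyLine{raw: json.RawMessage(payload)}
}

// Reify decodes the payload into column values. Malformed payloads
// are only detected here, at the point of use.
func (l LazyLine) Reify() (map[string]interface{}, error) {
	var cols map[string]interface{}
	if err := json.Unmarshal(l.raw, &cols); err != nil {
		return nil, fmt.Errorf("reify line: %w", err)
	}
	return cols, nil
}

func main() {
	line := NewLazyLine(`{"id": 1, "name": "x"}`)
	cols, err := line.Reify() // decoding happens here, not at construction
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cols["name"])
}
```

Because construction never parses, a source other than a CockroachDB CDC feed could build lines from whatever bytes it has, provided the eventual decode step understands them.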
This is a tracking issue for consuming a logical replication feed from PostgreSQL as a source of row data.
Much of the necessary wire-protocol support is already present in jackc/pglogrepl.
In the desired end state, cdc-sink would support multiple frontends for row data, with a common backend that supports transactionally-consistent staging and reification of source data.
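The frontend/backend split described above could be expressed with two small interfaces. This is only a sketch under stated assumptions: `Frontend`, `Backend`, `Mutation`, `pump`, and the in-memory implementations are hypothetical names, and the real staging step would be a transactional database write rather than a map.

```go
package main

import (
	"errors"
	"fmt"
)

// Mutation holds just a primary key and the data to upsert (hypothetical shape).
type Mutation struct {
	Key  string
	Data string
}

// Frontend is a hypothetical source of row data, such as a CDC feed
// or a PostgreSQL logical replication stream.
type Frontend interface {
	Name() string
	Next() (m Mutation, ok bool)
}

// Backend is a hypothetical common sink that stages mutations and
// later applies them to the target.
type Backend interface {
	Stage(m Mutation) error
	Flush() (applied int, err error)
}

// sliceFrontend replays a fixed list of mutations.
type sliceFrontend struct {
	name string
	rows []Mutation
}

func (f *sliceFrontend) Name() string { return f.name }

func (f *sliceFrontend) Next() (Mutation, bool) {
	if len(f.rows) == 0 {
		return Mutation{}, false
	}
	m := f.rows[0]
	f.rows = f.rows[1:]
	return m, true
}

// memoryBackend stages into a map, so a later mutation replaces an earlier one.
type memoryBackend struct {
	staged  map[string]string
	Applied map[string]string
}

func newMemoryBackend() *memoryBackend {
	return &memoryBackend{staged: map[string]string{}, Applied: map[string]string{}}
}

func (b *memoryBackend) Stage(m Mutation) error {
	if m.Key == "" {
		return errors.New("mutation has no key")
	}
	b.staged[m.Key] = m.Data
	return nil
}

func (b *memoryBackend) Flush() (int, error) {
	n := len(b.staged)
	for k, v := range b.staged {
		b.Applied[k] = v
	}
	b.staged = map[string]string{}
	return n, nil
}

// pump drains one frontend into the backend, then flushes.
func pump(f Frontend, b Backend) (int, error) {
	for {
		m, ok := f.Next()
		if !ok {
			break
		}
		if err := b.Stage(m); err != nil {
			return 0, fmt.Errorf("frontend %s: %w", f.Name(), err)
		}
	}
	return b.Flush()
}

func main() {
	backend := newMemoryBackend()
	feeds := []Frontend{
		&sliceFrontend{name: "cdc", rows: []Mutation{{Key: "a", Data: "1"}}},
		&sliceFrontend{name: "pglogical", rows: []Mutation{{Key: "b", Data: "2"}}},
	}
	for _, f := range feeds {
		n, err := pump(f, backend)
		fmt.Println(f.Name(), n, err)
	}
}
```

Keeping the backend ignorant of where a mutation came from is what would let a new source be added without touching the staging logic.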
It's likely that a mapping/transform layer will exist in cdc-sink at some point in the future to perform on-the-fly schema or data-type adjustments. This should be accommodated as a desired end-state, but will not be added in the initial implementation.
Plan needs (revise as necessary):
- Clean up existing configuration ergonomics: pglogical is now a separate subcommand.
- Lease the feed across multiple cdc-sink instances: pglogical is now a separate subcommand.