Non-transactional backfill path #56

bobvawter · 2021-10-06T14:43:13Z

There are related problems to consider:

- initial_scan over large data sets.
- High update rates on the source cluster that overwhelm the target database because of the 3x write amplification there (i.e., staging rows and then flushing them on a resolved timestamp).
- High update rates, or very large volumes of incoming data, that exceed the maximum transaction size in the target database during resolved-timestamp flush operations.

These would seem to call for some kind of direct-feed approach, where cdc-sink bypasses the staging table and populates the target tables directly. For the initial_scan case, this should be relatively safe, as there is no expectation that the target database would be usable until the backfill is complete. For the other cases, we may want to allow the operator to place cdc-sink into a non-transactional, "catch up" mode to accommodate operational issues that may be encountered.
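A minimal sketch of the two paths, assuming an in-memory list and dict as stand-ins for the staging table and a target table. `Sink`, `Mode`, `receive`, and `resolved` are hypothetical names for illustration, not cdc-sink's API. In staged mode, rows are buffered and applied together when a resolved timestamp arrives. In the direct-feed mode, each row is written to the target as soon as it arrives, so the target can be observed in an intermediate, non-transactional state.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    STAGED = "staged"        # buffer rows, apply them on a resolved timestamp
    IMMEDIATE = "immediate"  # direct feed: bypass staging entirely


@dataclass
class Sink:
    """Hypothetical in-memory model of a sink; not cdc-sink's actual API."""
    mode: Mode = Mode.STAGED
    staging: list = field(default_factory=list)  # (ts, key, value) rows awaiting a flush
    target: dict = field(default_factory=dict)   # key -> value, standing in for a target table

    def receive(self, ts: int, key: str, value: str) -> None:
        if self.mode is Mode.IMMEDIATE:
            # Direct feed: write the row now, with no transactional grouping.
            self.target[key] = value
        else:
            self.staging.append((ts, key, value))

    def resolved(self, resolved_ts: int) -> None:
        # Staged path: apply every row at or below the resolved timestamp,
        # in timestamp order; this loop stands in for a single transaction.
        ready = sorted((r for r in self.staging if r[0] <= resolved_ts), key=lambda r: r[0])
        self.staging = [r for r in self.staging if r[0] > resolved_ts]
        for _, key, value in ready:
            self.target[key] = value
```

The staged path gives consistent snapshots at each resolved timestamp at the cost of buffering; the direct-feed path trades that consistency for throughput, which is acceptable when nobody reads the target until it has caught up.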


bobvawter · 2021-10-08T00:37:41Z

Thought for later: receiving a resolved timestamp need not actually flush all pending writes to the destination tables; it could instead just record a target timestamp for an ambient, batching flush process.

For applications that only need to reach an eventually consistent state (i.e., the target is used as a standby cluster rather than for fully consistent queries), it is enough to ensure that the flush process has caught up to the last resolved timestamp.

The current behavior of transactional flushes could be preserved by optionally removing any limit from the flush query, so that all rows up to the resolved timestamp are applied in a single transaction.
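The ambient flush idea in this comment can be sketched as follows, again with hypothetical names (`AmbientFlusher`, `stage`, `on_resolved`, `flush_step`, `caught_up`) and an in-memory dict in place of a target table. Receiving a resolved timestamp only raises the flush target. Each `flush_step` call applies at most `limit` rows at or below that target as one batch, and `caught_up` reports whether everything up to the last resolved timestamp has been applied. Setting `limit=None` drains everything in one batch, which corresponds to the transactional behavior described above.

```python
import heapq
from typing import Optional


class AmbientFlusher:
    """Hypothetical sketch: resolved timestamps set a flush target, and a
    separate batching step drains staged rows up to that target."""

    def __init__(self, limit: Optional[int] = 1000):
        self.limit = limit             # max rows per batch; None means no limit
        self.staged = []               # min-heap of (ts, seq, key, value)
        self.target_ts = None          # highest resolved timestamp seen so far
        self.table = {}                # stands in for a target table
        self.transactions = 0          # number of batches applied
        self._seq = 0                  # tie-breaker so equal timestamps keep arrival order

    def stage(self, ts: int, key: str, value: str) -> None:
        heapq.heappush(self.staged, (ts, self._seq, key, value))
        self._seq += 1

    def on_resolved(self, ts: int) -> None:
        # Only record the target; no rows are written here.
        self.target_ts = ts if self.target_ts is None else max(self.target_ts, ts)

    def flush_step(self) -> int:
        """Apply one batch of rows at or below the target; return the row count."""
        if self.target_ts is None:
            return 0
        batch = []
        while self.staged and self.staged[0][0] <= self.target_ts:
            if self.limit is not None and len(batch) >= self.limit:
                break
            batch.append(heapq.heappop(self.staged))
        if batch:
            # In a real system, this batch would be one transaction in the target db.
            for _, _, key, value in batch:
                self.table[key] = value
            self.transactions += 1
        return len(batch)

    def caught_up(self) -> bool:
        """True once every staged row at or below the target has been applied."""
        if self.target_ts is None:
            return True
        return not self.staged or self.staged[0][0] > self.target_ts
```

With a finite `limit`, a batch boundary can split rows that share a timestamp, so readers may see intermediate states between batches; the target is consistent again once `caught_up()` returns true, which matches the eventually consistent standby use case.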

bobvawter · 2021-12-15T12:23:32Z

Work in #73 adds an immediate mode, in which incoming data rows are written to the target tables as soon as they are received.

It might be interesting to enable this automatically for targets that have no previous resolved timestamp.
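The automatic-enable idea could look roughly like the policy below. This is a hypothetical sketch: `choose_mode` and the mapping of target names to their last resolved timestamp are illustrative assumptions, not part of cdc-sink. A target with no recorded resolved timestamp is treated as still being backfilled, so it gets immediate mode; once any resolved timestamp has been recorded for it, the same rule selects the staged, transactional path.

```python
from typing import Mapping, Optional


def choose_mode(target: str, last_resolved: Mapping[str, Optional[int]]) -> str:
    """Hypothetical policy: immediate mode until a target has seen a resolved timestamp."""
    # No previous resolved timestamp: assume the target is in its initial backfill,
    # where nobody expects it to be usable yet, so skip staging.
    if last_resolved.get(target) is None:
        return "immediate"
    # Otherwise keep the staged path, which flushes on resolved timestamps.
    return "staged"
```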

bobvawter linked a pull request Dec 15, 2021 that will close this issue

cdc-sink: Overhaul #73

Merged


bobvawter closed this as completed in #73 Dec 21, 2021
