Non-transactional backfill path #56
Thought for later: receiving a resolved timestamp need not actually flush all pending writes to the destination tables; it could instead just record a target timestamp for an ambient, batching flush process. For applications that only need to arrive at an eventually-consistent state (i.e., not running fully-consistent queries against the target database, but just using it as a standby cluster), you only need to make sure that the flush process has completed up to the last resolved timestamp. The current behavior of transactional flushes could be maintained by optionally eliminating any limits in the flush query.
Work in #73 adds an immediate mode wherein data rows are immediately written to the target tables. It might be interesting to enable this automatically for targets that have no previous resolved timestamp.
There are related problems to consider:

- `initial_scan`
- over large data sets

These would seem to call for some kind of direct-feed approach, where cdc-sink bypasses the staging table and populates the target tables directly. For the `initial_scan` case, this should be relatively safe, as there is no expectation that the target database would be usable until the backfill is complete. For the other cases, we may want to allow the operator to place cdc-sink into a non-transactional, "catch up" mode to accommodate operational issues that may be encountered.
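The discussion implies three apply behaviors. A sketch of the selection logic, assuming hypothetical names (`ApplyMode`, `chooseMode`, and the flag parameters are not existing cdc-sink identifiers): use the direct-feed path during an `initial_scan` or when the operator has explicitly enabled catch-up mode, and otherwise stay transactional.

```go
package main

// ApplyMode enumerates the behaviors discussed in this issue.
type ApplyMode int

const (
	// Transactional stages mutations and flushes at resolved timestamps.
	Transactional ApplyMode = iota
	// Immediate writes rows to the target tables as they arrive (#73).
	Immediate
	// DirectFeed bypasses the staging table entirely for backfills.
	DirectFeed
)

// chooseMode picks the non-transactional direct-feed path only when the
// target is not expected to be usable (initial scan) or when the
// operator has opted in to a catch-up mode.
func chooseMode(initialScan, operatorCatchUp bool) ApplyMode {
	if initialScan || operatorCatchUp {
		return DirectFeed
	}
	return Transactional
}
```

Making catch-up mode an explicit operator action, rather than automatic, keeps the consistency downgrade a deliberate choice during operational incidents.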