This change overhauls the cdc-sink code to split it into well-defined packages
and APIs.
Notable functional changes:
A cdc-sink endpoint now operates on an entire database schema (i.e. the
second-level namespace) at once, since this is the use case that has been most
prominent in discussions around our preferred microservice architecture. For
users who only ever use the "public" schema in their databases, this also
covers whole-database use cases.
An "immediate" mode is supported, which applies incoming data without waiting
for resolved timestamps. This is intended for backfilling large datasets or for
letting a high-volume changefeed catch up after an outage. It is not expected
to be the default configuration for cdc-sink.
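The difference between the two modes can be sketched as follows. This is a
minimal in-memory model, not the cdc-sink implementation: the Mutation and Sink
types, their fields, and the Handle and Resolve methods are all hypothetical
names chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Mutation is a hypothetical stand-in for one row change from a changefeed.
type Mutation struct {
	Key  string
	Time int64 // logical timestamp of the change
}

// Sink is a hypothetical in-memory model of the two modes.
type Sink struct {
	Immediate bool
	staged    []Mutation // buffered until a resolved timestamp arrives
	Applied   []Mutation // what has been written to the "target"
}

// Handle applies the mutation right away in immediate mode and
// stages it otherwise.
func (s *Sink) Handle(m Mutation) {
	if s.Immediate {
		s.Applied = append(s.Applied, m)
		return
	}
	s.staged = append(s.staged, m)
}

// Resolve flushes every staged mutation whose Time is at or before the
// resolved timestamp, in timestamp order, and keeps the rest staged.
func (s *Sink) Resolve(resolved int64) {
	sort.SliceStable(s.staged, func(i, j int) bool {
		return s.staged[i].Time < s.staged[j].Time
	})
	n := sort.Search(len(s.staged), func(i int) bool {
		return s.staged[i].Time > resolved
	})
	s.Applied = append(s.Applied, s.staged[:n]...)
	s.staged = append([]Mutation(nil), s.staged[n:]...)
}

func main() {
	staged := &Sink{}
	staged.Handle(Mutation{Key: "a", Time: 2})
	staged.Handle(Mutation{Key: "b", Time: 1})
	fmt.Println("applied before resolve:", len(staged.Applied))
	staged.Resolve(1)
	fmt.Println("applied after resolve: ", len(staged.Applied))

	imm := &Sink{Immediate: true}
	imm.Handle(Mutation{Key: "a", Time: 2})
	fmt.Println("applied in immediate:  ", len(imm.Applied))
}
```

In this model, immediate mode gives up the ordering and atomicity that come
from flushing at resolved timestamps, which is consistent with it not being the
expected default.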
The cdc-sink code can now detect and optionally recover from a limited amount
of structural schema drift between the source and target databases. The schema
for each target database is held in memory and refreshed periodically.
Drift is checked during resolved-timestamp flushes, which will effectively
pause a changefeed until the stored payloads are at least structurally
compatible with the target tables.
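One way to picture a structural drift check is to compare the keys of a stored
payload against the column names of the cached target schema. The sketch below
assumes that a payload is "structurally compatible" when every key has a
matching column; the MissingColumns function and that definition are
illustrative assumptions, not the check cdc-sink performs.

```go
package main

import (
	"fmt"
	"sort"
)

// MissingColumns returns, in sorted order, the payload keys that have no
// matching column in the target table. An empty result means the payload
// is structurally compatible under the assumption stated above.
func MissingColumns(payload map[string]interface{}, targetCols []string) []string {
	known := make(map[string]struct{}, len(targetCols))
	for _, c := range targetCols {
		known[c] = struct{}{}
	}
	var missing []string
	for k := range payload {
		if _, ok := known[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	payload := map[string]interface{}{
		"id":       1,
		"email":    "a@example.com",
		"nickname": "al",
	}

	// The target table lags behind the source: the flush would wait.
	fmt.Println(MissingColumns(payload, []string{"id", "email"}))

	// After the target gains the column, the flush could proceed.
	fmt.Println(MissingColumns(payload, []string{"id", "email", "nickname"}))
}
```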
The new webhook-https:// scheme in CockroachDB v21.2 is now supported. This
requires cdc-sink to run a TLS-enabled HTTP server, so a certificate and
private key can now be loaded from disk. For testing purposes, cdc-sink can
generate a self-signed certificate internally.
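The load-or-generate behavior can be sketched with the Go standard library.
The function names SelfSigned and ServerTLSConfig, the "localhost" host name,
the key type, and the 24-hour validity are assumptions made for this example;
they do not describe cdc-sink's actual flags or defaults.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// SelfSigned generates an in-memory, self-signed certificate.
// It is suitable for testing only.
func SelfSigned(host string) (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return tls.Certificate{}, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: host},
		DNSNames:              []string{host},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
	}
	// Using the template as its own parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// ServerTLSConfig loads a key pair from disk when both paths are given and
// falls back to a generated self-signed certificate otherwise.
func ServerTLSConfig(certFile, keyFile string) (*tls.Config, error) {
	var cert tls.Certificate
	var err error
	if certFile != "" && keyFile != "" {
		cert, err = tls.LoadX509KeyPair(certFile, keyFile)
	} else {
		cert, err = SelfSigned("localhost")
	}
	if err != nil {
		return nil, err
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	cfg, err := ServerTLSConfig("", "")
	if err != nil {
		panic(err)
	}
	fmt.Println("certificates loaded:", len(cfg.Certificates))
}
```

The resulting configuration could be assigned to the TLSConfig field of an
http.Server and served with ListenAndServeTLS("", ""). A changefeed would
reject a self-signed certificate unless it is configured to trust it, which is
why this path is limited to testing.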
Notes to reviewers:
The internal/types package defines interfaces for the major moving parts of the
revised cdc-sink code base. At present, each interface is implemented by a
single type, but having small APIs has been useful in identifying the
independent parts of cdc-sink.
Similarly, the package structure may be overly fine-grained in favor of
identifying specific, small portions of code that compose easily.
A comment at the top of most files indicates where code was repackaged from.
Recommended package review order:
types, source/cdc, target/stage, target/apply, target/timestamp