-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Firestore #191
Conversation
Codecov ReportBase: 67.08% // Head: 64.18% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #191 +/- ##
==========================================
- Coverage 67.08% 64.18% -2.90%
==========================================
Files 92 111 +19
Lines 5854 7441 +1587
==========================================
+ Hits 3927 4776 +849
- Misses 1610 2219 +609
- Partials 317 446 +129
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 32 of 32 files at r1, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @bobvawter and @BramGruneir)
a discussion (no related file):
Could clarify the restrictions on the target table (if any). It seems to me that docs in a collection need to have the same fields. Perhaps I'm missing something.
-- commits
line 40 at r1:
Are we assuming that all the documents in the collection have the same fields?
internal/cmd/fslogical/fslogical.go
line 68 at r1 (raw file):
} // metricsServer starts a trivial prometheus endpoint server which runs
I think we should move this to util/metrics (perhaps in a separate PR). We have this code repeated all over.
internal/source/fslogical/integration_test.go
line 27 at r1 (raw file):
) func TestMain(m *testing.M) {
Do we need tests for different data types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @bobvawter, @BramGruneir, and @sravotto)
Previously, sravotto (silvano) wrote…
Are we assuming that all the documents in the collection have the same fields?
For the specific use-case this is being built for, yes. That said, it might be interesting to allow the source document to be inserted as a JSONB
column. If there's an extras
column, then we could lift the default restriction in the apply
package that all properties of a mutation must map onto (at least one) database column. Operators could perform schema changes at their leisure to unpack the extras
column into whatever schema ultimately makes sense.
internal/cmd/fslogical/fslogical.go
line 68 at r1 (raw file):
Previously, sravotto (silvano) wrote…
I think we should move this to util/metrics (perhaps in a separate PR). We have this code repeated all over.
Agreed, that's #117
55b8169
to
07201d8
Compare
909101e
to
37e9a95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 20 files at r5, 41 of 42 files at r6, 4 of 4 files at r7, all commit messages.
Reviewable status: 51 of 52 files reviewed, 16 unresolved discussions (waiting on @bobvawter, @BramGruneir, and @sravotto)
README.md
line 566 at r7 (raw file):
## Google Firestore replication `cdc-sink` supports [Google Cloud Firestore](https://cloud.google.com/firestore) as a source of
Please add a note about how only a single instance of cdc-sink should be used when handling FS replication.
internal/source/fslogical/config.go
line 54 at r6 (raw file):
UpdatedAtProperty ident.Ident docIDTemp string
What are these temps needed for? Ah, I see. I thought pflag allowed you to add arbitrary code to convert/validate.
internal/source/fslogical/consistent_point.go
line 66 at r6 (raw file):
o := other.(*consistentPoint) tt := t.AsTime()
can you just rename the variables to make it a bit more readable?
internal/source/fslogical/fslogical.go
line 35 at r6 (raw file):
cfg *loopConfig fs *firestore.Client st *Tombstones
tomb? Why st?
internal/source/fslogical/fslogical.go
line 66 at r6 (raw file):
ctx context.Context, ch chan<- logical.Message, state logical.State, ) error { // From will either be the update time of the last streamed record,
What is "From"? I understand the concept obviously, but there is no from variable.
internal/source/fslogical/fslogical.go
line 102 at r6 (raw file):
ctx context.Context, ch chan<- logical.Message, now time.Time, cp *consistentPoint, ) (_ *consistentPoint, moreWork bool, _ error) {
nit: empty line
internal/source/fslogical/fslogical.go
line 111 at r6 (raw file):
Limit(d.cfg.BackfillBatch) if !cp.IsZero() { if cp.AsID() == "" {
What's the logic here? Can you add a comment?
internal/source/fslogical/injector.go
line 25 at r6 (raw file):
) // Start creates a PostgreSQL logical replication loop using the
Firestore
internal/source/fslogical/tombstones.go
line 63 at r6 (raw file):
// // This implementation assumes that document ids within a single // collection are not recycled.
Is this a valid assumption? It should be possible to remove a tombstone if a new mutation was previously deleted.
internal/source/fslogical/tombstones.go
line 102 at r6 (raw file):
) error { cp, _ := state.GetConsistentPoint().(*consistentPoint) q := t.coll.
Is this sorting necessary?
internal/source/fslogical/jsonload/main.go
line 53 at r6 (raw file):
eg, egCtx := errgroup.WithContext(ctx) ch := make(chan map[string]interface{}, *workers)
Can you rename these variables to something clearer? Ch is channel, sure but for what? Ct is what? Count?
internal/source/logical/loop.go
line 201 at r6 (raw file):
// from backfill to streaming modes) or if the outer // context is being shut down. if err == nil {
Why split this up?
c2cffd1
to
12add65
Compare
d8f0411
to
2e03569
Compare
88290ee
to
a3d1202
Compare
3dbd2b3
to
5a546ba
Compare
This change adds a proper ParseIdent function and reworks the Relative function to be implemented in terms of ParseIdent. ParseIdent consumes possibly-quoted sequences of UTF8 data to produce an Ident. The motivation for this change is an upcoming use-case that make use of quoted schema names that contain multi-byte characters.
This change allows an Ident to be used as the target of a CLI flag. It also simplifies the existing logical/config.go.
5a546ba
to
39e32ab
Compare
This change adds the goja JavaScript runtime to cdc-sink in order to support arbitrary user-defined behavior at runtime. The motivating use case is to be able to support migrations from document stores where the source document schema does not have a direct mapping to a tabular schema. Consider the case of a document with nested objects or collections; a single document may expand into multiple rows in several tables. The userscript is washed through esbuild to convert the source to ES5-compatible, CommonJS-packaged scripts. This has the nice side-effect of allowing the userscript to be written as TypeScript, for which a .d.ts binding is provdided. A top-level CLI command is provided to print the bindings file and additional help related to userscripts. Future work and notes: * Execution through the userscript is single-threaded. If this presents a limitation on ultimate throughput, it would be straightforward to have multiple, independent JS runtimes. * Access to the target database could be added in the future, but is not currently part of the user API. * An experiment to bundle the V8 runtime was performed, but we wound up spending as much time marshaling data over the cgo barrier as executing the user code. h/t: @sjbarag for feedback on TypeScript API.
This change prepares for the Firestore integration by allowing dialects to implement a non-transaction backfill mode which then switches to the existing, fully-transactional mode. The Config type is broken out into an interface to allow all CLI flags and configuration to be driven via the user-script. If configured, the user-script is called via a shim around the existing logical.Events interface. The logical-replication source is added to Events.OnData to facilitate routing within the user-script.
This change adds support for using Google Firestore as a logical replication source. Tests are run against the Firestore SDK Emulator, which is packaged into a private Docker container, hosted within GitHub packages.
39e32ab
to
980fb5b
Compare
This PR contains a sequence of stacked commits that add support for consuming documents from Google Firestore. The goja JavaScript runtime is incorporated, since input documents may require arbitrarily complex mapping onto a relational schema. The core logical-loop code is also augmented with a backfill mode so that documents can be consumed in a page-oriented fashion before switching to streaming subscriptions. This code is in use at a customer and they consider it to be feature-complete with respect to their use case. It is also expected that this work will make subsequent document stores easy to add.
The
cdc-sink@v1.d.ts
file documents the userscript API and shows the end-point of this PR.This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)