Three-way merge #487
Labels: enhancement (New feature or request)
Status: Closed
bobvawter added a commit that referenced this issue · Oct 6, 2023
This change is part of #487 to support three-way merges. It adds a before field to types.Mutation and allows this data to be persisted.
bobvawter added a commit that referenced this issue · Oct 6, 2023
This change is part of #487 to support three-way merges. It adds a before field to types.Mutation and allows this data to be persisted. Code related to gzipping the mutation data has been extracted to helpers to compress both the before and data fields.
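The shape of such a change might look like the sketch below: a `Before` field alongside the existing mutation data, with shared gzip helpers for both fields. This is a hedged illustration only; the actual `types.Mutation` definition and helper names in cdc-sink may differ.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// Mutation is a hypothetical sketch of types.Mutation after this change:
// the new Before field carries the row's prior state alongside Data.
type Mutation struct {
	Before json.RawMessage // prior row state; nil if not provided
	Data   json.RawMessage // proposed row state
	Key    json.RawMessage // primary-key columns
}

// maybeGzip compresses a payload. A shared helper like this can serve
// both the Before and Data fields, as the commit message describes.
func maybeGzip(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// maybeGunzip reverses maybeGzip when reading staged data back out.
func maybeGunzip(data []byte) ([]byte, error) {
	r, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	mut := Mutation{
		Before: json.RawMessage(`{"id":1,"v":"old"}`),
		Data:   json.RawMessage(`{"id":1,"v":"new"}`),
		Key:    json.RawMessage(`[1]`),
	}
	packed, _ := maybeGzip(mut.Before)
	unpacked, _ := maybeGunzip(packed)
	fmt.Println(string(unpacked))
}
```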
bobvawter added a commit that referenced this issue · Oct 6, 2023
This change is part of #487 to support three-way merges. This change allows the CDC handler to extract the `before` attribute from incoming requests to populate the `Mutation.Before` field. The `server/integration_test.go` is updated to configure a RetireOffset and to inspect the staged mutation to ensure that the before value was recorded. Once merging is actually implemented, we can return to this test to verify end-to-end behavior.
bobvawter added a commit that referenced this issue · Oct 6, 2023
This change is part of #487 to support three-way merges. This change allows the CDC handler to extract the `before` attribute from incoming requests to populate the `Mutation.Before` field. We use the new `RetireOffset` setting to allow the staged data to be inspected by `handler_test.go`. The `server/integration_test.go` is similarly updated to inspect the staged mutation to ensure that the before value was recorded. Once merging is actually implemented, we can return to this test to verify end-to-end behavior.
github-merge-queue bot pushed a commit that referenced this issue · Oct 6, 2023
This change is part of #487 to support three-way merges. It adds a before field to types.Mutation and allows this data to be persisted. Code related to gzipping the mutation data has been extracted to helpers to compress both the before and data fields.
bobvawter added a commit that referenced this issue · Oct 9, 2023
This change is part of #487 to support three-way merges. This change allows the CDC handler to extract the `before` or `cdc_prev` attributes from incoming requests to populate the `Mutation.Before` field. That is, if a tabular changefeed is created with the `diff` option, or a query changefeed includes the `cdc_prev` column, the before data will be persisted. The CDC queries endpoints had not been previously tested in `server/integration_test.go`, so there's a little bit of churn to cover some unhandled edge cases around the presence or absence of the `diff` option. This change uses the new-ish `RetireOffset` configuration to allow the staged records to be inspected even though there may be concurrent resolved timestamps being processed. Uninteresting uses of json.NewDecoder() have been replaced with json.Unmarshal(). Some error messages have been edited for clarity.
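The extraction described above could be sketched as follows. Only the attribute names `before` and `cdc_prev` come from the issue; the function name and surrounding shape are hypothetical, not cdc-sink's actual handler code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractBefore returns the prior row state from one changefeed entry.
// The prior state can arrive under "before" (a tabular changefeed created
// with the diff option) or under "cdc_prev" (a query changefeed that
// selects that column). Returns nil when neither attribute is present.
func extractBefore(payload []byte) (json.RawMessage, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(payload, &fields); err != nil {
		return nil, err
	}
	if before, ok := fields["before"]; ok {
		return before, nil
	}
	// Fall back to the cdc_prev column emitted by query changefeeds.
	return fields["cdc_prev"], nil
}

func main() {
	tabular := []byte(`{"after":{"id":1,"v":2},"before":{"id":1,"v":1}}`)
	prior, err := extractBefore(tabular)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(prior))
}
```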
bobvawter added a commit that referenced this issue · Oct 9, 2023
This change is part of #487 to support three-way merges. This change removes support for storing apply configurations in the staging database. The userscript is now integrated into all cdc-sink modes and provides superior ergonomics. The upcoming merge function would have to be configured through the userscript and does not make sense to persist. Furthermore, having two distinct ways of accomplishing a task is confusing.

Breaking Change: Any deployments using table-based configuration of data application behaviors must instead switch to the userscript for configuration. The wiki pages have been scrubbed for any references to the table-based approach:

X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/Data-Behaviors
X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/User-Scripts
github-merge-queue bot pushed a commit that referenced this issue · Oct 9, 2023
This change is part of #487 to support three-way merges. This change allows the CDC handler to extract the `before` or `cdc_prev` attributes from incoming requests to populate the `Mutation.Before` field. That is, if a tabular changefeed is created with the `diff` option, or a query changefeed includes the `cdc_prev` column, the before data will be persisted. The CDC queries endpoints had not been previously tested in `server/integration_test.go`, so there's a little bit of churn to cover some unhandled edge cases around the presence or absence of the `diff` option. This change uses the new-ish `RetireOffset` configuration to allow the staged records to be inspected even though there may be concurrent resolved timestamps being processed. Uninteresting uses of json.NewDecoder() have been replaced with json.Unmarshal(). Some error messages have been edited for clarity.
github-merge-queue bot pushed a commit that referenced this issue · Oct 9, 2023
This change is part of #487 to support three-way merges. This change removes support for storing apply configurations in the staging database. The userscript is now integrated into all cdc-sink modes and provides superior ergonomics. The upcoming merge function would have to be configured through the userscript and does not make sense to persist. Furthermore, having two distinct ways of accomplishing a task is confusing.

Breaking Change: Any deployments using table-based configuration of data application behaviors must instead switch to the userscript for configuration. The wiki pages have been scrubbed for any references to the table-based approach:

X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/Data-Behaviors
X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/User-Scripts
bobvawter added a commit that referenced this issue · Oct 11, 2023
This change is part of #487 to support three-way merges. This change adds support for declaring a user-defined merge function within the userscript. The goja version is updated so that we can create a lightweight wrapper around the ident.Map that will store the reified mutation values.
bobvawter added a commit that referenced this issue · Oct 14, 2023
This change is part of #487 to support three-way merges. This change updates the apply package to call into the merge function when targeting CockroachDB or PostgreSQL. The conditional-upsert SQL is extended to return the index of the conflicting data and the contents of the blocking row. The blocking and conflicting row data are then used to drive the merge function.

The merge API added in PR #534 is refined. The relevant types are extracted into their own package, which contains a new "Bag" type. A Bag holds reified properties and can represent the data in a mutation or in a database row. It additionally classifies properties as "mapped" or "unmapped", according to whether or not the property maps onto a known column. Some of the bookkeeping previously in the apply code to track missing or extra properties is simplified.

The upsert code also becomes recursive. Mutations are reified into Bags and are applied. If a Bag generates a conflict, the merge function will be called to produce a Bag that will be unconditionally applied. Once all conflicts have been resolved, the accumulated Bags will be upserted by a recursive call to the upsert method. There is only ever one level of recursion.
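A container like the described "Bag" might be sketched as below: properties are classified on insertion as mapped or unmapped against the set of known target columns, which is what lets the apply code drop its separate bookkeeping for unexpected input properties. The field and method names here are invented for illustration; cdc-sink's real Bag type may differ.

```go
package main

import "fmt"

// Bag is a hypothetical sketch of a reified property container that can
// represent either a mutation's data or a database row.
type Bag struct {
	columns  map[string]bool // known target columns
	mapped   map[string]any  // properties that map onto a known column
	unmapped map[string]any  // unexpected input properties
}

func NewBag(columns []string) *Bag {
	b := &Bag{
		columns:  map[string]bool{},
		mapped:   map[string]any{},
		unmapped: map[string]any{},
	}
	for _, c := range columns {
		b.columns[c] = true
	}
	return b
}

// Put classifies the property as it is stored, so callers need no
// separate tracking of missing or extra properties.
func (b *Bag) Put(name string, value any) {
	if b.columns[name] {
		b.mapped[name] = value
	} else {
		b.unmapped[name] = value
	}
}

func main() {
	bag := NewBag([]string{"id", "val"})
	bag.Put("id", 1)
	bag.Put("val", "x")
	bag.Put("extra", true) // not a known column; lands in unmapped
	fmt.Println(len(bag.mapped), len(bag.unmapped))
}
```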
bobvawter added a commit that referenced this issue · Oct 14, 2023
This change is part of #487 to support three-way merges. This change updates the apply package to call into the merge function when targeting CockroachDB or PostgreSQL. The conditional-upsert SQL is extended to return the index of the conflicting data and the contents of the blocking row. The blocking and conflicting row data are then used to drive the merge function.

The merge API added in PR #534 is refined. The relevant types are extracted into their own package, which contains a new "Bag" type. A Bag holds reified properties and can represent the data in a mutation or in a database row. It additionally classifies properties as "mapped" or "unmapped", according to whether or not the property maps onto a known column. Some of the bookkeeping previously in the apply code to track missing or extra properties is simplified by letting the Bag keep track of unexpected input properties.

The upsert code also becomes recursive. Mutations are reified into Bags and are applied. If a Bag generates a conflict, the merge function will be called to produce a Bag that will be unconditionally applied. Once all conflicts have been resolved, the accumulated Bags will be upserted by a recursive call to the upsert method. There is only ever one level of recursion.
bobvawter added a commit that referenced this issue · Oct 16, 2023
This change is part of #487 to support three-way merges. This change updates the apply package to call into the merge function when targeting CockroachDB or PostgreSQL. The conditional-upsert SQL is extended to return the index of the conflicting data and the contents of the blocking row. The blocking and conflicting row data are then used to drive the merge function.

The merge API proposed in PR #534 is refined. The relevant types are extracted into their own package, which contains a new "Bag" type. A Bag holds reified properties and can represent the data in a mutation or in a database row. It additionally classifies properties as "mapped" or "unmapped", according to whether or not the property maps onto a known column. Some of the bookkeeping previously in the apply code to track missing or extra properties is simplified by letting the Bag keep track of unexpected input properties.

The upsert code also becomes recursive. Mutations are reified into Bags and are applied. If a Bag generates a conflict, the merge function will be called to produce a Bag that will be unconditionally applied. Once all conflicts have been resolved, the accumulated Bags will be upserted by a recursive call to the upsert method. There is only ever one level of recursion.
bobvawter added a commit that referenced this issue · Oct 16, 2023
This change is part of #487 to support three-way merges. This change adds, but does not integrate, support for a per-target-schema DLQ. cdc-sink, as a design rule, does not perform any schema changes in the target database. If the user wants to use the DLQ, they will need to create the destination table. The only requirements on the DLQ table are that it contain certain well-known column names with appropriate types. A basic schema is suggested by cdc-sink, and this suggested schema is used by the DLQ tests. The justification for this hand-waving is that the DLQ becomes, in essence, a part of the user's application and will likely need to be part of a schema-management system. We cannot predict how the DLQ entries will be used, indexed, etc., so integration with a minimum number of well-known columns seems like it should give the user maximum flexibility.
github-merge-queue bot pushed a commit that referenced this issue · Oct 18, 2023
This change is part of #487 to support three-way merges. This change adds, but does not integrate, support for a per-target-schema DLQ. cdc-sink, as a design rule, does not perform any schema changes in the target database. If the user wants to use the DLQ, they will need to create the destination table. The only requirements on the DLQ table are that it contain certain well-known column names with appropriate types. A basic schema is suggested by cdc-sink, and this suggested schema is used by the DLQ tests. The justification for this hand-waving is that the DLQ becomes, in essence, a part of the user's application and will likely need to be part of a schema-management system. We cannot predict how the DLQ entries will be used, indexed, etc., so integration with a minimum number of well-known columns seems like it should give the user maximum flexibility.
github-merge-queue bot pushed a commit that referenced this issue · Oct 18, 2023
This change is part of #487 to support three-way merges. This change adds support for declaring a user-defined merge function within the userscript. The goja version is updated so that we can create a lightweight wrapper around the ident.Map that will store the reified mutation values.
github-merge-queue bot pushed a commit that referenced this issue · Oct 18, 2023
This change is part of #487 to support three-way merges. This change updates the apply package to call into the merge function when targeting CockroachDB or PostgreSQL. The conditional-upsert SQL is extended to return the index of the conflicting data and the contents of the blocking row. The blocking and conflicting row data are then used to drive the merge function.

The merge API proposed in PR #534 is refined. The relevant types are extracted into their own package, which contains a new "Bag" type. A Bag holds reified properties and can represent the data in a mutation or in a database row. It additionally classifies properties as "mapped" or "unmapped", according to whether or not the property maps onto a known column. Some of the bookkeeping previously in the apply code to track missing or extra properties is simplified by letting the Bag keep track of unexpected input properties.

The upsert code also becomes recursive. Mutations are reified into Bags and are applied. If a Bag generates a conflict, the merge function will be called to produce a Bag that will be unconditionally applied. Once all conflicts have been resolved, the accumulated Bags will be upserted by a recursive call to the upsert method. There is only ever one level of recursion.
bobvawter added a commit that referenced this issue · Oct 18, 2023
This change is part of #487 to support three-way merges. PRs #540 and #543 were submitted separately, so the apply code did not yet support writing to the DLQ. This change completes the wiring and allows conflicts to be written to the queue. The unexported apply.newApply() function was made a method on the factory type to decrease the number of arguments.
bobvawter added a commit that referenced this issue · Oct 18, 2023
This change is part of #487 to support three-way merges. This change adds merge.Standard and exposes it to the userscript. The merge function identifies the properties which have changed between the before and proposed bags. If the value of the property in the before bag equals the value of the property in the target, the change is applied. Property equivalency is presently defined as "serializes to the same JSON bytes". The Go JSON serializer is deterministic and we have a rather fluid type system, so this seems like a reasonable initial implementation.

A fallback merge function can be composed with merge.Standard to handle properties with application-specific semantics. The script test shows the composition of the standard merge with a counter-like field that is only ever incremented. Properties which cannot be automatically merged are indicated by a new Conflict.Unmerged field and a corresponding userscript binding. This fallback function can also be used to "merge or else" via a trivial fallback that always returns the name of a DLQ. This, too, is demonstrated in the test script.

The merge.Conflict.Existing field is renamed to Target. It either contains the existing state of the row in the target database, or it contains the data that merge.Standard determines should be stored in the target. The change in sort order improves readability in the tests: Before, Proposed, Target -> Expected.
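The property-by-property rule described above can be sketched as a three-way merge over plain maps. The rule "if the before value equals the target value, apply the proposed change" and the JSON-bytes equality test come from the commit message; the function names, the keep-target case when before equals proposed, and the returned unmerged list are illustrative assumptions, not merge.Standard's actual API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonEq implements the "serializes to the same JSON bytes" equality
// test described in the commit message.
func jsonEq(a, b any) bool {
	ab, _ := json.Marshal(a)
	bb, _ := json.Marshal(b)
	return string(ab) == string(bb)
}

// standardMerge sketches the three-way rule: for each proposed property,
// if the target still holds the before value, the target is unchanged and
// the proposed value wins; if before equals proposed, the source did not
// change it and the target value is kept; otherwise both sides changed
// the property and it is reported as unmerged (a conflict for a fallback
// merge function or the DLQ to handle).
func standardMerge(before, proposed, target map[string]any) (merged map[string]any, unmerged []string) {
	merged = map[string]any{}
	for k, tv := range target {
		merged[k] = tv
	}
	for k, pv := range proposed {
		switch {
		case jsonEq(pv, target[k]):
			// Both sides already agree; nothing to do.
		case jsonEq(before[k], target[k]):
			merged[k] = pv // target unchanged since before: apply proposed
		case jsonEq(before[k], pv):
			// Source did not change the property; keep the target value.
		default:
			unmerged = append(unmerged, k)
		}
	}
	return merged, unmerged
}

func main() {
	before := map[string]any{"v": 1, "n": 1}
	proposed := map[string]any{"v": 2, "n": 1} // source changed v only
	target := map[string]any{"v": 1, "n": 3}   // target changed n only
	merged, unmerged := standardMerge(before, proposed, target)
	fmt.Println(merged["v"], merged["n"], unmerged)
}
```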
github-merge-queue bot pushed a commit that referenced this issue · Oct 19, 2023
This change is part of #487 to support three-way merges. PRs #540 and #543 were submitted separately, so the apply code did not yet support writing to the DLQ. This change completes the wiring and allows conflicts to be written to the queue. The unexported apply.newApply() function was made a method on the factory type to decrease the number of arguments.
bobvawter
added a commit
that referenced
this issue
Oct 20, 2023
This change is part of #487 to support three-way merges. This change adds merge.Standard and exposes it to the userscript. The merge function identifies the properties which have changed between the before and proposed bags. If the value of the property in the before bag equals the value of the property in the target, the change is applied. Property equivalency is presently defined as "serializes to the same JSON bytes". The golang json serializer is deterministic and we have a rather fluid typesystem, so this seems like a reasonable initial implementation. A fallback merge function can be composed with merge.Standard to handle properties with application-specific semantics. The script test shows the composition of the standard merge with a counter-like field that is only ever incremented. Properties which cannot be automatically merged are indicated by a new Conflict.Unmerged field and corresponding userscript binding. This fallback function can also be used to "merge or else" by using a trivial fallback that always returns the name of a dlq. This, too, is demonstrated in the test script. The merge.Conflict.Existing field is renamed to Target. It either contains the existing state of the row in the target database, or it contains the data that merge.Standard determines should be stored in the target. The change in sort order improves readability in the tests: Before, Proposed, Target -> Expected.
bobvawter
added a commit
that referenced
this issue
Oct 31, 2023
This change moves tracking of partially-applied resolved-timestamp windows into the staging tables by adding a new `applied` column. The goal is to move some state tracking out of the cdc resolver loop into the stage package. Tracking apply status on a per-mutation basis improves the idempotency of cdc-sink when the userscript has non-idempotent behaviors (e.g., three-way merge). It also allows us to export monitoring data around mutations that may have slipped through the cracks, and to detect when a migration process has completely drained. Fine-grained tracking will also be useful for unifying the non-transactional modes into a single behavior.

Many unused methods in the stage API have been deleted. The "unstaging" SQL query is now generated with a Go template and is tested similarly to the apply package. The cdc package performs less work to track partial application of large individual changes: it simply persists the contents of the UnstageCursor as a performance enhancement, while exactly-once behavior is provided by the `applied` column. The change to `server/integration_test.go` is due to the unstage processing being a one-shot; the test being performed duplicates an existing test in `cdc/handler_test.go`.

Breaking change: The `--selectBatchSize` flag is deprecated in favor of two flags, `--largeTransactionLimit` and `--timestampWindowSize`, which respectively enable partial processing of a single over-sized transaction and impose a general limit on the total amount of data to be unstaged.

Breaking change: A staging schema migration is required; this is documented in the migrations directory.

X-Ref: #487 X-Ref: #504 X-Ref: #565
This is the top-level tracking issue for work related to implementing a three-way merge operation.
Plan:
- `before` values in mutations
- `merge` userscript callback
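The `merge` callback in this plan composes with fallbacks as described in the commit messages ("merge or else"). A minimal Go sketch of that composition — `Conflict`, `Resolution`, and `withFallback` are illustrative shapes, not cdc-sink's exported API:

```go
package main

import "fmt"

// Conflict is an illustrative stand-in for the merge.Conflict described in
// the commit messages: the three-way inputs plus any unmerged properties.
type Conflict struct {
	Before, Proposed, Target map[string]any
	Unmerged                 []string
}

// Resolution either carries merged data or routes the conflict to a DLQ.
type Resolution struct {
	Apply map[string]any
	DLQ   string
}

// Func resolves a conflict; the bool reports whether it handled it.
type Func func(*Conflict) (*Resolution, bool)

// withFallback runs the primary merge and, for anything it could not
// resolve, defers to the fallback ("merge or else").
func withFallback(primary, fallback Func) Func {
	return func(c *Conflict) (*Resolution, bool) {
		if r, ok := primary(c); ok {
			return r, true
		}
		return fallback(c)
	}
}

func main() {
	// A trivial fallback that always routes to a dead-letter queue,
	// as in the "merge or else" pattern from the commit messages.
	dlq := Func(func(*Conflict) (*Resolution, bool) {
		return &Resolution{DLQ: "merge_conflicts"}, true
	})
	never := Func(func(*Conflict) (*Resolution, bool) { return nil, false })
	r, _ := withFallback(never, dlq)(&Conflict{})
	fmt.Println(r.DLQ) // → merge_conflicts
}
```

Swapping the trivial DLQ fallback for one with application-specific rules (e.g., taking the larger value of a counter-like field) gives the counter-merge composition the script test exercises.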