
Three-way merge #487

Open · 16 of 22 tasks

bobvawter opened this issue Sep 18, 2023 · 0 comments
Assignees: bobvawter
Labels: enhancement (New feature or request)

Comments

bobvawter (Member) commented Sep 18, 2023

This is the top-level tracking issue for work related to implementing a three-way merge operation.

Plan:

bobvawter added the "enhancement" label Sep 18, 2023
bobvawter self-assigned this Sep 18, 2023
bobvawter added a commit that referenced this issue Oct 6, 2023
This change is part of #487 to support three-way merges. It adds a before field
to types.Mutation and allows this data to be persisted.
bobvawter added a commit that referenced this issue Oct 6, 2023
This change is part of #487 to support three-way merges. It adds a before field
to types.Mutation and allows this data to be persisted. Code related to
gzipping the mutation data has been extracted to helpers to compress both the
before and data fields.
bobvawter added a commit that referenced this issue Oct 6, 2023
This change is part of #487 to support three-way merges.

This change allows the CDC handler to extract the `before` attribute from
incoming requests to populate the `Mutation.Before` field.

The `server/integration_test.go` is updated to configure a RetireOffset and
to inspect the staged mutation to ensure that the before value was recorded.
Once merging is actually implemented, we can return to this test to verify
end-to-end behavior.
bobvawter added a commit that referenced this issue Oct 6, 2023
This change is part of #487 to support three-way merges.

This change allows the CDC handler to extract the `before` attribute from
incoming requests to populate the `Mutation.Before` field. We use the new
`RetireOffset` setting to allow the staged data to be inspected by
`handler_test.go`.

The `server/integration_test.go` is similarly updated to inspect the staged
mutation to ensure that the before value was recorded.  Once merging is
actually implemented, we can return to this test to verify end-to-end behavior.
bobvawter added a commit that referenced this issue Oct 9, 2023
This change is part of #487 to support three-way merges.

This change allows the CDC handler to extract the `before` or `cdc_prev`
attributes from incoming requests to populate the `Mutation.Before` field. That
is, if a tabular changefeed is created with the `diff` option, or a query
changefeed includes the `cdc_prev` column, the before data will be persisted.

The CDC queries endpoints had not been previously tested in
`server/integration_test.go`, so there's a little bit of churn to cover some
unhandled edge cases around the presence or absence of the `diff` option.

This change uses the new-ish `RetireOffset` configuration to allow the staged
records to be inspected even though there may be concurrent resolved timestamps
being processed.

Uninteresting uses of json.NewDecoder() have been replaced with
json.Unmarshal(). Some error messages have been edited for clarity.
bobvawter added a commit that referenced this issue Oct 9, 2023
This change is part of #487 to support three-way merges.

This change removes support for storing apply configurations in the staging
database. The userscript is now integrated into all cdc-sink modes and provides
superior ergonomics. The upcoming merge function would have to be configured
through the userscript and does not make sense to persist. Furthermore, having
two distinct ways of accomplishing a task is confusing.

Breaking Change: Any deployments using table-based configuration of data
application behaviors must instead switch to the userscript for configuration.

The wiki pages have been scrubbed for any references to the table-based approach:
X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/Data-Behaviors
X-Ref: https://github.com/cockroachdb/cdc-sink/wiki/User-Scripts
bobvawter added a commit that referenced this issue Oct 11, 2023
This change is part of #487 to support three-way merges.

This change adds support for declaring a user-defined merge function within the
userscript. The goja version is updated so that we can create a lightweight
wrapper around the ident.Map that will store the reified mutation values.
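The "lightweight wrapper" storing reified mutation values can be approximated in plain Go. This sketch assumes identifier lookups are matched case-insensitively, in the spirit of cdc-sink's `ident.Map`; the `PropMap` type and its methods are hypothetical, not the real API.

```go
package main

import (
	"fmt"
	"strings"
)

// PropMap is a hypothetical stand-in for the wrapper described above: it
// stores reified mutation values keyed by identifier, matching keys
// case-insensitively.
type PropMap struct {
	vals map[string]any
}

func NewPropMap() *PropMap {
	return &PropMap{vals: map[string]any{}}
}

// Put stores a value under a normalized form of the identifier.
func (m *PropMap) Put(key string, v any) {
	m.vals[strings.ToLower(key)] = v
}

// Get retrieves a value regardless of the caller's capitalization.
func (m *PropMap) Get(key string) (any, bool) {
	v, ok := m.vals[strings.ToLower(key)]
	return v, ok
}

func main() {
	m := NewPropMap()
	m.Put("UserName", "carl")
	v, ok := m.Get("username")
	fmt.Println(v, ok) // carl true
}
```

A wrapper with this shape is what the goja runtime would expose to the JavaScript merge function, so the userscript sees ordinary property access while the Go side keeps identifier semantics consistent.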
bobvawter added a commit that referenced this issue Oct 14, 2023
This change is part of #487 to support three-way merges.

This change updates the apply package to call into the merge function when
targeting CockroachDB or PostgreSQL. The conditional-upsert SQL is extended to
return the index of the conflicting data and the contents of the blocking row.
The blocking and conflicting row data are then used to drive the merge function.

The merge API added in PR #534 is refined. The relevant types are extracted
into their own package, which contains a new "Bag" type. A Bag holds reified
properties and can represent the data in a mutation or in a database row. It
additionally classifies properties as "mapped" or "unmapped" according to
whether or not the property maps onto a known column. Some of the bookkeeping
previously in the apply code to track missing or extra properties is simplified
by letting the Bag keep track of unexpected input properties.

The upsert code also becomes recursive. Mutations are reified into Bags and are
applied. If a Bag generates a conflict, the merge function will be called to
produce a Bag that will be unconditionally applied. Once all conflicts have been
resolved, the accumulated Bags will be upserted by a recursive call to the
upsert method. There is only ever one level of recursion.
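The mapped/unmapped classification can be sketched as a small Go type. Names here (`bag`, `newBag`, `put`) are illustrative only, assuming the behavior described above rather than the actual cdc-sink types.

```go
package main

import "fmt"

// bag is a minimal sketch of the "Bag" type described above: it holds
// reified properties and classifies each as mapped (backed by a known
// target column) or unmapped (an unexpected input property).
type bag struct {
	columns  map[string]bool // known target columns
	mapped   map[string]any
	unmapped map[string]any
}

func newBag(columns []string) *bag {
	b := &bag{
		columns:  map[string]bool{},
		mapped:   map[string]any{},
		unmapped: map[string]any{},
	}
	for _, c := range columns {
		b.columns[c] = true
	}
	return b
}

// put stores a property, routing it by whether it maps onto a column.
func (b *bag) put(prop string, v any) {
	if b.columns[prop] {
		b.mapped[prop] = v
	} else {
		b.unmapped[prop] = v
	}
}

func main() {
	b := newBag([]string{"id", "name"})
	b.put("id", 1)
	b.put("name", "carl")
	b.put("legacy_field", true) // unexpected input property
	fmt.Println(len(b.mapped), len(b.unmapped)) // 2 1
}
```

Because the Bag itself tracks which properties failed to map, the apply code no longer needs separate bookkeeping to detect extra input properties.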
bobvawter added a commit that referenced this issue Oct 16, 2023
This change is part of #487 to support three-way merges.

This change adds, but does not integrate, support for a per-target-schema DLQ.
cdc-sink, as a design rule, does not perform any schema changes in the target
database. If the user wants to use the DLQ, they will need to create the
destination table. The only requirements on the DLQ table are that it contain
certain well-known column names with appropriate types. A basic schema is
suggested by cdc-sink, and this suggested schema is used by the DLQ tests.

The justification for this hand-waving is that the DLQ becomes, in essence, a
part of the user's application and will likely need to be part of a
schema-management system. We cannot predict how the DLQ entries will be used,
indexed, etc. so integration with a minimum number of well-known columns seems
like it should give the user maximum flexibility.
bobvawter added a commit that referenced this issue Oct 18, 2023
This change is part of #487 to support three-way merges.

PR #540 and #543 were submitted separately, so the apply code did not support
writing to the DLQ. This change completes the wiring and allows conflicts to be
written to the queue.

The unexported apply.newApply() function was made a method on the factory type,
to decrease the number of arguments.
bobvawter added a commit that referenced this issue Oct 18, 2023
This change is part of #487 to support three-way merges.

This change adds merge.Standard and exposes it to the userscript. The merge
function identifies the properties which have changed between the before and
proposed bags. If the value of the property in the before bag equals the value
of the property in the target, the change is applied.

Property equivalency is presently defined as "serializes to the same JSON
bytes". The Go JSON serializer is deterministic and we have a rather fluid
type system, so this seems like a reasonable initial implementation.

A fallback merge function can be composed with merge.Standard to handle
properties with application-specific semantics. The script test shows the
composition of the standard merge with a counter-like field that is only ever
incremented. Properties which cannot be automatically merged are indicated by a
new Conflict.Unmerged field and corresponding userscript binding. This fallback
function can also be used to "merge or else" by using a trivial fallback that
always returns the name of a DLQ. This, too, is demonstrated in the test script.

The merge.Conflict.Existing field is renamed to Target. It either contains the
existing state of the row in the target database, or it contains the data that
merge.Standard determines should be stored in the target. The change in sort
order improves readability in the tests: Before, Proposed, Target -> Expected.
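The merge.Standard behavior described above can be sketched in a few lines of Go. This is a simplified model under the stated rules (changed-property detection, JSON-bytes equivalency, unmerged reporting); `standardMerge` and `jsonEqual` are illustrative names, not the real API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// jsonEqual mirrors the equivalency rule described above: two values are
// equal if they serialize to the same JSON bytes.
func jsonEqual(a, b any) bool {
	ab, aerr := json.Marshal(a)
	bb, berr := json.Marshal(b)
	return aerr == nil && berr == nil && bytes.Equal(ab, bb)
}

// standardMerge sketches merge.Standard: for each property changed between
// before and proposed, apply the change when the target still holds the
// before value; otherwise report the property as unmerged.
func standardMerge(before, proposed, target map[string]any) (merged map[string]any, unmerged []string) {
	merged = map[string]any{}
	for k, v := range target {
		merged[k] = v
	}
	for k, proposedVal := range proposed {
		beforeVal := before[k]
		if jsonEqual(beforeVal, proposedVal) {
			continue // property unchanged by the incoming mutation
		}
		if jsonEqual(beforeVal, target[k]) {
			merged[k] = proposedVal // clean three-way apply
		} else if !jsonEqual(proposedVal, target[k]) {
			unmerged = append(unmerged, k) // true conflict: all three differ
		}
	}
	return merged, unmerged
}

func main() {
	before := map[string]any{"id": 1, "name": "old", "count": 1}
	proposed := map[string]any{"id": 1, "name": "new", "count": 2}
	target := map[string]any{"id": 1, "name": "old", "count": 5} // count diverged
	merged, unmerged := standardMerge(before, proposed, target)
	fmt.Println(merged["name"], unmerged) // new [count]
}
```

In the real system a fallback merge function, composed after this standard pass, would get a chance to resolve the entries left in the unmerged list (for example, by re-deriving a counter) or to route the conflict to a DLQ.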
bobvawter added a commit that referenced this issue Oct 21, 2023
This change is part of #487 to support three-way merges.

This change adds merge.Standard and exposes it to the userscript. The merge
function identifies the properties which have changed between the before and
proposed bags. If the value of the property in the before bag equals the value
of the property in the target, the change is applied.

Property equivalency is presently defined as "serializes to the same JSON
bytes". The golang json serializer is deterministic and we have a rather fluid
typesystem, so this seems like a reasonable initial implementation.

A fallback merge function can be composed with merge.Standard to handle
properties with application-specific semantics. The script test shows the
composition of the standard merge with a counter-like field that is only ever
incremented. Properties which cannot be automatically merged are indicated by a
new Conflict.Unmerged field and corresponding userscript binding. This fallback
function can also be used to "merge or else" by using a trivial fallback that
always returns the name of a dlq. This, too, is demonstrated in the test script.

The merge.Conflict.Existing field is renamed to Target. It either contains the
existing state of the row in the target database, or it contains the data that
merge.Standard determines should be stored in the target. The change in sort
order improves readability in the tests: Before, Proposed, Target -> Expected.
bobvawter added a commit that referenced this issue Oct 27, 2023
This change is part of #487 to support three-way merges.

This change adds merge.Standard and exposes it to the userscript. The merge
function identifies the properties which have changed between the before and
proposed bags. If the value of the property in the before bag equals the value
of the property in the target, the change is applied.

Property equivalency is presently defined as "serializes to the same JSON
bytes". The golang json serializer is deterministic and we have a rather fluid
typesystem, so this seems like a reasonable initial implementation.

A fallback merge function can be composed with merge.Standard to handle
properties with application-specific semantics. The script test shows the
composition of the standard merge with a counter-like field that is only ever
incremented. Properties which cannot be automatically merged are indicated by a
new Conflict.Unmerged field and corresponding userscript binding. This fallback
function can also be used to "merge or else" by using a trivial fallback that
always returns the name of a dlq. This, too, is demonstrated in the test script.

The merge.Conflict.Existing field is renamed to Target. It either contains the
existing state of the row in the target database, or it contains the data that
merge.Standard determines should be stored in the target. The change in sort
order improves readability in the tests: Before, Proposed, Target -> Expected.
github-merge-queue bot pushed a commit that referenced this issue Oct 27, 2023
bobvawter added a commit that referenced this issue Oct 31, 2023
This change moves tracking of partially-applied resolved timestamp windows into
the staging tables by adding a new `applied` column. The goal of this change is
to move some state-tracking out of the cdc resolver loop into the stage
package. Tracking apply status on a per-mutation basis improves idempotency of
cdc-sink when the userscript has non-idempotent behaviors (e.g.: three-way
merge). It also allows us to export monitoring data around mutations which may
have slipped through the cracks or to detect when a migration process has
completely drained. Fine-grained tracking will also be useful for unifying the
non-transactional modes into a single behavior.

Many unused methods in the stage API have been deleted. The "unstaging" SQL
query is now generated with a Go template and is tested similarly to the
apply package.

The cdc package performs less work to track partial application of large
individual changes. It just persists the contents of the UnstageCursor as a
performance enhancement. Exactly-once behavior is provided by the applied
column.

The change to `server/integration_test.go` is due to the unstage processing
being a one-shot. The test being performed duplicates an existing test in
`cdc/handler_test.go`.

Breaking change: The `--selectBatchSize` flag is deprecated in favor of two
different flags `--largeTransactionLimit` and `--timestampWindowSize` which,
respectively, enable partial processing of a single, over-sized transaction and
a general limit on the total amount of data to be unstaged.

Breaking change: A staging schema migration is required; this is documented in
the migrations directory.

X-Ref: #487
X-Ref: #504
X-Ref: #565
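The bookkeeping that the `applied` column provides can be sketched in memory. The types below are hypothetical simplifications; the real implementation tracks this state in the staging tables via template-generated SQL.

```go
package main

import "fmt"

// staged models one row in a staging table. Applied mirrors the
// new column recording whether the mutation reached the target.
type staged struct {
	Key     string
	Time    int // simplified stand-in for an HLC timestamp
	Applied bool
}

// unstage selects at most limit un-applied mutations at or before
// the resolved timestamp and marks them applied, so replaying the
// same window is a no-op: the column, not the caller's cursor,
// provides the exactly-once behavior.
func unstage(table []staged, resolved, limit int) []string {
	var out []string
	for i := range table {
		m := &table[i]
		if m.Applied || m.Time > resolved || len(out) >= limit {
			continue
		}
		m.Applied = true
		out = append(out, m.Key)
	}
	return out
}

func main() {
	table := []staged{{"a", 1, false}, {"b", 2, false}, {"c", 5, false}}
	fmt.Println(unstage(table, 3, 10)) // first pass picks up a and b
	fmt.Println(unstage(table, 3, 10)) // replaying the window is empty
}
```

The `limit` parameter plays the role of the new window-size controls: an over-sized transaction can be drained across several passes without re-applying rows already marked.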
github-merge-queue bot pushed a commit that referenced this issue Nov 2, 2023
Labels
enhancement New feature or request