Skip to content

Data Behaviors

Bram Gruneir edited this page May 7, 2024 · 5 revisions

Data Application Behaviors

Replicator supports a number of behaviors that modify how mutations are applied to the target database. These behaviors can be enabled for both cluster-to-cluster and logical-replication uses of Replicator.

The preferred way of configuring these behaviors is via the user script. Each mode described below is set via api.configureTable(). The active apply configuration is available from the /_/diag endpoint.

The behaviors described below can be combined within a single table. For instance, the CAS and Extras behavior can be combined for conditional updates, where unmapped properties are stored as JSON in a designated column.

Compare-and-set

A compare-and-set (CAS) mode allows Replicator to discard some updates, based on version- or time-like fields.

The use of CAS mode intentionally discards mutations and may cause data inconsistencies.

Consider a table with a version INT column. If Replicator receives a message to update some row in this table, the update will only be applied if there is no pre-existing row in the destination table, or if the update's version value is strictly greater than the existing row in the destination table. That is, updates are applied only if they would increase the version column or if they would insert a new row.

api.configureTable("my_schema.public.some_table", {
  cas: [ "version" ]
});

When multiple CAS columns are present, they are compared as a tuple. That is, the second column is only compared if the first column value is equal between the source and destination clusters. In this multi-column case, the following pseudo-sql clause is applied to each incoming update:

api.configureTable("my_schema.public.some_table", {
  cas: [ "version_major", "version_minor", "version_patch" ]
});

Will translate into approximately:

WHERE existing IS NULL OR (
  (incoming.version_major, incoming.version_minor, incoming.version_patch) >
  (existing.version_major, existing.version_minor, existing.version_patch)
)

Deletes from the source cluster are always applied.

Deadlines

A deadline mode allows Replicator to discard some updates, by comparing a time-like field to the destination cluster's time. This is useful when a feed's data is advisory only or changefeed downtime is expected.

The use of deadlines intentionally discards mutations and may cause data inconsistencies.

To opt into deadline mode for a table, set the deadlines property for one or more columns in a table:

api.configureTable("my_schema.public.some_table", {
  "deadlines": {
    "updated_at": "1m",
  }
});

Given the above configuration, each incoming update would behave as though it were filtered with a clause such as WHERE incoming.updated_at > now() - '1m'::INTERVAL.

Deletes from the source cluster are always applied.

Extras column

By default, Replicator will reject any incoming data that cannot be mapped onto a column in the target table. Users may specify a textual or JSONB column in a target table to receive otherwise-unmapped properties. This is useful in logical-replication scenarios where the source data has a variable schema (e.g. migrations from document stores). Values in a JSONB column can be extracted in subsequent schema-change operations using computed columns.

To enable extras mode for a table, set the extras property to the name of a column. The name of the extras column should be chosen to avoid any conflicts with properties in incoming mutations.

api.configureTable("my_schema.public.some_table", {
  extras: "overflow_column"
});

Ignore columns

By default, Replicator will reject incoming mutations that have columns which do not map to a column in the destination table.

The use of ignore mode intentionally discards columns and may cause data inconsistencies.

api.configureTable("my_schema.public.some_table", {
  ignore: {
    "dont_care": true,
  }
});

Ignore mode is mostly useful when using logical replication modes and performing schema changes.

Merge operator

See Merge.

Rename columns

Renaming of columns should be accomplished either via the schema-wide dispatch() function or the per-destination-table map() function.

Renaming columns is mostly useful when using logical replication modes and performing schema changes.

Substitute expressions

Substitute expressions allow incoming column data to be transformed with arbitrary SQL expressions when being applied to the target table. This can also be accomplished via the dispatch() or map() if the value makes sense to compute within the userscript.

The use of substitute expressions may cause data inconsistencies.

Consider the following, contrived, case of multiplying a value by two before applying it:

api.configureTable("my_schema.public.some_table", {
  exprs: {
    "some_column": "2 * $0"
  }
});

The substitution string $0 will expand at runtime to the value of the incoming data. Expressions may repeat the $0 expression. For example $0 || $0 would concatenate a value with itself. It is also permissable to use a constant value that does not depend on an input property:

api.configureTable("my_schema.public.some_table", {
  exprs: {
    "did_migrate": "true"
  }
});

In cases where a column exists only in the target database, using a DEFAULT for the column is preferable to using a substitute expression.