Postgres / Debezium messages presented "out of order" #5262
Comments
@umanwizard and I spent some time today digging into the Debezium/PostgreSQL source code on this front. Recording those learnings here for posterity. Debezium uses the PgJDBC logical streaming replication protocol to read the change feed. You can learn more about the protocol here: https://www.postgresql.org/docs/10/protocol-logical-replication.html. The gist is that WAL records are emitted via a CopyData stream, much like the way our There are three plugins that are currently available for use with Debezium: wal2json, decoderbufs, and pgoutput. I think all of our stuff uses pgoutput, since it ships by default with PostgreSQL. The important part of this interface is described thusly:
That means that transaction commit messages will be presented in LSN order, but on a per-message level, we'll often see out-of-order LSNs. Per Brennan, a WAL like this
will produce Debezium records like so:
The information we need to put things back into the right order is available in the transaction topic, but it's a bit of a pain to wire that up presently. If we wanted to instead tackle this from the Debezium side, we could include the |
Is
|
In a perfect world, yes. Unfortunately xids in PostgreSQL are 32 bits and wrap around occasionally: https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql It's not theoretical, either, sadly. Nearly every production PostgreSQL database has some horror story whose punchline is "xid wraparound". See for example: https://blog.sentry.io/2015/07/23/transaction-id-wraparound-in-postgres |
Could we detect wraparound? If this is for the purpose of deduplication, would it be sufficient to assume that a transaction cannot run long enough for the value to wrap around more than once? |
I don't understand why txId would work, even if it weren't for the wraparound issue. We don't expect to see those in order, do we? Since Debezium sees transactions in commit order, not in the order they were created. Or is txId not allocated until a transaction is committed? If that is the case, we could try using it and detecting wraparound, as Chris suggests. |
Ah, yeah, you're right: xids are assigned at the moment you write any data. So they're out for two reasons. |
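To make the two objections concrete, here is a minimal Python sketch. It assumes xids are plain 32-bit counters compared with a signed modular difference (PostgreSQL also reserves a few low xid values, which this ignores), and the xid values and commit order are made up for illustration. The helper name `xid_precedes` is hypothetical, not a Debezium or Materialize API.

```python
XID_MODULUS = 2**32  # xids are 32 bits wide and wrap around


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware "a is older than b".

    Treats (a - b) as a signed 32-bit value: a negative result means
    a was allocated before b, even if the counter wrapped in between.
    """
    diff = (a - b) % XID_MODULUS
    return diff >= 2**31


# Wraparound alone is detectable if xids are less than 2**31 apart:
# an xid just before the wrap still sorts before one just after it.
wrapped_ok = xid_precedes(4294967295, 3)

# But xids are assigned at first write, not at commit. Hypothetical
# commit order: xid 7 started first and committed after xid 8.
commit_order = [8, 7]
monotonic = all(
    xid_precedes(a, b) for a, b in zip(commit_order, commit_order[1:])
)
```

Here `wrapped_ok` is true but `monotonic` is false: even a wraparound-safe comparison does not give an ordering that matches the commit-ordered stream.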
I filed this upstream: https://issues.redhat.com/browse/DBZ-2911 |
It's possible to fix this for people on newer Debezium versions now that @JLDLaughlin's changes to the source coordinates have landed. |
@umanwizard what is the latest on this issue? |
It depends on #6553, which will involve fixing an edge case in how Debezium emits sequence numbers. After we do that (requires upstream work in Debezium) and the release goes out, we can start reading those sequence numbers in Materialize, which will finally fix this issue. Alternatively, we could fix this now, and it should make postgres/debezium sources correct in more cases than they are now, but it won't fully solve the problem until the issue I linked above is fixed. Anyway, the fix on the Mz end is relatively easy (<1 day of work); @quodlibetor or I can probably pick it up when we get a chance. |
Added to the Sources and Sinks project board so we don't lose track of this |
This is a P1 issue and lacks an owner. Does this make sense for @JLDLaughlin to own, since she did the earlier upstream PR? |
@quodlibetor , are you interested in picking this up? It should be a relatively simple change -- we need to start reading the |
I spoke to Brandon; he will tentatively pick this up. He already has a lot on his plate and is out next week, so we can reassign if necessary if it doesn't get done. |
Essentially a duplicate of #5668, which @JLDLaughlin is working on |
Our logic for deduplicating messages from Debezium assumes that Postgres records are presented in
order of increasing LSN. However, Postgres' logical replication mechanism appears to present
records in commit order, not WAL order.
For now, this implies that we need to run with
deduplication = 'full'
for sources where the messages are Postgres records from Debezium.
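A rough Python sketch of the failure mode, under stated assumptions: the WAL contents, LSN values, and xids below are invented, `stream_in_commit_order` is a simplified model of commit-ordered logical decoding, and `dedup_by_max_lsn` is a stand-in for LSN-based deduplication rather than Materialize's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int   # position in the WAL (hypothetical values)
    xid: int   # transaction that wrote the record
    kind: str  # "insert" or "commit"


# Hypothetical WAL in which two transactions interleave their writes.
WAL = [
    WalRecord(100, 1, "insert"),
    WalRecord(110, 2, "insert"),
    WalRecord(120, 2, "commit"),
    WalRecord(130, 1, "insert"),
    WalRecord(140, 1, "commit"),
]


def stream_in_commit_order(wal):
    """Buffer changes per transaction; emit them when its commit is seen."""
    pending = {}
    out = []
    for rec in wal:
        if rec.kind == "commit":
            out.extend(pending.pop(rec.xid, []))
        else:
            pending.setdefault(rec.xid, []).append(rec)
    return out


def dedup_by_max_lsn(records):
    """Naive dedup: drop any record at or below the highest LSN seen."""
    max_lsn = -1
    kept = []
    for rec in records:
        if rec.lsn > max_lsn:
            kept.append(rec)
            max_lsn = rec.lsn
    return kept


streamed = stream_in_commit_order(WAL)
streamed_lsns = [r.lsn for r in streamed]                  # [110, 100, 130]
kept_lsns = [r.lsn for r in dedup_by_max_lsn(streamed)]    # [110, 130]
```

In this model the record at LSN 100 arrives after the one at LSN 110 because its transaction committed later, so the LSN-based deduplication drops it even though it is not a duplicate.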