
DBZ-55 Corrected filtering of DDL statements based upon affected database #49

Merged
merged 1 commit into debezium:master from dbz-55 on May 24, 2016

Conversation

@rhauch (Member) commented May 23, 2016

Previously, DDL statements were filtered and recorded based upon the name of the database that appeared in the binlog. That database name, however, is actually the database to which the client submitting the operation was connected, and is not necessarily the database affected by the operation (e.g., when a statement uses a fully-qualified table name that refers to a different database).

With these changes, the table/database affected by each DDL statement is now used to filter the recording of the statements. The order of the DDL statements in the binlog is still maintained, but since each DDL statement can apply to a different database, the DDL statements are batched (in the same original order) by affected database. For example, two statements affecting db1 will get batched together into one schema change record, followed by one statement affecting db2 as a second schema change record, followed by another statement affecting db1 as a third schema change record. Of course, if db2 is excluded by the connector's configuration, then that second schema change record would not be written.
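For illustration only, here is a minimal sketch of this batching and filtering, assuming a simple in-memory representation (the class and method names are made up, not the connector's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch only: group consecutive DDL statements by the database they affect, preserving
// binlog order, and keep only batches whose database passes the connector's filter.
public class SchemaChangeBatchingSketch {

    public static final class SchemaChangeRecord {
        public final String database;
        public final List<String> ddl;
        SchemaChangeRecord(String database, List<String> ddl) {
            this.database = database;
            this.ddl = ddl;
        }
        @Override
        public String toString() {
            return database + ": " + ddl;
        }
    }

    public static List<SchemaChangeRecord> batch(List<String> affectedDatabases,
                                                 List<String> statements,
                                                 Predicate<String> databaseFilter) {
        List<SchemaChangeRecord> records = new ArrayList<>();
        String currentDb = null;
        List<String> current = new ArrayList<>();
        for (int i = 0; i < statements.size(); i++) {
            String db = affectedDatabases.get(i);
            if (!db.equals(currentDb)) {
                // Database changed: emit the previous batch (if its database is included).
                if (currentDb != null && databaseFilter.test(currentDb)) {
                    records.add(new SchemaChangeRecord(currentDb, new ArrayList<>(current)));
                }
                currentDb = db;
                current.clear();
            }
            current.add(statements.get(i));
        }
        if (currentDb != null && databaseFilter.test(currentDb)) {
            records.add(new SchemaChangeRecord(currentDb, new ArrayList<>(current)));
        }
        return records;
    }

    public static void main(String[] args) {
        // The example from the description: statements affecting db1, db1, db2, db1 in binlog order.
        List<String> dbs = Arrays.asList("db1", "db1", "db2", "db1");
        List<String> ddl = Arrays.asList(
                "CREATE TABLE db1.a (id INT)",
                "CREATE TABLE db1.b (id INT)",
                "CREATE TABLE db2.c (id INT)",
                "ALTER TABLE db1.a ADD COLUMN name VARCHAR(255)");
        // Without filtering this yields three records ([db1 x2], [db2 x1], [db1 x1]);
        // with the filter below, the db2 record is dropped entirely.
        System.out.println(batch(dbs, ddl, db -> !db.equals("db2")));
    }
}
```

Running the `main` method prints two records (db1 with the first two statements, then db1 with the last statement); without the filter there would be three, matching the example above.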

Determining the affected database for each DDL statement required changes to the DDL parsing framework. Although a listener mechanism was recently added, this PR adds a reusable listener implementation that accumulates one or more DDL statements and allows the caller (in this case the MySQL connector) to consume the sequences of statements along with the database names to which they apply. Consecutive statements that apply to the same database are grouped/batched together. The MySQL connector uses this to process each QUERY event in the binlog, which may contain one or more DDL statements. The MySQL DDL parser was also enhanced to properly parse and handle CREATE DATABASE, ALTER DATABASE, and DROP DATABASE statements, since the parser needs to identify the affected database for these statements so they can be properly filtered.
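As a rough illustration of the resolution rule only (the real work happens in the DDL parser, not in regular expressions; everything below is a simplified, hypothetical sketch), the affected database can come from the statement itself rather than from the database the client was connected to:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch: resolve the database a DDL statement affects. CREATE/ALTER/DROP
// DATABASE statements name the database directly; a fully-qualified table name overrides
// the connected database; otherwise fall back to the connected database.
// (Multi-table statements, quoting edge cases, etc. are ignored here.)
public class AffectedDatabaseSketch {

    private static final Pattern DATABASE_DDL = Pattern.compile(
            "(?i)^\\s*(?:CREATE|ALTER|DROP)\\s+(?:DATABASE|SCHEMA)\\s+(?:IF\\s+(?:NOT\\s+)?EXISTS\\s+)?`?([^`\\s;]+)`?");
    private static final Pattern QUALIFIED_TABLE = Pattern.compile(
            "(?i)\\bTABLE\\s+(?:IF\\s+(?:NOT\\s+)?EXISTS\\s+)?`?([^`\\s.]+)`?\\.`?[^`\\s.(;]+`?");

    public static String affectedDatabase(String ddl, String connectedDatabase) {
        Matcher db = DATABASE_DDL.matcher(ddl);
        if (db.find()) {
            return db.group(1); // CREATE/ALTER/DROP DATABASE names the database explicitly
        }
        Matcher table = QUALIFIED_TABLE.matcher(ddl);
        if (table.find()) {
            return table.group(1); // fully-qualified table name overrides the connected database
        }
        return connectedDatabase; // unqualified statements apply to the connected database
    }

    public static void main(String[] args) {
        // Issued while connected to db1, but affects db2:
        System.out.println(affectedDatabase("CREATE TABLE db2.orders (id INT)", "db1")); // db2
        // Database-level DDL names its database explicitly:
        System.out.println(affectedDatabase("DROP DATABASE IF EXISTS db3", "db1"));      // db3
        // Unqualified statements still apply to the connected database:
        System.out.println(affectedDatabase("ALTER TABLE orders ADD COLUMN x INT", "db1")); // db1
    }
}
```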

Meanwhile, this change does not affect how the database history records the statements: it still records them exactly as submitted, without regard to filtering, using a single record for each separate binlog QUERY event. In other words, the database history continues to record every DDL statement in the same order as found in the binlog, and all DDL statements found in a single binlog event are written atomically to the history stream. However, this commit does change the order in which the schema change records and the database history are written: the schema change records are now written first and the database history second. Under nominal operation each is written exactly once, but because the database history records are now written after any schema change records, no schema change records are lost upon recovery after a failure (they instead have at-least-once delivery guarantees).
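A minimal sketch of this write ordering for a single binlog QUERY event (the interfaces and method names here are hypothetical, not Debezium's actual API):

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical interfaces, for illustration only.
interface SchemaChangeWriter {
    void write(String database, List<String> ddlStatements); // one record per database batch
}

interface DatabaseHistory {
    void record(String binlogPosition, String allDdlInEvent); // one record per QUERY event
}

// Sketch of the write ordering for a single binlog QUERY event.
public class QueryEventHandler {

    private final SchemaChangeWriter schemaChanges;
    private final DatabaseHistory history;
    private final Predicate<String> databaseFilter;

    public QueryEventHandler(SchemaChangeWriter schemaChanges, DatabaseHistory history,
                             Predicate<String> databaseFilter) {
        this.schemaChanges = schemaChanges;
        this.history = history;
        this.databaseFilter = databaseFilter;
    }

    public void handle(String binlogPosition, String ddlInEvent,
                       List<String> batchDatabases, List<List<String>> batchedStatements) {
        // 1. Emit one schema change record per (filtered) database batch, in binlog order.
        for (int i = 0; i < batchDatabases.size(); i++) {
            if (databaseFilter.test(batchDatabases.get(i))) {
                schemaChanges.write(batchDatabases.get(i), batchedStatements.get(i));
            }
        }
        // 2. Only then record the unfiltered DDL for this event in the database history.
        //    If the connector fails between steps 1 and 2, recovery restarts from the last
        //    recorded history position and the schema change records are re-emitted
        //    (at-least-once) rather than lost.
        history.record(binlogPosition, ddlInEvent);
    }
}
```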

DBZ-55 Corrected filtering of DDL statements based upon affected database

Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. However, that database name is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database _affected_ by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).

With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements is still maintained, but since each DDL statement can apply to a separate database, the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting "db1" will get batched together into one schema change record, followed by one statement affecting "db2" as a second schema change record, followed by another statement affecting "db1" as a third schema change record.

Meanwhile, this change does not affect how the database history records the changes: it still records them as submitted using a single record for each separate binlog event/position. This is much safer as each binlog event (with specific position) is written atomically to the history stream. Also, since the database history stream is what the connector uses upon recovery, the database history records are now written _after_ any schema change records to ensure that, upon recovery after failure, no schema change records are lost (and instead have at-least-once delivery guarantees).
@rhauch merged commit 57e6c73 into debezium:master May 24, 2016
@rhauch deleted the dbz-55 branch May 24, 2016 00:46
mikekamornikov pushed a commit to mikekamornikov/debezium that referenced this pull request Apr 30, 2021
DBZ-3452: source.timestamp.mode=commit imposes a significant performance penalty
bdbene pushed a commit to bdbene/debezium that referenced this pull request Jun 23, 2023
* kafka connect 2.6 & dbz 1.3.1

* update jars
@xinbinhuang mentioned this pull request Jun 27, 2023
methodmissing pushed a commit to methodmissing/debezium that referenced this pull request Apr 6, 2024
* kafka connect 2.6 & dbz 1.3.1

* update jars