DBZ-55 Corrected filtering of DDL statements based upon affected database #49
Previously, the DDL statements were being filtered and recorded based upon the name of the database that appeared in the binlog. That database name, however, is actually the name of the database to which the client submitting the operation is connected, and is not necessarily the database affected by the operation (e.g., when an operation includes a fully-qualified table name not in the connected-to database).
With these changes, the table/database affected by the DDL statements is now being used to filter the recording of the statements. The order of the DDL statements in the binlog is still maintained, but since each DDL statement can apply to a separate database, the DDL statements are batched (in the same original order) based upon the affected database. For example, two statements affecting `db1` will get batched together into one schema change record, followed by one statement affecting `db2` as a second schema change record, followed by another statement affecting `db1` as a third schema change record. Of course, if `db2` is excluded for some reason from the connector's configuration, then that second schema change record would not be written.
Determining the affected database for each DDL statement required changes to the DDL parsing framework. Although a listener mechanism was recently added, this PR adds a reusable listener implementation that accumulates 1 or more DDL statements and allows the caller (in this case the MySQL connector) to consume the sequences of statements and the database names to which they apply. Consecutive statements that apply to the same database are grouped/batched together. The MySQL connector uses this to process each QUERY event in the binlog, which may contain 1 or more DDL statements. The MySQL DDL parser was also enhanced to properly parse and handle `CREATE DATABASE`, `ALTER DATABASE`, and `DROP DATABASE` statements, since the parser needs to identify the affected database for these statements so they can be properly filtered.
Meanwhile, this change does not affect how the database history records the statements: it still records them exactly as submitted without regard to filtering and using a single record for each separate binlog QUERY event. In other words, the database history continues to record every DDL statement in the same order as found in the binlog, and all DDL statements found in a single binlog event are written atomically to the history stream. However, this commit does change the order in which the database history and schema change records are written, so that the latter are now first and the database history is written second. Under nominal operation each is written exactly once, but the database history records are now written after any schema change record so that, upon recovery after failure, no schema change records are lost (they instead have at-least-once delivery guarantees).
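
The batching of consecutive statements by affected database can be sketched roughly as follows. This is a minimal illustration, not the listener implementation added in this PR: `DdlBatcher`, `Ddl`, `Batch`, `groupByDatabase`, and `included` are hypothetical names, and the sketch assumes the parser has already determined the affected database for each statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch: groups consecutive DDL statements by affected database.
final class DdlBatcher {

    /** One DDL statement plus the database it affects (assumed to come from the parser). */
    static final class Ddl {
        final String database;
        final String statement;

        Ddl(String database, String statement) {
            this.database = database;
            this.statement = statement;
        }
    }

    /** A run of consecutive statements that all affect the same database. */
    static final class Batch {
        final String database;
        final List<String> statements = new ArrayList<>();

        Batch(String database) {
            this.database = database;
        }
    }

    /** Preserves the original order; starts a new batch whenever the affected database changes. */
    static List<Batch> groupByDatabase(List<Ddl> ddl) {
        List<Batch> batches = new ArrayList<>();
        Batch current = null;
        for (Ddl d : ddl) {
            if (current == null || !Objects.equals(current.database, d.database)) {
                current = new Batch(d.database);
                batches.add(current);
            }
            current.statements.add(d.statement);
        }
        return batches;
    }

    /** Keeps only the batches whose database passes the (hypothetical) connector filter. */
    static List<Batch> included(List<Batch> batches, Predicate<String> databaseFilter) {
        List<Batch> result = new ArrayList<>();
        for (Batch b : batches) {
            if (databaseFilter.test(b.database)) {
                result.add(b);
            }
        }
        return result;
    }
}
```

Given statements affecting `db1`, `db1`, `db2`, `db1` in that order, `groupByDatabase` yields three batches, and a filter that excludes `db2` leaves the first and third. Grouping only adjacent statements, rather than all statements per database, is what keeps the binlog order intact.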