Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-6828 Ad-hoc blocking snaps trigger emits schema changes of all tables #4824

Merged
merged 2 commits into from Sep 20, 2023

Conversation

mfvitale
Copy link
Member

@mfvitale mfvitale commented Sep 1, 2023

With these changes the blocking snapshot will generate schema events only for signaled tables

closes: https://issues.redhat.com/browse/DBZ-6828

addSchemaEvent(snapshotContext, "", connection.setStatementFor(connection.readMySqlCharsetSystemVariables()));
if (!snapshottingTask.isBlocking()) {
// Record default charset
addSchemaEvent(snapshotContext, "", connection.setStatementFor(connection.readMySqlCharsetSystemVariables()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane I noted that for Charset statement no database name is provided. Is it right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, IIRC it does not depnd on database

@Override
protected void assertDdl(List<String> schemaChangesDdls) {

assertThat(schemaChangesDdls.get(schemaChangesDdls.size() - 2)).isEqualTo("DROP TABLE IF EXISTS `blocking_snapshot_test_1`.`b`");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this will fail because for the DROP events we don't have information about the table. See https://github.com/debezium/debezium/pull/4824/files#diff-63840528f25c3335e78d8b91c2ea67602750af012c1620d9981f11cb476b480aR642

@@ -635,6 +639,10 @@ protected void createSchemaChangeEventsForTables(ChangeEventSourceContext source
LOGGER.debug("Processing schema event {}", event);

final TableId tableId = event.getTables().isEmpty() ? null : event.getTables().iterator().next().id();
if (snapshottingTask.isBlocking() && !snapshotContext.capturedTables.contains(tableId)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane Here I need to avoid to resend already processed events and process the only ones requested in bloking snapshot signal. Is there a way to check the same for the DROP statements for which tableId is null?

I mean something more robust then checking the name of the table in the ddl field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfvitale Under which conditions the DROP even will not contain tableId?

Copy link
Member Author

@mfvitale mfvitale Sep 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane Any DROP events both from snapshot and streaming. Do you know if it's for some limitation?

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpechane I think I have found an acceptable solution. Please check commit f478c7e

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfvitale It is good workaround. But I'd argu that the missing table change is bug in this case. Would you be able to dig into it a bit deeper? If not we can merge this PR and create another Jira just for that problem.

Copy link
Member Author

@mfvitale mfvitale Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same feeling but not 100% sure.

I think in this case is better to create a separate Jira issue to investigate it.

@mfvitale mfvitale force-pushed the DBZ-6828 branch 2 times, most recently from 5110249 to 1936ad9 Compare September 1, 2023 13:24
@jpechane jpechane merged commit 68eb4b2 into debezium:main Sep 20, 2023
25 of 29 checks passed
@jpechane
Copy link
Contributor

@mfvitale Applied, thanks! Please create the follow-up Jira for DROP events

@mfvitale mfvitale deleted the DBZ-6828 branch September 20, 2023 07:17
@mfvitale
Copy link
Member Author

@mfvitale Applied, thanks! Please create the follow-up Jira for DROP events

https://issues.redhat.com/browse/DBZ-6945

Comment on lines +327 to +329
List<SchemaChangeEvent> missingSchemaChangeEvents = schemaChangeEvents.stream()
.filter(not(schemaEvents::contains))
.collect(Collectors.toList());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfvitale Hi here have a performance bottleneck. time complexity is O(databaseSchemaNumber²), We have 140+ databases on single instance, here process need 2 hour+.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @minchowang can you please open a separate issue? Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants