DBZ-6365 Support streaming a list of shards/gtids with multiple tasks #135
Conversation
Welcome as a new contributor to Debezium, @twthorn. Reviewers, please add missing author name(s) and alias name(s) to the COPYRIGHT.txt and Aliases.txt respectively.
@@ -142,17 +143,22 @@ public List<Map<String, String>> taskConfigs(int maxTasks, List<String> currentS
                    prevGtidsPerShard.keySet(), currentShards);
        }
        final String keyspace = connectorConfig.getKeyspace();
        // Check the configs in case there is a user specified GTID override
        verifyShardGtidConfig();
        Map<String, String> gtidsPerShard = getGtidsPerShardFromConfig();
This logic is not correct; on a normal run the shards/GTIDs need to come from the Kafka offset storage instead of from the static config.
The static config value is only used for the first run, when there is no history in Kafka storage.
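For reference, a minimal sketch of that precedence; the helper parameters and class name here are hypothetical, not the connector's actual API:

```java
import java.util.Map;

// Hypothetical sketch, not the connector's actual code: offsets persisted in Kafka take
// precedence; the static config value is only consulted on the very first run.
class GtidSourceSketch {
    static Map<String, String> resolveGtidsPerShard(Map<String, String> storedGtidsPerShard,
                                                    Map<String, String> configGtidsPerShard) {
        if (storedGtidsPerShard != null && !storedGtidsPerShard.isEmpty()) {
            return storedGtidsPerShard; // normal run: resume from persisted offsets
        }
        return configGtidsPerShard; // first run: no history yet, use the config override
    }
}
```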
Please don't merge in this PR yet, I will work with Tom separately to sort out some of the issues.
Updated with this feedback, thanks!
    private void verifyShardGtidConfig() {
        final List<String> gtids = connectorConfig.getGtid();
        if (connectorConfig.getShard() != null &&
This is a re-used check; maybe it would be good to extract it to a separate method on VitessConnectorConfig. You can use a validate method for it.
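For illustration, a minimal sketch of pulling the check into a validate-style method; both the class/method names and the exact condition are assumptions, since the full check is not shown above:

```java
import java.util.List;

// Hypothetical sketch of extracting the re-used shard/GTID check into the config class.
// The method name and the condition are illustrative, not the actual VitessConnectorConfig API.
class ShardGtidValidationSketch {
    static void validateShardGtidConfig(List<String> shards, List<String> gtids) {
        // Assumed rule for illustration: if both overrides are supplied, they must pair up one-to-one.
        if (shards != null && gtids != null && shards.size() != gtids.size()) {
            throw new IllegalArgumentException(
                    "The number of configured GTIDs (" + gtids.size()
                            + ") must match the number of configured shards (" + shards.size() + ")");
        }
    }
}
```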
Added, thanks!
Hi @twthorn, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.
Updated to use GTIDs only on the initial run (i.e., when they are not stored in the previous or current shard-GTID map). Also use the shard list whenever it is specified. Added more tests and some refactoring to clean things up. Also force-pushed to fix the commit message formatting (no other change).
Looks good overall; one comment on whether we should support shrinking the shard list.
        if (prevGtidsPerShard != null && !hasSameShards(prevGtidsPerShard.keySet(), currentShards)) {
            LOGGER.warn("Some shards for the previous generation {} are not persisted. Expected shards: {}",
                    prevGtidsPerShard.keySet(), currentShards);
            if (prevGtidsPerShard.keySet().containsAll(currentShards)) {
Should we throw an Exception in this case? People usually don't look at the logs until a problem happens. If we let this pass, we won't have the old GTIDs for the removed shards anymore, since we don't look for GTIDs more than one generation old. If we think shrinking shards is a valid use case, then we probably need a config flag to indicate whether we should halt on shard list shrinking or not; the default value for that config should be false.
Thanks for the update, it looks good to me now.
Thanks for the feedback. Updated to throw an exception.
If we end up with a use case later, we can implement functionality for contracting the shard list, but for now we will opt for the more cautious path of preventing any lost state. (Plus, this is a new feature, so there are no existing use cases that contract the shard list; we only know of a use case for expansion, i.e., ours.)
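For reference, a minimal sketch of that agreed behavior; the class/method names are hypothetical and the connector's actual check may differ:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: fail fast instead of silently dropping GTID state when the
// configured shard list shrinks relative to the previous generation.
class ShardShrinkCheckSketch {
    static void checkShardsNotShrinking(Set<String> prevShards, List<String> currentShards) {
        Set<String> current = new HashSet<>(currentShards);
        if (!prevShards.equals(current) && prevShards.containsAll(current)) {
            throw new IllegalStateException(
                    "Current shards " + current + " are a subset of the previous generation's shards "
                            + prevShards + "; contracting the shard list would lose persisted GTIDs");
        }
    }
}
```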
Hi @twthorn, thanks for your contribution. Please prefix the commit message(s) with the DBZ-xxx JIRA issue key.
Fixed a bug where the shards were not being passed in from the config correctly between the two taskConfigs methods. Updated tests to reproduce the error, which is now fixed. Not sure why the commit message check is failing; I have the correct prefix.
@twthorn Not sure if titling the commit as 'DBZ-6365: ' instead of 'DBZ-6365 ' would make the commit check pass.
@twthorn Applied, thanks!
Summary
We previously added support, in single-thread mode, for reading in a csv string of shards. Now we extend that to also work with multiple tasks.
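As a rough illustration of the general idea only (not the connector's actual taskConfigs logic; the names below are hypothetical), a csv shard list could be split across tasks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitting a csv shard list across tasks; the connector's real
// taskConfigs() may partition shards differently.
class ShardAssignmentSketch {
    static List<List<String>> assignShardsToTasks(String shardCsv, int numTasks) {
        List<String> shards = Arrays.asList(shardCsv.split(","));
        List<List<String>> perTask = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            perTask.add(new ArrayList<>());
        }
        for (int i = 0; i < shards.size(); i++) {
            perTask.get(i % numTasks).add(shards.get(i)); // simple round-robin assignment
        }
        return perTask;
    }
}
```

For example, in this sketch `assignShardsToTasks("-40,40-80,80-c0,c0-", 2)` would assign `-40` and `80-c0` to the first task and `40-80` and `c0-` to the second.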
Verification
Added unit tests & acceptance tests for verifying things work as expected when multiple tasks are used with the shard config csv string.
Note: for the registration of the metrics in the integration test, they were always registered under `task_0_1_0`, despite the fact that `task_0_2_0` would be the expected name for a connector with 2 max tasks. When I checked the logs, the `taskConfigs` function was always being called with 1 even if our config said 2. From what I can tell, this is a limitation of the integration tests using the embedded engine, so I just overrode the numTasks for generating the task ID to always be 1.