
DBZ-4783: Support for multiple databases and tasks in the SQL Server connector #3261

Merged
merged 5 commits into debezium:main on Mar 1, 2022

Conversation

morozov
Contributor

@morozov morozov commented Feb 23, 2022

Change summary:

  1. Manage change tables for each partition individually.
  2. Expose the actual task id to the metrics in order to be able to run multiple tasks.
  3. Run the connector with multiple databases and multiple tasks.

Note that running multiple tasks is not necessary for capturing multiple databases. Furthermore, running multiple tasks cannot be tested in the current test suite. The only reason it's contributed here is that extracting multi-task support into a separate patch would require extra work (it was originally implemented like this). I can remove it from the patch if necessary.
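To illustrate the relationship between databases and tasks described above, here is a hypothetical Python sketch (not Debezium's actual code; the function name `distribute_databases` is made up) of splitting configured database names across connector tasks, assuming a simple round-robin assignment:

```python
# Hypothetical sketch only: how a list of database names from "database.names"
# could be split across at most "tasks.max" tasks. A task may own several
# databases; running multiple tasks is optional for capturing multiple databases.

def distribute_databases(database_names, max_tasks):
    """Assign each database to a task, round-robin; returns one list per task."""
    num_tasks = min(max_tasks, len(database_names))
    assignments = [[] for _ in range(num_tasks)]
    for i, name in enumerate(database_names):
        assignments[i % num_tasks].append(name)
    return assignments

# With "database.names=testDB1,testDB2" and "tasks.max=2",
# each task captures exactly one database:
print(distribute_databases(["testDB1", "testDB2"], 2))
# → [['testDB1'], ['testDB2']]
```

With `tasks.max=1` the same two databases would both be handled by a single task, which is why multi-task mode is independent of multi-database support.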

TODO:

  • Provide an integration test for the multi-partition scenario.

Manual testing of the multi-database and multi-task configuration

  1. Modify docker-compose-sqlserver.yaml from the tutorial to use the Docker image built from this PR.

  2. Start the services:

    export DEBEZIUM_VERSION=1.9
    docker-compose -f docker-compose-sqlserver.yaml up
  3. Initialize test databases:

    docker-compose -f docker-compose-sqlserver.yaml exec -T sqlserver bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD' << 'EOF'
    CREATE DATABASE testDB1;
    GO
    USE testDB1;
    EXEC sys.sp_cdc_enable_db;
    
    CREATE TABLE products (
      id INTEGER IDENTITY(101,1) NOT NULL PRIMARY KEY,
      name VARCHAR(255) NOT NULL,
      description VARCHAR(512),
      weight FLOAT
    );
    INSERT INTO products(name,description,weight)
      VALUES ('scooter','Small 2-wheel scooter',3.14);
    
    EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'products', @role_name = NULL, @supports_net_changes = 0;
    GO
    
    CREATE DATABASE testDB2;
    GO
    USE testDB2;
    EXEC sys.sp_cdc_enable_db;
    
    CREATE TABLE customers (
      id INTEGER IDENTITY(1001,1) NOT NULL PRIMARY KEY,
      first_name VARCHAR(255) NOT NULL,
      last_name VARCHAR(255) NOT NULL,
      email VARCHAR(255) NOT NULL UNIQUE
    );
    INSERT INTO customers(first_name,last_name,email)
      VALUES ('Sally','Thomas','sally.thomas@acme.com');
    
    EXEC sys.sp_cdc_enable_table @source_schema = 'dbo', @source_name = 'customers', @role_name = NULL, @supports_net_changes = 0;
    GO
    EOF
  4. Start the connector:

    curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d@- << 'EOF'
    {
        "name": "inventory-connector",
        "config": {
            "connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
            "tasks.max" : "2",
            "database.server.name" : "server1",
            "database.hostname" : "sqlserver",
            "database.port" : "1433",
            "database.user" : "sa",
            "database.password" : "Password!",
            "database.names" : "testDB1,testDB2",
            "database.history.kafka.bootstrap.servers" : "kafka:9092",
            "database.history.kafka.topic": "schema-changes.inventory"
        }
    }
    EOF
  5. Check that the connector and both tasks are running:

    curl -s http://localhost:8083/connectors/inventory-connector/status
  6. Check that each task exposes metrics for its subset of databases:
    (screenshot: multi-partition and multi-task metrics; image not included)

  7. Run a consumer and confirm that both databases were snapshotted:

    docker-compose -f docker-compose-sqlserver.yaml exec kafka /kafka/bin/kafka-console-consumer.sh \
      --bootstrap-server kafka:9092 \
      --from-beginning \
      --property print.key=true \
      --whitelist 'server1\.(testDB\d+)\..*'
  8. Make changes in both databases and confirm that the changes from both databases are captured:

    docker-compose -f docker-compose-sqlserver.yaml exec -T sqlserver bash -c '/opt/mssql-tools/bin/sqlcmd -U sa -P $SA_PASSWORD' << 'EOF'
    UPDATE testDB1.dbo.products SET weight = 3.15 WHERE id = 101;
    UPDATE testDB2.dbo.customers SET first_name = 'Molly' WHERE id = 1001;
    EOF
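The status check in step 5 can also be verified programmatically. The following sketch parses a response of the shape returned by Kafka Connect's `/connectors/<name>/status` endpoint; the JSON below is illustrative sample data (worker IDs and values are made up), not captured output:

```python
import json

# Sample data mimicking the shape of Kafka Connect's connector status
# response; the worker_id values are invented for illustration.
status_json = '''
{
  "name": "inventory-connector",
  "connector": {"state": "RUNNING", "worker_id": "172.17.0.7:8083"},
  "tasks": [
    {"id": 0, "state": "RUNNING", "worker_id": "172.17.0.7:8083"},
    {"id": 1, "state": "RUNNING", "worker_id": "172.17.0.7:8083"}
  ],
  "type": "source"
}
'''

status = json.loads(status_json)
# Both tasks should be RUNNING when tasks.max=2 and two databases are configured.
running_tasks = [t["id"] for t in status["tasks"] if t["state"] == "RUNNING"]
print(f"connector: {status['connector']['state']}, running tasks: {running_tasks}")
```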

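The `--whitelist` pattern used by the consumer in step 7 selects only the change topics, which follow the `server1.<database>.<schema>.<table>` naming scheme used in this setup. A small Python check of the same regex (topic names below are examples consistent with the tutorial, not captured output):

```python
import re

# The same pattern passed to kafka-console-consumer via --whitelist.
pattern = re.compile(r'server1\.(testDB\d+)\..*')

topics = [
    "server1.testDB1.dbo.products",
    "server1.testDB2.dbo.customers",
    "schema-changes.inventory",   # history topic, intentionally not matched
]
matched = [t for t in topics if pattern.fullmatch(t)]
print(matched)
# → ['server1.testDB1.dbo.products', 'server1.testDB2.dbo.customers']
```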
@github-actions

Welcome as a new contributor to Debezium, @morozov. Reviewers, please add missing author name(s) and alias name(s) to the COPYRIGHT.txt and Aliases.txt respectively.

@morozov morozov marked this pull request as ready for review February 24, 2022 00:52
@gunnarmorling
Member

Thanks a lot, @morozov! Will take a look ASAP.

@jpechane, as per Sergei's mail, could you suggest a few key tests which should be adapted (well, copied, I suppose) to ensure the multi-partition mode works?

Member

@gunnarmorling gunnarmorling left a comment


Thanks a lot, @morozov! A few stylistic comments inline. I still need to do some testing using Compose (thanks for sharing the instructions for that!).

@@ -90,8 +128,7 @@ protected void validateConnection(Map<String, ConfigValue> configValues, Configuration config)
     final SqlServerConnectorConfig sqlServerConfig = new SqlServerConnectorConfig(config);

     if (Strings.isNullOrEmpty(sqlServerConfig.getDatabaseName())) {
-        throw new IllegalArgumentException("Either '" + SqlServerConnectorConfig.DATABASE_NAME
-                + "' or '" + SqlServerConnectorConfig.DATABASE_NAMES
+        throw new IllegalArgumentException("Either '" + DATABASE_NAME + "' or '" + DATABASE_NAMES
Member


Rather a configuration error should be added (see connection validation failure in this method below).

Contributor Author


There is actually already validation of both parameters:

  1. For DATABASE_NAME:
     .withValidation(SqlServerConnectorConfig::validateDatabaseName)
  2. And for DATABASE_NAMES:
     .withValidation(SqlServerConnectorConfig::validateDatabaseNames)

So we just need to return early if either of these fields is invalid, as was done prior to #2604:

    final ConfigValue databaseValue = configValues.get(RelationalDatabaseConnectorConfig.DATABASE_NAME.name());
    if (!databaseValue.errorMessages().isEmpty()) {
        return;
    }


List<String> databaseNames;

if (multiPartitionMode) {
Member


See above.

@morozov
Contributor Author

morozov commented Feb 28, 2022

@gunnarmorling I addressed your review comments and added a basic snapshotting/streaming test. Please take a look.

Member

@gunnarmorling gunnarmorling left a comment


LGTM, merging.

@morozov, is there anything else missing before we can declare the multi-partition mode functional? I.e., could we announce it as a new feature in the Beta1 release with this PR in?

@gunnarmorling gunnarmorling merged commit 2952b7a into debezium:main Mar 1, 2022
@morozov
Contributor Author

morozov commented Mar 1, 2022

There are a couple of minor things left:

  1. Add task id and partition to the logging context. Originally implemented in DBZ-2975 (Add task id and partition to logging context, sugarcrm/debezium#80); the current state is sugarcrm@e3774be. It may need some minor tweaking to make it work with DBZ-2224 (Test logging based on logback, #3103).

  2. The partition-scoped CapturedTables metric will likely expose all tables captured by the task, not only the ones that belong to the corresponding partition:

    @Override
    public String[] getCapturedTables() {
        return streamingMeter.getCapturedTables();
    }

    @Override
    public String[] getCapturedTables() {
        return taskContext.capturedDataCollections();
    }

    We haven't attempted to address or even reproduce this issue since we're not using this metric. Some of our MySQL connectors capture a couple of hundred thousand tables.

And that's it; there are no more changes related to the multi-partition mode in our fork. I will file Jira cases for each of these, but I believe they shouldn't block the announcement.

UPD: DBZ-4808, DBZ-4809.

@morozov morozov deleted the DBZ-4783 branch March 1, 2022 17:38
@gunnarmorling
Member

Thanks a lot for logging these follow-up issues, @morozov! Also a big thank-you for the awesome PR description; it just came in super handy for creating a screenshot of the metrics in JMC.
