Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DBZ-6731 Blocking snapshot takes configuration from signal payload #4741

Merged
merged 1 commit into from
Aug 15, 2023

Conversation

mfvitale
Copy link
Member

@mfvitale mfvitale commented Aug 3, 2023

@mfvitale mfvitale force-pushed the DBZ-6731 branch 3 times, most recently from 44dac1a to bb6c592 Compare August 7, 2023 14:20
@@ -1482,25 +1482,25 @@ public static <T> T querySingleValue(Connection connection, String queryString,
}
}

public String buildSelectWithRowLimits(TableId tableId, int limit, String projection, Optional<String> condition,
Optional<String> additionalCondition, String orderBy) {
public String buildSelectWithRowLimits(TableId tableId, int limit, String projection, String condition,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfvitale Why is this change needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just removed the Optional as parameters since it's not a good practice. Now, in case of missing value is simply empty ""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree on that one. We use it in the codebase elsewhere as well. The Optional is a way how to conway the message in code that value might be present. Even if originally it was told that Optional should be mostly return value there are differing opinions on that rule too.

Copy link
Member Author

@mfvitale mfvitale Aug 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main "issue" (it's more about clean code) with Optional in the parameter is that it causes conditional logic inside methods, like passing boolean. This means that the method will have two different flow inside and so doing too much things. Leveraging on method overloading or method with different name can be better.

But it is just a different opinion on that. We can discuss it apart.

@@ -39,11 +38,11 @@ public T getId() {
return id;
}

public Optional<String> getAdditionalCondition() {
public String getAdditionalCondition() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In these cases the Optional is valid even in the strict sense

public SnapshottingTask getSnapshottingTask(MongoDbPartition partition, MongoDbOffsetContext offsetContext) {

Set<Pattern> dataCollectionsToBeSnapshotted = connectorConfig.getDataCollectionsToBeSnapshotted();
Map<TableId, String> snapshotSelectOverridesByTable = Map.of();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use better name as Mongo does not use neither select nor tables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, your comment revealed a missing. I was convinced that additional conditions was not supported at all but this is not the case. It is unsupported for incremental snapshot but is supported for initial snapshot.

if (isBlockingSnapshot) {
return new MongoDbSnapshottingTask(replicaSets.all());
}
Map<TableId, String> filtersByTable = snapshotConfiguration.getAdditionalConditions().stream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableId is relational, it should not leak into MongoDB.

protected static final String FIELD_SURROGATE_KEY = "surrogate-key";
protected static final String FIELD_FILTER = "filter";

protected static Set<Pattern> getDataCollectionPatterns(List<String> dataCollections) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doe it mean that the user cannot prescribe order in which tables will be snapshotted?

Copy link
Member Author

@mfvitale mfvitale Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there was different behavior:

  • initial snapshot: the snapshot.include.collection.list is converted to a Set<Pattern> so the order is no preserved and the order is eventually controlled by snapshot.tables.order.by.row.count property. In this case the Set prevent same table to be scanned multiple times (for wrong patterns).

  • Incremental snapshot: the data-collections property is a list of string (but they are patterns), they are expanded in order. For wrong pattern this can lead to snapshot a table multiple times.

Is it correct that we want to maintain this behaviors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please dig a bit deeper into initial snapshot? It seems to me that the order is driven by the table.include.list property.
For the increment snapshot - yes deduplication of matchin tables should be done - could you please create a se parate Jira for it?

Copy link
Member Author

@mfvitale mfvitale Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please dig a bit deeper into initial snapshot? It seems to me that the order is driven by the table.include.list property.

private void determineCapturedTables(RelationalSnapshotContext<P, O> ctx, Set<Pattern> dataCollectionsToBeSnapshotted) throws Exception {
        Set<TableId> allTableIds = getAllTableIds(ctx);
        Set<TableId> snapshottedTableIds = determineDataCollectionsToBeSnapshotted(allTableIds, dataCollectionsToBeSnapshotted).collect(Collectors.toSet());

        Set<TableId> capturedTables = new HashSet<>();
        Set<TableId> capturedSchemaTables = new HashSet<>();

        for (TableId tableId : allTableIds) {
            if (connectorConfig.getTableFilters().eligibleForSchemaDataCollectionFilter().isIncluded(tableId)) {
                LOGGER.info("Adding table {} to the list of capture schema tables", tableId);
                capturedSchemaTables.add(tableId);
            }
        }

        for (TableId tableId : snapshottedTableIds) {
            if (connectorConfig.getTableFilters().dataCollectionFilter().isIncluded(tableId)) {
                LOGGER.trace("Adding table {} to the list of captured tables for which the data will be snapshotted", tableId);
                capturedTables.add(tableId);
            }
            else {
                LOGGER.trace("Ignoring table {} for data snapshotting as it's not included in the filter configuration", tableId);
            }
        }

        ctx.capturedTables = addSignalingCollectionAndSort(capturedTables);
        ctx.capturedSchemaTables = capturedSchemaTables
                .stream()
                .sorted()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

All tables are read from the database and then filtered by the pattern specified in the snapshot.include.collection.list properties, if provided. Otherwise all tables will be returned. After that, only tables that are included in table.include.list will be used for the snapshot.

For the increment snapshot - yes deduplication of matchin tables should be done - could you please create a se parate Jira for it?

https://issues.redhat.com/browse/DBZ-6787

Copy link
Contributor

@jpechane jpechane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfvitale Overall LGTM. My main comments are

  • Removal of Optional - we can discuss it later if'd wish but IMHO it is out of scope of this PR and make it quite hard to review. If we decide to drop it then it should be a separate PR
  • MongoDB - it seems that relational aspects are leaking to it
  • Plase make sure the user can prescribe the order of tables for snapshot

@mfvitale mfvitale force-pushed the DBZ-6731 branch 2 times, most recently from 9d9b03d to fdbbd09 Compare August 9, 2023 09:21
@jpechane jpechane merged commit 542b361 into debezium:main Aug 15, 2023
26 of 29 checks passed
@jpechane
Copy link
Contributor

@mfvitale Applied, great work, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants