-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DBZ-6731 Blocking snapshot takes configuration from signal payload #4741
Conversation
44dac1a
to
bb6c592
Compare
@@ -1482,25 +1482,25 @@ public static <T> T querySingleValue(Connection connection, String queryString, | |||
} | |||
} | |||
|
|||
public String buildSelectWithRowLimits(TableId tableId, int limit, String projection, Optional<String> condition, | |||
Optional<String> additionalCondition, String orderBy) { | |||
public String buildSelectWithRowLimits(TableId tableId, int limit, String projection, String condition, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfvitale Why is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just removed the Optional
as parameters since it's not a good practice. Now, in case of missing value is simply empty ""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I agree on that one. We use it in the codebase elsewhere as well. The Optional
is a way how to conway the message in code that value might be present. Even if originally it was told that Optional should be mostly return value there are differing opinions on that rule too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main "issue" (it's more about clean code) with Optional in the parameter is that it causes conditional logic inside methods, like passing boolean. This means that the method will have two different flow inside and so doing too much things. Leveraging on method overloading or method with different name can be better.
But it is just a different opinion on that. We can discuss it apart.
@@ -39,11 +38,11 @@ public T getId() { | |||
return id; | |||
} | |||
|
|||
public Optional<String> getAdditionalCondition() { | |||
public String getAdditionalCondition() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In these cases the Optional
is valid even in the strict sense
public SnapshottingTask getSnapshottingTask(MongoDbPartition partition, MongoDbOffsetContext offsetContext) { | ||
|
||
Set<Pattern> dataCollectionsToBeSnapshotted = connectorConfig.getDataCollectionsToBeSnapshotted(); | ||
Map<TableId, String> snapshotSelectOverridesByTable = Map.of(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use better name as Mongo does not use neither select nor tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fortunately, your comment revealed a missing. I was convinced that additional conditions was not supported at all but this is not the case. It is unsupported for incremental snapshot but is supported for initial snapshot.
if (isBlockingSnapshot) { | ||
return new MongoDbSnapshottingTask(replicaSets.all()); | ||
} | ||
Map<TableId, String> filtersByTable = snapshotConfiguration.getAdditionalConditions().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableId
is relational, it should not leak into MongoDB.
protected static final String FIELD_SURROGATE_KEY = "surrogate-key"; | ||
protected static final String FIELD_FILTER = "filter"; | ||
|
||
protected static Set<Pattern> getDataCollectionPatterns(List<String> dataCollections) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doe it mean that the user cannot prescribe order in which tables will be snapshotted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems there was different behavior:
-
initial snapshot: the
snapshot.include.collection.list
is converted to aSet<Pattern>
so the order is no preserved and the order is eventually controlled bysnapshot.tables.order.by.row.count
property. In this case theSet
prevent same table to be scanned multiple times (for wrong patterns). -
Incremental snapshot: the
data-collections
property is a list of string (but they are patterns), they are expanded in order. For wrong pattern this can lead to snapshot a table multiple times.
Is it correct that we want to maintain this behaviors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please dig a bit deeper into initial snapshot? It seems to me that the order is driven by the table.include.list
property.
For the increment snapshot - yes deduplication of matchin tables should be done - could you please create a se parate Jira for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please dig a bit deeper into initial snapshot? It seems to me that the order is driven by the
table.include.list
property.
private void determineCapturedTables(RelationalSnapshotContext<P, O> ctx, Set<Pattern> dataCollectionsToBeSnapshotted) throws Exception {
Set<TableId> allTableIds = getAllTableIds(ctx);
Set<TableId> snapshottedTableIds = determineDataCollectionsToBeSnapshotted(allTableIds, dataCollectionsToBeSnapshotted).collect(Collectors.toSet());
Set<TableId> capturedTables = new HashSet<>();
Set<TableId> capturedSchemaTables = new HashSet<>();
for (TableId tableId : allTableIds) {
if (connectorConfig.getTableFilters().eligibleForSchemaDataCollectionFilter().isIncluded(tableId)) {
LOGGER.info("Adding table {} to the list of capture schema tables", tableId);
capturedSchemaTables.add(tableId);
}
}
for (TableId tableId : snapshottedTableIds) {
if (connectorConfig.getTableFilters().dataCollectionFilter().isIncluded(tableId)) {
LOGGER.trace("Adding table {} to the list of captured tables for which the data will be snapshotted", tableId);
capturedTables.add(tableId);
}
else {
LOGGER.trace("Ignoring table {} for data snapshotting as it's not included in the filter configuration", tableId);
}
}
ctx.capturedTables = addSignalingCollectionAndSort(capturedTables);
ctx.capturedSchemaTables = capturedSchemaTables
.stream()
.sorted()
.collect(Collectors.toCollection(LinkedHashSet::new));
}
All tables are read from the database and then filtered by the pattern specified in the snapshot.include.collection.list
properties, if provided. Otherwise all tables will be returned. After that, only tables that are included in table.include.list
will be used for the snapshot.
For the increment snapshot - yes deduplication of matchin tables should be done - could you please create a se parate Jira for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfvitale Overall LGTM. My main comments are
- Removal of
Optional
- we can discuss it later if'd wish but IMHO it is out of scope of this PR and make it quite hard to review. If we decide to drop it then it should be a separate PR - MongoDB - it seems that relational aspects are leaking to it
- Plase make sure the user can prescribe the order of tables for snapshot
9d9b03d
to
fdbbd09
Compare
@mfvitale Applied, great work, thanks! |
closes: https://issues.redhat.com/browse/DBZ-6731