| HoodieClusteringConfig clusteringConfig = HoodieClusteringConfig.newBuilder().withClusteringMaxNumGroups(10) | ||
| .withClusteringTargetPartitions(0).withInlineClusteringNumCommits(1) | ||
| .withClusteringUpdatesStrategy("org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy") | ||
| .withRollbackPendingClustering(true) |
This change can be ignored. I was checking whether explicitly setting `withRollbackPendingClustering(true)` reverts the inflight replacecommit. It does not: this config has no effect here.
| // Do a rollback on the replacecommit that is failed | ||
| clusteringWriteClient.rollback(clusteringCommitTime); | ||
| // clusteringWriteClient.rollback(clusteringCommitTime); |
As I understand it, there should be no need to roll back the clustering commit explicitly; the new strategy should do that automatically when it detects a conflict. However, both the inflight and the requested replacecommit remain in the timeline.
Actually, rollback of these clustering commits is handled separately, so the inflight is left in the timeline. IngestionPrimaryWriterBasedConflictResolutionStrategy is only allowed together with SparkAllowUpdateStrategy, which is what we use. We clean up these inflights in one of two ways: by explicitly scheduling a rollback for the failed commit, or by including replacecommits in rollbackFailedWrites so that ingestion takes care of clearing them.
I think we need to make the immutability of clustering commits a table property, i.e. store it in hoodie.properties. That way ingestion knows whether a clustering commit can be rolled back, and can accordingly use either the SparkRejectUpdateStrategy or the SparkAllowUpdateStrategy implementation, with cleanup done separately.
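To make the table-property idea concrete, here is a minimal sketch of how ingestion could pick an update strategy from the table properties. It is only an illustration under assumptions: the property key `hoodie.clustering.commits.immutable` and the `UpdateStrategyChooser` class are hypothetical and do not exist in Hudi, the default of `false` is a guess, and `SparkRejectUpdateStrategy` is assumed to live in the same package as `SparkAllowUpdateStrategy`. Plain `java.util.Properties` stands in for the contents of hoodie.properties.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: maps a table property to an update strategy class name.
final class UpdateStrategyChooser {
    // Hypothetical property key; no such config exists in hoodie.properties today.
    static final String IMMUTABLE_CLUSTERING_KEY = "hoodie.clustering.commits.immutable";

    // Strategy used in this PR's test.
    static final String ALLOW =
        "org.apache.hudi.client.clustering.update.strategy.SparkAllowUpdateStrategy";
    // Assumed to sit in the same package as SparkAllowUpdateStrategy.
    static final String REJECT =
        "org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy";

    private UpdateStrategyChooser() {}

    // If clustering commits are immutable (cannot be rolled back), ingestion must reject
    // updates to file groups under clustering; otherwise it may allow them and let the
    // pending clustering be rolled back and cleaned up separately.
    static String choose(Properties tableProps) {
        boolean immutable =
            Boolean.parseBoolean(tableProps.getProperty(IMMUTABLE_CLUSTERING_KEY, "false"));
        return immutable ? REJECT : ALLOW;
    }
}
```

The returned class name could then be passed to `withClusteringUpdatesStrategy(...)` when building the clustering config, in place of the hard-coded string in the test above.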
@codope is this PR still needed?
Closing this PR. @codope @suryaprasanna Feel free to reopen this if needed.