[HUDI-6328] Flink support generate resize plan for consistent bucket index #9030
Conversation
Force-pushed from 12d0ac3 to 7c9c54e
Resolved review comments (outdated) on:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieClusteringConfig.java
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/ClusteringUtil.java
Configuration conf = getDefaultConfiguration();
conf.setBoolean(FlinkOptions.BUCKET_CLUSTERING_SORT_ENABLED, true);
HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf);
Not sure when to trigger the plan scheduling in the Flink pipeline.
The existing StreamWriteOperatorCoordinator#scheduleTableServices already covers this.
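To illustrate the pattern the comment describes, here is a minimal sketch (not Hudi's actual implementation; the class and method bodies below are hypothetical): the write coordinator schedules table services such as clustering as part of the normal commit flow, so the consistent-bucket resize plan needs no separate trigger.

```java
import java.util.ArrayList;
import java.util.List;

public class CoordinatorSketch {
    // Mimics the idea of StreamWriteOperatorCoordinator#scheduleTableServices:
    // after a successful commit, enabled table services are scheduled inline.
    static List<String> scheduleTableServices(boolean clusteringEnabled) {
        List<String> scheduled = new ArrayList<>();
        if (clusteringEnabled) {
            // The configured clustering plan strategy (e.g. the consistent
            // bucket resize plan) would be invoked at this point.
            scheduled.add("clustering-plan");
        }
        return scheduled;
    }

    public static void main(String[] args) {
        System.out.println(scheduleTableServices(true)); // [clustering-plan]
    }
}
```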
So should the user configure the plan strategy differently, then? Do they need manual config options?
No.
If the index is a consistent bucket index, the value of hoodie.clustering.plan.strategy.class must be org.apache.hudi.client.clustering.plan.strategy.FlinkConsistentBucketClusteringPlanStrategy, and hoodie.bucket.clustering.merge.enabled must be set to false.
For a consistent bucket index, these two configuration values are set automatically by the engine. The engine also checks the parameter values before submitting the job to ensure they meet the expected criteria and to prevent user errors.
The only advanced configuration users might need to consider is hoodie.bucket.clustering.sort.enabled, which controls whether regular clustering plans are generated for buckets not involved in a merge or split within the consistent hashing bucket index clustering plan. It defaults to false to avoid unnecessary clustering services; users can enable it if needed.
There are test cases in ITTestFlinkConsistentHashingClustering showing that no manual config options are needed.
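The auto-fill and pre-submit check described above can be sketched as follows. The config keys and the strategy class name are the real ones named in this thread, but the applyDefaults/validate helpers are hypothetical illustrations, not Hudi's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class ConsistentBucketConfigSketch {
    static final String PLAN_STRATEGY_KEY = "hoodie.clustering.plan.strategy.class";
    static final String MERGE_ENABLED_KEY = "hoodie.bucket.clustering.merge.enabled";
    static final String EXPECTED_STRATEGY =
        "org.apache.hudi.client.clustering.plan.strategy.FlinkConsistentBucketClusteringPlanStrategy";

    // For a consistent bucket index, the engine fills in these two values
    // automatically, so the user does not configure them by hand.
    static Map<String, String> applyDefaults(Map<String, String> conf) {
        conf.put(PLAN_STRATEGY_KEY, EXPECTED_STRATEGY);
        conf.put(MERGE_ENABLED_KEY, "false");
        return conf;
    }

    // Pre-submit validation: both values must meet the expected criteria.
    static boolean validate(Map<String, String> conf) {
        return EXPECTED_STRATEGY.equals(conf.get(PLAN_STRATEGY_KEY))
            && "false".equals(conf.get(MERGE_ENABLED_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = applyDefaults(new HashMap<>());
        System.out.println(validate(conf)); // true
    }
}
```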
Force-pushed from 891553a to 9d72420
2. Improve validation logic
+1, I'm fine with the change, let's see the test outputs.
Change Logs
This PR is the second subtask of HUDI-4373.
It focuses on generating the resize plan.
It does not cover resolving the resizing cases in the write pipelines; that will be done in the following subtasks.
P.S.: This work is a follow-up to #6737. Thanks for the contribution @YuweiXiao
Impact
NA
Risk level (write none, low medium or high below)
NA
Documentation Update
All documentation updates will be introduced in the final subtask of HUDI-4373.
Contributor's checklist