
[HUDI-6328] Flink support generate resize plan for consistent bucket index #9030

Merged (2 commits) on Jun 26, 2023

Conversation

@beyond1920 (Contributor) commented Jun 21, 2023

Change Logs

This PR is the second subtask of HUDI-4373.
It focuses on generating the resize plan.
It does not cover handling the resizing cases in the write pipelines; that will be done in the following subtasks.

PS: This work is a follow-up of #6737. Thanks for the contribution @YuweiXiao

Impact

NA

Risk level (write none, low, medium or high below)

NA

Documentation Update

All documentation updates will be introduced in the final subtask of HUDI-4373.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed


// From the test setup: enable sort-based clustering for buckets
// not involved in a merge or split
Configuration conf = getDefaultConfiguration();
conf.setBoolean(FlinkOptions.BUCKET_CLUSTERING_SORT_ENABLED, true);
HoodieFlinkWriteClient writeClient = FlinkWriteClients.createWriteClient(conf);
Contributor:

Not sure when to trigger the plan scheduling in the Flink pipeline.

Contributor Author (@beyond1920):

The existing StreamWriteOperatorCoordinator#scheduleTableServices already covers this.

Contributor:

So should the user configure the plan strategy differently then? Do they need manual config options?

Contributor Author (@beyond1920) commented Jun 23, 2023:
No.
If the index is a consistent bucket index, the value of hoodie.clustering.plan.strategy.class must be org.apache.hudi.client.clustering.plan.strategy.FlinkConsistentBucketClusteringPlanStrategy, and hoodie.bucket.clustering.merge.enabled must be set to false. These two configuration values are set automatically by the engine. The engine also checks the parameter values before submitting the job to ensure they meet the expected criteria and to prevent user errors.
The only advanced option users might need to consider is hoodie.bucket.clustering.sort.enabled, which controls whether to generate regular clustering plans for buckets that are not involved in a merge or split within the consistent hashing bucket index clustering plan. Its default value is false, to avoid unnecessary clustering services; users can enable it if needed.
There are test cases in ITTestFlinkConsistentHashingClustering showing that users don't need manual config options.
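To summarize the discussion above, the effective configuration for a consistent bucket index table might look like the following sketch. The key names are taken from the comment above; per that comment, the first two values are applied automatically by the engine, so only the last line is something a user would ever set by hand.

```properties
# Applied automatically when the index is a consistent bucket index
hoodie.clustering.plan.strategy.class=org.apache.hudi.client.clustering.plan.strategy.FlinkConsistentBucketClusteringPlanStrategy
hoodie.bucket.clustering.merge.enabled=false

# Optional user setting: also generate regular clustering plans for buckets
# not involved in a merge or split (default: false, to avoid unnecessary
# clustering services)
hoodie.bucket.clustering.sort.enabled=true
```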

2. Improve validation logic
@hudi-bot

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 closed this Jun 26, 2023
@danny0405 danny0405 reopened this Jun 26, 2023
@danny0405 (Contributor) left a comment:
+1, I'm fine with the change, let's see the test outputs.


@danny0405 danny0405 merged commit dbc0b43 into apache:master Jun 26, 2023