[HUDI-6329] Adjust the partitioner automatically for flink consistent hashing index #9087

Merged: 4 commits, Jul 5, 2023


beyond1920
Contributor

Change Logs

This PR is the third subtask of HUDI-4373.
It handles the resizing cases in the write pipelines: the pipeline detects whether clustering has occurred and, if so, automatically adjusts the partitioner and the write function.

PS: This work is a follow-up to #6737. Thanks for the contribution, @YuweiXiao.

Impact

NA

Risk level (write none, low medium or high below)

NA

Documentation Update

All documentation updates will be introduced in the final subtask of HUDI-4373.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

…uld detect whether clustering service occurs and automatically adjust the partitioner and write function if clustering service happens.
HoodieFlinkTable table = writeClient.getHoodieTable();
Option<HoodieInstant> latestPendingReplaceInstant = table.getActiveTimeline().filterPendingReplaceTimeline().lastInstant();
if (latestPendingReplaceInstant.isPresent() && latestPendingReplaceInstant.get().getTimestamp().compareTo(lastRefreshInstant) > 0) {
LOG.info("Found new pending replacement commit. Last pending replacement commit is {}.", latestPendingReplaceInstant);
Contributor

To make sure this is a CLUSTERING instant, you may also need to check the plan, because clustering and insert overwrite share the same instant type: REPLACE_COMMIT. There is a util method that can do this: ClusteringUtil#isClusteringInstant

Contributor Author

@beyond1920 beyond1920 Jul 2, 2023

+1. I can add a check here, using ClusteringUtils#getPendingClusteringInstantTimes to select only the pending clustering instants.
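
For readers following the thread, here is a minimal sketch of the check being discussed: among pending replace instants, keep only those that are clustering and are newer than the last refresh. The classes `PendingReplace` and `ClusteringDetector` are hypothetical stand-ins written for illustration and are not Hudi APIs; in Hudi the clustering-versus-insert-overwrite distinction would come from inspecting the plan, as the reviewer notes.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a pending REPLACE_COMMIT instant; NOT a Hudi class.
final class PendingReplace {
    enum Kind { CLUSTERING, INSERT_OVERWRITE }

    final String timestamp; // instant time; compared lexicographically, as in the PR snippet
    final Kind kind;        // assumption: in Hudi this is derived from the replace plan

    PendingReplace(String timestamp, Kind kind) {
        this.timestamp = timestamp;
        this.kind = kind;
    }
}

// Hypothetical helper; NOT a Hudi class.
final class ClusteringDetector {
    // Returns the newest pending clustering instant strictly later than lastRefreshInstant, if any.
    static Optional<String> newPendingClustering(List<PendingReplace> pending, String lastRefreshInstant) {
        return pending.stream()
            .filter(p -> p.kind == PendingReplace.Kind.CLUSTERING) // ignore insert overwrite
            .map(p -> p.timestamp)
            .filter(ts -> ts.compareTo(lastRefreshInstant) > 0)    // only instants not yet seen
            .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<PendingReplace> pending = Arrays.asList(
            new PendingReplace("20230702100000", PendingReplace.Kind.CLUSTERING),
            new PendingReplace("20230702110000", PendingReplace.Kind.INSERT_OVERWRITE));
        // The later insert-overwrite instant is skipped; only the clustering one qualifies.
        System.out.println(newPendingClustering(pending, "20230701000000"));
    }
}
```

Without the kind filter, the later insert-overwrite instant in this example would be mistaken for a clustering request, which is the case the review comment is guarding against.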

… bucket index

2. Check replacement is clustering
@danny0405 danny0405 changed the title [HUDI-6329] Write pipelines for table with consistent bucket index would detect whether clustering service occurs and automatically adjust the partitioner and write function if clustering service happens. [HUDI-6329] Aadjust the partitioner automatically for flink consistent hashing index Jul 3, 2023
@danny0405 danny0405 self-assigned this Jul 3, 2023
@danny0405 danny0405 added the flink Issues related to flink label Jul 3, 2023
@danny0405
Contributor

6329.patch.zip
Thanks for the contribution. I have reviewed it and applied a patch.

private List<String> indexKeyFields;
private Map<String, Pair<String, ConsistentBucketIdentifier>> partitionToIdentifier;
private String lastRefreshInstant = HoodieTimeline.INIT_INSTANT_TS;

Contributor

I don't see lastRefreshInstant used anywhere.

Contributor Author

Sorry for missing that. It is used to check whether there is a new pending clustering request.
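
One way to picture the role of lastRefreshInstant is as a watermark that decides when cached per-partition state must be rebuilt. The sketch below is a guess at that pattern, not the PR's implementation: `BucketLayoutCache` is a hypothetical class, the cached value is simplified to a String, and the initial value "00000000000000" is an assumed stand-in for HoodieTimeline.INIT_INSTANT_TS.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of watermark-driven cache invalidation; NOT the Hudi implementation.
final class BucketLayoutCache {
    // Assumption: stands in for HoodieTimeline.INIT_INSTANT_TS.
    static final String INIT_INSTANT = "00000000000000";

    private String lastRefreshInstant = INIT_INSTANT;
    // Partition path -> cached bucket layout (a plain String here for brevity).
    private final Map<String, String> partitionToLayout = new HashMap<>();

    void cache(String partition, String layout) {
        partitionToLayout.put(partition, layout);
    }

    boolean isCached(String partition) {
        return partitionToLayout.containsKey(partition);
    }

    String lastRefreshInstant() {
        return lastRefreshInstant;
    }

    // Clears cached layouts and advances the watermark when a newer pending
    // clustering instant is observed. Returns true if a refresh happened.
    boolean refreshIfNeeded(Optional<String> latestPendingClustering) {
        if (latestPendingClustering.isPresent()
                && latestPendingClustering.get().compareTo(lastRefreshInstant) > 0) {
            partitionToLayout.clear();
            lastRefreshInstant = latestPendingClustering.get();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BucketLayoutCache cache = new BucketLayoutCache();
        cache.cache("dt=2023-07-02", "layout-v1");
        // A new pending clustering instant invalidates the cached layout.
        System.out.println(cache.refreshIfNeeded(Optional.of("20230702100000")));
    }
}
```

Advancing the watermark after each refresh means the same pending instant does not trigger repeated reloads on subsequent checks.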

@hudi-bot

hudi-bot commented Jul 4, 2023

CI report:

@danny0405 danny0405 changed the title [HUDI-6329] Aadjust the partitioner automatically for flink consistent hashing index [HUDI-6329] Adjust the partitioner automatically for flink consistent hashing index Jul 4, 2023
Contributor

@danny0405 danny0405 left a comment

+1, fine with the change. It may need more testing in production.

@danny0405 danny0405 merged commit e8b1ddd into apache:master Jul 5, 2023