[HUDI-6326] Flink write supports consistent bucket index #8896

Closed
wants to merge 4 commits into from

beyond1920
Contributor

@beyond1920 beyond1920 commented Jun 7, 2023

Change Logs

This PR is the first subtask of HUDI-4373.
It focuses on two things:

  1. Refactor the consistent hashing bucket index code and extract the common utilities into the client common module.
  2. Support the consistent hashing bucket index in the Flink write path.

It does not cover the following, which will be done in later subtasks:

  1. Generating the resize plan.
  2. Handling the case where a resize happens during the write process.

PS: This work is a follow-up to #6737. Thanks for the contribution, @YuweiXiao.

Impact

NA

Risk level (write none, low medium or high below)

NA

Documentation Update

All documentation updates will be introduced in the final subtask of HUDI-4373.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@beyond1920 beyond1920 force-pushed the flink-chi-subtask1 branch 2 times, most recently from 6432438 to e7680ee Compare June 7, 2023 10:24
@danny0405
Contributor

Thanks for the contribution, @beyond1920. I have reviewed it and attached a patch:
6236.patch.zip

record.unseal();
record.setCurrentLocation(nodeToRecordLocation.computeIfAbsent(node,
n -> new HoodieRecordLocation("U", FSUtils.createNewFileId(n.getFileIdPrefix(), 0))));
record.seal();
Contributor

Can you elaborate a little more on why the writing bucket is always UPDATE (U stands for an update bucket), and what the constant 0 represents here?

Contributor Author

@beyond1920 beyond1920 Jun 12, 2023

Can you elaborate a little more on why the writing bucket is always UPDATE (U stands for an update bucket)

For the consistent bucket index, the file group that each record should be written to is already determined by the HoodieConsistentHashingMetadata of the given partition.
If we don't need to know whether the file group exists, we can skip loading the data paths from storage, which is time-consuming.
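
The lookup described above can be sketched roughly as follows. This is a minimal illustration of consistent hashing, not Hudi's implementation: the class `HashRing`, its methods, and the use of CRC32 as the hash function are all assumptions made for this example. The point is that once the ring of nodes for a partition is known, the target file group follows from the record key alone, without listing any data files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical stand-in for a partition's consistent hashing metadata.
public class HashRing {
    // Maps each node's upper hash bound to its file ID prefix.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long upperBound, String fileIdPrefix) {
        ring.put(upperBound, fileIdPrefix);
    }

    // Assumption: CRC32 is used here only for illustration.
    public static long hash(String recordKey) {
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Pick the first node whose bound is >= the hash; wrap around to the first node otherwise.
    public String lookupByHash(long h) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(h);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public String lookup(String recordKey) {
        return lookupByHash(hash(recordKey));
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode(1L << 30, "fg-a");
        ring.addNode(1L << 31, "fg-b");
        ring.addNode((1L << 32) - 1, "fg-c");
        // The same key always resolves to the same file ID prefix.
        System.out.println(ring.lookup("key-1"));
    }
}
```

No storage access appears anywhere in the lookup, which is why every record can be tagged with a known location up front.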

Contributor Author

@beyond1920 beyond1920 Jun 12, 2023

what the constant 0 represents here.

Nothing. In fact, for the consistent bucket index, each bucket maps to exactly one file group, so I think it's OK to use n.getFileIdPrefix() directly.
The 0 is kept here only to stay consistent with the logic in the Spark adapter for the consistent bucket index.
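
A minimal sketch of this point, assuming the file ID is formed as `<prefix>-<index>` (an assumption for this example; check FSUtils.createNewFileId for the actual format). `LocationCache`, `Location`, and `newFileId` are hypothetical names, not Hudi classes. Because each bucket maps to exactly one file group, the index never varies, so the location cached per node is always the same.

```java
import java.util.HashMap;
import java.util.Map;

public class LocationCache {
    // Hypothetical simplified stand-in for a record location (instant time plus file ID).
    public static final class Location {
        public final String instantTime;
        public final String fileId;

        Location(String instantTime, String fileId) {
            this.instantTime = instantTime;
            this.fileId = fileId;
        }
    }

    private final Map<String, Location> byPrefix = new HashMap<>();

    // Assumed format "<prefix>-<index>"; the real helper may differ.
    public static String newFileId(String prefix, int index) {
        return prefix + "-" + index;
    }

    // One location per bucket node, created lazily and then reused.
    public Location locationFor(String fileIdPrefix) {
        return byPrefix.computeIfAbsent(fileIdPrefix,
            p -> new Location("U", newFileId(p, 0)));
    }

    public static void main(String[] args) {
        LocationCache cache = new LocationCache();
        Location first = cache.locationFor("abc");
        Location second = cache.locationFor("abc");
        System.out.println(first.fileId);     // abc-0
        System.out.println(first == second);  // true
    }
}
```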

@danny0405 danny0405 self-assigned this Jun 8, 2023
@danny0405 danny0405 added the release-0.14.0, flink, and writer-core labels Jun 8, 2023
@beyond1920 beyond1920 force-pushed the flink-chi-subtask1 branch 2 times, most recently from 29d4900 to 1f2bf75 Compare June 13, 2023 01:20
@beyond1920
Contributor Author

@hudi-bot run azure

@hudi-bot

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

* - part2/part3 -> 0000-0000-0000-00part2part3
* - part1/part2/part3 -> 0000-0000-0par-t1part2part3
*
* @VisibleForTesting
Contributor

Is the method visible?

@beyond1920
Contributor Author

This PR is replaced by #9012; closing this one.

@beyond1920 beyond1920 closed this Jun 19, 2023