[HUDI-6326] Flink write supports consistent bucket index #8896
Conversation
Force-pushed from 6432438 to e7680ee.
Thanks for the contribution @beyond1920, I have reviewed and attached a patch:

```java
record.unseal();
record.setCurrentLocation(nodeToRecordLocation.computeIfAbsent(node,
    n -> new HoodieRecordLocation("U", FSUtils.createNewFileId(n.getFileIdPrefix(), 0))));
record.seal();
```
Can you elaborate a little more on why the writing bucket is always UPDATE (`U` stands for an update bucket), and what the constant `0` represents here?
> Can you elaborate a little more on why the writing bucket is always UPDATE (`U` stands for an update bucket)?

For the consistent bucket index, the file group each record should be written into is already decided by the `HoodieConsistentHashingMetadata` of the given partition.
Since we don't need to check whether the file group already exists, we can skip loading the data path from storage, which is time-consuming.
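The routing idea described above can be sketched as follows. This is a hypothetical illustration, not Hudi's real API: each bucket node of the consistent-hash ring owns a hash range and a fixed file-id prefix, so the target file group of a record key is decided from metadata alone, without listing files on storage.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of consistent-hash routing (class and method names are
// illustrative). The real logic lives in HoodieConsistentHashingMetadata.
public class ConsistentBucketSketch {
  // upper bound of a node's hash range -> file-id prefix of that bucket
  private final TreeMap<Integer, String> ring = new TreeMap<>();

  public ConsistentBucketSketch() {
    ring.put(0x3FFFFFFF, "bucket-00");
    ring.put(0x7FFFFFFF, "bucket-01");
  }

  /** Route a record key to the file-id prefix of the bucket that owns it. */
  public String fileIdPrefixFor(String recordKey) {
    int hash = recordKey.hashCode() & 0x7FFFFFFF; // non-negative hash
    // the first node whose range upper bound covers the hash owns the record
    SortedMap<Integer, String> tail = ring.tailMap(hash);
    return tail.get(tail.firstKey());
  }
}
```

Because the routing is a pure function of the hashing metadata, the same key always lands in the same bucket, which is why the writer can treat every bucket as an existing ("U") location.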
> what the constant `0` represents here

Nothing, in fact: for the consistent bucket index, each bucket can only map to one file group, so it would be fine to use `n.getFileIdPrefix()` directly.
The `0` is kept only to match the logic of the Spark adapter for the consistent bucket index.
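To make the point concrete, here is a hedged sketch of what the file-id construction amounts to. The assumption (illustrative only, not a claim about `FSUtils`' real implementation) is that the helper joins a file-id prefix with a numeric suffix; since a consistent-hash bucket maps to exactly one file group, the suffix `0` carries no extra information.

```java
// Hedged sketch of file-id construction. The assumed format (prefix + "-" + id)
// is for illustration only; the real helper is FSUtils.createNewFileId in Hudi.
public class FileIdSketch {
  static String createNewFileId(String fileIdPrefix, int id) {
    return fileIdPrefix + "-" + id; // suffix is always 0 for consistent buckets
  }
}
```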
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/sink/ITTestDataStreamWrite.java (outdated; resolved)
Force-pushed from e7680ee to 280e58f.
Force-pushed from 29d4900 to 1f2bf75.
…etadata for a new partition
Force-pushed from 1f2bf75 to 2cf8f74.
@hudi-bot run azure
Force-pushed from 2cf8f74 to 019919f.
```java
 * - part2/part3       -> 0000-0000-0000-00part2part3
 * - part1/part2/part3 -> 0000-0000-0par-t1part2part3
 *
 * @VisibleForTesting
```
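Reading the two javadoc examples, the mapping appears to strip the slashes from the partition path and right-align the remainder into a fixed-width zero template whose dashes sit at fixed positions. A hedged sketch that reproduces both examples (the method name `toFixedId` is hypothetical):

```java
// Hedged reconstruction of the padding scheme suggested by the javadoc
// examples: drop slashes, then fill a fixed 27-char zero template from the
// right, keeping the dashes at positions 4, 9, and 14 intact.
public class PartitionIdSketch {
  static String toFixedId(String partitionPath) {
    String raw = partitionPath.replace("/", "");
    char[] template = "0000-0000-0000-000000000000".toCharArray(); // 27 chars
    int t = template.length - 1;
    for (int r = raw.length() - 1; r >= 0; r--) {
      if (template[t] == '-') {
        t--; // never overwrite a dash position
      }
      template[t--] = raw.charAt(r);
    }
    return new String(template);
  }
}
```

For example, `toFixedId("part1/part2/part3")` yields `0000-0000-0par-t1part2part3`, matching the second javadoc example.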
Is the method visible?
This PR is replaced by #9012; closing this one.
Change Logs
This PR is the first subtask of HUDI-4373.
It focuses on 2 things:
It does not cover (these will be done in the following subtasks):
P.S.: This work is a follow-up of PR #6737. Thanks for the contribution @YuweiXiao
Impact
NA
Risk level (write none, low, medium, or high below)
NA
Documentation Update
All documentation updates will be introduced in the final subtask of HUDI-4373.
Contributor's checklist