[HUDI-6326] Flink write supports consistent bucket index #8896

beyond1920 · 2023-06-07T09:40:10Z

Change Logs

This PR is the first subtask of HUDI-4373.
It focuses on two things:

- Refactor the consistent hashing bucket index code and extract the common utilities into the client common module.
- Support the consistent hashing bucket index in the Flink write path.

It does not cover the following, which will be done in later subtasks:

- Generating the resize plan.
- Handling a resize that happens during the write process.

PS: This work is a follow-up to PR #6737. Thanks for the contribution, @YuweiXiao.

Impact

NA

Risk level (write none, low medium or high below)

NA

Documentation Update

All documentation updates will be introduced in the final subtask of HUDI-4373.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

danny0405 · 2023-06-08T03:54:57Z

Thanks for the contribution, @beyond1920. I have reviewed it and attached a patch:
6236.patch.zip

danny0405 · 2023-06-08T03:56:43Z

...udi-flink/src/main/java/org/apache/hudi/sink/bucket/ConsistentBucketStreamWriteFunction.java

+    record.unseal();
+    record.setCurrentLocation(nodeToRecordLocation.computeIfAbsent(node,
+        n -> new HoodieRecordLocation("U", FSUtils.createNewFileId(n.getFileIdPrefix(), 0))));
+    record.seal();


Can you elaborate a little more on why the writing bucket is always UPDATE (U stands for an update bucket), and what the constant 0 represents here?

beyond1920 · 2023-06-12

> Can you elaborate a little more why the writing bucket is always UPDATE (U stands for an update bucket)

With the consistent bucket index, the file group that each record is written into is already decided by the HoodieConsistentHashingMetadata of the given partition.
Since we don't need to know whether the file group exists, we can skip loading the data paths from storage, which is time-consuming.

> what the constant 0 represents here.

Nothing. With the consistent bucket index, each bucket maps to exactly one file group, so I think using n.getFileIdPrefix() directly would be fine.
The 0 is kept only to stay consistent with the logic of the Spark adapter for the consistent bucket index.
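
The reply above can be illustrated with a minimal, self-contained sketch. It is not the Hudi implementation: the class `ConsistentBucketSketch`, its `Node` type, the CRC32-based `hash` helper, the `"U:"` string result, and the `prefix-0` file id layout are all assumptions made for illustration. It only shows the idea that the hashing metadata alone determines the target file group, so the writer can tag every record as an update without listing files on storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch, not Hudi code: a hash ring that maps a record to a file id.
class ConsistentBucketSketch {
  // Stand-in for a hashing node: it owns the ring range that ends at `value`.
  static final class Node {
    final int value;
    final String fileIdPrefix;

    Node(int value, String fileIdPrefix) {
      this.value = value;
      this.fileIdPrefix = fileIdPrefix;
    }
  }

  private final TreeMap<Integer, Node> ring = new TreeMap<>();
  // Cache of node -> file id, mirroring the computeIfAbsent call in the diff.
  private final Map<Node, String> nodeToFileId = new HashMap<>();

  void addNode(int value, String fileIdPrefix) {
    ring.put(value, new Node(value, fileIdPrefix));
  }

  // Assumed hash function for the sketch; the real index may hash differently.
  static int hash(String recordKey, int ringSize) {
    CRC32 crc = new CRC32();
    crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
    return (int) (crc.getValue() % ringSize);
  }

  // The first node at or after the hash value owns it; wrap around at the end.
  Node nodeFor(int hashValue) {
    Map.Entry<Integer, Node> entry = ring.ceilingEntry(hashValue);
    return entry != null ? entry.getValue() : ring.firstEntry().getValue();
  }

  // Always returns an "update" location; no storage lookup is involved.
  // The "-0" suffix is an assumed layout for prefix plus a constant index of 0.
  String locate(int hashValue) {
    Node node = nodeFor(hashValue);
    String fileId = nodeToFileId.computeIfAbsent(node, n -> n.fileIdPrefix + "-0");
    return "U:" + fileId;
  }

  public static void main(String[] args) {
    ConsistentBucketSketch sketch = new ConsistentBucketSketch();
    sketch.addNode(100, "aaa");
    sketch.addNode(200, "bbb");
    System.out.println(sketch.locate(hash("record-key-1", 256)));
  }
}
```

Because each node maps to one file id, the constant suffix carries no information in this sketch either, which matches the point made in the reply.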

beyond1920 · 2023-06-13T01:31:42Z

@hudi-bot run azure

hudi-bot · 2023-06-13T06:35:58Z

CI report:

019919f Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

danny0405 · 2023-06-14T04:48:42Z

hudi-common/src/main/java/org/apache/hudi/common/model/HoodieConsistentHashingMetadata.java

+   *  - part2/part3       -> 0000-0000-0000-00part2part3
+   *  - part1/part2/part3 -> 0000-0000-0par-t1part2part3
+   *
+   *  @VisibleForTesting


Is the method visible?
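
For reference, the two javadoc examples visible in the hunk above are consistent with one simple reading: drop the path separators, left-pad the result with zeros to 24 characters, and split it into groups of 4, 4, 4, and 12. The sketch below reproduces only those two examples. The class and method names are hypothetical, the hunk is a fragment, and the real method may add further groups or treat longer paths differently, so this sketch rejects longer inputs instead of guessing.

```java
// Hypothetical sketch, not Hudi code: reproduces the two javadoc examples in the hunk.
class PartitionPathIdSketch {
  // Group widths read off the visible examples (an assumption about the full format).
  private static final int[] GROUPS = {4, 4, 4, 12};
  private static final int WIDTH = 24;

  static String toPaddedId(String partitionPath) {
    // Remove path separators: "part2/part3" becomes "part2part3".
    String compact = partitionPath.replace("/", "");
    if (compact.length() > WIDTH) {
      // Behavior for long paths is not shown in the hunk, so it is not guessed here.
      throw new IllegalArgumentException("path too long for this sketch: " + partitionPath);
    }
    // Left-pad with zeros so the path characters end up right-aligned.
    StringBuilder padded = new StringBuilder();
    for (int i = compact.length(); i < WIDTH; i++) {
      padded.append('0');
    }
    padded.append(compact);
    // Re-insert dashes between the fixed-width groups.
    StringBuilder out = new StringBuilder();
    int pos = 0;
    for (int g = 0; g < GROUPS.length; g++) {
      if (g > 0) {
        out.append('-');
      }
      out.append(padded, pos, pos + GROUPS[g]);
      pos += GROUPS[g];
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(toPaddedId("part2/part3"));
    System.out.println(toPaddedId("part1/part2/part3"));
  }
}
```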

beyond1920 · 2023-06-19T12:46:57Z

This PR is superseded by #9012; closing this one.

beyond1920 force-pushed the flink-chi-subtask1 branch 2 times, most recently from 6432438 to e7680ee, June 7, 2023 10:24

danny0405 reviewed Jun 8, 2023

danny0405 self-assigned this Jun 8, 2023

danny0405 added the release-0.14.0, flink, and writer-core labels Jun 8, 2023

beyond1920 force-pushed the flink-chi-subtask1 branch from e7680ee to 280e58f, June 12, 2023 12:41

beyond1920 added 2 commits June 13, 2023 00:51

[HUDI-6326] Flink write supports consistent bucket index

6dd000f

Apply Danny's patch

376f784

beyond1920 force-pushed the flink-chi-subtask1 branch 2 times, most recently from 29d4900 to 1f2bf75, June 13, 2023 01:20

Avoid create different file id when multiple subtasks try to create metadata for a new partition

bb5cf50

beyond1920 force-pushed the flink-chi-subtask1 branch from 1f2bf75 to 2cf8f74, June 13, 2023 01:30

Add testcases

019919f

beyond1920 force-pushed the flink-chi-subtask1 branch from 2cf8f74 to 019919f, June 13, 2023 02:46

danny0405 reviewed Jun 14, 2023

beyond1920 closed this Jun 19, 2023
