feat: add Flink source reader function for cdc splits #18361

Merged
danny0405 merged 4 commits into apache:master from HuangZhenQiu:cdc-split-reader
Mar 27, 2026

@HuangZhenQiu HuangZhenQiu commented Mar 21, 2026

Describe the issue this Pull Request addresses

Support reading from CDC splits in Hudi Flink Source V2.

Close #18020

Summary and Changelog

  1. Add HoodieCdcSourceSplit
  2. Add HoodieCdcSplitReaderFunction to read from HoodieCdcSourceSplit with Hudi APIs
  3. Integrate CDC mode into HoodieTableSource for Flink Source V2
  4. Add test cases for the new classes
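
For readers unfamiliar with the split/reader-function pattern, the sketch below illustrates the general idea only: a split carries a batch of logged changes, and a reader function expands each change into changelog rows. Every type and method name here (`CdcSplitSketch`, `LoggedChange`, `CdcSplit`, `ChangeRow`, `read`) is hypothetical and invented for illustration; none of it is the actual `HoodieCdcSourceSplit` or `HoodieCdcSplitReaderFunction` code in this PR, and the before/after image model is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model for illustration; these are NOT Hudi or Flink classes.
final class CdcSplitSketch {

    /** Changelog row kinds emitted downstream. */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** One logged change for a record key; a null image means that image is absent. */
    static final class LoggedChange {
        final String key;
        final String before;
        final String after;

        LoggedChange(String key, String before, String after) {
            this.key = key;
            this.before = before;
            this.after = after;
        }
    }

    /** A split: the unit of work assigned to a single reader (assumed shape). */
    static final class CdcSplit {
        final String splitId;
        final List<LoggedChange> changes;

        CdcSplit(String splitId, List<LoggedChange> changes) {
            this.splitId = splitId;
            this.changes = changes;
        }
    }

    /** One emitted changelog row. */
    static final class ChangeRow {
        final Kind kind;
        final String key;
        final String value;

        ChangeRow(Kind kind, String key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "(" + key + "=" + value + ")";
        }
    }

    /** Reader function: expands each logged change in the split into changelog rows. */
    static List<ChangeRow> read(CdcSplit split) {
        List<ChangeRow> rows = new ArrayList<>();
        for (LoggedChange c : split.changes) {
            if (c.before == null && c.after != null) {
                rows.add(new ChangeRow(Kind.INSERT, c.key, c.after));
            } else if (c.before != null && c.after != null) {
                // An update is emitted as a retraction followed by the new image.
                rows.add(new ChangeRow(Kind.UPDATE_BEFORE, c.key, c.before));
                rows.add(new ChangeRow(Kind.UPDATE_AFTER, c.key, c.after));
            } else if (c.before != null) {
                rows.add(new ChangeRow(Kind.DELETE, c.key, c.before));
            }
            // Both images null: nothing to emit.
        }
        return rows;
    }

    public static void main(String[] args) {
        CdcSplit split = new CdcSplit("split-0", Arrays.asList(
                new LoggedChange("k1", null, "a"),
                new LoggedChange("k1", "a", "b"),
                new LoggedChange("k1", "b", null)));
        for (ChangeRow row : read(split)) {
            System.out.println(row);
        }
    }
}
```

Keeping the reader a pure function of the split, as in this sketch, makes it straightforward to unit-test the mapping without a running Flink job.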

Impact

none

Risk Level

none

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Mar 21, 2026
@danny0405 danny0405 left a comment

Looks good overall. cc @cshuo to take another look.

@HuangZhenQiu HuangZhenQiu force-pushed the cdc-split-reader branch 4 times, most recently from c026cc5 to 315776c Compare March 23, 2026 05:18
@cshuo cshuo left a comment

Thanks for the contribution. Left a comment.

@cshuo cshuo left a comment

Found several discrepancies between CdcInputFormat and HoodieCdcSplitReaderFunction.

@HuangZhenQiu HuangZhenQiu force-pushed the cdc-split-reader branch 2 times, most recently from aba869f to ddc24b5 Compare March 25, 2026 05:37
@HuangZhenQiu HuangZhenQiu force-pushed the cdc-split-reader branch 2 times, most recently from 6012c92 to e76ad65 Compare March 25, 2026 18:39
@HuangZhenQiu

@cshuo @danny0405 Thanks for the kind review.

cshuo commented Mar 26, 2026

+1

@codecov-commenter
Codecov Report

❌ Patch coverage is 15.36842% with 402 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.17%. Comparing base (69fa35b) to head (05b0641).

Files with missing lines | Patch % | Lines
.../reader/function/HoodieCdcSplitReaderFunction.java | 3.94% | 385 Missing and 5 partials ⚠️
.../java/org/apache/hudi/table/HoodieTableSource.java | 16.66% | 10 Missing ⚠️
...hudi/source/split/HoodieSourceSplitSerializer.java | 93.54% | 2 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18361      +/-   ##
============================================
- Coverage     68.37%   68.17%   -0.21%     
- Complexity    27573    27596      +23     
============================================
  Files          2433     2435       +2     
  Lines        133268   133743     +475     
  Branches      16034    16073      +39     
============================================
+ Hits          91122    91178      +56     
- Misses        35093    35500     +407     
- Partials       7053     7065      +12     

Flag | Coverage Δ
common-and-other-modules | 44.24% <15.36%> (-0.11%) ⬇️
hadoop-mr-java-client | 45.15% <ø> (-0.01%) ⬇️
spark-client-hadoop-common | 48.57% <ø> (ø)
spark-java-tests | 48.70% <ø> (-0.05%) ⬇️
spark-scala-tests | 45.38% <ø> (+<0.01%) ⬆️
utilities | 38.53% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...apache/hudi/source/split/HoodieCdcSourceSplit.java | 100.00% <100.00%> (ø)
.../hudi/source/split/HoodieContinuousSplitBatch.java | 97.36% <100.00%> (+12.75%) ⬆️
.../hudi/table/format/mor/MergeOnReadInputFormat.java | 90.62% <ø> (ø)
...e/hudi/table/format/mor/MergeOnReadTableState.java | 100.00% <100.00%> (ø)
...hudi/source/split/HoodieSourceSplitSerializer.java | 98.36% <93.54%> (-1.64%) ⬇️
.../java/org/apache/hudi/table/HoodieTableSource.java | 57.38% <16.66%> (-1.42%) ⬇️
.../reader/function/HoodieCdcSplitReaderFunction.java | 3.94% <3.94%> (ø)

... and 17 files with indirect coverage changes

@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@danny0405 danny0405 merged commit f15e1d0 into apache:master Mar 27, 2026
56 checks passed

Labels

size:XL PR with lines of changes > 1000

Development

Successfully merging this pull request may close these issues.

support Flink Hudi CDC reader function

6 participants