feat(flink): Support data skipping based on column stats for source V2 #18706
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR wires ColumnStatsProbe from HoodieTableSource through HoodieScanContext into the Source V2 FileIndex, enabling existing column-stats-based data skipping on the V2 path. The change is small and the test updates exercise both the partition stats and column stats pruning paths. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
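To make the wiring summarized in the review comment above concrete, here is a minimal sketch of how a pushed-down probe can travel from a scan context into a file index and prune files by min/max column statistics. Every class and method name in it (`FileColumnStats`, `StatsProbe`, `ScanContext`, `FileIndexSketch`, `buildFileIndex`, `sampleStats`) is a hypothetical, simplified stand-in chosen for illustration. None of it is Hudi's actual API, and the real `ColumnStatsProbe` and `FileIndex` are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not Hudi's real classes.
final class ColumnStatsSkippingSketch {

  /** Min/max statistics of a single column for one data file. */
  static final class FileColumnStats {
    final String file;
    final long min;
    final long max;

    FileColumnStats(String file, long min, long max) {
      this.file = file;
      this.min = min;
      this.max = max;
    }
  }

  /** Stand-in for a pushed-down probe that models the filter `col > bound`. */
  static final class StatsProbe {
    final long bound;

    StatsProbe(long bound) {
      this.bound = bound;
    }

    /** A file can hold rows with col > bound only if its max exceeds bound. */
    boolean mayMatch(FileColumnStats stats) {
      return stats.max > bound;
    }
  }

  /** Stand-in for the scan context that carries the probe from the table source. */
  static final class ScanContext {
    final StatsProbe probe; // null when no filter was pushed down

    ScanContext(StatsProbe probe) {
      this.probe = probe;
    }
  }

  /** Stand-in for the file index, which applies the probe while listing files. */
  static final class FileIndexSketch {
    private final List<FileColumnStats> allStats;
    private final StatsProbe probe;

    FileIndexSketch(List<FileColumnStats> allStats, StatsProbe probe) {
      this.allStats = allStats;
      this.probe = probe;
    }

    List<String> candidateFiles() {
      List<String> kept = new ArrayList<>();
      for (FileColumnStats stats : allStats) {
        // Without a probe nothing can be skipped, so every file is kept.
        if (probe == null || probe.mayMatch(stats)) {
          kept.add(stats.file);
        }
      }
      return kept;
    }
  }

  /** The wiring step: hand the probe from the scan context to the file index. */
  static FileIndexSketch buildFileIndex(List<FileColumnStats> allStats, ScanContext context) {
    return new FileIndexSketch(allStats, context.probe);
  }

  /** Three sample files with made-up value ranges. */
  static List<FileColumnStats> sampleStats() {
    return Arrays.asList(
        new FileColumnStats("f1.parquet", 1, 10),
        new FileColumnStats("f2.parquet", 5, 20),
        new FileColumnStats("f3.parquet", 30, 40));
  }
}
```

In this sketch, building the index with `new ScanContext(new StatsProbe(10))` keeps only `f2.parquet` and `f3.parquet`, because the maximum of `f1.parquet` does not exceed 10. Building it with a `null` probe keeps all three files, which mirrors the situation the PR describes: when the probe never reaches the file index, no file can be skipped.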
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18706      +/-   ##
============================================
- Coverage     68.14%   68.08%   -0.07%
+ Complexity    29077    29040      -37
============================================
  Files          2522     2522
  Lines        141177   141179       +2
  Branches      17514    17514
============================================
- Hits          96208    96116      -92
- Misses        37061    37147      +86
- Partials       7908     7916       +8
Describe the issue this Pull Request addresses
Flink Source V2 did not propagate the pushed-down ColumnStatsProbe into its FileIndex, so column-stats-based data skipping was not applied on the Source V2 path even when filter pushdown and data skipping were enabled.
This PR wires the existing column stats pruning context through Source V2 so it can use the same FileStatsIndex pruning path already available to the file index. Fixes #18703.
Summary and Changelog
- Add ColumnStatsProbe to HoodieScanContext.
- Pass columnStatsProbe from HoodieTableSource into the Source V2 scan context.
- Pass ColumnStatsProbe into FileIndex from HoodieSource.buildFileIndex().
- Extend TestHoodieSource coverage to validate Source V2 split pruning with partition stats and column stats paths.
Impact
Risk Level
low
Documentation Update
Contributor's checklist