[MINOR] Fix NPE in getInputFileSlices when RO path filter returns empty partition#18639
[MINOR] Fix NPE in getInputFileSlices when RO path filter returns empty partition#18639prashantwason wants to merge 1 commit into
Conversation
…tion When hoodie.datasource.read.file.index.list.file.statuses.using.ro.path.filter is enabled on a COW (or READ_OPTIMIZED) table that contains a partition with zero base files, querying the table throws NullPointerException inside Collectors.uniqKeysMapAccumulator. generatePartitionFileSlicesPostROTablePathFilter built its result map by iterating over the file list, so a partition with no files produced no entry. The caller getInputFileSlices then did Collectors.toMap(identity, cache::get) where cache::get returned null for the missing partition, and Collectors.toMap rejects null values via Objects.requireNonNull. Pre-populate the result map with empty file-slice lists for every input partition before processing files. This restores the contract already honored by filterFiles (the non-RO path), which iterates over partitions and naturally returns an entry per partition. Added a regression test in BaseHoodieTableFileIndexTest covering an input that mixes a partition with files and partitions with none. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR fixes an NPE in getInputFileSlices when the RO path filter optimization is enabled and a partition has no base files, by pre-populating the result map with empty file-slice lists for every input partition. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One minor inconsistency in the test; the production change and its comment are clean.
cc @yihua
| @Test | ||
| public void testGeneratePartitionFileSlicesPostROTablePathFilterIncludesEmptyPartitions() throws Exception { | ||
| BaseHoodieTableFileIndex fileIndex = mock(BaseHoodieTableFileIndex.class, | ||
| org.mockito.Mockito.CALLS_REAL_METHODS); |
There was a problem hiding this comment.
🤖 nit: mock on the line above is already statically imported, so it might be worth adding import static org.mockito.Mockito.CALLS_REAL_METHODS; and just writing CALLS_REAL_METHODS here for consistency.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| // Without this, the caller's Collectors.toMap(identity, cache::get) NPEs on empty partitions | ||
| // because cache.get returns null and toMap rejects null values. This matches the contract | ||
| // already honored by filterFiles, which iterates over partitions rather than over files. | ||
| partitions.forEach(p -> partitionToFileSlices.put(p, new ArrayList<>())); |
There was a problem hiding this comment.
+1, can we use Collecitons.emptyList instead of new ArrayList<>() ?
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18639 +/- ##
=========================================
Coverage 68.06% 68.07%
+ Complexity 28919 28911 -8
=========================================
Files 2518 2518
Lines 140570 140571 +1
Branches 17416 17419 +3
=========================================
+ Hits 95680 95692 +12
+ Misses 37033 37022 -11
Partials 7857 7857
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Describe the issue this Pull Request addresses
Closes #18638
When
hoodie.datasource.read.file.index.list.file.statuses.using.ro.path.filter=trueis set on a COW (orREAD_OPTIMIZED) table that contains a partition with no base files,BaseHoodieTableFileIndex.getInputFileSlicesthrowsNullPointerExceptioninsideCollectors.uniqKeysMapAccumulatorduring Spark query planning.Summary and Changelog
generatePartitionFileSlicesPostROTablePathFilter(added in #18136) builds its result map by iterating over the file list, so a partition with zero files produces no entry. The callergetInputFileSlicesthen doesCollectors.toMap(identity, cache::get)—cache::getreturnsnullfor the missing partition, andCollectors.toMaprejects null values viaObjects.requireNonNull.This change pre-populates the result map with empty file-slice lists for every input partition before processing files, restoring the contract already honored by
filterFiles(the non-RO path), which iterates over partitions rather than over files.Changes:
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java: pre-populatepartitionToFileSlicesingeneratePartitionFileSlicesPostROTablePathFilterso empty partitions appear in the returned map.hudi-common/src/test/java/org/apache/hudi/BaseHoodieTableFileIndexTest.java: regression testtestGeneratePartitionFileSlicesPostROTablePathFilterIncludesEmptyPartitionsexercising a mix of partitions with and without files.Impact
Fixes NPE on queries against COW (and
READ_OPTIMIZED) tables that have empty partitions when the RO path filter listing optimization is enabled. No behavior change when no partitions are empty or when the optimization is disabled (default).Risk Level
low
The fix is a one-line additive pre-population of a
HashMapthat downstream code already keys into. It strengthens the existing contract of the method without changing the file-filtering logic or the non-RO code path.Documentation Update
none
Contributor's checklist