fix: Skip pre-compaction rollback metadata reads in getValidInstantTimestamps#18544
fix: Skip pre-compaction rollback metadata reads in getValidInstantTimestamps#18544yihua wants to merge 2 commits into
Conversation
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR avoids unnecessary reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime), since rolled-back log blocks before the latest MDT compaction are already merged into base files. The fallback to SOLO_COMMIT_TIMESTAMP correctly preserves original behavior when no MDT compaction exists. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One truncated comment in the test — the rest of the change looks clean.
cc @yihua
e32a618 to
67eac7a
Compare
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR avoids unnecessary sequential reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime). The fallback to SOLO_COMMIT_TIMESTAMP when no MDT compaction exists correctly preserves the original behavior. One question on the inline comment about consistency with the existing getLatestCompactionTime() helper. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small readability issues in the new test — one truncated comment and one variable name that implies the wrong shape of data.
| // reads for old rollback instants that can cause long latency during metadata table reading. | ||
| final String earliestInstantTime = validInstantTimestamps.isEmpty() ? SOLO_COMMIT_TIMESTAMP : Collections.min(validInstantTimestamps); | ||
| final String latestMdtCompactionTime = metadataMetaClient.getActiveTimeline() | ||
| .getCommitTimeline() |
There was a problem hiding this comment.
🤖 The existing HoodieBackedTableMetadata.getLatestCompactionTime() (line 808) uses getCommitAndReplaceTimeline() which also includes REPLACE_COMMIT_ACTION / CLUSTERING_ACTION, while this new code uses only getCommitTimeline() (just COMMIT_ACTION). Was this intentional? It's safe today since MDT only emits compaction commits as COMMIT_ACTION, but the inconsistency is a small future-proofing risk if MDT ever gains clustering/replace semantics — and reusing/sharing the existing helper would also avoid the duplicated lookup logic.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| mdtTestTable.addDeltaCommit("20260101020101000"); | ||
|
|
||
| metaClient = HoodieTableMetaClient.reload(metaClient); | ||
| mdtMetaClient = HoodieTableMetaClient.reload(mdtMetaClient); |
There was a problem hiding this comment.
🤖 nit: the comment is cut off mid-sentence — "rolled-back commits appear" appears to be missing the end of the thought. Could you complete it, e.g. "rolled-back commits appear in the valid timestamps"?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| assertFalse(validTimestamps.contains(commit3), "commit3 should NOT be in valid timestamps (pre-compaction rollback skipped)"); | ||
| } | ||
|
|
||
| private void addCompletedRollback(HoodieTestTable testTable, String rollbackTime, String rolledBackCommit) throws Exception { |
There was a problem hiding this comment.
🤖 nit: emptyPartitionFiles reads as though the map is empty, but it actually contains a partition1 entry. Something like partitionFiles or partitionToFiles might be less surprising.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18544 +/- ##
============================================
- Coverage 68.14% 67.30% -0.84%
+ Complexity 29077 28665 -412
============================================
Files 2522 2522
Lines 141177 141185 +8
Branches 17514 17515 +1
============================================
- Hits 96208 95028 -1180
- Misses 37061 38148 +1087
- Partials 7908 8009 +101
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
Describe the issue this Pull Request addresses
Summary and Changelog
Impact
Risk Level
Documentation Update
Contributor's checklist