fix: Skip pre-compaction rollback metadata reads in getValidInstantTimestamps#18544

Open
yihua wants to merge 2 commits into
apache:masterfrom
yihua:fix-skip-pre-compaction-rollbacks

@yihua
Contributor

@yihua yihua commented Apr 22, 2026

Describe the issue this Pull Request addresses

Summary and Changelog

Impact

Risk Level

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids unnecessary reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime), since rolled-back log blocks before the latest MDT compaction are already merged into base files. The fallback to SOLO_COMMIT_TIMESTAMP correctly preserves the original behavior when no MDT compaction exists. No correctness issues found. The inline comments contain a few style/readability suggestions, including one truncated comment in the test; the rest of the change looks clean. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here.

cc @yihua
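
To make the clamping described in the review summary concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the class name `RollbackThresholdSketch`, both method names, the literal value used for `SOLO_COMMIT_TIMESTAMP`, and the strictly-greater-than comparison in the filter are assumptions made for illustration. It relies only on the fact that instant times of this shape order correctly as plain strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Illustrative sketch only; all names below are hypothetical, not Hudi APIs.
final class RollbackThresholdSketch {
  // Stand-in for Hudi's SOLO_COMMIT_TIMESTAMP; the literal value is an assumption.
  static final String SOLO_COMMIT_TIMESTAMP = "00000000000000";

  // Returns max(earliestInstantTime, latestMdtCompactionTime), falling back to
  // SOLO_COMMIT_TIMESTAMP for whichever side is missing.
  static String rollbackThreshold(Set<String> validInstantTimestamps,
                                  Optional<String> latestMdtCompactionTime) {
    String earliestInstantTime = validInstantTimestamps.isEmpty()
        ? SOLO_COMMIT_TIMESTAMP
        : Collections.min(validInstantTimestamps);
    String compactionTime = latestMdtCompactionTime.orElse(SOLO_COMMIT_TIMESTAMP);
    return earliestInstantTime.compareTo(compactionTime) >= 0
        ? earliestInstantTime
        : compactionTime;
  }

  // Keeps only rollback instants strictly after the threshold (assumed comparison);
  // older rollbacks are skipped, so their metadata would never be read.
  static List<String> rollbacksToRead(List<String> rollbackInstantTimes, String threshold) {
    return rollbackInstantTimes.stream()
        .filter(t -> t.compareTo(threshold) > 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Set<String> valid = new TreeSet<>(Arrays.asList("20260101010101000", "20260101030101000"));
    String threshold = rollbackThreshold(valid, Optional.of("20260101020101000"));
    System.out.println(threshold); // 20260101020101000
    System.out.println(rollbacksToRead(
        Arrays.asList("20260101011500000", "20260101023000000"), threshold)); // [20260101023000000]
  }
}
```

With no MDT compaction, the second argument is empty and the threshold collapses to the earliest valid instant, which matches the "preserves the original behavior" point in the summary.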

@github-actions github-actions Bot added the size:M PR with lines of changes in (100, 300] label Apr 23, 2026
@yihua yihua force-pushed the fix-skip-pre-compaction-rollbacks branch from e32a618 to 67eac7a Compare May 9, 2026 02:16
@hudi-bot
Collaborator

hudi-bot commented May 9, 2026

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Contributor

@hudi-agent hudi-agent left a comment

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR avoids unnecessary sequential reads of pre-MDT-compaction rollback metadata in getValidInstantTimestamps by clamping the rollback filter threshold to max(earliestInstantTime, latestMdtCompactionTime). The fallback to SOLO_COMMIT_TIMESTAMP when no MDT compaction exists correctly preserves the original behavior. One inline comment asks about consistency with the existing getLatestCompactionTime() helper. There are also a couple of small readability issues in the new test: one truncated comment and one variable name that implies the wrong shape of data. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here.

// reads for old rollback instants that can cause long latency during metadata table reading.
final String earliestInstantTime = validInstantTimestamps.isEmpty() ? SOLO_COMMIT_TIMESTAMP : Collections.min(validInstantTimestamps);
final String latestMdtCompactionTime = metadataMetaClient.getActiveTimeline()
.getCommitTimeline()
Contributor

🤖 The existing HoodieBackedTableMetadata.getLatestCompactionTime() (line 808) uses getCommitAndReplaceTimeline(), which also includes REPLACE_COMMIT_ACTION / CLUSTERING_ACTION, while this new code uses only getCommitTimeline() (just COMMIT_ACTION). Was this intentional? It's safe today since MDT only emits compaction commits as COMMIT_ACTION, but the inconsistency is a small future-proofing risk if MDT ever gains clustering/replace semantics. Reusing or sharing the existing helper would also avoid the duplicated lookup logic.

- AI-generated; verify before applying. React 👍/👎 to flag quality.
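
The divergence this comment worries about can be illustrated with a toy model. Nothing below is Hudi code: `CompactionLookupSketch`, its `Instant` class, the action strings, and the two action sets are hypothetical stand-ins that follow the reviewer's description of the two timeline views (commit-only versus commit-plus-replace).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model only; names and action strings are assumptions, not Hudi APIs.
final class CompactionLookupSketch {
  static final class Instant {
    final String action;
    final String time;

    Instant(String action, String time) {
      this.action = action;
      this.time = time;
    }
  }

  // Models a commit-only view, as the reviewer describes getCommitTimeline().
  static final Set<String> COMMIT_ONLY = new HashSet<>(Arrays.asList("commit"));
  // Models a commit-plus-replace view, as described for getCommitAndReplaceTimeline().
  static final Set<String> COMMIT_AND_REPLACE =
      new HashSet<>(Arrays.asList("commit", "replacecommit"));

  // Latest instant time among the instants whose action is in the given set.
  static Optional<String> latestTime(List<Instant> instants, Set<String> actions) {
    return instants.stream()
        .filter(i -> actions.contains(i.action))
        .map(i -> i.time)
        .max(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    List<Instant> timeline = Arrays.asList(
        new Instant("deltacommit", "20260101010101000"),
        new Instant("commit", "20260101020101000"),
        new Instant("replacecommit", "20260101030101000"));
    // The two views agree until a replace-style instant shows up on the timeline.
    System.out.println(latestTime(timeline, COMMIT_ONLY));
    System.out.println(latestTime(timeline, COMMIT_AND_REPLACE));
  }
}
```

As long as the timeline holds only plain commits, both lookups return the same instant, which is why the comment calls the current code safe; a single shared helper would keep them from drifting apart later.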

mdtTestTable.addDeltaCommit("20260101020101000");

metaClient = HoodieTableMetaClient.reload(metaClient);
mdtMetaClient = HoodieTableMetaClient.reload(mdtMetaClient);
Contributor

🤖 nit: the comment is cut off mid-sentence: "rolled-back commits appear" seems to be missing the end of the thought. Could you complete it, e.g. "rolled-back commits appear in the valid timestamps"?

- AI-generated; verify before applying. React 👍/👎 to flag quality.

assertFalse(validTimestamps.contains(commit3), "commit3 should NOT be in valid timestamps (pre-compaction rollback skipped)");
}

private void addCompletedRollback(HoodieTestTable testTable, String rollbackTime, String rolledBackCommit) throws Exception {
Contributor

🤖 nit: emptyPartitionFiles reads as though the map is empty, but it actually contains a partition1 entry. Something like partitionFiles or partitionToFiles might be less surprising.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.30%. Comparing base (47bf4e4) to head (67eac7a).

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18544      +/-   ##
============================================
- Coverage     68.14%   67.30%   -0.84%     
+ Complexity    29077    28665     -412     
============================================
  Files          2522     2522              
  Lines        141177   141185       +8     
  Branches      17514    17515       +1     
============================================
- Hits          96208    95028    -1180     
- Misses        37061    38148    +1087     
- Partials       7908     8009     +101     
Flag Coverage Δ
common-and-other-modules 44.42% <100.00%> (+<0.01%) ⬆️
hadoop-mr-java-client 45.01% <100.00%> (+<0.01%) ⬆️
spark-client-hadoop-common 21.36% <0.00%> (-26.99%) ⬇️
spark-java-tests 49.00% <100.00%> (+<0.01%) ⬆️
spark-scala-tests 44.91% <100.00%> (+<0.01%) ⬆️
utilities 37.65% <100.00%> (+0.01%) ⬆️

Files with missing lines Coverage Δ
.../apache/hudi/metadata/HoodieTableMetadataUtil.java 82.08% <100.00%> (-0.28%) ⬇️

... and 165 files with indirect coverage changes

@yihua yihua added this to the release-1.2.0 milestone May 13, 2026

Labels

size:M PR with lines of changes in (100, 300]

4 participants