perf: optimize rollback validation by checking lazy rollback policy before clustering validation #17537

Merged
nsivabalan merged 2 commits into apache:master from suryaprasanna:rollback-improvements
Dec 30, 2025

@suryaprasanna
Contributor

Describe the issue this Pull Request addresses

This PR optimizes rollback validation by reordering conditional checks to avoid unnecessary file I/O. Currently, determining whether an instant is a clustering instant requires opening and reading its .requested file. By checking first whether lazy rollback is enabled, we can skip the expensive clustering-instant check entirely in lazy rollback scenarios.

Summary and Changelog

Improves rollback performance by reducing unnecessary file reads during validation.

Changes:

  • Moved lazy rollback policy check (config.getFailedWritesCleanPolicy().isEager()) earlier in the validation flow
  • Restructured validateRollbackCommitSequence() to only execute when needed (eager clean policy on non-metadata tables)
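
The reordering described above can be sketched as follows. This is a minimal, self-contained illustration and not the actual Hudi code: the `CleanPolicy` enum, the `requestedFileReads` counter, the `isClusteringInstant` stub, and the method signature are all hypothetical stand-ins. It only shows the idea that the cheap in-memory checks (eager policy, non-metadata table) run before the step that would read a .requested file.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RollbackCheckOrder {
    /** Hypothetical stand-in for the failed-writes clean policy; not Hudi's real type. */
    enum CleanPolicy {
        EAGER, LAZY;
        boolean isEager() { return this == EAGER; }
    }

    /** Counts simulated reads of .requested files. */
    static final AtomicInteger requestedFileReads = new AtomicInteger();

    /** Simulates the expensive clustering check, which has to read the .requested file. */
    static boolean isClusteringInstant(String instantTime) {
        requestedFileReads.incrementAndGet();
        return false; // placeholder result; a real check would parse the file
    }

    /**
     * Runs the sequence validation only for an eager policy on a non-metadata table.
     * Returns true if the validation (and therefore the simulated file read) ran.
     */
    static boolean validateRollbackCommitSequence(CleanPolicy policy,
                                                  boolean isMetadataTable,
                                                  String instantTime) {
        if (!policy.isEager() || isMetadataTable) {
            return false; // cheap in-memory checks short-circuit before any I/O
        }
        isClusteringInstant(instantTime); // expensive step, reached only when needed
        return true;
    }

    public static void main(String[] args) {
        validateRollbackCommitSequence(CleanPolicy.LAZY, false, "001");
        System.out.println("reads after lazy: " + requestedFileReads.get());  // 0
        validateRollbackCommitSequence(CleanPolicy.EAGER, false, "001");
        System.out.println("reads after eager: " + requestedFileReads.get()); // 1
    }
}
```

In this sketch the lazy-policy call returns without touching the simulated file read, while the eager-policy call on a non-metadata table performs it once.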

Impact

Minor performance improvement for rollback operations when lazy rollback is enabled. No API changes or breaking changes.

Risk Level

none - This is a code restructuring that maintains the same validation logic but changes the order of execution to optimize performance. The validation behavior remains unchanged.

Documentation Update

None. This is an internal optimization with no user-facing changes or new configurations.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

…g and execute validateRollbackSequence method.

Determining whether an instant is a clustering instant requires opening the .requested file. By performing the eager-policy check first, we can skip reading .requested files for replacecommit instants.
@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Dec 9, 2025
@hudi-bot
Collaborator

CI report:

@yihua yihua self-assigned this Dec 10, 2025
@nsivabalan nsivabalan self-assigned this Dec 29, 2025
@nsivabalan nsivabalan merged commit 23df870 into apache:master Dec 30, 2025
70 checks passed
PavithranRick pushed a commit to PavithranRick/hudi that referenced this pull request Jan 8, 2026

Labels

size:S PR with lines of changes in (10, 100]

4 participants