fix: Fail metadata bootstrap early in presence of 0 byte file #18209

Open
suryaprasanna wants to merge 3 commits into apache:master from suryaprasanna:zero-byte-ri-bootstrap

@suryaprasanna
Contributor

Describe the issue this Pull Request addresses

This PR addresses an issue where metadata bootstrap can fail silently, or with unclear errors, when the table contains 0-byte base files. During metadata table rebootstrap, especially with the record-level index enabled, empty files can lead to corruption and confusing error messages. This change adds early validation that detects such files and reports them with clear context.

Summary and Changelog

Users will now get a clear, early failure message when metadata bootstrap encounters a 0-byte file, making debugging easier and preventing silent corruption.

Changes:

  • Added validation in HoodieMetadataPayload.java to check that files being added have positive size
  • Added assertion with descriptive error message indicating the specific 0-byte file
  • Added comprehensive test testRecordIndexRebootstrapWithZeroByteBaseFile in TestRecordLevelIndex.scala that:
    • Simulates corruption by replacing a base file with an empty file
    • Verifies that metadata rebootstrap with record index fails with appropriate exception
    • Confirms error message contains the corrupted file name for easy debugging
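
The size validation listed above might look roughly like the following. This is a minimal, self-contained sketch and not the actual change in HoodieMetadataPayload.java: the class name ZeroByteFileCheck, the method validateFilesAdded, the exception type, and the message wording are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ZeroByteFileCheck {

  // Hypothetical helper (not the Hudi API): rejects any added file whose
  // recorded size is not positive, naming the file and partition in the error.
  static void validateFilesAdded(String partition, Map<String, Long> filesAdded) {
    for (Map.Entry<String, Long> entry : filesAdded.entrySet()) {
      long size = entry.getValue();
      if (size <= 0) {
        throw new IllegalStateException(String.format(
            "Found file %s with non-positive size %d in partition %s",
            entry.getKey(), size, partition));
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Long> filesAdded = new LinkedHashMap<>();
    filesAdded.put("base-1.parquet", 1024L);
    filesAdded.put("base-2.parquet", 0L); // simulated 0-byte base file

    try {
      validateFilesAdded("2024/01/01", filesAdded);
    } catch (IllegalStateException e) {
      // The message carries the offending file name for debugging.
      System.out.println(e.getMessage());
    }
  }
}
```

Failing on the first offending entry keeps the check to one comparison per file, which matches the "single size check" cost described under Performance impact.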

Impact

User-facing impact: Users will receive clearer error messages when metadata operations encounter 0-byte files, making debugging significantly easier.

API changes: None

Performance impact: Minimal - adds a single size check during metadata bootstrap.

Risk Level

Low - This change only adds validation that fails fast on an already corrupt or invalid state. It does not change any successful code paths. The validation stops the bootstrap from proceeding with corrupt data that would otherwise fail later with unclear errors.

Verification:

  • Added unit test that specifically exercises the new validation path
  • Test confirms proper error message with file name is surfaced
  • Existing tests continue to pass
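
The scenario the new test exercises (replace a base file with an empty one, then expect a failure that names it) can be approximated outside Hudi with plain java.nio. The sketch below is an assumption-laden stand-in, not the code of testRecordIndexRebootstrapWithZeroByteBaseFile: the class ZeroByteFileSimulation, both helper methods, and the file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroByteFileSimulation {

  // Simulates corruption: overwrite the file so a 0-byte file remains at the same path.
  static void truncateToZeroBytes(Path file) throws IOException {
    Files.write(file, new byte[0],
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
  }

  // Hypothetical stand-in for the bootstrap-time check: fail on the first empty file.
  static void failOnZeroByteFiles(Path dir) throws IOException {
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
      for (Path file : files) {
        if (Files.size(file) == 0) {
          throw new IllegalStateException("Found 0-byte file: " + file.getFileName());
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("zero-byte-demo");
    Path baseFile = dir.resolve("base-file-1.parquet");
    Files.write(baseFile, new byte[] {1, 2, 3}); // placeholder content, not real Parquet

    truncateToZeroBytes(baseFile);
    try {
      failOnZeroByteFiles(dir);
    } catch (IllegalStateException e) {
      // The file name in the message is what the real test asserts on.
      System.out.println(e.getMessage());
    }
  }
}
```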

Documentation Update

None - this is an internal validation improvement that doesn't introduce new configs or user-facing features.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

github-actions bot added the size:S (PR with lines of changes in (10, 100]) label on Feb 16, 2026
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build
