feat: Adding support to block archival on last known ECTR for v6 tables #18380
Open
nsivabalan wants to merge 1 commit into apache:master from
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18380      +/-   ##
============================================
  Coverage     68.36%   68.36%
+ Complexity    27566    27554      -12
============================================
  Files          2432     2432
  Lines        133175   133204      +29
  Branches      16023    16029       +6
============================================
+ Hits          91047    91068      +21
- Misses        35068    35074       +6
- Partials       7060     7062       +2
Describe the issue this Pull Request addresses
This PR adds support to block archival based on the Earliest Commit To Retain (ECTR) from the last completed clean operation, preventing potential data leaks when cleaning configurations change between clean and archival runs.
Problem: Currently, archival recomputes ECTR independently based on cleaning configs at archival time, rather than reading it from the last clean plan. When cleaning configs change between clean and archival operations, archival may archive commits whose data files haven't been cleaned yet, leading to timeline metadata loss for existing data files.
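The intended blocking behavior can be pictured with a minimal sketch. It is not the code from this PR: the class name, method name, and the rule that only instants strictly earlier than the ECTR may be archived are assumptions made for illustration, and the instant times are shortened placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical sketch; not Hudi's actual archiver implementation. */
class EctrArchivalSketch {

  /**
   * Returns the subset of candidate instants that may be archived.
   * Instant times are assumed to be fixed-width timestamp strings, so
   * lexicographic order matches chronological order.
   */
  static List<String> instantsToArchive(List<String> candidates,
                                        Optional<String> ectrFromLastClean,
                                        boolean blockOnEctr) {
    // Feature disabled or no usable ECTR: archival behaves as before.
    if (!blockOnEctr || !ectrFromLastClean.isPresent()) {
      return candidates;
    }
    String ectr = ectrFromLastClean.get();
    // Assumption: only instants strictly before the ECTR have been cleaned,
    // so only those are safe to archive.
    return candidates.stream()
        .filter(instant -> instant.compareTo(ectr) < 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("001", "002", "003", "004", "005");
    // Blocking enabled: archival stops at the ECTR recorded by the last clean.
    System.out.println(instantsToArchive(candidates, Optional.of("003"), true));  // prints [001, 002]
    // Blocking disabled: all candidates chosen by the archival window are archived.
    System.out.println(instantsToArchive(candidates, Optional.of("003"), false)); // prints [001, 002, 003, 004, 005]
  }
}
```

Reading the ECTR from the last completed clean, rather than recomputing it from the current cleaning configs, means a config change between the two operations cannot move the archival boundary past data that has not been cleaned yet.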
Example scenario:
Summary and Changelog
User-facing summary: Users can now optionally enable archival blocking based on ECTR from the last clean to prevent archiving commits whose data files haven't been cleaned. This is useful when cleaning configurations may change over time or when strict data retention guarantees are needed.
Detailed changelog:
Configuration Changes:
Implementation Changes:
Test Changes:
a. testArchivalBlocksOnCleanECTRWhenEnabled - Core blocking functionality
b. testArchivalProceedsNormallyWhenECTRBlockingDisabled - Backward compatibility
c. testArchivalMakesProgressWhenECTRIsLaterThanArchivalWindow - Progress validation
d. testArchivalContinuesWhenCleanMetadataIsMissing - Missing metadata handling
e. testArchivalHandlesEmptyECTRInCleanMetadata - Empty ECTR handling
f. testArchivalProceedsWhenCleanHasFileVersionsPolicyWithNullECTR - FILE_VERSIONS policy compatibility
g. testArchivalBlocksOnCleanECTRWithTimelineArchiverV2AndVersion9 - Version 9 / LSM timeline compatibility
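Going by the test names above, archival appears to fall back to its normal behavior when no usable ECTR is available (missing clean metadata, an empty ECTR, or a FILE_VERSIONS clean with a null ECTR). The sketch below shows one way such a fallback could be expressed. `CleanInfo` and `resolveEctr` are hypothetical names, not classes or methods from this PR, and the policy strings are used only as labels.

```java
import java.util.Optional;

/** Hypothetical sketch of the fallback cases implied by the test names. */
class CleanEctrResolverSketch {

  /** Illustrative stand-in for the metadata of the last completed clean. */
  static final class CleanInfo {
    final String policy;                  // e.g. "KEEP_LATEST_COMMITS"
    final String earliestCommitToRetain;  // may be null or empty

    CleanInfo(String policy, String earliestCommitToRetain) {
      this.policy = policy;
      this.earliestCommitToRetain = earliestCommitToRetain;
    }
  }

  /**
   * Returns the ECTR to block on, or empty when archival should not be blocked.
   * A null argument models the "no completed clean found" case.
   */
  static Optional<String> resolveEctr(CleanInfo lastClean) {
    if (lastClean == null) {
      return Optional.empty(); // clean metadata is missing
    }
    String ectr = lastClean.earliestCommitToRetain;
    if (ectr == null || ectr.trim().isEmpty()) {
      return Optional.empty(); // null or empty ECTR, e.g. a file-versions clean
    }
    return Optional.of(ectr);
  }

  public static void main(String[] args) {
    System.out.println(resolveEctr(null).isPresent());                                              // prints false
    System.out.println(resolveEctr(new CleanInfo("KEEP_LATEST_FILE_VERSIONS", null)).isPresent()); // prints false
    System.out.println(resolveEctr(new CleanInfo("KEEP_LATEST_COMMITS", "003")).get());            // prints 003
  }
}
```

Treating every "no ECTR" case the same way keeps the feature from stalling archival on tables that have never been cleaned or that use a policy which does not record an ECTR.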
Impact
Public API Changes:
User-facing changes:
Performance impact:
Breaking changes: None - opt-in feature with no default behavior changes
Risk Level
low
Documentation Update
Config documentation:
The new config hoodie.archive.block.on.latest.clean.ectr is documented inline:
.withDocumentation("If enabled, archival will block on latest ECTR from last known clean")
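As a rough illustration of opting in, the key can be set like any other write property. The sketch below uses plain `java.util.Properties` rather than Hudi's config builders; the helper names are hypothetical, and the `false` default is an assumption based on the PR describing the feature as opt-in.

```java
import java.util.Properties;

/** Hypothetical helper; only the config key itself comes from this PR. */
class ArchivalConfigSketch {

  static final String BLOCK_ON_ECTR_KEY = "hoodie.archive.block.on.latest.clean.ectr";

  /** Returns a copy of the given properties with ECTR-based blocking enabled. */
  static Properties withEctrBlocking(Properties base) {
    Properties props = new Properties();
    props.putAll(base);
    props.setProperty(BLOCK_ON_ECTR_KEY, "true");
    return props;
  }

  /** Reads the flag, assuming it is disabled when the key is absent. */
  static boolean isEctrBlockingEnabled(Properties props) {
    return Boolean.parseBoolean(props.getProperty(BLOCK_ON_ECTR_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties base = new Properties();
    System.out.println(isEctrBlockingEnabled(base));                   // prints false
    System.out.println(isEctrBlockingEnabled(withEctrBlocking(base))); // prints true
  }
}
```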
Website documentation needed:
Contributor's checklist