Core: Coalesce consecutive position deletes into range inserts (Iceberg V2) #16052
Draft
Baunsgaard wants to merge 1 commit into apache:main
06105f9 to 8130781
e16d218 to fa10273
Add PositionDeleteRangeConsumer that coalesces runs of consecutive positions into a single delete(start, end) call, and use it from Deletes.toPositionIndex() so sorted position delete files are inserted into the bitmap as ranges instead of one position at a time.
fa10273 to 24545db
Add PositionDeleteRangeConsumer, a small stateless utility that walks a sequence of positions and dispatches consecutive runs as a single PositionDeleteIndex.delete(start, end) call instead of one delete(pos) per position. Sorted or partially sorted input yields maximal coalescing; unsorted input simply flushes more often with negligible overhead (one comparison per position).
This optimisation primarily benefits Iceberg V2 tables, where positional delete files are read row-by-row through Deletes.toPositionIndex(); V3 deletion vectors are already loaded as a serialised RoaringBitmap and bypass this path entirely.
- forEach(Iterable<Long>, ...): the path wired into Deletes.toPositionIndex()
- forEach(long[], ...): for callers that already hold a primitive array
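For reviewers who want the coalescing idea in isolation, here is a minimal sketch of the technique described above. It is not the code in this PR: `RangeSink` is a hypothetical stand-in for the `delete(start, end)` side of PositionDeleteIndex, the half-open `[start, endExclusive)` convention is an assumption of the sketch, and `coalesce` is an illustrative helper that only records the emitted ranges.

```java
import java.util.ArrayList;
import java.util.List;

class RangeCoalescingSketch {

  /** Hypothetical sink; assumed here to take a half-open range [start, endExclusive). */
  interface RangeSink {
    void delete(long start, long endExclusive);
  }

  /** Primitive-array entry point: one comparison per position, one flush per run. */
  static void forEach(long[] positions, RangeSink sink) {
    if (positions.length == 0) {
      return;
    }
    long start = positions[0];
    long endExclusive = start + 1;
    for (int i = 1; i < positions.length; i++) {
      long pos = positions[i];
      if (pos == endExclusive) {
        // pos extends the current run by one
        endExclusive = pos + 1;
      } else {
        // gap, duplicate, or out-of-order position: flush and start a new run
        sink.delete(start, endExclusive);
        start = pos;
        endExclusive = pos + 1;
      }
    }
    sink.delete(start, endExclusive); // flush the final run
  }

  /** Iterable entry point with the same logic; boxed values are unboxed per element. */
  static void forEach(Iterable<Long> positions, RangeSink sink) {
    boolean open = false;
    long start = 0;
    long endExclusive = 0;
    for (long pos : positions) {
      if (open && pos == endExclusive) {
        endExclusive = pos + 1;
      } else {
        if (open) {
          sink.delete(start, endExclusive);
        }
        start = pos;
        endExclusive = pos + 1;
        open = true;
      }
    }
    if (open) {
      sink.delete(start, endExclusive);
    }
  }

  /** Illustrative helper: records each emitted range as "[start,endExclusive)". */
  static List<String> coalesce(long... positions) {
    List<String> out = new ArrayList<>();
    forEach(positions, (s, e) -> out.add("[" + s + "," + e + ")"));
    return out;
  }
}
```

With this sketch, `coalesce(1, 2, 3, 7, 8, 10)` records three ranges, `[1,4)`, `[7,9)` and `[10,11)`, in place of six single-position calls. Unsorted input such as `5, 3, 4` still produces correct ranges (`[5,6)` then `[3,5)`); it just flushes at every break in the sequence.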