feat: optimize BTree index read with io buffering and boundary skip #263
Merged
lxy-9602
commented
May 8, 2026
Pull request overview
This PR optimizes B-tree global index range-query performance by reducing per-entry overhead (bulk bitmap inserts, fewer key deserializations/comparisons) and improving I/O efficiency via optional buffered reads.
Changes:
- Add `RoaringBitmap64::AddMany` and use it in B-tree range scans to batch row-id insertion per data block.
- Optimize `BTreeGlobalIndexReader::RangeQuery` with boundary short-circuiting and a new `BlockIterator::SkipKeyAndReadValue` fast path.
- Improve `BufferedInputStream::Seek` with an in-buffer fast path and add `btree-index.read-buffer-size` to enable buffered reads.
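
To illustrate the in-buffer seek fast path mentioned in the last item, here is a minimal sketch. It is not the PR's `BufferedInputStream`: `ToyBufferedReader`, its members, and the in-memory "file" are hypothetical names invented for this example. It only assumes that the reader caches one contiguous window of bytes and that a seek landing inside that window can move the cursor without refilling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered reader over an in-memory "file".
class ToyBufferedReader {
public:
    ToyBufferedReader(std::string data, std::size_t buffer_size)
        : data_(std::move(data)), buffer_(buffer_size) {
        assert(buffer_size > 0);  // a zero-sized buffer could never make progress
    }

    // Move the logical read position to absolute offset `target`.
    void Seek(std::size_t target) {
        if (target >= window_start_ && target < window_start_ + window_len_) {
            // Fast path: the target byte is already cached, so only the
            // in-buffer cursor moves and no refill is needed.
            pos_ = target - window_start_;
            return;
        }
        // Slow path: drop the window; the next Read refills starting at `target`.
        window_start_ = target;
        window_len_ = 0;
        pos_ = 0;
    }

    // Read up to `n` bytes from the current position.
    std::string Read(std::size_t n) {
        std::string out;
        while (out.size() < n) {
            if (pos_ == window_len_ && !Refill()) break;  // end of data
            const std::size_t take = std::min(n - out.size(), window_len_ - pos_);
            out.append(buffer_.data() + pos_, take);
            pos_ += take;
        }
        return out;
    }

    // Number of times the window was reloaded from the underlying data.
    int refill_count() const { return refill_count_; }

private:
    // Load the next window, starting right after the current one.
    bool Refill() {
        const std::size_t next = window_start_ + window_len_;
        if (next >= data_.size()) return false;
        window_start_ = next;
        window_len_ = std::min(buffer_.size(), data_.size() - next);
        std::memcpy(buffer_.data(), data_.data() + next, window_len_);
        pos_ = 0;
        ++refill_count_;
        return true;
    }

    std::string data_;             // underlying bytes
    std::vector<char> buffer_;     // cached window storage
    std::size_t window_start_ = 0; // absolute offset of buffer_[0]
    std::size_t window_len_ = 0;   // valid bytes in buffer_
    std::size_t pos_ = 0;          // cursor inside the window
    int refill_count_ = 0;
};
```

With an 8-byte buffer over 16 bytes of data, a seek to offset 5 after an initial read stays inside the cached window and leaves `refill_count()` unchanged, while a seek to offset 12 forces a refill on the next read.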
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| third_party/roaring_bitmap/roaring.hh | Exposes an inner-bitmap “get-or-create” helper used for bulk insertion. |
| include/paimon/utils/roaring_bitmap64.h | Adds public AddMany API documentation/signature. |
| src/paimon/common/utils/roaring_bitmap64.cpp | Implements RoaringBitmap64::AddMany via bucketing + 32-bit addMany. |
| src/paimon/common/utils/roaring_bitmap64_test.cpp | Adds unit tests for AddMany. |
| src/paimon/common/sst/block_iterator.h | Adds SkipKeyAndReadValue API for fast iteration. |
| src/paimon/common/sst/block_iterator.cpp | Implements key-skipping value reads. |
| include/paimon/io/buffered_input_stream.h | (Context) Buffered input stream interface used by new option. |
| src/paimon/common/io/buffered_input_stream.cpp | Adds buffered-window seek fast path. |
| src/paimon/common/io/buffered_input_stream_test.cpp | Adds seek-path coverage (currently problematic due to private access). |
| src/paimon/common/global_index/btree/btree_defs.h | Documents and introduces btree-index.read-buffer-size option key. |
| src/paimon/common/global_index/btree/btree_global_indexer.cpp | Parses new read-buffer option and passes it to reader creation. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.h | Threads optional read-buffer-size into reader. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.cpp | Wraps index input stream with BufferedInputStream when configured; defers min/max key deserialize to reader factory. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader_test.cpp | Updates construction to pass read_buffer_size. |
| src/paimon/common/global_index/btree/btree_global_index_reader.h | Adds factory Create(...), caches serialized min/max key slices, and adjusts row-id deserialization API. |
| src/paimon/common/global_index/btree/btree_global_index_reader.cpp | Implements the new factory + optimized range-query paths using buffering, key-skip, and batched inserts. |
| src/paimon/common/global_index/btree/btree_global_index_integration_test.cpp | Expands integration test to validate behavior across multiple read-buffer-size configurations. |
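
The "bucketing + 32-bit addMany" approach listed for `roaring_bitmap64.cpp` can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: `ToyBitmap64` is a hypothetical class, and a `std::set<uint32_t>` stands in for the 32-bit inner bitmap. The idea shown is to group consecutive values that share the same high 32 bits, look up the inner container once per group, and insert the low halves in bulk.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for a 64-bit bitmap: one 32-bit "inner" container
// per distinct high half, keyed by that high half.
class ToyBitmap64 {
public:
    // Per-value insert: one container lookup for every value.
    void Add(uint64_t v) { inner_[High(v)].insert(Low(v)); }

    // Bulk insert: one container lookup per run of values sharing a high half.
    void AddMany(const std::vector<uint64_t>& values) {
        std::vector<uint32_t> lows;
        std::size_t i = 0;
        while (i < values.size()) {
            const uint32_t high = High(values[i]);
            lows.clear();
            // Collect the run of consecutive values with this high half.
            while (i < values.size() && High(values[i]) == high) {
                lows.push_back(Low(values[i]));
                ++i;
            }
            // Get-or-create the inner container once, then insert in bulk.
            std::set<uint32_t>& bucket = inner_[high];
            bucket.insert(lows.begin(), lows.end());
        }
    }

    bool Contains(uint64_t v) const {
        const auto it = inner_.find(High(v));
        return it != inner_.end() && it->second.count(Low(v)) > 0;
    }

    std::size_t Cardinality() const {
        std::size_t total = 0;
        for (const auto& kv : inner_) total += kv.second.size();
        return total;
    }

private:
    static uint32_t High(uint64_t v) { return static_cast<uint32_t>(v >> 32); }
    static uint32_t Low(uint64_t v) { return static_cast<uint32_t>(v); }

    std::map<uint32_t, std::set<uint32_t>> inner_;
};
```

Row ids read from one data block tend to be close together, so in a layout like this most of them would fall into the same bucket and the per-value map lookup is paid only once per run.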
lszskye
approved these changes
May 8, 2026
Purpose
Linked issue: #38
- `RoaringBitmap64::AddMany` for bulk insertion, replacing per-row `Add` in `RangeQuery`
- `RangeQuery` short-circuits lower/upper bound checks: when `from == min_key` skips all `from` comparisons, when `to == max_key` skips all `to` comparisons, and marks `passed_from_bound` after the first block so subsequent blocks skip lower-bound checks entirely
- `BlockIterator::SkipKeyAndReadValue` for fast-path iteration without key deserialization
- `BufferedInputStream::Seek` with in-buffer fast path (adjust `pos_` only when target is within cached window)
- `kBtreeIndexReadBufferSize` option to wrap `InputStream` with `BufferedInputStream` for range query I/O

Tests
RoaringBitmap64Test.TestAddMany
BufferedInputStreamTest.TestSeek
BTreeGlobalIndexIntegrationTest.WriteAndReadLargeDataWithSmallBlocks
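
For readers who want a concrete picture of the `RangeQuery` boundary short-circuit described under Purpose, the following is a rough sketch, not the PR's implementation. `RangeScan`, `Entry`, `Block`, and `ScanStats` are hypothetical names. It assumes inclusive bounds, keys sorted across blocks, and that `blocks` starts at the block located by seeking to `from`, so every block after the first holds only keys at or above `from`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, uint64_t>;  // (key, row id)
using Block = std::vector<Entry>;                // entries sorted by key

// Counts key comparisons so the effect of the short-circuit is observable.
struct ScanStats {
    std::size_t key_comparisons = 0;
};

// Collect row ids for keys in [from, to]. Assumes `blocks` begins at the
// block found by seeking to `from` (hypothetical precondition).
std::vector<uint64_t> RangeScan(const std::vector<Block>& blocks,
                                const std::string& from, const std::string& to,
                                const std::string& min_key,
                                const std::string& max_key, ScanStats* stats) {
    assert(stats != nullptr);
    // If the range starts at the index minimum, no key can fall below it.
    bool passed_from_bound = (from == min_key);
    // If the range ends at the index maximum, no key can exceed it.
    const bool check_to = (to != max_key);

    std::vector<uint64_t> row_ids;
    for (const Block& block : blocks) {
        for (const Entry& entry : block) {
            if (!passed_from_bound) {
                ++stats->key_comparisons;
                if (entry.first < from) continue;  // still below the range
            }
            if (check_to) {
                ++stats->key_comparisons;
                if (entry.first > to) return row_ids;  // past the range
            }
            row_ids.push_back(entry.second);
        }
        // After the first block, all remaining keys are >= from.
        passed_from_bound = true;
    }
    return row_ids;
}
```

When neither bound needs checking, the loop never looks at `entry.first` at all, which is the situation where a key-skipping iterator such as `BlockIterator::SkipKeyAndReadValue` could avoid deserializing keys.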
API and Format
Documentation
Generative AI tooling
Generated-by: Claude-4.6-Opus