
feat: optimize BTree index read with io buffering and boundary skip #263

Merged
lszskye merged 6 commits into alibaba:main from lxy-9602:improve-btree-efficiency2
May 8, 2026

lxy-9602 (Collaborator) commented May 7, 2026

Purpose

Linked issue: #38

  • Add RoaringBitmap64::AddMany for bulk insertion, replacing per-row Add in RangeQuery
  • RangeQuery short-circuits lower- and upper-bound checks: when from == min_key it skips all from comparisons, and when to == max_key it skips all to comparisons. It also sets passed_from_bound after the first block, so subsequent blocks skip lower-bound checks entirely
  • Add BlockIterator::SkipKeyAndReadValue for fast-path iteration without key deserialization
  • Optimize BufferedInputStream::Seek with an in-buffer fast path (when the target falls within the cached window, only pos_ is adjusted)
  • Add kBtreeIndexReadBufferSize option to wrap InputStream with BufferedInputStream for range query I/O
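
The boundary short-circuit in the RangeQuery bullet above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `RangeScan`, `Entry`, and `Block` are hypothetical names, keys are plain integers instead of serialized slices, and blocks are in-memory vectors. Only `from`, `to`, `min_key`, `max_key`, and `passed_from_bound` are taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins: an entry is (key, row id); a block holds entries sorted by key.
using Entry = std::pair<int64_t, uint64_t>;
using Block = std::vector<Entry>;

// Collects row ids whose key lies in [from, to]. Blocks are assumed to be
// non-overlapping and ordered by key; min_key/max_key are assumed to be the
// smallest and largest keys stored in the index.
std::vector<uint64_t> RangeScan(const std::vector<Block>& blocks, int64_t from, int64_t to,
                                int64_t min_key, int64_t max_key) {
    assert(from <= to);
    std::vector<uint64_t> row_ids;
    // If a query bound equals the index bound, no entry can fail that comparison.
    bool passed_from_bound = (from == min_key);
    const bool check_to = (to != max_key);

    // Seek to the first block that can contain `from`.
    std::size_t i = 0;
    while (i < blocks.size() && (blocks[i].empty() || blocks[i].back().first < from)) {
        ++i;
    }
    for (; i < blocks.size(); ++i) {
        for (const Entry& entry : blocks[i]) {
            if (!passed_from_bound && entry.first < from) continue;  // below lower bound
            if (check_to && entry.first > to) return row_ids;        // past upper bound
            row_ids.push_back(entry.second);
        }
        // Keys only grow, so every later block is already above `from`.
        passed_from_bound = true;
    }
    return row_ids;
}
```

In this sketch the lower-bound comparison runs for at most one block, and a query that spans the whole index performs no key comparisons inside the inner loop.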

Tests

RoaringBitmap64Test.TestAddMany
BufferedInputStreamTest.TestSeek
BTreeGlobalIndexIntegrationTest.WriteAndReadLargeDataWithSmallBlocks

API and Format

Documentation

Generative AI tooling

Generated-by: Claude-4.6-Opus

Comment thread src/paimon/common/global_index/btree/btree_global_index_reader.cpp Outdated

Copilot AI left a comment


Pull request overview

This PR optimizes B-tree global index range-query performance by reducing per-entry overhead (bulk bitmap inserts, fewer key deserializations/comparisons) and improving I/O efficiency via optional buffered reads.

Changes:

  • Add RoaringBitmap64::AddMany and use it in B-tree range scans to batch row-id insertion per data block.
  • Optimize BTreeGlobalIndexReader::RangeQuery with boundary short-circuiting and a new BlockIterator::SkipKeyAndReadValue fast path.
  • Improve BufferedInputStream::Seek with an in-buffer fast path and add btree-index.read-buffer-size to enable buffered reads.
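
The in-buffer seek fast path mentioned above might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual BufferedInputStream: `BufferedReader`, `ReadByte`, `Fill`, and `refill_count` are hypothetical names, and the underlying stream is simulated with an in-memory string. Only the idea of adjusting `pos_` when the target lies inside the cached window comes from the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical buffered reader over an in-memory "file".
class BufferedReader {
 public:
    BufferedReader(std::string source, std::size_t buffer_size)
        : source_(std::move(source)), buffer_size_(buffer_size) {
        assert(buffer_size_ > 0);
    }

    void Seek(std::size_t target) {
        // Fast path: the target is inside the cached window, so only pos_ moves.
        if (target >= buffer_start_ && target < buffer_start_ + buffer_.size()) {
            pos_ = target - buffer_start_;
            return;
        }
        // Slow path: drop the cache; the next read refills from `target`.
        buffer_.clear();
        buffer_start_ = target;
        pos_ = 0;
    }

    // Returns the next byte, or -1 at end of input.
    int ReadByte() {
        if (pos_ == buffer_.size() && !Fill()) return -1;
        return static_cast<unsigned char>(buffer_[pos_++]);
    }

    // Number of times the underlying source was read.
    std::size_t refill_count() const { return refill_count_; }

 private:
    // Loads the window that follows the current one; returns false at end of input.
    bool Fill() {
        const std::size_t next = buffer_start_ + buffer_.size();
        if (next >= source_.size()) return false;
        buffer_ = source_.substr(next, buffer_size_);
        buffer_start_ = next;
        pos_ = 0;
        ++refill_count_;
        return true;
    }

    std::string source_;
    std::size_t buffer_size_;
    std::string buffer_;            // cached window
    std::size_t buffer_start_ = 0;  // offset of buffer_[0] in source_
    std::size_t pos_ = 0;           // read position inside buffer_
    std::size_t refill_count_ = 0;
};
```

Seeking backward or forward within the cached window leaves `refill_count()` unchanged, which is the saving the fast path is after when a range query hops between nearby offsets.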

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| third_party/roaring_bitmap/roaring.hh | Exposes an inner-bitmap “get-or-create” helper used for bulk insertion. |
| include/paimon/utils/roaring_bitmap64.h | Adds public AddMany API documentation/signature. |
| src/paimon/common/utils/roaring_bitmap64.cpp | Implements RoaringBitmap64::AddMany via bucketing + 32-bit addMany. |
| src/paimon/common/utils/roaring_bitmap64_test.cpp | Adds unit tests for AddMany. |
| src/paimon/common/sst/block_iterator.h | Adds SkipKeyAndReadValue API for fast iteration. |
| src/paimon/common/sst/block_iterator.cpp | Implements key-skipping value reads. |
| include/paimon/io/buffered_input_stream.h | (Context) Buffered input stream interface used by new option. |
| src/paimon/common/io/buffered_input_stream.cpp | Adds buffered-window seek fast path. |
| src/paimon/common/io/buffered_input_stream_test.cpp | Adds seek-path coverage (currently problematic due to private access). |
| src/paimon/common/global_index/btree/btree_defs.h | Documents and introduces btree-index.read-buffer-size option key. |
| src/paimon/common/global_index/btree/btree_global_indexer.cpp | Parses new read-buffer option and passes it to reader creation. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.h | Threads optional read-buffer-size into reader. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.cpp | Wraps index input stream with BufferedInputStream when configured; defers min/max key deserialization to the reader factory. |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader_test.cpp | Updates construction to pass read_buffer_size. |
| src/paimon/common/global_index/btree/btree_global_index_reader.h | Adds factory Create(...), caches serialized min/max key slices, and adjusts row-id deserialization API. |
| src/paimon/common/global_index/btree/btree_global_index_reader.cpp | Implements the new factory + optimized range-query paths using buffering, key-skip, and batched inserts. |
| src/paimon/common/global_index/btree/btree_global_index_integration_test.cpp | Expands integration test to validate behavior across multiple read-buffer-size configurations. |
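
The "bucketing + 32-bit addMany" approach noted for roaring_bitmap64.cpp can be illustrated with the sketch below. It is not the PR's implementation: `Bitmap64Sketch` is a hypothetical class, and a `std::set<uint32_t>` stands in for each inner 32-bit Roaring bitmap so the example has no external dependencies. The assumed idea is that consecutive values sharing the same high 32 bits are gathered and handed to the inner container in one bulk call, so the inner container is looked up once per run rather than once per value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical 64-bit bitmap: high 32 bits select a bucket, low 32 bits are stored in it.
class Bitmap64Sketch {
 public:
    void AddMany(const std::vector<uint64_t>& values) {
        std::vector<uint32_t> lows;
        std::size_t i = 0;
        while (i < values.size()) {
            const uint32_t high = static_cast<uint32_t>(values[i] >> 32);
            lows.clear();
            // Gather the run of consecutive values that share the same high bits.
            while (i < values.size() && static_cast<uint32_t>(values[i] >> 32) == high) {
                lows.push_back(static_cast<uint32_t>(values[i] & 0xFFFFFFFFu));
                ++i;
            }
            // Get-or-create the inner container once per run, then insert in bulk.
            // The range insert stands in for a 32-bit bitmap's bulk-add call.
            std::set<uint32_t>& inner = buckets_[high];
            inner.insert(lows.begin(), lows.end());
        }
    }

    bool Contains(uint64_t value) const {
        const auto it = buckets_.find(static_cast<uint32_t>(value >> 32));
        return it != buckets_.end() &&
               it->second.count(static_cast<uint32_t>(value & 0xFFFFFFFFu)) > 0;
    }

    std::size_t Cardinality() const {
        std::size_t total = 0;
        for (const auto& bucket : buckets_) total += bucket.second.size();
        return total;
    }

    std::size_t BucketCount() const { return buckets_.size(); }

 private:
    std::map<uint32_t, std::set<uint32_t>> buckets_;
};
```

Row ids produced by a sorted block scan tend to share high bits, so in this sketch a whole data block usually maps to a single bucket lookup. Unsorted input still works; it only produces shorter runs.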


Comment thread src/paimon/common/io/buffered_input_stream_test.cpp
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp Outdated
Comment thread src/paimon/common/utils/roaring_bitmap64.cpp
Comment thread src/paimon/common/utils/roaring_bitmap64_test.cpp
Comment thread src/paimon/common/io/buffered_input_stream.cpp
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp
@lszskye lszskye merged commit ea4c852 into alibaba:main May 8, 2026
9 checks passed

3 participants