feat: support BTreeFileMetaSelector & LazyFilteredBTreeReader for btree index #250
Merged
lxy-9602
reviewed
Apr 27, 2026
Pull request overview
This PR extends the global index implementation to better support BTree index reads by adding metadata-based file selection and lazy per-file reader creation, while also evolving the global-index write/read APIs to remove range_end from GlobalIndexIOMeta and pass relative_row_ids through GlobalIndexWriter::AddBatch.
Changes:
- Add `BTreeFileMetaSelector` and `LazyFilteredBTreeReader` to enable predicate-based pruning of BTree index files and lazy reader instantiation/caching.
- Remove `range_end` from `GlobalIndexIOMeta` and update global-index scanning/evaluation codepaths to use `nullptr` (`std::shared_ptr<...> == nullptr`) to represent “cannot be evaluated by this index”.
- Update `GlobalIndexWriter::AddBatch` API to accept `relative_row_ids`, and plumb row-id reading through `GlobalIndexWriteTask`; add float compatibility test data for BTree reads.
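The `nullptr`-as-unsupported convention can be sketched as follows. This is a minimal illustration, not code from the PR: `RowIdSet`, `NullableResult`, `IntersectNullable`, and `RowsToScan` are hypothetical names, and a plain `std::set` stands in for whatever result type the index actually returns. The point it shows is that a null pointer ("this index cannot evaluate the predicate") is distinct from a non-null empty result ("no rows match").

```cpp
#include <cstdint>
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-in for a global index result: the matching row ids.
using RowIdSet = std::set<int64_t>;
// nullptr means "cannot be evaluated by this index"; an empty set means "no match".
using NullableResult = std::shared_ptr<RowIdSet>;

// AND-combines two results. A null side carries no pruning information,
// so the other side is returned unchanged.
NullableResult IntersectNullable(const NullableResult& lhs, const NullableResult& rhs) {
    if (!lhs) return rhs;
    if (!rhs) return lhs;
    auto out = std::make_shared<RowIdSet>();
    for (int64_t id : *lhs) {
        if (rhs->count(id) > 0) out->insert(id);
    }
    return out;
}

// Resolves a result against the full range [0, row_count):
// a null result falls back to scanning every row.
std::vector<int64_t> RowsToScan(const NullableResult& result, int64_t row_count) {
    std::vector<int64_t> rows;
    if (!result) {
        for (int64_t i = 0; i < row_count; ++i) rows.push_back(i);
        return rows;
    }
    rows.assign(result->begin(), result->end());
    return rows;
}
```

Compared with an optional wrapper around a pointer, a nullable `std::shared_ptr` leaves only one "absent" state to check, which is presumably why the evaluator signatures moved to it.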
Reviewed changes
Copilot reviewed 50 out of 50 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.csv | Adds float ground-truth records (incl. NaN/inf) for compatibility testing. |
| test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.bin | Adds serialized float BTree index test artifact via Git LFS pointer. |
| test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.bin.meta | Adds Git LFS metadata for the float BTree index test artifact. |
| test/inte/global_index_test.cpp | Updates integration expectations to new “nullptr means unsupported” semantics; removes range-length mismatch test. |
| src/paimon/global_index/lumina/lumina_global_index_test.cpp | Updates tests for new AddBatch signature and removal of range_end assumptions. |
| src/paimon/global_index/lumina/lumina_global_index.h | Updates Lumina writer/reader interfaces to match new global index APIs. |
| src/paimon/global_index/lumina/lumina_global_index.cpp | Implements relative-row-id validation and removes range_end-based meta checks. |
| src/paimon/global_index/lucene/lucene_global_index_writer.h | Updates Lucene writer AddBatch signature to accept relative_row_ids. |
| src/paimon/global_index/lucene/lucene_global_index_writer.cpp | Validates relative row ids (currently partial) and removes range_end from produced I/O meta. |
| src/paimon/global_index/lucene/lucene_global_index_test.cpp | Updates tests for new AddBatch signature and removal of range_end checks. |
| src/paimon/global_index/lucene/lucene_global_index_reader.h | Shifts “unsupported predicate” behavior to return nullptr results (handled upstream). |
| src/paimon/global_index/lucene/lucene_global_index_reader.cpp | Removes dependency on io_meta.range_end in reader creation. |
| src/paimon/core/table/source/data_evolution_batch_scan.h | Changes evaluator result type from optional to nullable shared_ptr. |
| src/paimon/core/table/source/data_evolution_batch_scan.cpp | Adapts scan planning to nullable shared_ptr global index results. |
| src/paimon/core/global_index/row_range_global_index_scanner_impl.cpp | Removes range_end from constructed GlobalIndexIOMeta. |
| src/paimon/core/global_index/global_index_write_task.cpp | Reads _ROW_ID field to compute relative_row_ids and passes them into AddBatch. |
| src/paimon/core/global_index/global_index_scan_impl.h | Updates ParallelScan return type to nullable shared_ptr. |
| src/paimon/core/global_index/global_index_scan_impl.cpp | Implements nullable-result aggregation logic (nullptr => “scan full range”). |
| src/paimon/core/global_index/global_index_evaluator_impl.h | Updates evaluator signatures to nullable shared_ptr results. |
| src/paimon/core/global_index/global_index_evaluator_impl.cpp | Refactors predicate/vector-search evaluation logic to use nullptr-as-unsupported. |
| src/paimon/core/global_index/global_index_evaluator.h | Updates API docs and return type to nullable shared_ptr. |
| src/paimon/common/sst/sst_file_reader.h | Changes API to take BlockCache directly; adds destructor Close(). |
| src/paimon/common/sst/sst_file_reader.cpp | Uses caller-provided BlockCache and updates sort-lookup-store helper accordingly. |
| src/paimon/common/sst/sst_file_io_test.cpp | Updates tests to construct and pass BlockCache. |
| src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp | Constructs BlockCache and passes it into SstFileReader. |
| src/paimon/common/global_index/wrap/file_index_writer_wrapper.h | Updates wrapper to accept/validate relative_row_ids and drops range_end from meta. |
| src/paimon/common/global_index/wrap/file_index_reader_wrapper_test.cpp | Updates tests for new “Remain => nullptr” conversion semantics. |
| src/paimon/common/global_index/wrap/file_index_reader_wrapper.h | Updates conversion logic: Remain now maps to nullptr (unsupported/no pruning). |
| src/paimon/common/global_index/rangebitmap/range_bitmap_global_index_test.cpp | Updates tests to pass relative_row_ids and remove range_end assertions. |
| src/paimon/common/global_index/rangebitmap/range_bitmap_global_index.cpp | Drops range_end usage when transforming file-index results. |
| src/paimon/common/global_index/global_index_utils.h | Adds shared validation helper for relative_row_ids (new file). |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.h | Adds lazy multi-file BTree reader (new file). |
| src/paimon/common/global_index/btree/lazy_filtered_btree_reader.cpp | Implements file selection, lazy reader creation, footer/null-bitmap reads, and merging. |
| src/paimon/common/global_index/btree/key_serializer.cpp | Removes stale TODO comments for float/double bit conversion. |
| src/paimon/common/global_index/btree/btree_global_indexer.h | Introduces Create(...) factory and stores a shared CacheManager. |
| src/paimon/common/global_index/btree/btree_global_indexer.cpp | Switches BTree reader creation to LazyFilteredBTreeReader and shared cache manager. |
| src/paimon/common/global_index/btree/btree_global_index_writer.h | Updates writer to implement new AddBatch signature and removes max_row_id_. |
| src/paimon/common/global_index/btree/btree_global_index_writer.cpp | Validates relative_row_ids and emits GlobalIndexIOMeta without range_end. |
| src/paimon/common/global_index/btree/btree_global_index_integration_test.cpp | Updates indexer construction and AddBatch calls for new APIs and meta format. |
| src/paimon/common/global_index/btree/btree_global_index_factory.cpp | Updates factory to use BTreeGlobalIndexer::Create. |
| src/paimon/common/global_index/btree/btree_file_meta_selector_test.cpp | Adds unit tests for new BTree file metadata selector (new file). |
| src/paimon/common/global_index/btree/btree_file_meta_selector.h | Adds BTree file candidate selector based on min/max keys + null metadata (new file). |
| src/paimon/common/global_index/btree/btree_file_meta_selector.cpp | Implements predicate-based file filtering for BTree index metas (new file). |
| src/paimon/common/global_index/btree/btree_compatibility_test.cpp | Adds float compatibility coverage and adapts to new GlobalIndexIOMeta format. |
| src/paimon/common/global_index/bitmap/bitmap_global_index_test.cpp | Updates tests for new AddBatch signature and nullptr-as-unsupported semantics. |
| src/paimon/common/global_index/bitmap/bitmap_global_index.cpp | Drops range_end usage when transforming file-index results. |
| src/paimon/common/global_index/CMakeLists.txt | Registers new BTree selector + lazy reader sources in build. |
| src/paimon/CMakeLists.txt | Registers new BTree selector test in test build. |
| include/paimon/global_index/global_index_writer.h | Updates public API: AddBatch now requires relative_row_ids. |
| include/paimon/global_index/global_index_io_meta.h | Updates public I/O metadata: removes range_end. |
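The min/max-key pruning that the table attributes to `btree_file_meta_selector.{h,cpp}` can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `FileMeta` struct, its field names, and the `SelectFor*` functions are hypothetical, and keys are simplified to `int64_t` (the real selector also has to handle other key types, including floats with NaN/inf).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file metadata; the field names are illustrative only.
struct FileMeta {
    std::string file_name;
    // Inclusive [min, max] of non-null keys; empty when the file holds only nulls.
    std::optional<std::pair<int64_t, int64_t>> key_range;
    bool has_nulls;
};

// Keeps files whose key range overlaps the inclusive interval [lower, upper].
std::vector<std::string> SelectForRange(const std::vector<FileMeta>& files,
                                        int64_t lower, int64_t upper) {
    std::vector<std::string> selected;
    for (const auto& file : files) {
        if (file.key_range && file.key_range->first <= upper &&
            lower <= file.key_range->second) {
            selected.push_back(file.file_name);
        }
    }
    return selected;
}

// Equality is the degenerate range [value, value].
std::vector<std::string> SelectForEqual(const std::vector<FileMeta>& files,
                                        int64_t value) {
    return SelectForRange(files, value, value);
}

// IS NULL only needs files that recorded at least one null key.
std::vector<std::string> SelectForIsNull(const std::vector<FileMeta>& files) {
    std::vector<std::string> selected;
    for (const auto& file : files) {
        if (file.has_nulls) selected.push_back(file.file_name);
    }
    return selected;
}
```

Files that fail the check are never opened, which is what makes pairing the selector with lazy reader creation worthwhile.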
Purpose
(1) Support BTreeFileMetaSelector & LazyFilteredBTreeReader for btree index
(2) Remove range_end from GlobalIndexIOMeta
(3) Add relative_row_ids to the AddBatch API in GlobalIndexWriter
(4) Add compatibility test & test data for float type for btree index read
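The "lazy" part of item (1) can be sketched roughly as a create-on-first-use cache keyed by file name. This is an illustrative assumption about the pattern, not the PR's implementation: `FileReader`, `LazyReaderCache`, `GetOrCreate`, and `OpenedCount` are hypothetical names, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-file BTree reader.
struct FileReader {
    std::string file_name;
};

// Creates a reader the first time a file is requested and reuses it afterwards.
class LazyReaderCache {
 public:
    using Factory = std::function<std::shared_ptr<FileReader>(const std::string&)>;

    explicit LazyReaderCache(Factory factory) : factory_(std::move(factory)) {}

    std::shared_ptr<FileReader> GetOrCreate(const std::string& file_name) {
        auto it = readers_.find(file_name);
        if (it != readers_.end()) return it->second;  // already opened
        auto reader = factory_(file_name);            // open on first use only
        readers_.emplace(file_name, reader);
        return reader;
    }

    // Number of files for which a reader has actually been created.
    std::size_t OpenedCount() const { return readers_.size(); }

 private:
    Factory factory_;
    std::map<std::string, std::shared_ptr<FileReader>> readers_;
};
```

With this shape, a query that the selector narrows to one file pays the open cost for that file alone, and repeated queries against the same file reuse the cached reader.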
Linked issue: #38
Tests
src/paimon/common/global_index/btree/btree_file_meta_selector_test.cpp
src/paimon/common/global_index/btree/btree_compatibility_test.cpp
API and Format
GlobalIndexWriter
virtual Status AddBatch(::ArrowArray* arrow_array) = 0; ->
virtual Status AddBatch(::ArrowArray* arrow_array, std::vector<int64_t>&& relative_row_ids) = 0;
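A caller of the new signature has to supply one relative row id per row in the batch. The sketch below shows one plausible way to derive and sanity-check them. It rests on assumptions that the PR text does not confirm: that "relative" means offset from the start of the indexed row range, and that validation checks the count and non-negativity. `ToRelativeRowIds` and `IsValidBatch` are hypothetical helpers, not the functions in `global_index_utils.h`.

```cpp
#include <cstdint>
#include <vector>

// Assumption: a relative row id is the absolute _ROW_ID minus the first
// row id of the range being indexed.
std::vector<int64_t> ToRelativeRowIds(const std::vector<int64_t>& row_ids,
                                      int64_t range_start) {
    std::vector<int64_t> relative;
    relative.reserve(row_ids.size());
    for (int64_t id : row_ids) relative.push_back(id - range_start);
    return relative;
}

// Assumption: a batch is acceptable when it has exactly one id per row
// and no id is negative.
bool IsValidBatch(const std::vector<int64_t>& relative_row_ids, int64_t batch_length) {
    if (static_cast<int64_t>(relative_row_ids.size()) != batch_length) return false;
    for (int64_t id : relative_row_ids) {
        if (id < 0) return false;
    }
    return true;
}
```

Because the parameter is `std::vector<int64_t>&&`, a call site passes either a temporary or `std::move(ids)`; an lvalue vector will not bind to it.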