
feat: support BTreeFileMetaSelector & LazyFilteredBTreeReader for btree index #250

Merged
lszskye merged 4 commits into alibaba:main from lszskye:btree_global_index
Apr 27, 2026


@lszskye
Collaborator

@lszskye lszskye commented Apr 24, 2026

Purpose

(1) Support BTreeFileMetaSelector and LazyFilteredBTreeReader for the BTree index.
(2) Remove range_end from GlobalIndexIOMeta.
(3) Add relative_row_ids to the AddBatch API of GlobalIndexWriter.
(4) Add a compatibility test and test data for the float type in BTree index reads.
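The file selection in (1) prunes BTree index files using per-file metadata (min/max keys plus null information, per the btree_file_meta_selector.h entry in the file summary). A minimal sketch of that pruning idea, assuming int64 keys and a tiny predicate set; FileKeyMeta, SimplePredicate, MayMatch and SelectFiles are hypothetical names, not the Paimon API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-file metadata: min/max non-null key and whether the file
// contains any null keys. Not the actual Paimon structs.
struct FileKeyMeta {
  std::optional<int64_t> min_key;  // empty if the file holds only nulls
  std::optional<int64_t> max_key;
  bool has_nulls = false;
};

enum class Op { kEqual, kLessThan, kGreaterOrEqual, kIsNull };

struct SimplePredicate {
  Op op;
  int64_t literal = 0;  // ignored for kIsNull
};

// Returns true if the file *may* contain matching rows; false means the file
// can be skipped without opening it.
bool MayMatch(const FileKeyMeta& meta, const SimplePredicate& pred) {
  if (pred.op == Op::kIsNull) {
    return meta.has_nulls;
  }
  if (!meta.min_key || !meta.max_key) {
    return false;  // only nulls: no non-null key can satisfy a comparison
  }
  switch (pred.op) {
    case Op::kEqual:
      return *meta.min_key <= pred.literal && pred.literal <= *meta.max_key;
    case Op::kLessThan:
      return *meta.min_key < pred.literal;
    case Op::kGreaterOrEqual:
      return *meta.max_key >= pred.literal;
    default:
      return true;  // unknown operator: keep the file (no pruning)
  }
}

// Returns the indexes of files that survive metadata-based pruning.
std::vector<size_t> SelectFiles(const std::vector<FileKeyMeta>& metas,
                                const SimplePredicate& pred) {
  std::vector<size_t> selected;
  for (size_t i = 0; i < metas.size(); ++i) {
    if (MayMatch(metas[i], pred)) selected.push_back(i);
  }
  return selected;
}
```

The selector only answers "may match", so a false positive costs an extra file read while a false negative would drop results; the sketch errs toward keeping files when in doubt.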

Linked issue: #38

Tests

src/paimon/common/global_index/btree/btree_file_meta_selector_test.cpp
src/paimon/common/global_index/btree/btree_compatibility_test.cpp

API and Format

GlobalIndexWriter::AddBatch changes from
virtual Status AddBatch(::ArrowArray* arrow_array) = 0;
to
virtual Status AddBatch(::ArrowArray* arrow_array, std::vector<int64_t>&& relative_row_ids) = 0;

GlobalIndexIOMeta: range_end is removed.
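The new AddBatch signature takes relative_row_ids by rvalue reference, and GlobalIndexWriteTask derives them from the _ROW_ID field. A minimal sketch, assuming "relative" means the offset from the first row id of the range being indexed, and assuming the checks are size equality with the batch plus non-negative, strictly increasing ids. ToRelativeRowIds and ValidateRelativeRowIds are hypothetical helpers; the real rules in global_index_utils.h may differ:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: converts absolute _ROW_ID values into ids relative to
// the start of the row range covered by one index file (assumed semantics).
std::vector<int64_t> ToRelativeRowIds(const std::vector<int64_t>& row_ids,
                                      int64_t range_start) {
  std::vector<int64_t> relative;
  relative.reserve(row_ids.size());
  for (int64_t id : row_ids) relative.push_back(id - range_start);
  return relative;
}

// Hypothetical checks, loosely modeled on the validation helper this PR adds;
// the actual conditions are an assumption.
bool ValidateRelativeRowIds(const std::vector<int64_t>& relative_row_ids,
                            int64_t batch_length, std::string* error) {
  if (static_cast<int64_t>(relative_row_ids.size()) != batch_length) {
    *error = "relative_row_ids size does not match batch length";
    return false;
  }
  for (size_t i = 0; i < relative_row_ids.size(); ++i) {
    if (relative_row_ids[i] < 0) {
      *error = "negative relative row id";
      return false;
    }
    if (i > 0 && relative_row_ids[i] <= relative_row_ids[i - 1]) {
      *error = "relative row ids must be strictly increasing";
      return false;
    }
  }
  return true;
}
```

Because the parameter is std::vector<int64_t>&&, a caller would hand the vector over with std::move(ids) when calling AddBatch, avoiding a copy per batch.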


Copilot AI left a comment


Pull request overview

This PR extends the global index implementation to better support BTree index reads by adding metadata-based file selection and lazy per-file reader creation, while also evolving the global-index write/read APIs to remove range_end from GlobalIndexIOMeta and pass relative_row_ids through GlobalIndexWriter::AddBatch.

Changes:

  • Add BTreeFileMetaSelector and LazyFilteredBTreeReader to enable predicate-based pruning of BTree index files and lazy reader instantiation/caching.
  • Remove range_end from GlobalIndexIOMeta and update the global-index scanning/evaluation code paths to use a null std::shared_ptr result (instead of std::optional) to represent “cannot be evaluated by this index”.
  • Update GlobalIndexWriter::AddBatch API to accept relative_row_ids, and plumb row-id reading through GlobalIndexWriteTask; add float compatibility test data for BTree reads.
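LazyFilteredBTreeReader is described as creating per-file readers lazily and caching them. The general pattern, sketched with a hypothetical FileReader stand-in and a caller-supplied factory (not the actual class or its interface):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a per-file BTree reader.
struct FileReader {
  std::string path;
};

// Hypothetical sketch of lazy, cached per-file reader creation: a file is
// opened only when first requested, and at most once.
class LazyReaderCache {
 public:
  using Factory =
      std::function<std::shared_ptr<FileReader>(const std::string&)>;

  explicit LazyReaderCache(Factory factory) : factory_(std::move(factory)) {}

  std::shared_ptr<FileReader> Get(const std::string& path) {
    auto it = readers_.find(path);
    if (it != readers_.end()) return it->second;  // cache hit
    auto reader = factory_(path);                 // open on first use
    readers_.emplace(path, reader);
    return reader;
  }

  size_t OpenCount() const { return readers_.size(); }

 private:
  Factory factory_;
  std::unordered_map<std::string, std::shared_ptr<FileReader>> readers_;
};
```

Combined with metadata pruning, files rejected by the selector are never opened, so their footers and null bitmaps are never read.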

Reviewed changes

Copilot reviewed 50 out of 50 changed files in this pull request and generated 5 comments.

File Description
test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.csv Adds float ground-truth records (incl. NaN/inf) for compatibility testing.
test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.bin Adds serialized float BTree index test artifact via Git LFS pointer.
test/test_data/global_index/btree/btree_compatibility_data/btree_test_float_50.bin.meta Adds Git LFS metadata for the float BTree index test artifact.
test/inte/global_index_test.cpp Updates integration expectations to new “nullptr means unsupported” semantics; removes range-length mismatch test.
src/paimon/global_index/lumina/lumina_global_index_test.cpp Updates tests for new AddBatch signature and removal of range_end assumptions.
src/paimon/global_index/lumina/lumina_global_index.h Updates Lumina writer/reader interfaces to match new global index APIs.
src/paimon/global_index/lumina/lumina_global_index.cpp Implements relative-row-id validation and removes range_end-based meta checks.
src/paimon/global_index/lucene/lucene_global_index_writer.h Updates Lucene writer AddBatch signature to accept relative_row_ids.
src/paimon/global_index/lucene/lucene_global_index_writer.cpp Validates relative row ids (currently partial) and removes range_end from produced I/O meta.
src/paimon/global_index/lucene/lucene_global_index_test.cpp Updates tests for new AddBatch signature and removal of range_end checks.
src/paimon/global_index/lucene/lucene_global_index_reader.h Shifts “unsupported predicate” behavior to return nullptr results (handled upstream).
src/paimon/global_index/lucene/lucene_global_index_reader.cpp Removes dependency on io_meta.range_end in reader creation.
src/paimon/core/table/source/data_evolution_batch_scan.h Changes evaluator result type from optional to nullable shared_ptr.
src/paimon/core/table/source/data_evolution_batch_scan.cpp Adapts scan planning to nullable shared_ptr global index results.
src/paimon/core/global_index/row_range_global_index_scanner_impl.cpp Removes range_end from constructed GlobalIndexIOMeta.
src/paimon/core/global_index/global_index_write_task.cpp Reads _ROW_ID field to compute relative_row_ids and passes them into AddBatch.
src/paimon/core/global_index/global_index_scan_impl.h Updates ParallelScan return type to nullable shared_ptr.
src/paimon/core/global_index/global_index_scan_impl.cpp Implements nullable-result aggregation logic (nullptr => “scan full range”).
src/paimon/core/global_index/global_index_evaluator_impl.h Updates evaluator signatures to nullable shared_ptr results.
src/paimon/core/global_index/global_index_evaluator_impl.cpp Refactors predicate/vector-search evaluation logic to use nullptr-as-unsupported.
src/paimon/core/global_index/global_index_evaluator.h Updates API docs and return type to nullable shared_ptr.
src/paimon/common/sst/sst_file_reader.h Changes API to take BlockCache directly; adds destructor Close().
src/paimon/common/sst/sst_file_reader.cpp Uses caller-provided BlockCache and updates sort-lookup-store helper accordingly.
src/paimon/common/sst/sst_file_io_test.cpp Updates tests to construct and pass BlockCache.
src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp Constructs BlockCache and passes it into SstFileReader.
src/paimon/common/global_index/wrap/file_index_writer_wrapper.h Updates wrapper to accept/validate relative_row_ids and drops range_end from meta.
src/paimon/common/global_index/wrap/file_index_reader_wrapper_test.cpp Updates tests for new “Remain => nullptr” conversion semantics.
src/paimon/common/global_index/wrap/file_index_reader_wrapper.h Updates conversion logic: Remain now maps to nullptr (unsupported/no pruning).
src/paimon/common/global_index/rangebitmap/range_bitmap_global_index_test.cpp Updates tests to pass relative_row_ids and remove range_end assertions.
src/paimon/common/global_index/rangebitmap/range_bitmap_global_index.cpp Drops range_end usage when transforming file-index results.
src/paimon/common/global_index/global_index_utils.h Adds shared validation helper for relative_row_ids (new file).
src/paimon/common/global_index/btree/lazy_filtered_btree_reader.h Adds lazy multi-file BTree reader (new file).
src/paimon/common/global_index/btree/lazy_filtered_btree_reader.cpp Implements file selection, lazy reader creation, footer/null-bitmap reads, and merging.
src/paimon/common/global_index/btree/key_serializer.cpp Removes stale TODO comments for float/double bit conversion.
src/paimon/common/global_index/btree/btree_global_indexer.h Introduces Create(...) factory and stores a shared CacheManager.
src/paimon/common/global_index/btree/btree_global_indexer.cpp Switches BTree reader creation to LazyFilteredBTreeReader and shared cache manager.
src/paimon/common/global_index/btree/btree_global_index_writer.h Updates writer to implement new AddBatch signature and removes max_row_id_.
src/paimon/common/global_index/btree/btree_global_index_writer.cpp Validates relative_row_ids and emits GlobalIndexIOMeta without range_end.
src/paimon/common/global_index/btree/btree_global_index_integration_test.cpp Updates indexer construction and AddBatch calls for new APIs and meta format.
src/paimon/common/global_index/btree/btree_global_index_factory.cpp Updates factory to use BTreeGlobalIndexer::Create.
src/paimon/common/global_index/btree/btree_file_meta_selector_test.cpp Adds unit tests for new BTree file metadata selector (new file).
src/paimon/common/global_index/btree/btree_file_meta_selector.h Adds BTree file candidate selector based on min/max keys + null metadata (new file).
src/paimon/common/global_index/btree/btree_file_meta_selector.cpp Implements predicate-based file filtering for BTree index metas (new file).
src/paimon/common/global_index/btree/btree_compatibility_test.cpp Adds float compatibility coverage and adapts to new GlobalIndexIOMeta format.
src/paimon/common/global_index/bitmap/bitmap_global_index_test.cpp Updates tests for new AddBatch signature and nullptr-as-unsupported semantics.
src/paimon/common/global_index/bitmap/bitmap_global_index.cpp Drops range_end usage when transforming file-index results.
src/paimon/common/global_index/CMakeLists.txt Registers new BTree selector + lazy reader sources in build.
src/paimon/CMakeLists.txt Registers new BTree selector test in test build.
include/paimon/global_index/global_index_writer.h Updates public API: AddBatch now requires relative_row_ids.
include/paimon/global_index/global_index_io_meta.h Updates public I/O metadata: removes range_end.
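Several entries above describe the new convention that a null std::shared_ptr result means the index cannot evaluate the predicate, with nullptr ultimately meaning “scan the full range”. A sketch of how such results might be AND-combined, using std::set<int64_t> as a hypothetical stand-in for the real row-id result type (the actual aggregation in global_index_scan_impl.cpp may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a global index result: the set of matching row ids.
using RowIdSet = std::set<int64_t>;
using IndexResult = std::shared_ptr<RowIdSet>;  // nullptr => "cannot evaluate"

// AND-combines per-index results. A nullptr result imposes no constraint;
// if every result is nullptr the combined result is nullptr too, which the
// caller would treat as "scan the full range".
IndexResult AndCombine(const std::vector<IndexResult>& results) {
  IndexResult combined;  // starts as nullptr
  for (const auto& r : results) {
    if (!r) continue;  // unsupported by this index: skip, do not prune
    if (!combined) {
      combined = std::make_shared<RowIdSet>(*r);  // copy, never alias inputs
      continue;
    }
    RowIdSet both;
    std::set_intersection(combined->begin(), combined->end(), r->begin(),
                          r->end(), std::inserter(both, both.begin()));
    *combined = std::move(both);
  }
  return combined;
}
```

Treating nullptr as “no information” rather than “no matches” is what keeps the scan correct: an empty set would wrongly prune every row, while nullptr just falls back to reading the range.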


@lszskye lszskye force-pushed the btree_global_index branch from 3e4e1bb to 65e78a3 on April 27, 2026 08:26
@lszskye lszskye force-pushed the btree_global_index branch from 0108a07 to 65469b0 on April 27, 2026 08:58
Collaborator

@lxy-9602 lxy-9602 left a comment


+1

@lszskye lszskye merged commit e2a73de into alibaba:main Apr 27, 2026
9 checks passed


3 participants