refactor(btree): refactor btree global index and add tests#243
Conversation
…action, type safety and test improvements
|
@SGZW I've addressed the |
thanks for fix it |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR refactors the BTree global index implementation by extracting key serialization into a dedicated module, modernizing SST/footer handle ownership to value/std::optional, and reworking tests to cover cross-language compatibility and broader integration scenarios.
Changes:
- Introduce
KeySerializerto centralize key encode/decode + comparator creation and reuse it across BTree reader/writer/indexer. - Replace
shared_ptrhandles with value/std::optionalfor SST/footer-related handles, and adjust formatting/helpers accordingly. - Consolidate/replace BTree unit tests with compatibility + integration-focused suites and add new
KeySerializertests.
Reviewed changes
Copilot reviewed 48 out of 63 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/core/table/sink/commit_message_impl_test.cpp | Updates expected BinaryRow::ToString() formatting in tests. |
| src/paimon/core/stats/simple_stats.h | Switches SimpleStats::ToString() to fmt::format hex formatting. |
| src/paimon/common/sst/sst_file_writer.h | Changes Bloom filter handle return type to std::optional. |
| src/paimon/common/sst/sst_file_writer.cpp | Implements Bloom filter writing using value handle + std::optional. |
| src/paimon/common/sst/sst_file_utils.h | Removes CRC hex helper (now using fmt formatting elsewhere). |
| src/paimon/common/sst/sst_file_reader.h | Updates APIs to use std::optional and renames creation helper. |
| src/paimon/common/sst/sst_file_reader.cpp | Adapts reader creation and CRC error formatting; removes SstFileIterator. |
| src/paimon/common/sst/sst_file_io_test.cpp | Updates tests to new SstFileReader factory name. |
| src/paimon/common/sst/block_trailer.cpp | Uses fmt::format for hex CRC formatting. |
| src/paimon/common/sst/block_iterator.cpp | Fixes comment describing SeekTo semantics (>=). |
| src/paimon/common/sst/block_cache.h | Makes Close() explicitly clear internal map after invalidation. |
| src/paimon/common/predicate/literal.cpp | Uses FieldsComparator for float/double comparisons. |
| src/paimon/common/lookup/sort/sort_lookup_store_footer.h | Replaces bloom filter handle shared_ptr with std::optional. |
| src/paimon/common/lookup/sort/sort_lookup_store_footer.cpp | Updates footer read/write logic for std::optional handles. |
| src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp | Adapts to new SST reader factory and optional bloom handle. |
| src/paimon/common/io/cache/cache.h | Removes explicit from CacheValue ctor. |
| src/paimon/common/global_index/wrap/file_index_reader_wrapper.h | Drops overrides for removed visitor APIs (AND/OR/BETWEEN). |
| src/paimon/common/global_index/btree/key_serializer_test.cpp | Adds unit tests for key serialization/deserialization + comparator. |
| src/paimon/common/global_index/btree/key_serializer.h | Adds KeySerializer public API. |
| src/paimon/common/global_index/btree/key_serializer.cpp | Implements key encode/decode and comparator via Literal::CompareTo. |
| src/paimon/common/global_index/btree/btree_index_meta_test.cpp | Updates tests for optional-style key presence checks. |
| src/paimon/common/global_index/btree/btree_index_meta.h | Cleans up comment formatting. |
| src/paimon/common/global_index/btree/btree_index_meta.cpp | Minor simplification of has_nulls decode. |
| src/paimon/common/global_index/btree/btree_global_indexer_test.cpp | Removes older, largely “conceptual” tests. |
| src/paimon/common/global_index/btree/btree_global_indexer.h | Uses BtreeDefs, std::optional handles, and simplifies helper APIs. |
| src/paimon/common/global_index/btree/btree_global_indexer.cpp | Rewires reader/writer creation around KeySerializer + optional handles. |
| src/paimon/common/global_index/btree/btree_global_index_writer_test.cpp | Removes older writer tests (replaced by integration/compat coverage). |
| src/paimon/common/global_index/btree/btree_global_index_writer.h | Changes writer API (adds row-id-based batch) + optional handles. |
| src/paimon/common/global_index/btree/btree_global_index_writer.cpp | Implements row-id-based ingestion + new footer/bloom/null-bitmap writing. |
| src/paimon/common/global_index/btree/btree_global_index_reader.h | Refactors reader construction and range query to use Literal + KeySerializer. |
| src/paimon/common/global_index/btree/btree_global_index_reader.cpp | Re-implements predicate evaluation; introduces row-id decoding helper. |
| src/paimon/common/global_index/btree/btree_file_footer_test.cpp | Updates tests for new footer constants and optional handles. |
| src/paimon/common/global_index/btree/btree_file_footer.h | Replaces pointer handles with value + std::optional; renames constants. |
| src/paimon/common/global_index/btree/btree_file_footer.cpp | Updates footer read/write logic to match optional/value handle layout. |
| src/paimon/common/global_index/btree/btree_defs.h | Adds centralized BTree option keys/defaults. |
| src/paimon/common/global_index/btree/btree_compatibility_test.cpp | Refactors and expands Java-compatibility testing (int/string, edge cases). |
| src/paimon/common/global_index/CMakeLists.txt | Adds key_serializer.cpp to build. |
| src/paimon/common/file_index/file_index_reader.cpp | Removes AND/OR visitor implementations. |
| src/paimon/common/file_index/empty/empty_file_index_reader.h | Removes BETWEEN/NOT BETWEEN overrides (visitor API removed). |
| src/paimon/common/defs.cpp | Removes BTree option constants from global Options. |
| src/paimon/common/data/binary_row.h | Moves BinaryRow::ToString() out-of-line. |
| src/paimon/common/data/binary_row.cpp | Implements BinaryRow::ToString() with fmt::format hex formatting. |
| src/paimon/CMakeLists.txt | Updates test targets (drops old tests, adds key_serializer_test). |
| include/paimon/predicate/function_visitor.h | Removes default BETWEEN/NOT BETWEEN and AND/OR compound visitor APIs. |
| include/paimon/global_index/global_index_reader.h | Removes default AND/OR visitor implementations. |
| include/paimon/file_index/file_index_reader.h | Removes AND/OR visitor declarations. |
| include/paimon/defs.h | Removes BTree option constants from public Options. |
Comments suppressed due to low confidence (3)
src/paimon/core/table/sink/commit_message_impl_test.cpp:1
- This test asserts an exact
BinaryRow@...hash string, which is inherently unstable across builds/platforms and can cause flaky failures. Prefer asserting on stable parts of the string (e.g.,FileCommittable {partition = BinaryRow@0xprefix) or matching the hash with a regex-like check rather than hard-coding a specific value.
src/paimon/common/sst/sst_file_reader.cpp:1 - CRC32C is conceptually an unsigned 32-bit value. Formatting
trailer->Crc32c()(likely stored asint32_t) with{:#x}can produce confusing negative hex output on values with the high bit set. Cast both values touint32_tbefore formatting to make the diagnostic unambiguous.
src/paimon/common/global_index/btree/key_serializer_test.cpp:1 - Given the explicit Java-compatibility goal for key serialization, please add tests for floating-point edge cases that commonly differ cross-language: NaN (canonicalization), +0.0 vs -0.0, and infinities. This will prevent silent incompatibilities in
SerializeKey/DeserializeKeyand in comparisons.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Purpose
Linked issue: #38
KeySerializermodule for better separation of concerns.shared_ptrwith value types andstd::optionalforBlockHandle/BloomFilterHandleacross footer and SST classes.btree_defs.h, and rename test data files fromvarchartostring.Tests
btree_compatibility_test: Java-generated BTree index compatibility for int/string keys across multiple data sizes.btree_global_index_integration_test: Multi-type read/write, null handling, range/IN queries, large-scale data, and error scenarios.API and Format
Documentation
Generative AI tooling
Partially Generated-by: Claude-4.6-Opus