Skip to content

refactor(btree): refactor btree global index and add tests#243

Merged
lszskye merged 10 commits into
alibaba:mainfrom
lxy-9602:refactor-btree
Apr 22, 2026
Merged

refactor(btree): refactor btree global index and add tests#243
lszskye merged 10 commits into
alibaba:mainfrom
lxy-9602:refactor-btree

Conversation

@lxy-9602
Copy link
Copy Markdown
Collaborator

Purpose

Linked issue: #38

  • Extract key serialization/deserialization into a standalone KeySerializer module for better separation of concerns.
  • Replace shared_ptr with value types and std::optional for BlockHandle / BloomFilterHandle across footer and SST classes.
  • Consolidate and simplify btree tests, expand integration test coverage for multi-type, large-scale, and error scenarios.
  • Clean up redundant includes, extract btree_defs.h, and rename test data files from varchar to string.

Tests

  • btree_compatibility_test: Java-generated BTree index compatibility for int/string keys across multiple data sizes.
  • btree_global_index_integration_test: Multi-type read/write, null handling, range/IN queries, large-scale data, and error scenarios.

API and Format

Documentation

Generative AI tooling

Partially Generated-by: Claude-4.6-Opus

@lxy-9602 lxy-9602 changed the title refactor(btree): Refactor BTree global index with key serializer extraction, type safety and test improvements refactor(btree): refactor btree global index and add tests Apr 21, 2026
@lxy-9602
Copy link
Copy Markdown
Collaborator Author

@SGZW I've addressed the getcwd usage in this commit. Please review and let me know if anything else needs fixing.

@SGZW
Copy link
Copy Markdown
Contributor

SGZW commented Apr 21, 2026

@SGZW I've addressed the getcwd usage in this commit. Please review and let me know if anything else needs fixing.

thanks for fix it

Comment thread src/paimon/common/global_index/btree/btree_global_index_integration_test.cpp Outdated
Comment thread src/paimon/common/global_index/btree/key_serializer.cpp
Comment thread src/paimon/common/sst/sst_file_reader.h Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR refactors the BTree global index implementation by extracting key serialization into a dedicated module, modernizing SST/footer handle ownership to value/std::optional, and reworking tests to cover cross-language compatibility and broader integration scenarios.

Changes:

  • Introduce KeySerializer to centralize key encode/decode + comparator creation and reuse it across BTree reader/writer/indexer.
  • Replace shared_ptr handles with value/std::optional for SST/footer-related handles, and adjust formatting/helpers accordingly.
  • Consolidate/replace BTree unit tests with compatibility + integration-focused suites and add new KeySerializer tests.

Reviewed changes

Copilot reviewed 48 out of 63 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/paimon/core/table/sink/commit_message_impl_test.cpp Updates expected BinaryRow::ToString() formatting in tests.
src/paimon/core/stats/simple_stats.h Switches SimpleStats::ToString() to fmt::format hex formatting.
src/paimon/common/sst/sst_file_writer.h Changes Bloom filter handle return type to std::optional.
src/paimon/common/sst/sst_file_writer.cpp Implements Bloom filter writing using value handle + std::optional.
src/paimon/common/sst/sst_file_utils.h Removes CRC hex helper (now using fmt formatting elsewhere).
src/paimon/common/sst/sst_file_reader.h Updates APIs to use std::optional and renames creation helper.
src/paimon/common/sst/sst_file_reader.cpp Adapts reader creation and CRC error formatting; removes SstFileIterator.
src/paimon/common/sst/sst_file_io_test.cpp Updates tests to new SstFileReader factory name.
src/paimon/common/sst/block_trailer.cpp Uses fmt::format for hex CRC formatting.
src/paimon/common/sst/block_iterator.cpp Fixes comment describing SeekTo semantics (>=).
src/paimon/common/sst/block_cache.h Makes Close() explicitly clear internal map after invalidation.
src/paimon/common/predicate/literal.cpp Uses FieldsComparator for float/double comparisons.
src/paimon/common/lookup/sort/sort_lookup_store_footer.h Replaces bloom filter handle shared_ptr with std::optional.
src/paimon/common/lookup/sort/sort_lookup_store_footer.cpp Updates footer read/write logic for std::optional handles.
src/paimon/common/lookup/sort/sort_lookup_store_factory.cpp Adapts to new SST reader factory and optional bloom handle.
src/paimon/common/io/cache/cache.h Removes explicit from CacheValue ctor.
src/paimon/common/global_index/wrap/file_index_reader_wrapper.h Drops overrides for removed visitor APIs (AND/OR/BETWEEN).
src/paimon/common/global_index/btree/key_serializer_test.cpp Adds unit tests for key serialization/deserialization + comparator.
src/paimon/common/global_index/btree/key_serializer.h Adds KeySerializer public API.
src/paimon/common/global_index/btree/key_serializer.cpp Implements key encode/decode and comparator via Literal::CompareTo.
src/paimon/common/global_index/btree/btree_index_meta_test.cpp Updates tests for optional-style key presence checks.
src/paimon/common/global_index/btree/btree_index_meta.h Cleans up comment formatting.
src/paimon/common/global_index/btree/btree_index_meta.cpp Minor simplification of has_nulls decode.
src/paimon/common/global_index/btree/btree_global_indexer_test.cpp Removes older, largely “conceptual” tests.
src/paimon/common/global_index/btree/btree_global_indexer.h Uses BtreeDefs, std::optional handles, and simplifies helper APIs.
src/paimon/common/global_index/btree/btree_global_indexer.cpp Rewires reader/writer creation around KeySerializer + optional handles.
src/paimon/common/global_index/btree/btree_global_index_writer_test.cpp Removes older writer tests (replaced by integration/compat coverage).
src/paimon/common/global_index/btree/btree_global_index_writer.h Changes writer API (adds row-id-based batch) + optional handles.
src/paimon/common/global_index/btree/btree_global_index_writer.cpp Implements row-id-based ingestion + new footer/bloom/null-bitmap writing.
src/paimon/common/global_index/btree/btree_global_index_reader.h Refactors reader construction and range query to use Literal + KeySerializer.
src/paimon/common/global_index/btree/btree_global_index_reader.cpp Re-implements predicate evaluation; introduces row-id decoding helper.
src/paimon/common/global_index/btree/btree_file_footer_test.cpp Updates tests for new footer constants and optional handles.
src/paimon/common/global_index/btree/btree_file_footer.h Replaces pointer handles with value + std::optional; renames constants.
src/paimon/common/global_index/btree/btree_file_footer.cpp Updates footer read/write logic to match optional/value handle layout.
src/paimon/common/global_index/btree/btree_defs.h Adds centralized BTree option keys/defaults.
src/paimon/common/global_index/btree/btree_compatibility_test.cpp Refactors and expands Java-compatibility testing (int/string, edge cases).
src/paimon/common/global_index/CMakeLists.txt Adds key_serializer.cpp to build.
src/paimon/common/file_index/file_index_reader.cpp Removes AND/OR visitor implementations.
src/paimon/common/file_index/empty/empty_file_index_reader.h Removes BETWEEN/NOT BETWEEN overrides (visitor API removed).
src/paimon/common/defs.cpp Removes BTree option constants from global Options.
src/paimon/common/data/binary_row.h Moves BinaryRow::ToString() out-of-line.
src/paimon/common/data/binary_row.cpp Implements BinaryRow::ToString() with fmt::format hex formatting.
src/paimon/CMakeLists.txt Updates test targets (drops old tests, adds key_serializer_test).
include/paimon/predicate/function_visitor.h Removes default BETWEEN/NOT BETWEEN and AND/OR compound visitor APIs.
include/paimon/global_index/global_index_reader.h Removes default AND/OR visitor implementations.
include/paimon/file_index/file_index_reader.h Removes AND/OR visitor declarations.
include/paimon/defs.h Removes BTree option constants from public Options.
Comments suppressed due to low confidence (3)

src/paimon/core/table/sink/commit_message_impl_test.cpp:1

  • This test asserts an exact BinaryRow@... hash string, which is inherently unstable across builds/platforms and can cause flaky failures. Prefer asserting on stable parts of the string (e.g., FileCommittable {partition = BinaryRow@0x prefix) or matching the hash with a regex-like check rather than hard-coding a specific value.
    src/paimon/common/sst/sst_file_reader.cpp:1
  • CRC32C is conceptually an unsigned 32-bit value. Formatting trailer->Crc32c() (likely stored as int32_t) with {:#x} can produce confusing negative hex output on values with the high bit set. Cast both values to uint32_t before formatting to make the diagnostic unambiguous.
    src/paimon/common/global_index/btree/key_serializer_test.cpp:1
  • Given the explicit Java-compatibility goal for key serialization, please add tests for floating-point edge cases that commonly differ cross-language: NaN (canonicalization), +0.0 vs -0.0, and infinities. This will prevent silent incompatibilities in SerializeKey/DeserializeKey and in comparisons.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/paimon/common/global_index/btree/key_serializer.cpp
Comment thread src/paimon/common/global_index/btree/key_serializer.cpp
Comment thread src/paimon/common/global_index/btree/btree_global_index_writer.h
Comment thread src/paimon/common/global_index/btree/btree_global_index_reader.cpp
Comment thread src/paimon/common/global_index/btree/key_serializer.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_compatibility_test.cpp Outdated
@lszskye lszskye merged commit d474ebc into alibaba:main Apr 22, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants