refactor: unify BinarySection classes to single MemorySegment model and use string_view to avoid copies #196

Merged: lxy-9602 merged 13 commits into alibaba:main from lxy-9602:row-compact-string-view on Mar 27, 2026

lxy-9602 (Collaborator) commented Mar 23, 2026

Purpose

Linked issue: #93

  1. Refactors the internal row structures (BinarySection, BinaryString, BinaryRow, BinaryArray, BinaryMap) from a multi-segment model (std::vector<MemorySegment>) to a single-segment model (MemorySegment), ensuring contiguous memory layout. It also introduces GetStringView / string_view-based read paths to avoid unnecessary data copies in serialization and comparison operations.
  2. CompactStrategy now supports BucketedDvMaintainer.
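
To make the first point concrete, here is a minimal sketch of why a single contiguous segment allows zero-copy reads. The names `Segment`, `CopyFromSegments`, and `ViewFromSegment` are hypothetical stand-ins for illustration, not the actual paimon-cpp classes or the real `GetStringView` signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one contiguous block of memory.
struct Segment {
    std::vector<char> bytes;
};

// Multi-segment model: a value may straddle segment boundaries,
// so the bytes have to be gathered into an owning std::string.
inline std::string CopyFromSegments(const std::vector<Segment>& segments,
                                    std::size_t offset, std::size_t length) {
    std::string out;
    out.reserve(length);
    for (const Segment& seg : segments) {
        if (length == 0) break;
        if (offset >= seg.bytes.size()) {
            offset -= seg.bytes.size();  // value starts in a later segment
            continue;
        }
        const std::size_t n = std::min(length, seg.bytes.size() - offset);
        out.append(seg.bytes.data() + offset, n);
        offset = 0;
        length -= n;
    }
    return out;
}

// Single-segment model: the bytes are contiguous, so a non-owning view
// is enough. The view is only valid while `segment` is alive and unmodified.
inline std::string_view ViewFromSegment(const Segment& segment,
                                        std::size_t offset, std::size_t length) {
    assert(offset + length <= segment.bytes.size());
    return std::string_view(segment.bytes.data() + offset, length);
}
```

The trade-off in this sketch is lifetime: the view must not outlive the segment it points into, which is why a copy-free read path only makes sense once the layout is guaranteed to be contiguous.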

Tests

CompactStrategyTest, TestPickFullCompaction
DataDefineTest, GetStringView
LookupMergeTreeCompactRewriterTest, TestRewriteWithDvAndAggForStringFields

API and Format

Blob::ArrowField() no longer takes a nullable parameter, because blobs do not currently support null values.

Generative AI tooling

Partially Generated-by: Claude-4.6-Opus

@lucasfang lucasfang requested a review from Copilot March 24, 2026 01:52

Copilot AI left a comment

Pull request overview

Refactors serialization and compaction selection to reduce copies by using std::string_view in hot paths, and extends full-compaction picking logic to account for deletion vectors via BucketedDvMaintainer.

Changes:

  • Enable use_view=true for InternalRow field getters and add string-view based write paths in RowCompactedSerializer / serializer utilities.
  • Extend CompactStrategy::PickFullCompaction to accept a BucketedDvMaintainer and trigger rewrites when deletion vectors exist.
  • Update/add unit tests to cover the new compaction selection behavior and adjust persist processor test setup.
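
As a rough illustration of the second change, the sketch below selects files for rewriting when a deletion-vector maintainer reports a deletion vector for them. Everything here is an assumption made for illustration: `SketchFile`, `SketchDvMaintainer`, `HasDeletionVector`, and `PickFilesWithDv` are hypothetical names, and the real `CompactStrategy::PickFullCompaction` signature and selection rules may differ.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical file metadata; only the name matters for this sketch.
struct SketchFile {
    std::string name;
};

// Hypothetical stand-in for BucketedDvMaintainer: it only records
// which file names currently have a deletion vector.
class SketchDvMaintainer {
 public:
    explicit SketchDvMaintainer(std::unordered_set<std::string> files_with_dv)
        : files_with_dv_(std::move(files_with_dv)) {}

    bool HasDeletionVector(const std::string& file_name) const {
        return files_with_dv_.count(file_name) > 0;
    }

 private:
    std::unordered_set<std::string> files_with_dv_;
};

// Returns the files that should be rewritten, or std::nullopt when no
// file has a deletion vector (or no maintainer was supplied).
inline std::optional<std::vector<SketchFile>> PickFilesWithDv(
    const std::vector<SketchFile>& files, const SketchDvMaintainer* dv_maintainer) {
    std::vector<SketchFile> picked;
    if (dv_maintainer != nullptr) {
        for (const SketchFile& file : files) {
            if (dv_maintainer->HasDeletionVector(file.name)) {
                picked.push_back(file);
            }
        }
    }
    if (picked.empty()) {
        return std::nullopt;
    }
    return picked;
}
```

Passing the maintainer as a nullable pointer keeps the no-deletion-vector case cheap: callers without a maintainer get the old behaviour of this sketch, which is to pick nothing.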

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| src/paimon/core/mergetree/lookup/persist_processor_test.cpp | Reworks test fixture setup to build KeyValue from Arrow/ColumnarRow instead of BinaryRowGenerator. |
| src/paimon/core/mergetree/compact/compact_strategy_test.cpp | Updates PickFullCompaction call sites for new signature and adds DV-maintainer coverage. |
| src/paimon/core/mergetree/compact/compact_strategy.h | Adds DV maintainer parameter and logic to include files with deletion vectors in full compaction. |
| src/paimon/common/data/serializer/row_compacted_serializer.h | Adds RowWriter::WriteStringView to write length-prefixed bytes from std::string_view. |
| src/paimon/common/data/serializer/row_compacted_serializer.cpp | Enables view-based getters and removes intermediate allocations/copies when serializing string/binary. |
| src/paimon/common/data/serializer/binary_serializer_utils.cpp | Switches STRING/BINARY serialization to pass views instead of copying. |
| src/paimon/common/data/generic_row.h | Adds conversion path when a GenericRow field is stored as std::string_view but API requires BinaryString. |
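
The length-prefixed write mentioned for `RowWriter::WriteStringView` can be sketched as follows. This is an illustration under stated assumptions: `SketchRowWriter` and `ReadStringView` are hypothetical, and the fixed 4-byte little-endian prefix is an assumption chosen for simplicity; the actual serializer may encode the length differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical writer. The 4-byte little-endian length prefix is an
// assumption of this sketch, not the confirmed on-disk format.
class SketchRowWriter {
 public:
    // Appends the length, then the bytes, straight from the view.
    // No intermediate std::string is created.
    void WriteStringView(std::string_view value) {
        const auto length = static_cast<std::uint32_t>(value.size());
        for (int shift = 0; shift < 32; shift += 8) {
            buffer_.push_back(static_cast<char>((length >> shift) & 0xFF));
        }
        buffer_.insert(buffer_.end(), value.begin(), value.end());
    }

    const std::vector<char>& buffer() const { return buffer_; }

 private:
    std::vector<char> buffer_;
};

// Reads one length-prefixed value at `offset` and returns a view into `buffer`.
inline std::string_view ReadStringView(const std::vector<char>& buffer,
                                       std::size_t offset) {
    std::uint32_t length = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(buffer[offset + i]);
        length |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return std::string_view(buffer.data() + offset + 4, length);
}
```

Because the writer accepts a `std::string_view`, a caller that already holds a view into a row's backing memory can serialize it with a single copy into the output buffer.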

Comment thread src/paimon/core/mergetree/compact/compact_strategy.h
Comment thread src/paimon/core/mergetree/compact/compact_strategy.h
Comment thread src/paimon/common/data/serializer/binary_serializer_utils.cpp
Comment thread src/paimon/common/data/serializer/binary_serializer_utils.cpp
Comment thread src/paimon/core/mergetree/compact/compact_strategy_test.cpp
Comment thread src/paimon/core/mergetree/compact/compact_strategy_test.cpp
Comment thread src/paimon/core/mergetree/lookup/persist_processor_test.cpp Outdated
Comment thread src/paimon/core/mergetree/lookup/persist_processor_test.cpp Outdated
Comment thread src/paimon/core/mergetree/lookup/persist_processor_test.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/compact_strategy.h
lxy-9602 changed the title from "refactor: improve RowCompactedSerializer by using string_view to avoid data copies" to "refactor: unify BinarySection classes to single MemorySegment model and use string_view to avoid copies" on Mar 25, 2026
lxy-9602 force-pushed the row-compact-string-view branch from 3effb16 to dc9403e on March 25, 2026 12:37
Comment thread src/paimon/common/data/binary_map.h

Copilot AI left a comment

Pull request overview

Copilot reviewed 66 out of 66 changed files in this pull request and generated 5 comments.


Comment thread include/paimon/data/blob.h
Comment thread src/paimon/common/data/blob.cpp
Comment thread src/paimon/common/data/data_define.h Outdated
Comment thread src/paimon/common/data/abstract_binary_writer.cpp Outdated
Comment thread src/paimon/common/data/abstract_binary_writer.cpp Outdated
Comment thread src/paimon/common/data/abstract_binary_writer.cpp Outdated
Comment thread src/paimon/common/data/abstract_binary_writer.cpp Outdated
Comment thread src/paimon/testing/utils/key_value_checker.h Outdated
@zjw1111 zjw1111 requested a review from Copilot March 26, 2026 09:29

Copilot AI left a comment

Pull request overview

Copilot reviewed 66 out of 66 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (2)

src/paimon/core/mergetree/compact/compact_strategy.h:1

  • Including bucketed_dv_maintainer.h in this header can significantly increase compile-time coupling. Since the API only needs std::shared_ptr<BucketedDvMaintainer>, you can forward-declare class BucketedDvMaintainer; in the paimon namespace and move the heavy include to the corresponding .cpp (or wherever the type’s definition is actually required).
    src/paimon/core/mergetree/compact/compact_strategy.h:1
  • Including bucketed_dv_maintainer.h in this header can significantly increase compile-time coupling. Since the API only needs std::shared_ptr<BucketedDvMaintainer>, you can forward-declare class BucketedDvMaintainer; in the paimon namespace and move the heavy include to the corresponding .cpp (or wherever the type’s definition is actually required).
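
The forward-declaration suggestion can be sketched like this. The namespace `paimon_sketch`, the function `HasDvMaintainer`, and the empty placeholder class body are hypothetical; they only demonstrate that a `std::shared_ptr` to an incomplete type is enough for a declaration in a header.

```cpp
#include <cassert>
#include <memory>

// --- What the header would contain ---
namespace paimon_sketch {

// Forward declaration only: no heavy include is needed to name the type
// inside std::shared_ptr in a function signature.
class BucketedDvMaintainer;

// Hypothetical API that merely passes the pointer around.
bool HasDvMaintainer(const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer);

}  // namespace paimon_sketch

// --- What the .cpp would contain, after including the full definition ---
namespace paimon_sketch {

// Placeholder definition standing in for the real class.
class BucketedDvMaintainer {};

bool HasDvMaintainer(const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer) {
    return dv_maintainer != nullptr;
}

}  // namespace paimon_sketch
```

One caveat with this pattern: any code in the header that dereferences the pointer or calls a member function needs the complete type, so inline logic that uses the maintainer would have to move to the .cpp as well.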

Comment thread src/paimon/common/data/binary_row.cpp
Comment thread src/paimon/common/data/binary_row.cpp
Comment thread src/paimon/common/data/binary_row.cpp
Comment thread src/paimon/common/data/binary_string.cpp
Comment thread src/paimon/common/data/data_define.h
Comment thread src/paimon/common/data/binary_row.cpp
Comment thread src/paimon/common/data/binary_row.cpp
Comment thread src/paimon/common/sst/block_iterator.cpp Outdated
Comment thread src/paimon/common/sst/sst_file_reader.cpp Outdated
Comment thread src/paimon/common/sst/sst_file_reader.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/interval_partition_test.cpp Outdated
Comment thread src/paimon/core/mergetree/compact/lookup_merge_tree_compact_rewriter_test.cpp Outdated
lxy-9602 force-pushed the row-compact-string-view branch from c00cd5b to bb6b2ae on March 26, 2026 10:11
Collaborator

@lucasfang lucasfang left a comment

+1

@lxy-9602 lxy-9602 merged commit 6ad44ae into alibaba:main Mar 27, 2026
8 checks passed

4 participants