refactor: unify BinarySection classes to single MemorySegment model and use string_view to avoid copies #196
Pull request overview
Refactors serialization and compaction selection to reduce copies by using std::string_view in hot paths, and extends full-compaction picking logic to account for deletion vectors via BucketedDvMaintainer.
Changes:
- Enable use_view=true for InternalRow field getters and add string-view based write paths in RowCompactedSerializer / serializer utilities.
- Extend CompactStrategy::PickFullCompaction to accept a BucketedDvMaintainer and trigger rewrites when deletion vectors exist.
- Update/add unit tests to cover the new compaction selection behavior and adjust persist processor test setup.
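
The second change can be pictured with a minimal sketch. It is not the project's implementation: FileMeta, DvLookup, and PickFullCompactionSketch are hypothetical names, and the rule used here (when every file already sits at the top level, rewrite only the files that carry a deletion vector) is an assumption about what "trigger rewrites when deletion vectors exist" means.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical file descriptor; the real metadata type is not shown in this PR page.
struct FileMeta {
  std::string name;
  int level;
};

// Hypothetical stand-in for BucketedDvMaintainer: records which files have a deletion vector.
class DvLookup {
 public:
  void Add(const std::string& file_name) { files_.insert(file_name); }
  bool HasDeletionVector(const std::string& file_name) const {
    return files_.count(file_name) > 0;
  }

 private:
  std::set<std::string> files_;
};

// Assumed selection rule, for illustration only:
// - files spread over several levels: pick everything (ordinary full compaction);
// - all files already at the top level: pick only files with a deletion vector;
// - nothing to do: return std::nullopt.
std::optional<std::vector<FileMeta>> PickFullCompactionSketch(
    int num_levels, const std::vector<FileMeta>& files,
    const std::shared_ptr<DvLookup>& dv_maintainer) {
  assert(num_levels > 0);
  if (files.empty()) return std::nullopt;
  const int max_level = num_levels - 1;
  const bool all_at_max =
      std::all_of(files.begin(), files.end(),
                  [max_level](const FileMeta& f) { return f.level == max_level; });
  if (!all_at_max) return files;

  std::vector<FileMeta> to_rewrite;
  if (dv_maintainer) {
    for (const FileMeta& f : files) {
      if (dv_maintainer->HasDeletionVector(f.name)) to_rewrite.push_back(f);
    }
  }
  if (to_rewrite.empty()) return std::nullopt;
  return to_rewrite;
}
```

Passing the maintainer as a possibly null std::shared_ptr keeps existing call sites valid: a null pointer simply disables the deletion-vector branch.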
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| src/paimon/core/mergetree/lookup/persist_processor_test.cpp | Reworks test fixture setup to build KeyValue from Arrow/ColumnarRow instead of BinaryRowGenerator. |
| src/paimon/core/mergetree/compact/compact_strategy_test.cpp | Updates PickFullCompaction call sites for new signature and adds DV-maintainer coverage. |
| src/paimon/core/mergetree/compact/compact_strategy.h | Adds DV maintainer parameter and logic to include files with deletion vectors in full compaction. |
| src/paimon/common/data/serializer/row_compacted_serializer.h | Adds RowWriter::WriteStringView to write length-prefixed bytes from std::string_view. |
| src/paimon/common/data/serializer/row_compacted_serializer.cpp | Enables view-based getters and removes intermediate allocations/copies when serializing string/binary. |
| src/paimon/common/data/serializer/binary_serializer_utils.cpp | Switches STRING/BINARY serialization to pass views instead of copying. |
| src/paimon/common/data/generic_row.h | Adds conversion path when a GenericRow field is stored as std::string_view but API requires BinaryString. |
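
For the row_compacted_serializer.h entry, a rough sketch of writing length-prefixed bytes straight from a std::string_view may help. SketchWriter and ReadStringView are hypothetical names, and the fixed 4-byte host-order length prefix is an assumption chosen for brevity; the actual RowWriter encoding is not shown on this page and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

// Hypothetical writer, not the project's RowWriter.
class SketchWriter {
 public:
  // Appends a 4-byte length (host byte order, an assumption) followed by the bytes.
  // The view is read in place, so no temporary std::string is allocated.
  void WriteStringView(std::string_view value) {
    const auto len = static_cast<uint32_t>(value.size());
    const size_t old_size = buffer_.size();
    buffer_.resize(old_size + sizeof(len) + value.size());
    std::memcpy(buffer_.data() + old_size, &len, sizeof(len));
    if (!value.empty()) {
      std::memcpy(buffer_.data() + old_size + sizeof(len), value.data(), value.size());
    }
  }

  const std::vector<uint8_t>& buffer() const { return buffer_; }

 private:
  std::vector<uint8_t> buffer_;
};

// Reads one length-prefixed field back as a view into the buffer (no copy).
std::string_view ReadStringView(const std::vector<uint8_t>& buf, size_t offset) {
  uint32_t len = 0;
  assert(offset + sizeof(len) <= buf.size());
  std::memcpy(&len, buf.data() + offset, sizeof(len));
  assert(offset + sizeof(len) + len <= buf.size());
  return std::string_view(
      reinterpret_cast<const char*>(buf.data()) + offset + sizeof(len), len);
}
```

The returned view is only valid while the buffer is alive and not resized, which is the usual trade-off of view-based paths.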
Force-pushed from 3effb16 to dc9403e.
Copilot reviewed 66 out of 66 changed files in this pull request and generated 5 comments.
Copilot reviewed 66 out of 66 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (2)
src/paimon/core/mergetree/compact/compact_strategy.h:1 (reported twice)
- Including bucketed_dv_maintainer.h in this header can significantly increase compile-time coupling. Since the API only needs std::shared_ptr<BucketedDvMaintainer>, you can forward-declare class BucketedDvMaintainer; in the paimon namespace and move the heavy include to the corresponding .cpp (or wherever the type’s definition is actually required).
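
The forward-declaration suggestion could look roughly like the following. Both halves are shown in one block for illustration; HasDvMaintainer is a hypothetical function, and the empty BucketedDvMaintainer definition is a placeholder for whatever bucketed_dv_maintainer.h really declares.

```cpp
#include <cassert>
#include <memory>

// --- what the header would contain ---
namespace paimon {

// Forward declaration: enough to name std::shared_ptr<BucketedDvMaintainer>
// in a signature, so the header does not need the heavy include.
class BucketedDvMaintainer;

// Hypothetical API that only passes the pointer around.
bool HasDvMaintainer(const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer);

}  // namespace paimon

// --- what the .cpp would contain ---
namespace paimon {

// Placeholder standing in for #include "bucketed_dv_maintainer.h";
// the real class definition is not shown on this page.
class BucketedDvMaintainer {};

bool HasDvMaintainer(const std::shared_ptr<BucketedDvMaintainer>& dv_maintainer) {
  return dv_maintainer != nullptr;
}

}  // namespace paimon
```

The complete type is only needed where members are called or the object is created, so translation units that merely forward the pointer stop recompiling when the maintainer header changes.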
Force-pushed from c00cd5b to bb6b2ae.
Purpose
Linked issue: #93
This PR refactors the BinarySection classes (BinarySection, BinaryString, BinaryRow, BinaryArray, BinaryMap) from a multi-segment model (std::vector<MemorySegment>) to a single-segment model (MemorySegment), ensuring contiguous memory layout. It also introduces GetStringView / string_view-based read paths to avoid unnecessary data copies in serialization and comparison operations. CompactStrategy supports BucketedDvMaintainer.
Tests
CompactStrategyTest, TestPickFullCompaction
DataDefineTest, GetStringView
LookupMergeTreeCompactRewriterTest, TestRewriteWithDvAndAggForStringFields
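
As background for the GetStringView test, here is a small sketch of why the single-segment model described under Purpose allows a view-based read at all. Segment, CopyFromSegments, and ViewFromSegment are hypothetical names; Segment is a plain std::vector<char> standing in for MemorySegment, whose real interface is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

using Segment = std::vector<char>;  // hypothetical stand-in for MemorySegment

// Multi-segment model: the requested bytes may straddle segment boundaries,
// so they have to be gathered into a freshly allocated string.
std::string CopyFromSegments(const std::vector<Segment>& segments, size_t offset,
                             size_t len) {
  std::string out;
  out.reserve(len);
  for (const Segment& seg : segments) {
    if (out.size() == len) break;
    if (offset >= seg.size()) {  // skip segments before the start offset
      offset -= seg.size();
      continue;
    }
    const size_t n = std::min(len - out.size(), seg.size() - offset);
    out.append(seg.data() + offset, n);
    offset = 0;  // later segments are read from their beginning
  }
  return out;
}

// Single-segment model: the bytes are contiguous, so a view is sufficient.
std::string_view ViewFromSegment(const Segment& segment, size_t offset, size_t len) {
  assert(offset + len <= segment.size());
  return std::string_view(segment.data() + offset, len);
}
```

The view aliases the segment's memory, so it must not outlive the segment; that lifetime rule is the price of skipping the copy.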
API and Format
Blob::ArrowField() no longer takes the nullable parameter, since blobs do not currently support null values.
Generative AI tooling
Partially Generated-by: Claude-4.6-Opus