feat: add common data structures (BinarySection, BinaryString, InternalRow)#18
Conversation
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the contribution. I found one blocker that should be fixed before merge.
BinarySection::HIGHEST_FIRST_BIT is defined as 0x80L << 56. In C++, this left-shifts a signed integer into the sign bit, which is undefined behavior when the result is not representable as a signed long. This mask is used by ReadBinary to distinguish inline fixed-length data from variable-length data, so compiler-dependent behavior here can break binary/string decoding.
Please define these masks using unsigned arithmetic and cast after the shift, for example static_cast<int64_t>(0x80ULL << 56) (and preferably use unsigned masks consistently for the related 0x7F mask as well).
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the update. The previous signed-shift issue is fixed, and I did not find any remaining blocking issue in this round.
Purpose
No Linked issue.
Introduce
BinarySection— base class for binary data backed by a MemorySegment, providing offset/size-based access (binary_section.h/cpp)BinaryString— variable-length binary string with comparison, copy, hash, and substring operations (binary_string.h/cpp)DataDefine— variant type definitions (NullableDataField,DataField) for Paimon columnar data (data_define.h)DataGetters/DataSetters— abstract interfaces for typed columnar read/write access (data_getters.h,data_setters.h)InternalRow— abstract row interface with typed field accessors and field count (internal_row.h/cpp)InternalArray— abstract array interface for columnar array data (internal_array.h)InternalMap— abstract map interface wrapping key/value InternalArrays (internal_map.h)Tests
binary_section_test.cpp— BinarySection construction and memory accessbinary_string_test.cpp— BinaryString comparison, copy, hash, substring operationsinternal_row_test.cpp— InternalRow field access and JoinedRow correctnessAPI and Format
Documentation
Generative AI tooling