feat: introduce binary row format with reader/writer support#22
feat: introduce binary row format with reader/writer support#22lszskye wants to merge 4 commits into
Conversation
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the binary row implementation. This PR currently does not compile against its base branch because it includes and uses BinaryArray/BinaryMap, but src/paimon/common/data/binary_array.h and src/paimon/common/data/binary_map.h are not present in the PR or in main. Please either add those dependencies in this PR, rebase after the PR that introduces them is merged, or remove the nested array/map support from this change.
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the update. The previous missing BinaryArray/BinaryMap issue is fixed, but I found one remaining correctness blocker in the inline bytes writer path.
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the update. The previous inline bytes writer issue is fixed, but there is still one signed left-shift overflow issue in the newly added BinaryRow code.
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for the update. I re-reviewed the current PR head end-to-end and found two remaining blockers that should be fixed before merging.
| }; | ||
| return field_setter; | ||
| } | ||
| case arrow::Type::type::DECIMAL: { |
There was a problem hiding this comment.
This branch still will not handle arrow::decimal128(...). In Arrow C++, decimal128 reports arrow::Type::DECIMAL128 (the rest of this codebase also switches on DECIMAL128, e.g. DataDefine::VariantValueToLiteral and BinaryArrayWriter::GetElementSize tests). As written, CreateFieldSetter(arrow::decimal128(...)) falls through to the unsupported-type error, so the decimal cases added in binary_row_writer_test.cpp cannot work in a real build. Please switch this to DECIMAL128 (and add DECIMAL256 handling only if intended).
| arrow::internal::checked_cast<arrow::Decimal128Type*>(field_type.get()); | ||
| assert(decimal_type); | ||
| auto precision = decimal_type->precision(); | ||
| field_setter = [field_idx, precision](const VariantType& field, |
There was a problem hiding this comment.
For non-compact decimals, the setter captures only precision and then calls WriteDecimal with a Decimal value whose scale may differ from the Arrow field scale. WriteDecimal currently only asserts the precision and never checks or normalizes scale, but the reader reconstructs decimals with the schema scale. That means a value like Decimal(5, 3, 123) written to a decimal128(5, 2) field will be read back as 1.23 instead of the original 0.123 (or being rejected). Please capture the Arrow scale here and either validate value.Scale() == scale or rescale/reject before writing, so the stored unscaled bytes match the schema scale.
|
Re-reviewed the current head ( |
Purpose
Introduce the binary row data format, compatible with Java Paimon's
BinaryRowserialization format. This module provides efficient in-memory row representation with fixed-length and variable-length parts within a singleMemorySegment.Key components:
BinaryRow, supporting all Paimon data types.MemorySegment. ImplementsInternalRowfor reading all field types, supports null tracking via bit set, hashing, equality comparison, and copy operations.InternalArraybacked by a singleMemorySegment. Supports all Paimon data types. Provides bulk conversion methods (ToIntArray,ToLongArray, etc.) and factory methods (FromIntArray,FromLongArray).BinaryArray, inheriting fromAbstractBinaryWriter.InternalMapbacked by a singleMemorySegment.Tests
BinaryRowTestBinaryRowWriterTestBinaryArrayTestBinaryMapTest