Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am consuming Arrow data from a JVM producer (arrow-java's Data.exportArrayStream) via arrow::ffi_stream::ArrowArrayStreamReader. When a batch contains a Decimal128 column whose underlying buffer happens to land on an offset that is 8-byte aligned but not 16-byte aligned, ArrowArrayStreamReader::next panics inside ScalarBuffer::<i128>::from(Buffer):
panicked at arrow-buffer/src/buffer/scalar.rs:194:43:
Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type.
Before importing buffer through FFI, please make sure the allocation is aligned.
The producer is spec-conformant. The Arrow C Data Interface only recommends 8-byte alignment, and arrow-java's VectorUnloader and NettyAllocationManager only guarantee 8-byte alignment. The mismatch is on the consumer side: since Rust 1.77 / LLVM 18, align_of::<i128>() == 16 on x86 (it has always been 16 on ARM), so ScalarBuffer::<i128> requires 16-byte alignment when constructing typed arrays from imported ArrayData.
This is the same root cause as #5553 and PR #5554, which fixed it for the IPC reader by adding IpcReadOptions::require_alignment (triggering a realigning copy on import). The equivalent is missing from the C Data Interface readers.
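The alignment mismatch described above can be illustrated without any Arrow types. The following is a minimal std-only sketch; `Slab`, `is_aligned_to`, and `can_view_as` are hypothetical names introduced for illustration and are not part of arrow-rs.

```rust
use std::mem::align_of;

/// Backing storage whose start address is 16-byte aligned, standing in for
/// an allocator slab (hypothetical; not an arrow-rs or arrow-java type).
#[repr(align(16))]
pub struct Slab(pub [u8; 64]);

/// Returns true if the address of `ptr` is a multiple of `align` bytes.
pub fn is_aligned_to(ptr: *const u8, align: usize) -> bool {
    (ptr as usize) % align == 0
}

/// Returns true if a buffer starting at `ptr` satisfies the alignment
/// that the current target requires for `T`.
pub fn can_view_as<T>(ptr: *const u8) -> bool {
    is_aligned_to(ptr, align_of::<T>())
}
```

A pointer taken 8 bytes into a `Slab` passes `is_aligned_to(ptr, 8)` but fails `is_aligned_to(ptr, 16)`. On a target where `align_of::<i128>()` is 16, `can_view_as::<i128>` therefore rejects it, even though the producer met the 8-byte recommendation.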
Describe the solution you'd like
Call ArrayData::align_buffers() unconditionally inside arrow::ffi::from_ffi and arrow::ffi::from_ffi_and_data_type, after consume(). ArrowArrayStreamReader then inherits the fix automatically.
from_ffi is the right layer because:
- It's the FFI consume entry point; downstream typed-array construction is what panics, so the import path owns the repair.
- arrow-pyarrow already does this manually (arrow-pyarrow/src/lib.rs:368), so that workaround becomes unnecessary.
- Direct from_ffi callers hit the same panic today; fixing only the stream reader leaves them broken.
- align_buffers() is a no-op when buffers are already aligned, so well-behaved producers pay nothing.
This matches the IPC reader's default behavior (auto-realign; require_alignment is opt-in for zero-copy users) established in #5554.
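The "no-op when aligned, copy otherwise" behavior can be sketched on plain byte slices. This is only an illustration of the idea under simplifying assumptions: `view_or_realign` is a hypothetical helper, not the implementation of ArrayData::align_buffers(), which operates on Arrow buffers rather than slices.

```rust
use std::borrow::Cow;
use std::mem::{align_of, size_of};

/// Views `bytes` as native-endian `i128` values.
/// Borrows when the pointer already meets `i128` alignment; otherwise copies
/// the values into a freshly allocated (and therefore aligned) `Vec<i128>`.
/// Trailing bytes that do not form a whole `i128` are ignored.
pub fn view_or_realign(bytes: &[u8]) -> Cow<'_, [i128]> {
    let len = bytes.len() / size_of::<i128>();
    if (bytes.as_ptr() as usize) % align_of::<i128>() == 0 {
        // SAFETY: the pointer is aligned for i128, `len` whole values lie
        // inside `bytes`, every bit pattern is a valid i128, and the
        // returned slice borrows from `bytes`.
        Cow::Borrowed(unsafe {
            std::slice::from_raw_parts(bytes.as_ptr().cast::<i128>(), len)
        })
    } else {
        // Misaligned input: pay for one copy instead of panicking.
        Cow::Owned(
            bytes
                .chunks_exact(size_of::<i128>())
                .map(|chunk| {
                    let mut raw = [0u8; 16];
                    raw.copy_from_slice(chunk);
                    i128::from_ne_bytes(raw)
                })
                .collect(),
        )
    }
}
```

Callers that hand in an aligned slice get a borrowed view and pay nothing; only the misaligned case allocates. That is the trade-off this proposal asks the FFI import path to make by default.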
Spec basis
The 16-byte requirement is not in any Arrow spec — it is a consequence of Rust 1.77+ setting align_of::<i128>() == 16. The Columnar format only recommends 8- or 64-byte alignment for primitives; the C Data Interface goes further: "It is recommended, but not required, that the memory addresses of the buffers be aligned… Consumers MAY decide not to support unaligned memory." Auto-realigning on import is explicitly within bounds.
Describe alternatives you've considered
- Forcing the JVM producer to allocate decimal buffers with 16-byte alignment. Not portable: there is no alignment hook on arrow-java's BufferAllocator / NettyAllocationManager, and the spec only recommends 8-byte alignment for the producer.
- Wrapping ArrowArrayStreamReader in user code by replicating its internals (driving FFI_ArrowArrayStream::get_next directly, calling from_ffi, then align_buffers(), then building the typed batch). Workable, but it duplicates arrow-rs internals; every JVM-Arrow consumer hits this and ends up writing the same wrapper.
- Realigning post-import. Not possible from outside the reader because the panic happens inside ArrowArrayStreamReader::next before the caller sees a RecordBatch.
Additional context
Related:
- RawPtrBox::new #2882 / Copying inappropriately aligned buffer in ipc reader #2883 / Increase default IPC alignment to 64 (#2883) #2884: earlier discussion of buffer alignment on import.
- ArrayData::align_buffers() already implements the fix; it just needs to be invoked from the FFI import paths.
Reproducer shape: any JVM producer that exports a RecordBatch containing a Decimal128 column (or List<Decimal128> / Struct<..., Decimal128>) where the data buffer offset within its slab is 8 mod 16. Triggers ~50% of the time with arrow-java's default NettyAllocationManager.
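The ~50% figure is consistent with simple counting, assuming buffer offsets are spread evenly over the 8-byte-aligned positions in a slab. That assumption is mine; the real distribution depends on the allocator. A small sketch with a hypothetical helper `misaligned_fraction`:

```rust
/// Fraction of 8-byte-aligned offsets in `0..slab_len` that are 8 mod 16,
/// i.e. 8-byte aligned but not 16-byte aligned.
/// Assumes `slab_len > 0` and that every such offset is equally likely.
pub fn misaligned_fraction(slab_len: usize) -> f64 {
    let total = (0..slab_len).step_by(8).count();
    let bad = (0..slab_len)
        .step_by(8)
        .filter(|off| off % 16 == 8)
        .count();
    bad as f64 / total as f64
}
```

For a slab whose length is a multiple of 16, exactly every other 8-byte-aligned offset falls on 8 mod 16, so the fraction is one half.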