feat(ffi): opt-in buffer alignment in ArrowArrayStreamReader #9951
Closed
dataders wants to merge 1 commit into
Conversation
Author
Closing this in favor of fixing the producer side. The misaligned buffers are produced by the ADBC Snowflake C Data export path; the targeted fix is in dbt-labs/arrow-adbc#129 and will be shipped upstream there. If arrow-rs still wants a generic opt-in compatibility layer for other FFI producers, that can be re-opened or reframed separately, but this PR is no longer the root-cause fix for the Snowflake/ADBC issue.
alamb reviewed May 11, 2026
```rust
        Self::try_new(unsafe { FFI_ArrowArrayStream::from_raw(raw_stream) })
    }

    /// Configure whether buffers from imported arrays should be aligned, copying
```
Contributor
FWIW this seems reasonable to me
# Which issue does this PR close?

- Contributes to apache#7136

(See also the prior attempt #7137, which proposed the always-on form of this fix and was closed after maintainer feedback asking for an opt-in API. This PR implements the builder shape @alamb sketched in that thread.)
# Rationale for this change

Some Arrow C Data Interface producers, notably the Go ADBC drivers in wide use today (e.g. the Snowflake driver), emit data buffers whose pointers are not aligned to the underlying primitive type's alignment. When such buffers reach `arrow-rs`, downstream typed access in `ScalarBuffer::from` panics with `"Memory pointer from external source (e.g, FFI) is not aligned with the specified scalar type"`, taking the consumer process down.

We hit this in production reading real Snowflake `VARCHAR` columns through the Go ADBC Snowflake driver into a Rust consumer, and we are not the only ones; see e.g. apache/arrow-adbc#2526 and the linked downstream reports.

The previous attempt #7137 made `ArrowArrayStreamReader` unconditionally call `align_buffers()` on every imported batch. The discussion there (with @tustvold and @alamb) settled on making the fix opt-in: silently reallocating buffers behind users' backs hides bugs in producers and changes behavior for consumers who would prefer a hard error, but consumers who knowingly receive misaligned data should be able to ask `arrow-rs` to fix it up rather than panicking deep in their kernels. @alamb's sketch:

```rust
let stream_reader = ArrowArrayStreamReader::try_new(ffi_stream)
    .with_align_buffers(true);
```

This PR implements that.
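The failure mode above reduces to a pointer whose address is not a multiple of the element type's alignment. A minimal std-only sketch of that check (no arrow-rs types; `is_aligned_for` is an illustrative helper, not an arrow-rs API):

```rust
use std::mem::align_of;

/// Returns true if `ptr` is suitably aligned for reads of type `T`.
fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

fn main() {
    // Backing allocation made of i32s, so its start is i32-aligned.
    let backing: Vec<i32> = vec![0; 4];
    let bytes = backing.as_ptr() as *const u8;
    assert!(is_aligned_for::<i32>(bytes));

    // A one-byte offset into the same allocation cannot be i32-aligned,
    // which is the shape of buffer the misbehaving FFI producers emit.
    let misaligned = unsafe { bytes.add(1) };
    assert!(!is_aligned_for::<i32>(misaligned));
}
```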
# What changes are included in this PR?

- Add a private `align_buffers: bool` field to `ArrowArrayStreamReader`, defaulting to `false` so the existing behavior is preserved bit-for-bit.
- Add a public builder method `ArrowArrayStreamReader::with_align_buffers(self, bool) -> Self`.
- In `Iterator::next`, when the flag is set, call `ArrayData::align_buffers()` on the freshly imported `ArrayData` before constructing the `RecordBatch`. `align_buffers` only copies buffers that are actually misaligned, so adequately aligned producers pay no cost beyond the per-buffer alignment check.

The prior attempt #7137's diff was 7 lines and changed default behavior; this PR is opt-in only.
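The opt-in flag follows the consuming-builder pattern common in arrow-rs. A hypothetical, dependency-free mirror of the shape (`Reader` is a stand-in; the real `ArrowArrayStreamReader` wraps an `FFI_ArrowArrayStream` and is constructed from one):

```rust
/// Stand-in for ArrowArrayStreamReader: a reader with an opt-in
/// buffer-alignment flag that defaults to off.
struct Reader {
    align_buffers: bool,
}

impl Reader {
    fn try_new() -> Result<Self, String> {
        // Default is false, so existing behavior is preserved.
        Ok(Reader { align_buffers: false })
    }

    /// Builder-style setter: consumes self and returns it, so calls chain.
    fn with_align_buffers(mut self, align: bool) -> Self {
        self.align_buffers = align;
        self
    }
}

fn main() {
    let default_reader = Reader::try_new().unwrap();
    assert!(!default_reader.align_buffers);

    let opted_in = Reader::try_new().unwrap().with_align_buffers(true);
    assert!(opted_in.align_buffers);
}
```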
# Are these changes tested?

Yes. Two new tests in `arrow-array/src/ffi_stream.rs`:

- `test_unaligned_buffers_default_panics_on_typed_access`: fabricates an `FFI_ArrowArrayStream` whose first batch has a deliberately misaligned `Int32` data buffer (a one-byte-offset slice into an aligned backing buffer). Asserts via `#[should_panic]` that the default reader still panics on typed access, preserving the existing invariant that unaligned producers surface as detectable failures.
- `test_unaligned_buffers_with_align_buffers_opt_in`: same input, but the reader is configured with `with_align_buffers(true)`. Asserts that the imported buffer's pointer is aligned to `align_of::<i32>()`, that typed access (`.values()`) does not panic, and that the stream is exhausted after the single batch.

Both tests bypass the standard `FFI_ArrowArrayStream::new(reader)` helper because that helper materializes the source `RecordBatch` into typed arrays, which itself panics on misaligned input. Instead they build a minimal custom `FFI_ArrowArrayStream` whose `get_next` callback emits an `FFI_ArrowArray` constructed from an untyped `ArrayData`, closer to what a real foreign producer would do.

```
$ cargo test -p arrow-array --features ffi
test result: ok. 948 passed; 3 ignored; 0 failed
$ cargo clippy -p arrow-array --all-features --tests
(clean)
$ cargo fmt -p arrow-array --check
(clean)
```
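Fabricating a misaligned buffer for a test comes down to taking a one-byte-offset view into an aligned allocation. A std-only sketch of the trick (the real tests route this through an untyped `ArrayData`, which is not reproduced here):

```rust
use std::mem::align_of;

fn main() {
    // Back the bytes with i32s so the allocation start is i32-aligned.
    let backing: Vec<i32> = vec![0; 4];
    let bytes: &[u8] =
        unsafe { std::slice::from_raw_parts(backing.as_ptr() as *const u8, 16) };
    assert_eq!(bytes.as_ptr() as usize % align_of::<i32>(), 0);

    // A slice that skips one leading byte has a pointer off by one from
    // i32 alignment: exactly the shape the misalignment tests feed the reader.
    let misaligned = &bytes[1..13];
    assert_eq!(misaligned.as_ptr() as usize % align_of::<i32>(), 1);
}
```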
# Are there any user-facing changes?

Yes, but additive only: one new public method on `ArrowArrayStreamReader`. No existing API or default behavior changes. No breaking changes.

# Notes for reviewers

`with_align_buffers(mut self, bool) -> Self` mirrors the existing `ArrayDataBuilder::align_buffers(mut self, bool) -> Self` (data.rs:2222) for consistency. If you'd prefer a parameterless `with_align_buffers(self) -> Self`, that's an easy change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>