What happened?
PostgresCopyListFieldWriter::Write (c/driver/postgresql/copy/writer.h, both the IsFixedSize and variable-length branches) computes the child range for each row from the logical row index without adding array_view_->offset. When the parent List / LargeList / FixedSizeList array has offset > 0 (a sliced parent), the writer reads the wrong slot of the offsets buffer or, for fixed-size lists, multiplies the wrong base index by the element size. The resulting child ranges are still valid positions in the unsliced child values buffer, so nothing fails; list elements are simply attached to the wrong rows.
Practical impact: silent, per-row drift of list-column values when an Arrow table is sliced into multiple batches and ingested via adbc_ingest with mode="create" then mode="append". The first chunk (offset=0) is always correct; every subsequent chunk's list/array column is shifted by the chunk's parent.offset. Scalar columns are unaffected because their writers route through ArrowArrayViewGetInt*, which honors offset.
Reproduced end-to-end on the postgres-test service from this repo's compose.yaml against adbc-driver-postgresql 1.11.0, with pyarrow 23 and 24, for list<string>, large_list<string>, and fixed_size_list<string, 2>.
Stack Trace
No exception — silent data corruption.
How can we reproduce the bug?
docker compose up --detach --wait postgres-test, then:
import pyarrow as pa
from adbc_driver_postgresql import dbapi

URI = "postgresql://postgres:password@localhost:5432/postgres"
N, batch = 6, 3

def expected(i):
    # variable length so any drift breaks structure, not just values
    return [f"r{i}-a", f"r{i}-b"] if i % 2 == 0 else [f"r{i}-x"]

tbl = pa.table({
    "pk": pa.array(list(range(N))),
    "tags": pa.array([expected(i) for i in range(N)],
                     type=pa.large_list(pa.string())),
})

with dbapi.connect(URI) as conn, conn.cursor() as cur:
    cur.execute("DROP TABLE IF EXISTS adbc_list_bug")
    for i, off in enumerate(range(0, N, batch)):
        cur.adbc_ingest("adbc_list_bug", tbl.slice(off, batch),
                        mode="create" if i == 0 else "append")
    cur.execute("SELECT pk, tags FROM adbc_list_bug ORDER BY pk")
    for pk, tags in cur.fetchall():
        print(pk, tags, "OK" if tags == expected(pk) else "DRIFTED")
Observed: pk 0–2 correct, pk 3–5 drifted. Repeats identically with pa.list_(pa.string()) and pa.list_(pa.string(), 2).
The variable-length structure also drifts — pk=3 (expected 1 element) receives the 2-element value from row 0, which nails the diagnosis to "the offsets buffer is being misread" rather than "child values are shifted independently."
Root cause
In c/driver/postgresql/copy/writer.h, both template branches of PostgresCopyListFieldWriter::Write use the logical index directly:
if constexpr (IsFixedSize) {
  start = index * array_view_->layout.child_size_elements;
  end = start + array_view_->layout.child_size_elements;
} else {
  start = ArrowArrayViewListChildOffset(array_view_, index);
  end = ArrowArrayViewListChildOffset(array_view_, index + 1);
}
ArrowArrayViewListChildOffset (nanoarrow inline_array.h) is, unlike its sibling ArrowArrayViewGetIntUnsafe, not offset-aware — it reads buffer_views[1].data.as_int32[i] (or as_int64) directly. And the fixed-size branch never references offset either. So both branches misbehave when array_view_->offset > 0.
PyArrow's Table.slice(off, len) produces parent List / FixedSizeList arrays with array.offset = off, sharing the original offsets/child buffers, so any multi-batch adbc_ingest path (or any caller passing a sliced source) trips this.
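The misread described above can be modelled without Arrow at all. The following is a minimal pure-Python sketch, not driver code: `values`, `offsets`, `fixed_values`, `read_row`, and `read_fixed_row` are hypothetical names standing in for the child values buffer, the offsets buffer, and the two branches of the writer. It assumes the same six rows as the reproduction, with the second chunk seen as a parent at offset 3.

```python
# Pure-Python model of a sliced list array; no Arrow library is involved.
# Child values and offsets for the six rows used in the reproduction.
values = ["r0-a", "r0-b", "r1-x", "r2-a", "r2-b", "r3-x", "r4-a", "r4-b", "r5-x"]
offsets = [0, 2, 3, 5, 6, 8, 9]  # offsets[i]..offsets[i + 1] bounds row i

# Fixed-size variant: every row holds exactly two elements.
fixed_values = [f"r{i}-{s}" for i in range(6) for s in "ab"]


def read_row(index, parent_offset, honor_offset):
    """Variable-length branch: look up the child range in the offsets buffer."""
    slot = parent_offset + index if honor_offset else index
    return values[offsets[slot]:offsets[slot + 1]]


def read_fixed_row(index, parent_offset, honor_offset, size=2):
    """Fixed-size branch: multiply the base index by the element count."""
    slot = parent_offset + index if honor_offset else index
    return fixed_values[slot * size:(slot + 1) * size]


# The second chunk of the reproduction: rows 3..5 viewed as a slice at offset 3.
parent_offset, length = 3, 3
buggy = [read_row(i, parent_offset, False) for i in range(length)]
fixed = [read_row(i, parent_offset, True) for i in range(length)]

print("buggy:", buggy)  # rows 0..2 are emitted in place of rows 3..5
print("fixed:", fixed)
```

Ignoring the parent offset makes the first row of the second chunk read slot 0, which is how pk=3 ends up with the two-element value of row 0.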
Suggested fix
// index is the logical row; adding the parent's offset gives the physical slot.
const int64_t physical = array_view_->offset + index;
if constexpr (IsFixedSize) {
  start = physical * array_view_->layout.child_size_elements;
  end = start + array_view_->layout.child_size_elements;
} else {
  start = ArrowArrayViewListChildOffset(array_view_, physical);
  end = ArrowArrayViewListChildOffset(array_view_, physical + 1);
}
Built locally with that change, .so swapped into the unmodified wheel — all three list types ingest correctly across multi-chunk slices in the same venv where the unpatched wheel drifts.
Workaround (driver-user side)
Pass non-sliced inputs only. Table.combine_chunks() and pa.concat_tables([sliced]) are not sufficient — they short-circuit for a single-chunk slice and preserve offset > 0. Per-column ChunkedArray.combine_chunks() (or Table.from_arrays([c.combine_chunks() for c in t.columns], names=…)) does materialize and reset offsets to 0.
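To tell whether a given input is affected before ingesting it, one option is to scan for chunks with a nonzero offset. The helper below is a hypothetical sketch, not part of ADBC or PyArrow: `sliced_columns` is an invented name, and it only assumes the object exposes `column_names`, `columns`, per-column `chunks`, and a per-chunk `offset`, as a pyarrow.Table does. It is deliberately conservative and reports every column, although only list columns are affected by this bug.

```python
def sliced_columns(table):
    """Return (column_name, chunk_index, offset) for each chunk with offset > 0.

    `table` is assumed to look like a pyarrow.Table: `column_names`, `columns`,
    each column with `chunks`, each chunk with an integer `offset`.
    """
    found = []
    for name, column in zip(table.column_names, table.columns):
        for chunk_index, chunk in enumerate(column.chunks):
            if chunk.offset > 0:
                found.append((name, chunk_index, chunk.offset))
    return found
```

With the reproduction table, `sliced_columns(tbl.slice(3, 3))` should report an offset of 3 for both columns, while an empty result suggests the table can be passed to adbc_ingest as-is.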
Environment/Setup
- adbc-driver-postgresql 1.11.0 (also reproduces on main)
- pyarrow 23.0.1 and 24.0.0
- macOS arm64; the postgres-test container from this repo's compose.yaml