Description
RecordBatch.numRows returns 0 for zero-column batches deserialized from IPC, even when the IPC message header specifies a non-zero length. Zero-column RecordBatches are valid Arrow payloads and should preserve their row count through serialization round-trips.
Current behavior
- An IPC stream contains a zero-column RecordBatch with length: 100.
- The IPC reader correctly passes header.length to makeData, and visitStruct preserves it.
- But new RecordBatch(schema, data) calls ensureSameLengthData, which recomputes the length as chunks.reduce((max, col) => Math.max(max, col.length), 0). With zero children, this returns 0.
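The last step can be seen in isolation. The following is a minimal sketch that does not use apache-arrow; `ChunkLike` and `inferMaxLength` are hypothetical names introduced only to mirror the reduce expression quoted above.

```typescript
// Hypothetical stand-in for a column chunk; only `length` matters here.
interface ChunkLike {
    length: number;
}

// Mirrors the length computation quoted in this issue:
// take the maximum child length, starting from 0.
function inferMaxLength(chunks: ChunkLike[]): number {
    return chunks.reduce((max, col) => Math.max(max, col.length), 0);
}

// With at least one column, the row count is recovered from the children.
const withColumns = inferMaxLength([{ length: 100 }, { length: 100 }]); // 100

// With zero columns there is nothing to reduce over, so the initial value 0
// is returned and the length carried by the IPC header is lost.
const zeroColumns = inferMaxLength([]); // 0
```

Because the reduce only looks at children, a batch without children has no way to report the row count it was created with.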
Expected behavior
RecordBatch.numRows should be 100 after deserializing a zero-column batch with length: 100.
Reproducer
import { makeData, RecordBatch, Schema, Struct } from 'apache-arrow';
const schema = new Schema([]);
const data = makeData({ type: new Struct([]), length: 100, nullCount: 0, children: [] });
const batch = new RecordBatch(schema, data);
console.log(batch.numRows); // 0 (expected 100)
Root cause
In src/recordbatch.ts line 84, the 2-arg constructor path calls:
[this.schema, this.data] = ensureSameLengthData<T>(this.schema, this.data.children as Data<T[keyof T]>[]);
ensureSameLengthData (line 323) defaults maxLength to chunks.reduce(...), which returns 0 for an empty array.
The 1-arg constructor path (line 102) already passes length explicitly and does not have this bug.
Fix
Pass this.data.length as the third argument to ensureSameLengthData, which already accepts an optional maxLength parameter.
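As a rough illustration of why the extra argument is enough, here is a simplified model of the length resolution. It is a sketch, not the library's code: `ColumnLike` and `resolveLength` are hypothetical, and the real ensureSameLengthData also returns the schema and data, which are omitted here.

```typescript
// Hypothetical stand-in for a child Data object; only `length` is modeled.
interface ColumnLike {
    length: number;
}

// Simplified model of the optional maxLength parameter described above:
// when the caller omits it, it falls back to the maximum child length.
function resolveLength(
    chunks: ColumnLike[],
    maxLength: number = chunks.reduce((max, col) => Math.max(max, col.length), 0)
): number {
    return maxLength;
}

// Length taken from the IPC message header in the scenario from this issue.
const headerLength = 100;

// Current 2-arg constructor path: no explicit length, zero children.
const beforeFix = resolveLength([]); // 0

// Proposed path: the batch's own data length is passed explicitly, i.e.
// ensureSameLengthData<T>(this.schema, this.data.children as Data<T[keyof T]>[], this.data.length)
const afterFix = resolveLength([], headerLength); // 100
```

Batches that do have columns should be unaffected as long as this.data.length already matches the longest child, since the explicit value then equals what the fallback would have computed.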