Description
When reading IPC data with column projection enabled, skipping a Union column encoded with V4 metadata can lead to buffer misalignment and incorrect decoding of subsequent columns.
Root Cause
In arrow-ipc/src/reader.rs, skip_field does not correctly handle the buffer layout of Union types for V4.
Current implementation:
```rust
Union(fields, mode) => {
    self.skip_buffer(); // Nulls
    match mode {
        UnionMode::Dense => self.skip_buffer(),
        UnionMode::Sparse => {}
    };
    ...
}
```
However, in the V4 layout a Union includes:
- a null (validity) buffer
- a type_ids buffer
- an offsets buffer (dense mode only)
create_array, by contrast, consumes these buffers correctly:
```rust
if self.version < MetadataVersion::V5 {
    self.next_buffer()?; // null
}
let type_ids = self.next_buffer()?; // type_ids
// optionally offsets for dense
```
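So the current skip_field logic skips only one buffer before the optional offsets buffer. With V4 metadata that buffer is the null buffer, so type_ids is never skipped, and every buffer read after the skipped Union column is shifted by one position.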
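To make the off-by-one concrete, here is a minimal Rust model that does not use arrow-ipc at all: each buffer in the V4 message body is just a label, and skipping is a cursor advance. It assumes the union in the reproduction is sparse and that each Int32 field occupies the usual two buffers (validity, values); the function names are hypothetical.

```rust
/// Hypothetical model of the V4 body for the reproduction schema:
/// a sparse Union<Int32> column followed by a projected Int32 column.
fn v4_stream() -> Vec<&'static str> {
    vec![
        "union.null",     // V4-only validity buffer of the union
        "union.type_ids", // one i8 type id per slot
        "child.null",     // Int32 child of the union: validity
        "child.values",   // Int32 child of the union: values
        "values.null",    // projected Int32 column: validity
        "values.values",  // projected Int32 column: values
    ]
}

/// Mirrors the current skip logic for a sparse union of one Int32 child:
/// one buffer ("Nulls"), no offsets, then two buffers for the child.
fn buggy_skip_sparse_union_of_int32(mut cursor: usize) -> usize {
    cursor += 1; // self.skip_buffer(); // Nulls
    // UnionMode::Sparse => {} (no offsets buffer)
    cursor += 2; // Int32 child: validity + values
    cursor
}

/// What a V4-aware skip must consume: null, type_ids, then the child.
fn fixed_skip_sparse_union_of_int32(mut cursor: usize) -> usize {
    cursor += 2; // null + type_ids
    cursor += 2; // Int32 child: validity + values
    cursor
}

/// The (validity, values) labels an Int32 column would read at `cursor`.
fn read_int32(stream: &[&'static str], cursor: usize) -> (&'static str, &'static str) {
    (stream[cursor], stream[cursor + 1])
}
```

With the buggy skip, the projected column reads ("child.values", "values.null"), i.e. the union child's values as its validity and its own validity bitmap as its values; with the fixed skip it reads ("values.null", "values.values").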
Impact
This can lead to:
- incorrect decoding of subsequent columns
- runtime errors (e.g., invalid buffer sizes)

It only occurs when:
- projection is enabled
- a Union column is skipped
- the IPC metadata version is V4
Reproduction
A minimal test case:
```rust
// Schema:
//   union: Union<Int32> (skipped)
//   values: Int32 (projected)

// alignment 8, non-legacy format, V4 metadata
let options = IpcWriteOptions::try_new(8, false, MetadataVersion::V4)?;
let mut writer = FileWriter::try_new_with_options(..., options)?;

// Project only column 1 ("values"), so the union column is skipped
let reader = FileReader::try_new(cursor, Some(vec![1]))?;
```
Before the fix, reading fails with:
```
InvalidArgumentError("Need at least 12 bytes in buffers[0] in array of type Int32, but got 1")
```
Proposed Fix
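Update skip_field to match the actual buffer layout, consuming the same buffers in the same order as create_array: the null buffer (V4 only), the type_ids buffer, and the offsets buffer (dense mode only).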
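A minimal sketch of the corrected Union arm, written against a hypothetical MockReader that only counts skipped nodes and buffers. MockReader, Field, and the local MetadataVersion/UnionMode enums are illustrative stand-ins rather than arrow-ipc's types, and the Int32 arm assumes the standard two-buffer (validity, values) layout.

```rust
// Hypothetical local mirrors of the arrow enums (NOT the real arrow types).
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
enum MetadataVersion {
    V4,
    V5,
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum UnionMode {
    Sparse,
    Dense,
}

// Just enough of a schema to exercise the Union arm.
enum Field {
    Int32,
    Union(Vec<Field>, UnionMode),
}

// Stand-in for the IPC array reader: it only counts what it skips.
struct MockReader {
    version: MetadataVersion,
    nodes_skipped: usize,
    buffers_skipped: usize,
}

impl MockReader {
    fn skip_node(&mut self) {
        self.nodes_skipped += 1;
    }

    fn skip_buffer(&mut self) {
        self.buffers_skipped += 1;
    }

    /// Skips a field, consuming buffers in the same order create_array reads them.
    fn skip_field(&mut self, field: &Field) {
        self.skip_node();
        match field {
            Field::Int32 => {
                self.skip_buffer(); // validity
                self.skip_buffer(); // values
            }
            Field::Union(children, mode) => {
                // Pre-V5 metadata writes a validity buffer for unions.
                if self.version < MetadataVersion::V5 {
                    self.skip_buffer(); // null
                }
                self.skip_buffer(); // type_ids
                if *mode == UnionMode::Dense {
                    self.skip_buffer(); // offsets
                }
                for child in children {
                    self.skip_field(child);
                }
            }
        }
    }
}
```

With V4 metadata, skipping a sparse Union<Int32> consumes four buffers (null, type_ids, and the child's validity and values), so the next projected column starts at its own validity buffer. Gating the null buffer on the version, rather than always skipping one buffer, keeps the V5 path unchanged.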