ListArray.values doesn't take offset into consideration #37733

@0x26res

Describe the bug, including details regarding any error messages, version, and platform.

According to the doc for Array.offset:

A relative position into another array’s data.

The purpose is to enable zero-copy slicing. This value defaults to zero but must be applied on all operations with the physical storage buffers.

Note in particular that the offset "must be applied on all operations with the physical storage buffers."

I'm wondering whether it should therefore also be applied to ListArray.values.

Here's an example:

import pyarrow as pa

values = [[1], [1, 2], [1, 2, 3]]
array = pa.array(values)
assert array.to_pylist() == values
assert array.values.to_pylist() == [1, 1, 2, 1, 2, 3]

sliced = array[1:]  # renamed from `slice` to avoid shadowing the builtin
assert sliced.to_pylist() == [[1, 2], [1, 2, 3]]
assert sliced.values == array.values  # Passes, but looks wrong: should skip the first value

The workaround is to calculate the values offset myself, by looking at ListArray.offsets at position ListArray.offset, but it's not straightforward.

Alternatively, if ListArray.values isn't going to respect ListArray.offset, that should be documented.

Tested on pyarrow==13.0.0

Component(s)

Python
