pyarrow module can't roundtrip tensor arrays #4805

Closed
wjones127 opened this issue Sep 9, 2023 · 0 comments · Fixed by #4806
Labels
arrow Changes to the arrow crate bug python

Comments

@wjones127
Member

Describe the bug

PyArrow segfaults when a tensor array (a kind of extension array) is exported as part of a record batch. The segfault does not occur if the same batch is exported as a stream.

To Reproduce

The following test will fail in arrow-pyarrow-integration-testing/tests/test_sql.py:

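# These imports are already present at the top of test_sql.py.
import pyarrow as pa
import arrow_pyarrow_integration_testing as rust

def test_tensor_array():
    tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
    # 18 values: one 2x3 tensor holding 1.0..6.0, then two tensors whose
    # elements are all null.
    inner = pa.array([float(x) for x in range(1, 7)] + [None] * 12, pa.float32())
    storage = pa.FixedSizeListArray.from_arrays(inner, 6)
    f32_array = pa.ExtensionArray.from_storage(tensor_type, storage)

    # Round-tripping as an array gives back storage type, because arrow-rs has
    # no notion of extension types.
    b = rust.round_trip_array(f32_array)
    assert b == f32_array.storage

    # Round-tripping as a record batch should preserve the extension type.
    # This is the step reported to segfault.
    batch = pa.record_batch([f32_array], ["tensor"])
    b = rust.round_trip_record_batch(batch)
    assert b == batch

    del b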

Expected behavior

The record batch should round-trip successfully, with the tensor extension type preserved.

Additional context

A record batch is exported by exporting each of its arrays individually, which separates extension arrays from their metadata. I suspect PyArrow segfaults because it receives a plain storage array and is later told by the final schema that the column is an extension type.
