[Python]pa.FixedShapeTensorArray.to_numpy_ndarray fails on sliced arrays #35573

Closed
mariosasko opened this issue May 12, 2023 · 1 comment · Fixed by #36164

mariosasko commented May 12, 2023

Describe the bug, including details regarding any error messages, version, and platform.

Minimal reproducible example (MRE):

```python
import pyarrow as pa
import numpy as np

np_arr = np.random.randn(100, 3, 4)
pa_arr = pa.FixedShapeTensorArray.from_numpy_ndarray(np_arr)
pa_arr[:2].to_numpy_ndarray()  # fails on the sliced array
```

The error stems from the following line:

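```python
np_flat = np.asarray(self.storage.values)
```

`.values` ignores the slicing offset, so it returns the child values of the whole unsliced array (1,200 values here) instead of only the 24 that belong to the two sliced tensors.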
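To see why an offset-unaware accessor breaks the conversion, here is a toy, pure-Python model of the fixed-size list layout. `FixedSizeListModel` and its methods are hypothetical illustrations, not pyarrow classes; the only assumption is that a slice shares its parent's child values and records an offset and a length, which is how zero-copy Arrow slices work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FixedSizeListModel:
    """Hypothetical toy model of an Arrow fixed-size list array (not pyarrow)."""

    child: List[int]              # child values, shared by all slices
    list_size: int                # child values per element
    offset: int = 0               # first visible element
    length: Optional[int] = None  # visible elements (None: all after offset)

    def __post_init__(self) -> None:
        if self.length is None:
            self.length = len(self.child) // self.list_size - self.offset

    def __len__(self) -> int:
        return self.length

    def slice(self, start: int, stop: int) -> "FixedSizeListModel":
        # Zero-copy slice: only offset and length change; the child is shared.
        stop = min(stop, self.length)
        return FixedSizeListModel(
            self.child, self.list_size, self.offset + start, stop - start
        )

    @property
    def values(self) -> List[int]:
        # Analogous to `.values`: the whole child, ignoring offset and length.
        return self.child

    def flatten(self) -> List[int]:
        # Analogous to `.flatten()`: only the child values of visible elements.
        begin = self.offset * self.list_size
        end = (self.offset + self.length) * self.list_size
        return self.child[begin:end]


full = FixedSizeListModel(child=list(range(100 * 12)), list_size=12)
head = full.slice(0, 2)
print(len(head.values))     # 1200: too many values to reshape to (2, 3, 4)
print(len(head.flatten()))  # 24 == 2 * 3 * 4
```

The model keeps the child shared between slices on purpose: that is what makes slicing cheap, and also why any accessor that returns the raw child has to be combined with the offset by the caller.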

Component(s)

Python

mariosasko changed the title from "pa.FixedShapeTensorArray.to_numpy_ndarray fails on sliced arrays" to "[Python]pa.FixedShapeTensorArray.to_numpy_ndarray fails on sliced arrays" May 12, 2023
AlenkaF (Member) commented May 15, 2023

Thank you for reporting the issue!
You are correct. It seems `.storage` holds the correct sliced part of the array, while `.storage.values` does not:

```
(Pdb) len(pa_arr.storage.values)
1200
(Pdb) len(pa_arr.storage)
100
```

So this looks like a bug in `pyarrow.FixedSizeListArray.values`. The same thing happens with a variable-size `pa.list_` array (`pyarrow.ListArray.values`):

```python
>>> import pyarrow as pa
>>> arr = pa.array([[1, 2, 3], [], None, [4]], type=pa.list_(pa.int32()))
>>> arr.values
<pyarrow.lib.Int32Array object at 0x128665f60>
[
  1,
  2,
  3,
  4
]
>>> arr[:2].values
<pyarrow.lib.Int32Array object at 0x1286661a0>
[
  1,
  2,
  3,
  4
]
```

I think issue #35360 has the same underlying problem.

AlenkaF self-assigned this Jun 19, 2023
AlenkaF added this to the 13.0.0 milestone Jun 19, 2023
jorisvandenbossche pushed a commit that referenced this issue Jun 29, 2023
… sliced arrays (#36164)

### Rationale for this change
`pa.FixedShapeTensorArray.to_numpy_ndarray` fails if called on a sliced `FixedShapeTensorArray`.

### What changes are included in this PR?
The use of `pyarrow.FixedSizeListArray.values` is replaced with `pyarrow.FixedSizeListArray.flatten()` in `FixedShapeTensorArray.to_numpy_ndarray`.

### Are these changes tested?
Yes, a test is added to `python/pyarrow/tests/test_extension_type.py`.

### Are there any user-facing changes?
No.
* Closes: #35573

Authored-by: AlenkaF <frim.alenka@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>