[Python] List array conversion to Numpy N-d array #29892
Comments
Antoine Pitrou / @pitrou: However, it would be nice if this could work with a fixed-size-list array:
>>> pa.array([[1,2,3],[4,5,6]], type=pa.list_(pa.int32(), 3))
<pyarrow.lib.FixedSizeListArray object at 0x7f787e2e8ac0>
[
  [
    1,
    2,
    3
  ],
  [
    4,
    5,
    6
  ]
]
>>> pa.array([[1,2,3],[4,5,6]], type=pa.list_(pa.int32(), 3)).to_numpy()
Traceback (most recent call last):
[...]
ArrowInvalid: Needed to copy 1 chunks with 0 nulls, but zero_copy_only was True
Ben Epstein:
pa.array([[1,2,3],[4,5,6]], type=pa.list_(pa.int32(), 3)).to_numpy(zero_copy_only=False).astype(np.int32)
# ValueError: setting an array element with a sequence.
Does the Arrow team plan on building an internal fix for this?
There has been recent discussion around support for tensors as a canonical extension type. That might be a better and more general-purpose solution for this request. I am not as familiar with all of the nuances involved, but I would encourage you to take a look at [1][2][3] and see if that proposal could work.
To be explicit, there is no "internal" fix to be done, as this conversion is already possible zero-copy while preserving the dtype, if you convert the flat values and reshape them (i.e. what Antoine showed above).
So it is more a question of which user-facing API we provide for this. Do we expect users to do this themselves, or do we want to add some "to_numpy_2d" method to FixedSizeListArray that does it for them? But @westonpace makes a good point that the FixedShapeTensorArray extension type that is being added might be interesting, depending on your exact use case. The pyarrow API for that still needs to be finalized and merged, but we were planning to add a
**Edit**: a StructArray, or just 3 arrays/vectors, might be better. Wouldn't it be better to convert an Arrow tensor type into an Arrow list of structs? What is missing in the solution above are the names for x (3 columns) and y (2 rows).
Kids 1 | 2 | 3
OR which might be more "searchable"
When converting a single-dimensional array to numpy, the dtype is preserved.
But when doing the same for a multi-dimensional (nested list) array, the dtype is lost and cannot be set manually.
Which is to say that numpy believes this array is not uniform. The only way to get it to the proper dtype is to convert it to a Python list and then back to a numpy array.
Is there another way to achieve this? Or, at least, can it be fixed such that we can manually set the dtype of the numpy array after conversion?
I know that pyarrow doesn't support converting ndarrays with ndim>1 (https://issues.apache.org/jira/browse/ARROW-5645), but I was curious whether this can be achieved going the other way.
Reporter: Ben Epstein
Note: This issue was originally created as ARROW-14320. Please see the migration documentation for further details.