[Python] List<Extension> arrays aren't supported in to_pandas calls #32791
Comments
Chang She / @changhiskhan:
For computer vision data this is actually fairly important, as very often we have a list-of-labels or list-of-Box2d per row (image).
Joris Van den Bossche / @jorisvandenbossche:
On the one hand, that would be consistent with StructArray, where we also fall back to the storage array at the moment. On the other hand, if we want to solve this more "properly" later, that would mean another change in behaviour.
Joris Van den Bossche / @jorisvandenbossche:
For other list arrays, we actually do not convert to lists but to numpy arrays:

In [3]: pa.array([[1, 2], [3, 4, 5]]).to_numpy(zero_copy_only=False)
Out[3]: array([array([1, 2]), array([3, 4, 5])], dtype=object)
In [4]: pa.array([[1, 2], [3, 4, 5]]).to_pandas().values
Out[4]: array([array([1, 2]), array([3, 4, 5])], dtype=object)

So it could also be an option to keep using arrays, instead of using lists in case of ExtensionType. And then, if we can somehow convert the list array's values to a single array (calling into the Python
That actually brings up a question: if an ExtensionType defines an ExtensionScalar (but not an associated pandas dtype, or custom to_numpy conversion), should we use this scalar's Because currently, if you have an ExtensionArray like that (for example, using the example from the docs: https://arrow.apache.org/docs/dev/python/extending_types.html#custom-scalar-conversion), we still use the storage type conversion for to_numpy/to_pandas, and only use the scalar's conversion in
Joris Van den Bossche / @jorisvandenbossche:
Micah Kornfield / @emkornfield:
EXTENSION is not in the list of types allowed. I think that in order to enable EXTENSION we need to be able to call to_pylist or similar on the original extension array from C++ code, in case there were user-provided overrides. Off the top of my head, one way of doing this would be to pass through an additional std::unordered_map<Array*, PyObject*>, where the PyObject is the bound to_pylist Python function. Are there other alternatives that might be cleaner?
Reporter: Micah Kornfield / @emkornfield
Related issues:
Note: This issue was originally created as ARROW-17535. Please see the migration documentation for further details.