
[C++][Python] PyArrow crash when filter/take empty Extension array #29137

Closed
asfimport opened this issue Jul 28, 2021 · 4 comments


@asfimport
Collaborator

PyArrow crashes when `filter` or `take` is applied to an extension array that is already empty.

The bug can be reproduced with the documentation example:

```python
import pyarrow as pa


class Point3DArray(pa.ExtensionArray):
    def to_numpy_array(self):
        # View the fixed-size-list storage as an (N, 3) NumPy array.
        return self.storage.flatten().to_numpy().reshape((-1, 3))


class Point3DType(pa.PyExtensionType):
    def __init__(self):
        # Storage: a fixed-size list of 3 float32 values per point.
        pa.PyExtensionType.__init__(self, pa.list_(pa.float32(), 3))

    def __reduce__(self):
        return Point3DType, ()

    def __arrow_ext_class__(self):
        return Point3DArray


storage = pa.array([[1, 2, 3], [4, 5, 6]], pa.list_(pa.float32(), 3))
arr = pa.ExtensionArray.from_storage(Point3DType(), storage)
# Filtering out every element yields an empty extension array.
arr = arr.filter(pa.array([False, False]))

# Crashing here...
arr.filter(pa.array([], pa.bool_()))
# Crashing as well...
arr.take(pa.array([], pa.int32()))
```

The underlying issue seems to be that the `nulls` function is not implemented for extension types in the C++ codebase (see the generic `Status Visit(const DataType& type)` overload).

Environment: Python 3.7, Ubuntu 20.04
Reporter: Paul Balanca / @balancap
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

Note: This issue was originally created as ARROW-13474. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
@balancap Thanks for the clear bug report! I can confirm the issue, and your analysis looks correct. Are you looking into a fix for this?

@asfimport
Collaborator Author

Paul Balanca / @balancap:
Thanks @jorisvandenbossche. I don't want to over-promise: the next two weeks are a bit busy for me, but I will hopefully have more time after that. There are a couple of methods that PyArrow does not currently support on extension types (e.g. `nulls`, `concat_arrays`, `combine_chunks`) that I would be interested in implementing for our internal tooling at FiveAI.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
I opened a PR to fix this (so we can get it into the next release).

@asfimport
Collaborator Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request #11227.
