[Python] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas #40659

Closed
Tracked by #34701
AlenkaF opened this issue Mar 19, 2024 · 1 comment

AlenkaF commented Mar 19, 2024

Describe the enhancement requested

We want to enable the conversion of Run-End Encoded arrays to numpy and pandas. For example, this should not fail:

In [1]: import pyarrow as pa
   ...: import pyarrow.compute as pc
   ...: 
   ...: arr = pc.run_end_encode([1, 1, 2, 3, 3, 3, 6])
   ...: arr.to_numpy()
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In[1], line 5
      2 import pyarrow.compute as pc
      4 arr = pc.run_end_encode([1, 1, 2, 3, 3, 3, 6])
----> 5 arr.to_numpy()

File ~/repos/arrow/python/pyarrow/array.pxi:1585, in pyarrow.lib.Array.to_numpy()
   1583 
   1584         with nogil:
-> 1585             check_status(ConvertArrayToPandas(c_options, self.sp_array,
   1586                                               self, &out))
   1587 

File ~/repos/arrow/python/pyarrow/error.pxi:91, in pyarrow.lib.check_status()
     89     return -1
     90 
---> 91 raise convert_status(status)
     92 
     93 

ArrowNotImplementedError: No known equivalent Pandas block for Arrow data of type run_end_encoded<run_ends: int32, values: int64> is known.
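For context, the type in the error message (`run_ends: int32, values: int64`) reflects how a run-end encoded array is laid out: one child holds the distinct value of each run, the other holds the logical index at which each run ends. The pure-Python sketch below only illustrates that layout using the input from the example above. The names `encode_runs` and `decode_runs` are hypothetical and are not part of pyarrow.

```python
def encode_runs(values):
    """Illustrative run-end encoding: returns (run_ends, run_values)."""
    run_ends, run_values = [], []
    for i, v in enumerate(values):
        if run_values and run_values[-1] == v:
            # Same value as the current run: extend the run's end index.
            run_ends[-1] = i + 1
        else:
            # New value: start a new run ending (exclusively) at i + 1.
            run_values.append(v)
            run_ends.append(i + 1)
    return run_ends, run_values


def decode_runs(run_ends, run_values):
    """Inverse of encode_runs: expand each run back to its full length."""
    out, start = [], 0
    for end, v in zip(run_ends, run_values):
        out.extend([v] * (end - start))
        start = end
    return out


run_ends, run_values = encode_runs([1, 1, 2, 3, 3, 3, 6])
print(run_ends, run_values)
```

Because the logical values are stored only implicitly, a dense numpy or pandas representation has to be produced by expanding the runs first.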

Component(s)

Python

AlenkaF self-assigned this Mar 19, 2024
jorisvandenbossche added a commit that referenced this issue Mar 21, 2024
…ray to numpy/pandas (#40661)

### Rationale for this change

We want to enable the conversion of Run-End Encoded arrays to numpy and pandas.

### What changes are included in this PR?

For a RunEndEncodedArray, we first decode the array and then convert the decoded array to numpy or pandas.

### Are these changes tested?

Yes, in Python.

### Are there any user-facing changes?

No.
* GitHub Issue: #40659

Lead-authored-by: AlenkaF <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
jorisvandenbossche added this to the 16.0.0 milestone Mar 21, 2024
@jorisvandenbossche

Issue resolved by pull request 40661
#40661
