[Python] __arrow_array__ does not work for ExtensionTypes in Table.from_pandas #23334

Closed
asfimport opened this issue Oct 29, 2019 · 2 comments

@asfimport

When a custom ExtensionType is defined in Python, together with an array class that converts to it through __arrow_array__, the conversion in pyarrow works for the array class itself, but not yet when that array is stored in a pandas DataFrame.

For example, using my definition of ArrowPeriodType in pandas-dev/pandas#28371, I see:

In [15]: pd_array = pd.period_range("2012-01-01", periods=3, freq="D").array

In [16]: pd_array
Out[16]:
<PeriodArray>
['2012-01-01', '2012-01-02', '2012-01-03']
Length: 3, dtype: period[D]

In [17]: pa.array(pd_array)
Out[17]:
<pyarrow.lib.ExtensionArray object at 0x7f657cf78768>
[
  15340,
  15341,
  15342
]

In [18]: df = pd.DataFrame({'periods': pd_array})

In [19]: pa.table(df)
...
ArrowInvalid: ('Could not convert 2012-01-01 with type Period: did not recognize Python value type when inferring an Arrow data type', 'Conversion failed for column periods with type period[D]')

(This works correctly for array objects whose __arrow_array__ returns a built-in pyarrow Array.)

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-7022. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
In the end, this turned out to be unrelated to the fact that these arrays convert to an Arrow ExtensionType array. It was a bug specific to pandas' Interval and Period types, which have somewhat inconsistent (historical) behaviour for Series.values: they return an object ndarray instead of the extension array.
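
The inconsistency can be seen with pandas alone. The snippet below is a small illustration, assuming a pandas version where `Series.array` is available; it only compares the two accessors and does not show the pyarrow fix itself.

```python
import pandas as pd

s = pd.Series(pd.period_range("2012-01-01", periods=3, freq="D"))

# Series.values falls back to an object ndarray of Period scalars,
# so the __arrow_array__ method on the extension array is never reached.
values = s.values

# Series.array returns the underlying PeriodArray extension array.
ext = s.array

print(type(values).__name__, values.dtype)
print(type(ext).__name__, ext.dtype)
```

Code that wants the extension array, and with it any `__arrow_array__` implementation, therefore needs `Series.array` rather than `Series.values` for these dtypes.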

@asfimport

Antoine Pitrou / @pitrou:
Issue resolved by pull request 5753
#5753
