[Python] Schema inference for pandas extension dtypes fails on indexes #32167

Closed
asfimport opened this issue Jun 15, 2022 · 1 comment

@asfimport

Hi! Calling pa.Schema.from_pandas on a DataFrame whose index has a pandas extension dtype (e.g., string[python]) raises an error:

import pandas as pd
import pyarrow as pa
df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
pa.Schema.from_pandas(df)

produces

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_1827952/3691394220.py in <module>
      1 import pyarrow as pa
      2 df = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["A", "B"], dtype="string"))
----> 3 pa.Schema.from_pandas(df)

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/types.pxi in pyarrow.lib.Schema.from_pandas()

~/miniconda3/envs/dask/lib/python3.8/site-packages/pyarrow/pandas_compat.py in dataframe_to_types(df, preserve_index, columns)
    527             type_ = pa.array(c, from_pandas=True).type
    528         elif _pandas_api.is_extension_array_dtype(values):
--> 529             type_ = pa.array(c.head(0), from_pandas=True).type
    530         else:
    531             values, type_ = get_datetimetz_type(values, c.dtype, None)

AttributeError: 'Index' object has no attribute 'head'

If I remove the head call, or convert the index to a series manually, things work.
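
The two workarounds mentioned above can be sketched with pandas alone. This is a minimal illustration rather than the actual pyarrow fix: empty_like is a hypothetical helper name, not part of pandas or pyarrow. It relies on the fact that pd.Index has no head method but does support slicing, and that a zero-length slice keeps the dtype.

```python
import pandas as pd


def empty_like(values):
    """Return a zero-length view of a Series or Index, keeping its dtype.

    Hypothetical helper for illustration; not part of pandas or pyarrow.
    """
    if isinstance(values, pd.Series):
        # Series supports .head(), as used in the failing code path.
        return values.head(0)
    # Index has no .head(), but slicing works and preserves the dtype.
    return values[:0]


idx = pd.Index(["A", "B"], dtype="string")
ser = pd.Series(["A", "B"], dtype="string")

empty_idx = empty_like(idx)
empty_ser = empty_like(ser)

# The other workaround from the report: convert the index to a Series first.
empty_via_series = idx.to_series().head(0)
```

Either empty object could then be passed to pa.array(..., from_pandas=True) to infer the Arrow type without touching the data, which appears to be the purpose of the head(0) call in the traceback.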

Reported downstream in dask/dask#9186

Related issue from a couple of years ago: https://issues.apache.org/jira/browse/ARROW-8159
 

Reporter: Ian Rose
Assignee: James Bourbeau / @jrbourbeau

PRs and other links:

Note: This issue was originally created as ARROW-16838. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request 14080
#14080
