[Python] Pandas roundtrip with object-dtype column labels with integer values: data type "integer" not understood #25210

Closed
asfimport opened this issue Jun 10, 2020 · 3 comments

Comments

@asfimport

The following will fail the roundtrip since the column indexes' pandas_type is converted from int64 to integer when an additional column is introduced and subsequently moved to the index:

import numpy as np
import pandas as pd
import pyarrow

df = pd.DataFrame(np.ones((3, 1)), index=[[1, 2, 3]])
df['foo'] = np.arange(3)
df = df.set_index('foo', append=True)
table = pyarrow.Table.from_pandas(df)
table.to_pandas()  # Errors

Reporter: Richard Wu
Assignee: Andrew Wieteska / @arw2019

PRs and other links:

Note: This issue was originally created as ARROW-9096. Please see the migration documentation for further details.

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Thanks for the report. A smaller reproducer:

import numpy as np
import pandas as pd
import pyarrow as pa
df = pd.DataFrame(np.random.randn(5, 1), columns=pd.Index([1], dtype=object))
table = pa.Table.from_pandas(df)
table.to_pandas()

So what triggers this is an object-dtype index holding integers as the column labels. We try to preserve the dtype of the column labels on roundtrip (that's why we store it in the pandas metadata), but this case is clearly not covered.
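
A minimal sketch of where the string "integer" in the error message likely comes from, using only pandas and NumPy. It assumes (this is not confirmed in the thread) that the stored pandas_type for object-dtype labels is derived from pandas' value-based type inference and is later passed to NumPy as a dtype name. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# Column labels as in the reproducer: integer values in an object-dtype Index.
labels = pd.Index([1], dtype=object)

# pandas' value-based inference describes the contents as "integer", which is
# a pandas label for the values, not a NumPy dtype name.
inferred = infer_dtype(labels)

# Assumption: the restore step hands the stored type string to NumPy.
error_message = None
try:
    np.dtype(inferred)
except TypeError as exc:
    error_message = str(exc)

print(labels.dtype, inferred, error_message)
```

If that assumption holds, the index dtype (object) and the inferred value type ("integer") disagree, and only the former is something NumPy can reconstruct.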

Always welcome to take a look.

@asfimport

Andrew Wieteska / @arw2019:
ARROW-3651 is related, I think

@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request 7822
#7822

@asfimport asfimport added this to the 2.0.0 milestone Jan 11, 2023