[Python] Table.to_pandas() failing when timezone-awareness mismatch in metadata #26481

Closed
asfimport opened this issue Nov 6, 2020 · 3 comments

Comments

@asfimport

We're hitting an error in Table.to_pandas() when a table is built from a tz-naive pandas column using a schema that declares a timezone. See the example below.

import pyarrow as pa
import pandas as pd

print(pa.__version__)
# 2.0.0

df = pd.DataFrame({"time": pd.to_datetime([0, 0])})

time_field = pa.field("time", type=pa.timestamp("ms", tz="utc"), nullable=False)
schema = pa.schema([time_field])

tab = pa.Table.from_pandas(df, schema)

tab.to_pandas()

# File ".../pandas_compat.py", line 777, in table_to_blockmanager
#   table = _add_any_metadata(table, pandas_metadata)
# File ".../pandas_compat.py", line 1184, in _add_any_metadata
#   tz = col_meta['metadata']['timezone']
# TypeError: 'NoneType' object is not subscriptable

Related issues:
catalyst-cooperative/pudl#705

Environment: Ubuntu 20.04, Python 3.8.6, Pandas 1.1.4
Reporter: Karl Dunkle Werner / @karldw
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

Note: This issue was originally created as ARROW-10511. Please see the migration documentation for further details.

@asfimport

Neal Richardson / @nealrichardson:
I tested this on the latest nightly wheel and reproduced the error. If you remove tz="utc", it does not error. (Not proposing that as a solution, just trying to provide more information on where the error is being triggered.)

@asfimport

Joris Van den Bossche / @jorisvandenbossche:
Repeating my comment from catalyst-cooperative/pudl#705 (comment): the problem lies in some inconsistent metadata, which I think is caused by creating the Table from a pandas DataFrame with a tz-naive column but a schema with a timezone. That is certainly a case that should work, but the to_pandas conversion currently cannot deal with this inconsistent metadata.

Will take a look at it next week.

@asfimport

Antoine Pitrou / @pitrou:
Issue resolved by pull request 8625
#8625
