[Python] Datetimes from non-DateTimeIndex cannot be deserialized #19958

Closed
asfimport opened this issue Oct 30, 2018 · 2 comments

Comments

@asfimport

Given an index that contains datetimes but is not a DatetimeIndex, writing the file works, but reading it back fails.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(1, index=pd.MultiIndex.from_arrays([[1, 2], [3, 4]]), columns=[pd.to_datetime("2018/01/01")])

# after this, the columns index is no longer a DatetimeIndex
df = df.reset_index().set_index(['level_0', 'level_1'])

table = pa.Table.from_pandas(df)
pq.write_table(table, 'test.parquet')

# reading the file back fails
pq.read_pandas('test.parquet').to_pandas()

results in

KeyError                                  Traceback (most recent call last)
~/venv/mpptool/lib/python3.7/site-packages/pyarrow/pandas_compat.py in _pandas_type_to_numpy_type(pandas_type)
    676     try:
--> 677         return _pandas_logical_type_map[pandas_type]
    678     except KeyError:

KeyError: 'datetime'

The created schema:

2018-01-01 00:00:00: int64
level_0: int64
level_1: int64
metadata
--------
{b'pandas': b'{"index_columns": ["level_0", "level_1"], "column_indexes": [{"n'
            b'ame": null, "field_name": null, "pandas_type": "datetime", "nump'
            b'y_type": "object", "metadata": null}], "columns": [{"name": "201'
            b'8-01-01 00:00:00", "field_name": "2018-01-01 00:00:00", "pandas_'
            b'type": "int64", "numpy_type": "int64", "metadata": null}, {"name'
            b'": "level_0", "field_name": "level_0", "pandas_type": "int64", "'
            b'numpy_type": "int64", "metadata": null}, {"name": "level_1", "fi'
            b'eld_name": "level_1", "pandas_type": "int64", "numpy_type": "int'
            b'64", "metadata": null}], "pandas_version": "0.23.4"}'}
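The `"pandas_type": "datetime"` recorded for the column index above is the same label pandas' own type inference gives an object-dtype index of Timestamps, whereas a real DatetimeIndex is labelled `"datetime64"`. The sketch below needs only pandas and shows the difference with `pd.api.types.infer_dtype`. That pyarrow derives `pandas_type` through this inference is an assumption on my part, not something stated in the report.

```python
import pandas as pd

ts = pd.Timestamp("2018-01-01")

# Object-dtype Index holding a Timestamp, like the columns index left
# behind by reset_index().set_index(...) in the reproduction.
obj_index = pd.Index([ts], dtype=object)

# A genuine DatetimeIndex, like the one pd.to_datetime(...) returns.
dt_index = pd.DatetimeIndex([ts])

# infer_dtype labels Timestamp objects "datetime" but a datetime64 index "datetime64".
print(type(obj_index).__name__, pd.api.types.infer_dtype(obj_index))
print(type(dt_index).__name__, pd.api.types.infer_dtype(dt_index))
```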

Reporter: Armin Berres
Assignee: Wes McKinney / @wesm

PRs and other links:

Note: This issue was originally created as ARROW-3651. Please see the migration documentation for further details.

@asfimport

Armin Berres:
Not sure, but maybe pandas should behave differently here as well and create a DatetimeIndex, since the whole index consists of Timestamp objects?

Adding df.columns = pd.to_datetime(df.columns) to the code above works around the problem.


@asfimport

Wes McKinney / @wesm:
Issue resolved by pull request 5311
#5311

asfimport added this to the 0.15.0 milestone on Jan 11, 2023