write_to_dataset with pandas fields using pandas.ExtensionDtype nullable int or string produces a Parquet file which, when read back in, has different dtypes than the original DataFrame

write_table handles the schema correctly, and pandas.ExtensionDtype columns survive the round trip:
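
A minimal sketch of the working path, assuming a small DataFrame with nullable Int64 and string columns (the column names, the extra part column, and the file name are illustrative):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Small frame using pandas nullable extension dtypes; 'part' is only included
# so it can serve as a partition column in the next snippet.
df = pd.DataFrame({
    "a": pd.array([1, 2, None], dtype="Int64"),
    "b": pd.array(["x", "y", None], dtype="string"),
    "part": ["p1", "p1", "p2"],
})

table = pa.Table.from_pandas(df)
pq.write_table(table, "single_file.parquet")

# Reading the single file back restores the Int64 / string dtypes
print(pq.read_table("single_file.parquet").to_pandas().dtypes)
```
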
However, write_to_dataset reverts these dtypes back to object/float:
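
Reusing table from the snippet above, a sketch of the failing path (the dataset path and the partition column are again illustrative):

```python
# The same data written as a partitioned dataset: on read-back the columns
# come out as object / float64 instead of string / Int64.
pq.write_to_dataset(table, root_path="partitioned_dataset",
                    partition_cols=["part"])
print(pq.read_table("partitioned_dataset").to_pandas().dtypes)
```
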
I have also tried writing common metadata at the top-level directory of a partitioned dataset and then passing that metadata to read_table, but the results are the same as without it:

```python
pq.write_metadata(table.schema, parquet_dataset + '_common_metadata', version='2.0')
meta = pq.read_metadata(parquet_dataset + '_common_metadata')
pq.read_table(parquet_dataset, metadata=meta).to_pandas().dtypes
```

This also affects pandas to_parquet when partition_cols is specified:
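
At the pandas level, again reusing df from above (path and partition column illustrative); to_parquet with partition_cols goes through the same partitioned dataset writer, so the dtypes are lost here as well:

```python
# Partitioned write via pandas: read-back gives object / float64 again.
df.to_parquet("pandas_dataset", partition_cols=["part"], engine="pyarrow")
print(pd.read_parquet("pandas_dataset", engine="pyarrow").dtypes)
```
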
Environment: pandas 1.0.1
pyarrow 0.16
Reporter: Ged Steponavicius
Assignee: Joris Van den Bossche / @jorisvandenbossche
Related issues:
PRs and other links:
Note: This issue was originally created as ARROW-8251. Please see the migration documentation for further details.
The pandas metadata is currently being discarded (ignore_metadata) when converting the table back to a dataframe to use pandas' groupby. I am not fully sure what the original reason was to have that ignore_metadata, and there also doesn't seem to be a test failing if I remove it. (In general, write_to_dataset is not very efficient right now, because it converts back and forth to pandas.)
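
For context, a simplified, hand-written paraphrase of what that partitioning path roughly does (this is not the actual pyarrow source; the helper name, file naming, and directory layout are invented for illustration). The round trip through pandas with ignore_metadata=True is where the extension-dtype information gets dropped:

```python
import os
import uuid

import pyarrow as pa
import pyarrow.parquet as pq


def write_to_dataset_sketch(table, root_path, partition_cols):
    """Rough sketch of a groupby-based partitioned write (illustrative only)."""
    # Converting back to pandas with ignore_metadata=True drops the stored
    # pandas metadata, and with it the ExtensionDtype information, before the
    # per-partition tables are rebuilt from pandas.
    df = table.to_pandas(ignore_metadata=True)
    partition_keys = [df[col] for col in partition_cols]
    data_df = df.drop(partition_cols, axis="columns")
    for keys, subgroup in data_df.groupby(partition_keys):
        if not isinstance(keys, tuple):
            keys = (keys,)
        # Hive-style directory per partition value, one file per group.
        subdir = os.path.join(
            root_path,
            *("{}={}".format(col, val) for col, val in zip(partition_cols, keys))
        )
        os.makedirs(subdir, exist_ok=True)
        subtable = pa.Table.from_pandas(subgroup, preserve_index=False)
        pq.write_table(subtable, os.path.join(subdir, uuid.uuid4().hex + ".parquet"))
```
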