Writing a table to Parquet and then reading it back fails if:
- one of the columns is a dictionary (it came from a pandas Categorical), and
- the table's schema is passed to `read_table`.
The read fails while attempting to cast int64 to dictionary (full stack trace below).
This seems related to ARROW-11157. However, even if the categorical type is lost when reading from Parquet, the reader should not fail when it is given the schema.
Minimal example of failing code:
```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

a = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
b = ["a" for i in a]
c = [i for i in range(len(a))]
df = pd.DataFrame({"a": a, "b": b, "c": c})
df['a'] = df['a'].astype('category')
print("df dtypes:\n", df.dtypes)

t = pa.Table.from_pandas(df, preserve_index=True)
s = t.schema
ds.write_dataset(t, format='parquet', base_dir='./test')

df2 = pq.read_table('./test', schema=s).to_pandas()
print("df2 dtypes:\n", df2.dtypes)
```
Which gives:
Reporter: Yishai Beeri
Note: This issue was originally created as ARROW-17625. Please see the migration documentation for further details.