Is this expected behaviour?

```
import os

import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

df = pd.DataFrame(dict(symbol=["A", "B", "C", "D"], year=[2017, 2018, 2019, 2020], close=np.arange(4)))

root_path = "test"
os.makedirs(root_path, exist_ok=True)

table1 = pa.Table.from_pandas(df)
print(f"\nbefore:\n{table1.schema.to_string(show_field_metadata=False)}")
pq.write_to_dataset(table1, root_path=root_path, partition_cols=["symbol", "year"])

# Open the dataset only after writing, so the new files are discovered
dataset = ds.dataset(root_path, format="parquet", partitioning="hive")
table2 = dataset.to_table()
print(f"\nafter:\n{table2.schema.to_string(show_field_metadata=False)}")
```

before:
symbol: string
year: int64
close: int64
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 582

after:
close: int64
symbol: string
year: int32
-- schema metadata --
pandas: '{"index_columns": [], "column_indexes": [{"name": null, "field_n' + 300

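That is, both the column ordering and the types change after the round trip: the partition columns `symbol` and `year` move to the end, and `year` goes from int64 to int32. I suspect this is due to partitioning. Should I be storing additional metadata and using it when subsequently retrieving?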

Thanks

Is this expected behaviour? #8420