Field metadata is lost on serialization round-trip #18942
Closed
Description
It seems that only schema-level metadata survives a round trip through the stream writer and reader, while field-level metadata is lost:
import pandas as pd
import pyarrow as pa
fnm = "/path/to/file.arr"
df = pd.DataFrame({"x": [0,1,2,3]})
tbl = pa.Table.from_pandas(df)
metadata = {"custom": "test"}
# Update field metadata, and schema metadata
fields = [col.field.add_metadata(metadata) for col in tbl.itercolumns()]
schema_metadata = {**tbl.schema.metadata, **metadata}
schema = pa.schema(fields, metadata=schema_metadata)
tbl = pa.Table.from_batches(tbl.to_batches(), schema=schema)
print(tbl.column(0).field.metadata) # correct :)
print(tbl.schema.field_by_name("x").metadata) # correct :)
print(tbl.schema) # correct :)
# Roundtrip
writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
writer.write_table(tbl)
writer.close()
reader = pa.RecordBatchStreamReader(fnm)
tbl = reader.read_all()
# Check
print(tbl.column(0).field.metadata) # None :(
print(tbl.schema.field_by_name("x").metadata) # None :(
print(tbl.schema) # Metadata good :)
Reporter: Thomas Buhrmann / @buhrmann
PRs and other links:
Note: This issue was originally created as ARROW-2573. Please see the migration documentation for further details.