pa.field is given a metadata parameter, but printing schema.metadata shows only the schema-level metadata; the column (field) metadata does not appear. How can the metadata of the columns be displayed correctly?
import pyarrow as pa


class Dimension:
    INSTRUMENT_TYPE = 'instrument_type'
    INSTRUMENT_CODE = 'instrument_code'
    INSTRUMENT_NAME = 'instrument_name'


schema = pa.schema(
    [
        # Field-level (column) metadata is attached to each field here
        pa.field(Dimension.INSTRUMENT_CODE, pa.string(), metadata={b"table_filed": b"FUND_CODE"}),
        pa.field(Dimension.INSTRUMENT_NAME, pa.string(), metadata={b"table_filed": b"FUND_NAME"}),
        pa.field(Dimension.INSTRUMENT_TYPE, pa.string()),
    ],
    # Schema-level metadata
    metadata={
        Dimension.INSTRUMENT_CODE: 'code',
        Dimension.INSTRUMENT_NAME: 'name',
        Dimension.INSTRUMENT_TYPE: 'type',
    },
)

# Excerpt from a classmethod; cls.schema is the schema defined above
df = cls.query(sql)
df.rename(columns=cls.etl_rename_dict, inplace=True)
df[Dimension.INSTRUMENT_TYPE] = InstrumentType.FUND
table = pa.Table.from_pandas(df, schema=cls.schema)
# pq.write_table(table, 'test_parquet')
table_schema = table.schema
print(table_schema.metadata)
Printed result:
{b'instrument_code': b'code', b'instrument_name': b'name', b'instrument_type': b'type', b'pandas': b'{"index_columns": [], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "instrument_code", "field_name": "instrument_code", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "instrument_name", "field_name": "instrument_name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "instrument_type", "field_name": "instrument_type", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "10.0.0"}, "pandas_version": "1.5.1"}'}
- python: 3.10
- pyarrow: 10.0.0
Component(s)
Parquet, Python