The metadata parameter of pa.field has no visible effect #14728

@phpsxg

Description

I set the metadata parameter on pa.field and then print schema.metadata, but the per-column metadata does not appear in the output (the pandas "columns" entries show "metadata": null). How do I correctly display the metadata of the columns?

import pyarrow as pa


class Dimension:
    INSTRUMENT_TYPE = 'instrument_type'
    INSTRUMENT_CODE = 'instrument_code'
    INSTRUMENT_NAME = 'instrument_name'


schema = pa.schema(
    [
        # Field-level (column) metadata is attached to each pa.field
        pa.field(Dimension.INSTRUMENT_CODE, pa.string(), metadata={b"table_filed": b"FUND_CODE"}),
        pa.field(Dimension.INSTRUMENT_NAME, pa.string(), metadata={b"table_filed": b"FUND_NAME"}),
        pa.field(Dimension.INSTRUMENT_TYPE, pa.string()),
    ],
    # Schema-level metadata
    metadata={
        Dimension.INSTRUMENT_CODE: 'code',
        Dimension.INSTRUMENT_NAME: 'name',
        Dimension.INSTRUMENT_TYPE: 'type',
    },
)

# Excerpt from a classmethod (cls is the enclosing class, which holds the schema above)
df = cls.query(sql)
df.rename(columns=cls.etl_rename_dict, inplace=True)
df[Dimension.INSTRUMENT_TYPE] = InstrumentType.FUND

table = pa.Table.from_pandas(df, schema=cls.schema)
# pq.write_table(table, 'test_parquet')
table_schema = table.schema
print(table_schema.metadata)

Printed output:

{b'instrument_code': b'code', b'instrument_name': b'name', b'instrument_type': b'type', b'pandas': b'{"index_columns": [], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "instrument_code", "field_name": "instrument_code", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "instrument_name", "field_name": "instrument_name", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "instrument_type", "field_name": "instrument_type", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}], "creator": {"library": "pyarrow", "version": "10.0.0"}, "pandas_version": "1.5.1"}'}

  • Python: 3.10
  • pyarrow: 10.0.0

Component(s)

Parquet, Python
