Extract partition list from pyarrow.dataset.ParquetFileFragment object #34212

@ayouqi

Description

Describe the enhancement requested

The repr of a pyarrow.dataset.ParquetFileFragment object shows two elements: path and partition. The path can be read through the path attribute, but there is no attribute for getting the partition.
For example, for the following object:
<pyarrow.dataset.ParquetFileFragment path=pq-test/Location=US/Industry=HT/dce9900c46f94ec3a8dca094cf62bd34-0.parquet partition=[Industry=HT, Location=US]>

object.path returns pq-test/Location=US/Industry=HT/dce9900c46f94ec3a8dca094cf62bd34-0.parquet, but object.partition is not defined.

Also, the order of the partition list appears to be reversed: the path nests Location before Industry, while the repr shows [Industry=HT, Location=US].
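Until such an attribute exists, one possible workaround is to recover the partition values from fragment.path. The following is only a minimal sketch, assuming hive-style key=value directory names like those in the example path. The helper name partitions_from_path is hypothetical and not part of pyarrow, and the sketch does not handle URL-encoded values or other partitioning schemes.

```python
from pathlib import PurePosixPath


def partitions_from_path(path):
    """Return (key, value) pairs for hive-style 'key=value' directories.

    Hypothetical helper, not a pyarrow API. Pairs are returned in the
    order the directories appear in the path.
    """
    pairs = []
    # Only look at the directory components, not the file name itself.
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        # Skip directories that are not of the form key=value.
        if sep and key:
            pairs.append((key, value))
    return pairs


example = "pq-test/Location=US/Industry=HT/dce9900c46f94ec3a8dca094cf62bd34-0.parquet"
print(partitions_from_path(example))
# prints [('Location', 'US'), ('Industry', 'HT')]
```

Because the pairs follow the directory order, this gives Location before Industry, unlike the order shown in the repr. pyarrow fragments also carry a partition_expression attribute, which may be a more direct source of the same information; it is not exercised here.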

Component(s)

Python
