
[Python] Ability to create partitions when writing to Parquet #15457

Closed
asfimport opened this issue Aug 23, 2017 · 3 comments

Comments

I'm fairly new to pyarrow, so I apologize if this is already a feature, but I couldn't find a solution in the documentation or an existing issue. Basically, I'm trying to export pandas DataFrames to .parquet files with partitions. pyarrow.parquet can read partitioned .parquet datasets, but there's no indication that it can write them. E.g., it would be nice if pyarrow.parquet.write_table() took a list of columns to partition the table by, similar to the "partitionBy" parameter of spark.write.parquet in the pyspark implementation.
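For reference, the pyspark pattern being alluded to looks roughly like this (the data and output path are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(2017, 8, 1.0), (2017, 9, 2.0)], ["year", "month", "value"]
    )
    # Spark writes one subdirectory per distinct key combination,
    # e.g. year=2017/month=8/part-....parquet
    df.write.parquet("/tmp/dataset", partitionBy=["year", "month"])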

Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition

Environment: Mac OS Sierra 10.12.6
Reporter: Safyre Anderson / @saffrydaffry
Assignee: Safyre Anderson / @saffrydaffry

Note: This issue was originally created as ARROW-1400. Please see the migration documentation for further details.


Wes McKinney / @wesm:
This would be a very useful feature. The simplest way to do this in the short term is to generate the partition scheme from a pandas.DataFrame, using pandas operations to split the object into pieces. We should add a function in pyarrow.parquet that enables data to be "inserted" into a directory containing a standard Hive-like partition schema. So you could do something like (just spitballing here):

pq.write_table_to_dataset(dataset_path, partition_keys=keys, **options)

Here dataset_path is a directory; if partition_keys is not None, this writes a new Parquet file in the appropriate location in the subdirectory structure.

A patch would be welcome. I will mark this issue for 0.7.0.
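The short-term workaround described above (split the DataFrame with pandas, then write each piece into a Hive-style directory) might look like this sketch; the column names and paths are illustrative:

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq
    from pathlib import Path

    df = pd.DataFrame({
        "year": [2017, 2017, 2018],
        "month": [8, 9, 1],
        "value": [1.0, 2.0, 3.0],
    })

    root = Path("/tmp/dataset")
    for (year, month), piece in df.groupby(["year", "month"]):
        # Hive-style layout: one key=value directory per partition column
        part_dir = root / f"year={year}" / f"month={month}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Drop the partition columns; their values are encoded in the path
        table = pa.Table.from_pandas(piece.drop(columns=["year", "month"]))
        pq.write_table(table, str(part_dir / "part-0.parquet"))

Reading the root directory back (e.g. with pq.read_table("/tmp/dataset")) reconstructs year and month as partition columns from the directory names.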


Safyre Anderson / @saffrydaffry:
Submitted a pull request for a hot fix: #991.


Wes McKinney / @wesm:
Issue resolved by pull request #991.
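For what it's worth, the function that ultimately landed in pyarrow.parquet is write_to_dataset (the name differs slightly from the one spitballed above). A minimal usage sketch, with illustrative data and path:

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"year": [2017, 2017, 2018], "value": [1.0, 2.0, 3.0]})
    # Creates year=2017/ and year=2018/ subdirectories under the root,
    # each containing a Parquet file with the remaining columns
    pq.write_to_dataset(table, root_path="/tmp/dataset", partition_cols=["year"])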
