I'm fairly new to pyarrow, so I apologize if this is already a feature, but I couldn't find a solution in the documentation or in an existing issue. I'm trying to export pandas DataFrames to partitioned .parquet files. I can see that pyarrow.parquet can read partitioned .parquet files, but there's no indication that it can write them. For example, it would be nice if pyarrow.parquet.write_table() had a parameter that took a list of columns to partition the table by, similar to the "partitionBy" parameter of pyspark's DataFrame.write.parquet.
Wes McKinney / @wesm:
This would be a very useful feature. The simplest way to do this in the short term will be to generate the partition scheme from a pandas.DataFrame using pandas operations to split the object into pieces. We should add a function in pyarrow.parquet which enables data to be "inserted" into a directory containing a standard Hive-like partition schema. So you could do something like (just spitballing here)
Here dataset_path is a directory, and this will write a new Parquet file in the appropriate location in the subdirectory structure if partition_keys is not None.
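A minimal sketch of the short-term approach described above: split a pandas.DataFrame with groupby and write each piece into a Hive-style `key=value` subdirectory. The names `dataset_path` and `partition_keys` come from the comment; the helper names `partition_dir` and `write_partitioned` are hypothetical and are not part of pyarrow. It assumes pandas and pyarrow are installed.

```python
import os
import uuid


def partition_dir(dataset_path, partition_keys, key_values):
    """Build a Hive-style subdirectory such as <dataset_path>/year=2017/month=8."""
    parts = [f"{key}={value}" for key, value in zip(partition_keys, key_values)]
    return os.path.join(dataset_path, *parts)


def write_partitioned(df, dataset_path, partition_keys=None):
    """Hypothetical helper: write one new Parquet file per partition-key combination."""
    # Imported here so partition_dir stays usable without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    if not partition_keys:
        # No partitioning requested: write a single file at the dataset root.
        os.makedirs(dataset_path, exist_ok=True)
        table = pa.Table.from_pandas(df, preserve_index=False)
        pq.write_table(table, os.path.join(dataset_path, f"{uuid.uuid4().hex}.parquet"))
        return

    keys = list(partition_keys)
    for key_values, piece in df.groupby(keys):
        # Older pandas yields a scalar for a single key; normalize to a tuple.
        if not isinstance(key_values, tuple):
            key_values = (key_values,)
        subdir = partition_dir(dataset_path, keys, key_values)
        os.makedirs(subdir, exist_ok=True)
        # Partition values are encoded in the directory names, so drop those columns.
        table = pa.Table.from_pandas(piece.drop(columns=keys), preserve_index=False)
        # A random file name lets repeated calls "insert" files without overwriting.
        pq.write_table(table, os.path.join(subdir, f"{uuid.uuid4().hex}.parquet"))
```

Calling `write_partitioned(df, "my_dataset", partition_keys=["year", "month"])` would create files under paths like `my_dataset/year=2017/month=8/`. Later pyarrow releases provide `pyarrow.parquet.write_to_dataset(table, root_path, partition_cols=[...])`, which covers this use case directly.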
A patch would be welcome. I will mark this issue for 0.7.0.
Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition
Environment: Mac OS Sierra 10.12.6
Reporter: Safyre Anderson / @saffrydaffry
Assignee: Safyre Anderson / @saffrydaffry
Note: This issue was originally created as ARROW-1400. Please see the migration documentation for further details.