[Python] Add documentation about parquet.write_to_dataset and related methods #17851

asfimport · 2017-11-25T22:43:02Z

pyarrow can write not only a single Parquet file but also just the schema metadata for a full multi-file dataset. Such a dataset can also be partitioned automatically by one or more columns. At the moment, this functionality is hard to find in the documentation. There is API documentation for it, but we should add a small tutorial-style section that explains the differences between these functions and the use cases for each.

See also https://stackoverflow.com/questions/47482434/can-pyarrow-write-multiple-parquet-files-to-a-folder-like-fastparquets-file-sch

Reporter: Wes McKinney / @wesm
Assignee: Donal Simmie / @dsimmie

PRs and other links:

GitHub Pull Request #1925

Note: This issue was originally created as ARROW-1858. Please see the migration documentation for further details.

asfimport · 2018-03-05T22:51:22Z

Wes McKinney / @wesm:
The lack of documentation about this came up in https://stackoverflow.com/questions/49085686/pyarrow-s3fs-partition-by-timetsamp

asfimport · 2018-04-21T20:22:05Z

Uwe Korn / @xhochy:
Issue resolved by pull request 1925
#1925

asfimport · 2021-02-16T12:42:13Z

ARF / @ARF1:
Even with the new documentation I am unclear on whether I can append to partitioned datasets.

I.e. is it possible to write a partitioned dataset when the entire dataset is too large to hold in memory prior to writing?

asfimport closed this as completed Apr 21, 2018
