[Python] Add documentation about parquet.write_to_dataset and related methods #17851

Closed
asfimport opened this issue Nov 25, 2017 · 3 comments

Comments

@asfimport

pyarrow can do more than write a single Parquet file: it can also write just the schema metadata for a full multi-file dataset, and it can automatically partition such a dataset by one or more columns. At the moment, this functionality is hard to discover in the documentation. Only the API reference covers it, so we should add a short tutorial-style section that explains how these functions differ and the use cases for each.

See also https://stackoverflow.com/questions/47482434/can-pyarrow-write-multiple-parquet-files-to-a-folder-like-fastparquets-file-sch

Reporter: Wes McKinney / @wesm
Assignee: Donal Simmie / @dsimmie

PRs and other links:

Note: This issue was originally created as ARROW-1858. Please see the migration documentation for further details.

@asfimport
Author

Wes McKinney / @wesm:
The lack of documentation about this came up in https://stackoverflow.com/questions/49085686/pyarrow-s3fs-partition-by-timetsamp

@asfimport
Author

Uwe Korn / @xhochy:
Issue resolved by pull request 1925
#1925

@asfimport
Author

ARF / @ARF1:
Even with the new documentation I am unclear on whether I can append to partitioned datasets.

That is, is it possible to write a partitioned dataset when the entire dataset is too large to hold in memory before writing?
