[Python] Filename-based partitioning scheme #29763

Open
asfimport opened this issue Sep 29, 2021 · 0 comments

asfimport commented Sep 29, 2021

This originates from this Stack Overflow question: https://stackoverflow.com/questions/69379083/read-a-partitioned-parquet-dataset-from-multiple-files-with-pyarrow-and-add-a-pa

The idea is to have a partitioning scheme that would allow constructing a primary key from the filename.

Let's say that one is trying to read /data-N.parquet where N is an integer. That information should go in a primary key for later reference.

This is quite similar to having the files laid out like this: /N/data.parquet, so I imagine this is technically feasible.
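
To make the request concrete, here is a rough sketch of the manual workaround this feature would replace. It is not an existing pyarrow partitioning scheme. The helper names `key_from_filename` and `read_with_filename_key`, the `file_key` column name, and the strict `data-N.parquet` pattern are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed naming convention from the description above: data-N.parquet
_FILENAME_KEY = re.compile(r"^data-(\d+)\.parquet$")


def key_from_filename(path):
    """Return the integer N embedded in a 'data-N.parquet' filename."""
    match = _FILENAME_KEY.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected filename: {path!r}")
    return int(match.group(1))


def read_with_filename_key(directory, key_column="file_key"):
    """Read every data-N.parquet file in `directory` and tag each row with N.

    Hypothetical helper: `key_column` is an illustrative name, not a
    pyarrow convention.
    """
    # Imported here so the filename parsing above works without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    tables = []
    for path in sorted(Path(directory).glob("data-*.parquet")):
        key = key_from_filename(path)
        table = pq.read_table(str(path))
        # Repeat the key once per row so it behaves like a partition column.
        key_array = pa.array([key] * table.num_rows, type=pa.int64())
        tables.append(table.append_column(key_column, key_array))
    if not tables:
        raise FileNotFoundError(f"no data-N.parquet files in {directory!r}")
    return pa.concat_tables(tables)
```

With a filename-based partitioning scheme, the dataset reader could derive this column itself, the same way it already derives one from the directory name in the /N/data.parquet layout.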

Reporter: Cédric Hernalsteens


Note: This issue was originally created as ARROW-14176. Please see the migration documentation for further details.
