[Python] Filename-based partitioning scheme #29763

asfimport · 2021-09-29T19:32:27Z

This originates from [this SO question](https://stackoverflow.com/questions/69379083/read-a-partitioned-parquet-dataset-from-multiple-files-with-pyarrow-and-add-a-pa).

The idea is to have a partitioning scheme that allows constructing a primary key from the filename.

Let's say one is trying to read files named /data-N.parquet, where N is an integer. That integer should be stored as a primary-key column for later reference.
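The core of such a scheme is mapping a file name to a key value. A minimal sketch of that parsing step, assuming the exact `data-N.parquet` naming from the example; `FILENAME_PATTERN` and `parse_filename_key` are hypothetical names for illustration, not part of any Arrow API:

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for files named like "data-N.parquet", N an integer.
FILENAME_PATTERN = re.compile(r"^data-(\d+)\.parquet$")


def parse_filename_key(path: str) -> int:
    """Extract the integer N from a path whose file name is data-N.parquet.

    Raises ValueError if the file name does not match the assumed pattern.
    """
    name = PurePosixPath(path).name  # keep only the last path component
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not match data-N.parquet: {name!r}")
    return int(match.group(1))


if __name__ == "__main__":
    for p in ["/data-0.parquet", "/data-17.parquet"]:
        print(p, "->", parse_filename_key(p))
```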

This is quite similar to having the files laid out as /N/data.parquet, so I imagine it is technically feasible.

Reporter: Cédric Hernalsteens

Related issues:

[C++] Support for filename-based partitioning (is related to)

_Note: This issue was originally created as ARROW-14176. Please see the migration documentation for further details._


asfimport mentioned this issue Jan 11, 2023: [C++] Support for filename-based partitioning #30158 (Closed)
