[Python] Allow usage of field_names in partitioning when saving datasets #29385

asfimport · 2021-08-25T14:42:03Z

When loading a dataset back, you can specify the columns the data was partitioned on just by name, using

partitioning=pyarrow.dataset.partitioning(field_names=["year"])

This is convenient because it is easier and quicker than providing the whole schema, which can still be autodetected from the loaded data.

On the other hand, we don't support this when saving data. If you provide field_names instead of the schema, a ValueError is raised:

pyarrow/dataset.py in _ensure_write_partitioning(scheme)
    684     if not isinstance(scheme, Partitioning):
    685         # TODO support passing field names, and get types from schema
--> 686         raise ValueError("partitioning needs to be actual Partitioning object")
    687     return scheme
    688

It would be convenient to allow passing only field_names when saving as well, since the schema can be detected automatically from the table being saved.

Reporter: Alessandro Molina / @amol-
Assignee: Alessandro Molina / @amol-

PRs and other links:

GitHub Pull Request #11008

Note: This issue was originally created as ARROW-13755. Please see the migration documentation for further details.


asfimport · 2021-09-21T15:50:04Z

Joris Van den Bossche / @jorisvandenbossche:
Issue resolved by pull request #11008

asfimport closed this as completed Sep 21, 2021

asfimport assigned amol- Jan 10, 2023

asfimport added this to the 6.0.0 milestone Jan 11, 2023
