[Python][Dataset] Improve ergonomy of the FileSystemDataset constructor #24483

asfimport · 2020-03-31T14:47:45Z

Currently, to manually create a FileSystemDataset, you can do something like:

dataset = ds.FileSystemDataset(
        schema, None, ds.ParquetFileFormat(), pa.fs.LocalFileSystem(),
        ["data_file1.parquet", "data_file2.parquet"],
        [ds.field('file') == 1, ds.field('file') == 2])

There are some usability improvements we could make, though:

Allow passing the arguments by name to improve the readability of the calling code (currently they all have to be passed positionally, because of the way they are declared in Cython as not None).
Possibly change the order of the arguments (e.g. start with the paths; we don't need to match the order of the C++ constructor).
Potentially allow partitions to be optional, in which case they would need to be set internally to a list of ScalarExpression(True) values.
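
To make the three suggestions above concrete, here is a minimal pure-Python sketch of what such a signature could look like. It does not use pyarrow at all: `DatasetSpec`, `make_dataset_spec`, `file_format` and `ALWAYS_TRUE` are hypothetical names chosen for illustration, and `ALWAYS_TRUE` is only a placeholder for a ScalarExpression(True) value. The signature that was eventually implemented may differ.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence

# Placeholder for ScalarExpression(True); a real implementation would
# build an actual always-true dataset expression here.
ALWAYS_TRUE = "scalar(True)"


@dataclass
class DatasetSpec:
    """Hypothetical stand-in for the arguments of a FileSystemDataset."""
    paths: List[str]
    schema: Any
    file_format: Any
    filesystem: Any
    partitions: List[Any]


def make_dataset_spec(
    paths: Sequence[str],
    *,
    schema: Any,
    file_format: Any,
    filesystem: Any = None,
    partitions: Optional[Sequence[Any]] = None,
) -> DatasetSpec:
    """Paths come first; everything else is keyword-only and readable."""
    paths = list(paths)
    if not partitions:
        # None or an empty list: one always-true expression per file.
        partitions = [ALWAYS_TRUE] * len(paths)
    elif len(partitions) != len(paths):
        raise ValueError("expected one partition expression per path")
    return DatasetSpec(paths, schema, file_format, filesystem, list(partitions))


# Usage: the call site now documents itself.
spec = make_dataset_spec(
    ["data_file1.parquet", "data_file2.parquet"],
    schema=None,
    file_format="parquet",
)
```

Making every argument after the paths keyword-only is one possible design choice; it keeps call sites readable and leaves room to reorder or add parameters later without breaking callers.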

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Joris Van den Bossche / @jorisvandenbossche

PRs and other links:

GitHub Pull Request #6913

Note: This issue was originally created as ARROW-8290. Please see the migration documentation for further details.


asfimport · 2020-03-31T14:54:40Z

Ben Kietzman / @bkietz:
Small amenity: if an empty vector is passed for partitions, we will populate it with scalar(true) automatically.

asfimport · 2020-04-13T19:32:46Z

Krisztian Szucs / @kszucs:
Issue resolved by pull request 6913
#6913

asfimport closed this as completed Apr 13, 2020

asfimport assigned jorisvandenbossche Jan 10, 2023

asfimport added this to the 0.17.0 milestone Jan 11, 2023
