Joris Van den Bossche / @jorisvandenbossche:
Currently, a list of files is already supported in ParquetDataset, so something like the snippet below (which I think would address the SO question) works:
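A minimal sketch of that approach, assuming two local Parquet files (the paths here are placeholders):

```python
import pyarrow.parquet as pq

# Pass an explicit list of Parquet file paths instead of a directory.
dataset = pq.ParquetDataset(['data/part-0.parquet', 'data/part-1.parquet'])

# Read all files into a single Table and convert to a pandas DataFrame.
table = dataset.read()
df = table.to_pandas()
```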
Do we think that is enough support? (If so, this issue can be closed, I think.)
Or do we want to add this to pq.read_table? (It e.g. also accepts a directory name, which is then passed through to ParquetDataset; we could do a similar pass-through for a list of paths.)
Joris Van den Bossche / @jorisvandenbossche:
The new dataset API supports creating a dataset from a list of files, both with the higher-level ds.dataset(..), which infers the schema, and with the lower-level ds.FileSystemDataset(...).
This functionality is now also exposed in pq.read_table and pq.ParquetDataset, with the use_legacy_dataset=False keyword (ARROW-8039).
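A minimal sketch of both entry points, assuming a list of local Parquet files (placeholder paths) and a pyarrow version that still accepts the use_legacy_dataset keyword:

```python
import pyarrow.dataset as ds
import pyarrow.parquet as pq

files = ['data/part-0.parquet', 'data/part-1.parquet']  # placeholder paths

# Higher-level dataset API: infers a unified schema across the files.
table = ds.dataset(files, format='parquet').to_table()

# The same list of files through pq.read_table, routed to the new
# dataset implementation via use_legacy_dataset=False (ARROW-8039).
table = pq.read_table(files, use_legacy_dataset=False)
```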
See SO question for use case: https://stackoverflow.com/questions/52613682/load-multiple-parquet-files-into-dataframe-for-analysis
Reporter: Wes McKinney / @wesm
Related issues:
Note: This issue was originally created as ARROW-3424. Please see the migration documentation for further details.