I am saving a dask dataframe to parquet with two partition columns using the pyarrow engine. The problem arises when reading the partition columns back. When I read using the directory path, I get the partition columns in the output dataframe, whereas if I read using a glob path, I don't get these columns.
Fastparquet supports the glob path, but somehow pyarrow doesn't.
The problem lies in the _make_manifest function. For a single directory path, it calls the _visit_level function, which walks the directory tree and populates the partitions. For a glob path (which expands to a list of files), it doesn't call _visit_level; it just creates a list of ParquetDatasetPiece objects with no partition information.
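To illustrate what is lost on the glob path, here is a minimal sketch of recovering partition key/value pairs from the `key=value` directory names in each file path. This is not pyarrow's code: `partitions_from_path` is a hypothetical helper, and the `year`/`month` column names and file paths are made up for the example.

```python
from pathlib import PurePosixPath


def partitions_from_path(file_path, root):
    """Return [(key, value), ...] parsed from the `key=value` directories
    that sit between `root` and the file itself.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(root))
    pairs = []
    # Skip the last component, which is the file name.
    for part in relative.parts[:-1]:
        key, sep, value = part.partition("=")
        if sep:  # only directories shaped like key=value are partitions
            pairs.append((key, value))
    return pairs


# Illustrative paths, as a glob over a partitioned dataset might return them.
files = [
    "data/year=2017/month=1/part.0.parquet",
    "data/year=2018/month=2/part.1.parquet",
]
for f in files:
    print(f, partitions_from_path(f, "data"))
```

A fix along these lines would need each piece built from a file list to carry the keys parsed from its own path, instead of being created with no partitions.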
pyarrow : 0.9.0.post1
dask : 0.17.1