Fix parallel metadata collection in pyarrow engine by rjzamora · Pull Request #9165 · dask/dask

rjzamora · 2022-06-06T14:59:24Z

Fixes a bug in parallel metadata collection in the "pyarrow" read_parquet engine for hive-partitioned data.

rjzamora · 2022-06-06T15:00:16Z

dask/dataframe/io/parquet/arrow.py

                pa_ds.dataset(
                    files_or_frags,
                    filesystem=fs,
+                    **dataset_options,


This is the critical change. Without these dataset options, the new fragment may be missing hive/directory-partitioning information.
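To illustrate why the missing options matter: in a hive-partitioned layout the partition values live in directory names such as `year=2021/`, not in the files themselves, so a dataset rebuilt from a list of file paths only recovers them when it is told how to interpret those paths. The following is a toy, pure-Python sketch of that behaviour and does not use pyarrow. `parse_hive_keys` and `open_dataset` are hypothetical helpers, and the paths and the contents of `dataset_options` are assumed for illustration. In pyarrow itself, the analogous setting is the `partitioning` argument of `pyarrow.dataset.dataset`.

```python
from pathlib import PurePosixPath


def parse_hive_keys(path):
    """Return {column: value} for every `key=value` directory in `path`."""
    keys = {}
    # Only directory components carry partition info; skip the file name.
    for part in PurePosixPath(path).parts[:-1]:
        if "=" in part:
            name, value = part.split("=", 1)
            keys[name] = value
    return keys


def open_dataset(paths, partitioning=None):
    """Hypothetical stand-in for a dataset constructor (not pyarrow's API).

    Partition columns are recovered only when `partitioning="hive"` is passed.
    """
    return [
        {
            "path": p,
            "partition_keys": parse_hive_keys(p) if partitioning == "hive" else {},
        }
        for p in paths
    ]


# Assumed example layout and options, for illustration only.
paths = ["data/year=2021/part.0.parquet", "data/year=2022/part.0.parquet"]
dataset_options = {"partitioning": "hive"}

# Rebuilding the dataset without the options drops the `year` values ...
without_options = open_dataset(paths)
# ... while forwarding them, as the diff above does, keeps them.
with_options = open_dataset(paths, **dataset_options)

print(without_options[0]["partition_keys"])
print(with_options[0]["partition_keys"])
```

In this sketch the first call yields empty partition keys for every file, while the second recovers `year` from each path. This mirrors the bug described above, where fragments created without the dataset options could lose their partitioning information.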

ian-r-rose

LGTM, thanks @rjzamora

jrbourbeau

Thanks @rjzamora for the fix and @ian-r-rose for reviewing

fix parallel metadata collection for partitioned data in pyarrow engine

9cb4202

rjzamora added io parquet labels Jun 6, 2022

github-actions bot added the dataframe label Jun 6, 2022

rjzamora commented Jun 6, 2022

ian-r-rose approved these changes Jun 6, 2022

jrbourbeau approved these changes Jun 6, 2022

jrbourbeau merged commit 8c9076a into dask:main Jun 6, 2022

rjzamora deleted the fix-partitioning-bug branch June 6, 2022 20:54
