Skip to content

Fix required_extension behavior in read_parquet#8351

Merged
jsignell merged 4 commits intodask:mainfrom
rjzamora:fix-require_extension-bug
Nov 5, 2021
Merged

Fix required_extension behavior in read_parquet#8351
jsignell merged 4 commits intodask:mainfrom
rjzamora:fix-require_extension-bug

Conversation

@rjzamora
Copy link
Copy Markdown
Member

@rjzamora rjzamora commented Nov 5, 2021

This PR changes the behavior of read_parquet(path, ..., required_extension=) to only apply when path is a directory without an active _metadata file. That is, we should not be filtering out "unsupported extensions" when the path is a single file, and specific list of files, or when we are using a _metadata file to tell us what files are valid.

@jakirkham
Copy link
Copy Markdown
Member

cc @jsignell @martindurant @jrbourbeau

if _metadata_exists and ignore_metadata_file:
fns.remove("_metadata")
_metadata_exists = False
if require_extension:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: this and/or L410 is where the single-file case would have been filtered, but now we only do filtering when the path is a directory and there is no _metadata .

If I understood correctly, I am fully behind this change.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup!

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice! Thanks for the speedy turn around. I just have one comment about making the error a bit more informative.

rjzamora and others added 3 commits November 5, 2021 14:19
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix @rjzamora! I'll hold off on releasing until this is merged

@jsignell jsignell merged commit a02d2b3 into dask:main Nov 5, 2021
@rjzamora rjzamora deleted the fix-require_extension-bug branch November 5, 2021 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: read_parquet fails on main (not on 2021.10.0)

5 participants