Skip to content

Improve path-related exceptions in read_hdf#6032

Merged
mrocklin merged 4 commits intodask:masterfrom
psimaj:master
Mar 23, 2020
Merged

Improve path-related exceptions in read_hdf#6032
mrocklin merged 4 commits intodask:masterfrom
psimaj:master

Conversation

@psimaj
Copy link
Copy Markdown
Contributor

@psimaj psimaj commented Mar 22, 2020

Fixes #1613
I reviewed previous work on this issue, and this is my proposal of improved error messages in read_hdf, alongside tests.
Note that the case of empty list as an argument was considered ambiguous, as far as error handling goes. To me it would make sense if read_hdf returned an empty DataFrame instead of throwing an exception in this case, but as I haven't found any concrete definition of a "canonical" empty Dask DataFrame, it should be more consistent to just throw an error here. The fact that concat also throws an exception if provided an empty list further makes me feel like we should avoid empty DataFrames.
Also the reason why I chose to return two distinct error messages is that according to Python docs, os.path.exists can also return false if the user has insufficient permissions to call os.stat.

Copy link
Copy Markdown
Member

@mrocklin mrocklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @psimaj ! In principle this looks great to me. I left one small comment about exceptions and os.path.exists. Thank you for helping to improve informative error messages.

try:
exists = os.path.exists(path)
except (ValueError, TypeError):
exists = False
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances does os.path.exists raise these types of exceptions?

If it doesn't raise exceptions then I wonder if this might be simpler as ...

for path in paths:
    if not os.path.exists(path):
        raise IOError(...)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError is raised if for example the file name contains an embedded null byte. One example raising this exception is os.path.exists('a\x00b'), found at https://bugs.python.org/issue33721.
TypeError is raised if the argument is of nonsensical type, like a list (os.path,exists([]) does this).
So while those exceptions can be raised in general, it's a good question if they can be raised here. I think it mostly depends on how stringify_path treats such weird cases, but I couldn't find any in-depth documentation of this function.

@mrocklin
Copy link
Copy Markdown
Member

OK, thank you for the explanation @psimaj . Merging this in.

Also, I notice that this is your first code contribution to this repository. Welcome!

@mrocklin mrocklin merged commit 2fae9e7 into dask:master Mar 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read hdf() throws misguided exception when file or key not found

2 participants