Improve path-related exceptions in read_hdf #6032
    try:
        exists = os.path.exists(path)
    except (ValueError, TypeError):
        exists = False
Under what circumstances does os.path.exists raise these types of exceptions?
If it doesn't raise exceptions then I wonder if this might be simpler as ...
    for path in paths:
        if not os.path.exists(path):
            raise IOError(...)
ValueError is raised if for example the file name contains an embedded null byte. One example raising this exception is os.path.exists('a\x00b'), found at https://bugs.python.org/issue33721.
TypeError is raised if the argument is of a nonsensical type, like a list (os.path.exists([]) does this).
So while those exceptions can be raised in general, it's a good question if they can be raised here. I think it mostly depends on how stringify_path treats such weird cases, but I couldn't find any in-depth documentation of this function.
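To make the behaviour discussed above concrete, here is a minimal sketch of the guarded check. The helper name safe_exists is hypothetical and is not part of Dask. Whether the null-byte case raises ValueError or simply returns False may depend on the Python version, so the wrapper treats both outcomes the same way.

```python
import os


def safe_exists(path):
    """Return True only if os.path.exists(path) succeeds and reports True.

    safe_exists is a hypothetical helper used for illustration only.
    """
    try:
        return os.path.exists(path)
    except (ValueError, TypeError):
        # ValueError: e.g. an embedded null byte (on Python versions that raise).
        # TypeError: the argument is not a valid path type, such as a list.
        return False


print(safe_exists(os.getcwd()))  # an existing directory
print(safe_exists("a\x00b"))     # file name with an embedded null byte
print(safe_exists([]))           # nonsensical argument type
```

With this wrapper, a malformed argument is reported the same way as a missing file instead of surfacing a lower-level exception to the caller.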
OK, thank you for the explanation, @psimaj. Merging this in. Also, I notice that this is your first code contribution to this repository. Welcome!
Fixes #1613
I reviewed previous work on this issue, and this is my proposal for improved error messages in read_hdf, alongside tests.
Note that the case of an empty list as an argument was considered ambiguous as far as error handling goes. To me it would make sense if read_hdf returned an empty DataFrame instead of throwing an exception in this case, but as I haven't found any concrete definition of a "canonical" empty Dask DataFrame, it should be more consistent to just throw an error here. The fact that concat also throws an exception if provided an empty list further makes me feel like we should avoid empty DataFrames.
Also, the reason why I chose to return two distinct error messages is that, according to the Python docs, os.path.exists can also return False if the user has insufficient permissions to call os.stat.
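As a rough illustration of the two error cases described above, the sketch below separates "nothing matched" from "a path is missing or unreadable". The function name check_paths, its parameters, and both message strings are assumptions made for this example; they are not taken from the merged code.

```python
import os


def check_paths(paths, pattern):
    """Raise IOError with a distinct message for each failure mode.

    check_paths and its messages are hypothetical, for illustration only.
    """
    if len(paths) == 0:
        # Case 1: the pattern (or list) resolved to no paths at all.
        raise IOError("File(s) not found: {0}".format(pattern))
    for path in paths:
        try:
            exists = os.path.exists(path)
        except (ValueError, TypeError):
            exists = False
        if not exists:
            # Case 2: os.path.exists returns False both for a missing file
            # and when os.stat cannot be called due to permissions.
            raise IOError(
                "File not found or insufficient permissions: {0}".format(path)
            )


try:
    check_paths([], "*.h5")
except IOError as err:
    print(err)
```

The second message stays deliberately non-committal because a False result from os.path.exists does not tell the caller which of the two causes applies.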