Allow pathlib.Paths as URLs #2310
That seems like a wrong idea. Speaking as the author of the
Thanks, @pitrou. Do you think this same change would then be useful if put into bytes/local.py:LocalFileSystem._trim_filename? I'm not sure what the test in get_fs_paths_myopen (is it string-like, or is it list-like) should be in that case.
Or is it indeed better to not use Paths, and have users call glob/str themselves?
Path objects may be useful to accept in
Is this ok to merge?
Good with me, unless @pitrou wants to put some test/warning around users incorrectly using Paths that are not simple and local.
url = pathlib.Path('./.test.accounts.*')
sample, values = read_bytes(url, blocksize=None)
assert sum(map(len, values)) == len(files)
with pytest.raises(OSError):
What does the OSError say?
Normally (in this test) it's IOError("%s resolved to no files" % urlpath) in read_bytes, but it could be some form of "path not accessible".
I'm wary of people using Path objects to store URLs, so it would be nice if that check was explicit.
How could you check this?
Sorry. I mean in the dask.bytes code, not in the test.
Agreed, but I still don't know how to check if a path is actually a URL :)
How does dask.bytes parse URLs? Does it delegate the parsing to something else?
It uses urlsplit. There is also a special case up front for Windows paths (like c:\path\file). All of that works on strings, so before then we need to know whether the path is string-like or list-like, and whether it has glob characters.
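To make the description above concrete, here is a rough sketch of that kind of string-based parsing. It is not the actual dask.bytes code: `split_protocol` and `has_glob` are hypothetical names, and the drive-letter regex and the fallback to `'file'` are assumptions chosen for illustration.

```python
import re
from urllib.parse import urlsplit


def split_protocol(urlpath):
    """Hypothetical helper: guess the protocol of a path given as a string."""
    # Special case up front: a Windows drive path such as c:\path\file
    # would otherwise be parsed by urlsplit as having scheme 'c'.
    if re.match(r"^[a-zA-Z]:[\\/]", urlpath):
        return "file", urlpath
    scheme = urlsplit(urlpath).scheme
    # Assumption: no scheme means a local file.
    return (scheme or "file"), urlpath


def has_glob(urlpath):
    """Hypothetical helper: does the string contain glob characters?"""
    return any(ch in urlpath for ch in "*?[")
```

Both helpers assume a plain string, which is why a Path (or a list of paths) would have to be recognised and converted before reaching this point.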
So is it possible, when urlsplit returns, to raise an error if the scheme is non-empty and the original object was a Path? Or would that break the internal abstractions?
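A minimal sketch of what such a check might look like, assuming it lives in a hypothetical helper `path_to_urlpath` (not an existing dask function). Treating a one-letter scheme as a Windows drive letter is also an assumption made here, not something taken from dask.bytes.

```python
import pathlib
from urllib.parse import urlsplit


def path_to_urlpath(urlpath):
    """Hypothetical helper: turn a Path into a str, rejecting URL-like Paths."""
    if isinstance(urlpath, pathlib.PurePath):
        text = str(urlpath)
        scheme = urlsplit(text).scheme
        # Assumption: a one-letter "scheme" is a Windows drive (c:\path\file),
        # so only longer schemes are treated as a URL stored in a Path.
        if len(scheme) > 1:
            raise ValueError(
                "Path objects must refer to local files, got scheme %r in %r"
                % (scheme, text)
            )
        return text
    # Plain strings (and lists) are passed through unchanged.
    return urlpath
```

With this sketch, `path_to_urlpath(pathlib.PurePosixPath('s3://bucket/file'))` raises `ValueError`, while the plain string `'s3://bucket/file'` is returned untouched.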
OK, so I spent far too long on this. It seems that pytest does some monkey-patching that interferes with pathlib and moto, on Windows only: calling str(Path) breaks moto for the rest of the test suite.
Perhaps disable the pathlib tests on Windows?
Because some sort of monkey-patch is breaking later moto tests
What is the status here? @pitrou, are you comfortable with this?
That's fine with me, yes.
Fixes #2309
I may be wrong, but this seems like a somewhat niche usage to me, so to avoid confusion I'm not sure it should go into any of the docstrings. On the other hand, it doesn't hurt at all to include it.
I wonder what the expected behaviour should be with remote file-systems (or are Paths implicitly always local?). I am not familiar enough with typical usage of Path, but I notice that str(Path('s3://bucket/file')) -> 's3:/bucket/file', which would be bad.
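A short illustration of why that collapsed slash matters once the string reaches urlsplit. `PurePosixPath` is used here instead of `Path` only so that the result does not depend on the platform; the variable names are illustrative.

```python
import pathlib
from urllib.parse import urlsplit

url = "s3://bucket/file"
# pathlib drops the empty component between the two slashes.
as_path = str(pathlib.PurePosixPath(url))

original = urlsplit(url)
mangled = urlsplit(as_path)

print(as_path)                          # s3:/bucket/file
print(original.netloc, original.path)   # bucket /file
print(repr(mangled.netloc), mangled.path)  # '' /bucket/file
```

The scheme survives the round trip, but the bucket name moves from the netloc into the path, so anything that relies on the netloc would see an empty bucket.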