Allow pathlib.Paths as URLs #2310

Merged
merged 8 commits into from May 31, 2017

Conversation

martindurant
Member

Fixes #2309

I may be wrong, but this seems a somewhat niche usage to me, so to avoid confusion I'm not sure it should go into any of the docstrings. On the other hand, it doesn't hurt at all to include it.

I wonder what the expected behaviour should be with remote file-systems (or are Paths implicitly always local?). I am not familiar enough with typical usage of Path, but I notice that str(Path('s3://bucket/file')) -> 's3:/bucket/file', which would be bad.
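
For illustration, the slash collapsing described above can be reproduced with a few lines. This is a minimal sketch, not part of the PR; it uses `PurePosixPath` so the result is the same on every platform, and the bucket and file names are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

url = "s3://bucket/file"

# pathlib normalises repeated separators, so the "//" after the scheme is lost
as_path = str(PurePosixPath(url))
print(as_path)  # s3:/bucket/file

# The original string has a network location; the round-tripped one does not
print(urlsplit(url).netloc)      # bucket
print(urlsplit(as_path).netloc)  # (empty string)
```

Once the double slash is gone, `urlsplit` no longer sees `bucket` as the network location, which is why passing a URL through `Path` is lossy.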

@pitrou
Member

pitrou commented May 8, 2017

That seems like a wrong idea. Speaking as the author of the pathlib module, Paths have always been meant to be local, and their properties and behaviour reflect that.

@martindurant
Member Author

Thanks, @pitrou - do you think then that this same change would be useful if put into bytes/local.py:LocalFileSystem._trim_filename ? I'm not sure then what the test in get_fs_paths_myopen (is string like, or is list-like) should be.

@martindurant
Member Author

Or is it indeed better to not use Paths, and have the users call glob/str themselves?

@pitrou
Member

pitrou commented May 8, 2017

Path objects may be useful to accept in read_csv and such functions. Then they would always point to a local path, not be treated as a URI.

@mrocklin
Member

Is this ok to merge?

@martindurant
Member Author

Good with me unless @pitrou wants to put some test/warning around users incorrectly using Paths that are not simple, local paths.

url = pathlib.Path('./.test.accounts.*')
sample, values = read_bytes(url, blocksize=None)
assert sum(map(len, values)) == len(files)
with pytest.raises(OSError):
Member

What does the OSError say?

Member Author

Normally (in this test) it's IOError("%s resolved to no files" % urlpath) in read_bytes, but it could be some form of "path not accessible".

Member

I'm wary of people using Path objects to store URLs, so it would be nice if that check was explicit.

Member Author

How could you check this?

Member

Sorry. I mean in the dask.bytes code, not in the test.

Member Author

Agreed, but I still don't know how to check if a path is actually a URL :)

Member

How does dask.bytes parse URLs? Does it delegate the parsing to something else?

Member Author

It uses urlsplit. There is also a special case up front for Windows paths (like c:\path\file); all of that works on strings. Before then, we need to know whether the path is string-like or list-like, and whether it has glob characters.
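
To make the Windows special case concrete, here is a small sketch of why a drive-letter check has to run before `urlsplit`. The helper name `infer_protocol` and its regular expression are hypothetical, written only for this illustration; they are not the dask.bytes implementation.

```python
import re
from urllib.parse import urlsplit


def infer_protocol(urlpath):
    """Guess the protocol of a string path (hypothetical helper)."""
    # A drive letter followed by a separator looks like a one-letter URL
    # scheme to urlsplit, so it is handled before any URL parsing.
    if re.match(r"^[a-zA-Z]:[\\/]", urlpath):
        return "file"
    # Anything without a scheme is treated as a local file.
    return urlsplit(urlpath).scheme or "file"


# Without the special case, the drive letter is reported as the scheme
print(urlsplit(r"c:\path\file").scheme)    # c

print(infer_protocol(r"c:\path\file"))     # file
print(infer_protocol("s3://bucket/file"))  # s3
print(infer_protocol("data/*.csv"))        # file
```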

Member

So is it possible, when urlsplit returns, to raise an error if the scheme is non-empty and the original object was a Path? Or would that break the internal abstractions?
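
One way the suggested check might look is sketched below. This is an assumption-laden illustration rather than the code that was merged: the function name `path_to_local_string` is hypothetical, and the single-letter exemption is a guess at how to avoid rejecting Windows drive letters.

```python
from pathlib import PurePath, PurePosixPath
from urllib.parse import urlsplit


def path_to_local_string(urlpath):
    """Convert a Path to str, rejecting Paths that look like URLs.

    Hypothetical helper: plain strings are passed through untouched.
    """
    if not isinstance(urlpath, PurePath):
        return urlpath
    text = str(urlpath)
    scheme = urlsplit(text).scheme
    # A one-letter "scheme" is most likely a Windows drive (c:\...), so
    # only longer schemes are treated as a misused URL.
    if len(scheme) > 1:
        raise ValueError(
            "Path objects must be local; got scheme %r in %r" % (scheme, text)
        )
    return text


print(path_to_local_string(PurePosixPath("data/file.csv")))  # data/file.csv
print(path_to_local_string("s3://bucket/file"))              # s3://bucket/file

try:
    path_to_local_string(PurePosixPath("s3://bucket/file"))
except ValueError as err:
    print("rejected:", err)
```

The check still works after `str(Path)` has collapsed the double slash, because `urlsplit` reports the scheme of `s3:/bucket/file` as `s3`.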

@martindurant
Member Author

OK, so I spent far too long on this. It seems that pytest does some monkey-patching that interferes with pathlib and moto, on Windows only: calling str(Path) breaks moto for the rest of the test suite.
Upshot: the code does work, and the tests would pass if only the s3 tests were run before the pathlib one. Thoughts, @pitrou?

@pitrou
Member

pitrou commented May 25, 2017

Perhaps disable the pathlib tests on Windows?

Martin Durant added 2 commits May 25, 2017 15:59
Because some sort of monkey-patch is breaking later moto tests
@mrocklin
Member

What is the status here? @pitrou, are you comfortable with this?

@pitrou
Member

pitrou commented May 31, 2017

That's fine with me, yes.

@pitrou pitrou merged commit 6bd25dc into dask:master May 31, 2017
@sinhrks sinhrks added this to the 0.15.0 milestone Aug 30, 2017
@martindurant martindurant deleted the pathlib_paths branch September 8, 2017 16:12
4 participants