Enabling glob with HTTP file-system #3926
This derives from fsspec's abstract file-system, to show reusability of glob and walk code.
|
cc @mmccarty, who wanted glob for HTTP
|
I tried to give this a shot but ran into |
|
Sorry about that, @mrocklin, it seems I didn't successfully upload the versioneer pieces. It should install now from GitHub.
|
I guess there's no impetus for moving this code out to fsspec; should I add the ls and (partial) glob directly to HTTPFileSystem? I can keep that in this PR. |
|
@mrocklin , I guess this idea has stalled; happy to close as "uninteresting". I may, at a later date, do a more thorough job in fsspec, but it's not a priority. |
|
I'm curious, what caused it to stall? Not enough review? Not a good idea in the first place? Another solution arose to the motivating problem? Other?
On Mon, Sep 10, 2018 at 11:46 AM, Martin Durant wrote:
> @mrocklin, I guess this idea has stalled; happy to close as "uninteresting". I may, at a later date, do a more thorough job in fsspec, but it's not a priority.
|
|
I think folks are interested in seeing this implemented. I thought the question was where it should go, in dask or fsspec? |
|
Yes, the idea was to get a conversation going on a couple of interrelated things, which is why this makes for a poor PR:
|
|
An effort to make fsspec more than just a demonstration class and into a usable file-system, to which open_files and its requirements, as well as HTTPFileSystem, could be added: https://github.com/martindurant/filesystem_spec/pull/12 (still requires a lot of testing and docs before anyone should use it for anything)
Backport changes from fsspec
|
Pulled out fsspec, so that this can work with the current dask bytes infrastructure. Uses some code in fsspec and will contribute some back too. |
|
@jcrist , you might have some opinions here |
            raise NotImplementedError

        def isdir(self, path):
            return True
This is odd, but in HTTP any URL can contain more files at paths below it, so every path is treated as a directory.
    def test_parquet():
        from distutils.version import LooseVersion
        if LooseVersion(requests.__version__) < LooseVersion("2.21.0"):
            pytest.skip()
This is because older versions of requests do not seem able to make a successful HEAD request to find out how big the file is.
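For reference, the size lookup mentioned here can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses the standard library's `urllib` instead of requests so that it runs without third-party packages, and the name `remote_size` and its `opener` parameter are hypothetical.

```python
import urllib.request


def remote_size(url, opener=urllib.request.urlopen):
    """Return the byte size a server reports for `url`, or None if it gives none.

    `opener` is a hypothetical hook (defaulting to urllib's urlopen) so that
    the function can be exercised without a network connection.
    """
    # HEAD asks for the response headers only, so no file body is transferred.
    request = urllib.request.Request(url, method="HEAD")
    with opener(request) as response:
        length = response.headers.get("Content-Length")
    # Some servers omit Content-Length (for example with chunked responses).
    return int(length) if length is not None else None
```

If the server does not report a length, the caller has to fall back to something else, such as reading the whole file.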
|
I think this can be merged, and it may be useful to some. I will still want to push for separating such things into fsspec following the permanent adoption of py3. Does anyone have any objections?
* Example of enabling glob with HTTP file-system

  This derives from fsspec's abstract file-system, to show reusability of glob and walk code.
* ls should return sorted, unique list
* more backporting
* revert options
* remove stray print
* remove import for flake
* Skip filesize on older requests
(not meant to be merged, just for discussion)
I have implemented a simplistic way to do `ls` for HTTP, which has been often requested. It loads the target page and looks for HREFs that look like they are children. I'm sure it has many cases that would break it, but the point is to demonstrate reuse of code in fsspec, so we get `walk` and `glob` for free by defining `ls`.

This kind of thing begs the question: do file-systems like this belong in dask? What about the rest of the bytes functionality, some of which is very dask specific, some of which is not?
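To make the idea concrete, here is a rough standalone sketch of how `walk` and `glob` can fall out of an HREF-scraping `ls`. It is not the code in this PR or in fsspec: the names `ls_from_html`, `walk`, `glob` and the `fetch` callable are hypothetical. It also assumes that links ending in `/` are the listings worth descending into, and it matches with `fnmatch`, whose `*` also crosses `/` boundaries, unlike a real glob.

```python
import fnmatch
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def ls_from_html(url, html_text):
    """Return the sorted, unique URLs linked from the page that sit below `url`."""
    base = url if url.endswith("/") else url + "/"
    parser = _HrefCollector()
    parser.feed(html_text)
    children = set()
    for href in parser.hrefs:
        target = urljoin(base, href)
        parts = urlsplit(target)
        # Skip sort-order links such as "?C=N;O=D" and in-page anchors.
        if parts.query or parts.fragment:
            continue
        # Keep only links strictly below the listing URL (drops "../" and other hosts).
        if target.startswith(base) and target != base:
            children.add(target)
    return sorted(children)


def walk(url, fetch):
    """Yield every URL found below `url`; `fetch(url)` returns page text or None."""
    text = fetch(url)
    if text is None:
        return
    for child in ls_from_html(url, text):
        yield child
        # Assumption: only links ending in "/" are listings worth descending into.
        if child.endswith("/"):
            yield from walk(child, fetch)


def glob(pattern, root, fetch):
    """Return the sorted URLs below `root` that match the shell-style `pattern`."""
    return sorted(u for u in walk(root, fetch) if fnmatch.fnmatchcase(u, pattern))
```

In practice `fetch` would wrap an HTTP GET; keeping it as a parameter means the listing logic can be tried against canned pages, and only `ls_from_html` needs to know anything about HTML.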