Read CSV directly from URL #506

Open
kescobo opened this issue Oct 1, 2019 · 8 comments

@kescobo
Contributor

kescobo commented Oct 1, 2019

This was mentioned in #3, but that was ~4 years ago. Not sure it's worth taking a dependency on HTTP.jl, but being able to do CSV.read(url) would be really great.

@quinnj
Member

quinnj commented Oct 1, 2019

There has been interest generally to move MbedTLS and HTTP.jl as stdlibs, where it would be much easier to support this w/o having to take on extra dependencies.

@quinnj
Member

quinnj commented Jun 26, 2020

I've added an example to the CSV.File/CSV.Rows docs on how to do this very simply (i.e. CSV.File(HTTP.get(url).body)), and if HTTP.jl ever becomes a stdlib, we can for sure support it natively.
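
For reference, a minimal sketch of that workaround, assuming both CSV.jl and HTTP.jl are installed. The function name `read_csv_from_url` and the URL in the usage note are hypothetical placeholders, not part of either package:

```julia
using CSV, HTTP

# Hypothetical convenience wrapper around the one-liner from the docs.
# HTTP.get returns a response whose `body` field holds the raw bytes,
# and CSV.File accepts a byte vector as its input.
function read_csv_from_url(url::AbstractString; kwargs...)
    response = HTTP.get(url)
    return CSV.File(response.body; kwargs...)
end
```

It could be called as `read_csv_from_url("https://example.com/data.csv")`, with any CSV.File keyword arguments passed through.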

quinnj closed this as completed Jun 26, 2020

@kescobo
Contributor Author

kescobo commented Jun 26, 2020

That seems like a good compromise 👍

@hungpham3112
Contributor

Hi, I come from Python. pandas has a built-in read_csv() function that makes it really convenient to read a CSV file directly from a URL.

From a Julia perspective, writing CSV.File(HTTP.get(url).body) over and over is redundant, verbose, and not beginner-friendly. Moreover, when people create a notebook to read a CSV and do their work, the CSV usually only needs to be read once or twice in that session. The problem, then, is adding an HTTP.jl dependency that the user rarely uses. I believe CSV.jl could have its own small feature to avoid that dependency. It would be great if this issue could be reopened to see what people think about it.

@kescobo
Contributor Author

kescobo commented Jul 24, 2023

@hungpham3112 For packages like this, keeping really lean is important. I don't know that building a bespoke URL reader is worth the added development / maintenance burden.

That said, I wonder if a package-extension for people that also load HTTP.jl might be possible.

> There has been interest generally to move MbedTLS and HTTP.jl as stdlibs, where it would be much easier to support this w/o having to take on extra dependencies.

Given this was 4 years ago, and the general trend lately is rather to take things out of the stdlib, it might be worth revisiting.

@hungpham3112
Contributor

> Given this was 4 years ago, and the general trend lately is rather taking things out of stdlib, it might be worth revisiting

I think it's worth reopening to see what people think.

@quinnj
Member

quinnj commented Aug 2, 2023

I think if someone was up for it, the best path forward would be to use the Downloads stdlib. This would involve modifying the getbytebuffer function in the utils.jl file to do a regex match against the source to see if it's a URL, then using Downloads.jl to download the url into memory (but respecting the buffer_in_memory keyword arg) or to disk, then letting the rest of the read process continue normally. It's a bit tricky to add tests that rely on networking, but Base has set up a go port of httpbin.org that they've said we can use in the HTTP.jl package for tests. So you could write a test that hits one of those endpoints (https://httpbingo.julialang.org/) and have it return some csv data and then read the file from that. It'd also be good to test that gzipped csv data from a url is handled correctly.
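
The approach described above could be sketched roughly as follows, using only the Downloads stdlib. This is not CSV.jl's actual `getbytebuffer`; the helper names `looks_like_url` and `fetch_source`, the exact regex, and the reading of `buffer_in_memory` (true means download into memory, false means download to a temporary file) are all assumptions made for illustration:

```julia
using Downloads

# Hypothetical helper: treat the source as a URL if it starts with
# http:// or https:// (case-insensitive). A real check may need to be stricter.
looks_like_url(source::AbstractString) = occursin(r"^https?://"i, source)

# Hypothetical helper: resolve a source before the normal read process.
# Non-URL sources (e.g. local file paths) are returned unchanged.
function fetch_source(source::AbstractString; buffer_in_memory::Bool=true)
    looks_like_url(source) || return source
    if buffer_in_memory
        # Download into an in-memory buffer and return the raw bytes.
        io = Downloads.download(source, IOBuffer())
        return take!(io)
    else
        # Download to a temporary file and return its path.
        return Downloads.download(source)
    end
end
```

Either return value (a byte vector or a file path) could then be handed to the rest of the read process, for example `CSV.File(fetch_source(url))`. Tests against a real endpoint, including gzipped data, would still be needed.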

Anyone want to take a stab at it?

quinnj reopened this Aug 2, 2023

@DominiqueMakowski

Just chiming in to say that this would be a very nice feature to have, and it would make Julia more accessible.
