Make dataset example self-contained #41

Closed
ResidentMario opened this issue Oct 10, 2017 · 11 comments

@ResidentMario
Owner

I've consolidated the example datasets under a geopandas-like geoplot.datasets domain (here).

This has the benefit of making all of the examples in the geoplot documentation, especially the ones in the gallery, immediately reproducible for the user. The drawback is that I also have to distribute the example data with the library. After hemming and hawing every which way, I've gotten that down to a ~10 MB examples.zip file.

@choldgraf I would like your feedback on this idea. Is a 10 MB add-on like this an acceptable load for such a library? Or should I maybe provide an nltk-like downloader instead?

@choldgraf

I think you should store the data online someplace stable (maybe figshare) and use the module to download the data to a ~/geoplot_data folder when the person requests it. IME that's the best long-term solution!
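The download-on-request idea could be sketched roughly as below. This is a minimal illustration, not geoplot code: the `fetch_dataset` helper, its parameters, and the `.part` temporary-file convention are all assumptions, and `base_url` stands in for whatever stable host ends up holding the files.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical default cache location, following the ~/geoplot_data suggestion.
DEFAULT_CACHE = Path.home() / "geoplot_data"


def fetch_dataset(name, base_url, cache_dir=DEFAULT_CACHE):
    """Return a local path to `name`, downloading it only on first request."""
    cache_dir = Path(cache_dir)
    target = cache_dir / name
    if target.exists():
        return target  # already cached, so no network access is needed

    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = target.with_name(target.name + ".part")
    url = f"{base_url.rstrip('/')}/{name}"
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(target)  # rename only after the download has completed
    return target
```

Writing to a temporary file and renaming it afterwards means an interrupted download is never mistaken for a cached dataset on the next call.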

@ResidentMario
Owner Author

I asked Reddit this question (here) and Jake VanderPlas actually recommended Quilt, a startup'd data package management tool, in the comments.

I'd heard of it before, but was wary of trusting a startup to continue existing long-term... but they've been around for a little while now, and it's got a VanderPlas thumbs-up, so I've gone ahead and integrated it. And it works great! It lets me delegate the problem of dealing with file downloads to a dedicated package (quilt) that I don't need to worry about.

@choldgraf

very interesting! I hadn't heard of that before. Lemme know how it goes!

@ResidentMario
Owner Author

Went well so far! We'll see if this develops any hiccups.

@pybokeh

pybokeh commented Nov 2, 2017

Hello, how do I install the geoplot_data? I installed quilt via pip install quilt. Then when I tried to run the example notebook tutorial and do the import:
from quilt.data.ResidentMario import geoplot_data

It says the module is not found, so I'm assuming I'm missing a step or need to install it somehow? I tried pip install geoplot_data and pip install geoplot-data, but those packages don't exist. Could geoplot_data perhaps be made pip-installable too?

@ResidentMario
Owner Author

ResidentMario commented Nov 2, 2017

There's a CLI command or a Python statement you can run to get the data locally, described here: https://docs.quiltdata.com/use-a-package.html

I'll make this clearer the next time I revise the docs.

@pybokeh

pybokeh commented Nov 3, 2017

@ResidentMario Thank you! All I did was run:

import quilt
quilt.install("ResidentMario/geoplot_data")

and was then able to run the example.

@roaringcat

Hello, I have the same problem, and the page linked above could not be found.
How can I solve this and get the geoplot_data?
Thanks.

@ResidentMario
Owner Author

The code snippet pybokeh included in his comment should do the job. I think the new URL is https://docs.quiltdata.com/get-started/step-by-step.

@roaringcat

Oh, I missed the important code above, thank you very much!
I finally got the geoplot_data!

@ResidentMario
Owner Author

For future reference for anyone visiting this GH issue, the system for getting example data in geoplot has changed again. There is now a geoplot.datasets.get_path method which serves a URL to a specific example dataset from the ResidentMario/geoplot-data repository on GitHub. Now that geopandas has the ability to read URLs, grabbing an example dataset is as easy as:

import geoplot as gplt
import geopandas as gpd

gpd.read_file(gplt.datasets.get_path("world"))

This feature was introduced in geoplot@0.3.0, and is used throughout the updated documentation.

This removes the external dependency on quilt and the managerial overhead that comes with having yet another package manager.
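For readers curious how a name-to-URL helper of this kind might work, here is a minimal sketch. It is not the geoplot implementation: the `example_get_path` name, the registry, the `master` branch in the base URL, and the `world.geojson` file name are all assumptions made for illustration.

```python
# Assumed layout: files served raw from the default branch of the data repository.
# The branch name and file names here are illustrative guesses, not taken from geoplot.
BASE_URL = "https://raw.githubusercontent.com/ResidentMario/geoplot-data/master"

# Hypothetical registry mapping dataset names to file names.
_DATASETS = {
    "world": "world.geojson",
}


def example_get_path(name):
    """Return a URL for a named example dataset (illustrative only)."""
    try:
        filename = _DATASETS[name]
    except KeyError:
        known = ", ".join(sorted(_DATASETS))
        raise ValueError(f"Unknown dataset {name!r}; known datasets: {known}") from None
    return f"{BASE_URL}/{filename}"
```

Because the helper only returns a string, nothing is downloaded until the URL is handed to a reader such as gpd.read_file.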
