Include smaller example data for users to follow along (and for future tests) #45

Closed
zkamvar opened this issue Jan 29, 2018 · 4 comments

Comments

@zkamvar
Contributor

zkamvar commented Jan 29, 2018

This package is meant to tackle the visualization tasks of large data sets, and the provided examples are fantastic for demonstrating the utter complexity that users may face. I'm especially glad to see that you have posted examples of how you munged the data. This is quite valuable to fair-weather Python users such as myself. 👍

However, in order to follow along, users must start by downloading all 1M+ rows (and growing!) of the NYPDMVC data set. 😿 My suggestion would be to include a small subset of these data in the package (I believe you can specify the location with package_data in your setup file).
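For illustration, the `package_data` suggestion could look roughly like the sketch below. The package name `mypackage`, the `data/` directory, and the CSV pattern are hypothetical placeholders and not this project's actual layout; only the `package_data` keyword itself comes from setuptools.

```python
# Keyword arguments that a setup.py could pass to setuptools.setup().
# "mypackage" and "data/*.csv" are hypothetical names used for illustration.
SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1.0",
    "packages": ["mypackage"],
    # Maps a package name to glob patterns, relative to that package's
    # directory, of non-Python files to ship with it.
    "package_data": {"mypackage": ["data/*.csv"]},
}
```

In a real `setup.py` these would be passed along as `setuptools.setup(**SETUP_KWARGS)`, so that a small sample file such as `mypackage/data/sample.csv` is installed next to the code.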

@ResidentMario
Owner

I had success using the Quilt package manager (https://quiltdata.com/) for this task for my geoplot project. Reckon I'll try to do that again with this dataset. Been meaning to do it for a while, but haven't gotten to it before.

Having the complete dataset is important in the last demo in particular: the geospatial demo really does require all that data to show a "dense enough" outline of the city.

I'll wrangle something up.

@zkamvar
Contributor Author

zkamvar commented Jan 29, 2018

To clarify a bit:

I'm not suggesting you get rid of the large examples altogether. I think it would be beneficial to have a small example for the quick start and then move on to the larger example to show the real power of the package.

@rhiever

rhiever commented Jan 31, 2018

Big 👍 to this suggestion from @zkamvar. It took far too long for me to download the example dataset (~250 MB for the one I downloaded). A small dataset (<1 MB) crafted for demo purposes would be ideal, such that users can install your package and instantly run quickstart examples from the README. Very important for uptake of projects.
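One possible way to carve a small demo file out of a large CSV is reservoir sampling, sketched below with only the standard library. This is an assumption about how such a subset might be produced, not how the sample dataset in this thread was actually built; the function name and file paths are hypothetical.

```python
import csv
import random


def sample_csv_rows(src_path, dst_path, n_rows, seed=0):
    """Write a uniform random sample of up to n_rows data rows to dst_path.

    Uses reservoir sampling, so the source file is streamed once and is
    never held in memory in full. The header row is always copied.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < n_rows:
                # Fill the reservoir with the first n_rows rows.
                reservoir.append(row)
            else:
                # Keep this row with probability n_rows / (i + 1).
                j = rng.randint(0, i)
                if j < n_rows:
                    reservoir[j] = row
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

A call such as `sample_csv_rows("collisions_full.csv", "collisions_sample.csv", 5000)` (both file names hypothetical) would then yield a file small enough to ship with a quickstart.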

@ResidentMario
Owner

Done. A sample collisions dataset is now externalized via quilt (you can also just download it directly from a companion repo), and I've updated all of the Quickstart visualizations to use it. I've removed the housing dataset, which I no longer have at hand, would be hard to recreate, and mostly just pads the length out IMHO.

3 participants