Include smaller example data for users to follow along (and for future tests) #45

zkamvar · 2018-01-29T16:50:21Z

This package is meant to tackle the visualization tasks of large data sets, and the provided examples are fantastic for demonstrating the utter complexity that users may face. I'm especially glad to see that you have posted examples of how you munged the data. This is quite valuable to fair-weather Python users such as myself. 👍

However, in order to follow along, users must start by downloading all 1M+ rows (and growing!) of the NYPDMVC data set. 😿 My suggestion would be to include a small subset of these data in the package (I believe you can specify the location with `package_data` in your setup file).
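For reference, here is a minimal `setup.py` sketch of that suggestion. It assumes the sample file sits inside the package at `missingno/data/collisions_sample.csv`. That path is hypothetical; it is not something the package actually ships.

```python
# setup.py: a minimal sketch, not missingno's actual build configuration.
# The data file location below is a hypothetical placeholder.
from setuptools import setup

setup(
    name="missingno",
    packages=["missingno"],
    # package_data maps a package name to glob patterns, relative to that
    # package's directory, for non-Python files to install alongside it.
    # Here that would match missingno/data/collisions_sample.csv (hypothetical).
    package_data={"missingno": ["data/*.csv"]},
)
```

Files listed this way must live inside the package directory. On Python 3.9+, installed code could then locate the file with `importlib.resources.files("missingno") / "data" / "collisions_sample.csv"`, which avoids depending on the current working directory.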


ResidentMario · 2018-01-29T16:55:20Z

I had success using the Quilt package manager (https://quiltdata.com/) for this same task in my geoplot project, and reckon I'll try it again with this dataset. I've been meaning to do it for a while but haven't gotten around to it yet.

Having the complete dataset is important in the last demo in particular: the geospatial demo really does require all that data to show a "dense enough" outline of the city.

I'll wrangle something up.

zkamvar · 2018-01-29T17:11:48Z

To clarify a bit:

I'm not suggesting you get rid of the large examples altogether. I think it would be beneficial to have a small example for the quick start and then move on to the larger example to show the real power of the package.

rhiever · 2018-01-31T02:52:03Z

Big 👍 to this suggestion from @zkamvar. It took far too long for me to download the example dataset (~250 MB for the one I downloaded). A small dataset (<1 MB) crafted for demo purposes would be ideal, such that users can install your package and instantly run quickstart examples from the README. Very important for uptake of projects.
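One way to carve a sample like this out of the full collisions CSV is reservoir sampling. It draws a uniform random subset of rows in a single streaming pass, so the 1M+ row file never has to fit in memory. Below is a minimal standard-library sketch. The function names and file names are hypothetical, and this is not necessarily how the missingno sample was actually produced.

```python
import csv
import io
import random


def sample_csv_rows(src, k, seed=0):
    """Return (header, rows): the CSV header plus a uniform random sample
    of up to k data rows from the file-like object src.

    Uses reservoir sampling (Algorithm R). The input is streamed once and
    at most k rows are held in memory at any time.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible sample
    reader = csv.reader(src)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1) by replacing a
            # uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


def write_sample(header, rows, dst):
    """Write the header and sampled rows to the file-like object dst."""
    writer = csv.writer(dst)
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    # Tiny self-contained demo on synthetic data. For a real file, pass
    # open("collisions_full.csv", newline="") instead (hypothetical name).
    demo = io.StringIO("id,value\n" + "".join(f"{i},{i * i}\n" for i in range(100)))
    header, rows = sample_csv_rows(demo, k=5)
    out = io.StringIO()
    write_sample(header, rows, out)
    print(out.getvalue())
```

After writing the sample to disk, `os.path.getsize(...)` can confirm it stays under the ~1 MB target. If it does not, lower `k`. Fixing `seed` keeps the sample reproducible, which matters if the file later doubles as test data.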

ResidentMario · 2018-02-03T22:01:07Z

Done. A sample collisions dataset is now externalized via Quilt (you can also just download it directly from a companion repo), and I've updated all of the Quickstart visualizations to use it. I've removed the housing dataset, which I no longer have at hand, would be hard to recreate, and mostly just pads the length out IMHO.

zkamvar mentioned this issue Jan 29, 2018 in [REVIEW]: Missingno: a missing data visualization suite openjournals/joss-reviews#547 (closed)

ResidentMario closed this as completed Feb 3, 2018
