Jupyter Notebooks on GitHub

In July 2017 we searched for, downloaded, and analyzed the approximately 1.3 million public Jupyter Notebooks that were on GitHub at the time. This repository includes the scripts used to query and analyze that dataset. The full dataset is now available online thanks to hosting provided by the UC San Diego Library. Because the full dataset is nearly 600 GB, we have created a smaller 5 GB sampler dataset to help you get started.

In our analysis, we looked primarily at how notebooks employ narrative (operationalized as markdown text). Our main finding was that many notebooks (~27%) include no markdown text and that text occurs more often at the start of notebooks.

Figure: Many notebooks have little explanatory text
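
To make the "narrative as markdown text" measure concrete, here is a minimal sketch of how the share of notebooks without any markdown could be computed. It is not the script used in this repository. It assumes notebooks in the nbformat 4 layout, where cells sit in a top-level `cells` list and each cell carries a `cell_type` field (older nbformat 3 files nest cells under `worksheets` and would need extra handling). The function names `markdown_cell_stats` and `share_without_markdown` are hypothetical.

```python
import json


def markdown_cell_stats(notebook_json):
    """Count markdown and code cells in one notebook given as a JSON string.

    Assumption: nbformat 4 layout with a top-level "cells" list.
    """
    notebook = json.loads(notebook_json)
    cells = notebook.get("cells", [])
    n_markdown = sum(1 for cell in cells if cell.get("cell_type") == "markdown")
    n_code = sum(1 for cell in cells if cell.get("cell_type") == "code")
    return {
        "markdown": n_markdown,
        "code": n_code,
        "total": len(cells),
        "has_markdown": n_markdown > 0,
    }


def share_without_markdown(notebook_jsons):
    """Return the fraction of notebooks that contain no markdown cells."""
    stats = [markdown_cell_stats(nb) for nb in notebook_jsons]
    if not stats:
        return 0.0
    return sum(1 for s in stats if not s["has_markdown"]) / len(stats)


# Two tiny hand-written notebooks, for illustration only (not from the dataset).
with_text = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Load the data"]},
        {"cell_type": "code", "source": ["x = 1"]},
    ],
})
code_only = json.dumps({
    "nbformat": 4,
    "cells": [{"cell_type": "code", "source": ["x = 1"]}],
})

print(share_without_markdown([with_text, code_only]))
```

On the real dataset, each `.ipynb` file would be read from disk and passed through the same functions; files that fail to parse as JSON would need to be skipped or counted separately.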

See our forthcoming CHI 2018 publication for a more detailed description of this analysis, as well as our related study of notebooks published alongside academic publications and interviews with academic data analysts.

For those unfamiliar with them, Jupyter Notebooks are an interactive computing environment that enables analysts to combine code, visualizations, and text in a single, easily shared document. See the example of a notebook below:

Figure: Example Jupyter Notebook




