Jupyter Notebooks on GitHub

In July 2017, we searched for, downloaded, and analyzed the roughly 1.3 million public Jupyter Notebooks available on GitHub at that time. This repository includes the scripts used to query and analyze that dataset. The full dataset is now available online thanks to hosting provided by the UC San Diego Library. Because the full dataset is nearly 600 GB, we have also created a smaller 5 GB sampler dataset to help you get started.

Our analysis focused primarily on how notebooks employ narrative, which we operationalized as markdown text. Our main findings were that many notebooks (~27%) contain no markdown text at all, and that text occurs more often at the start of notebooks than later on.
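The scripts in this repository are the authoritative implementation. What follows is only a hypothetical sketch of how the two measurements above could be computed. It makes three assumptions:

- Notebooks use the nbformat v4 JSON layout: a top-level `cells` list whose entries have `cell_type` and `source`.
- A markdown cell counts as narrative only if its text is non-blank.
- The helper names (`has_markdown`, `markdown_positions`, `share_without_markdown`) are illustrative, not taken from the repository.

Older nbformat v3 files, which nest cells under `worksheets`, would need separate handling.

```python
import json
from typing import Iterable, List


def load_notebook(path: str) -> dict:
    """Read an .ipynb file; notebooks are stored on disk as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cell_text(cell: dict) -> str:
    """Join a cell's source, which nbformat v4 stores as a string or a list of strings."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def is_markdown_text(cell: dict) -> bool:
    """Assumption: a markdown cell counts as narrative only if it has non-blank text."""
    return cell.get("cell_type") == "markdown" and bool(cell_text(cell).strip())


def has_markdown(nb: dict) -> bool:
    """True if any cell in the notebook is a non-blank markdown cell."""
    return any(is_markdown_text(c) for c in nb.get("cells", []))


def markdown_positions(nb: dict) -> List[float]:
    """Relative position of each markdown cell: 0.0 = first cell, 1.0 = last cell."""
    cells = nb.get("cells", [])
    if len(cells) == 1:
        # A single cell has no spread; treat it as sitting at the start.
        return [0.0] if is_markdown_text(cells[0]) else []
    last = len(cells) - 1
    # For an empty notebook, enumerate yields nothing, so no division happens.
    return [i / last for i, c in enumerate(cells) if is_markdown_text(c)]


def share_without_markdown(notebooks: Iterable[dict]) -> float:
    """Fraction of notebooks that contain no markdown text at all."""
    nbs = list(notebooks)
    return sum(not has_markdown(nb) for nb in nbs) / len(nbs) if nbs else 0.0


# Usage on two tiny in-memory notebooks (hypothetical data).
nb_a = {"cells": [
    {"cell_type": "markdown", "source": ["# Title\n", "Some context."]},
    {"cell_type": "code", "source": "x = 1"},
    {"cell_type": "code", "source": "print(x)"},
]}
nb_b = {"cells": [{"cell_type": "code", "source": "x = 1"}]}

print(share_without_markdown([nb_a, nb_b]))  # 0.5
print(markdown_positions(nb_a))              # [0.0]
```

Pooling the relative positions from many notebooks into a histogram is one plausible way to check whether text clusters near the start. Normalizing by notebook length lets short and long notebooks be compared on the same 0 to 1 scale.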

Many notebooks have little explanatory text

See our forthcoming CHI 2018 publication for a more detailed description of this analysis. It also covers our related study of notebooks published alongside academic papers and our interviews with academic data analysts.

For those unfamiliar with Jupyter Notebooks: they are an interactive computing environment that enables analysts to combine code, visualizations, and text in a single, easily shared document. See the example notebook below:

Example Jupyter Notebook
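Under the hood, a notebook file (`.ipynb`) is a JSON document listing its cells in order. The hypothetical sketch below builds a small notebook that interleaves markdown and code cells, following the nbformat v4.4 layout. The cell contents, the `data.csv` file name, and the helper names are made up for illustration. In a real notebook, visualizations are stored in a code cell's `outputs` after it runs; that list is left empty here because nothing has been executed.

```python
import json


def markdown_cell(text: str) -> dict:
    """A narrative (markdown) cell in nbformat v4 layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> dict:
    """An unexecuted code cell: no execution count and no outputs yet."""
    return {"cell_type": "code", "execution_count": None, "metadata": {},
            "outputs": [], "source": code}


def make_notebook(cells: list) -> dict:
    """Wrap cells in the top-level notebook structure (nbformat 4.4)."""
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


nb = make_notebook([
    markdown_cell("# Loading the data\nWe start by reading the CSV file."),
    code_cell("import pandas as pd\ndf = pd.read_csv('data.csv')"),
    markdown_cell("Next, we plot the distribution of values."),
    code_cell("df['value'].hist()"),
])

# Serialized text like this, saved with an .ipynb extension, is what Jupyter opens.
serialized = json.dumps(nb, indent=1)
```

Keeping code and explanatory text as separate, ordered cells is what makes the ratio of markdown to code in each notebook straightforward to measure.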

About

An analysis of all 1.3 million public Jupyter Notebooks on Github in July 2017
