# Documenting and Versioning Code

The learning goals for _documenting and versioning code_ include:

* Describe Jupyter notebooks and their benefits
* Explain how version control (i.e., git) works

After the exercise, you will be able to:

* Create a Jupyter notebook for one of your projects
* (Briefly) explain your workflow in Markdown in that Jupyter notebook
* Check in that notebook to a version control system

# Jupyter Notebooks

[Jupyter Notebook](https://jupyter.org/) is an interactive, browser-based application that lets you store and run code, stories or descriptions of code, and visualizations in the same place. I'm using a Jupyter notebook for this presentation. I also use notebooks in my research, which falls under the general heading of computational social science. A lot of what I do is analogous to the work you do. Here's a very high-level list of my tasks:

1. get a bunch of data that's automatically generated by a sensor or process, 
1. manipulate or munge the data into the format I need, 
1. generate computational and statistical models, 
1. run those models, 
1. analyze the output, and 
1. describe my results to others.

# Why Notebooks

## tl;dr 

Reproducibility (by me and by others) and Transparency

## Long Version

In the course of a normal day, I switch among these tasks in different projects (e.g., data collection on one project and model building on another). Code I wrote two hours ago might as well have been written by a stranger. I've thought about too many other things to remember what I was working on, how far I'd gotten in that particular task, and what decisions I'd made along the way. Enter notebooks. They allow me to store my thought process, my work, and its output all in the same place. I don't have to remember where I put a particular file or how my table was produced because the line of code that gets the data is in the same place as the line of code that generates the table. In between, notebooks let me add human-readable comments (no # marks or weird colors or fonts) by using Markdown.

[Private example](https://gitlab.si.umich.edu/posm/mpsa-2019/blob/master/notebooks/models/create/16.0-ams-all-models-EVALUATE.ipynb): results section of [MPSA paper](https://deepblue.lib.umich.edu/handle/2027.42/148323)

![screenshot from private notebook](http://drive.google.com/uc?export=view&id=1QF2R_TjJuaslTlhDhcQcLCy2JtEoQm-p)

The private example is a notebook one of my students created that analyzes the results from some topic models we built. The notebook generates the figures and regression tables that went into the ```Results``` section of our paper. When drafting the paper, we were able to copy directly from her Markdown in the notebook to the results section: she explained in the notebook what figure she was generating, and the visible code makes it easy to see what data is included in the figure. You can also see places where she left herself notes about what work to return to (before ```In [17]:```).

## Documentation

### For Code Writers

Why do I bring this up in a talk about _documenting code_? By making it easier to capture your process and to provide a single notebook to interested parties (e.g., journal reviewers, researchers wishing to replicate, new members of your research team), Jupyter makes documenting easy. 


### For Code Users

Speaking of code written by strangers. Who has tried to replicate someone else's study using that person's code? How did that go?

(discussion)

Again, enter notebooks. Jupyter notebooks are _playable_, which means I can issue a ```Run``` command (by pushing a button, using a keyboard shortcut, or choosing it from a menu) and watch the notebook unfold.

![Run](https://www.tutorialspoint.com/jupyter/images/jupyter_notebook_toolbar.jpg)

[Public example](https://github.com/tapilab/icwsm-2018-hostility/blob/master/Replication.ipynb): replication notebook of [ICWSM paper](https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17875/0)

In the public example, you can see a replication notebook one of my collaborators created for a paper we published last year. You can play this notebook with our data to retrain the models we created, regenerate the tables in our paper, and inspect the data at the same points we did. You won't see many explicit comments (i.e., #'ed lines) or Markdown cells in that notebook because storing the generation, analysis, and output together minimizes the need for documentation. This code is nearly _self-documenting_ because it uses elements such as variable names and structure to guide readers through the work. It also makes output easy to find: you don't have to point to carefully named ```.tex``` or ```.png``` files somewhere else because the tables and figures are right there.
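
To make _self-documenting_ concrete, here is a minimal sketch that is not taken from the replication notebook. The function names, the threshold, and the scores are all made up for illustration. Both functions compute the same thing; only the second one tells the reader what it is doing.

```python
from statistics import mean


# Terse version: it works, but the reader must reverse-engineer the intent.
def f(d, t):
    return mean(x for x in d if x > t)


# Self-documenting version: names and structure carry the explanation.
def mean_score_above_threshold(scores, threshold):
    scores_above_threshold = [score for score in scores if score > threshold]
    return mean(scores_above_threshold)


# Hypothetical scores, invented for this illustration only.
example_scores = [0.2, 0.9, 0.7, 0.4]
print(mean_score_above_threshold(example_scores, threshold=0.5))
```

In a notebook, the printed value appears directly under the cell, so the descriptive name and the visible output together do most of the work a comment would otherwise do.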

## Transparency

The notebook approach makes documentation easier, meaning we are each more likely to do it. It also improves the transparency of our work by revealing the code we used to manipulate and analyze our data. Understanding the code requires some expertise with whatever language is employed. My examples were both in Python, but Julia, R, and 90 other languages are supported. Still, I think we can agree that the notebooks contain far more detail in a more usable structure than a standard methods section or even a static appendix.

# How Notebooks

1. Conda environments, nb_conda
2. Demo code, markdown cells
3. Version control

## Conda environments

* what are environments
* why conda
* how conda
* [```bioconda```](https://anaconda.org/bioconda/repo)
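
One quick way to see environments in action is to ask Python, from inside a notebook cell, which interpreter and environment it is running in. The sketch below uses only the standard library. The helper name `describe_environment` is my own, and it assumes that `CONDA_DEFAULT_ENV` reflects the environment Jupyter was launched from, which may not hold if the kernel was started some other way.

```python
import os
import sys


def describe_environment():
    """Report which Python (and, if any, which conda environment) is running."""
    return {
        # Path to the interpreter executing this code.
        "python_executable": sys.executable,
        # Root directory of the environment that interpreter belongs to.
        "environment_prefix": sys.prefix,
        # Set by `conda activate`; None when no conda environment is active.
        "conda_environment": os.environ.get("CONDA_DEFAULT_ENV"),
        # Major.minor.patch version of the running interpreter.
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


for name, value in describe_environment().items():
    print(f"{name}: {value}")
```

If two projects print different prefixes and versions, they are running in separate environments, which is exactly the isolation environments are meant to give you.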

## Demo Cell Types

You've seen the colors and text change on my screen as I walk through this notebook. Nearly all of the _cells_ in my notebook contain [Markdown](https://daringfireball.net/projects/markdown/), a lightweight markup language that enables you to [render text in notebooks](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html). Generally, Markdown renders text in a sans-serif font and code in a ```fixed-width, serif font```.

You can set the type for each cell using the type dropdown:

![Type dropdown](http://drive.google.com/uc?export=view&id=1LW26M7vcTmBST7XrKqZbO5Flj0FnFwTS)
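
Behind the dropdown, a notebook file is JSON, and each cell records its type in a `cell_type` field. The sketch below builds a two-cell notebook by hand with the standard library to show the difference between a Markdown cell and a code cell. It assumes the version 4 notebook layout; the helper `make_cell` and the cell contents are made up for illustration, and in practice Jupyter writes this file for you.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dictionary (nbformat 4 layout)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells also record their outputs and an execution counter.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        make_cell("markdown", "# My workflow\nLoad the data, then plot it."),
        make_cell("code", "print('hello from a code cell')"),
    ],
}

# Serialize to the text that would be saved in an .ipynb file.
serialized = json.dumps(notebook, indent=1)

for cell in notebook["cells"]:
    print(cell["cell_type"], "->", cell["source"].splitlines()[0])
```

Saving `serialized` to a file ending in `.ipynb` should give you something Jupyter can open, with the first cell rendered as text and the second ready to run.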

## Version Control

* what's git
* [git workflows](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow)
* [GitHub](http://github.com), [UMich GitLab](https://its.umich.edu/projects/gitlab/getting-started) (coming soon)
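
Checking a notebook in comes down to two git commands: stage the file, then commit it with a message. The sketch below spells those commands out from Python so they can live in a notebook cell. The function names, the file name `analysis.ipynb`, and the commit message are hypothetical, and `check_in` assumes git is installed and that you are inside a repository that already exists.

```python
import shlex
import subprocess


def checkin_commands(notebook_path, message):
    """Return the git commands that stage and commit one notebook."""
    return [
        ["git", "add", notebook_path],
        ["git", "commit", "-m", message],
    ]


def check_in(notebook_path, message):
    """Run the commands; assumes the current directory is inside a git repository."""
    for command in checkin_commands(notebook_path, message):
        # check=True stops at the first command that fails.
        subprocess.run(command, check=True)


# Hypothetical notebook name and commit message, shown here without running git.
commands = checkin_commands("analysis.ipynb", "Add first draft of analysis notebook")
for command in commands:
    print(shlex.join(command))
```

The printed lines are what you would type in a terminal. A later `git push` sends the commit to a remote host such as GitHub.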



### Git Workflows

Branches: mapping your work to the language of software development (e.g., features, releases)

![Git workflow general](http://drive.google.com/uc?export=view&id=1KgdPtlIQ0ABZqR9BiQ89zRS6FxfPxFQP)
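
As a rough sketch of that mapping, the helper below turns a plain-language description of a research task into a branch name with the `feature/` or `release/` prefix used in the Gitflow convention. The helper `branch_name` and the two example tasks are my own illustrations. Treating a new analysis as a feature and a paper submission as a release is one possible mapping, not a rule.

```python
import re

# Branch prefixes used by the Gitflow convention.
PREFIXES = {"feature": "feature/", "release": "release/"}


def branch_name(kind, description):
    """Turn a plain-language task description into a Gitflow-style branch name."""
    # Lowercase the text and collapse anything that is not a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return PREFIXES[kind] + slug


# Hypothetical research tasks mapped onto software-development terms.
print(branch_name("feature", "Add topic model evaluation"))  # feature/add-topic-model-evaluation
print(branch_name("release", "MPSA 2019 submission"))        # release/mpsa-2019-submission
```

You could then pass the result to `git checkout -b` to create and switch to the branch.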