# Build a Reproducible Workflow From Scratch (in about an hour)

## a.k.a. 2017 NIH Hour of Code

---

## R. Burke Squires

_Contractor with [Medical Sciences and Computing (MSC)](https://www.mscweb.com/) - [Positions available](https://careers-mscweb.icims.com/jobs/search?hashed=-435621309)_

### NIAID Bioinformatics and Computational Biosciences Branch ([BCBB](https://www.niaid.nih.gov/research/bcbb-services))

- Collaborative consulting with NIAID intramural scientists, and others (when possible)
- Develop internationally used, web-based bioinformatics tools such as:

    - [Nephele - AWS microbiome analysis portal](https://nephele.niaid.nih.gov/)
    - [3D Print Exchange](https://3dprint.nih.gov/)
    - [NIAID Bioinformatics Portal](https://bioinformatics.niaid.nih.gov)

---

### [Project Jupyter](http://jupyter.org/)

![jupyter.png](attachment:jupyter.png)

__Note:__ I am using Damian Avila's [RISE notebook extension](https://github.com/damianavila/RISE) to present this notebook in slide format.

[Source](http://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/What%20is%20the%20Jupyter%20Notebook.html)

# What is the Jupyter Notebook?

"The Jupyter Notebook is an __interactive computing environment__ that enables users to author notebook documents that include:

- Live code
- Interactive widgets
- Plots
- Narrative text
- Equations
- Images
- Video

These documents provide a complete and __self-contained record of a computation__ that can be converted to various formats and shared with others using email, Dropbox, version control systems (like git/GitHub) or [nbviewer.jupyter.org](http://nbviewer.jupyter.org/)."

## Components

"The Jupyter Notebook combines three components:

- The __notebook web application__: An interactive web application for writing and running code interactively and authoring notebook documents.

- __Kernels__: Separate processes started by the notebook web application that run users’ code in a given language and return output back to the notebook web application. The kernel also handles things like computations for interactive widgets, tab completion and introspection.

- __Notebook documents__: Self-contained documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative text, equations, images, and rich media representations of objects. Each notebook document has its own kernel."

## Notebook web application

"The notebook web application enables users to:

- __Edit code in the browser__, with automatic syntax highlighting, indentation, and tab completion/introspection.
- __Run code from the browser__, with the results of computations attached to the code which generated them.
- See the results of computations with __rich media representations__, such as HTML, LaTeX, PNG, SVG, PDF, etc.
- Create and use __interactive JavaScript widgets__, which bind interactive user interface controls and visualizations to reactive kernel side computations.
- Author __narrative text__ using the Markdown markup language.
- Include mathematical equations using __LaTeX syntax in Markdown__, which are rendered in-browser by MathJax."

## Kernels

"Through Jupyter’s kernel and messaging architecture, the Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook. Each kernel is capable of running code in a single programming language and there are kernels available in the following languages:

- Python (https://github.com/ipython/ipython)
- Julia (https://github.com/JuliaLang/IJulia.jl)
- R (https://github.com/IRkernel/IRkernel)
- Ruby (https://github.com/minrk/iruby)
- Haskell (https://github.com/gibiansky/IHaskell)
- Scala (https://github.com/Bridgewater/scala-notebook)
- node.js (https://gist.github.com/Carreau/4279371)
- Go (https://github.com/takluyver/igo)

The __default kernel (IPython) runs Python code__. The notebook provides a simple way for users to pick which of these kernels is used for a given notebook.

Each of these kernels communicates with the notebook web application and web browser using a JSON over ZeroMQ/WebSockets message protocol. Most users don’t need to know about these details, but it helps to understand that __“kernels run code.”__"
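To make the "JSON over ZeroMQ/WebSockets" idea concrete, here is a minimal sketch of what a request to run code might look like when built by hand. The field names follow the Jupyter messaging specification as I understand it; the helper `make_execute_request`, the `demo-user` name, and the protocol version string are assumptions for illustration. Nothing is sent to a kernel here, and real clients such as `jupyter_client` build, sign, and transmit these messages for you.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build a dict shaped like a Jupyter 'execute_request' message.

    Hypothetical helper for illustration only; it does not talk to a kernel.
    """
    header = {
        "msg_id": uuid.uuid4().hex,          # unique id for this message
        "session": session_id,               # id shared by one client session
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                    # assumed protocol version
    }
    content = {
        "code": code,                        # the source the kernel should run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},                 # empty: this message starts a conversation
        "metadata": {},
        "content": content,
    }


message = make_execute_request("print(1 + 1)", session_id=uuid.uuid4().hex)

# Serialized as JSON text, roughly what travels between browser and server.
wire_text = json.dumps(message, indent=2)
print(wire_text)
```

The kernel's replies use the same four-part shape, with `parent_header` pointing back at the request that triggered them.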

## Notebook documents

"Notebook documents contain the __inputs and outputs__ of an interactive session as well as __narrative text__ that accompanies the code but is not meant for execution. __Rich output__ generated by running code, including HTML, images, video, and plots, is embedded in the notebook, which makes it a complete and self-contained record of a computation.

When you run the notebook web application on your computer, notebook documents are just __files on your local filesystem with a ``.ipynb`` extension__. This allows you to use familiar workflows for organizing your notebooks into folders and sharing them with others.

Notebooks consist of a __linear sequence of cells__. There are four basic cell types:

- __Code cells__: Input and output of live code that is run in the kernel
- __Markdown cells__: Narrative text with embedded LaTeX equations
- __Heading cells__: 6 levels of hierarchical organization and formatting
- __Raw cells__: Unformatted text that is included, without modification, when notebooks are converted to different formats using nbconvert

Internally, notebook documents are [JSON](https://en.wikipedia.org/wiki/JSON) data with binary values [base64](http://en.wikipedia.org/wiki/Base64) encoded. This allows them to be read and manipulated programmatically by any programming language. Because JSON is a text format, notebook documents are version control friendly.
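As a small illustration of "read and manipulated programmatically", the sketch below builds a tiny notebook-shaped dictionary by hand, round-trips it through JSON, and decodes a base64 payload. The structure is a simplified version of the nbformat 4 layout as I understand it, and the "image" bytes are placeholder data rather than a real PNG.

```python
import base64
import json

# A hand-written, minimal notebook structure (simplified nbformat-4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('hello')\n"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]},
                {
                    "output_type": "display_data",
                    "metadata": {},
                    # Binary payloads are stored as base64 text (placeholder bytes here).
                    "data": {
                        "image/png": base64.b64encode(b"\x89PNG placeholder").decode("ascii")
                    },
                },
            ],
        },
    ],
}

# This text is what would be stored in a .ipynb file.
text = json.dumps(notebook, indent=1)

# Any language with a JSON parser can load it back.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)

# Recover the original bytes from the base64-encoded output.
png_b64 = loaded["cells"][1]["outputs"][1]["data"]["image/png"]
png_bytes = base64.b64decode(png_b64)
print(png_bytes[:4])
```

For real notebooks, the `nbformat` package is the safer route, since it also validates the structure against the notebook schema.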

__Notebooks can be exported__ to different static formats including HTML, reStructuredText, LaTeX, PDF, and slide shows (reveal.js) using Jupyter’s `nbconvert` utility.
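A minimal sketch of driving `nbconvert` from Python follows. It assumes the `jupyter` command-line tool is installed and on your `PATH`; the helper names and the file name `workflow.ipynb` are hypothetical. The example only prints the command, so it is safe to run without a notebook present.

```python
import shutil
import subprocess


def nbconvert_command(notebook_path, to_format="html"):
    """Return the argument list for `jupyter nbconvert --to <format> <notebook>`."""
    return ["jupyter", "nbconvert", "--to", to_format, notebook_path]


def export_notebook(notebook_path, to_format="html"):
    """Run nbconvert if `jupyter` is on PATH; return True when it exits cleanly."""
    if shutil.which("jupyter") is None:
        return False  # Jupyter is not installed (or not on PATH)
    result = subprocess.run(nbconvert_command(notebook_path, to_format))
    return result.returncode == 0


# "workflow.ipynb" is a hypothetical file name; this only shows the command.
print(" ".join(nbconvert_command("workflow.ipynb", "slides")))
```

Calling `export_notebook("workflow.ipynb", "html")` would attempt the actual conversion and write the result next to the notebook.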

Furthermore, any notebook document available from a __public URL or on GitHub can be shared__ via [nbviewer](http://nbviewer.jupyter.org/). This service loads the notebook document from the URL and renders it as a static web page. The resulting web page may thus be shared with others without their needing to install the Jupyter Notebook."
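The sketch below shows how a shareable nbviewer link might be derived from a notebook's public address. The URL patterns (`/github/...` for GitHub pages, `/url/` and `/urls/` for plain http and https addresses) are my understanding of how nbviewer builds its links, not something stated in the quoted text, so treat them as assumptions. The helper name and example addresses are hypothetical.

```python
from urllib.parse import urlsplit

NBVIEWER = "https://nbviewer.jupyter.org"


def nbviewer_url(notebook_url):
    """Map a public notebook URL to an nbviewer link (assumed URL scheme)."""
    parts = urlsplit(notebook_url)
    if parts.netloc == "github.com":
        # github.com/<user>/<repo>/blob/<branch>/<path> maps to /github/<same path>
        return f"{NBVIEWER}/github{parts.path}"
    # Other hosts: /urls/ for https sources, /url/ for plain http
    prefix = "urls" if parts.scheme == "https" else "url"
    return f"{NBVIEWER}/{prefix}/{parts.netloc}{parts.path}"


# Hypothetical notebook locations, for illustration only.
print(nbviewer_url("https://github.com/example-user/example-repo/blob/master/analysis.ipynb"))
print(nbviewer_url("https://example.org/nb/demo.ipynb"))
```

In practice, pasting the notebook's address into the form on the nbviewer home page gives the same kind of link without writing any code.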

## IPython

"IPython provides a rich architecture for interactive computing with:

- A powerful interactive shell.
- Support for __interactive data visualization__ and use of GUI toolkits.
- Easy to use, high performance tools for __parallel computing__.
- __Comprehensive object introspection__.
- __Extensible tab completion__, with support by default for completion of Python variables and keywords, filenames and function keywords.
- __Extensible system of ‘magic’ commands for controlling the environment and performing many tasks related to IPython or the operating system__.
- A rich configuration system with easy switching between different setups.
- __Access to the system shell__ with user-extensible alias system."

Source: http://ipython.readthedocs.io/en/stable/


---

## Installing Jupyter Notebook

### Anaconda distribution

"Anaconda is a freemium open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment.
Package versions are managed by the package management system conda."

"Easily install 1,000+ data science packages and manage your packages, dependencies and environments—all with the single click of a button"

Source: https://www.anaconda.com/distribution/

![Anaconda-Distribution-Diagram.png](attachment:Anaconda-Distribution-Diagram.png)

### Conda

Package, dependency and environment management for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN

Conda is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies. Conda easily creates, saves, loads and switches between environments on your local computer. It was created for Python programs, but it can package and distribute software for any language.

### Bioconda...

---

__Requirements:__

Participants who wish to follow along should have access to a computer with:

- A Unix-style shell (such as bash):
    - Mac and Linux computers come with one installed
    - Windows users may install git-bash, part of Git for Windows (https://git-for-windows.github.io)
- A recent installation of the Anaconda Python distribution for Python 3.6
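If you are unsure whether your computer meets these requirements, a quick check along the following lines may help. This is only a sketch: the function name `check_requirements` is hypothetical, and it simply looks for `bash`, `conda`, and `jupyter` on your `PATH` and compares the running Python version against 3.6.

```python
import shutil
import sys


def check_requirements(min_python=(3, 6), tools=("bash", "conda", "jupyter")):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    # Compare (major, minor) of the running interpreter with the minimum version.
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in check_requirements().items():
    print(f"{name:8s} {'ok' if ok else 'MISSING'}")
```

Run it with the Python interpreter that came with Anaconda; a `MISSING` entry for `conda` or `jupyter` usually means a different Python installation is first on your `PATH`.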

## Unable to install Anaconda? Try Jupyter in the Cloud...

__Note__: You can try out Jupyter here, but you will not be able to run the workflow, as it requires additional setup.

Go to https://try.jupyter.org. No installation is needed.

Want to try using these notebooks? Try [MyBinder](https://mybinder.org/).

You can also run Jupyter notebooks in the cloud by going to [Google Colaboratory](https://colab.research.google.com).


__Additional Resources__:
  
- [Jupyter notebooks](https://github.com/ucsd-ccbb/jupyter-genomics)
- [biopython bioinformatics notebooks](https://github.com/tiagoantao/biopython-notebook)