# An introduction to Jupyter Notebook and Jupyter Lab


by César Herrera

## Motivation

I am an advocate of reproducibility in science. 

I believe that tools such as rOpenSci, Jupyter Notebook, Jupyter Lab, Jupyter Hub, Binder, Docker, RStudio, and many others are making it easier to:

- share, and
- reproduce

our research in an **interactive** and **deployable** container available to everyone.

This, in turn, has the potential to improve science and to accelerate the adoption of innovative and worthwhile ideas, paradigms, and knowledge.

## Problem

Reproducing someone else's research is hard.

Even when you succeed in reproducing an analysis, the results might differ depending on the versions of the programming language and packages you use <sup>1</sup>.


<sup>1</sup> *Beaulieu-Jones, B., Greene, C. Reproducibility of computational workflows is automated using continuous analysis. Nat Biotechnol 35, 342–346 (2017). https://doi.org/10.1038/nbt.3780*
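One lightweight habit that helps with this problem is to record the exact versions behind an analysis and share them alongside the results. Below is a minimal Python sketch that uses only the standard library (Python 3.8+). `environment_report` is a hypothetical helper name and the package list is only an example; in R, `sessionInfo()` serves a similar purpose.

```python
import importlib.metadata
import json
import platform


def environment_report(packages):
    """Collect the Python version, the operating system and the installed version of each package.

    Packages that are not installed are recorded as None instead of raising an error.
    """
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    # These package names are only examples: list the ones your analysis actually imports.
    print(json.dumps(environment_report(["numpy", "pandas"]), indent=2))
```

Saving this report as a file next to your notebook gives others a starting point for rebuilding a matching environment.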



## The way we used to do and communicate science

We share ideas, results, figures, statistical analyses and code in a PDF document.

The PDF format was designed to mimic printed paper: *is this the best we can do in 2020?*

We are seeing an increasing (but still timid) movement to improve the way we communicate ideas, analyses and results in science.

## The way we should do and communicate science 

<sup> * Ideas from F Perez, K Ram, C Holdgraf, M Pacer, M Ragan-Kelley, J B Buckheit, D L Donoho and many others </sup>

Ideally, scientific information should be characterized by:

- open source
- reproducible <sup>1</sup>
    - access to the tools used
    - access to the interface used to run these tools
- contained in an easy-to-use vehicle/container to:
    - communicate
    - share
    - bundle or bind all the above
- easily accessible to everyone

**The challenge** is how to achieve all the above.

<sup>1</sup> *Buckheit J.B., Donoho D.L. (1995) WaveLab and Reproducible Research. In: Antoniadis A., Oppenheim G. (eds) Wavelets and Statistics. Lecture Notes in Statistics, vol 103. Springer, New York, NY*

Check the paper [here](https://link.springer.com/chapter/10.1007/978-1-4612-2544-7_5)

## Challenge 

<sup> * Ideas from F Perez, C Holdgraf, M Pacer, M Ragan-Kelley, and many others </sup>

Technical reproducibility vs. practical reproducibility (being able to reproduce someone else's work effortlessly).

> An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.

> Buckheit and Donoho, (paraphrasing John Claerbout) - WaveLab and Reproducible Research, 1995

This is taken from the excellent talk by C Holdgraf, M Pacer and M Ragan-Kelley at SciPy 2018.
Watch the talk: https://youtu.be/KcC0W5LP9GM

## Solutions


1. Innovate in the hardware and software we use

2. A clear framework

    - Research Compendium

>  ...We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection. 

> Gentleman and Temple Lang, 2004

Want to know more? Watch Karthik Ram's [presentation at rstudio::conf(2019)](https://rstudio.com/resources/rstudioconf-2019/a-guide-to-modern-reproducible-data-science-with-r-karthik-ram/)

## The way forward

### New tools

Hopefully, today we can add a few more tools to our toolbox.

- ${\checkmark}$ R
- ${\checkmark}$ Python
- ${\checkmark}$ Git
- ${\checkmark}$ GitHub
- ${\checkmark}$ RStudio
- ${\square}$ Jupyter Notebook
- ${\square}$ Jupyter Lab
- Jupyter Hub
- ${\square}$ Binder
- Kubernetes
- Docker containers
- Zenodo, CodeOcean, Dryad, FigShare, Tropical Data Hub, etc.


It can feel harder for us, the new generation of scientists, because the list of tools we are expected to learn and use with dexterity seems endless.

I understand this feeling. My intention is not to discourage people with this endless list, but the opposite. These tools have the potential to improve the way we communicate science and to extend the reach of our results and ideas. Many teams are building these tools precisely to reduce the burden on scientists and programmers.

While I think it is definitely harder for today's scientists to keep up with all the available tools and information, I also believe that we do not need to master all of them. We only need to be aware of the surrounding software and hardware ecosystem and of its potential for our own research.

## What is Jupyter Notebook

> The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

## What is Jupyter Lab

> Jupyter Lab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning.

## What is Jupyter Hub

> A multi-user version of the notebook designed for companies, classrooms and research labs (including Centralized deployment, Authentication, and much more)

## Jupyter Notebook

Started by Fernando Perez; it grew out of the IPython project and was originally called the IPython Notebook.

Other computational notebooks (e.g. Wolfram Mathematica, Maple and others) had already been available since the 1980s.

There are currently dozens of interactive notebook environments (a 2020 survey analyzed 60 of them), such as Callisto, Polynote, and many more <sup>1</sup>.


<sup>1</sup> *The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry. Sam Lau, Ian Drosos, Julia M. Markel, Philip J. Guo. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2020.* 

*Check the paper [here](https://pg.ucsd.edu/publications/computational-notebooks-design-space_VLHCC-2020.pdf)*

## Jupyter ecosystem

- Supports many programming languages (I believe over a hundred)

- Active community

    - New developments

    - Many widgets/plugins for specific tasks

- Open Source (and free)

- Widely adopted

- Interactive code and visualizations in your web browser (you can use it on your phone, tablet or computer)

- A dream for any lecturer or professor teaching subjects that involve programming (install and deploy notebooks for many users)

## Jupyter and notebooks

Supports literate programming

> From Wikipedia: "Literate programming is a programming paradigm introduced by Donald Knuth in which a computer program is given an explanation of its logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which compilable source code can be generated"

Proposed by Donald E. Knuth in 1984. Check the paper [here](http://www.literateprogramming.com/knuthweb.pdf).

## How to use Jupyter Notebook with R

1. Install Jupyter Notebook

    - Even though Jupyter Notebook can run many programming languages, Python is an essential requirement (Jupyter itself is written in Python).

    - Once you have installed Python, probably the best way to install Jupyter is with Conda.

    - This [website](https://jupyter.org/index.html) shows the steps to install Jupyter Notebook and Jupyter Lab.


2. Install R


3. Install the R kernel for Jupyter

    - How to install: https://github.com/IRkernel/IRkernel or https://irkernel.github.io/installation/


You can find a list of all programming languages supported in Jupyter on this wiki page: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
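Once everything is installed, it can help to confirm that each piece is visible on your system. The sketch below is a hypothetical Python helper (the names `check_prerequisites` and `list_kernels` are illustrative) that uses only the standard library: it looks up the `python3`, `jupyter` and `R` commands on your `PATH` and runs `jupyter kernelspec list`, the Jupyter command that lists registered kernels. It assumes the R kernel was registered from an R session with `IRkernel::installspec()`, as described in the IRkernel installation guide, which by default registers a kernel named `ir`.

```python
import shutil
import subprocess


def check_prerequisites(tools=("python3", "jupyter", "R")):
    """Map each command-line tool to its location on PATH, or None if it is not found."""
    return {tool: shutil.which(tool) for tool in tools}


def list_kernels():
    """Return the output of `jupyter kernelspec list`, or None if Jupyter is not on PATH."""
    if shutil.which("jupyter") is None:
        return None
    result = subprocess.run(
        ["jupyter", "kernelspec", "list"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        print(f"{tool:8} {path or 'NOT FOUND'}")
    kernels = list_kernels()
    if kernels is None:
        print("Jupyter is not installed (or not on PATH).")
    else:
        # After installing IRkernel, an entry named "ir" should appear in this list.
        print(kernels)
```

If `R` is reported as missing but you know it is installed, its `bin` folder is probably just not on your `PATH`.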


## Understanding Jupyter notebooks

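- Notebook menubar
- Notebook toolbar
- Modes
    - Edit mode: type into the selected cell (press `Enter` to switch to it)
    - Command mode: work with whole cells, e.g. add, delete, move or change their type (press `Esc` to switch to it)
- Cells
- Cell types
    - Code: run by the notebook's kernel (e.g. R or Python)
    - Markdown: formatted narrative text, equations and links
    - Raw NBConvert: content passed unmodified through `nbconvert` when the notebook is converted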
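Behind the interface, a `.ipynb` file is plain JSON: a list of cells, each tagged with its `cell_type` (`code`, `markdown` or `raw`, the last one being what the interface calls Raw NBConvert). The Python sketch below writes a minimal notebook with one cell of each type using only the standard library. It follows the nbformat version 4 layout as I understand it; `make_cell` and `example.ipynb` are illustrative names, and in practice the `nbformat` package is the safer way to create and validate notebooks.

```python
import json
from pathlib import Path


def make_cell(cell_type, source):
    """Build one notebook cell following the nbformat v4 JSON layout."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Only code cells store an execution counter and the outputs they produced.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "cells": [
        make_cell("markdown", "# My analysis\nSome *narrative* text."),
        make_cell("code", "x = 1 + 1\nprint(x)"),
        make_cell("raw", "Passed through nbconvert unchanged."),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

path = Path("example.ipynb")
path.write_text(json.dumps(notebook, indent=1))

# Read the file back and list the type of each cell.
loaded = json.loads(path.read_text())
print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code', 'raw']
```

Opening `example.ipynb` in Jupyter Notebook or Jupyter Lab should show the three cells; running the code cell fills in its `outputs` and `execution_count`.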