# Basics of Jupyter Notebooks

## First fetch the source

```shell
git clone https://github.com/coderefinery/jupyter.git
cd jupyter
jupyter-notebook
```

## Some history
- In 2014, Fernando Pérez announced a spin-off project from IPython called Project Jupyter, moving the notebook and other language-agnostic parts of IPython to Jupyter
- The name "Jupyter" derives from Julia+Python+R, but today Jupyter kernels exist for [dozens of programming languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)
- Galileo's 1610 pamphlet *Sidereus Nuncius* was one of the first notebooks!  
<img src="http://media.gettyimages.com/photos/pages-from-sidereus-nuncius-magna-by-galileo-galilei-a-book-of-and-picture-id90732970" width="500">



## Navigating Jupyter notebooks
 - Notebook Dashboard
   * `Files` tab shows files in current directory
   * `Running` tab shows kernels running on your computer
   * `Clusters` tab lets you launch kernels for parallel computing
 - Fully-fledged terminal (you can run emacs and vi)
 - Text editor for source code in many different languages  
 

## Cells

- **Markdown cells** contain formatted text written in Markdown 
- **Code cells** contain code to be interpreted by the *kernel* (Python, R, Julia, Octave/Matlab...)

![Components](img/notebook_components.png)

## Markdown cells

This cell contains [Markdown](https://daringfireball.net/projects/markdown/syntax), a simple markup language for writing text that can be automatically converted to other formats such as HTML or LaTeX.

**Bold**, *italics*, **_combined_**, ~~strikethrough~~, `inline code`.

* bullet points

or

1. numbered
3. lists

**Equations:**   
inline $e^{i\pi} + 1 = 0$
or on new line  
$$e^{i\pi} + 1 = 0$$

Images ![CodeRefinery Logo](https://pbs.twimg.com/profile_images/875283559052980224/tQLhMsZC_400x400.jpg)

Links:  
[One of many markdown cheat-sheets](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#emphasis)


## Code cells

In [1]:
# a code cell can run statements of code.
# when you run this cell, the code is sent
# from the web page to a back-end process (the kernel), run there,
# and the results are displayed to you
print("hello world")

hello world


## In this lesson you will learn 
- How *markdown* and *code* cells work
- How to use keyboard shortcuts to speed up your work
- How to use *widgets*
- How to use notebook *magic commands* and create new custom magics
- How to mix in different markup and programming languages (html, LaTeX, bash, ruby, perl, R, octave)

# Data analysis and visualization in Jupyter Notebooks


### Let us look into some Jupyter features
- Toggle between code and markdown cells
- Edit mode and Command mode
- Executing a cell
- Inserting, copying, pasting and removing cells
- Execution order - prompt numbers
- Meaning of _
- Getting help with ?

### <font color="red"> *Exercise 1.1* </font>

Spend a couple of minutes playing around with Markdown and code cells:
1. Create a new cell below this one, and make it a Markdown cell 
2. Go to Edit mode, and add a heading along with some bullet points and an equation
3. Add another cell below, and make it a code cell
4. Add some code which returns output (either use `print()` or type the variable name at the end of the cell)
5. Try some of the keyboard shortcuts listed below

Here are some useful hints:
* You can edit the cell by double-clicking on it, or pressing `Enter` when it's selected
* You can run the cell by pressing the play-button in the toolbar, or press `Shift-Enter`
* You can change the type of the cell from the toolbar, or press `m` for Markdown and `y` for code

**Questions**
* What is the difference between executing a cell with `Shift-Enter`, `Ctrl-Enter` or `Alt-Enter`?


If you already know all this or if you want to move on:
- Go to exercise 1.2 below

### Keyboard shortcuts 

Some shortcuts only work in Command or Edit mode.

* `Enter` key to enter Edit mode (`Escape` to enter Command mode)
* `Ctrl`-`Enter`: run the cell
* `Shift`-`Enter`: run the cell and select the cell below
* `Alt`-`Enter`: run the cell and insert a new cell below
* `Ctrl`-`s`: save the notebook 
* `Tab` key for code completion or indentation (Edit mode)
* `m` and `y` to toggle between Markdown and Code cells (Command mode)
* `d-d` to delete a cell (Command mode)
* `z` to undo deleting (Command mode)
* `a/b` to insert cells above/below current cell (Command mode)
* `x/c/v` to cut/copy/paste cells (Command mode)
* `Up/Down` or `k/j` to select previous/next cells (Command mode)
* `h` for help menu for keyboard shortcuts (Command mode)
* Append `?` for help on commands/methods, `??` to show source (Edit mode) 

### Shell commands
  - You can run shell commands by prepending with !
    - NB: on Windows, GitBash needs to have the following option enabled:   
    `Use Git and the optional Unix tools from the Windows Command Prompt` 
  - Useful, e.g., for managing the python environment
  - Remember to make sure your cell command doesn't require interaction

In [None]:
!echo "hello"

In [None]:
!pip list

 - Many common Linux shell commands are available as magics: `%ls`, `%pwd`, `%mkdir`, `%cp`, `%mv`, `%cd`, *etc.* More on magics [later in the lesson](#Magics).
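
Outside a notebook, a rough plain-Python counterpart of `!command` is the standard-library `subprocess` module. The sketch below is only an illustration, not how IPython implements `!`. The helper name `run_command` is hypothetical, and the current Python interpreter stands in for `echo` so that the example also runs on Windows.

```python
import subprocess
import sys


def run_command(args):
    """Run a non-interactive command and return (exit code, standard output)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout


# Assumption: the current interpreter is used as a portable stand-in for `echo`.
returncode, output = run_command([sys.executable, "-c", "print('hello')"])
print(returncode, output.strip())
```

As with `!`, the command must not wait for keyboard input, because nothing is connected to its standard input here.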

<a id="exercise_git"></a>
### <font color="red"> *Exercise 1.2* </font>

Try to only use keyboard shortcuts for the following steps:

1. Create a new code cell below this one
2. Run a `git diff` ([this is why tools like nbdime have been developed!](#Version-control-of-notebooks))
3. Toggle the output of this cell

## Interactive plotting

Jupyter supports interactive plotting with matplotlib and other visualization libraries (including for other languages). Matplotlib can be used with different backends, which changes how the plots appear in the notebook.

In [None]:
%matplotlib --list

In [None]:
#%matplotlib notebook
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,2*np.pi,100)
y = np.sin(x)
plt.plot(x,y, 'r-')
plt.show()

## Widgets

Widgets add more interactivity to Notebooks, allowing one to visualize and control changes in data, parameters etc.

In [None]:
from ipywidgets import interact

#### Use `interact` as a function

In [None]:
def f(x, y, s):
    return (x, y, s)

interact(f, x=True, y=1.0, s="Hello");

#### Use `interact` as a decorator

In [None]:
@interact(x=True, y=1.0, s="Hello")
def g(x, y, s):
    return (x, y, s)
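
`interact` inspects the value passed for each keyword argument and picks a matching widget: a boolean gives a checkbox, a string gives a text box, and a number or a tuple of numbers gives a slider. The function below is a simplified, hypothetical stand-in for that lookup. `widget_for` is not part of ipywidgets, and the real rules cover more cases (lists, step sizes, explicit widget objects); it only returns the name of the widget class one would expect.

```python
def widget_for(value):
    """Return the expected widget class name for an `interact` abbreviation.

    Simplified illustration only; not the ipywidgets implementation.
    """
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "Checkbox"
    if isinstance(value, str):
        return "Text"
    if isinstance(value, int):
        return "IntSlider"
    if isinstance(value, float):
        return "FloatSlider"
    if isinstance(value, tuple) and value:
        if all(isinstance(v, int) and not isinstance(v, bool) for v in value):
            return "IntSlider"  # e.g. (min, max) with integer bounds
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return "FloatSlider"  # any float bound makes the slider a float slider
    raise TypeError(f"no widget rule in this sketch for {value!r}")


# The arguments used in the examples above, plus the (1, 6) range used for plotting.
for example in (True, 1.0, "Hello", (1, 6)):
    print(repr(example), "->", widget_for(example))
```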

## More interactive plotting using widgets

In [None]:
from ipywidgets import interact # IPython.html.widgets before IPython 4.0

@interact
def plot(n=(1,6)):
    x = np.linspace(0,2*np.pi,100)
    y = np.sin(n*x)
    plt.plot(x,y, 'r-')
    plt.show()

### <font color="red"> *Exercise 1.3* </font>

- Execute the cell below. It fits a 5th order polynomial to a gaussian function with some random noise 
- Use the `@interact` decorator together with the function `fit`, such that you can visualize fits with polynomial orders `n` ranging from, say, 3 to 30


In [None]:
# gaussian function
def gauss(x,param):
    [a,b,c] = param
    return a*np.exp(-b*(x-c)**2)

# gaussian array y in interval -5<x<5
nx = 100
x = np.linspace(-5.,5.,nx)
p = [2.0,0.5,1.5] # some parameters
y = gauss(x,p)

# add some noise
noise = np.random.normal(0,0.2,nx)
y += noise

# we fit a 5th order polynomial to it

def fit(n):
    pfit = np.polyfit(x,y,n)
    yfit = np.polyval(pfit,x)
    plt.plot(x,y,"r",label="Data")
    plt.plot(x,yfit,"b",label="Fit")
    plt.legend()
    plt.ylim(-0.5,2.5)
    plt.show()
    
# call function fit
# these lines are unnecessary when you use the interact widget
n=5
fit(n)

## Magics

Magics are a simple command language that significantly extends the power of Jupyter.

Two kinds of magics:

  - **Line magics**: commands prepended by one % character and whose arguments only extend to the end of the current line.
  - **Cell magics**: use two percent characters as a marker (%%), receive as argument the whole cell (must be used as the first line in a cell)

Other features:
  - Use `%lsmagic` magic to list all available line and cell magics
  - Question mark shows help: `%lsmagic?`
  - `%quickref` gives a short reference of available magic (and other) functionality 
  - Additional magics can be created, see below for example

In [None]:
%lsmagic

In [None]:
%quickref

You can capture the output of line magic (and shell) commands

In [None]:
!ls

In [None]:
ls_out = %ls
ls_out

In [None]:
%sx?

In [None]:
ls_out = %sx ls
ls_out

### %timeit
- Timing execution
- Both Line and Cell level

In [None]:
%timeit import time ; time.sleep(1)

In [None]:
import numpy as np

In [None]:
%%timeit 
a = np.random.rand(100, 100)
np.linalg.eigvals(a)
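
Outside a notebook, the standard-library `timeit` module offers similar functionality. The sketch below is a plain-Python illustration of the same idea (many repetitions, then a per-call figure). The function `build_squares` and the repetition counts are arbitrary choices made for this example.

```python
import timeit


def build_squares():
    """A small piece of work to time."""
    return [i * i for i in range(1000)]


calls_per_run = 100
# Five independent runs of 100 calls each; each entry is the total time of one run.
timings = timeit.repeat(build_squares, number=calls_per_run, repeat=5)

# The fastest run is usually the least disturbed by other processes.
best_per_call = min(timings) / calls_per_run
print(f"best of {len(timings)} runs: {best_per_call * 1e6:.1f} microseconds per call")
```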

### %%writefile
Writes the cell contents as a named file

In [None]:
%%writefile foo.py
print('Hello world')

### %run 
 - Executes python code from .py files 
 - Can also execute other jupyter notebooks

In [None]:
%run foo
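
In plain Python, the combination of `%%writefile` and `%run` corresponds roughly to writing a file and executing it with the standard-library `runpy` module. This is a minimal sketch, assuming a temporary directory is acceptable as a scratch location. The variable `greeting` is added here only so that there is something to inspect afterwards.

```python
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "foo.py"

    # Roughly what %%writefile does: store the text as a named file.
    script.write_text("greeting = 'Hello world'\nprint(greeting)\n")

    # Roughly what %run does: execute the file and keep the names it defined.
    namespace = runpy.run_path(str(script))

print(namespace["greeting"])
```

Like `%run`, `runpy.run_path` gives access to the variables defined by the script once it has finished.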

### %load
 - Loads code directly into a cell, either from a file on local disk or from the internet
 - After uncommenting the code below and executing it, the content of the cell is replaced with the contents of the file.

In [None]:
# %load https://matplotlib.org/_downloads/annotate_transform.py

### %debug
Activate interactive debugger

Let's try using `%debug` to hunt down a bug.     
Note that one first needs to load the .py file, then execute it, and then run the `%debug` magic.

In [None]:
# %load debug_example.py
import numpy as np
import matplotlib.pyplot as plt

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # Call the function, generate plot


Run the debugger post-mortem. If an exception has just occurred, the debug magic lets you inspect its stack frames interactively

In [None]:
%debug

**Don't forget to exit the debugger by typing `q` and `Enter`!**  
If you don't, the background process will not be ready for your next command.
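
The stack frames that the debugger lets you step through are attached to the exception as a traceback object. The sketch below lists them without starting an interactive session, using the standard-library `traceback` module. The functions `inverse` and `mean_inverse` are hypothetical examples, unrelated to `debug_example.py`.

```python
import traceback


def inverse(x):
    return 1 / x


def mean_inverse(values):
    total = 0.0
    for value in values:
        total += inverse(value)
    return total / len(values)


try:
    mean_inverse([1, 2, 0])
except ZeroDivisionError as exc:
    # One entry per stack frame, outermost call first, failing call last.
    frames = traceback.extract_tb(exc.__traceback__)
    function_names = [frame.name for frame in frames]
    for frame in frames:
        print(f"{frame.name} (line {frame.lineno}): {frame.line}")
```

In a script, `pdb.post_mortem(exc.__traceback__)` would open an interactive debugger on the same frames.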

### %prun
 - Python code profiler
 - Cell and Line magic
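
For reference, the same kind of profile can be produced in plain Python with the standard-library `cProfile` and `pstats` modules. This is a sketch under the assumption that a function-level report sorted by cumulative time is what you are after; `slow_sum` is a hypothetical workload.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately simple loop so that there is something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 100_000)  # run the function under the profiler

# Collect the report in a string instead of printing straight to the screen.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show at most the five most expensive entries
report = stream.getvalue()
print(report)
```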

### <font color="red"> *Exercise 1.4* </font>

1. Load the random_walk.py example into a cell below with the appropriate magic command
2. Split up the functions over cells (either via the Edit menu or the keyboard shortcut `Ctrl-Shift-minus`).
3. Initializing `n` and calling `walk()` doesn't need to be in a main function, and you can remove the `__name__` stuff
4. Plot the random walk trajectory
5. Time the execution of `walk()` with a line magic
6. Run the prun cell profiler
7. Can you spot a little mistake which is slowing down the code?

In [None]:
# %load random_walk.py

### <font color="green"> *Solution* </font>

> One possible solution can be found in the solutions.ipynb notebook

### Installing a new magic command

Magics can be loaded like plugins and installed using `pip`.

We will now install a line profiler to get a more detailed profile and, hopefully, find insights that help speed up the code.

In [None]:
!pip install line_profiler

In [None]:
%load_ext line_profiler

In [None]:
# %load random_walk.py

In [None]:
%lprun -f walk main()

### Other types of media

In [None]:
import IPython.display
IPython.display.Audio?

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('j9YpkSX7NNM')

In [None]:
from IPython.display import Audio
framerate = 44100
t = np.linspace(0,5,framerate*5)
data = np.sin(2*np.pi*220*t) + np.sin(2*np.pi*224*t)
Audio(data,rate=framerate)

#Audio(url="http://www.w3schools.com/html/horse.ogg")



In [None]:
from IPython.display import IFrame
IFrame("http://jupyter.org",width='100%',height=350)

### Further shell access with %%bash magic
 - Run cells with bash in a subprocess
 - On Windows, you *may* have to use `%%cmd` instead and use appropriate Windows commands

In [None]:
%%bash
mkdir tmpdir
cd tmpdir
pwd
echo "foo" > test.file
ls
cat test.file
cd ..
rm -r tmpdir

#### One can store the standard error and output

In [None]:
%%bash --out output --err error
echo "hi, stdout"
echo "hello, stderr" >&2

In [None]:
print(error)
print(output)
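
Keeping the two streams apart works the same way in plain Python. The sketch below uses the standard-library `subprocess` module; as an assumption made for portability, the child process is a small Python one-liner rather than a bash script.

```python
import subprocess
import sys

# Child program that writes one line to each stream.
child_code = (
    "import sys; "
    "print('hi, stdout'); "
    "print('hello, stderr', file=sys.stderr)"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # capture stdout and stderr separately
    text=True,            # decode bytes to str
)
output = result.stdout
error = result.stderr

print("stdout:", output.strip())
print("stderr:", error.strip())
```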

### Mixing in other languages (assuming that they're installed)

The `%%script` magic is like the #! (shebang) line of a Unix script,
specifying a program (bash, perl, ruby, etc.) with which to run.  
But one can also directly use these:
- %%ruby
- %%perl
- %%bash
- %%html
- %%latex
- %%R

Why would you want to mix programming languages in the same notebook?
 - leverage strengths from different languages
 - using code from colleagues
 - a fantastic library exists in another language than your favorite one

In [None]:
%%ruby
puts 'Hi, this is ruby.'

In [None]:
%%script ruby
puts 'Hi, this is also ruby.'

In [None]:
%%perl
print "Hello, this is perl\n";

In [None]:
%%bash
echo "Hullo, I'm bash"

In [None]:
%%html
<table>
<tr>
<th>Header 1</th>
<th>Header 2</th>
</tr>
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>

In [None]:
%%latex
\begin{align}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{align}

### R

The R world already has a powerful IDE, RStudio, where one can annotate code using Markdown and export to HTML.  
A key difference between RStudio and Jupyter is that in Jupyter one can modify and rerun individual cells, without having to rerun everything.

In [None]:
# first we need to install the necessary packages
#!conda install -c r r-essentials 
#!conda install -y rpy2

To run R from the Python kernel we need to load the rpy2 IPython extension

In [None]:
%load_ext rpy2.ipython

In [None]:
%%R
myString <- "Hello, this is R"
print ( myString)

Inline plotting in R is straightforward 

In [None]:
%%R 
# Define the cars vector with 5 values
cars <- c(1, 3, 6, 4, 9)

# Graph cars using blue points overlaid by a line
plot(cars, type="o", col="blue")

# Create a title with a red, bold/italic font
title(main="Autos", col.main="red", font.main=4)

Data in R cells is of course persistent

In [None]:
%%R 
barplot(cars)

We can plot a Python pandas dataframe with R code

In [None]:
import pandas as pd
df = pd.DataFrame({
    'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]
})

In [None]:
%%R -i df -w 6 -h 4 --units cm -r 200
# the first line says: import df and make the default figure size 6 by 4 cm
# with resolution 200. You can change the units to px, in, etc. as you wish.
library(ggplot2)
ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()

### Octave/Matlab

If we have Octave installed we could switch to an Octave kernel. This comes with a new set of magics

In [None]:
%lsmagic

t = linspace(0,6*pi,100);
plot(t,sin(t))
grid on
hold on
plot(t,cos(t), 'r')

tx = ty = linspace (-8, 8, 41)';
[xx, yy] = meshgrid (tx, ty);
r = sqrt (xx .^ 2 + yy .^ 2) + eps;
tz = sin (r) ./ r;
mesh (tx, ty, tz);

### Creating your own custom magic

Using the `@register_cell_magic` decorator, we will create a cell magic command that compiles C++ code and executes it


> This example has been borrowed from the [IPython Minibook](http://ipython-books.github.io/), by Cyrille Rossant, Packt Publishing, 2015.


In [None]:
from IPython.core.magic import register_cell_magic

In [None]:
@register_cell_magic
def cpp(line, cell):
    """Compile, execute C++ code, and return the standard output."""

    # We first retrieve the current IPython interpreter instance.
    ip = get_ipython()

    # We define the source and executable filenames.
    source_filename = '_temp.cpp'
    program_filename = '_temp'

    # We write the code to the C++ file.
    with open(source_filename, 'w') as f:
        f.write(cell)

    # We compile the C++ code into an executable.
    compile = ip.getoutput("g++ {0:s} -o {1:s}".format(
        source_filename, program_filename))

    # We execute the executable and return the output.
    output = ip.getoutput('./{0:s}'.format(program_filename))

    print('\n'.join(output))

In [None]:
%%cpp 
#include<iostream>
int main(){
    std::cout << "Hello C++";
}

This cell magic is currently only available in the current notebook. To make it permanent we need to turn it into an IPython extension. We do this by writing the definition of the function `cpp()` into a file on `PYTHONPATH` (for example in the current directory) and adding a small `load_ipython_extension()` function at the end.

In [None]:
%%writefile cpp_ext.py
def cpp(line, cell):
    """Compile, execute C++ code, and return the standard output."""

    # We first retrieve the current IPython interpreter instance.
    ip = get_ipython()

    # We define the source and executable filenames.
    source_filename = '_temp.cpp'
    program_filename = '_temp'

    # We write the code to the C++ file.
    with open(source_filename, 'w') as f:
        f.write(cell)

    # We compile the C++ code into an executable.
    compile = ip.getoutput("g++ {0:s} -o {1:s}".format(
        source_filename, program_filename))

    # We execute the executable and return the output.
    output = ip.getoutput('./{0:s}'.format(program_filename))

    print('\n'.join(output))

def load_ipython_extension(ipython):
    ipython.register_magic_function(cpp,'cell')

In [None]:
%load_ext cpp_ext

In [None]:
%%cpp?
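
The `(line, cell)` signature used by `cpp()` is all a cell magic needs: `line` holds whatever follows the magic name on the first line, and `cell` holds the rest of the cell as a single string. The function below is a hypothetical, much smaller example called `shout`. It is written as a plain function so that it can be tried without IPython; in a notebook it would be decorated with `@register_cell_magic` in the same way as `cpp()`.

```python
def shout(line, cell):
    """Return the cell text in upper case, followed by an optional suffix.

    line: text after the magic name on the first line (used here as a suffix)
    cell: the body of the cell as one string
    """
    suffix = line.strip()
    return cell.strip().upper() + suffix


# Calling the function directly mimics a cell that starts with `%%shout !`.
result = shout("!", "hello magics\n")
print(result)
```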

# Summing up

## Key features of Jupyter Notebooks
- Excels at [literate programming](https://en.wikipedia.org/wiki/Literate_programming)
- Many features of integrated development environment (IDE): code completion, easy access to help
- [Support for many programming languages](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels)

## Use cases
- Experimenting with new ideas, testing new libraries/databases 
- Interactive code, data analysis and visualization development
- Sharing and explaining code to colleagues
- Learning from other notebooks
- Keeping track of interactive sessions, like a digital lab notebook
- Supplementary information with published articles
- Teaching (programming, experimental/theoretical science)
- Presentations

## When not to use notebooks?

- Large codebases are difficult to manage in notebooks
- More difficult to follow good software development practices
    - doesn't play well with version control (see below)
    - not as easy to do automated testing
    - not as useful as IDE to ensure PEP8-compliance

## Sharing notebooks

- You can enter a URL, GitHub repo or username, or Gist ID in [`nbviewer`](https://nbviewer.jupyter.org/) and view a rendered Jupyter notebook
    - try entering just "coderefinery" and see if you can find this current notebook
- Read the Docs can render Jupyter Notebooks via the [nbsphinx package](https://nbsphinx.readthedocs.io/)
- [Binder](https://mybinder.org/) creates live notebooks based on a GitHub repository
- [CoCalc](https://cocalc.com/) (formerly SageMathCloud) allows collaborative editing of notebooks in the cloud 
- Google's [colaboratory](https://colab.research.google.com/) lets you work on notebooks in the cloud, and you can [read and write to notebook files on Drive](https://colab.research.google.com/notebooks/io.ipynb)
- [Microsoft Azure Notebooks](https://notebooks.azure.com/) also offers free notebooks in the cloud
- [JupyterLab](https://github.com/jupyterlab/jupyterlab) supports sharing and collaborative editing of notebooks via Google Drive 
- [Notedown](https://github.com/aaren/notedown), [Jupinx](https://github.com/QuantEcon/sphinxcontrib-jupyter) and [DocOnce](https://github.com/hplgit/doconce) can take Markdown or Sphinx files and generate Jupyter Notebooks
- The `jupyter nbconvert` tool can convert a (`.ipynb`) notebook file to:
    - python code (`.py` file) 
    - an HTML file
    - a LaTeX file
    - a PDF file
    - a slide-show in the browser

Note: the Google, Microsoft and CoCalc platforms are free but have paid subscriptions for faster access to cloud resources

## [JupyterHub](https://github.com/jupyterhub)

- A multi-user hub to spawn, manage and proxy multiple instances of the Jupyter Notebook server
- Purpose: supporting multiple users, who can log in and start notebooks
- Used by: student classes, corporate data science workgroup, scientific research group, high-performance computing group

## [JupyterLab](https://github.com/jupyterlab/jupyterlab)

- Natural evolution of the Jupyter Notebook user interface
- An "IDE": *Interactive* Development Environment
- Flexible user interface for assembling the building blocks of interactive computing
- Adaptable to multiple workflows. Switch between Notebook/narrative focus and script/console focus
- A stable version suitable for general usage was released in Feb. 2018

![jupyterlab](img/jlab-screenshot-nb-con-term-2_40.png)

## Version control of notebooks
Jupyter Notebooks are stored in JSON format, which is easy to parse, but basic diff and merge tools do not handle it well.  
This reduces the power of version control systems like Git. Tools like [nbdime](http://nbdime.readthedocs.io/en/latest/) provide "content-aware" diffing and merging.

- nbdime can be installed with `pip install nbdime` 
- To diff two notebooks in terminal do `nbdiff notebook_1.ipynb notebook_2.ipynb`
- Or if you want a rich web-based rendering: `nbdiff-web notebook_1.ipynb notebook_2.ipynb`
- To integrate nbdime with Git, type `nbdime config-git --enable --global`  
(this will leave Git's behavior unchanged for non-notebook files, but use nbdime's diff and merge for notebook .ipynb files)
- [Click here to get back to exercise 1.2](#exercise_git)
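
To see why line-based diffs struggle, it helps to look at what a notebook file contains. The sketch below builds two minimal notebook dictionaries by hand, following the nbformat 4 layout, for the same code cell executed at two different times. It then compares them line by line, first as stored and then with outputs and prompt numbers removed. The helper names are hypothetical, and stripping outputs is only one common workaround; it is not how nbdime works.

```python
import copy
import difflib
import json


def make_notebook(execution_count, outputs):
    """Build a minimal notebook dict with a single code cell."""
    return {
        "cells": [{
            "cell_type": "code",
            "execution_count": execution_count,
            "metadata": {},
            "outputs": outputs,
            "source": ["print('hello world')"],
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def strip_outputs(notebook):
    """Return a copy without outputs and prompt numbers."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def changed_lines(a, b):
    """Lines a line-based diff reports as added or removed."""
    a_lines = json.dumps(a, indent=1, sort_keys=True).splitlines()
    b_lines = json.dumps(b, indent=1, sort_keys=True).splitlines()
    diff = difflib.unified_diff(a_lines, b_lines, lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


stream = {"name": "stdout", "output_type": "stream", "text": ["hello world\n"]}
first_run = make_notebook(1, [stream])
second_run = make_notebook(7, [stream])  # same code, executed again later

raw_diff = changed_lines(first_run, second_run)
clean_diff = changed_lines(strip_outputs(first_run), strip_outputs(second_run))
print(len(raw_diff), "changed lines as stored,", len(clean_diff), "after stripping")
```

Here `raw_diff` reports the `execution_count` line as changed although the code is identical, while `clean_diff` is empty.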

## Links and further reading
 - https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks
 - http://ipython-books.github.io/minibook/
 - http://ipython-books.github.io/cookbook/
 - https://www.oreilly.com/ideas/the-state-of-jupyter

## Lesson key points

- Keyboard shortcuts simplify using Jupyter
- Magics allow you to
    - access the filesystem
    - time, debug and profile your code
    - run shell commands in underlying system
- You can also create your own magics
- You can add inline plots, and widgets provide more interactivity
- The JSON format of Jupyter Notebooks is not optimal for version control with Git, but the nbdime tool helps
- Jupyter can run many kernels, among them Python, Octave, Julia and R (assuming they are installed on the host running Jupyter)

# Use cases

If time allows, we will now split up and work in groups (or individually). Choose one theme from the following list that you're most interested in. 


1. Data analysis. Study the [data_analysis.ipynb](https://github.com/coderefinery/jupyter/blob/master/data_analysis.ipynb) notebook, try to solve the exercises and discuss with other learners and instructors.
2. Accelerating Python code. Study the [accelerating_python.ipynb](https://github.com/coderefinery/jupyter/blob/master/accelerating_python.ipynb) notebook, try to solve the exercises and discuss with other learners and instructors.
3. Discussion group on possible use cases of Jupyter Notebooks
    - Do you see possible use cases in your research, or does it not suit your needs? 
    - Would you consider publishing a Jupyter Notebook as supplementary material with your articles? 
    - What do you find to be particularly interesting/powerful about Jupyter, and what are the drawbacks and limitations? 
4. JupyterLab is finally in beta: you can install it with `!pip install jupyterlab` and run it with `jupyter lab` instead of `jupyter notebook`
5. Do you want to research some other aspects of Jupyter Notebooks? Go ahead and feel free to discuss with other learners and instructors.