# Jupyter Notebooks

Jupyter Notebooks & Pandas Workshop  
HackFrost NL, February 20, 2021

This document is a Jupyter notebook. Notebooks are tools for 'literate programming': they contain your code, the output from running your code, and notes.

They're useful for:
* _Exploratory coding:_ When you don't know in advance exactly what you need to do, you write a little bit of code, run it, maybe look at a plot or widget, go back and tweak it, run it again, and so on. It's a good pattern for Data Science.
* _Sharing your work:_ These documents explain and demonstrate what you're doing. You can even use the 'RISE' extension to write presentations about data or code. When you're done, you can save this document to Git, download it as PDF, HTML, or Reveal.js slides, and share it with people. BinderHub is a tool that runs notebooks straight from a Git repository, so you can share any notebook you've committed.

But notebooks are not the right tool for all programming: as code gets more complicated, we move pieces of it out into modules so that we can reuse them in several files.

## Basics

Notebooks are organized into cells. There are different kinds. The words you're reading are in a Markdown cell. Below is a code cell. To change cell types use the menu or the `Esc`+`M` and `Esc`+`Y` hotkeys.

In [None]:
# This is a code cell

a = 5
print(a)

To run a code cell, select the cell and press `Shift`+`Enter` or the `Run` button on the toolbar. The value of the last expression in the cell is displayed unless you end that line with `;`. You can go back and re-run the cell, and the number by the cell will update; cells don't have to be run in order.

The code in this notebook may not have been run yet.

When a cell is not run it shows `In [ ]`.

When a cell is running it shows `In [*]`.

When a cell is done running it shows `In [1]`, where the number is the order in which cells were run.

## Read-Eval-Print Loop (REPL)

Jupyter Notebooks are a kind of Read-Evaluate-Print Loop (REPL) environment. The notebook reads the code in the cell, executes it, and prints back the result. REPL environments are great for exploratory coding.
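The read, evaluate, print cycle can be sketched in a few lines of plain Python. This is only a toy model, not how Jupyter is implemented: the function name `run_cell` and the sample cells are made up for illustration. It executes each "cell" in a shared namespace and hands back the value of the last expression, which is what a notebook displays under the cell.

```python
import ast

def run_cell(source, namespace):
    """Execute a cell's source; return the value of its last expression, if any."""
    tree = ast.parse(source)
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so its value can be captured.
        last_expr = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(last_expr, "<cell>", "eval"), namespace)
    else:
        # The cell ends in a statement (e.g. an assignment): nothing to display.
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value

namespace = {}  # shared by all cells
cells = ["a = 5", "a + 1", "b = a * 2\nb"]
results = [run_cell(cell, namespace) for cell in cells]

for result in results:
    if result is not None:
        print(result)
```

The first cell ends in an assignment, so nothing is displayed for it; the other two end in an expression, so their values are printed.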

## Coding in Jupyter Notebooks

Jupyter Notebooks use Python by default, but can use many other languages, such as Java, R, Julia, and Scala.

You can do all the things you would normally do in Python.

Like write **loops**.

In [None]:
for i in range(1, 5):
    print(i)

Or define **functions** and use them.

In [None]:
def print_numbers():
    for i in range(1, 5):
        print(i)

In [None]:
print_numbers()

You can also write **classes**.

In [None]:
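class Transaction:
    """A transaction that stores an amount."""

    def __init__(self, amount=0):
        # Each instance gets its own amount, starting at 0 by default
        self.amount = amount

    def set_amount(self, amount):
        self.amount = amount

    def get_amount(self):
        return self.amount

In [None]:
a = Transaction()

a.set_amount(5000)
a.get_amount()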

## Jupyter != Python

Jupyter notebooks are just one way to interact with the language. The code can be copied out and run as regular code in a Python `.py` script!
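Under the hood, a notebook file (`.ipynb`) is a JSON document whose `cells` list holds the Markdown and code cells. The sketch below pulls the code cells out of a tiny, made-up notebook using only the standard library. The helper name `notebook_to_script` is hypothetical; in practice the `jupyter nbconvert --to script` command is the usual way to do this.

```python
import json

# A tiny, made-up notebook in the .ipynb JSON layout (nbformat 4).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["a = 5\n", "print(a)"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["b = a * 2"]},
    ],
})

def notebook_to_script(text):
    """Join the source of every code cell into one Python script."""
    notebook = json.loads(text)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            # "source" may be a single string or a list of lines.
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(chunks) + "\n"

script = notebook_to_script(notebook_json)
print(script)
```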

## The Kernel

Every notebook runs a kernel. The kernel is what executes code cells. 

The kernel also functions as the global memory of the notebook. It is shared by all cells. All variables and functions are stored in the kernel as code is executed.

Once we've executed the code in a cell, we can reference the variables defined in that cell from any other cell. (We've seen this above.) **Importantly, this is true regardless of where the cells sit in the notebook: what matters is the order in which cells are executed, not their position on the page.**
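A toy model may make the shared-memory idea concrete. Here the "kernel" is just a dictionary that every cell reads from and writes to. This is a simplification for illustration, and the names `kernel` and `run` are made up. Note how running the same cell again changes the state again.

```python
# Toy model of a kernel: one namespace shared by every cell.
kernel = {}

def run(cell_source):
    """Execute a cell's source code against the shared namespace."""
    exec(cell_source, kernel)

cell_1 = "count = 1"
cell_2 = "count = count * 2"

run(cell_1)
run(cell_2)
after_one_run = kernel["count"]

# Re-running cell_2 twice more keeps modifying the same variable.
run(cell_2)
run(cell_2)
after_three_runs = kernel["count"]

print(after_one_run, after_three_runs)
```

The value of `count` depends on how many times the second cell was run, not on what the notebook looks like on the page.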

Let's look at an example of *notebook hygiene*.

In [None]:
a = 5
b = 10

In [None]:
b = b * a

In [None]:
print(b)

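If you run the `b = b * a` cell more than once, `b` keeps growing (50, then 250, and so on), so the printed value depends on how many times each cell was run. One way to mitigate this risk is to keep statements that depend on each other in a single code cell.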

In [None]:
a = 5
b = 10
b = b * a

In [None]:
print(b)

Another way is to wrap the logic in functions, which return a result instead of modifying shared variables.

In [None]:
a = 5
b = 10

In [None]:
def mult(x, y):
    return x * y

In [None]:
print(mult(a, b))

### Notebooks Should Flow from Top-to-Bottom

Avoid situations where cells depend on cells executed further down the notebook. The notebook should execute linearly from top to bottom. 

Restarting the kernel and re-running your notebook (*restart and run all*) is a good check that your notebook runs sequentially.

Avoid this situation!

In [None]:
a = 50.0 * my_func(20)

In [None]:
def my_func(x):
    return x * (x-1)
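What *restart and run all* checks can be sketched in plain Python: run every cell in order in a fresh namespace and report the first one that fails. This is a simplified stand-in for the real kernel, and the helper name `first_failing_cell` is made up for this example.

```python
def first_failing_cell(cells):
    """Run cells in order in a fresh namespace; return (index, error) or None."""
    namespace = {}  # stands in for a freshly restarted kernel
    for index, source in enumerate(cells):
        try:
            exec(source, namespace)
        except Exception as error:
            return index, error
    return None

# The two cells above, in the order they appear in the notebook.
bad_order = [
    "a = 50.0 * my_func(20)",
    "def my_func(x):\n    return x * (x - 1)",
]
# The same cells with the definition first.
good_order = list(reversed(bad_order))

bad_result = first_failing_cell(bad_order)
good_result = first_failing_cell(good_order)

print(bad_result)
print(good_result)
```

With the original order, the first cell fails with a `NameError` because `my_func` has not been defined yet. With the definition first, every cell runs.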

## Markdown Cells

Jupyter notebooks use Markdown to create all the text cells that we've seen so far. Markdown lets us do things like **bolding** or *italicizing* text.

Sometimes you want to include lists:
* item 1
* item 2
* item 3

Or numbered lists:
1. item 1
2. item 2
3. item 3

We can also add section headers.

# Section header
## Subsection
### Subsubsection
#### And so on

We can also use double backticks for monospaced inline code, for example ``foo()``, and triple backticks for code blocks:
```
bar()
```

There are plenty of Markdown cheatsheets a short Google search away.  
One option is: http://mdcheatsheet.com/

## Getting Help

If you need help with a specific function, put `?` in front of its name.

In [None]:
?print

This pops up the documentation for the `print` function.

In [None]:
?range
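The `?` syntax is an IPython feature and will not work in a plain `.py` script. Outside the notebook, the built-in `help(range)` prints similar information. A rough equivalent that returns the text instead of printing it, sketched here with the standard library's `inspect.getdoc`, is:

```python
import inspect

# Fetch the docstring that `?range` would display.
doc = inspect.getdoc(range)

# Show only the first line, which is usually the call signature.
first_line = doc.splitlines()[0]
print(first_line)
```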

Jupyter also supports IPython _magics_. Line magics start with `%` and apply to a single line; cell magics start with `%%` and apply to the whole cell.

In [None]:
%%time

a = 5
a + a
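Magics are also IPython-only. Outside a notebook, a comparable measurement can be sketched with the standard library's `timeit` module. The function name `add_to_itself` and the repeat count are made up for this example.

```python
import timeit

def add_to_itself():
    a = 5
    return a + a

# Run the function many times, because a single addition is too fast to time reliably.
seconds = timeit.timeit(add_to_itself, number=100_000)
print(f"{seconds:.4f} s for 100,000 calls")
```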

## Notebooks are Documents

Jupyter Notebooks run code, but they are more like living documents. You should annotate your work, similar to commenting code, but you get all the power of Markdown to add sectioning and rich text.

A few best practices on notebooks:
* Notebooks should be single purpose.
* A user should be able to click run and execute your notebook from top to bottom and have it work.
* Annotate your notebook with Markdown to explain things (like commenting code), including adding sections, etc.
* Version your notebook using Git. GitHub integrates well with notebooks!

## Sample Notebook

Let's see what a well-structured notebook might look like.

Don't worry for now about the particular code or libraries that are used. We will investigate these in the next few notebooks.

**First**, let's import the libraries and tools that we need.

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

**Second**, get some data to work with. This may be data that you've created, customer data that you've imported, or something else.

In [None]:
iris = sns.load_dataset('iris')

**Third**, investigate your data and understand it. 

In [None]:
iris.head()

This data set contains information about flowers. We have the width and length of their sepals and petals.

![title](03_iris.png)

**Fourth**, calculate something from your data, or generally do something useful. Let's pretend that we want to approximate the area of the petal.

In [None]:
# Area calculation
def area_of_petal(length, width):
    return length * width

In [None]:
iris['petal_area'] = area_of_petal(iris['petal_length'], iris['petal_width'])

In [None]:
iris.head()

**Fifth**, present our results in some way.

In [None]:
sns.barplot(x='species', y='petal_area', data=iris)
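By default, `sns.barplot` draws one bar per category, and the height of each bar is the mean of the `y` values in that category (with an error bar on top). The sketch below reproduces that aggregation in plain Python on a few made-up rows; the numbers are hypothetical and are not real iris measurements.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows (species, petal_length, petal_width); NOT real iris data.
rows = [
    ("setosa", 1.0, 0.5),
    ("setosa", 2.0, 0.5),
    ("virginica", 5.0, 2.0),
    ("virginica", 6.0, 2.5),
]

# Group the approximate petal areas (length * width) by species.
areas_by_species = defaultdict(list)
for species, length, width in rows:
    areas_by_species[species].append(length * width)

# One mean per species: these are the bar heights.
mean_area = {species: mean(areas) for species, areas in areas_by_species.items()}
print(mean_area)
```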

**Finally**, you can see from start to finish what we did, with plots, figures, and our results presented right in the notebook.

## Summary

Notebooks are commonly used for Data Science. 
To read more about Machine Learning check out:
    
[Hands-On Machine Learning with Scikit-Learn and TensorFlow - Aurélien Géron](http://shop.oreilly.com/product/0636920052289.do) 

The book uses several notebooks which are available at 
[github.com/ageron/handson-ml](https://github.com/ageron/handson-ml)

For some online courses check out:
* [fast.ai's practical intro to deep learning](https://www.fast.ai)
* [Google's educational material](https://ai.google/education/)
* [Amazon's training material](https://www.aws.training/LearningLibrary?filters=classification%3A30&search=&tab=view_all)
* [Andrew Ng's course](https://www.coursera.org/learn/machine-learning)