![image-2.png](attachment:image-2.png)

# What is a Jupyter Notebook?

What is a Jupyter Notebook? You're reading one right now!

A Jupyter Notebook is an interactive document that combines code and prose. It's a single document where you can run code, display the output, and add explanations, formulas, and charts. This allows you to make your work more transparent, understandable, repeatable, and shareable.

We use Jupyter Notebooks to teach content in our machine learning courses, so that you can interact with the code yourself and see the output throughout the typical machine learning process.

In this notebook, we'll explain everything you need to know about Jupyter Notebooks to be ready for our courses!


**Learning Objectives:**

By the end of this document, you should be able to effectively navigate a Jupyter notebook on your own.

You will learn:

- What a kernel is
- The difference between the menu bar and tool bar
- The difference between code and markdown cells
- How to write math in markdown cells
- How to write Python in code cells
- The run flow of notebooks
- What magic commands are
- How to load data into Jupyter Notebooks
- Shortcuts for navigating Jupyter Notebooks

   

## The Jupyter Notebook Interface

A Jupyter notebook has a few key elements:

![image.png](attachment:image.png)

## Notebook name

This is just the name of the notebook!

## Menu bar

![Screen%20Shot%202022-08-30%20at%201.54.29%20PM.png](attachment:Screen%20Shot%202022-08-30%20at%201.54.29%20PM.png)

The menu bar holds a few different commands of interest:

- **File**: Options at the file (notebook) level
- **Edit**: A list of cell options
- **View**: Toggle options for notebook view
- **Insert**: Menu to insert cells above or below the selected cell
- **Cell**: Cell-level run commands
- **Kernel**: Kernel runtime commands
- **Widgets**: Widget options (outside the scope of this document)
- **Help**: Help resources & references

## Toolbar

![Screen%20Shot%202022-08-30%20at%201.54.38%20PM.png](attachment:Screen%20Shot%202022-08-30%20at%201.54.38%20PM.png)


The toolbar has a series of cell-level commands. These commands let you select a cell and quickly run it, stop it, delete it, duplicate it, change its type, and so on. More on cells below.

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Select the cell below and then use the <b>Cell</b> menu in the menu bar to run the cell!</p>
    <br>
</div>

In [None]:
print("My first code cell!")

---


## The Kernel

You may hear the term "kernel" in the context of Jupyter notebooks.
The kernel is the "computational engine" that runs the code contained in a notebook.
You can see it in the top-right corner (it should say `conda_python3`). It is the programming language plus the environment (and dependencies) that the notebook uses for running code.

You can use the `Kernel` menu in the menu bar to make notebook-wide changes, such as restarting the kernel (which clears all variables from memory) or interrupting it (stopping a cell that takes too long to run).

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">What Kernel is currently being used? (hint: check the top-right)</p>
    <br>
    <p style=" text-align: center; margin: auto;">How many kernel options are available?</p>
    <br>
</div>

---


## Cells

Cells form the body of a notebook, and they are the atomic units of code in a notebook.

There are two main cell types that we will cover:

- **Markdown cell**: A Markdown cell contains text formatted using Markdown and displays the rendered result in place when the cell is run. It's used for text and math equations, and lets you document your work clearly so that others (and you) can understand the notebook.

- **Code cell**: A code cell contains code to be run in the kernel. When the code is run, the notebook displays the output below the code cell that generated it.

We'll cover each in a bit more detail.

---

### Markdown Cells

#### Standard Markdown

To create a new markdown cell, create a new cell, press `Esc` to enter command mode, and then press `m`. Alternatively, in the menu click `Cell` -> `Cell Type` -> `Markdown`.

Markdown is a basic markup language for formatting plain text. Different symbols format or render text differently. For example:

`# Level 1 Header `

will render text as:

# Level 1 Header

and

`Add emphasis via **bold** and __bold__, or *italic* and _italic_` 

will render as

Add emphasis via **bold** and __bold__, or *italic* and _italic_

You can also create lists using:

- Item 1
- Item 2
- Item 3

or

1. Item a
1. Item b
1. Item c

It's like writing a fancy email.

A full tutorial on markdown is outside the scope of this document, but you can read all about markdown here: https://www.markdownguide.org/ 

#### Writing Math Equations in Markdown

You can easily add math equations to markdown by writing LaTeX inside of `$` symbols:

`$ y = \beta_{0} + 2\beta_1^2 $`

will render

$ y = \beta_{0} + 2\beta_1^2 $

Jupyter uses MathJax for rendering. You can read about available equations/symbols here: https://docs.mathjax.org/en/v2.7-latest/tex.html
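
For longer formulas, it can help to put the equation on its own line. As a small sketch, wrapping LaTeX in double dollar signs (`$$`) should render a centered display equation. The sample mean below is just an illustrative formula and is not used elsewhere in this notebook:

```latex
$$
\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i
$$
```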

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Create a new markdown cell below this with two different headers and a list!</p>
    <br>
</div>

In [None]:
# todo: Create two different markdown headers and a list:

### Code Cells

A code cell contains code to be run in the kernel. When the code is run, the notebook displays the output below the code cell that generated it.

**We can write any Python in a code cell**:

In [None]:
# this is a comment in a code cell
x = 5
multiples_of_five = [x * i for i in range(5)]
print(multiples_of_five)

To run a code cell, click on the cell and click `Run` in the toolbar, or press `Shift` + `Enter` (`Shift` + `Return` on a Mac).

In [None]:
print("I am a code cell! I will print after being run!")

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Create a new code cell below this.<br> Define the variable <code>name</code> in Python, and give it the value of your name. </p>
    <br>
</div>

In [None]:
# create name variable here

In [None]:
try:
    print(f"hello {name}")
except NameError as e:
    # re-raise with a friendlier message, keeping the original error as the cause
    raise NameError("Make sure you define the name variable above!") from e

You can use code cells for any common Python tasks. 
For example, we'll often want to load (`import`) libraries into our kernel.

**Importing libraries is as easy as calling `import`**:

In [None]:
# let's load pandas and make a dataframe for our multiples of five
import pandas as pd

multiples_of_five_df = pd.DataFrame({'multiples': multiples_of_five})
multiples_of_five_df

Notice that in our code cell above, we referred to `multiples_of_five`, which was defined in a previous cell. Cells can access values in the global environment that were defined in other cells. Just make sure to run the cells in order!
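
As a rough illustration of this shared state, the sketch below imitates two separate cells inside one block. The variable name `greeting` and the "cell" labels in the comments are made up for this example:

```python
# Hypothetical cell 1: define a variable in the kernel's global environment
greeting = "hello"

# Hypothetical cell 2: a later cell can read and update that same variable
greeting = greeting + " world"
print(greeting)  # prints "hello world"
```

If the second "cell" ran before the first, Python would raise a `NameError`, which is why run order matters.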

Another common task is to load external files. 

You can **load external files just as with any other Python script**: via relative paths for local files, URL paths for web-hosted files, via S3 buckets, etc.:

In [None]:
# use pandas, which we imported earlier, to load a csv containing reviews of Spanish wines
# this data is a subset of https://www.kaggle.com/datasets/zynicide/wine-reviews
sp_wines = pd.read_csv("spanish_wines_reviews.csv")
sp_wines.head(3)
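
If you want to experiment with `pd.read_csv` without a file on disk, one option is to pass it a file-like object. The sketch below is only an illustration: the rows are made up and are not taken from the real wine dataset, and `demo_wines` is a hypothetical name.

```python
import io

import pandas as pd

# Made-up stand-in for the contents of a CSV file (NOT the real dataset)
csv_text = """region,points
Rioja,90
Rioja,88
Priorat,92
"""

# read_csv accepts a local path, a URL, or a file-like object such as StringIO
demo_wines = pd.read_csv(io.StringIO(csv_text))
print(demo_wines.shape)  # prints (3, 2)
```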

Pandas has all kinds of useful functionality for working with data. We can subset, reshape, calculate statistics, carry out feature engineering, or even plot our data easily:

In [None]:
# how are the wine scores distributed over regions
sp_wines.plot.scatter(x="region", y="points", rot=90, title="Spanish Wine Scores Across Regions");
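
To make a few of these operations concrete, here is a minimal sketch on a tiny made-up table. It reuses the column names `region` and `points` from the plot above, but the values and the names `demo`, `top`, `best_score`, and `is_top` are invented for illustration:

```python
import pandas as pd

# Made-up illustration data; not taken from spanish_wines_reviews.csv
demo = pd.DataFrame({
    "region": ["Rioja", "Rioja", "Priorat", "Rias Baixas"],
    "points": [90, 88, 92, 87],
})

# Subset: keep only the rows scoring 90 or more
top = demo[demo["points"] >= 90]

# Statistic: the highest score in the table
best_score = demo["points"].max()

# Feature engineering: add a boolean column derived from an existing one
demo["is_top"] = demo["points"] >= 90

# Count how many rows each region has
rows_per_region = demo["region"].value_counts()

print(len(top), best_score, rows_per_region["Rioja"])  # prints 2 92 2
```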

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Read the pandas documentation <a href="https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html">here</a>. <br>
    In the sp_wines dataframe, calculate the mean wine score across all regions.</p>
    <br>
</div>

In [None]:
# todo: in the sp_wines dataframe, calculate the mean wine score across all regions

---

## Run Flow

One confusing thing about Jupyter Notebooks is that **the cells can run in any order**. Typically, you want to run code cells from top to bottom, as cells towards the end of the notebook will likely require variables defined earlier in the notebook. To help you keep track, Jupyter shows a number in the square brackets next to each code cell, indicating the order in which the cells were run.

When writing your own notebook, make sure the cells are in order so that anybody who runs your notebook in the future can use the `Kernel` menu's `Restart & Run All` command without hitting any errors.

You should also get used to adding markdown cells at the start of a notebook giving details about what it does. You can also use this to let people know if there are code segments that will take a long time (such as training the model) to run.

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">In the menu, click on Kernel then Restart.</p>
    <br>
    <p style=" text-align: center; margin: auto;">Run the cell below. Did it error out? Why or why not?</p>
    <br>
</div>

In [None]:
# run this cell after restarting the kernel
# if you restarted the kernel, it will error out since sp_wines is not defined yet!
# scroll up and rerun the instantiation of the sp_wines dataframe to solve this error.
sp_wines.head(3)

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>What did restarting the kernel do?</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Since you restarted the kernel, the cell had an error.</p>
    <br>
    <p style=" text-align: center; margin: auto;">This is because restarting the kernel clears all the defined variables out of memory, including the <code>sp_wines</code> dataframe.</p>
    <br>
</div>
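
The same behavior can be sketched in plain Python. Referring to a name that was never defined (or that was cleared by a restart) raises a `NameError`, which you can catch to show a friendlier hint. The name `demo_table` is hypothetical and is deliberately never defined here:

```python
try:
    demo_table  # hypothetical name; nothing in this sketch defines it
    status = "defined"
except NameError:
    # This is the situation right after a kernel restart
    status = "not defined: rerun the cell that creates it"

print(status)
```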

---

## Writing shell commands

Jupyter runs on a Linux instance. To run a shell (bash) command from a notebook, preface it with `!`. That is, write `!<command>` in a code cell:

In [None]:
# print current date in utc
!date

In our classes, you'll often see us use `!` to update the dependencies of our kernel:

In [None]:
# install boto3 to kernel
!pip install boto3
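
Note that `!` is notebook syntax rather than plain Python, so it will not work in a regular `.py` script. As a rough equivalent, the standard library's `subprocess` module can run a command and capture its output. This sketch assumes a Unix-like system where the `echo` program is available:

```python
import subprocess

# Run a shell command from plain Python and capture what it prints.
# Assumes a Unix-like system where `echo` is available.
result = subprocess.run(
    ["echo", "hello from the shell"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output as a string
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())  # prints "hello from the shell"
```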

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Run the <code>ls</code> command in a code cell below</p>
    <br>
    <p style=" text-align: center; margin: auto;">Based on your output, what's the command doing?</p>
    <br>
</div>

In [None]:
# run ls shell command here

---

## Magic Commands

Jupyter code cells have a number of 'magic commands': special commands provided by the notebook's kernel rather than by Python itself. For example, you can look at the command history of your notebook or run code in a programming language other than your current kernel's.

Magic commands that apply to a single line are prefaced with `%`. Magic commands that apply to a whole cell, such as those that run a different programming language, are prefaced with `%%`.

You can use magic commands to access other languages too, like HTML or JavaScript.

In [None]:
# Look at the history of commands run in this session:
%history

In [None]:
%%html

<h2 style="color: red; border: 2px solid black; text-align: center;">I am html</h2>

In [None]:
%%javascript
alert('sorry for this popup!')

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Read about more magic commands <a href="https://ipython.readthedocs.io/en/stable/interactive/magics.html">here.</a></p>
    <br>
    <p style=" text-align: center; margin: auto;">Look up how to time how long a Python statement or expression takes to run.</p>
    <br>
    <p style=" text-align: center; margin: auto;">Time the expression below. How long did it take?</p>
    <br>
    <br>
    <p style=" text-align: center; margin: auto;">Hint: search for the "timeit" magic command</p>
    <br>
</div>

In [None]:
# time how long it takes to run the following:
throwaway = [i*2 for i in range(1000000)]

## Keyboard Shortcuts

Jupyter notebooks have two different sets of keyboard shortcuts: one set that is active in edit mode and another in command mode.

If you see a pencil next to the kernel, that means you're in edit mode. Otherwise, you're in command mode.

The most important keyboard shortcuts are `Enter`, which enters edit mode, and `Esc`, which enters command mode.

In edit mode, most of the keyboard is dedicated to typing into the cell's editor. Thus, in edit mode there are only a few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more. The `Help > Keyboard Shortcuts` dialog lists the available shortcuts.

We recommend learning the command mode shortcuts in the following rough order:

- Basic navigation: `enter`, `shift-enter`, `up`/`k`, `down`/`j`
- Saving the notebook: `s`
- Change Cell types: 
     - `y`: code
     - `m`: markdown
     - `1`-`6`: set the markdown heading level (e.g., `1` is H1, or `#`)
- Cell creation: 
    - `a`: create cell above currently selected cell
    - `b`: create cell below currently selected cell
- Cell editing: 
    - `x`: cut
    - `c`: copy
    - `v`: paste
    - `dd`: delete 
    - `z`: undo
- Kernel operations: 
    - `ii`: interrupt kernel (press `i` twice)
    - `00`: restart kernel (press `0` twice)

<div style="border: 4px solid coral; text-align: center; margin: auto;">
    <h2><i>Try it Yourself!</i></h2>
    <br><br>
    <p style=" text-align: center; margin: auto;">Create a new markdown cell below this using only the keyboard.</p>
    <br>
</div>

## Our typical workflow in MLU

Our MLU courses usually follow the same typical workflow in our Jupyter notebooks:

1. Install dependencies at the top of the notebook.
2. Import any libraries and datasets we need.
3. Use Markdown text to describe the problem and narrate what's happening in each cell.
4. Use Python code to demonstrate the machine learning material we talked about in class.

## End

This document is meant to be a quick way to get you up-to-speed with what Jupyter notebooks are, how to interact with them, and how to use them for yourself. 

You should now be able to operate effectively in any of our MLU classes!

**Note:** you can return to this notebook during the quiz to help answer questions.