# Jupyter Notebooks

## Why?

In the past, most people's introductions to text-based coding environments involved an "integrated development environment" or "IDE":

<img src = 'https://upload.wikimedia.org/wikipedia/commons/thumb/c/c9/Ninja-ide-screenshot.png/800px-Ninja-ide-screenshot.png' width=600>

IDEs are generally designed for professional software developers who are working on large, complex projects, and they require a fair amount of experience to use. IDEs can also be intimidating for young coders: they lump all the lines of code together, which can make it difficult to find and correct bugs.

When teaching coding and data science, teachers are often looking to build instructional materials that are accessible and engaging for their students and that also make assessment easier.

**Jupyter Notebooks** provide exactly that by merging powerful coding and data science tools with a webpage-like interface that allows teachers to create a rich learning experience and promote engagement.

## Cells

Jupyter notebooks are read from top to bottom, like a story. The notebooks are made of three types of "cells": **markdown**, **code**, and **raw** cells. We will only use markdown and code cells. 

When you click on a cell, you will see a blue bar to the left. Also, there is a dropdown above that tells you which type of cell it is: 

<img src = "markdown.png" >

### Markdown Cells

You are reading a markdown cell right now. Markdown cells can contain many things such as

***formatted text*** 

images: <img src='jupiter.png' width=60> 

audio: <audio width="320" height="240" 
        src="https://upload.wikimedia.org/wikipedia/commons/6/66/Mozart_Piano_Sonata_in_C%2C_K545%2C_end_of_first_movement_01.wav"  
        controls>
</audio>

even embedded videos: <video width="200" height="200" controls 
        src=https://upload.wikimedia.org/wikipedia/commons/8/83/Apollo_16_rover_practice.ogv 
        type=video/webm> 
</video>

Markdown cells add richness and make a Jupyter notebook feel and act like a webpage.

### Code Cells

While markdown cells provide context and make our notebook more engaging, a **code** cell is where the magic happens. When you run a code cell, the computer actually executes the code inside it.

In [None]:
# I am a code cell. How can you tell?

### Editing Cells

Cells can be cut, copied and pasted using the toolbar. 

<img src= "cut_copy_paste.png" >

They can also be merged, split and moved using the edit menu.

### Running Cells

The whole idea of using Jupyter notebooks is to be able to merge a narrative with coding. While the notebook may read like a story, we want the notebook to be interactive as well. This includes being able to create, edit, and "run" cells. There are two ways to run a cell: after selecting it, click the &#9658; button in the toolbar or use the keyboard shortcut `shift` + `enter`.

For a **markdown** cell, you can edit it by double clicking on it. See if you can make "double clicking" in the previous sentence bold.

Running a code cell sometimes creates an "output" below. Try running the following code cell.

In [None]:
print("Hello, world!")

Notice the number in brackets to the left of the code cell. This is important because it tells us the order in which the cells have been run. We can actually jump around the notebook running cells in any order we want. It doesn't have to be top to bottom. However, the order does matter to the computer, and this is why we have this numbering system.

If you ever want to clear the computer's memory and restart the cell numbering from 1, click the "restart kernel" &#8635; button in the toolbar.

#### Comments

Non-coding text does not have to be in a markdown cell. Long before we had Jupyter notebooks, computer scientists had to embed both personal notes and notes to others in their code. 

This is done by using **comments**. In a code cell, anything after a `#` is ignored by the computer. Run the following cell as is, then see what happens if you remove the `#` and rerun it.

In [None]:
print("Hello, world!")  # I am a comment. Ignore me.

# Python Basics

Now that you know your way around a Jupyter notebook (sort of), let's do a crash course in the basics of the Python programming language. Python strikes a balance between being beginner-friendly and powerful. Python could be someone's first coding language in school and continue to serve them in college and beyond.

## Operators

Operators are pieces of code that allow us to perform operations. They are at the heart of the basic functions of the computer.

### Arithmetic Operators (Math)

The most basic way to use Python (and all coding languages) is as a calculator. The following arithmetic operators include some you are probably familiar with and maybe a couple of new ones.

|name|operator|
|------|------|
| add | `+` |
| subtract | `-`| 
| multiply | `*` |
| divide  | `/` |
| raise to a power | `**` |
| floor division | `//` |
| modulus | `%` |

Try them out on the cell below:

In [None]:
2 + 2
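
If some of the operators in the table are new to you, the sketch below applies each one to the numbers 7 and 2. The numbers and variable names are made up for illustration only.

```python
# Each arithmetic operator from the table, applied to 7 and 2.
total = 7 + 2          # add
difference = 7 - 2     # subtract
product = 7 * 2        # multiply
quotient = 7 / 2       # divide (the result is a decimal number)
power = 7 ** 2         # raise to a power (7 squared)
whole_part = 7 // 2    # floor division: divide, then round down
remainder = 7 % 2      # modulus: what is left over after dividing

print(total, difference, product, quotient, power, whole_part, remainder)
```

Floor division and modulus go together: 2 fits into 7 three whole times (`7 // 2`), with 1 left over (`7 % 2`).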

<br>***TRY SOME MORE***

Create at least 3 lines of new code below, and practice a different mathematical operation in each.

(Use the + sign in the menu bar to create a new code cell.)

### Comparison Operators

Comparison operators, as the name implies, are a way to compare two values. They allow you to ask the computer a question.


| Name | Operator |
|-------|-------|
| equal to | `==` |
| not equal to | `!=` |
| greater than | `>` |
| greater than or equal to | `>=` |
| less than | `<` |
| less than or equal to | `<=` |

See how the computer responds:

In [None]:
2 == 2
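
Here is a small sketch of a few comparison operators in action. The two scores are hypothetical values chosen for illustration. Note that a single `=` assigns a value, while a double `==` asks whether two values are equal.

```python
# Hypothetical values to compare.
first_score = 3
second_score = 4

# Each comparison answers a yes/no question with True or False.
is_equal = first_score == second_score           # are they equal?
is_not_equal = first_score != second_score       # are they different?
is_greater = first_score > second_score          # is the first one bigger?
is_less_or_equal = first_score <= second_score   # is the first one smaller or the same?

print(is_equal, is_not_equal, is_greater, is_less_or_equal)
```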

In [None]:
## create at least 2 lines of code and practice a different comparison operator

### Logic Operators

We can combine comparisons using logic operators:


| Description | Operator |
|-------|-------|
| are both statements true? | `and` |
| is either statement true? | `or` |

Try them out:

In [None]:
2 < 3 and 5 == 4
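
As a rough illustration of combining comparisons, the sketch below checks a made-up temperature reading. The variable names and cutoff values are assumptions chosen for this example.

```python
# A hypothetical temperature reading.
temperature = 20

# "and" is True only when BOTH comparisons are true.
is_comfortable = temperature > 15 and temperature < 25

# "or" is True when AT LEAST ONE comparison is true.
is_extreme = temperature < 0 or temperature > 35

print(is_comfortable, is_extreme)
```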

In [None]:
## create at least 2 lines of code using and, or

## Variables

Like variables in math, variables in coding allow us to assign a value, like `5`,  to a variable, like `x`, in the form:

In [None]:
x = 5

Now the computer treats `x` and `5` as the same:

In [None]:
2 + x

Variable assignments are not permanent. They can be overwritten by assigning a new value to an existing variable. This is where the order in which you run the cells is important. 

In [None]:
x = 10
2 + x
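
To see why order matters, here is a short sketch that overwrites a variable partway through. The names `apples`, `first_total`, and `second_total` are hypothetical.

```python
apples = 5
first_total = apples + 2    # uses the value 5

apples = 10                 # overwrite the old value
second_total = apples + 2   # now uses the value 10

# first_total was calculated before the change, so it keeps its old result.
print(first_total, second_total)
```

The same thing happens across cells: a cell uses whatever value the variable held at the moment that cell was run.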

However, we should think of variables much more broadly in Python as a "bucket" to store data. That data could be a single number, a list of items, even massive datasets.

### Naming Variables

It is important to give variables descriptive names so that others can understand your code (`x` is too generic).

Variable names are case sensitive. There are several common conventions for when there is more than one word in a variable name:

|Convention|Examples|
|----------|--------|
|Camel Case|`camelCase`|
|Snake Case|`snake_case`|
|Pascal Case|`PascalCase`|

In this notebook, we will use snake case.
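
The sketch below shows a snake case name and what case sensitivity means in practice. Both variable names and their values are invented for this example.

```python
# A descriptive snake case name: lowercase words joined by underscores.
student_count = 24

# Because names are case sensitive, this is a completely separate variable.
Student_Count = 30

values_match = student_count == Student_Count
print(student_count, Student_Count, values_match)
```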

## Functions

While operators are the most basic way to interact with a computer, they can be combined into **functions** that perform more complex tasks. A function's job is to take one or more inputs, called "arguments", and work on them to "return" an output.

You saw a Python function in the code cell above: `print("Hello, world!")`. This is called the **print function** and it's pretty obvious what it does.

You can "define" your own functions. See if you can figure out what this function is designed to do:

In [None]:
# This just defines the function (commits it to the computer's memory).
# Run the cell. See what happens.
def cel_to_far(celsius):
    return (9/5) * celsius + 32

In [None]:
# Try using the function by replacing the ??? with a number:
cel_to_far(???)

In [None]:
# Try using the function by replacing the ??? with a number:
cel_to_far(???)
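
Functions can take more than one argument. As an illustration, here is a hypothetical function, `rectangle_area`, that is not part of this notebook's exercises. It takes two arguments and returns their product.

```python
# Define a function with two arguments (hypothetical example).
def rectangle_area(width, height):
    return width * height

# "Call" the function by giving it a value for each argument.
small_area = rectangle_area(3, 4)
large_area = rectangle_area(10, 2.5)

print(small_area, large_area)
```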

## Data

Obviously, doing data science involves working with data. Understanding the different "flavors" of data and how it is organized in the computer's memory is critical knowledge.

### Data Types

Notice `Hello, world!` is in quotes `" "`. This is because the phrase `Hello, world!` is a specific **data type** known as a **string** (text) and strings must be in quotes. By the way, single quotes `' '` can be used as well. They are the same as double quotes `" "` in Python. What happens if you remove the quotes?

In [None]:
print("Hello, world!")
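
A quick sketch of the two quote styles follows. The variable names are made up. One practical reason to have both styles is that a string containing an apostrophe can be wrapped in double quotes.

```python
double_quoted = "Hello, world!"
single_quoted = 'Hello, world!'

# Python treats the two strings as identical.
quotes_are_same = double_quoted == single_quoted

# Double quotes let the string contain an apostrophe.
with_apostrophe = "Don't panic"

print(quotes_are_same, with_apostrophe)
```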

There are many data types, but the following are the most common:

|data type| Description|Example|
|--------|-------------|--------|
|int|integer|`5`|
|float|number with a decimal|`3.14`|
|str|string of characters|`"What?"`|
|list|ordered collection|`["eggs", "milk", "juice"]`|

Data types are important because functions only accept arguments with certain data types. 

In [None]:
round(5.3)

In [None]:
round(???)
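
To illustrate why data types matter, the sketch below gives `round()` a string instead of a number. It uses `try` and `except`, which this notebook does not cover, simply to catch the error so the cell keeps running. The variable names are hypothetical.

```python
text_number = "7.8"   # this is a string, not a number

try:
    round(text_number)        # round() does not accept a string
    rounding_worked = True
except TypeError:
    rounding_worked = False   # Python raised a TypeError

# Converting the string to a float first makes it acceptable to round().
converted_number = float(text_number)
rounded_number = round(converted_number)

print(rounding_worked, rounded_number)
```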

A useful function is `type()`, which will tell you the data type of a value or of the data stored in a variable.

In [None]:
type("fun")

In [None]:
type(???)
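
As a small sketch, `type()` can be applied to each example value from the data types table. The list name `example_values` is made up for this illustration.

```python
# One example value for each data type in the table.
example_values = [5, 3.14, "What?", ["eggs", "milk", "juice"]]

# Collect the name of each value's data type.
type_names = [type(value).__name__ for value in example_values]

print(type_names)
```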

## Data Structures

Lists are an example of **structured data**. Essentially, several pieces of data have been organized into a single collection. Lists are **ordered**, meaning each item is labeled with a number called an **index**.

In [None]:
grocery_list = ["eggs", "milk", "juice"]   # Assign the list to the variable "grocery_list"
print(grocery_list[???])  # Replace ??? with a number to print "milk"

You were probably surprised to see that numbering in Python (and other coding languages) does not start at 1. Don't worry. You will get used to it.
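
Here is a short sketch of indexing with a different, made-up list, so you can see the numbering pattern:

```python
colors = ["red", "green", "blue"]   # hypothetical list

first_color = colors[0]    # the first item has index 0
last_color = colors[2]     # the third item has index 2
color_count = len(colors)  # len() counts the items in the list

print(first_color, last_color, color_count)
```

Because counting starts at 0, the last index is always one less than the number of items.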

## Pandas

Pandas is a data analysis and manipulation tool. We use Pandas to view and analyze tables of data.

A 1D array in Pandas is called a `Series`.

In [None]:
import pandas as pd

pandas_planets = pd.Series(['mercury', 'venus', 'earth', 'mars', 'jupiter'])

type(pandas_planets)
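
Like a list, a `Series` labels its items with an index that starts at 0 by default. The sketch below assumes Pandas is installed and uses a shorter, hypothetical `Series` named `inner_planets`.

```python
import pandas as pd

inner_planets = pd.Series(['mercury', 'venus', 'earth', 'mars'])

first_planet = inner_planets[0]     # look up the item labeled 0
planet_count = len(inner_planets)   # number of items in the Series

print(first_planet, planet_count)
```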

### Pandas DataFrames

A 2D array in Pandas is called a `DataFrame`. 

One way to create a `DataFrame` is by passing another structured data type, known as a dictionary, to the `DataFrame` function.

In [None]:
pandas_planets = {'A': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter'],
        'B': [1, 2, 3, 4, 5], 'C': [0, 0, 1, 2, 79]}

In [None]:
planets_df = pd.DataFrame(pandas_planets)
planets_df
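
As a rough sketch of how the dictionary maps onto the table, each key becomes a column name and each list becomes that column's values. The code below rebuilds the same dataframe so it can run on its own. The names `planet_data`, `names_column`, and `table_shape` are made up for this example.

```python
import pandas as pd

# Each key becomes a column name; each list becomes that column's values.
planet_data = {'A': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter'],
               'B': [1, 2, 3, 4, 5],
               'C': [0, 0, 1, 2, 79]}
planets_df = pd.DataFrame(planet_data)

names_column = planets_df['A']   # a single column is a Series
table_shape = planets_df.shape   # (number of rows, number of columns)

print(type(names_column))
print(table_shape)
```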

You may be wondering what the difference is between a Numpy 2D array and a Pandas dataframe. The difference is subtle, but the computer sees the two differently:

**Numpy 2D Array**

<img src="numpy_planets.png" width = 300>

**Pandas Dataframe**

<img src="pandas_planets.png" width = 300>

## Reading in Data from a .csv File

We will primarily be using Pandas dataframes, but Numpy will be a valuable library as well. Fortunately, we don't need to build our 2D arrays and dataframes from scratch like above. Usually, we will bring in data from an existing file (also known as a **dataset**). The most common file format for raw data is a `.csv` file, short for "comma-separated values", and it is very easy to convert one into a dataframe:

In [None]:
planets = pd.read_csv('planets.csv')
planets

A Pandas dataframe should look familiar. It is how data is structured in spreadsheets like Excel and Google Sheets. The bold numbers on the left are known as the **index** and allow us to number rows. The column names allow us to label and reference individual 1D arrays in the dataframe.
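
If you don't have a `.csv` file handy, the sketch below shows what one looks like inside and how Pandas turns it into a dataframe. The CSV text, its column names (`name`, `position`), and the variable names are all invented for this example. `StringIO` lets Pandas read the text as if it were a file.

```python
from io import StringIO

import pandas as pd

# The first line holds the column names; each later line is one row.
csv_text = "name,position\nMercury,1\nVenus,2\nEarth,3\n"

sample_planets = pd.read_csv(StringIO(csv_text))

column_names = list(sample_planets.columns)   # labels across the top
row_labels = list(sample_planets.index)       # the index down the left side

print(column_names)
print(row_labels)
```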