# Python & Jupyter Notebooks

## Jupyter what?!

Jupyter Notebook is a popular interactive coding web application used by data scientists and programmers. If you're reading this, you're currently inside a Jupyter Notebook file. Notebook files consist of cells. You can navigate from cell to cell using the up and down arrow keys. Give it a try!

## What do I mean when I say *interactive coding tool*?

Most computer programs are static in nature. They generally consist of a set of predefined steps which are stored in a text file on a computer. As needed, the contents of the file are fed to the **interpreter** -- the program which turns code into computer instructions (more on this later) -- and are executed as a unit. Here's an example of what a typical Python file looks like:

```
# my_program.py

x = "Hello, World!"
print(x)
```

The above program consists of three steps. The first line, `# my_program.py`, is what's called a <strong>*comment*</strong>. In Python, any line prefaced with a `#` is a comment and is ignored by the interpreter. Here, the comment simply records the name of the file. It is purely stylistic and has no effect on the behavior of the program itself.

The next step, `x = "Hello, World!"`, is a bit more interesting. In this case, the program is instructing the computer to create a reference, `x`, to some chunk of memory and to store inside that chunk of memory the **string** "Hello, World!".

Finally, the last step, `print(x)`, instructs the computer to <strong>*invoke*</strong> (or call) a **function** called `print` with an **argument** (or input) of `x`. `print` is a function built into Python that accepts an input and prints it to the screen.

The cell below is meant to demonstrate how this program would run. Give it a try!

In [None]:
# run me by pressing shift + enter

x = "Hello, World!"
print(x)

This style of programming is perfectly fine when the result of a program is determined by an unchanging set of rules. The static file approach, however, is NOT well-suited for all use cases. When performing data analysis using Python, for example, the steps can't always be determined ahead of time. Datasets come in all different shapes and sizes and often require various unpredictable preprocessing steps (AKA <strong>*wrangling*</strong>) before they can be worked with.

This is where Jupyter Notebooks come into the picture. As you can see, a Jupyter Notebook file is composed of cells, each one building upon the last. It's almost as if the creators of Jupyter built pause, fast-forward, and rewind buttons into your program. At any time you can add cells at the beginning, at the end, or in between. So in our Jupyter Notebook, our old program could be written like this:

In [None]:
x = "Hello, World!" ## <--- Try changing the value between the quotes and then run both cells again

In [None]:
print(x)

## Python you'll need to know for this workshop
- *variables*
- *objects & types*
- *functions*
- *import statements & dependencies*

### Variables
As mentioned earlier, a variable is simply a reference to some chunk of memory. Every variable has a name, and using that name we can interact with the aforementioned chunk of memory: we can store things, modify things, and access what we've stored and/or modified. Below are all examples of variables:

In [None]:
a = "my variable"
b = 12
c = 15.29034
d = [1, 'hello', 3.456]
e = {"hello": 3}
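
To make "store, modify, and access" a bit more concrete, here's a small sketch. The variable names `count` and `shopping` are made up for this illustration and aren't used anywhere else in the workshop.

```python
# Store: create a variable named count that refers to the integer 3
count = 3

# Access: use the name to read the stored value
print(count)  # prints 3

# Modify: point the same name at a new value
count = count + 1
print(count)  # prints 4

# Some objects, like lists, can also be changed in place through their variable
shopping = ["milk", "eggs"]
shopping.append("bread")
print(shopping)  # prints ['milk', 'eggs', 'bread']
```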

### Objects & Types
In Python, everything is an <strong>*object*</strong>. It's not terribly important for now to fully understand what an object is, but suffice it to say, an object is the most basic unit of Python thing (lol!) and everything is an object of some <strong>*type*</strong>. Different types of objects have different characteristics. This is also something we need not fully understand for now. Just acknowledging that there are differences is perfectly sufficient for our purposes. Below are some examples of object types. In the below examples we make use of a <strong>*function*</strong> (more on this shortly) called `type`, which simply takes an object and returns the object's type. Have a look:

In [None]:
type("my name is Eitan") # <--- anything w/ double quotes is a string.

In [None]:
type('1,000,000') # <--- single quotes work too.

In [None]:
type(143099) # <--- Whole numbers without quotes or a decimal point are called integers.

In [None]:
type(c) # we defined the variable c a few cells back. Its type is called a float.

In [None]:
type([1, 2, "three"]) # <--- square brackets == list

In [None]:
type({"one": 1}) # <--- curly braces == dictionary
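
One way to see that different types really do have different characteristics is to try the same operation on each of them. The sketch below is just an illustration (the variable names are made up); it uses `+` on integers, strings, and lists, and then shows what happens when types are mixed.

```python
# The + operator behaves differently depending on the type of its operands
int_sum = 1 + 2          # integers are added
str_sum = "1" + "2"      # strings are joined end to end
list_sum = [1] + [2]     # lists are concatenated

print(int_sum, type(int_sum))    # 3 <class 'int'>
print(str_sum, type(str_sum))    # 12 <class 'str'>
print(list_sum, type(list_sum))  # [1, 2] <class 'list'>

# Mixing incompatible types raises an error instead of guessing
try:
    1 + "2"
except TypeError as err:
    print("TypeError:", err)
```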

### Functions
In our last few examples we made use of a function called `type`, but what is a function? One of the fundamental rules of programming is D.R.Y., or Don't Repeat Yourself. Functions allow us to take a complex set of commands which we don't want to write over and over again, assign them a name (another form of a variable, FYI), and reuse them as needed. This makes our programs more readable and can also abstract away some of the unimportant details. Functions can optionally accept arguments. Many functions, such as `type`, are built into Python, but we can define our own as well as make use of functions defined by third parties.

In the below example I've defined a function called `titleize`, which, given a title, capitalizes the first letter of each word. As you can see, the logic I use to accomplish this is complex enough that I wouldn't want to rewrite it every time I need it. Instead, I can reuse the function I've defined as needed. Give it a try:

In [None]:
def titleize(title):
    return " ".join([x[0].upper() + x[1:] for x in title.split()])

In [None]:
print(titleize("to kill a mockingbird"))
print(titleize("catcher in the rye"))
print(titleize("war and peace"))
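
Since arguments are optional, here's a rough sketch of the three flavors you're likely to run into: a function with no arguments, one with a required argument, and one with an argument that has a default value. The functions `shout` and `greet` are hypothetical and exist only for this example.

```python
def shout():
    """A function that takes no arguments at all."""
    return "HEY!"


def greet(name, greeting="Hello"):
    """name is required; greeting is optional and defaults to "Hello"."""
    return greeting + ", " + name + "!"


print(shout())                    # HEY!
print(greet("World"))             # Hello, World!
print(greet("World", "Goodbye"))  # Goodbye, World!
```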

### Import Statements & Dependencies
Probably the coolest thing about Python is that it was designed to integrate easily with third-party software, also referred to as <strong>*packages*</strong> and <strong>*libraries*</strong>. Indeed, there are thousands of community-driven packages which programmers and data scientists rely on to do their work. We will be making use of a few such packages in our workshop today.

Here are two examples of how one would add a package to their project:

```
import some_package1
import some_package2 as sp2
```

In the first example we simply import (or add to our project) a package called `some_package1`. We now have access to all the functionality bundled inside `some_package1`; it's essentially as though all of the code from `some_package1` had been added to our project, reachable through the `some_package1.` prefix.

The second example is much the same, except in this case we <strong>*alias*</strong> our package to `sp2` using the `as` keyword. So here again we have full access to the package `some_package2`, but instead of using the full name we use `sp2` for ease of typing.
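
Because `some_package1` and `some_package2` are placeholders, here's the same idea as a runnable sketch using two modules that ship with Python itself, `math` and `statistics`. They aren't third-party packages, and the alias `stats` is just my choice for this example, but the import syntax is identical.

```python
# Plain import: use the module's full name as a prefix
import math
print(math.sqrt(16))  # 4.0

# Aliased import: use the shorter name given after "as"
import statistics as stats
print(stats.mean([1, 2, 3, 4]))  # 2.5
```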

### Recap
Now I'll show you a quick example of all the things we learned. We'll start by importing a package called `pandas`, which is used to interact with data and which we will be using extensively today. Then I'll show you how to make use of pandas' functionality.

In [None]:
## importing a package called pandas
import pandas as pd

### A Note About CSV Files
In the workshop today, the datasets will mainly come in the form of CSV files. CSV stands for Comma Separated Values. A CSV file is essentially a simplified spreadsheet except that the columns are delimited by commas and the rows by newlines.

In [None]:
# This is sort of what a CSV file looks like
raw_data = """
first_name,last_name,age
Albert,Johnson,31
Emily,Smith,29
James,Allen,37
Judith,Bridges,33
"""

In [None]:
# This Function Is Unimportant
# (it just wraps our string in a file-like object so pandas can read it as if it were a file)
def unimportant_function(raw_data):
    from io import StringIO
    return StringIO(raw_data)

In [None]:
### Process Our Mock Data
data = unimportant_function(raw_data)

### Create Python object for data interaction
df = pd.read_csv(data)

In [None]:
type(df)

In [None]:
df.head()

In [None]:
df['age'].max()
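
If you're curious what "columns delimited by commas and rows by newlines" means in practice, here's a rough sketch that reads the same kind of data by hand using Python's built-in `csv` module instead of pandas. It's only meant as an illustration, not as a description of how pandas works internally; the `sample` string reuses the first three rows of the mock data above.

```python
import csv
from io import StringIO

# The first three rows of the mock data, with the header row on top
sample = """first_name,last_name,age
Albert,Johnson,31
Emily,Smith,29
James,Allen,37
"""

# DictReader uses the first row as column names and yields one dict per row
rows = list(csv.DictReader(StringIO(sample)))
print(len(rows))              # 3
print(rows[0]["first_name"])  # Albert

# The csv module reads every value as a string, so convert ages before comparing
ages = [int(row["age"]) for row in rows]
print(max(ages))  # 37
```

Notice how much of this pandas handles for us: `pd.read_csv` worked out on its own that `age` holds numbers, and `df['age'].max()` is a single call.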