<a href="https://colab.research.google.com/github/ddsmit/teach_and_learn/blob/master/jupyter_training_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Jupyter Training for Data
## Welcome!
I'm writing this notebook for people who are new to Python and Jupyter, but have some background in programming. I will quickly go over some of the basics of Python (types, conditionals, and loops), and then I will get into some of the Python-specific packages that make data analysis much easier. But first, an introduction to Jupyter.

## What is Jupyter
Jupyter is a web-based application that allows for incremental processing and data exploration. A Jupyter notebook is made up of cells, which come in a variety of formats. The cells I use most often are Markdown and Code cells. Markdown cells use the Markdown format to create nicely formatted text, lists, images, etc. for documenting the study and explaining the analysis process in the notebook (this cell and the previous cell are Markdown). Code cells take code as an input, and can show an output depending on the contents of the cell. The code cells we will be working with will be written in Python, but there are a variety of languages that can be used in the cells (R, Julia, JavaScript, etc.).

## Starting With Jupyter
### Basics

Below are a few important things to know to get started with Jupyter:

- To execute a code cell or render a Markdown cell, click the "Run" button or press Shift + Enter.
- When you run the bottom cell, a new cell is created below it.
- To create a new cell without running a cell, or to insert a cell, click the "insert cell below" (+) button.
- You can click the "up" and "down" buttons to move cells around the notebook.
### The Kernel

One thing that makes Jupyter great for analyzing large data sets (and also tricky to troubleshoot) is that once data is assigned to a variable, it stays in memory as long as the kernel stays alive. This means that you don't have to load that massive data set or run that expensive query over and over again. It also means that it is easy to accidentally introduce bugs: if you reuse a variable name, the newer value persists when you re-run older cells.
The "Kernel" menu has a variety of options for managing the kernel:

- Interrupt: Stops the processing, but does not kill the kernel.
- Restart: Restarts the kernel, clearing the memory.
- Restart and Clear Output: Restarts the kernel and clears the output from the cells.
- Restart and Run All: Restarts the kernel and reruns all cells in the notebook.

I have not used the last 2 options extensively, so I don't want to misspeak about their purpose. If you're curious, you can do some research yourself!

Now, let's get started with a simple example to show how the kernel retains data. Run the cell below (#cell 1).

In [0]:
#cell 1
#assign a value to a variable
x = 1
x

1

Notice how the value of "x" was output. This is because Jupyter displays the value of the last line of a cell when that line is an expression that returns a value. Now run #cell 2 and #cell 3 below:

In [0]:
#cell 2
x = 2

In [0]:
#cell 3
x

2

As you would expect, the output was "2" because the value 2 was assigned to "x". Now run #cell 1 and #cell 3, OMITTING #cell 2. What was the output?
This behavior is part of the reason you need to be careful with execution order in Jupyter, to make sure you're not accidentally changing your results.
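
If you want to check your answer outside of a notebook, the sketch below roughly imitates the situation. It is only an illustration, not how Jupyter is implemented: the names `kernel_memory`, `cell_1`, `cell_2`, and `cell_3` are made up here, and each "cell" is simply a string of code run against one shared dictionary.

```python
# A rough imitation of kernel state (illustration only, not Jupyter's internals).
# Every "cell" runs against the same dictionary, like cells sharing kernel memory.
kernel_memory = {}

cell_1 = "x = 1"
cell_2 = "x = 2"
cell_3 = "result = x"

# Run the cells in order: 1, 2, 3
exec(cell_1, kernel_memory)
exec(cell_2, kernel_memory)
exec(cell_3, kernel_memory)
in_order = kernel_memory["result"]

# Now re-run only cell 1 and cell 3, skipping cell 2
exec(cell_1, kernel_memory)
exec(cell_3, kernel_memory)
skipped_cell_2 = kernel_memory["result"]

print(in_order)        # 2
print(skipped_cell_2)  # 1
```

The same cell 3 gives two different answers depending on which cells ran before it, which is exactly why execution order matters.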

# Python: Built-in Types
As with any programming language, it is important to understand the basic/most common ways that data is stored in the language. There are many types built into the Python standard library, but we will focus on some of the types I've used most often in my data analysis.
One feature of Python that makes it so powerful for quick iteration is that it is a dynamically typed language. This means that you do not need to declare the type of a variable ahead of time. You can also assign a value of a different type to an existing variable without an issue.
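
As a small sketch of what dynamic typing looks like in practice (the variable names here are made up for illustration), the same name can hold an integer first and a string later:

```python
# No type declaration is needed; the type comes from the assigned value.
value = 42
first_type = type(value).__name__

# Assigning a value of a different type to the same name is allowed.
value = 'forty-two'
second_type = type(value).__name__

print(first_type)   # int
print(second_type)  # str
```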

## String
Python has a single type that covers all character/string data. This type has many built-in methods that can be very helpful. First, we will start by assigning a string to a variable. To let Python know a value is a string, you can use single quotes '', double quotes "", or triple quotes """""". Triple quotes allow for multi-line text, and are typically reserved for such cases.

In [0]:
first_name = 'David' #single quotes
last_name = "Smit" #double quotes
life_story = """
My life story is very short, 
but it doesn't mean it only takes 1 line.
"""

What if you want to know how long a string is? Use the len() function.

In [0]:
len(first_name)

5

If you want to lowercase all of the letters, capitalize all of the letters, or capitalize only the first letter, there are methods for that too!

In [0]:
last_name.lower()

'smit'

In [0]:
last_name.upper()

'SMIT'

In [0]:
last_name.capitalize()

'Smit'

You may have noticed that the syntax was different between len and lower/upper/capitalize. This is because len is a function, and lower/upper/capitalize are methods. We will get into that a little bit more in the intermediate notebook.
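
To make the difference concrete, here is a minimal sketch using a made-up variable named `word`. A function takes the value as an argument, while a method is called on the value with a dot. Note also that string methods return a new string and leave the original unchanged.

```python
word = 'Jupyter'  # example value chosen for illustration

# Function: the value is passed in as an argument.
length = len(word)

# Method: called on the value itself using a dot.
shouted = word.upper()

print(length)   # 7
print(shouted)  # JUPYTER
print(word)     # Jupyter (the original string is not modified)
```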

If you put a single value in square brackets [] after the variable, it will return the character at that index (starting at 0). So if you use 0 in the brackets, you'd get:

In [0]:
first_name[0]

'D'

If you use 1 in the brackets, you get:

In [0]:
first_name[1]

'a'

Now, we will try slicing with strings. Slicing can be used on many datatypes to pull out sections of the data quickly and easily. In the case of text, you can pull out a specific section of the text.
Slicing is done by putting square brackets [] behind a variable, just like indexing, but you include multiple values separated by ":"

[start_index:end_index:step]

If you omit the value for step, it will default to 1. So to slice from index 0 to index 2, you would do the following:

In [0]:
first_name[0:2]

'Da'

Notice how the output wasn't 'Dav'. This is because the slice gets the values up to, but not including, the index in the 2nd spot. You can also omit a value, which means it will default to the first or last index depending on which value is omitted.

In [0]:
first_name[:2] #Beginning up to index 2

'Da'

In [0]:
first_name[:3] #Beginning up to index 3

'Dav'

In [0]:
first_name[1:] #From index 1 to end

'avid'

In [0]:
first_name[2:] #From index 2 to end

'vid'

You can also use negative values, which count from the end of the string. In the first position, -2 starts the slice 2 characters from the end. In the second position, -2 stops the slice 2 characters before the end. This can be very helpful.

In [0]:
first_name[-2:]

'id'

In [0]:
first_name[:-2]

'Dav'

The third value (step) can be used to move through the string at an increment other than 1. For example, if you use 2 for the 3rd value, you will get every other letter.

In [0]:
first_name[::2]

'Dvd'

You can also use a negative step to reverse the string.

In [0]:
first_name[::-1]

'divaD'

The illustration below is a good reference for understanding how indexing works (the example is the string 'probe').
![alt text](https://cdn.programiz.com/sites/tutorial2program/files/python-list-index.png)

The original source of this image is https://www.programiz.com/python-programming/list. Check out the article for more information on lists!
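
The same idea can be tried in code. The sketch below uses the string 'probe' from the illustration. The variable names are made up for this example, and the comments show which positive and negative index points at each letter.

```python
word = 'probe'
# Positive indexes count from the start: p=0,  r=1,  o=2,  b=3,  e=4
# Negative indexes count from the end:   p=-5, r=-4, o=-3, b=-2, e=-1

first_letter = word[0]         # 'p'
same_letter = word[-5]         # 'p' again, reached from the end
last_letter = word[-1]         # 'e'
middle = word[1:4]             # 'rob' (index 4 is not included)
middle_negative = word[-4:-1]  # 'rob', the same slice with negative indexes
every_other = word[::2]        # 'poe'

print(first_letter, same_letter, last_letter)
print(middle, middle_negative, every_other)
```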

## List
Another very helpful type in Python is the list. The list is a collection that is mutable and ordered. What does this mean? It means the contents can be changed, and the order is maintained. Lists can contain anything, including other collection types (like lists), and the items in a list do not all need to be the same type. A list can be created by putting items in square brackets, separated by commas. You can also create an empty list by using empty square brackets.
Below is an example of a list being assigned to a variable.

In [0]:
people = ['David','Yolanda','Cydney','Mathilda','Kent']
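
The points above about mixed types, nested lists, empty lists, and mutability can be sketched as follows. The names `mixed` and `empty` and their contents are hypothetical values used only for illustration.

```python
# A list can mix types and can contain another list.
mixed = ['David', 42, 3.14, ['nested', 'list']]

# Empty square brackets create an empty list.
empty = []

# Lists are mutable: an item can be replaced in place by its index.
mixed[1] = 'forty-two'

print(mixed)        # ['David', 'forty-two', 3.14, ['nested', 'list']]
print(len(empty))   # 0
print(mixed[3][0])  # nested
```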

Lists can be indexed and sliced in exactly the same way as strings.


In [0]:
people[0]

'David'

In [0]:
people[0:2]

['David', 'Yolanda']

In [0]:
people[-1:]

['Kent']

In [0]:
people[:-2]

['David', 'Yolanda', 'Cydney']

In [0]:
people[::-1]

['Kent', 'Mathilda', 'Cydney', 'Yolanda', 'David']

In [0]:
people[1:3]

['Yolanda', 'Cydney']

You can use methods such as append, pop, and remove to update the list.
Append will add an item to the end of the list. 

In [0]:
people.append('Ina')
people

['David', 'Yolanda', 'Cydney', 'Mathilda', 'Kent', 'Ina']

Pop will remove the last item in the list, and return the value of that item. Below, we are assigning the returned value to the variable "last_person" and then displaying both the last_person variable and the people list.

In [0]:
last_person = people.pop()
print(last_person)
people

Ina


['David', 'Yolanda', 'Cydney', 'Mathilda', 'Kent']

Wait, what is "print"? This is a function that writes a value to standard output, which in Jupyter appears in the output portion of the cell.
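
The remove method was mentioned earlier but not shown. As a minimal sketch, the example below uses a separate, made-up list named `team` so the `people` list is left untouched. It also shows that pop accepts an optional index.

```python
team = ['David', 'Yolanda', 'Cydney', 'Mathilda', 'Kent']

# remove deletes the first item that matches the given value.
team.remove('Cydney')

# pop with an index removes and returns the item at that position.
second_person = team.pop(1)

print(second_person)  # Yolanda
print(team)           # ['David', 'Mathilda', 'Kent']
```

If the value passed to remove is not in the list, Python raises a ValueError, so it is worth checking with `in` first when you are not sure.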

## Sets
Sets are a mutable, unordered collection type. Only 1 of any value can exist in a set. I will often convert a collection to a set as an easy way to eliminate duplicate values.
You can create a set by using curly braces {} and separating the values with commas:

In [0]:
example_set = {'id1','id2','id3','id3','id4'}
example_set

{'id1', 'id2', 'id3', 'id4'}

Notice how there was only 1 instance of id3. This is due to the behavior described above (only 1 instance of each value). To create an empty set (or to convert another collection to a set), you use the set function:

In [0]:
example_list2 = [1,2,3,3,3,2,5,4,2,4,6,7,8,6,5,4,3,3,]
example_set2 = set(example_list2)
example_set2

{1, 2, 3, 4, 5, 6, 7, 8}
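
The cell above covers converting a list. For the empty-set case, here is a short sketch with made-up variable names. One detail worth knowing is that empty curly braces create a dictionary, not a set, which is why the set function is needed.

```python
# set() with no arguments creates an empty set.
seen_ids = set()

# Empty curly braces create a dictionary, not a set.
not_a_set = {}

# add puts a value into the set; adding a duplicate has no effect.
seen_ids.add('id1')
seen_ids.add('id1')
seen_ids.add('id2')

# Membership can be checked with "in".
has_id1 = 'id1' in seen_ids

print(type(not_a_set).__name__)  # dict
print(len(seen_ids))             # 2
print(has_id1)                   # True
print(sorted(seen_ids))          # ['id1', 'id2'] (sorted returns an ordered list)
```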

## Tuples
Tuples are an immutable collection (they cannot be changed). I most often use them as keys in a dictionary when I want multiple pieces of data to define a key (I will explain more later on in dictionaries). Slicing a tuple works just like slicing a list, but many of the list methods (such as append, pop, and remove) will not work with a tuple because it cannot be changed. A tuple can contain a mixture of data types.
Tuples can be created by separating items with commas inside parentheses ().

In [0]:
example_tuple1 = 
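
As a minimal sketch of the points above, the example below uses hypothetical names and values (`mixed_tuple`, `scores`) that are not from the original notebook. It shows creating a tuple, slicing it, what happens when you try to change it, and using a tuple as a dictionary key.

```python
# A tuple with a mixture of types (hypothetical values).
mixed_tuple = ('David', 42, 3.14)

# Slicing works just like a list and returns a new tuple.
first_two = mixed_tuple[:2]

# Tuples are immutable, so item assignment raises a TypeError.
try:
    mixed_tuple[0] = 'Kent'
    was_changed = True
except TypeError:
    was_changed = False

# A tuple can serve as a dictionary key made of multiple pieces of data.
scores = {('David', 'math'): 90}

print(first_two)                  # ('David', 42)
print(was_changed)                # False
print(scores[('David', 'math')])  # 90
```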

## Dictionaries
A dictionary is a datatype that includes a key and a value. The key must be unique and m