# Day 1 Morning Review

## Intro to Jupyter Notebooks

There are different types of cells. This is a markdown cell! Markdown is just a text formatting "language" that we use to make our notebooks easy to read. The README files in each repo are also markdown files.

In [1]:
# This is a code cell! In particular, this is a comment.

To change the type of a cell, use the dropdown menu above. Or, use `Esc` to move into "command mode," where the cell outline is blue rather than green. In command mode, `y` makes the cell a code cell and `m` makes the cell a markdown cell. 

You can do all sorts of things in command mode. You can select multiple cells by holding `Shift` and pressing the arrow keys, and then use `Shift + m` to merge all of those cells.

I recommend playing around with the buttons at the top of the notebook. Figure out what they all do and, if you like, commit to memory some keyboard shortcuts for them.

Above the buttons, you see a series of tabs (File, Edit, View, etc.). The important one to understand is the Kernel tab. In it, you can restart your Python kernel, clear the output, etc. You should be able to "Restart and Run All" on your notebook to ensure everything runs as you expect when it is executed from top to bottom.

## Data Types Review

### Numbers

There are basically only two numeric types that you need to worry about (for now!):

In [2]:
# Integers

In [3]:
type(5) == int

In [4]:
# Floats
# Check: why are they called 'floats'?

type(5 / 3) == int # is not an integer

In [5]:
type(5 / 3)

In [6]:
type('5') != int # but this is just a string

True
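Here is a small sketch of how integers and floats interact, using only built-in operators. The variable names are just for illustration:

```python
# True division (/) always returns a float, even when the result is whole.
quotient = 6 / 3
print(type(quotient))   # <class 'float'>

# Floor division (//) on two ints returns an int.
floored = 5 // 3
print(type(floored))    # <class 'int'>

# Mixing an int with a float produces a float.
mixed = 1 + 2.0
print(type(mixed))      # <class 'float'>
```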

### Sequential Data Types

Why are they 'sequential'? Because they have _elements_! This means they can be *indexed* and *sliced*. This also means they can be _iterated upon_. They are _iterables_. We'll discuss iterables in greater depth when we discuss control flow and list comprehensions. Note: not all iterables are sequential types!
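To make the closing note concrete, here is a minimal sketch (with made-up variable names) of the difference between something you can index and something you can only iterate over. A set is iterable, but it has no positions, so indexing it fails:

```python
letters = ['a', 'b', 'c']      # a list can be indexed, sliced, and iterated
unique_letters = {'a', 'b'}    # a set can be iterated, but not indexed

# Indexing and slicing work on the list.
first = letters[0]
first_two = letters[:2]

# Both objects can be looped over.
seen = []
for item in unique_letters:
    seen.append(item)

# Indexing a set raises a TypeError because sets have no positions.
try:
    unique_letters[0]
    indexable = True
except TypeError:
    indexable = False

print(first, first_two, sorted(seen), indexable)
```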

#### Tuples

Tuples are the most 'basic' sequential type. They are **immutable** and **ordered**, and there's not too much you can do with them. But they're the 'default' data structure when there is more than one element:

In [7]:
x = 1
y = 2
z = x,y

`x,y` is the same as `(x,y)`:

In [8]:
z == (x,y) 

True

But we typically think of tuples coming to us in parentheses.
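As a quick illustration of what "immutable" means in practice, this sketch (the names are hypothetical) tries to change a tuple and catches the resulting error. It also shows that the comma, not the parentheses, is what makes a tuple:

```python
point = (1, 2)

# Tuples support indexing like other sequences.
first = point[0]

# Item assignment fails because tuples are immutable.
try:
    point[0] = 10
    changed = True
except TypeError:
    changed = False

# A one-element tuple needs a trailing comma; parentheses alone are not enough.
single = (5,)
not_a_tuple = (5)

print(first, changed, type(single), type(not_a_tuple))
```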

#### Lists

Lists are the 'go to' sequential type / data structure / object in native Python. Like tuples, they contain **ordered** heterogeneous elements. Unlike tuples, they are **mutable**, which means we can do all sorts of things to change them, like so:

In [9]:
# Lists come in square brackets:
x = [1,2,3,4,3]

# Because they are mutable, we can do stuff with them:
x.append(4)
x

[1, 2, 3, 4, 3, 4]

Another example of a list method is `sort`. (A method is just a function that is defined for objects of a certain type. Instead of calling it directly, we call it _on_ an object, like so: `object.method(arguments)`.)

In [10]:
x.sort(reverse=True) # An in-place operation: x itself is reordered
x

[4, 4, 3, 3, 2, 1]
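Because `sort` works in place, it is easy to confuse with the built-in `sorted` function. The sketch below, using a made-up `scores` list, shows the difference:

```python
scores = [3, 1, 2]

# sorted() returns a new list and leaves the original unchanged.
ordered_copy = sorted(scores)
print(scores)        # [3, 1, 2]

# list.sort() reorders the list in place and returns None.
result = scores.sort(reverse=True)

print(ordered_copy)  # [1, 2, 3]
print(scores)        # [3, 2, 1]
print(result)        # None
```

A common mistake is writing `scores = scores.sort()`, which replaces the list with `None`.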

#### Sets

Sets are inspired by the mathematical notion of a set. They are **mutable** but **unordered** collections of **unique** items. They come in `{}` most of the time. (Note: you can create non-empty sets with `{}` _or_ with the `set` function. An empty `{}` is a dictionary, so use `set()` for an empty set. Likewise, there are `list`, `tuple`, and `dict` functions.)

In [11]:
y = set(x)
y # Notice that duplicates were removed

{1, 2, 3, 4}

In [12]:
z = {2, 7, 5}

The methods that apply to sets come from mathematical set theory and probability:

In [13]:
y.intersection(z) # The elements of y that are also in z

{2}
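A few more set methods, sketched with the same values as `y` and `z` above (the result names are just for illustration):

```python
y = {1, 2, 3, 4}
z = {2, 7, 5}

combined = y.union(z)          # elements in either set
only_in_y = y.difference(z)    # elements of y that are not in z
shared = y & z                 # operator form of intersection

print(combined == {1, 2, 3, 4, 5, 7})  # True
print(only_in_y == {1, 3, 4})          # True
print(shared == {2})                   # True
```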

#### Dictionaries

The best way to understand dictionaries is to think of how a regular language dictionary works. Instead of looking up elements according to their location in a certain order, you look up elements according to a certain 'key,' which may be any hashable object (in practice, usually an immutable one such as a string, number, or tuple).

Dictionaries are 'semi-ordered' collections of **key, value** pairs. I say 'semi-ordered' because, since Python 3.7, dictionaries remember the order in which their keys were inserted. Nevertheless, you cannot look up the 'first' value by position: Python treats whatever is in the square brackets as a key and throws a `KeyError` if no such key exists, because that is not how dictionaries are meant to be used!

In [14]:
jimmies = {'Jimmy':{'Last Name':'Woehrle'},
 'James':{'Last Name':'Linek','GHE Username':'Something'}}

`jimmies` is a dictionary where each value for its two keys is itself a dictionary.
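Here is a rough sketch of the 'semi-ordered' behavior, using a small hypothetical dictionary called `ages`. The values are invented for the example:

```python
ages = {'Jimmy': 30, 'James': 25}   # hypothetical values

# Iteration follows insertion order (guaranteed since Python 3.7).
keys_in_order = list(ages)

# There is no positional lookup: 0 is treated as a key, not a position.
try:
    ages[0]
    found = True
except KeyError:
    found = False

# If the first value is really needed, go through an iterator.
first_value = next(iter(ages.values()))

print(keys_in_order, found, first_value)
```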

### An important distinction:

You've noticed that sequential data types can be _nested_. This means that we need to respect the distinction between the type of an object and the type of the elements that object contains. 

An example:

In [15]:
# What type is x?
x = 1
type(x)

int

In [16]:
# What type is y?
y = [1]
type(y)

list

This distinction becomes very important when we begin working with NumPy and Pandas. For example, we might be dealing with a column of numbers. In Pandas terminology, the column's type is a `Series`, but the type (`dtype`) of the elements in that column is an integer type such as `int64`.
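The same distinction can be seen in plain Python, without any libraries. This is only a sketch with a made-up list, but it shows that the container has one type while its elements each have their own:

```python
values = [1, 2.5, '3']

container_type = type(values)               # the type of the object itself
element_types = [type(v) for v in values]   # the type of each element

print(container_type)   # <class 'list'>
print(element_types)    # [<class 'int'>, <class 'float'>, <class 'str'>]
```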

### A NumPy preview:

NumPy (short for 'numerical Python') gives us access to _new_ data types that didn't come with native Python.

In [17]:
import numpy as np

In [18]:
x = 1
type(x)

int

In [19]:
x = np.int64(x)
type(x)

numpy.int64

## Indexing and Slicing Sequential Data Types

Everywhere in Python, the square brackets (`[]`) are used to *index* and to *slice* things. You'll see them all the time when you subset your data.

But it is important to know how to do basic indexing and slicing on the basic Python sequential data types.

#### Dictionaries

In [20]:
jimmies['James'] # Look up the key 'James' in the dictionary 'jimmies'

{'GHE Username': 'Something', 'Last Name': 'Linek'}

In [21]:
jimmies['James']['Last Name'] # Do one further look up to find James' Last Name

'Linek'

In [22]:
jimmies['Jamie'] # There is no key called 'Jamie'!

KeyError: 'Jamie'

In [23]:
# .get is a safe way of looking up a key if you're not sure it exists
jimmies.get('Jamie') # Returns None for a missing key, so no output is displayed
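`.get` also accepts an optional second argument to use as a default. A short sketch, rebuilding the `jimmies` dictionary from earlier so the block runs on its own:

```python
jimmies = {'Jimmy': {'Last Name': 'Woehrle'},
           'James': {'Last Name': 'Linek', 'GHE Username': 'Something'}}

missing = jimmies.get('Jamie')         # None, and no KeyError
fallback = jimmies.get('Jamie', {})    # a default of our choosing
has_key = 'Jamie' in jimmies           # membership test on the keys

print(missing, fallback, has_key)      # None {} False
```

Returning an empty dictionary as the default lets a second lookup such as `jimmies.get('Jamie', {}).get('Last Name')` run without raising an error.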

#### Lists

In [24]:
x = [1,2,3,4,5,5,6]

To access the element of `x` corresponding to the index `i`, use `x[i]`:

In [25]:
x[0] # Python is zero-indexed

1

The square brackets have this syntax: `[start:stop:step]`. To go backwards, specify a negative `step`.

In [26]:
x[-1] # With no colons, this is plain indexing. Negative indices count from the end, so this returns the last element.

6

In [None]:
x[:3] # Start is unspecified, so we start at the beginning. 'Stop' is exclusive!

In [None]:
x[::2] # Go from the beginning to the end by twos.

In [27]:
x[5:2:-1] # Go backwards from index 5 to index 3 (because stop is exclusive)

[5, 5, 4]
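To tie the `[start:stop:step]` pieces together, here is a small sketch that repeats the list from above and stores the result of several slices. The variable names are only for illustration:

```python
x = [1, 2, 3, 4, 5, 5, 6]

first_three = x[:3]       # [1, 2, 3]     stop is exclusive
every_other = x[::2]      # [1, 3, 5, 6]  indices 0, 2, 4, 6
middle = x[2:5]           # [3, 4, 5]     indices 2, 3, 4
backwards = x[::-1]       # [6, 5, 5, 4, 3, 2, 1]

# Slices that run past the end are quietly truncated...
tail = x[4:100]           # [5, 5, 6]

# ...but indexing past the end raises an IndexError.
try:
    x[100]
    in_range = True
except IndexError:
    in_range = False

print(first_three, every_other, middle, backwards, tail, in_range)
```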

#### The same bracket slicing syntax is used elsewhere:

In [28]:
string = 'data science is boring' 
# Remember that strings are sequential data types!

In [None]:
string[::-1] # String is reversed

In [29]:
# To split the string on whitespace, btw:
string.split(sep=' ') # Same result here as string.split(), whose default splits on any run of whitespace

['data', 'science', 'is', 'boring']

In [30]:
y = (1, 2, 3, 4, 5)
y[::-1] 
# Show the reversed tuple. Note that the tuple itself is not reversed. Tuples are immutable!

(5, 4, 3, 2, 1)

#### An addendum:

This is **multiple variable assignment**, specifically **tuple unpacking**:

In [31]:
a, b, c = ('a','b','c')
print(a)
print(b)

a
b
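A few related unpacking patterns, sketched with throwaway names. The number of names on the left has to match the number of elements on the right, unless a starred name is used to collect the leftovers:

```python
# Swapping two variables is just tuple packing followed by unpacking.
a, b = 1, 2
a, b = b, a

# A starred name collects any remaining elements into a list.
first, *rest = [1, 2, 3, 4]

# A mismatch in length raises a ValueError.
try:
    p, q = (1, 2, 3)
    mismatch = False
except ValueError:
    mismatch = True

print(a, b)         # 2 1
print(first, rest)  # 1 [2, 3, 4]
print(mismatch)     # True
```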
