# Module \#1 - Orientation and Introductory Python

## Starting from scratch with Python & Jupyter

Python is a general-purpose programming language used widely in machine learning and data science. To begin working with Python, we will use Jupyter Notebook, a program that lets you write and run python code in blocks (also called "cells").

To execute the code in a block, press "Run" or press Ctrl+Enter (Windows) or Cmd+Enter (macOS).

### Hello world!

To begin, we will follow in the time-honored tradition of showing you how to print "Hello world!" in python. **Execute the block below** to print "Hello world!".

In [1]:
print("Hello world!")

Hello world!


**Challenge**: Can you modify the block below to print "Hello Python!"?

In [2]:
print()




### Markdown

Jupyter Notebook lets you write code in languages such as Python, R, and Julia in code blocks. It also lets you write Markdown in markdown blocks.

Markdown blocks are converted to formatted text when executed. **Execute the block below** to display "Hello world!" as formatted text.

"Hello world!"

#### Formatting in markdown

Markdown is very powerful and used ubiquitously throughout the computing universe. You can easily style your markdown to display it the way you want. **Execute the block below** to reveal how markdown is converted to styled text. 

This example is borrowed from [markdownguide.org](https://www.markdownguide.org/cheat-sheet/)

# H1
## H2
### H3
#### H4

**bold text**

*italicized text*

> blockquote

1. First item
2. Second item
3. Third item

- First item
- Second item
- Third item

`code`

---

[link](https://www.example.com)


| Syntax | Description |
| ----------- | ----------- |
| Header | Title |
| Paragraph | Text |


```
print("Hello world!")
print("This is a multi-line code snippet")
```

term
: definition

~~The world is flat.~~

- [x] Write the press release
- [ ] Update the website
- [ ] Contact the media

![alt text](https://cdn0.iconfinder.com/data/icons/octicons/1024/markdown-512.png)


### Efficient Jupyter Notebook

Finally, Jupyter Notebook is used most efficiently when you make full use of its keyboard shortcuts and built-in tools. A full cheat sheet can be found [here](https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf).

For now, here are a couple of key concepts:

In Jupyter, you can be in either **Edit mode** or **Command mode**.

**Edit mode** allows you to edit blocks. You can enter it by double-clicking on a block, or by using the arrow keys to select a block and pressing Enter. In this mode, you can write markdown or code.

**Command mode** allows you to modify blocks without editing their contents directly. You can enter command mode by pressing the Escape key. Here is a short list of command mode shortcuts:

Hit **Escape** to enter command mode, and then:
- Add a new block below: **b**
- Convert a block to markdown: **m**
- Convert a block to code: **y**
- Delete a block: **d, d** (press **d** twice)

**Challenge:** Can you add a new block below, convert it to a markdown block, and then use it to print "Hello Jupyter!" in bold?

<hr>

# Simple objects in python

Python is an object-oriented programming language, and in python **everything is an object**. Objects have (1) a class (aka "type"), (2) properties, and (3) methods. Let's look at some simple object types in python and their associated methods:

## Booleans and comparisons

Booleans (logicals) are a simple object type in python. They are binary: they can only be `True` or `False`.

In [25]:
True

True

In [27]:
False

False

In [26]:
type(True)

bool

### Logical operations

Python makes logical operations easy and allows us to determine whether some logical condition is met. For example, the `and` operator evaluates to `True` only if both the left and right sides are `True`:

In [49]:
# And (both left and right are True)
True and True

True

In [50]:
# And (one side isn't true)
True and False

False

There is also the `or` operator, which evaluates to `True` if at least one side is `True`.

In [51]:
# Or (only one side has to be True)
True or False

True

Finally, there is the `not` operator, which negates a boolean value:

In [52]:
not True

False

Logical operators can be combined to make more complex statements. Like mathematical expressions, they follow an order of operations (comparison operators such as `==` and `<`, introduced later, are evaluated before any of these). The order is:

1. `()`
2. `not`
3. `and`
4. `or`

In [55]:
not False and True

True

In [65]:
not False and not True

False

In [57]:
not (True and True)

False

In [66]:
not False or not (True or not True)

True
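To make the precedence rules concrete, here is a small sketch (the variable names are illustrative) that writes out the grouping python applies implicitly; both versions should evaluate to the same result.

```python
# Without parentheses: `not` is applied first, then `and`, then `or`.
implicit = not False or True and False

# The same expression with the implied grouping written out explicitly.
explicit = (not False) or (True and False)

print(implicit)  # True
print(explicit)  # True
```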

**Challenge question**: What is the output for the following (without running it yourself):

```python
False and True or True and (not False or False)
```

In [62]:
False and True or True and (not False or False)

True

## Numerics and mathematics

There are three main types of numerical objects in python:

1. `int` -- whole numbers (positive, negative, or zero)
2. `float` -- numbers with a decimal point
3. `complex` -- numbers with a real and an imaginary part

Let's explore the `int` type first.

#### `int`

We can create an instance of `int` by simply typing any whole number into a code block:

In [6]:
1

1

We can also verify this is an `int` object with the built-in `type()` function. This function can be used on any object in python, and it returns the class of that object.

In [7]:
type(1)

int

Instances of most classes can also be created by calling the name of that class like a function. For example, we can create (aka "instantiate") an integer with the default value `0` by calling `int()`.

In [9]:
int()

0

#### `float`

`float` objects are numbers with a decimal point (floating-point numbers). For example:

In [6]:
1.01

1.01

In [7]:
type(1.01)

float

#### `complex`

`complex` objects are numbers with a real part and an imaginary part. Python writes the imaginary unit as `j`. For example:

In [10]:
2 + 2j

(2+2j)

In [11]:
type(2+2j)

complex
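As with `int()`, the other numeric class names can be called like functions. A minimal sketch (variable names are illustrative) of the zero value each constructor returns when called with no arguments, and of the same constructors converting between types:

```python
# Calling each numeric class with no arguments returns its zero value.
zero_int = int()          # 0
zero_float = float()      # 0.0
zero_complex = complex()  # 0j

# The same constructors also convert values between types.
truncated = int(3.9)      # 3: int() drops the decimal part (rounds toward zero)
as_float = float(2)       # 2.0
from_text = int("42")     # 42: a string of digits can be converted too

print(zero_int, zero_float, zero_complex)  # 0 0.0 0j
print(truncated, as_float, from_text)      # 3 2.0 42
```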

### Numerical methods: Math operations

We can perform simple mathematical operations with numerical objects.

In [12]:
# Addition
1 + 1

2

In [13]:
# Subtraction
4 - 2

2

In [14]:
# Multiplication 
2 * 3

6

In [15]:
# Division
16 / 3

5.333333333333333

In [16]:
# Floored Division
16 // 3

5

In [17]:
# Modulo (remainder)
16 % 3

1

In [18]:
# Exponentiation
2**5

32

In [19]:
# Negation
-1

-1

In [20]:
# Absolute value
abs(-1)

1
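Floored division and modulo are closely related: for integers, the quotient and remainder always recombine into the original number. A small sketch using the values from above, plus the built-in `divmod()` function, which returns both at once:

```python
a, b = 16, 3

quotient = a // b    # floored division: 5
remainder = a % b    # modulo: 1

# Floored division and modulo always fit together like this:
print(quotient * b + remainder == a)  # True

# The built-in divmod() returns both results at once, as a tuple.
print(divmod(a, b))  # (5, 1)
```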

**Challenge question**: What is the `type` of `22 / 2`?

#### Order of operations

Python follows the standard mathematical order of operations (PEMDAS): parentheses, exponents, multiplication and division (evaluated left to right), then addition and subtraction (evaluated left to right). For example:

In [21]:
3 + 5 * 2  # It is not 16 because the multiplication comes first

13
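PEMDAS leaves a few details implicit. The sketch below illustrates three of them: operators at the same level (like `*` and `/`) are applied left to right, `**` is applied right to left, and `**` binds more tightly than a leading minus sign. Variable names are illustrative.

```python
# Multiplication and division share one precedence level and run left to right.
left_to_right = 8 / 4 * 2    # (8 / 4) * 2 = 4.0, not 8 / (4 * 2) = 1.0

# Exponentiation groups right to left.
right_to_left = 2 ** 3 ** 2  # 2 ** (3 ** 2) = 2 ** 9 = 512

# Exponentiation binds more tightly than a leading minus sign.
negative_square = -2 ** 2    # -(2 ** 2) = -4

print(left_to_right, right_to_left, negative_square)  # 4.0 512 -4
```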

**Challenge question:** What is the result of this operation (without running it yourself): 

```python
2 + 3 * (2 + 25 / 5 ** 2) 
```

### Logical comparisons of numerics

We can use comparison operators in python to check the relationship between any two numerics. 

In [67]:
# Greater-than
9 > 8

True

In [68]:
# Less-than
8 < 10

True

In [69]:
# Equal to
1 == 2

False

In [93]:
# Not equal to
1 != 2

True

In [70]:
# less-than or equal-to
2 <= 2

True

In [71]:
# Greater-than or equal-to
10 >= 12

False
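Comparison operators can also be chained, which is a compact way to check whether a value falls within a range. A short sketch, with `x` as an illustrative variable:

```python
x = 5

# Python allows comparisons to be chained; this reads as (1 < x) and (x < 10).
in_range = 1 < x < 10
print(in_range)  # True

# Every link in the chain is checked: (1 < x) and (x > 10).
above_ten = 1 < x > 10
print(above_ten)  # False
```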

#### Complex comparisons

We can combine arithmetic with comparisons to test more complex mathematical relationships:

In [96]:
5 ** 2 + 1 == 52 / 2

True

In [82]:
1j**2 == -1

True
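One caution when using `==` with arithmetic results: many decimal values cannot be stored exactly as floats, so a comparison that looks true on paper can evaluate to `False`. A minimal sketch, using the standard-library `math.isclose()` function as a tolerance-based alternative:

```python
import math

# Floats are stored in binary, so many decimal values are only approximated.
total = 0.1 + 0.2
print(total)          # 0.30000000000000004
print(total == 0.3)   # False

# math.isclose() compares two floats within a small relative tolerance instead.
print(math.isclose(total, 0.3))  # True
```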

## Strings

Strings hold text data, such as names or addresses. They are constructed by wrapping text in quotation marks (double or single):

In [17]:
'Hello world!'  # Single quotes

'Hello world!'

In [18]:
"Hello world!"  # Double quotes

'Hello world!'

In [24]:
type("Hello world!")

str

### String methods

There are several basic operations and methods for string objects. Many more exist, and they will be covered later in the course. For a list of string methods, please see the W3Schools guide [here](https://www.w3schools.com/python/python_ref_string.asp).

In [19]:
# Print
print("Hello world!")

Hello world!


In [88]:
# Concatenate
"Hello " + "world!"

'Hello world!'

In [89]:
# Upper-case
"Hello world!".upper()

'HELLO WORLD!'
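A few more commonly used string methods, as a brief sketch (the variable names are illustrative). Note that string methods return new strings; the original `greeting` is left unchanged. `len()` is a built-in function rather than a string method, but it works on strings as well:

```python
greeting = "Hello world!"

lower = greeting.lower()                        # 'hello world!'
replaced = greeting.replace("world", "Python")  # 'Hello Python!'
words = greeting.split(" ")                     # ['Hello', 'world!']
length = len(greeting)                          # 12 characters, including the space and '!'

print(lower)
print(replaced)
print(words)
print(length)
```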

#### Logical comparisons with strings

Just like numerics, strings can be compared with logical comparison operators:

In [90]:
"Hello" == "Hello"

True

In [91]:
"Hello" == "World"

False

In [92]:
"Hello" != "World"

True
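Two details are worth keeping in mind when comparing strings: comparisons are case-sensitive, and `<`/`>` compare strings character by character using each character's Unicode code point (so all uppercase letters sort before lowercase ones). A short sketch with illustrative variable names:

```python
# String comparisons are case-sensitive.
same_text = "Hello" == "hello"
print(same_text)  # False

# Converting both sides to one case first makes the check case-insensitive.
same_ignoring_case = "Hello".lower() == "hello".lower()
print(same_ignoring_case)  # True

# < and > compare strings character by character, so lowercase words sort alphabetically.
alphabetical = "apple" < "banana"
print(alphabetical)  # True
```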

## Cross-type comparison challenge

There is no requirement that logical comparisons involve only one data type. For example:

In [112]:
# 1 is not equivalent to "Hello world!"
1 != "Hello world!"

True

In [114]:
# Both sides generate booleans which can be compared using "and"
"Hello" != "world" and 1 < 2

True

**Challenge questions**: Please take this opportunity to figure out the results of each line (be prepared to explain your answers):

```python

# Question 1
"Hello" == False or 1 * 2 <= 2

# Question 2
"Hello" + "World" == "Hello world"

# Question 3
not "A" != "B" or not True == False and 3**2 == 9 

```

## Variables

Variables are names (aka 'aliases' or 'references') given to an object in python. Any object in python can be assigned to a variable. Rather than writing out the object directly each time, you can use the variable name instead. This makes complex code easier to write and easier for humans to understand.

To create a variable, use the `=` sign:

In [120]:
a = 1

Now that we have created the variable `a` to hold the integer `1`, we can perform operations on `a` directly.

In [108]:
# Use a for arithmetic
a + 2

3

In [122]:
# Use a for logical comparisons
a != "Hello world!"

True

In python, any variable can reference any object, including the results of computations.

In [130]:
result_1 = 1 + 2 < 3  # Variable to reference numeric comparison
result_2 = "Hello " + "world" == "Hello world"  # Variable to reference string comparison

In [131]:
result_1 or result_2

True

### `is` and `==`

In python, two operators exist for testing equivalence:

1. `==` (equivalent values)
2. `is` (identical objects)

While the distinction is subtle, it is crucial to remember that `is` tests whether two names refer to literally the same object, whereas `==` only tests whether two objects have equal values.


For example, we can assign the numerical object `257` to the variable `a`, and then assign `a` to `b`. Both `a` and `b` now refer to the same `int` object holding the value `257`. Therefore, they are both equal (`==`) and identical (`is`).

In [163]:
a = 257
b = a

In [164]:
a == b

True

In [165]:
a is b

True

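Conversely, if we assign `257` to `a` and `b` separately, we see that here they do not refer to the same object (whether two separately created integers share one object is an implementation detail of CPython, not a language rule):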

In [160]:
a = 257
b = 257

In [161]:
a == b

True

In [162]:
a is b

False

**EXTREME Challenge question**:

What happens when I repeat the above example using `256` instead of `257`? Why does the result change? 

*Hint*: See [this SO post](https://stackoverflow.com/a/1085656) for additional guidance.
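Because small-integer caching makes the `int` example above depend on the python implementation, mutable objects such as lists (covered in the next section) give a more predictable illustration of `==` versus `is`. A brief sketch, with illustrative variable names:

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]  # a separate list that happens to hold equal values
list_c = list_a     # a second name for the very same list object

print(list_a == list_b)  # True:  equal values
print(list_a is list_b)  # False: two distinct objects
print(list_a is list_c)  # True:  one object, two names

# A change made through one name is visible through the other.
list_c.append(4)
print(list_a)            # [1, 2, 3, 4]
```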

<hr>

# Complex objects in python

Now that we have discussed simple python objects, let's explore the wide world of complex objects. These objects provide powerful methods for storing and manipulating data, and they are essential tools for the data scientist.

Object types:
1. Lists
2. Dictionaries
3. Tuples
4. Sets
5. Numpy arrays
6. Pandas DataFrames

## Lists

Lists are an ordered python object type that can store any number of objects of any type.

In [174]:
# List of strings
words = ["Hello", "World"]
words

['Hello', 'World']

In [175]:
# List of numbers
numbers = [1, 2, 3]
numbers

[1, 2, 3]

In [177]:
# List of booleans
bools = [True, False, False]
bools

[True, False, False]

In [176]:
# Mixed list
mix = [1, True, "Hello"]
mix

[1, True, 'Hello']

### List methods

Lists have a wide variety of methods associated with them. For a more exhaustive reference, please refer to the W3Schools guide [here](https://www.w3schools.com/python/python_ref_list.asp).

For now, we will discuss:
1. Construction
2. Indices
3. Appending
4. Length
5. Sort

#### Construction

Lists can be constructed using the `list()` function:

In [182]:
# Just like other python objects, lists have a constructor function
my_list = list()
my_list

[]

More commonly, lists are defined by using `[]` and providing the objects to include:

In [183]:
my_list = [1, 2, 3, 'a', 'b']
my_list

[1, 2, 3, 'a', 'b']

In [223]:
# Lists can even contain lists
lst_list = [1, 2, 3, [4, 5, 6]]
print(lst_list)

[1, 2, 3, [4, 5, 6]]


#### Indices

Lists hold data and have a specific order. To access the objects in a list, one can use the object's index. **NOTE**: unlike `R`, indices start at `0` in python.

In [184]:
# Retrieve first element from list
my_list[0]

1

In [186]:
# Retrieve fourth element from list
my_list[3]

'a'

In [187]:
# Retrieve the last element from list
my_list[-1]

'b'

Multiple elements can be retrieved at once using a slice (`start:stop`). A slice includes the element at index `start` but stops just before the element at index `stop`. If `start` is left blank, the slice begins at the first element; if `stop` is left blank, it continues to the end of the list.

In [192]:
# Retrieve the 2nd through the 4th elements (indices 1, 2, and 3)
my_list[1:4]

[2, 3, 'a']

In [193]:
# Retrieve all values from the 3rd element to the end of the list
my_list[2:]

[3, 'a', 'b']

In [194]:
# Retrieve all values before the 4th element (indices 0, 1, and 2)
my_list[:3]

[1, 2, 3]
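Slices accept a few more forms. A brief sketch using the same `my_list` values: a negative `start` counts from the end, and an optional third value sets the step (`start:stop:step`). The other variable names are illustrative.

```python
my_list = [1, 2, 3, 'a', 'b']

last_two = my_list[-2:]        # a negative start counts from the end: ['a', 'b']
every_other = my_list[::2]     # a step of 2 takes every second element: [1, 3, 'b']
reversed_list = my_list[::-1]  # a step of -1 walks backwards: ['b', 'a', 3, 2, 1]

print(last_two)
print(every_other)
print(reversed_list)
```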

#### Appending

New elements can be added to the end of a list using the `append()` method:

In [197]:
# Append 1 to a list
my_list.append(1)
my_list

[1, 2, 3, 'a', 'b', 1]

In [202]:
# NOTE: append() modifies the list in place; it does not create a new list.
my_list.append(2)
my_list.append("a")
my_list

[1, 2, 3, 'a', 'b', 1, 2, 'a']

#### Length

It may be helpful to know the length of a list. You can get that information with the `len()` function:

In [205]:
len(my_list)

8
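One more detail about `append()`: because it changes the list in place, it returns `None` rather than a new list, so assigning its result to a variable is a common mistake. A minimal sketch (variable names are illustrative), which also shows the related `extend()` method for adding several elements at once:

```python
numbers = [1, 2, 3]

result = numbers.append(4)  # append() changes `numbers` in place...
print(result)               # None  ...and returns nothing useful

numbers.extend([5, 6])      # extend() appends every element of another list
print(numbers)              # [1, 2, 3, 4, 5, 6]
print(len(numbers))         # 6
```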

#### Sorting

It may be helpful to sort the elements of a list. This can be accomplished with the `sort()` method, which reorders the list in place:

In [210]:
my_list2 = [1, 2, 5, 3]
my_list2.sort()
print(my_list2)

[1, 2, 3, 5]
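Python also has a built-in `sorted()` function. Unlike the `sort()` method, it returns a new sorted list and leaves the original untouched. A short sketch; both accept `reverse=True` to sort from largest to smallest:

```python
my_list2 = [1, 2, 5, 3]

new_list = sorted(my_list2)  # sorted() returns a new list...
print(new_list)              # [1, 2, 3, 5]
print(my_list2)              # [1, 2, 5, 3]  ...and leaves the original unchanged

my_list2.sort(reverse=True)  # .sort() reorders in place; reverse=True puts the largest first
print(my_list2)              # [5, 3, 2, 1]
```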


## Dictionaries

Dictionaries are one of the core data types of the python language. Unlike lists, dictionaries store key-value pairs, and values are accessed using keys rather than numerical indices. (Since Python 3.7, dictionaries remember the order in which keys were inserted, but you still cannot look up a value by its position.) This workshop will not describe dictionaries in detail, but you can refer to the W3Schools guide [here](https://www.w3schools.com/python/python_dictionaries.asp) for more info.

In [242]:
# Create a dict using key-value pairs between {}
my_dict = {
    'hello': 1
}
my_dict

{'hello': 1}

In [243]:
# Access the value in a dict using keys
my_dict['hello']

1

In [244]:
# Keys can be any immutable (hashable) object, such as numbers, strings, and booleans. Values can be objects of any type.
my_dict = {
    'hello': 1,
    'world': True,
    123: [1, 2, ["Hello world"]],
    True: {
        "New": True
    }
}
my_dict

{'hello': 1, 'world': True, 123: [1, 2, ['Hello world']], True: {'New': True}}

In [245]:
my_dict[True]

{'New': True}
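Dictionaries can also be modified after they are created. A minimal sketch of adding and overwriting entries, checking for a key with `in`, and using the `get()` method to supply a default value for a missing key:

```python
my_dict = {'hello': 1}

my_dict['world'] = 2     # assigning to a new key adds a key-value pair
my_dict['hello'] = 10    # assigning to an existing key overwrites its value
print(my_dict)           # {'hello': 10, 'world': 2}

print('world' in my_dict)         # True: `in` checks the keys
print(my_dict.get('missing', 0))  # 0: .get() returns a default instead of raising a KeyError
print(list(my_dict.keys()))       # ['hello', 'world']
```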

## Sets and Tuples

Sets and tuples are also used in python. Discussing them is outside the scope of this workshop. For more information please refer to the following:

1. Sets - [here](https://www.w3schools.com/python/python_sets.asp)
2. Tuples - [here]()

## Numpy arrays

`numpy` is a python package that provides data types and functions for fast mathematical operations. In particular, numpy provides the n-dimensional `array` data type (its class is named `ndarray`), which is similar to the `matrix` and `array` types in R.

Before arrays can be constructed it is necessary to load the numpy library in Python. We can accomplish this using an import statement (below). This loads the `numpy` library as an object called `np` into python. 

In [247]:
import numpy as np

Libraries (aka "packages") in python are actually a type of object called a `module`.

In [248]:
type(np)

module

Just like other objects, modules have properties and methods. We typically load modules into python because we want to use the functions they contain. As a reminder, you can access an object's methods using the `<object>.<method>()` notation. For the `numpy` module, the function we are most interested in is `array()` -- this is what we use to construct an `array` object.

In [251]:
# Create a 1-dimensional (1d) array holding the values 1, 2, and 3
np.array([1, 2, 3])

array([1, 2, 3])

We can create 2-dimensional arrays by passing a list of lists:

In [256]:
# Construct a 2d array
np.array([
    [1, 2, 3],
    [4, 5, 6]
])

array([[1, 2, 3],
       [4, 5, 6]])

We can even create 3-dimensional arrays (and beyond) using lists of lists of lists (etc.).

In [260]:
# Construct a 3d array
np.array([
    [
        [1, 2, 3],
        [4, 5, 6]
    ],
    [
        [7, 8, 9],
        [10, 11, 12]
    ]
])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

### Numpy array methods

Many methods are available for `array` objects. An exhaustive reference is available [here](https://numpy.org/doc/stable/reference/index.html). For now, we will discuss a few key methods:

1. Creation
2. Shape and dimensions
3. Accessing elements
4. Setting elements
5. any / all 
6. Mathematical operations

#### Creation

Numpy arrays can be created in multiple ways. The simplest involves the use of lists (shown above):

In [276]:
my_arr = np.array([
    [True, False, False],
    [False, True, False]
])
my_arr

array([[ True, False, False],
       [False,  True, False]])

Arrays can also be created using the `arange()` function. This function creates an `array` of evenly spaced values that starts at `0` and stops just before the value specified:

In [277]:
# Create a 1d integer array from 0-9
my_arr = np.arange(10)
my_arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [278]:
# Create a 1d float array from 0.0-9.0
my_arr = np.arange(10.0)
my_arr

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
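`arange()` also accepts a start value and a step, and numpy provides other constructors for common patterns. A brief sketch using `np.arange()`, `np.zeros()`, `np.ones()`, and `np.linspace()` (variable names are illustrative):

```python
import numpy as np

evens = np.arange(2, 10, 2)  # start=2, stop=10 (excluded), step=2 -> 2, 4, 6, 8
zeros = np.zeros((2, 3))     # a 2x3 array filled with 0.0
ones = np.ones(4)            # a 1d array of four 1.0 values
grid = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1 (both ends included)

print(evens)
print(zeros)
print(ones)
print(grid)
```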

#### Shape and dimensions

`numpy` arrays have a number of dimensions and a shape. Note that these are properties, not methods. They are accessed using this pattern: `<object>.<property_name>` as follows:

In [291]:
# Construct 2d array
my_arr = np.array([
    [True, False, False],
    [False, True, False]
])

In [285]:
# Get number of dimensions property
my_arr.ndim

2

In [288]:
# Get the shape property (number of rows, number of columns)
my_arr.shape

(2, 3)

In [290]:
# Construct a 3d array
arr_3d = np.array([
    [
        [1, 2, 3],
        [4, 5, 6]
    ],
    [
        [7, 8, 9],
        [10, 11, 12]
    ]
])

# Get the shape (number of 2d arrays, number of rows, number of columns)
arr_3d.shape

(2, 2, 3)

Finally, the shape of an array can be altered using the `reshape()` method. This is particularly useful for quickly constructing arrays of a desired shape:

In [295]:
my_arr = np.arange(15)
my_arr = my_arr.reshape((5, 3))  # reshape() returns a new array; my_arr only changes because we re-assign it with '='

In [296]:
my_arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

The above can be simplified to one line of code:

In [297]:
my_arr = np.arange(15).reshape((5, 3))
my_arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
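When reshaping, one dimension can be given as `-1` and numpy will infer it from the total number of elements. The new shape must account for every element, otherwise `reshape()` raises a `ValueError`. A minimal sketch:

```python
import numpy as np

# Passing -1 for one dimension lets numpy infer it from the array's size.
my_arr = np.arange(15).reshape((5, -1))
print(my_arr.shape)  # (5, 3)

# 15 elements cannot fill a 4x4 array, so this reshape fails.
try:
    np.arange(15).reshape((4, 4))
except ValueError as err:
    print("Cannot reshape:", err)
```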

#### Accessing elements

Elements can be accessed using several approaches. 

1. Numerical
2. Logical

For the **Numerical** approach, we provide one integer index per dimension of the array:

In [312]:
# For a 1D array, similar to list
my_arr = np.array([3, 8, 1, 5])
print(my_arr)
my_arr[1]  # Get the 2nd element of the first (and only) dimension

[3 8 1 5]


8

In [311]:
# For a 2D array, the pattern is array[row_index, column_index]
my_arr = np.array([
    [5, 7, 4, 6],
    [2, 1, 9, 8]
])
print(my_arr)
my_arr[0, 1]  # Row index 0 (first row), column index 1 (second column)

[[5 7 4 6]
 [2 1 9 8]]


7

In [316]:
# For an n-dimensional array, the pattern is the same: array[axis0_index, axis1_index, ..., axisN_index]
my_arr = np.arange(125).reshape((5, 5, 5))
print(my_arr)
my_arr[2, 3, 1]  # 3rd matrix (index 2), 4th row (index 3), 2nd column (index 1)

[[[  0   1   2   3   4]
  [  5   6   7   8   9]
  [ 10  11  12  13  14]
  [ 15  16  17  18  19]
  [ 20  21  22  23  24]]

 [[ 25  26  27  28  29]
  [ 30  31  32  33  34]
  [ 35  36  37  38  39]
  [ 40  41  42  43  44]
  [ 45  46  47  48  49]]

 [[ 50  51  52  53  54]
  [ 55  56  57  58  59]
  [ 60  61  62  63  64]
  [ 65  66  67  68  69]
  [ 70  71  72  73  74]]

 [[ 75  76  77  78  79]
  [ 80  81  82  83  84]
  [ 85  86  87  88  89]
  [ 90  91  92  93  94]
  [ 95  96  97  98  99]]

 [[100 101 102 103 104]
  [105 106 107 108 109]
  [110 111 112 113 114]
  [115 116 117 118 119]
  [120 121 122 123 124]]]


66
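Numerical indices can also be combined with slices, using `:` on its own to select everything along an axis. A short sketch using the 2D array from above (variable names are illustrative):

```python
import numpy as np

my_arr = np.array([
    [5, 7, 4, 6],
    [2, 1, 9, 8]
])

first_row = my_arr[0, :]       # `:` selects everything along that axis: 5, 7, 4, 6
third_column = my_arr[:, 2]    # index 2 in every row: 4, 9
middle_block = my_arr[:, 1:3]  # slices work per axis: columns 1 and 2 of each row

print(first_row)
print(third_column)
print(middle_block)
```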

For the **Logical** approach, we can use a boolean array to extract the element(s) of interest:

In [318]:
num_arr = np.array([1, 2, 3])
bool_arr = np.array([False, False, True])
num_arr[bool_arr]  #  We access the element of num_arr for which bool_arr is True

array([3])

This approach is extremely powerful when you can use logical operations to create a boolean array:

In [328]:
# Create a 2D matrix
dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])
print(dataset)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


In [351]:
# Create a boolean matrix for this dataset to test where values are greater than 3
bools = dataset > 3
print(bools)

[[False False False  True  True]
 [ True  True  True  True  True]]


In [349]:
# Extract the value(s) which satisfy this logical operation
dataset[bools]

array([ 4,  5,  6,  7,  8,  9, 10])

In [343]:
# Create a boolean matrix for this dataset to test where values are equal to 5 using np.equal()
bools = np.equal(dataset, 5)
print(bools)

[[False False False False  True]
 [False False False False False]]


In [344]:
# Extract the value(s) which satisfy this logical operation
dataset[bools]

array([5])

In [354]:
# Create a boolean matrix for this dataset to test where values are > 8 or < 3
bools = np.logical_or(dataset > 8, dataset < 3)
print(bools)
# Subset the data using these booleans
dataset[bools]

[[ True  True False False False]
 [False False False  True  True]]


array([ 1,  2,  9, 10])
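numpy also lets you combine boolean arrays with the `|` (element-wise or) and `&` (element-wise and) operators, as a shorter alternative to `np.logical_or()` and `np.logical_and()`. Python's plain `and`/`or` do not work element-wise on arrays with more than one element. A brief sketch:

```python
import numpy as np

dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])

# The parentheses are required because `|` and `&` bind more tightly than `>` and `<`.
outside = dataset[(dataset > 8) | (dataset < 3)]  # same as using np.logical_or(): 1, 2, 9, 10
between = dataset[(dataset > 3) & (dataset < 7)]  # same as using np.logical_and(): 4, 5, 6

print(outside)
print(between)
```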

Finally, we can use the `np.where()` function, a hybrid of these two approaches: it takes a logical condition and returns the numerical indices where that condition is `True`.

In [408]:
# Find the numerical indices for values in the dataset > 6
indices = np.where(dataset > 6)
print(indices)
# Subset the data using these indices
dataset[indices]

(array([1, 1, 1, 1], dtype=int64), array([1, 2, 3, 4], dtype=int64))


array([ 7,  8,  9, 10])

#### Setting elements

Just as you can access elements of an array, you can also set them. This can be done with integer and logical indexing. 

Here is an example with simple integer indexing:

In [355]:
# Create a 2D matrix
dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])
print(dataset)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


In [358]:
# Change row 2, column 5 to the value 100
dataset[1, 4] = 100
dataset

array([[  1,   2,   3,   4,   5],
       [  6,   7,   8,   9, 100]])

You can also use logical indexing to set array values:

In [364]:
# Set every value > 3 to 0
dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])
dataset[dataset > 3] = 0 
dataset

array([[1, 2, 3, 0, 0],
       [0, 0, 0, 0, 0]])

And, finally, you can use the `np.where()` function:

In [368]:
# Set all values < 7 to -1
dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])
dataset[np.where(dataset < 7)] = -1
dataset

array([[-1, -1, -1, -1, -1],
       [-1,  7,  8,  9, 10]])
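`np.where()` can also be called with three arguments, `np.where(condition, x, y)`, to build a new array that takes values from `x` where the condition is `True` and from `y` elsewhere. Unlike the assignments above, this leaves the original array unchanged. A minimal sketch (variable names are illustrative):

```python
import numpy as np

dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])

# Where the condition holds, take the first value; otherwise take the second.
capped = np.where(dataset < 7, -1, dataset)
labels = np.where(dataset > 6, "high", "low")

print(capped)
print(labels)
```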

#### Any / All

`any()` and `all()` are two methods which determine whether an array satisfies a logical condition. `any()` is `True` if any element in the array satisfies the condition. `all()` is `True` if all elements of the array satisfy the condition. Examples:

In [372]:
dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])

In [373]:
# Any values equal to 0?
np.any(dataset == 0)

False

In [375]:
# All values NOT equal to 0?
np.all(dataset != 0)

True
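Like the mathematical methods below, `np.any()` and `np.all()` accept an `axis` argument, which applies the test within each row or column instead of across the whole array. A short sketch (variable names are illustrative):

```python
import numpy as np

dataset = np.array([
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10]
])

# With axis=1, the test is applied within each row: does the row contain a value > 8?
row_has_big = np.any(dataset > 8, axis=1)     # first row: False, second row: True

# With axis=0, the test is applied within each column: is every value in the column < 10?
col_all_small = np.all(dataset < 10, axis=0)  # only the last column (5, 10) fails

print(row_has_big)
print(col_all_small)
```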

#### Mathematical methods

Arrays have a large number of built-in mathematical methods, such as `sum()` and `mean()`. They can also be used for multi-dimensional algebraic operations, such as matrix multiplication and dot products. Here are a few examples:

In [384]:
my_data = np.arange(9).reshape((3,3))
my_data

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [383]:
# Multiplication by scalar
my_data * 3

array([[ 0,  3,  6],
       [ 9, 12, 15],
       [18, 21, 24]])

In [380]:
# Addition by vector
my_vector = np.array([5, 10, 20])
my_data + my_vector

array([[ 5, 11, 22],
       [ 8, 14, 25],
       [11, 17, 28]])

In [407]:
# Sum of values
my_data.sum()

36

In [406]:
# Mean of each row -- "axis" specifies the dimension index; axis=1 averages across the columns
my_data.mean(axis=1)

array([1., 4., 7.])

In [405]:
# Max of each column -- axis=0 operates down the rows
my_data.max(axis=0)

array([6, 7, 8])

In [404]:
# Transposition
my_data.transpose()

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

In [403]:
# Make new dataset
my_data2 = np.arange(100, 109).reshape((3,3))  # 3x3 matrix of the values 100 through 108

# Compute dot product
dot_prod = np.dot(my_data, my_data2)
dot_prod

array([[ 315,  318,  321],
       [1242, 1254, 1266],
       [2169, 2190, 2211]])

In [402]:
# Compute the Pearson correlation of two 1d arrays
arr1 = np.array([1, 5, 6, 6, 7, 10])
arr2 = np.array([3, 3, 4, 3, 6, 9])
np.corrcoef(arr1, arr2)  # Correlation is ~.809

array([[1.        , 0.80873372],
       [0.80873372, 1.        ]])
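For 2D arrays, the `@` operator performs matrix multiplication and gives the same result as `np.dot()`, while `*` multiplies element by element. A minimal sketch reusing the arrays from above (variable names are illustrative):

```python
import numpy as np

my_data = np.arange(9).reshape((3, 3))
my_data2 = np.arange(100, 109).reshape((3, 3))

# For 2d arrays, @ performs matrix multiplication, like np.dot().
matmul = my_data @ my_data2
print(np.array_equal(matmul, np.dot(my_data, my_data2)))  # True

# * is element-wise: each value is multiplied by the value in the same position.
elementwise = my_data * my_data2
print(elementwise[0])  # first row: 0*100, 1*101, 2*102 -> 0, 101, 204
```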

## Pandas Series and DataFrame

`pandas` is arguably the most important library for data science in python. It provides the `Series` and `DataFrame` objects, along with a large number of methods for working with them. Under the hood, it uses `numpy`, so many `array` methods also work with `pandas` objects. In this section, we will discuss the `Series` object and the `DataFrame` object, then introduce some core methods for working with them.

### Pandas `Series`

Similar to a 1D array, a pandas `Series` is a one-dimensional array in which every element has a label (its index). See this example:

In [411]:
# pandas must be imported (conventionally as pd) before it can be used
import pandas as pd

my_data = pd.Series(data={
    'one': 1,
    'two': 2,
    'three': 3
})
my_data

one      1
two      2
three    3
dtype: int64

Similar to a dictionary, the values can be accessed using the names:

In [412]:
my_data['two']

2

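And similar to an `array`, the values can also be accessed using positions and booleans. For access by position, use `.iloc` (plain `my_data[0]` also works in older pandas versions, but using `[]` for positions on a `Series` with a non-integer index is deprecated):

In [413]:
# Access the first element by position
my_data.iloc[0]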

1

In [415]:
# Access the element(s) which equals 3
my_data[my_data == 3]

three    3
dtype: int64
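pandas also provides explicit accessors that avoid ambiguity between labels and positions: `.loc` selects by label and `.iloc` by integer position. A brief sketch, with illustrative variable names:

```python
import pandas as pd

my_data = pd.Series(data={'one': 1, 'two': 2, 'three': 3})

by_label = my_data.loc['two']  # .loc selects by label: 2
by_position = my_data.iloc[0]  # .iloc selects by integer position: 1
average = my_data.mean()       # Series share many numpy-style methods: 2.0

print(by_label, by_position, average)
```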

### Pandas `DataFrame`

The `DataFrame` is an extremely powerful data type in python, and it is used ubiquitously throughout pythonic data science. A `DataFrame` is a 2-dimensional table with labeled rows and columns, where each column is a `Series`.

In [429]:
my_df = pd.DataFrame(data={
    'col_one': range(1, 5),
    'col_two': range(11, 15),
    'col_three': range(21, 25)
}, index = [
    'row_one', 'row_two', 'row_three', 'row_four'
])
my_df

|           | col_one | col_two | col_three |
| --------- | ------- | ------- | --------- |
| row_one   | 1       | 11      | 21        |
| row_two   | 2       | 12      | 22        |
| row_three | 3       | 13      | 23        |
| row_four  | 4       | 14      | 24        |


Methods for `DataFrame` objects are numerous and can be found [here](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html). In this module, we will only discuss the following:

1. Accessing data / setting data
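A minimal sketch of accessing and setting data in the `my_df` example above, using the `[]`, `.loc`, and `.iloc` accessors. The new column name `col_sum` and the other variable names are illustrative:

```python
import pandas as pd

my_df = pd.DataFrame(data={
    'col_one': range(1, 5),
    'col_two': range(11, 15),
    'col_three': range(21, 25)
}, index=['row_one', 'row_two', 'row_three', 'row_four'])

# Accessing data
column = my_df['col_two']                   # one column, returned as a Series
single = my_df.loc['row_two', 'col_three']  # one value by row and column labels: 22
by_position = my_df.iloc[0, 1]              # one value by integer positions: 11
filtered = my_df[my_df['col_one'] > 2]      # only the rows where col_one is greater than 2

# Setting data
my_df.loc['row_one', 'col_one'] = 100                   # overwrite a single value
my_df['col_sum'] = my_df['col_one'] + my_df['col_two']  # add a new column computed from others

print(filtered)
print(my_df)
```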

<hr>

# Other programming concepts in Python

We will also briefly discuss control flow and functions in python. While these are useful techniques for python programming, they are not strictly needed for many typical data science tasks, which often rely on `numpy` and `pandas` methods instead. These are the topics we will summarize:

1. If...elif...else
2. Loops
3. Function definitions

## If...elif...else

These statements mark code blocks that will only be executed if a logical condition is met.

### If statements

`if` statements in python create a logic gate, such that some code will only execute if a logical condition is met. See an example here:

In [434]:
a = 1
b = 1

if a == b:
    # Execute this code only if a == b is True
    print("a is equal to b!")

a is equal to b!


The above example shows an `if` statement. The code in this statement only executes when the condition (`a == b`) is `True`. **Challenge:** Can you modify the above block so that the code will not execute?

### If...else statements

`else` statements are executed if no previous conditions are satisfied. In other words, only if none of the preceding conditions is `True` will the `else` block execute.

In [436]:
a = 1
b = 2

if a == b:
    print("a is equal to b!")
else:
    print("a is NOT equal to b!")

a is NOT equal to b!


### If...elif...else statements

`elif` is short for "else if". An `elif` condition is tested only if none of the preceding logical conditions were satisfied.

In [440]:
grade = 68

if grade > 90:
    # Only executes if grade > 90
    letter_grade = "A"
elif grade > 80:
    # Only executes if grade > 80 and grade <= 90
    letter_grade = "B"
elif grade > 70:
    # Only executes if grade > 70 and grade <= 80
    letter_grade = "C"
elif grade >= 60:
    # Only executes if grade >= 60 and grade <= 70
    letter_grade = "D"
else:
    # Only executes if grade < 60
    letter_grade = "F"
    
print("Student earned a grade of " + letter_grade)

Student earned a grade of D


In the above example, each logical condition is tested in sequence. Only when a condition is not met is the next one tested. If a student has a grade of `68`, then every `elif` statement will be tested. If the student had a `96`, then no `elif` statements would have been tested.

## Loops

Loops allow for some code to be applied to every element of an iterable object, such as a list. 

### For loops

For loops are a type of finite loop in python (as opposed to `while` loops which we will not discuss here). A for loop iterates over an iterable object, such as a `list` or `tuple`. For every element of the object, code will be executed in succession. Here is an example:

In [447]:
grades = [85, 98, 45, 73]

# Loop over list of grades and print letter grade
for grade in grades:
    if grade > 90:
        # Only executes if grade > 90
        letter_grade = "A"
    elif grade > 80:
        # Only executes if grade > 80 and grade <= 90
        letter_grade = "B"
    elif grade > 70:
        # Only executes if grade > 70 and grade <= 80
        letter_grade = "C"
    elif grade >= 60:
        # Only executes if grade >= 60 and grade <= 70
        letter_grade = "D"
    else:
        # Only executes if grade < 60
        letter_grade = "F"

    print("Student earned a grade of " + letter_grade)


Student earned a grade of B
Student earned a grade of A
Student earned a grade of F
Student earned a grade of C


Rather than using the list of grades directly, it may be useful to use the numerical indices of list elements. For example:

In [446]:
students = ['alice', 'kevin', 'sara', 'tim']
grades = [85, 98, 45, 73]

for i in range(len(grades)):
    
    grade = grades[i]
    student = students[i]
    
    if grade > 90:
        # Only executes if grade > 90
        letter_grade = "A"
    elif grade > 80:
        # Only executes if grade > 80 and grade <= 90
        letter_grade = "B"
    elif grade > 70:
        # Only executes if grade > 70 and grade <= 80
        letter_grade = "C"
    elif grade >= 60:
        # Only executes if grade >= 60 and grade <= 70
        letter_grade = "D"
    else:
        # Only executes if grade < 60
        letter_grade = "F"

    print(student + " earned a grade of " + letter_grade)


alice earned a grade of B
kevin earned a grade of A
sara earned a grade of F
tim earned a grade of C
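As an alternative to indexing with `range(len(...))`, the built-in `zip()` function pairs up elements of several lists directly. To keep the sketch short, it builds simple summaries instead of letter grades; the `summaries` list is illustrative:

```python
students = ['alice', 'kevin', 'sara', 'tim']
grades = [85, 98, 45, 73]

summaries = []
# zip() walks both lists in step, yielding one (student, grade) pair per iteration.
for student, grade in zip(students, grades):
    summaries.append(student + " scored " + str(grade))

print(summaries)  # ['alice scored 85', 'kevin scored 98', 'sara scored 45', 'tim scored 73']
```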


## Functions

Functions are objects in python that take inputs (called arguments), perform computations, and return an output. (A function that belongs to an object, such as `.upper()` on a string, is called a method.)

In [450]:
def square_it(x):
    print(x ** 2)
    
square_it(5)

25


Functions can also return a value. This is more common in python programming than simply printing the value:

In [452]:
def square_it(x):
    return x ** 2
    
result = square_it(5)
print(result)

25
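Arguments can also have default values, and they can be passed by name. A short sketch with a hypothetical `power()` function:

```python
def power(x, exponent=2):
    """Return x raised to `exponent`, which defaults to 2."""
    return x ** exponent

squared = power(5)              # exponent falls back to its default: 25
to_the_fifth = power(2, 5)      # positional arguments: 32
cubed = power(x=3, exponent=3)  # keyword arguments can be named explicitly: 27

print(squared, to_the_fifth, cubed)  # 25 32 27
```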


**Challenge problem:** create a function with one argument, `grade`. The function should convert `grade` to a letter grade and return it to the user. Then, use this function to simplify the for loop from earlier.