# Exercise 2, Introduction to Python, part 2: variable types, syntax, and logic

Last class we got our feet wet. In class today, we'll work through most of the key basics and introduce our first two packages for data science: `numpy` and `scipy`.

We are essentially covering the core of the Python Standard Library (this is what you are installing when you "install python"). [The documentation for all the tools included can be found here](https://docs.python.org/3.8/library/index.html).

It's also worth noting that python is very forgiving about a lot of user decisions relating to formatting, naming, etc., but the community has decided on a set of standards documented in [Python Enhancement Proposal 8 (aka PEP 8)](https://www.python.org/dev/peps/pep-0008/). It's worth taking a read, but there are a few important conventions worth knowing:

1. names should reflect purpose, not implementation
2. indents are made with 4 spaces (not 2 spaces, not tabs)
3. variables, functions should be named all_lower_case_with_underscores
4. class objects should be named in CamelCase
5. packages should generally have short onewordlowercase names
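
A minimal sketch of what these conventions look like in practice. The class, method, and variable names below are made up purely for illustration.

```python
# Hypothetical names, chosen only to illustrate the conventions listed above.


class TemperatureReading:  # class names use CamelCase
    def __init__(self, celsius):
        self.celsius = celsius

    def to_fahrenheit(self):  # functions and methods use lower_case_with_underscores
        # Each indentation level is 4 spaces
        return self.celsius * 9 / 5 + 32


# The variable name says what the value is for, not how it is stored
boiling_point = TemperatureReading(100)
print(boiling_point.to_fahrenheit())
```
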

In general, _any_ programming requires constantly checking the docs associated with the code. I usually keep tabs open to the documentation I need. For quick reference these can be found under `Help` in the menubar. 

In Jupyter you can also access the documentation for a function directly, once you have typed its name, by pressing `Shift-TAB` or opening the `Contextual Help` window.

## Table of Contents

* [2.1 Python variable types](#python_types)
    * [Introducing sets](#sets)
    * [Introducing formatted strings](#fstrings)
    * [+= syntax](#plus-equals)
    * [Type Exercises](#type-ex)
* [2.2 Function syntax](#functions)
    * basic syntax
    * `*` and `**` operators (including `zip`)
    * Function exercises 
* [2.3 Control](#control)
    * `if` problems
    * `for` loop problems
    * `while` loop problems
* [2.4 Accessing packages](#packages)
    * e.g. `itertools`
    * e.g. `collections`
    * e.g. `datetime`
* [2.5 Introducing `numpy`](#intro-numpy)
    * [Creating 1D `numpy` arrays](#create-arrays)
    * [Basic operations with numbers](#numbers)
    * [Array properties and dtype](#dtype)
    * [Manipulating arrays](#manipulating)
    * [Dealing with NaNs](#NaN)
    * [Univariate operations with arrays](#univariate-ops)
    * [Bivariate operations with arrays](#bivariate-ops) 
    * [Subtleties: views vs copies](#subtleties)
    * [`numpy` exercises](#numpy-exercises) 
* [2.6 Introducing `scipy`](#intro-scipy)
    * [`scipy` exercises](#scipy-exercises)

## 2.1 Python variable types <a name="python_types">
    
Last class, we introduced the fundamental variable types that python uses

* Numbers
* Strings (and formatted strings)
* Lists
* Booleans (True / False)
* Tuples
* Dictionaries

Today we will review these and add one more

* Sets

### 2.1.1 Introducing sets <a name="sets">

There is one variable type built into Python that we did not discuss: sets. Like a list or tuple, a set is a collection of other variables, but unlike those types:

1. It has no notion of order (so no subscripting, aka indexing)
2. It comes with functions that make sense for sets (intersection, union, etc.)
    
Sets can be created from lists or tuples using the `set(.)` command.

In [53]:
a = [1,1,1,2,3,2,5,7,4,7,7]

b = set(a)
b

{1, 2, 3, 4, 5, 7}

In [9]:
b[0]

TypeError: 'set' object is not subscriptable

or you can create them directly using curly brackets `{.}`

In [20]:
# EXERCISE

c = {3, 6, 7, 9, 10}

b.??????(c)

SyntaxError: invalid syntax (<ipython-input-20-4fff2fe5742a>, line 6)

In [None]:
# EXERCISE
# What can we do with another set `c`? 
# Use jupyter's autofill command `TAB` to find out what functions there are. (Type `b.` and then 
# press `TAB`)

b.??????(c)

In [None]:
# EXERCISE
# Note that many functions have a FUNCTION_update version. What is the difference?
# Play around and find out. Make sure to check what `b` and `c` equal after you run your function




In [18]:
# EXERCISE
# Play around with defining and manipulating sets here using autocomplete. 


b.

### 2.1.2 Introducing formatted strings <a name="fstrings">

Very often, you want to be able to create strings that incorporate information from variables (for printing, for example). These are called formatted strings and Python provides three ways of creating them.

1. % syntax
2. STRING.format syntax
3. f-string syntax

We're going to ignore (1) since it is an older style that has mostly fallen out of favor at this point, and we're going to ignore (2) since (3) is way easier.

#### f-string syntax

Basically, you can insert the string version of a variable in a string if you prepend `f` to the string and use `{ }` to surround the variable name.

_Read the [python docs of string formatting](https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals)_

In [23]:
age = 2

f"Ayla is {age} years old"

'Ayla is 2 years old'

You can even do calculations using python code

In [25]:
f"Ayla is {age} years old, but if you ask her she says she is almost {age+1}"

'Ayla is 2 years old, but if you ask her she says she is almost 3'

You can also ask python to format the way variables are included. For example, you can add padding 

In [32]:
f"Ayla is {age:10} years old"

'Ayla is          2 years old'

or specify the precision of a number

In [43]:
from math import pi

f"A circle's circumference divided by its diameter is around {pi:.3} but much closer to {pi:.20}"

"A circle's circumference divided by its diameter is around 3.14 but much closer to 3.141592653589793116"

There is a **lot** more you can ask Python to do in how it actually converts the variable to a string. To understand more, read about [the format specifier mini-language](https://docs.python.org/3/library/string.html#formatspec).
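
As a small taste of the format specifier mini-language, here is a sketch of a few specifiers that come up often. The variables `name` and `population` are made up for illustration.

```python
from math import pi

name = "Ayla"           # hypothetical example values
population = 1234567

# `.2f` fixes the number of digits after the decimal point
print(f"{pi:.2f}")

# Pad to a width of 10: `<` left-aligns, `>` right-aligns, `^` centers
print(f"[{name:<10}]")
print(f"[{name:>10}]")
print(f"[{name:^10}]")

# `,` inserts a thousands separator
print(f"{population:,}")

# `%` multiplies by 100 and appends a percent sign
print(f"{0.256:.1%}")
```
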

### 2.1.3 += Syntax <a name="plus-equals">

Very often, you want to modify a variable using an arithmetic operator (like `+`, `*`, etc.) and resave that variable under the same name. You could write

In [54]:
a = 5

a = a + 2
a

7

or you could use the syntax `+=`

In [55]:
a = 5

a += 2
a

7

In [None]:
a = 5

# Try seeing what other `?=` operators there are.

a ?= 2

### 2.1.4 Variable type exercises <a name="type-ex">

In [None]:
# sorting, counting, comparing


In [None]:
file_name = ''


## 2.2 Function syntax <a name="functions">

## 2.3 Control <a name="control">

### 2.3.1 `if` exercises

In [None]:
# PROBLEM
# Write a script below that prints "even" if `a` is even and "odd" if it is odd. 

a = 5







In [None]:
# PROBLEM
# Write a Python program to print the mean & median of three given numbers, `a`, `b`, `c`. 
# Have it print the output nicely using f-strings. 


a = 3
b = 7
c = 4





### 2.3.2 `for` loop exercises

In [None]:
nums = [1, 1, 3, 5.7, 7, 8.3, 4.6, 5] 

# Write a Python program to print the mean, median, and mode of a list `nums` of arbitrary length.
# Hint: lists have methods for counting the number of items. 
# Have it print the output nicely using f-strings. 


In [None]:
# For loop syntax

In [None]:
# Problems with for loops

In [45]:
# list comprehension syntax

In [None]:
# list comprehension syntax with if

### 2.3.3 `while` loop exercises

In [None]:
# while examples

In [None]:
# list comprehension syntax with while

## 2.4 Accessing packages <a name="packages">

You can access packages using 
* `import...`: to import a package under its original name 
* `import...as...`: to import a package under a new name
* `from...import...`: to import particular functions from a package into your namespace without importing the whole package. 

Technically, there is one more way but the Python community will be very, very angry with you if you do this (and you **don't** want to see the Python community get angry).

* `from...import *` to import all functions from a package into your namespace
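
Here is a short sketch of the three recommended import styles side by side. The packages used (`math`, `datetime`, `itertools`) are all part of the standard library; the particular functions called are just examples picked for illustration.

```python
import math                          # original name: call things as math.NAME
import datetime as dt                # new name: call things as dt.NAME
from itertools import combinations   # one function pulled into your namespace

print(math.sqrt(16))

print(dt.date(2020, 1, 31).year)

# All ways of choosing 2 items out of 3
pairs = list(combinations(["a", "b", "c"], 2))
print(pairs)
```
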

* e.g. `itertools`
* e.g. `datetime`

## 2.5 Introduction to `numpy` <a name="intro-numpy">
    
`numpy` is the heart of using python for fast, efficient data processing. Python has many advantages, but it was not optimized for data manipulation; instead it was optimized for ease of use and flexibility. Specifically, the way that python stores lists is not efficient for computations. Each element of a list is really a "pointer" to a memory location that contains the object in question. These locations could be all over the place, so if you need to conduct operations that manipulate many elements at the same time, this will be very slow because the program must constantly jump all over memory to complete the task. 
    
`numpy` introduces the notion of an `array` which stores elements sequentially in memory for fast processing. These arrays are typically numbers but they can contain any other type of python object and even custom types (see [dtype section](#dtype)). Furthermore, these arrays can be _multidimensional_ (e.g. 2D like a matrix, 3D etc.) 
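
To make the "stored sequentially in memory" idea a bit more concrete, here is a rough sketch that inspects how an array is laid out. The variable names are made up, and the `dtype=np.int64` is set explicitly only so the byte counts are the same on every machine.

```python
import numpy as np

values = list(range(1000))                # an ordinary python list
arr = np.array(values, dtype=np.int64)    # the same numbers as a numpy array

# Every element takes the same number of bytes...
print(arr.itemsize)

# ...so the whole array is one block of itemsize * length bytes
print(arr.nbytes)

# and that block is contiguous in memory
print(arr.flags["C_CONTIGUOUS"])

# The list and the array still hold the same numbers
print(sum(values) == arr.sum())
```
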
    
    
Much of what is below comes from existing websites
* [Numpy Beginners Tutorial](https://numpy.org/doc/stable/user/absolute_beginners.html)
* [Numpy Tutorial on Linear Algebra](https://numpy.org/doc/stable/user/tutorial-svd.html)
    
If you are familiar with MATLAB you may want to look at [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html).
    
Typically when numpy is imported it is renamed `np` so that you aren't stuck typing `numpy` every time you want to do anything.

In [1]:
# Let's go!

import numpy as np

### 2.5.1 Creating `numpy` arrays <a name="create-arrays">
    
There are many ways to create numpy arrays:
* `np.array(.)` 
* `np.zeros(.)` 
* `np.ones(.)`
* `np.full(.)`
* `np.empty(.)` 
* `np.arange(.)`
* `np.linspace(.)`
    
`np.array(.)` converts a list (or another sequence, like a tuple) to an array

In [5]:
# Creating an array from a list

a = np.array([1, 2, 3, 4, 5, 6])
a

array([1, 2, 3, 4, 5, 6])

In [54]:
# They can be indexed just like lists

print(a[0:4])
print(a[-1])

[1 2 3 4]
6


They can also be indexed in really useful ways that lists can't. For example, if you wanted to spit out the elements at indices 1, 3, and 4 of a list you might try

In [59]:
# Oops

a = [10,20,30,40,50]
a[[1,3,4]]

TypeError: list indices must be integers or slices, not list

In [60]:
# But if you make `a` an array....

a = np.array([10,20,30,40,50])
a[[1,3,4]]

array([20, 40, 50])

Comparison operators such as `==` and `>` also act _on each element_ of the array **and** array subscripting accepts a Boolean array of equal length. Putting these two facts together, you can select elements of the array using conditions. 

In [63]:
print(a > 20)
print(a[a>20])

[False False  True  True  True]
[30 40 50]


If you want to know if _any_ elements of the array or _all_ elements of the array meet a criterion, use the `.any()` or `.all()` methods. 

In [65]:
print((a > 20).any())
print((a > 20).all())

True
False


In [6]:
# You can use a list of lists to create a multidimensional array
# NOTE: each of the sub-lists MUST be the same length.

a = np.array([[1, 2, 3], [10, 12, 13]])
a

array([[ 1,  2,  3],
       [10, 12, 13]])

In [10]:
# Indexing uses two coordinates ARRAY[ROW, COL]
# can also ask numpy to spit out an entire row or column using `:`

print(a[0,1])
print(a[1,:])
print(a[:,2])

2
[10 12 13]
[ 3 13]


`np.ones(.)`, `np.zeros(.)`, `np.empty(.)` take a number (for a 1D array) or a tuple (for a multidimensional array) and create an array of that shape 

In [16]:
# ones and zeros do exactly what you might expect
np.ones(6)

array([1., 1., 1., 1., 1., 1.])

In [17]:
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

`np.empty(.)` is a bit more subtle. Use this command if you know you'll be replacing every element. It just creates an array without clearing memory, so the initial numbers are garbage. That makes it faster than the other operations. 

In [19]:
np.empty(10)

array([-2.31584178e+077, -2.31584178e+077,  9.88131292e-323,
        0.00000000e+000,  0.00000000e+000,  0.00000000e+000,
        0.00000000e+000,  0.00000000e+000,  0.00000000e+000,
        0.00000000e+000])

`np.full(., .)` creates arrays filled with a specific value. 

In [20]:
np.full(10, 3.14)

array([3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14, 3.14])

In [23]:
# All of these functions have a function_like version. Use jupyter's `Shift+TAB` or `Contextual Help` features to figure out what they do. 

np.??????

SyntaxError: invalid syntax (<ipython-input-23-60e2e8f1eb9a>, line 3)

You can also generate sequential numbers in one of two ways: `np.arange(.)` or `np.linspace(.)`. `np.arange(.)` works just like `range` but it creates a numpy array. Just like `range`, the initial number is included but the final number is not. 

In [29]:
# np.arange(START, STOP, STEP)

np.arange(1,15,1)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [28]:
np.arange(1,15,2)

array([ 1,  3,  5,  7,  9, 11, 13])

`np.linspace(START, END, LENGTH)` is similar but you specify how long you'd like the array to be with the third argument. Also note that the linspace command creates an array that **includes** the initial and final values. 

In [67]:
np.linspace(1, 15, 100)

array([ 1.        ,  1.14141414,  1.28282828,  1.42424242,  1.56565657,
        1.70707071,  1.84848485,  1.98989899,  2.13131313,  2.27272727,
        2.41414141,  2.55555556,  2.6969697 ,  2.83838384,  2.97979798,
        3.12121212,  3.26262626,  3.4040404 ,  3.54545455,  3.68686869,
        3.82828283,  3.96969697,  4.11111111,  4.25252525,  4.39393939,
        4.53535354,  4.67676768,  4.81818182,  4.95959596,  5.1010101 ,
        5.24242424,  5.38383838,  5.52525253,  5.66666667,  5.80808081,
        5.94949495,  6.09090909,  6.23232323,  6.37373737,  6.51515152,
        6.65656566,  6.7979798 ,  6.93939394,  7.08080808,  7.22222222,
        7.36363636,  7.50505051,  7.64646465,  7.78787879,  7.92929293,
        8.07070707,  8.21212121,  8.35353535,  8.49494949,  8.63636364,
        8.77777778,  8.91919192,  9.06060606,  9.2020202 ,  9.34343434,
        9.48484848,  9.62626263,  9.76767677,  9.90909091, 10.05050505,
       10.19191919, 10.33333333, 10.47474747, 10.61616162, 10.75

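Finally, `np.diag` creates a square 2D array with 0s off the diagonal. You feed it an array of the values you would like to see along the diagonal. (Don't confuse it with `np.diagonal`, which goes the other way and pulls the diagonal out of an existing 2D array.)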
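
A quick sketch of building a diagonal array. Note that the function that builds the array from a list of values is `np.diag`, while `np.diagonal` extracts the diagonal from an existing 2D array. The values used here are arbitrary.

```python
import numpy as np

# Build a 3x3 array with 1, 2, 3 along the diagonal and 0s elsewhere
d = np.diag([1, 2, 3])
print(d)

# Going the other way: pull the diagonal back out of the 2D array
print(np.diagonal(d))
```
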

In [None]:
# EXERCISE
# Create some arrays and use the Boolean subscripting (e.g. a[a >20]) to select elements of the array. 






`numpy`'s random subpackage also includes ways of creating arrays with random values. 

* `np.random.rand(D0, D1, ...)`: uniform random values in [0, 1), with one argument per dimension
* `np.rad

In [124]:
np.random.rand(10)

array([0.44903119, 0.43824396, 0.11718804, 0.00873917, 0.20884401,
       0.93293688, 0.52278392, 0.65129636, 0.38723622, 0.11027946])
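
Here is a sketch of a few other ways to create random arrays. Only `np.random.rand` appears above; `np.random.randn` and `np.random.randint` are additional functions from the same subpackage that I am assuming are relevant here. The output changes on every run, so none is shown.

```python
import numpy as np

# Uniform on [0, 1): each dimension is passed as a separate argument
u = np.random.rand(3, 2)
print(u.shape)

# Draws from a standard normal distribution (mean 0, standard deviation 1)
n = np.random.randn(4)
print(n)

# Random integers from 0 up to (but not including) 10
k = np.random.randint(0, 10, size=5)
print(k)
```
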

### 2.5.2 Basic operations with numbers <a name="numbers">
    
You can do standard arithmetic with numbers and arrays (`+`, `*`, etc.). This applies the operation elementwise, i.e. to each element of the array 

In [69]:
10*np.arange(1,10)

array([10, 20, 30, 40, 50, 60, 70, 80, 90])


You can apply any function you want across an array. 

In [82]:
def squareer(x):
    return x**2

a = np.arange(1,100)
squareer(a)

array([   1,    4,    9,   16,   25,   36,   49,   64,   81,  100,  121,
        144,  169,  196,  225,  256,  289,  324,  361,  400,  441,  484,
        529,  576,  625,  676,  729,  784,  841,  900,  961, 1024, 1089,
       1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936,
       2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025,
       3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356,
       4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929,
       6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744,
       7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801])

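This works because the `**` inside the function already acts elementwise on arrays. A function that only makes sense for a single value (for example, one that branches with `if`) cannot be called on an array this way. For those cases `numpy` has commands (`np.vectorize` and `np.frompyfunc`) that wrap a standard python function so it can be applied across an array; `np.frompyfunc` generates what are called "universal functions". Note that these wrappers are a convenience: they essentially employ a `for` loop, so they are not faster than one. 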
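
As a sketch of how a plain python function can be wrapped so it applies across an array, here are `np.vectorize` and `np.frompyfunc` side by side. The function `clip_negative` is a made-up example; it branches with `if`, so it only works on one number at a time until it is wrapped.

```python
import numpy as np


def clip_negative(x):
    # Works on a single number only: `if` cannot branch on a whole array
    if x < 0:
        return 0
    return x


a = np.array([-2, -1, 0, 1, 2])

# np.vectorize wraps the function so it is called once per element
clip_all = np.vectorize(clip_negative)
print(clip_all(a))

# np.frompyfunc(FUNCTION, N_INPUTS, N_OUTPUTS) builds a universal function;
# its result always has dtype `object`
clip_ufunc = np.frompyfunc(clip_negative, 1, 1)
print(clip_ufunc(a))
```

Both wrappers still call the python function once per element, so they are about convenience rather than speed.
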

### 2.5.3 Array properties and dtype <a name="dtype">

To find the shape and number of dimensions of an array you can use the `ARRAY.shape` and `ARRAY.ndim` attributes. 

In [90]:
a = np.empty((10,5))

print(a.shape)
print(a.ndim)

(10, 5)
2


In order to store arrays efficiently, numpy needs to know what type of data is in an array. This is called the array's `dtype`. To access an array's dtype use the `ARRAY.dtype` attribute. 

In [110]:
a.dtype

dtype('float64')
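
A short sketch of how the dtype is chosen and how to change it. The array names are made up, and the default integer dtype depends on your platform (it is typically `int64`), so that one is printed rather than assumed.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.0, 3.0])
mixed = np.array([1, 2.5, 3])   # one float is enough to make the whole array float

print(ints.dtype, floats.dtype, mixed.dtype)

# `astype` returns a new array converted to the dtype you ask for
as_float = ints.astype(np.float64)
print(as_float)

# You can also pick the dtype up front; int8 uses 1 byte per element
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)
```
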

In [112]:
# EXERCISE
# Try to interpret the number in the dtype for the array below

np.array(['hi', 'there'])

array(['hi', 'there'], dtype='<U5')

### 2.5.4 Manipulating arrays <a name="manipulating">
    
Numpy provides **a lot** of ways to manipulate arrays. [Check out the API reference here.](https://numpy.org/devdocs/reference/routines.array-manipulation.html) Here we will only look at a few key ones
    
* `ARRAY.reshape(NEWSHAPE)`, `ARRAY.flatten()`, `np.transpose(ARRAY)`
* `np.concatenate(.)`, `np.hstack(.)`, `np.c_`, `np.vstack(.)`, `np.r_`, `np.split(.)`
* `ARRAY.sort(.)`, `ARRAY.argsort(.)`, `ARRAY.searchsorted(.)`

In [97]:
a = np.arange(0,10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [101]:
b = a.reshape((5,2))
b

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [103]:
c = b.reshape((2,5))
c

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

If you want to "flatten" a multidimensional array to a 1D array you can use `ARRAY.reshape(-1)` or `ARRAY.flatten()`

In [109]:
print(f"flatten\n{c.flatten()}\n")
print(f"reshape\n{c.reshape(-1)}\n")

flatten
[0 1 2 3 4 5 6 7 8 9]

reshape
[0 1 2 3 4 5 6 7 8 9]
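
`np.transpose` is in the list above but has not been shown yet, so here is a minimal sketch. The array is rebuilt here so the snippet runs on its own.

```python
import numpy as np

c = np.arange(10).reshape((2, 5))

# Transposing swaps rows and columns: a (2, 5) array becomes (5, 2)
t = np.transpose(c)
print(t.shape)
print(t)

# `ARRAY.T` is a shorthand for the same thing
print((t == c.T).all())
```

Notice that this is different from `c.reshape((5, 2))`, which keeps the elements in their original order and only changes where the rows break.
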



In [53]:
# np.concatenate, np.vstack and np.r_ all allow you to combine arrays "vertically" (along axis 0)

a = np.ones((3, 2))
b = np.ones((3, 2))

print(f"concatenate\n {np.concatenate([a, b], axis=0)}\n")
print(f"vstack\n{np.vstack([a, b])}\n")
print(f"r_\n{np.r_[a, b]}\n")

concatenate
 [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]

vstack
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]

r_
[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]



In [50]:
# np.concatenate, np.hstack, and np.c_ all allow you to combine arrays "horizontally" (along axis 1)

a = np.ones((3, 2))
b = np.ones((3, 2))

print(f"concatenate\n {np.concatenate([a, b], axis=1)}\n")
print(f"hstack\n{np.hstack([a, b])}\n")
print(f"c_\n{np.c_[a, b]}\n")

concatenate
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

hstack
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

c_
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]



In [None]:
# EXERCISE
# Build a 2D array where the first row is the first 10 multiples of 1, 
# the second is the first 10 multiples of 2, 
# ..., and the last row is the first 10 multiples of 10. 










To sort arrays use the `ARRAY.sort()`, `ARRAY.argsort()`, and `ARRAY.searchsorted(VALUE)` commands. `ARRAY.sort()` **does not return anything**. It simply sorts the array and saves the sorted version under the same name (i.e. it acts "in-place").

In [118]:
a = np.array([3, 7, 4, 6, 1, 0, 8, 9, 3])

a.sort()
a

array([0, 1, 3, 3, 4, 6, 7, 8, 9])

There is a command to randomly shuffle arrays that operates in the same way. 

In [120]:
# Try re-running this to see that `a` is shuffled differently each time. 

np.random.shuffle(a)
a

array([7, 8, 9, 6, 3, 4, 1, 0, 3])

`np.argsort(ARRAY)` returns an array of indices that _would_ sort `ARRAY`

In [123]:
a = 10*np.arange(0,9)
np.random.shuffle(a)
a

array([70, 50, 20, 60, 30, 40,  0, 80, 10])
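
Here is a sketch of `argsort` in action. The shuffled array from the output above is typed in by hand so the result is the same on every run; `np.searchsorted` is included as a quick illustration and the value 35 is arbitrary.

```python
import numpy as np

a = np.array([70, 50, 20, 60, 30, 40, 0, 80, 10])

# order[0] is the index of the smallest element, order[1] of the next, etc.
order = np.argsort(a)
print(order)

# Using those indices to subscript `a` gives the sorted array,
# while `a` itself is left untouched
sorted_a = a[order]
print(sorted_a)

# searchsorted tells you where a value would go in an already sorted array
position = np.searchsorted(sorted_a, 35)
print(position)
```

The same `order` can be used to subscript a second array of the same length, which rearranges it to match.
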

In [None]:
# EXERCISE
# Using argsort, sort the names according to their birth month. 

names = ['Anne', 'Abel', 'Adam', 'Ali', 'Allison']
birth_months = [2, 7, 1, 11, 5]



### 2.5.5 Dealing with NaN <a name="NaN">

### 2.5.6 Univariate operations <a name="univariate-ops">

### 2.5.7 Bivariate operations <a name="bivariate-ops">

### 2.5.8 Subtleties: views vs copies <a name="subtleties">

* `np.copy(.)`
* `np.split(.)` vs `np.array_split(.)`
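
A rough sketch of the two items above, under my reading of what they are meant to cover: slicing an array gives a "view" that shares memory with the original, `np.copy` gives an independent array, and `np.array_split` allows pieces of unequal length where `np.split` does not. The variable names are made up.

```python
import numpy as np

a = np.arange(6)

view = a[:3]              # a slice is a view onto the same memory
copy = np.copy(a[:3])     # np.copy makes an independent array

view[0] = 100             # this changes `a` as well
copy[1] = -1              # this does not
print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))

# np.split insists on equal pieces; 7 elements cannot be split into 3
try:
    np.split(np.arange(7), 3)
except ValueError as err:
    print(f"np.split failed: {err}")

# np.array_split allows the pieces to have different lengths
parts = [p.tolist() for p in np.array_split(np.arange(7), 3)]
print(parts)
```
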

### 2.5.9 `numpy` exercises <a name="numpy-exercises">

In [None]:
# create three matrices and concatenate

## 2.6 Introduction to `scipy` <a name="intro-scipy">

### 2.6.1 `scipy` exercises <a name="scipy-exercises">