# Exercise 2, Introduction to Python, part 2: variable types, syntax, and logic

Last class we got our feet wet. In class today, we'll work through most of the key basics along with introducing our first two packages for data science: `numpy` and `scipy`.

We are essentially covering the core of the Python Standard Library (this is what you are installing when you "install python"). [The documentation for all the tools included can be found here](https://docs.python.org/3.8/library/index.html).

It's also worth noting that Python is very forgiving about many user decisions relating to formatting, naming, etc., but the community has settled on a set of standards documented in [Python Enhancement Proposal 8 (aka PEP 8)](https://www.python.org/dev/peps/pep-0008/). It's worth a read, but there are a few important conventions worth knowing:

1. names should reflect purpose, not implementation
2. indents are made with 4 spaces (not 2 spaces, not tabs)
3. variables and functions should be named all_lower_case_with_underscores
4. class objects should be named in CamelCase
5. packages should generally have short onewordlowercase names
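
To make these conventions concrete, here is a minimal sketch that follows them. The class, function, and variable names are invented for illustration and do not come from the course material.

```python
class TemperatureReading:
    """Class names use CamelCase."""

    def __init__(self, degrees_celsius):
        # Four spaces per indentation level.
        self.degrees_celsius = degrees_celsius

    def to_fahrenheit(self):
        # The name says what the method is for, not how it works.
        return self.degrees_celsius * 9 / 5 + 32


def mean_temperature(readings):
    """Function and variable names use lower_case_with_underscores."""
    total = sum(reading.degrees_celsius for reading in readings)
    return total / len(readings)


readings = [TemperatureReading(20.0), TemperatureReading(30.0)]
print(mean_temperature(readings))    # 25.0
print(readings[0].to_fahrenheit())   # 68.0
```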

In general, _any_ programming requires constantly checking the docs associated with the code. I usually keep tabs open to the documentation I need. For quick reference these can be found under `Help` in the menubar. 

In Jupyter you can also access the documentation for a function directly after you have typed its name by pressing `Shift-TAB` or opening the `Contextual Help` window.

## Table of Contents

* [2.1 Python variable types](#python_types)
    * [Introducing sets](#sets)
    * [Introducing formatted strings](#fstrings)
    * [+= syntax](#plus-equals)
    * [Type Exercises](#type-ex)
* [2.2 Function syntax](#functions)
* [2.3 Control](#control)
    * [`if` problems](#if)
    * [`for` loop problems](#for)
    * [`while` loop problems](#while)
* [2.4 Accessing packages](#packages)
    * [e.g. `itertools`](#itertools)
* [2.5 Introducing `numpy`](#intro-numpy)
    * [Creating 1D `numpy` arrays](#create-arrays)
    * [Basic operations with numbers](#numbers)
    * [Array properties and dtype](#dtype)
    * [Manipulating arrays](#manipulating)
    * [Dealing with NaNs](#NaN)
    * [Univariate operations with arrays](#univariate-ops)
    * [Bivariate operations with arrays](#bivariate-ops) 
    * [Subtleties: views vs copies](#subtleties)
    * [`numpy` exercises](#numpy-exercises) 
* [2.6 Introducing `scipy`](#intro-scipy)
    * [`scipy` exercises](#scipy-exercises)

## 2.1 Python variable types <a name="python_types"></a>
    
Last class, we introduced the fundamental variable types that python uses

* Numbers
* Booleans (True / False)
* Strings (and formatted strings)
* Tuples
* Lists
* Dictionaries

Today we will review these and add one more

* Sets

### 2.1.1 Introducing sets <a name="sets"></a>

There is one variable type built into Python that we did not discuss: sets. Like a list or tuple, a set is a collection of other variables, but unlike those types:

1. There is no notion of order (so no subscripting, aka indexing)
2. It comes with functions that make sense for sets (intersection, union, etc.)
    
Sets can be created from lists or tuples using the `set(.)` command.

In [None]:
a = [1,1,1,2,3,2,5,7,4,7,7]

b = set(a)
b

In [None]:
# Sets have no order, so subscripting raises a TypeError
b[0]

or you can create them directly using curly brackets `{.}`

In [None]:
# EXERCISE

c = {3, 6, 7, 9, 10}

b.??????(c)

In [None]:
# EXERCISE
# What can we do with another set `c`? 
# Use jupyter's autofill command `TAB` to find out what functions there are. (Type `b.` and then 
# press `TAB`)

b.??????(c)

In [None]:
# EXERCISE
# Note that many functions have a FUNCTION_update version. What is the difference?
# Play around and find out. Make sure to check what `b` and `c` equal after you run your function




In [None]:
# EXERCISE
# Play around with defining and manipulating sets here using autocomplete. 


b.

### 2.1.2 Introducing formatted strings <a name="fstrings"></a>

Very often, you want to create strings that incorporate information from variables (for printing, for example). These are called formatted strings, and Python provides three ways of creating them.

1. % syntax
2. STRING.format syntax
3. f-string syntax

We're going to ignore (1) since it is an older style that is rarely the best choice in new code, and we're going to ignore (2) since (3) is way easier.

#### f-string syntax

Basically, you can insert the string version of variables in a string if you prepend `f` to the string and use `{ }` to surround the variable name.

_Read the [python docs of string formatting](https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals)_

In [None]:
age = 2

f"Ayla is {age} years old"

You can even do calculations using python code

In [None]:
f"Ayla is {age} years old, but if you ask her she says she is almost {age+1}"

You can also ask python to format the way variables are included. For example, you can add padding 

In [None]:
f"Ayla is {age:10} years old"

or specify the precision of a number

In [None]:
from math import pi

f"A circle's circumference divided by its diameter is around {pi:.3} but much closer to {pi:.20}"

There is a **lot** more you can ask Python to do in how it actually converts the variable to a string. To understand more, read about [the format specifier mini-language](https://docs.python.org/3/library/string.html#formatspec).
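
As a small taste of the mini-language, the sketch below applies a few common format specifiers. The variable names and values are made up for illustration.

```python
price = 3.14159
count = 42
population = 1234567
share = 0.256

fixed = f"{price:.2f}"        # two digits after the decimal point
padded = f"{count:>6}"        # right-aligned in a field six characters wide
grouped = f"{population:,}"   # comma as the thousands separator
percent = f"{share:.1%}"      # multiply by 100 and append a percent sign

print(fixed)     # 3.14
print(padded)    #     42
print(grouped)   # 1,234,567
print(percent)   # 25.6%
```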

### 2.1.3 += syntax <a name="plus-equals"></a>

Very often, you want to modify a variable using an arithmetic operator (like `+`, `*`, etc.) and save the result under the same name. You could write

In [None]:
a = 5

a = a + 2
a

or you could use the syntax `+=`

In [None]:
a = 5

a += 2
a

In [None]:
a = 5

# Try seeing what other `?=` operators there are.

a ?= 2

### 2.1.4 Variable type exercises <a name="type-ex"></a>

In [None]:
# EXERCISE
# Find a list method to sort the list below

r = [1, 5, 7, 1, 3, 5, 7]







In [None]:
# EXERCISE
# find a string method that you can use to extract the file extension from the file name below. 

file_name = 'numpy_rocks.jpeg'






## 2.2 Function syntax <a name="functions"></a>

## 2.3 Control <a name="control"></a>

### 2.3.1 `if` exercises <a name="if"></a>

In [None]:
# PROBLEM
# Write a script below that prints "even" if `a` is even and "odd" if it is odd. 
# Hint: use the modulo operator %

a = 5







In [None]:
# PROBLEM
# Write a Python program to print the mean & median of three given numbers, `a`, `b`, `c`. 
# Have it print the output nicely using f-strings. 


a = 3
b = 7
c = 4





### 2.3.2 `for` loop exercises <a name="for"></a>

In [None]:
# EXERCISE 
# Write a Python program to print each element of the list and how often it appears
# Hint: lists have a method for counting how often an item appears. Think about how sets might help you here.

r = [1, 5, 7, 1, 3, 5, 7]






In [None]:
# EXERCISE 
# Use list comprehension syntax to construct a list that squares each element of the list `nums` below

nums = [-2.4, 1.2, 5.7, -5, 4.3]



In [None]:
# EXERCISE 
# Use list comprehension syntax with if to construct a list that squares each positive element of the list `nums` below
# and ignores all negative items

nums = [-2.4, 1.2, 5.7, -5, 4.3]



In [None]:
# PROBLEM 
# Write a Python program to print the mean, median, and mode of a list `nums` of arbitrary length.
# Have it print the output nicely using f-strings. 

nums = [1, 1, 3, 5.7, 7, 8.3, 4.6, 5] 






### 2.3.3 `while` loop exercises <a name="while"></a>

In [None]:
# PROBLEM 
# Write a while loop that prints the Fibonacci sequence up to 100.








## 2.4 Accessing packages <a name="packages"></a>

You can access packages using
* `import...`: to import a package under its original name
* `import...as...`: to import a package under a new name
* `from...import...`: to import particular functions from a package into your namespace without importing the whole package. 

Technically, there is one more way but the Python community will be very, very angry with you if you do this (and you **don't** want to see the Python community get angry).

* `from...import *` to import all functions from a package into your namespace.
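
The three recommended forms can be sketched with the standard library's `math` module. The alias `m` below is an arbitrary choice made for this example.

```python
# Import the whole module under its original name.
import math

# Import the module under a shorter alias.
import math as m

# Import one specific function directly into the current namespace.
from math import sqrt

print(math.sqrt(16))   # accessed through the module name
print(m.sqrt(16))      # accessed through the alias
print(sqrt(16))        # accessed directly
```

One reason the star form is discouraged is that it can silently overwrite existing names. For example, `from math import *` replaces the built-in `pow` with `math.pow`.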
    
The Python distribution comes with a lot of submodules. To get used to accessing packages and looking things up we'll play with two.

### 2.4.1 e.g. `itertools` <a name="itertools"></a>

This is a package for common tasks one encounters when iterating over a collection of objects.

In [None]:
# EXERCISE 
# Find a function in the itertools package that allows you to iterate over every combination of one element in `A` and one in `B` 
# (e.g. (1, 'y')). Import it and use list comprehension to construct a list of all combinations.

A = [1, 2, 3]
B = ['x', 'y', 'z']







## 2.5 Introduction to `numpy` <a name="intro-numpy"></a>
    
`numpy` is the heart of using Python for fast, efficient data processing. Python has many advantages, but it was not optimized for data manipulation; instead it was optimized for ease of use and flexibility. Specifically, the way that Python stores collections such as lists is not efficient for computation. Each element of a list is really a "pointer" to a memory location that contains the object in question. These locations could be all over the place, so operations that manipulate many elements at once are slow: the program must constantly jump around memory to complete the task.
    
`numpy` introduces the notion of an `array` which stores elements sequentially in memory for fast processing. These arrays are typically numbers but they can contain any other type of python object and even custom types (see [dtype section](#dtype)). Furthermore, these arrays can be _multidimensional_ (e.g. 2D like a matrix, 3D etc.) 
    
In addition to `arrays`, numpy comes with efficient versions of many basic mathematical functions (e.g. `np.cos`, `np.exp`, etc.) and constants (`np.pi` and `np.e`)
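
The sequential storage described above can be inspected directly. The sketch below assumes `numpy` is installed and uses made-up values. It requests `float64` explicitly so that the byte counts are the same on every platform.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]    # a plain Python list of float objects
arr = np.array(values, dtype=np.float64)   # the same numbers in one block of memory

print(arr.itemsize)                # bytes used by each element: 8
print(arr.nbytes)                  # total bytes of data: 6 * 8 = 48
print(arr.strides)                 # bytes to step to reach the next element: (8,)
print(arr.flags["C_CONTIGUOUS"])   # True: the elements sit side by side

# Operations are applied to the whole block at once, without a Python loop.
doubled_list = [2 * v for v in values]
doubled_arr = 2 * arr
print(doubled_arr.tolist() == doubled_list)   # True
```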
    
    
Much of what is below comes from existing websites
* [Numpy Beginners Tutorial](https://numpy.org/doc/stable/user/absolute_beginners.html)
* [Numpy Tutorial on Linear Algebra](https://numpy.org/doc/stable/user/tutorial-svd.html)
    
If you are familiar with MATLAB you may want to look at [NumPy for MATLAB users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html).
    
Typically when numpy is imported it is renamed `np` so that you aren't stuck typing `numpy` every time you want to do anything.

In [None]:
# Let's go!

import numpy as np

### 2.5.1 Creating `numpy` arrays <a name="create-arrays"></a>
    
There are many ways to create numpy arrays:
* `np.array(.)` 
* `np.zeros(.)` 
* `np.ones(.)`
* `np.full(.)`
* `np.empty(.)` 
* `np.arange(.)`
* `np.linspace(.)`
    
`np.array(.)` converts a list (or another sequence) to an array

In [None]:
# Creating an array from a list

a = np.array([1, 2, 3, 4, 5, 6])
a

In [None]:
# They can be indexed just like lists

print(a[0:4])
print(a[-1])

They can also be indexed in really useful ways that lists can't. For example, if you wanted to spit out the 2nd, 4th, and 5th elements of a list (indices 1, 3, and 4) you might try

In [None]:
# Oops: a plain list cannot be indexed with a list of positions (this raises a TypeError)

a = [10,20,30,40,50]
a[[1,3,4]]

In [None]:
# But if you make `a` an array....

a = np.array([10,20,30,40,50])
a[[1,3,4]]

Comparison operators such as `==` and `>` also act _on each element_ of the array **and** array subscripting accepts a Boolean array of equal length. Putting these two facts together, you can select elements of the array using conditions.

In [None]:
print(a > 20)
print(a[a>20])

If you want to know whether _any_ elements of the array or _all_ elements of the array meet a criterion, use the `.any()` or `.all()` methods.

In [None]:
print((a > 20).any())
print((a > 20).all())

In [None]:
# You can use a list of lists to create a multidimensional array
# NOTE: each of the sub-lists MUST be the same length.

a = np.array([[1, 2, 3], [10, 12, 13]])
a

In [None]:
# Indexing uses two coordinates ARRAY[ROW, COL]
# can also ask numpy to spit out an entire row or column using `:`

print(a[0,1])
print(a[1,:])
print(a[:,2])

`np.ones(.)`, `np.zeros(.)`, `np.empty(.)` take a number or a tuple giving the shape of the array to create

In [None]:
# ones and zeros do exactly what you might expect
np.ones(6)

In [None]:
np.ones((3,2))

`np.empty(.)` is a bit more subtle. Use this command if you know you'll be replacing the elements. It just creates an array without clearing memory, so the numbers are garbage. It's faster than the other operations.

In [None]:
np.empty(10)

`np.full(., .)` creates arrays filled with a specific value.

In [None]:
np.full(10, 3.14)

In [None]:
# All of these functions have a function_like version. Use jupyter's `Shift+TAB` or `Contextual Help` features to figure out what they do.

np.??????

You can also generate sequential numbers in one of two ways: `np.arange(.)` and `np.linspace(.)`. `np.arange(.)` works just like `range` but it creates a numpy array. Just like `range`, the initial number is included but the final number is not.

In [None]:
# np.arange(START, STOP, STEP)

np.arange(1,15,1)

In [None]:
np.arange(1,15,2)

`np.linspace(START, END, LENGTH)` is similar but you specify how long you'd like the array to be with the third argument. Also note that the linspace command creates an array that **includes** the initial and final values. 

In [None]:
np.linspace(1, 15, 100)

`numpy`'s random subpackage also includes ways of creating arrays with random values. 

* `np.random.rand(D0, D1, ...)`: create an array with the given dimensions, filled with numbers randomly drawn from the uniform distribution between 0 and 1
* `np.random.normal(MEAN, STD, SIZE)`: create an array of numbers randomly drawn from the normal distribution with mean = MEAN and standard deviation = STD

In [None]:
np.random.rand(10)

In [None]:
np.random.normal(1, 1, 10)

In [None]:
# EXERCISE
# Create some arrays and use the Boolean subscripting (e.g. a[a >20]) to select elements of the array. 






### 2.5.2 Basic operations with numbers <a name="numbers"></a>
    
You can do standard arithmetic with numbers and arrays (`+`, `*`, etc.). This applies the operation pointwise across each element of the array

In [None]:
10*np.arange(1,10)

In [None]:
10*np.arange(1,10) + 1

You can also apply your own functions across an array, as long as they are built from operations that work elementwise.

In [None]:
def squarer(x):
    # `**` acts elementwise, so this works on numbers and arrays alike
    return x**2

a = np.arange(1,100)
squarer(a)

### 2.5.3 Array properties and dtype <a name="dtype"></a>

To find the shape and number of dimensions of an array you can use the `ARRAY.shape` and `ARRAY.ndim` attributes.

In [None]:
a = np.empty((10,5))

print(a.shape)
print(a.ndim)

In order to store arrays efficiently, numpy needs to know something about what type of data is in an array. This is called the array's `dtype`. To access an array's dtype use the `ARRAY.dtype` attribute.
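
As a rough illustration of how the dtype follows from the data, the sketch below builds a few small arrays and converts one of them. The array names and values are invented for this example.

```python
import numpy as np

whole = np.array([1, 2, 3], dtype=np.int32)   # dtype requested explicitly
real = np.array([1.5, 2.5, 3.5])              # floats default to float64
flags = np.array([True, False, True])         # Booleans get dtype bool

print(whole.dtype, real.dtype, flags.dtype)

# astype returns a new array converted to the requested dtype.
converted = whole.astype(np.float64)
print(converted.dtype, converted)

# Mixing ints and floats in one array upcasts everything to float.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```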

In [None]:
a.dtype

In [None]:
# EXERCISE
# Try to interpret the number in the dtype for the array below

np.array(['hi', 'there'])

### 2.5.4 Manipulating arrays <a name="manipulating"></a>
    
Numpy provides **a lot** of ways to manipulate arrays. [Check out the API reference here.](https://numpy.org/devdocs/reference/routines.array-manipulation.html) Here we will only look at a few key ones
    
* `ARRAY.reshape(NEWSHAPE)`, `ARRAY.flatten()`, `np.transpose(ARRAY)`
* `np.concatenate(.)`, `np.hstack(.)`, `np.c_`, `np.vstack(.)`, `np.r_`, `np.split(.)`
* `ARRAY.sort()`, `ARRAY.argsort()`, `ARRAY.searchsorted(VALUE)`
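
Two of the functions in this list, `np.transpose` and `np.split`, can be sketched quickly on a small made-up array:

```python
import numpy as np

grid = np.arange(6).reshape((2, 3))   # 2 rows, 3 columns
print(grid)

# Transposing swaps rows and columns; `grid.T` is shorthand for np.transpose(grid).
flipped = np.transpose(grid)
print(flipped.shape)                  # (3, 2)

# np.split cuts an array into equally sized pieces and returns them in a list.
pieces = np.split(np.arange(6), 3)
print(pieces)
```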

In [None]:
a = np.arange(0,10)
a

In [None]:
b = a.reshape((5,2))
b

In [None]:
c = b.reshape((2,5))
c

If you want to "flatten" a multidimensional array to a 1D array, you can use `ARRAY.reshape(-1)` or `ARRAY.flatten()`

In [None]:
print(f"flatten\n{c.flatten()}\n")
print(f"reshape\n{c.reshape(-1)}\n")

In [None]:
# np.concatenate, np.vstack and np.r_ all allow you to combine arrays "vertically" (along axis 0)

a = np.ones((3, 2))
b = np.ones((3, 2))

print(f"concatenate\n {np.concatenate([a, b], axis=0)}\n")
print(f"vstack\n{np.vstack([a, b])}\n")
print(f"r_\n{np.r_[a, b]}\n")

In [None]:
# np.concatenate, np.hstack, and np.c_ all allow you to combine arrays "horizontally" (along axis 1)

a = np.ones((3, 2))
b = np.ones((3, 2))

print(f"concatenate\n {np.concatenate([a, b], axis=1)}\n")
print(f"hstack\n{np.hstack([a, b])}\n")
print(f"c_\n{np.c_[a, b]}\n")

In [None]:
# EXERCISE
# Build a 2D array where the first row is the first 10 multiples of 1, 
# the second is the first 10 multiples of 2, 
# ..., and the last row is the first 10 multiples of 5. 










To sort arrays use the `ARRAY.sort()`, `ARRAY.argsort()`, and `ARRAY.searchsorted(VALUE)` methods. `ARRAY.sort()` **does not return anything**. It simply sorts the array and stores the sorted version under the same name (i.e. it acts "in-place").
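
The in-place behaviour is easiest to see next to `np.sort`, which returns a sorted copy instead. The array below is a made-up example.

```python
import numpy as np

scores = np.array([3, 7, 4, 6, 1])

# np.sort returns a sorted copy and leaves the original untouched.
sorted_copy = np.sort(scores)
print(scores)        # still in the original order
print(sorted_copy)

# ARRAY.sort() sorts in place and returns None.
result = scores.sort()
print(result)        # None
print(scores)        # now sorted
```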

In [None]:
a = np.array([3, 7, 4, 6, 1, 0, 8, 9, 3])

a.sort()
a

There is a command to randomly shuffle arrays that operates in the same way.

In [None]:
# Try re-running this to see that `a` is shuffled differently each time. 

np.random.shuffle(a)
a

`np.argsort(ARRAY)` returns an array of indices that _would_ sort `ARRAY`
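
A minimal sketch of `argsort`, together with `searchsorted` from the list at the start of this section, using invented numbers:

```python
import numpy as np

heights = np.array([30, 10, 40, 20])

order = np.argsort(heights)   # positions of the smallest, next smallest, ...
print(order)                  # [1 3 0 2]
print(heights[order])         # indexing with `order` gives the sorted array

# searchsorted expects an already sorted array and returns the position
# at which a new value would have to be inserted to keep it sorted.
sorted_heights = heights[order]
print(np.searchsorted(sorted_heights, 25))   # 2
```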

In [None]:
a = 10*np.arange(0,9)
np.random.shuffle(a)
a

In [None]:
# EXERCISE
# Using argsort, sort the names according to their birth month.

names = ['Anne', 'Abel', 'Adam', 'Ali', 'Allison']
birth_months = [2, 7, 1, 11, 5]





### 2.5.5 Dealing with NaN <a name="NaN"></a>

`numpy` includes functionality for missing data via `np.nan` (Not a Number) and for infinite values via `np.inf` (`np.Infinity` is an older alias that NumPy 2.0 removed). Arrays can include these values alongside ordinary numbers. Numpy also includes many functions for dealing with arrays that have NaNs:

* `np.nan_to_num(ARRAY)`: replaces `np.nan` with 0 and `np.inf` with a very large finite number. (You can change the replacement for NaN with the `nan=` keyword argument.)
* `np.nanmean(ARRAY)`: mean ignoring nans
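
A few related tools are sketched below, assuming a small made-up array with one missing value. `np.isnan`, `np.nansum`, and `np.nanmax` are standard `numpy` functions, but they are not otherwise covered in this notebook.

```python
import numpy as np

readings = np.array([1.0, np.nan, 3.0])

# NaN is not equal to anything, including itself, so use np.isnan to find it.
print(np.nan == np.nan)      # False
missing = np.isnan(readings)
print(missing)               # [False  True False]

# A Boolean mask can drop the missing entries.
print(readings[~missing])    # [1. 3.]

# Several reductions have NaN-ignoring versions.
print(np.nansum(readings))   # 4.0
print(np.nanmax(readings))   # 3.0
```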



In [None]:
a = np.array([1.2, 3.4, np.nan, 10.1])

In [None]:
np.nan_to_num(a)

In [None]:
np.nan_to_num(a, nan=100)

In [None]:
# taking the regular mean results in a nan value

a.mean()

In [None]:
# using nanmean ignores the np.nan values.

np.nanmean(a)

### 2.5.6 Univariate operations <a name="univariate-ops"></a>

Numpy provides a **lot** of functions. Go to [the numpy reference](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs) to see what is available. The most important ones are `np.mean`, `np.sum`, `np.max`, `np.min`, etc.

**EXERCISE**

Find the values of 

$$2\cos(x/\pi +3) -5$$

for 

$$ x = 0, .5, 1.0, 1.5, \dots, 12$$

In [None]:
# EXERCISE
# Find the maximum value of each column of the array below. Also find the maximum of each row

A = np.array([[1, 3, 4, 6, 2],
             [ 4, 10, -5, 3, 4], 
             [-1, 2, 4, 1, 7]])









### 2.5.7 Bivariate operations <a name="bivariate-ops"></a>

`numpy` also provides a number of ways to combine arrays. Standard operators like `*` and `+` operate _pointwise_. In other words, the two arrays must be the same size, and these operators combine each pair of corresponding entries.

In [None]:
# EXERCISE
# Define two 2D arrays of the same size and combine them using pointwise operators









Of course, very often we want to do _matrix_ multiplication. This is represented in Python 3 by the `@` symbol. 
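
To see how `@` differs from the pointwise `*`, here is a small sketch with two hand-picked 2x2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

pointwise = A * B   # multiplies matching entries
matrix = A @ B      # rows of A times columns of B

print(pointwise)    # [[0 2]
                    #  [3 0]]
print(matrix)       # [[2 1]
                    #  [4 3]]
```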

In [None]:
a = np.random.rand(4,3)
a

In [None]:
b = np.arange(1,4)
b

In [None]:
a @ b

### 2.5.8 Subtleties: views vs copies <a name="subtleties"></a>

* `np.copy(.)`
* `np.split(.)` vs `np.array_split(.)`
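
The notebook does not elaborate on these two bullets, so the following is only a sketch of one common reading of them: slices share memory with the original array while `np.copy` does not, and `np.array_split` tolerates pieces of unequal size.

```python
import numpy as np

original = np.arange(6)

# Slicing returns a view: it shares memory with the original array.
view = original[:3]
view[0] = 100
print(original)       # the first element is now 100

# np.copy returns an independent array.
duplicate = np.copy(original)
duplicate[1] = -1
print(original[1])    # still 1

# np.array_split allows pieces of unequal size.
parts = np.array_split(np.arange(7), 3)
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4], [5, 6]]
```

By contrast, `np.split(np.arange(7), 3)` raises a `ValueError` because seven elements cannot be divided into three equal pieces.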

### 2.5.9 `numpy` exercises <a name="numpy-exercises"></a>

In [None]:
# create three matrices and concatenate them

## 2.6 Introduction to `scipy` <a name="intro-scipy"></a>
    
`scipy` is a package which contains a number of convenience functions for mathematics and data analysis:
    
* **scipy.cluster**: Vector quantization / Kmeans
* **scipy.constants**: Physical and mathematical constants
* **scipy.fftpack**: Fourier transform
* **scipy.integrate**: Integration routines
* **scipy.interpolate**: Interpolation
* **scipy.io**: Data input and output
* **scipy.linalg**: Linear algebra routines
* **scipy.ndimage**: n-dimensional image package
* **scipy.odr**: Orthogonal distance regression
* **scipy.optimize**: Optimization
* **scipy.signal**: Signal processing
* **scipy.sparse**: Sparse matrices
* **scipy.spatial**: Spatial data structures and algorithms
* **scipy.special**: Any special mathematical functions
* **scipy.stats**: Statistics
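
As one hedged illustration of how these submodules are used, the sketch below picks two of them, `scipy.constants` and `scipy.integrate`. The choice of submodules and of the integrand is arbitrary and not taken from the course material.

```python
from scipy import constants, integrate

# Speed of light in vacuum, in metres per second.
print(constants.c)

# Numerically integrate x**2 from 0 to 1; the exact answer is 1/3.
value, error_estimate = integrate.quad(lambda x: x**2, 0, 1)
print(value, error_estimate)
```

Importing the submodules explicitly, as done here, is a common pattern and works across `scipy` versions.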

In [None]:
import scipy as sp

### 2.6.1 `scipy` exercises <a name="scipy-exercises"></a>

In [None]:
# EXERCISE
# scipy show and tell: Pick a submodule of interest, use the jupyter autocomplete functionality or the scipy docs to find a
# function of interest, understand its arguments, and try to do a calculation using that function. Play around with the arguments.







