# A recap of what we've seen so far


## Variables

* Numbers (integers and floats)


In [None]:
# Defining numbers
x = 5
y = 2.5

# Operations on numbers
z = x + y
w = z**2

# other functions defined in math module (also in numpy)
import math
g = math.exp(w)
h = math.sin(math.sqrt(g))


* Boolean (True and False values)

In [None]:
# Defining booleans using True and False
b1 = True
b2 = False

# Defining booleans using conditional expressions:
b3 = 5 > 6
print("b3 = " + str(b3))

# Combining conditional expressions using logical operators
b4 = (x > y) or (z == 10)
print("b4 = " + str(b4))

b5 = (x > y) and (z == 10)
print("b5 = " + str(b5))

* Strings

In [None]:
msg = "Hello World"
print(msg)

# concatenating strings with +
msg1 = "This is"
msg2 = "a string"
msg3 = msg1 + " " + msg2
print(msg3)

# repeating strings with *
msg4 = "swo"
print(msg4*5)

### Collections of values

* `numpy` arrays and matrices

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# a numpy array from a list of numbers
a = np.array([1, 2, 3, 4, 5])
print("a = " + str(a))

In [None]:
# a numpy array using some built-in functions

# numbers in [0, 10), stepping by 2
b = np.arange(0, 10, 2)
print("b = " + str(b))

# 3 equally spaced numbers in [0, 10]
c = np.linspace(0, 10, 3)
print("c = " + str(c))

In [None]:
# 3 logarithmically spaced numbers in [10^0, 10^2]
d = np.logspace(0, 2, 3)
print("d = " + str(d))

In [None]:
# 100 samples from N(5, 2^2)
samples = np.random.normal(5, 2, 100)
print(samples)
plt.hist(samples)

In [None]:
# 20 uniform samples between 0 and 1
samples = np.random.random(20)
print(samples)

In [None]:
# 100 uniform samples between 4 and 10
samples = (10-4)*np.random.random(100) + 4
print(samples)
plt.hist(samples)

In [None]:
# A 5x2 random matrix with iid N(0, 1) entries
A = np.random.normal(0, 1, [5,2])
print(A)

In [None]:
# A 7 x 3 all-zeros matrix (that we may fill up later)
Z = np.zeros([7,3])
print(Z)

In [None]:
# numpy array indexing

# set the 0-th column to all 1s
Z[:, 0] = 1

# set the 1-st column to random numbers
Z[:, 1] = np.random.random(7)

# set the 2-nd column to be the 0-th column + 2 times the 1-st column
Z[:, 2] = Z[:,0] + 2*Z[:, 1]

# Set the 3-rd row to all zeros
Z[3,:] = np.zeros([1, 3])

# Set the element in the 5-th row, 2-nd column to -10
Z[5, 2] = -10

print(Z)


### To Do (Data Types):
* Other collections including Lists, Dictionaries, Tuples and Sets
* List comprehensions
* Classes

## Functions
* Reusable blocks of code
* Defining functions using the `def` keyword and supplying an argument list
* Returning a value using the `return` keyword
* Specifying optional arguments and default values

In [None]:
def gaussian_rbf(x, mu = 0, sigma = 1):
    return np.exp(-(x-mu)**2/(2.0*sigma**2))

In [None]:
x = np.linspace(-10, 10, 100)
y = gaussian_rbf(x)
plt.plot(x, y)

In [None]:
y = gaussian_rbf(x, mu = -5)
plt.plot(x, y)

In [None]:
y = gaussian_rbf(x, sigma = 3)
plt.plot(x, y)

In [None]:
y = gaussian_rbf(x, mu = 5, sigma = 3)
plt.plot(x, y)

### To Do (Functions):
* Other ways to return values
* Function scope
* Recursion
* Parallelization

## Control Structures

### Conditional execution
We saw how to use boolean expressions to decide whether to execute certain parts of our code, using the `if ... elif ... else` structure:

```
if (condition):
    ...
elif (another condition):
    ...
elif (another condition):
    ...
else:
    ...
```

In [None]:
# a plain if statement (run several times)
p = np.random.random()

if(p < 0.25):
    print("p = " + str(p))

In [None]:
# an if-else statement - a biased coin flip
p = np.random.random()

if(p < 0.45):
    print("Heads")
else:
    print("Tails")

In [None]:
# an if ... elif ... else statement - a biased die roll

p = np.random.random()

if(p < 0.1):
    print("1")
elif(p < 0.2):
    print("2")
elif(p < 0.3):
    print("3")
elif(p < 0.4):
    print("4")
elif(p < 0.5):
    print("5")
else:
    print("6")

### Exercise (1D random walk)

Imagine a particle on the real number line at position $x$. Every second, it takes a step
$$ x \rightarrow x \pm s$$
for some step size $s$. The sign of the step (i.e. the $\pm$ above) determines the direction the particle moves. If it moves (for example) like:
$$ x \rightarrow x + s$$
this means it moves to the *right* by an amount $s$. If instead, it moves like:
$$ x \rightarrow x - s$$
it moves to the *left*.

At each step, we'll think of the direction (either left or right) taken as a random variable: with some probability $p$ it steps to the right, and with probability $1-p$ it steps to the left. 

Write a function `random_walk` to perform a 1D random walk, starting at the position $x = 0$. The function should take the following arguments as input:
* the number of steps to take, $n$
* an optional step size $s$, with default value 1
* an optional number $p$ describing the probability that the particle will move right versus left, with default value of 0.5

The function should return the sequence of positions $(x_i)$ the particle visits during its random walk.

To help with this, define a function `bernoulli_rv` that takes in a probability $p$ and returns a 1 with probability $p$ and a -1 with probability $1-p$.

In [None]:
def bernoulli_rv(p):
    # return 1 with probability p, -1 with probability 1-p
    pass  # placeholder so the cell runs; replace with your implementation


def random_walk(n, s=1, p = 0.5):
    positions = np.zeros(n)
    
    # calculate the position of the particle at each of the n steps
               
    return positions


pos = random_walk(1000, p = 0.5)
plt.plot(pos)
plt.xlabel('Time')
plt.ylabel('Position')
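One possible way to fill in the skeleton is sketched below. It is not the only solution. It assumes that `positions[i]` stores the position after step `i+1`, so the starting point $x = 0$ is not itself stored; this matches the length-`n` array in the skeleton but is an interpretation of the exercise.

```python
import numpy as np

def bernoulli_rv(p):
    # np.random.random() is uniform on [0, 1), so it is below p with probability p
    if np.random.random() < p:
        return 1
    else:
        return -1

def random_walk(n, s=1, p=0.5):
    # Assumption: positions[i] is the position after step i+1 (the start x = 0 is not stored)
    positions = np.zeros(n)
    x = 0
    for i in range(n):
        # step right by s with probability p, left by s otherwise
        x = x + s * bernoulli_rv(p)
        positions[i] = x
    return positions

pos = random_walk(1000)
print(pos[:10])
```

With these definitions the plotting lines in the cell above can be run unchanged.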


### Exercise
Write a function `sample_rw_params` that performs $m$ random walks, each using the default values for $p$ and $s$. For each random walk, calculate and save the final position. After the $m$ random walks, calculate the mean and variance of these final positions and return them as a pair of numbers. The function should accept the following input arguments:

1. $m$ the number of random walks to perform
2. $n$ the number of steps each random walk should take
3. `plot_hist`, an optional Boolean (default: `False`) that indicates if we should plot the histogram of the final position after the $m$ random walks


In [None]:
def sample_rw_params(m, n, plot_hist=False):
    
    # Calculate mu and sig2 here
    
    return mu, sig2


In [None]:
mu, sig2 = sample_rw_params(500, 1000, plot_hist=True)
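A minimal sketch of one way to complete `sample_rw_params` follows. To keep the block self-contained it includes a vectorized stand-in for `random_walk` (an assumption; your own loop-based version works just as well), and it imports `matplotlib` only when a histogram is requested.

```python
import numpy as np

def random_walk(n, s=1, p=0.5):
    # Stand-in for the exercise's random_walk (assumed, not the original):
    # draw n steps of +s (probability p) or -s, then accumulate them
    steps = np.where(np.random.random(n) < p, s, -s)
    return np.cumsum(steps)

def sample_rw_params(m, n, plot_hist=False):
    # final position of each of the m walks, using the default s and p
    finals = np.array([random_walk(n)[-1] for i in range(m)])
    mu = np.mean(finals)
    sig2 = np.var(finals)
    if plot_hist:
        import matplotlib.pyplot as plt  # only needed when plotting
        plt.hist(finals)
        plt.xlabel('Final position')
        plt.ylabel('Count')
    return mu, sig2

mu, sig2 = sample_rw_params(200, 400)
print("mu = " + str(mu) + ", sig2 = " + str(sig2))
```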

# Python Lists and List Comprehensions

* Python lists are a **built-in data type** (i.e. they are part of the Python language itself, and are not defined in some external library the way e.g. `numpy` arrays are).

* Python lists can be lists of Numbers (like `numpy` arrays), Booleans, Strings, or any other data type. Because of this, we think of a list as a **Collection** data type.



In [None]:
# A list of numbers
number_list = [0, 10, 20, 30, 40, 50]

# A list of strings
string_list = ["A", "B", "CDEFG"]

# A list of lists
list_list = [number_list, string_list]
print(list_list)

# An empty list
empty_list = []
print(empty_list)

In [None]:
# A list of numpy arrays
a1 = np.random.random(5)
a2 = np.random.random(10)
a3 = np.random.random([5, 10])
np_list = [a1, a2, a3]
print(np_list)

Like 1D `numpy` arrays, we can index into a list to extract individual elements:

In [None]:
z = number_list[2]
print("2-nd element in number_list = " + str(z))

Like 1D `numpy` arrays, lists are **0-indexed** meaning, if a list has e.g. 6 entries, they are indexed by 0, 1, 2, ..., 5, respectively.

In [None]:
print(number_list[0])

In [None]:
# This will result in an error
print(number_list[6])

Like 1D `numpy` arrays, we can use "slice" notation to extract a range of list entries.

In [None]:
# extract the elements at indices 0, 1, 2 (but not 3)
number_list[0:3]

In [None]:
# extract the elements at indices 3, 4, 5 (but not 6, since there is no element at index 6)
number_list[3:6]

In general, for `0 <= m <= n <= len(number_list)`, `number_list[m:n]` extracts `n-m` elements, starting with (and including) the element at index `m` and stopping just before index `n`.
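The slicing rule can be checked directly. The short sketch below redefines `number_list` so that it runs on its own, and also shows two standard slice behaviours not covered above: an out-of-range end is clipped rather than raising an error, and either bound may be omitted.

```python
number_list = [0, 10, 20, 30, 40, 50]

m, n = 1, 4
sub = number_list[m:n]
print(sub)                 # [10, 20, 30]
print(len(sub) == n - m)   # True

# Unlike indexing, slicing past the end is clipped instead of raising an error
print(number_list[4:10])   # [40, 50]

# Omitting m means "from the start"; omitting n means "to the end"
print(number_list[:2])     # [0, 10]
print(number_list[4:])     # [40, 50]
```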

All of this applies to any Python list:

In [None]:
list_list[1]

In [None]:
string_list[1:3]

In [None]:
list_2d = [ [1, 2, 3], [4, 5, 6], [7, 8, 9]]
list_2d[0:2]

Unlike `numpy` 2D arrays, we can't use the notation `arr[i,j]` to get the `j`-th element of the `i`-th list of a list of lists.

In [None]:
# this results in an error
list_2d[0, 2]

In [None]:
# instead you have to use
list_2d[0][2]

In [None]:
# that is, first get the 0-th list of list_2d (shown here), then take the 2-nd element of that list
list_2d[0]

## Adding elements to and removing elements from a list

* Given a list, you can add an element to the end of the list using the `.append()` function:

In [None]:
number_list.append(10)
print(number_list)

* You can remove an element using the `.remove(x)` function. It will remove the first instance of `x` in the list.

In [None]:
number_list.remove(10)
print(number_list)

* You can remove and return the element at index `i` using the `.pop(i)` function. If you don't specify `i`, it removes the last element, so in that sense it is the opposite of the `.append()` function.

In [None]:
el = number_list.pop()
print(number_list)
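The cell above only uses `.pop()` without an argument. As a small illustration of both forms, here is a sketch on a separate list (`letters` is a name made up for this example):

```python
letters = ["a", "b", "c", "d"]

# remove and return the element at index 1
second = letters.pop(1)
print(second)    # b
print(letters)   # ['a', 'c', 'd']

# with no index, pop() removes and returns the last element
last = letters.pop()
print(last)      # d
print(letters)   # ['a', 'c']
```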

## List comprehensions

List comprehensions are a powerful way of creating lists. Consider the following `for`-loop to calculate the squares of all numbers between 0 and 9:

In [None]:
squares = []
for i in range(10):
    squares.append(i**2)
print(squares)

An alternative way to do this is to use **list comprehensions**:

In [None]:
squares2 = [i**2 for i in range(10)]
print(squares2)

List comprehensions are a concise way of building up lists. Because of their conciseness, they are often considered more "**Pythonic**".

List comprehensions can be nested, like nested for loops. Consider the following nested loop to create a list of lists:

In [None]:
products = []
for i in range(4):
    products_i = []
    for j in range(4):
        products_i.append(i*j)
    products.append(products_i)
    
print(products)

In [None]:
products2 = [ [i*j for j in range(4)] for i in range(4)]
print(products2)

The elements that go into the list (e.g. `i**2` or `i*j`) can be any valid Python expression. Typically this expression will use `i` or `j` in some way.

The use of `range(4)` can be replaced by anything that can be iterated over. In addition to ranges, you could use other lists or even `numpy` arrays.
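As a small sketch of iterating over something other than a range, the comprehensions below loop over a list of strings and over a `numpy` array. The names `words` and `angles` are made up for this example.

```python
import numpy as np

# iterate over a list of strings
words = ["alpha", "beta", "gamma"]
lengths = [len(w) for w in words]
print(lengths)   # [5, 4, 5]

# iterate over the entries of a 1D numpy array
angles = np.linspace(0, np.pi, 3)
sines = [np.sin(t) for t in angles]
print(sines)
```

Note that `sines` is a plain Python list, not a `numpy` array, even though it was built from one.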

### Exercise

For 10 equally spaced values $x_i$ between 0 and $2\pi$, form a list of lists representing the design matrix

$$\Phi_{i,j} = \sin(j\cdot x_i) $$

for $j = 0, 1, \ldots, 3$.

That is, your list `Phi_list` should be a list of lists where the `j`-th entry of the `i`-th list is `Phi_list[i][j] = sin(j*x_i)`.
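One possible solution is sketched below using a nested list comprehension. It assumes that "between 0 and $2\pi$" includes both endpoints, which is what `np.linspace` produces by default.

```python
import numpy as np

# 10 equally spaced values in [0, 2*pi], endpoints included (an assumption)
x = np.linspace(0, 2 * np.pi, 10)

# outer comprehension: one inner list per x_i; inner comprehension: j = 0, 1, 2, 3
Phi_list = [[np.sin(j * x_i) for j in range(4)] for x_i in x]

print(len(Phi_list))      # 10
print(len(Phi_list[0]))   # 4
```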

Because `numpy` arrays can be built from Python lists, list comprehensions can be used to concisely specify `numpy` arrays.

In [None]:
Phi = np.array(Phi_list)
print(Phi)

Consider the following loop:

In [None]:
bad_words = ['a', 'f', 's']

user_post = ['i', 'f', 'b', 'c', 's', 'd', 'a']

filtered_post = []
for word in user_post:
    if word not in bad_words:
        filtered_post.append(word)
        
print(filtered_post)


We can construct `filtered_post` with a list comprehension using the **`if` clause**:

In [None]:
filtered_post = [word for word in user_post if word not in bad_words]

print(filtered_post)

The `if` clause in a list comprehension filters the items of `user_post` that end up in the final list. Here, `word` is added to the list only for those `word`s in `user_post` that satisfy the condition `word not in bad_words`.

In general, list comprehensions take the form:
`my_list = [ f(x) for x in A if c(x) ]`

where `f(x)` is some transformation or other expression that depends on `x`, `A` is the collection of all `x` that can possibly contribute to the list, and `c(x)` is a boolean expression that determines whether the item for `x` is included in the list.
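To make the general form concrete, the sketch below picks illustrative choices for `f`, `A` and `c` (squares of the even numbers below 10) and compares the comprehension with the equivalent `for` loop.

```python
A = range(10)

def f(x):
    # the transformation applied to each included x
    return x**2

def c(x):
    # the filter: keep only even x
    return x % 2 == 0

my_list = [f(x) for x in A if c(x)]
print(my_list)   # [0, 4, 16, 36, 64]

# the same list built with an explicit loop
loop_list = []
for x in A:
    if c(x):
        loop_list.append(f(x))
print(loop_list == my_list)   # True
```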

### Exercise
Load the dataset in `lec6.txt`. This dataset contains rows with missing data: some of the entries will be *Not-a-Number* (`NaN`). Use a list comprehension to filter out the rows that contain missing entries. For a `numpy` array `x`, you can use the condition `np.any(np.isnan(x))` to check whether any of its entries are `NaN`.
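The filtering step might look like the sketch below. Because the format of `lec6.txt` is not described here, the loading step is left out and a small hand-made array (`data`, invented for this example) stands in for the dataset.

```python
import numpy as np

# Stand-in for the loaded dataset (made up for illustration): 4 rows, 2 of them with NaN
data = np.array([[1.0, 2.0],
                 [np.nan, 3.0],
                 [4.0, 5.0],
                 [6.0, np.nan]])

# iterating over a 2D array yields its rows; keep only rows without any NaN
clean_rows = [row for row in data if not np.any(np.isnan(row))]

# convert the list of rows back into a 2D numpy array
clean = np.array(clean_rows)
print(clean)
```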

### The 1D random walk using list comprehensions

If we notice that the final position of the 1D random walk after $n$ steps is the sum of $n$ Bernoulli random variables that take on values $\left\{-s, s\right\}$, we can calculate the distribution of final positions after $n$ steps using list comprehensions.

In [None]:
n = 1000
s = 1
m = 500

# Perform m random walks, each having n steps
R = [ np.sum( [s*bernoulli_rv(0.5) for j in range(n)] ) for i in range(m)]

print("mean = " + str(np.mean(R)))
print("var = " + str(np.var(R)))
plt.hist(R)

This actually consists of two list comprehensions.

The inner comprehension:

```
[s*bernoulli_rv(0.5) for j in range(n)]
```

creates a list of $n$ numbers in $\left\{-s, s\right\}$. 

We then take the sum of these values to get the final position:

```
np.sum( [s*bernoulli_rv(0.5) for j in range(n)] )
```

This is the result of a single random walk. We want to do `m` of them, so we form a list of `m` such sums:

```
[ np.sum( [s*bernoulli_rv(0.5) for j in range(n)] ) for i in range(m)]
```