In [1]:
import numpy as np
import time
import os

## Lists

A list is similar to an array in MATLAB.  A list can store items of different data types.  You should know the following list operations:

1.  Indexing into a list
2.  Inserting into a list
3.  List comprehension
4.  Slicing
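
As a quick orientation, the four operations above can be sketched on one small list. The values and variable names here (`nums`, `doubled`, `middle`) are illustrative assumptions, not part of the cells that follow.

```python
# Illustrative values only; these names do not appear elsewhere in the notebook.
nums = [10, 20, 30, 40]

# 1. Indexing: positions start at 0, negative indices count from the end
first = nums[0]       # 10
last = nums[-1]       # 40

# 2. Inserting: insert(index, value) shifts later items right; append adds at the end
nums.insert(1, 15)    # nums is now [10, 15, 20, 30, 40]
nums.append(50)       # nums is now [10, 15, 20, 30, 40, 50]

# 3. List comprehension: build a new list from an existing one
doubled = [2 * n for n in nums]

# 4. Slicing: the start index is included, the stop index is excluded
middle = nums[1:4]    # [15, 20, 30]

print(first, last)
print(nums)
print(doubled)
print(middle)
```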

In [16]:
# Create a list of integers
x = [1, 2, 3, 4]

# Index into the list
print("The element at position 3 is: {}".format(x[2]))

# Or from the end
print("The last element is: {}".format(x[-1]))

# Append to the list
x.append(3) # using the in-place append method
print(x)
print(x + ['oh', 'my', 'god']) # or by performing list addition

The element at position 3 is: 3
The last element is: 4
[1, 2, 3, 4, 3]
[1, 2, 3, 4, 3, 'oh', 'my', 'god']


In [33]:
# Create a list of consecutive integers
a = [1, 2, 3, 4, 5]

# or use range(start, stop, step); start defaults to 0, step to 1, and stop is excluded
a = list(range(1, 6))

# Now create another list of the same length
b = [3, 5, 6, 7, 3]

# Add the corresponding elements in each list
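# Option 1: The wrong way (+ concatenates lists instead of adding elementwise)
# Named `total` to avoid shadowing the built-in sum
total = a + b

print("a = {}".format(a))
print("b = {}".format(b))
print("a + b = {}".format(total))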

a = [1, 2, 3, 4, 5]
b = [3, 5, 6, 7, 3]
a + b = [1, 2, 3, 4, 5, 3, 5, 6, 7, 3]


In [23]:
# Option 2:  The lengthy way

# Create a function that sums two lists elementwise
# (named add_lists so it does not shadow the built-in sum)
def add_lists(a, b):
    # Use an assertion statement to enforce constraints on inputs
    assert len(a) == len(b), "Lengths don't match!"
    s = []
    for i in range(len(a)):
        s.append(a[i] + b[i])
    return s

print("a = {}".format(a))
print("b = {}".format(b))
print("a + b = {}".format(add_lists(a, b)))

# Now let's change a to [1, 2, 3, 4, 5, 6]; the last line then fails the length assertion
# a = list(range(1, 7))
# print("a = {}".format(a))
# print("b = {}".format(b))
# print("a + b = {}".format(add_lists(a, b)))

a = [1, 2, 3, 4, 5]
b = [3, 5, 6, 7, 3]
a + b = [4, 7, 9, 11, 8]


In [31]:
# Option 3: The slightly cooler way

# Use list comprehension

a = [i for i in range(1, 6)]
b = [3, 5, 6, 7, 3]

# Use the zip function to pair up corresponding elements of the two lists
# (named `total` to avoid shadowing the built-in sum)
total = [i + j for i, j in zip(a, b)]
print("a + b = {}".format(total))

a + b = [4, 7, 9, 11, 8]
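
One caveat with the `zip` approach is worth a small sketch: unlike the assertion in Option 2, `zip` does not complain when the lists have different lengths. It stops at the shorter one. The lists below are illustrative, and the explicit length check is just one possible safeguard.

```python
a = [1, 2, 3, 4, 5, 6]
b = [3, 5, 6, 7, 3]

# zip stops at the shorter input, so the trailing 6 in `a` is silently dropped
pairwise = [i + j for i, j in zip(a, b)]
print(pairwise)       # [4, 7, 9, 11, 8]

# An explicit check restores the guarantee that the assertion gave us
lengths_match = len(a) == len(b)
print(lengths_match)  # False
```

On Python 3.10 and later, `zip(a, b, strict=True)` raises a `ValueError` on mismatched lengths instead of truncating.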


**Exercise:** Square the elements of an integer list using a list comprehension.

**Exercise:** Given a list of even and odd natural numbers, create a new list in which every even number is replaced with 'even' and every odd number with 'odd'.

In [35]:
x = [1, 2, 3]
x[1] = 4

# Use tuples when the data should not change: tuples are immutable
x = (1, 2, 3)
x[1] = 4  # raises TypeError

print(x)

TypeError: 'tuple' object does not support item assignment
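
The error above can also be handled rather than left to stop the cell. The following is a minimal sketch, with illustrative names (`point`, `updated`), of catching the exception and of building a new tuple when a changed value is needed.

```python
point = (1, 2, 3)

try:
    point[1] = 4               # tuples do not support item assignment
except TypeError as err:
    print("Caught:", err)

# To "change" a tuple, build a new one from slices of the old one
updated = point[:1] + (4,) + point[2:]
print(point)    # (1, 2, 3)
print(updated)  # (1, 4, 3)
```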

In [48]:
# Slicing a list to obtain first 4 elements
x = [6, 2, 4, 6, 6, 4, 3, 7]
print(x[:4])
# Slice to obtain elements 5 to 7 (indices 4 to 6; the stop index -1 is excluded)
print(x[4:-1])

[6, 2, 4, 6]
[6, 4, 3]


## Numpy

Numpy is a scientific computing library in Python that offers a multitude of methods that manipulate multidimensional arrays.  The methods are very similar to those available in MATLAB.

"If you master linear algebra and probability then you are set for life" - Greg Wornell

For numerical work, my recommendation is to use numpy arrays instead of lists.  Vectorized array operations are typically much faster than the equivalent Python loops.
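
A rough way to see the speed difference is to time the same elementwise addition both ways. This is only a sketch: the array size `n` is an arbitrary choice, and the measured times depend on the machine and will differ from run to run.

```python
import time
import numpy as np

n = 200_000  # arbitrary size chosen for illustration
a_list = list(range(n))
b_list = list(range(n))
a_arr = np.arange(n)
b_arr = np.arange(n)

# Elementwise addition with a list comprehension
start = time.perf_counter()
list_sum = [i + j for i, j in zip(a_list, b_list)]
list_seconds = time.perf_counter() - start

# The same addition as a single vectorized numpy operation
start = time.perf_counter()
arr_sum = a_arr + b_arr
array_seconds = time.perf_counter() - start

print("list comprehension: {:.6f} s".format(list_seconds))
print("numpy addition:     {:.6f} s".format(array_seconds))
```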

In [49]:
# This is a list of lists or a matrix
x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(x)
print(x[2][2])

print("\nUsing Numpy\n")
# This can be created in numpy in one line
x = np.eye(3)
print(x)
print(x[2, 2])

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
1

Using Numpy

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
1.0


In [42]:
# Other useful methods
a = np.zeros((3, 3))
print(a)

b = np.ones((3, 4))
print(b)

c = np.random.random((3, 3))
print(c)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[0.22917935 0.73760941 0.50622467]
 [0.53396193 0.2129907  0.41130008]
 [0.39449403 0.74430247 0.28396855]]


In [51]:
# Create an array
a = np.array([[1.4, 2., 6], [4, 5, 100.3], [-2, 3, -5.5]])
print(a)

# Print shape of a
print("The shape of the array is {}".format(a.shape))

# Let's slice it up and print the first column
print(a[:, 0])

# Reshape the above slice into a column vector
print(a[:, 0].reshape(-1, 1))

# Pull out subarray containing first two rows and last two columns
print(a[:2, 1:])

[[  1.4   2.    6. ]
 [  4.    5.  100.3]
 [ -2.    3.   -5.5]]
The shape of the array is (3, 3)
[ 1.4  4.  -2. ]
[[ 1.4]
 [ 4. ]
 [-2. ]]
[[  2.    6. ]
 [  5.  100.3]]


In [54]:
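# Create an array with uniformly spaced elements between 0 and 1
# np.arange(start, stop, step) excludes the stop value, so this ends at 0.99
x = np.arange(0, 1, 0.01)
print(x)
# np.linspace(start, stop, num) includes both endpoints, so this ends at 1.0
y = np.linspace(0, 1, 101)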
print(y)

[0.   0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13
 0.14 0.15 0.16 0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27
 0.28 0.29 0.3  0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41
 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5  0.51 0.52 0.53 0.54 0.55
 0.56 0.57 0.58 0.59 0.6  0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69
 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8  0.81 0.82 0.83
 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94 0.95 0.96 0.97
 0.98 0.99]
[0.   0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13
 0.14 0.15 0.16 0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27
 0.28 0.29 0.3  0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41
 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.49 0.5  0.51 0.52 0.53 0.54 0.55
 0.56 0.57 0.58 0.59 0.6  0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69
 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78 0.79 0.8  0.81 0.82 0.83
 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94 0.95 0.96 0.97
 0.98 0.99 1.  ]
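
The difference between the two functions is easier to see on a smaller grid. The sketch below uses a step of 0.25 (an illustrative choice) so that both outputs fit on one line.

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)   # stop value 1 is excluded
sampled = np.linspace(0, 1, 5)    # both endpoints are included by default

print(stepped.tolist())   # [0.0, 0.25, 0.5, 0.75]
print(sampled.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

`np.arange` is parameterized by the step size, while `np.linspace` is parameterized by the number of points, which makes `linspace` the safer choice when the endpoint must be hit exactly.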

In [57]:
# Create two numpy arrays 
x = np.random.random((2, 3))
y = np.random.random((2, 3))

print("x = {}".format(x))
print("y = {}".format(y))

# Let's add these two arrays
add = x + y
print("x + y = {}".format(add))

# Let's subtract x - y
print("x - y = {}".format(x - y))

# Elementwise multiplication
print("x*y = {}".format(x*y))

# Square all elements of x
print("x^2 = {}".format(x**2))

# Elementwise division
print("x/y = {}".format(x/y))

x = [[0.57611218 0.12404367 0.6694266 ]
 [0.22948147 0.15221884 0.90956653]]
y = [[0.87481318 0.55312563 0.13230495]
 [0.38215113 0.79769897 0.08044353]]
x + y = [[1.45092536 0.6771693  0.80173156]
 [0.6116326  0.94991781 0.99001006]]
x - y = [[-0.298701   -0.42908196  0.53712165]
 [-0.15266965 -0.64548013  0.829123  ]]
x*y = [[0.50399053 0.06861173 0.08856846]
 [0.0876966  0.12142481 0.07316874]]
x^2 = [[0.33190524 0.01538683 0.44813198]
 [0.05266175 0.02317058 0.82731127]]
x/y = [[ 0.65855453  0.22425949  5.05972441]
 [ 0.60049927  0.19082241 11.30689486]]


In [60]:
# Matrix multiplication 
z = np.random.random((3, 2))
print("z = {}".format(z))

# Perform y x z
print("yxz = {}".format(y@z))

# np.dot gives the same result for 2-D arrays, but the @ operator (Python 3.5+) is easier to read
print("yxz = {}".format(np.dot(y, z)))

z = [[0.10133764 0.49489058]
 [0.56414367 0.66513964]
 [0.48771982 0.6191673 ]]
yxz = [[0.46522158 0.88276149]
 [0.52797702 0.7695122 ]]
yxz = [[0.46522158 0.88276149]
 [0.52797702 0.7695122 ]]
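
It is easy to confuse `@` with `*`. The sketch below, using small integer matrices chosen for illustration, shows that `@` follows the usual shape rule (2, 3) by (3, 2) gives (2, 2), while `*` is elementwise and needs compatible shapes.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])       # shape (3, 2)

# Matrix product: inner dimensions (3 and 3) must agree
product = A @ B
print(product.tolist())      # [[4, 5], [10, 11]]

# Elementwise product fails because (2, 3) and (3, 2) cannot be broadcast
try:
    A * B
    elementwise_failed = False
except ValueError:
    elementwise_failed = True
print(elementwise_failed)    # True

# Transposing B gives matching shapes, so the elementwise product works
elementwise = A * B.T
print(elementwise.tolist())  # [[1, 0, 3], [0, 5, 6]]
```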


In [62]:
# Numpy performs broadcasting

# Add a scalar value of 6.246 to every element in x
print("x = {}".format(x))
print("x + 6.246 = {}".format(x + 6.246))

# Add the row vector [1, 0, 0] to the rows of x
print("x + [1, 0, 0] = {}".format(x + np.array([1, 0, 0])))

x = [[0.57611218 0.12404367 0.6694266 ]
 [0.22948147 0.15221884 0.90956653]]
x + 6.246 = [[6.82211218 6.37004367 6.9154266 ]
 [6.47548147 6.39821884 7.15556653]]
x + [1, 0, 0] = [[1.57611218 0.12404367 0.6694266 ]
 [1.22948147 0.15221884 0.90956653]]
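
The rule behind these results is that numpy compares shapes from the rightmost dimension: two dimensions are compatible when they are equal or when one of them is 1. The sketch below uses illustrative arrays (`col`, `row`) that are not part of the cells above.

```python
import numpy as np

col = np.array([[10], [20], [30]])   # shape (3, 1)
row = np.array([[1, 2, 3, 4]])       # shape (1, 4)

# (3, 1) + (1, 4): each size-1 dimension is stretched, giving shape (3, 4)
grid = col + row
print(grid.shape)     # (3, 4)
print(grid.tolist())

# (2, 3) + (2,): the trailing dimensions 3 and 2 differ, so this fails
try:
    np.ones((2, 3)) + np.array([1, 0])
    failed = False
except ValueError:
    failed = True
print(failed)         # True
```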


**Exercise**:  Add the column vector [1, 0] to columns of x

Great! We've got some list and numpy basics out of the way.  Now let's put these to use to perform some very simple operations that often arise in machine learning and reinforcement learning.

## Softmax Operations

Consider a setting where your task is to navigate an environment by opening doors.  Each door is represented by a triplet (a, b, c) which encodes some property of the door.  Assume that such an encoding is learned by some deep neural network.<br>

We know that each door provides a reward after opening it.  The reward obtained by opening some door with encoding (a, b, c) is 0.77a - 0.2b + 0.43c.  Every time you enter a new room you have four possible doors you can choose to venture through next.<br>

Assume that you have collected rewards and are currently at room X.  You see four rooms with encodings (0, 0, 1), (0, 1, 0), (1, 0, 1), (2, 1, 0.5).  You make your decision to travel to the next room by considering the total reward you would have after moving to that room.

Let your current reward be 0.1.  Now let's find the rewards we would obtain by navigating to each room.  This can be found by matrix multiplication.  Let the matrix of room encodings, with one column per room, be:

$$
B = \begin{bmatrix}
0 & 0 & 1 & 2\\
0 & 1 & 0 & 1\\
1 & 0 & 1 & 0.5\\
\end{bmatrix}.
$$

Let the key be

$$
K = \begin{bmatrix}
0.77 & -0.20 & 0.43
\end{bmatrix}.
$$

Then the reward can be calculated as $R = KB$.

In [63]:
B = np.array(
[[0, 0, 1, 2],
[0, 1, 0, 1],
[1, 0, 1, 0.5]]
)

K = np.array([0.77, -0.20, 0.43]).reshape(1, -1)

R = K@B
print(R)

[[ 0.43  -0.2    1.2    1.555]]


Now we add the current reward of 0.1 to obtain the total reward after moving to each room.

In [65]:
R_move = 0.1 + R
print(R_move)

[[ 0.53  -0.1    1.3    1.655]]


In reinforcement learning settings the rewards as above provide a summary of the environment.  In order to choose which room to move to we would first like to construct a categorical distribution over the rooms using the softmax.<br>

Let $x = [x_1, x_2, \dots, x_n]$. Then
$$
\text{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}
$$

In [74]:
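def softmax(x):
    # keepdims=True keeps the row sums as a column, so each row is normalized separately
    return np.exp(x)/np.sum(np.exp(x), axis=1, keepdims=True)

# Flatten the (1, 4) result into a 1-D vector of probabilities
p = softmax(R_move).ravel()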
print(p)

[0.14765433 0.07863948 0.31889884 0.45480735]
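
For large inputs, `np.exp` can overflow. A common remedy, sketched here as a variant and not as the notebook's own function, is to subtract the row maximum before exponentiating. This leaves the result unchanged because the common factor cancels in the ratio. The name `stable_softmax` and the test values are assumptions made for illustration.

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the row maximum does not change the softmax value,
    # but keeps every exponent at or below zero
    shifted = x - np.max(x, axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=1, keepdims=True)

# Same values as R_move above
scores = np.array([[0.53, -0.1, 1.3, 1.655]])
probs = stable_softmax(scores)
print(probs)

# Inputs this large would overflow np.exp without the shift
large = np.array([[1000.0, 1001.0, 1002.0]])
large_probs = stable_softmax(large)
print(large_probs)
```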


One method for choosing an action to take in RL is to choose the one that maximizes the reward.

In [79]:
best_action = np.argmax(p)
best_action_reward = R_move[0, best_action]
print(best_action)
print(best_action_reward)

3
1.6550000000000002


Okay nothing great there.  In RL settings we want our agent to learn from mistakes and explore the environment well.  To do this we will actually sample a random action using the categorical distribution.

In [98]:
random_action = np.random.choice(np.arange(len(p)), p=p)
random_action_reward = R_move[0, random_action]
print(random_action)
print(random_action_reward)

2
1.3
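
A single draw says little about the distribution. As a sanity check, one can draw many samples and compare the observed frequencies with `p`. The sketch below hard-codes the probabilities printed earlier and uses `np.random.default_rng` with an arbitrary seed, both of which are choices made here for reproducibility.

```python
import numpy as np

# Probabilities copied from the softmax output above, renormalized after rounding
p = np.array([0.14765433, 0.07863948, 0.31889884, 0.45480735])
p = p / p.sum()

rng = np.random.default_rng(0)   # seed 0 is an arbitrary choice
samples = rng.choice(len(p), size=10_000, p=p)

# Fraction of draws that landed on each action
frequencies = np.bincount(samples, minlength=len(p)) / samples.size
print(frequencies)
```

With enough samples the frequencies should sit close to `p`, which confirms that low-probability rooms are still visited occasionally.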


**Exercise**:  Change to the weighted (temperature-scaled) softmax with $\tau > 0$ and see how the categorical distribution changes.

$$
\text{softmax}(x)_i = \frac{e^{x_i/\tau}}{\sum_j e^{x_j/\tau}}
$$

Finally, in Q-learning settings we will have buffers to store our previous actions.  Given a buffer (list) with all our previous actions, remove all occurrences of the currently chosen action.

In [100]:
buffer = [2, 1, 3, 2, 0, 0, 1, 0, 2, 1, 0, 2, 1, 1, 2, 3, 3, 0, 1]

new_buffer = [i for i in buffer if i != random_action]

print(new_buffer)

[1, 3, 0, 0, 1, 0, 1, 0, 1, 1, 3, 3, 0, 1]
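
If the buffer is stored as a numpy array, the same filtering can be sketched with a boolean mask. The variable `chosen_action` is a stand-in for `random_action`, fixed to 2 here to match the run shown above.

```python
import numpy as np

buffer = np.array([2, 1, 3, 2, 0, 0, 1, 0, 2, 1, 0, 2, 1, 1, 2, 3, 3, 0, 1])
chosen_action = 2   # stand-in for random_action in the run shown above

# The comparison yields a boolean array; indexing with it keeps the True positions
mask = buffer != chosen_action
new_buffer = buffer[mask]
print(new_buffer.tolist())
# [1, 3, 0, 0, 1, 0, 1, 0, 1, 1, 3, 3, 0, 1]
```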
