## Setup

If you haven't already, please follow the [setup instructions](https://jennselby.github.io/MachineLearningCourseNotes/#setup-and-tools) to get all of the necessary software (GitHub is optional).

## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

# Exercise: Understand the Weights in RNNs

## Part A: Exploring Simple Recurrent Layers

Before we dive into something as complicated as LSTMs, let's take a deeper look at simple recurrent layer weights.

In [1]:
import numpy
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import LSTM

Using TensorFlow backend.


The neurons in the recurrent layer pass their output to the next layer, but also back to themselves. The input shape says that we'll be passing in one-dimensional inputs of unspecified length (the None is what makes it unspecified).
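
As a rough illustration of that input shape (a sketch using only numpy; the variable names are made up for this example), arrays with different numbers of time steps both fit `(None, 1)`, as long as each time step holds a single value:

```python
import numpy

# Each array has shape (samples, time steps, features per time step).
three_steps = numpy.array([[[3], [3], [7]]])
five_steps = numpy.array([[[1], [2], [3], [4], [5]]])

print(three_steps.shape)  # (1, 3, 1)
print(five_steps.shape)   # (1, 5, 1)
```

Only the middle dimension changes between the two arrays, which is the dimension that `None` leaves unspecified.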

In [2]:
one_unit_SRNN = Sequential()
one_unit_SRNN.add(SimpleRNN(units=1, input_shape=(None, 1), activation='linear', use_bias=False))

In [4]:
one_unit_SRNN_weights = one_unit_SRNN.get_weights()
one_unit_SRNN_weights

[array([[1.4368719]], dtype=float32), array([[-1.]], dtype=float32)]

We can set the weights to whatever we want, to test out what happens with different weight values.

In [None]:
one_unit_SRNN_weights[0][0][0] = 1
one_unit_SRNN_weights[1][0][0] = 1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.get_weights()

We can then pass in different input values, to see what the model outputs.

The code below passes in a single sample that has three time steps.

In [None]:
one_unit_SRNN.predict(numpy.array([ [[3], [3], [7]] ]))

# Part A
Figure out what the two weights in the one_unit_SRNN model control. Be sure to test your hypothesis thoroughly. Use different weights and different inputs.

In [7]:
one_unit_SRNN_weights[0][0][0] = 5
one_unit_SRNN_weights[1][0][0] = 0.1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.predict(numpy.array([ [[1], [2], [3], [4]] ]))

array([[21.605]], dtype=float32)

In [9]:
print(1*5)
print(1*5*0.1+2*5)
print((1*5*0.1+2*5)*0.1 + 3*5)
print(((1*5*0.1+2*5)*0.1 + 3*5)*0.1 + 4 *5)

5
10.5
16.05
21.605


The first weight is multiplied against every input.

For an array of 'n' elements, the input at position 'i' (counting from 1) is also multiplied by the second weight raised to the 'n - i' power.

This means that the first weight is being applied to the inputs. Each value is multiplied by 5 exactly once, because that weight is applied only when the value enters the unit.

The second weight is applied to the previous output. The first value is multiplied by the second weight three times, because it is fed back through the unit at each of the three later time steps. Meanwhile, the fourth and final input is never multiplied by the second weight, because no time step follows it.
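
The hand calculation above can be written as a short loop. This is only a sketch in plain Python, not the Keras implementation: the function names are made up, and it assumes the unit's output starts at zero, with the linear activation and no bias used in this notebook.

```python
def simple_rnn_one_unit(inputs, input_weight, recurrent_weight):
    """Return the final output of a one-unit linear RNN with no bias."""
    output = 0.0  # assumed initial state: the unit starts with no memory
    for x in inputs:
        # New output = weighted input + weighted previous output.
        output = x * input_weight + output * recurrent_weight
    return output


def closed_form(inputs, input_weight, recurrent_weight):
    """Same value, written as a weighted sum over the inputs."""
    n = len(inputs)
    return sum(
        x * input_weight * recurrent_weight ** (n - i)
        for i, x in enumerate(inputs, start=1)
    )


print(round(simple_rnn_one_unit([1, 2, 3, 4], 5, 0.1), 3))  # 21.605
print(round(closed_form([1, 2, 3, 4], 5, 0.1), 3))          # 21.605
```

Both functions give the value that `predict` returned for the same weights and inputs.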

## Part B: Slightly larger simple recurrent model

In [10]:
two_unit_SRNN = Sequential()
two_unit_SRNN.add(SimpleRNN(units=2, input_shape=(None, 1), activation='linear', use_bias=False))

In [11]:
two_unit_SRNN_weights = two_unit_SRNN.get_weights()
two_unit_SRNN_weights

[array([[-0.0588845,  1.3168381]], dtype=float32),
 array([[ 0.03392483, -0.9994244 ],
        [-0.9994244 , -0.03392483]], dtype=float32)]

In [None]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 1
two_unit_SRNN_weights[1][0][0] = 0
two_unit_SRNN_weights[1][0][1] = 1
two_unit_SRNN_weights[1][1][0] = 0
two_unit_SRNN_weights[1][1][1] = 1
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

This passes in a single sample with four time steps.

In [None]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

# Part B
What do each of the six weights of the two_unit_SRNN control? Again, test out your hypotheses carefully.

In [18]:
a, b, c, d, e, f = (1, 1, 0, 0 , 0.25, 0)
two_unit_SRNN_weights[0][0][0] = a
two_unit_SRNN_weights[0][0][1] = b
two_unit_SRNN_weights[1][0][0] = c
two_unit_SRNN_weights[1][0][1] = d
two_unit_SRNN_weights[1][1][0] = e
two_unit_SRNN_weights[1][1][1] = f
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
#3, 3, 7, 5
two_unit_SRNN.predict(numpy.array([ [[4], [5], [8], [6]] ]))

array([[8., 6.]], dtype=float32)

In [19]:
def unit1(input1, u1, u2):
    return input1 * a + u1 * c + u2 * e
def unit2(input2, u1, u2):
    return input2 * b + u1 * d + u2 * f
sum1_0 = unit1(4, 0, 0)
sum2_0 = unit2(4, 0, 0)
sum1_1 = unit1(5, sum1_0, sum2_0)
sum2_1 = unit2(5, sum1_0, sum2_0)
sum1_2 = unit1(8, sum1_1, sum2_1)
sum2_2 = unit2(8, sum1_1, sum2_1)
sum1_3 = unit1(6, sum1_2, sum2_2)
sum2_3 = unit2(6, sum1_2, sum2_2)
print(sum1_0, sum2_0, sum1_1, sum2_1, sum1_2, sum2_2, sum1_3, sum2_3)

4.0 4 6.0 5.0 9.25 8.0 8.0 6.0


The unit1 and unit2 functions are effectively a recreation of the recurrent units in the neural network. Each unit takes the current input, its own previous output, and the previous output of the other unit.

It has been a while, so I don't remember the exact process I used to find out what each weight did. It was clear that weights A and B scaled the input, because when weights C, D, E, and F were set to 0, the last input was simply multiplied by weight A or B to get the result.

I could also tell that weight C was similar to the 2nd weight from Part A, since when C was set to a value, and D, E, and F were set to 0, the results were similar to Part A. However, I struggled to figure out the pattern when both C and D were non-zero numbers. I think somehow I figured out that weight E was similar to weight C. 

The current weights are an example of a case where the output does not grow rapidly. Since weights D and F, which feed into unit 2, are both zero, unit 2 does not have any memory, and its output is based on the current input only. Unit 1 has weight C at zero, which means it forgets its own previous output. Weight E is not zero, and it is multiplied against the previous output of unit 2.

Unit 1's output is the current input plus 0.25 times the previous output of unit 2, and unit 2's output is the current input. Unit 2 forgets everything, and unit 1 forgets everything except unit 2's previous output, which is just the previous input. As a result, the final outputs depend only on the last two inputs, no matter how long the sequence is.
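
One way to check this claim about memory is to run the same recurrence on two sequences that differ only in their early inputs. The sketch below is plain Python rather than Keras; `final_outputs` is a made-up helper, and it assumes both units start at zero.

```python
def final_outputs(inputs, a=1, b=1, c=0, d=0, e=0.25, f=0):
    """Run the two-unit linear recurrence and return the last (unit1, unit2)."""
    u1, u2 = 0, 0  # assumed initial state
    for x in inputs:
        # Both updates read the previous step's outputs, so assign them together.
        u1, u2 = x * a + u1 * c + u2 * e, x * b + u1 * d + u2 * f
    return u1, u2


print(final_outputs([4, 5, 8, 6]))     # (8.0, 6.0)
print(final_outputs([100, -7, 8, 6]))  # (8.0, 6.0)
```

Changing the first two inputs has no effect on the result, because only the last two inputs reach the final time step.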

A is the weight for the input to unit 1

B is the weight for the input to unit 2

C is the weight for unit1 to unit1 (self)

D is the weight for unit1 to unit2

E is the weight for unit2 to unit1

F is the weight for unit2 to unit2 (self)
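
The six weights can also be read as two matrices, laid out the way `get_weights()` returned them. The following is a minimal numpy sketch of that reading, not the Keras code itself, and it assumes the units start at zero:

```python
import numpy

# Input kernel: one input feature feeding two units, laid out as [[A, B]].
W = numpy.array([[1.0, 1.0]])
# Recurrent kernel: row = source unit, column = destination unit, [[C, D], [E, F]].
U = numpy.array([[0.0, 0.0],
                 [0.25, 0.0]])

h = numpy.zeros(2)  # assumed initial outputs of unit 1 and unit 2
for x in [4, 5, 8, 6]:
    # Weighted input plus weighted previous outputs, for both units at once.
    h = numpy.array([x]) @ W + h @ U

print(h)  # [8. 6.]
```

Reading the recurrent kernel by row and column gives the same "from unit, to unit" labels as the list above.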




## Part C: Exploring LSTMs (Optional Extension)


In [20]:
one_unit_LSTM = Sequential()
one_unit_LSTM.add(LSTM(units=1, input_shape=(None, 1),
                       activation='linear', recurrent_activation='linear',
                       use_bias=False, unit_forget_bias=False,
                       kernel_initializer='zeros',
                       recurrent_initializer='zeros',
                       return_sequences=True))

In [21]:
one_unit_LSTM_weights = one_unit_LSTM.get_weights()
one_unit_LSTM_weights

[array([[0., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [22]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 0
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.get_weights()

[array([[1., 0., 1., 1.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [23]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 1
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.predict(numpy.array([ [[1], [1], [2], [3], [1], [2], [3]] ]))

array([[[  1.],
        [  2.],
        [ 16.],
        [ 99.],
        [ 34.],
        [144.],
        [675.]]], dtype=float32)

### Part C (Optional Extension)
Conceptually, the [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) has several _gates_:

* __Forget gate__: these weights allow some long-term memories to be forgotten.
* __Input gate__: these weights decide what new information will be added to the context cell.
* __Output gate__: these weights decide what pieces of the new information and updated context will be passed on to the output.

It also has a __cell__ that can hold onto information from the current input (as well as things it has remembered from previous inputs), so that it can be used in later outputs.

Identify which weights in the one_unit_LSTM model are connected with the context and which are associated with the three gates. This is considerably more difficult to do by looking at the inputs and outputs, so you could also treat this as a code reading exercise and look through the keras code to find the answer.

_Note_: The output from the predict call is what the linked explanation calls $h_{t}$.

In [24]:
lstm_layer = one_unit_LSTM.layers[0]
# With use_bias=False the layer holds two arrays: the input kernel W and the recurrent kernel U.
W, U = lstm_layer.get_weights()
# Each kernel stores four blocks of `units` columns, in the order input, forget, cell, output.
units = W.shape[1] // 4
W_i = W[:, :units]  # input gate
W_f = W[:, units: units * 2]  # forget gate
W_c = W[:, units * 2: units * 3]  # cell state
W_o = W[:, units * 3:]  # output gate

U_i = U[:, :units]  # input gate
U_f = U[:, units: units * 2]  # forget gate
U_c = U[:, units * 2: units * 3]  # cell state
U_o = U[:, units * 3:]  # output gate
print(W_i, W_f, W_c, W_o, U_i, U_f, U_c, U_o)


[[1.]] [[1.]] [[1.]] [[1.]] [[0.]] [[0.]] [[0.]] [[0.]]
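
With a single unit, each slice above holds just one number, so the block layout is hard to see. The sketch below uses a stand-in array with a hypothetical layer size of two units (not a real Keras layer) to show the same slicing done by `numpy.split`:

```python
import numpy

units = 2  # hypothetical layer size, chosen only for illustration
# Stand-in for an LSTM kernel with one input feature: shape (1, 4 * units).
kernel = numpy.arange(4 * units, dtype=float).reshape(1, 4 * units)

# Four equal blocks along the last axis, in the order input, forget, cell, output.
k_i, k_f, k_c, k_o = numpy.split(kernel, 4, axis=1)

print(k_i, k_f, k_c, k_o)
```

Each block has shape `(1, units)`, matching what the manual `[:, start:stop]` slices produce.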


#### If the first 4 weights are 1 and the last 4 weights are 0, the behavior can be predicted
I predicted that for inputs a, b, c:

output_0 = (0/a + a) * a^2 = a^3

output_1 = (output_0/a + b) * b^2

output_2 = (output_1/b + c) * c^2

I don't know what the last 3 weights do, but making the 5th, 7th, and 8th weights = 1 will make the output infinity. 

For inputs 1, 1, 2, 3, 1, 2, 3, we expect:

output_0 = 1^3 = 1

output_1 = (1/1 + 1) * 1^2 = 2

output_2 = (2/1 + 2) * 2^2 = 4 * 4 = 16

output_3 = (16/2 + 3) * 3^2 = 11 * 9 = 99

output_4 = (99/3 + 1) * 1^2 = 34

output_5 = (34/1 + 2) * 2^2 = 36 * 4 = 144

output_6 = (144/2 + 3) * 3^2 = 75 * 9 = 675

As the predict output above shows, the actual output matches this prediction.
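
The same numbers fall out of the standard LSTM update once every activation is linear and there is no bias. The sketch below is plain Python, not the Keras source: the function and variable names are made up, and it assumes the output and the cell state both start at zero.

```python
def linear_lstm_outputs(inputs, W=(1, 1, 1, 1), U=(0, 0, 0, 0)):
    """One-unit LSTM with linear activations and no bias; returns every h_t."""
    w_i, w_f, w_c, w_o = W  # weights applied to the current input
    u_i, u_f, u_c, u_o = U  # weights applied to the previous output
    h, c = 0, 0  # assumed initial output and cell state
    outputs = []
    for x in inputs:
        i = x * w_i + h * u_i          # input gate
        f = x * w_f + h * u_f          # forget gate
        candidate = x * w_c + h * u_c  # candidate cell value
        o = x * w_o + h * u_o          # output gate
        c = f * c + i * candidate      # updated cell state
        h = o * c                      # output, with a linear activation on c
        outputs.append(h)
    return outputs


print(linear_lstm_outputs([1, 1, 2, 3, 1, 2, 3]))
# [1, 2, 16, 99, 34, 144, 675]
```

With these weights every gate equals the current input, so the cell state is `c = x * c_prev + x^2` and the output is `h = x * c`. That is the prediction formula above, since the previous cell state equals the previous output divided by the previous input.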