## Setup

If you haven't already, please follow the [setup instructions](https://jennselby.github.io/MachineLearningCourseNotes/#setup-and-tools) to get all of the necessary software (GitHub is optional).

## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with Keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

# Exercise: Understand the Weights in RNNs

## Part A: Exploring Simple Recurrent Layers

Before we dive into something as complicated as LSTMs, let's take a deeper look at simple recurrent layer weights.

In [3]:
import numpy
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import LSTM

Using TensorFlow backend.


The neurons in the recurrent layer pass their output to the next layer, but also back to themselves. The input shape says that we'll be passing in one-dimensional inputs of unspecified length (the None is what makes it unspecified).

In [4]:
one_unit_SRNN = Sequential()
one_unit_SRNN.add(SimpleRNN(units=1, input_shape=(None, 1), activation='linear', use_bias=False))

In [5]:
one_unit_SRNN_weights = one_unit_SRNN.get_weights()
one_unit_SRNN_weights

[array([[0.50457704]], dtype=float32), array([[1.]], dtype=float32)]

We can set the weights to whatever we want, to test out what happens with different weight values.

In [6]:
one_unit_SRNN_weights[0][0][0] = 1
one_unit_SRNN_weights[1][0][0] = 1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.get_weights()

[array([[1.]], dtype=float32), array([[1.]], dtype=float32)]

We can then pass in different input values, to see what the model outputs.

The code below passes in a single sample that has three time steps.

In [7]:
one_unit_SRNN.predict(numpy.array([ [[3], [3], [7]] ]))

array([[13.]], dtype=float32)

# Part A
Figure out what the two weights in the one_unit_SRNN model control. Be sure to test your hypothesis thoroughly. Use different weights and different inputs.

In [170]:
one_unit_SRNN_weights[0][0][0] = 1
one_unit_SRNN_weights[1][0][0] = 0.1
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.predict(numpy.array([ [[1], [1], [1], [1]] ]))

array([[1.1110001]], dtype=float32)

In [171]:
print(0.75*13)
print(0.5*13)
print(0.5*10.9375)
print(3*0.1**3+3*0.1**2+7*0.1+8)
print(0.1**3+0.1**2+0.1**1+1)

9.75
6.5
5.46875
8.733
1.111


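The first weight (the input weight) multiplies every input value.

The second weight (the recurrent weight) multiplies the previous output before it is added to the current weighted input. As a result, for a sequence of `n` inputs, the `i`-th input (counting from 1) reaches the final output multiplied by the first weight and by the second weight raised to the power `n - i`.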
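
The hypothesis above can be checked without Keras. The following is a minimal sketch, assuming the layer computes `h = w * x + u * h_previous` at every time step with the state starting at 0 (linear activation, no bias). The function names and the names `w` and `u` are illustrative and not part of Keras.

```python
def simple_rnn_output(inputs, w, u):
    """Final output of a one-unit linear RNN without bias (assumed recurrence)."""
    h = 0.0  # assumed initial state
    for x in inputs:
        # current weighted input plus the recurrent weight times the previous output
        h = w * x + u * h
    return h


def closed_form(inputs, w, u):
    """Same value written as a sum: input i is scaled by w * u ** (n - i)."""
    n = len(inputs)
    return sum(w * u ** (n - i) * x for i, x in enumerate(inputs, start=1))


print(simple_rnn_output([3, 3, 7], w=1, u=1))       # 13.0, as in the Keras output above
print(simple_rnn_output([1, 1, 1, 1], w=1, u=0.1))  # approximately 1.111
print(closed_form([3, 3, 7, 5], w=0.5, u=2))        # 27.5
```

If the two functions agree for a range of weights and inputs, that supports reading the second weight as the recurrent weight.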

## Part B: Slightly larger simple recurrent model

In [67]:
two_unit_SRNN = Sequential()
two_unit_SRNN.add(SimpleRNN(units=2, input_shape=(None, 1), activation='linear', use_bias=False))

In [68]:
two_unit_SRNN_weights = two_unit_SRNN.get_weights()
two_unit_SRNN_weights

[array([[-0.7061047,  0.8480488]], dtype=float32),
 array([[ 0.28723228, -0.95786095],
        [-0.95786095, -0.28723228]], dtype=float32)]

In [69]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 1
two_unit_SRNN_weights[1][0][0] = 0
two_unit_SRNN_weights[1][0][1] = 1
two_unit_SRNN_weights[1][1][0] = 0
two_unit_SRNN_weights[1][1][1] = 1
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

[array([[1., 1.]], dtype=float32), array([[0., 1.],
        [0., 1.]], dtype=float32)]

This passes in a single sample with four time steps.

In [70]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

array([[ 5., 31.]], dtype=float32)

# Part B
What does each of the six weights of the two_unit_SRNN control? Again, test out your hypotheses carefully.

In [238]:
a, b, c, d, e, f = (1, 1, 0, 0 , 0.25, 0)
two_unit_SRNN_weights[0][0][0] = a
two_unit_SRNN_weights[0][0][1] = b
two_unit_SRNN_weights[1][0][0] = c
two_unit_SRNN_weights[1][0][1] = d
two_unit_SRNN_weights[1][1][0] = e
two_unit_SRNN_weights[1][1][1] = f
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
#3, 3, 7, 5
two_unit_SRNN.predict(numpy.array([ [[4], [4], [8], [6]] ]))

array([[8., 6.]], dtype=float32)

In [239]:
print(0.5 * 3 * 2 * 2 * 2 + 0.5 * 3 * 2 * 2 + 7 * 0.5 * 2 + 5 * 0.5 )
print(0.5 * 3 * 1 ** 3 + 0.5 * 3 * 1 ** 2 + 0.5 * 7 * 1 + 5 * 0.5)
def unit1(input1, u1, u2):
    return input1 * a + u1 * c + u2 * e
def unit2(input2, u1, u2):
    return input2 * b + u1 * d + u2 * f
sum1_0 = unit1(4, 0, 0)
sum2_0 = unit2(4, 0, 0)
sum1_1 = unit1(4, sum1_0, sum2_0)
sum2_1 = unit2(4, sum1_0, sum2_0)
sum1_2 = unit1(8, sum1_1, sum2_1)
sum2_2 = unit2(8, sum1_1, sum2_1)
sum1_3 = unit1(6, sum1_2, sum2_2)
sum2_3 = unit2(6, sum1_2, sum2_2)
print(sum1_0, sum2_0, sum1_1, sum2_1, sum1_2, sum2_2, sum1_3, sum2_3)

27.5
9.0
4.0 4 5.0 4.0 9.0 8.0 8.0 6.0


* `a` is the weight for the input to unit 1
* `b` is the weight for the input to unit 2
* `c` is the weight for unit 1 to unit 1 (self)
* `d` is the weight for unit 1 to unit 2
* `e` is the weight for unit 2 to unit 1
* `f` is the weight for unit 2 to unit 2 (self)
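
The same bookkeeping can be written once for any choice of the six weights. This is a minimal sketch in plain Python, assuming the layout returned by `get_weights()` above: a kernel `[a, b]` and a recurrent matrix `[[c, d], [e, f]]` whose row is the source unit and whose column is the destination unit. The function name `two_unit_rnn` is illustrative.

```python
def two_unit_rnn(inputs, kernel, recurrent):
    """Final outputs of a two-unit linear RNN without bias (assumed recurrence).

    kernel:    [a, b]            input weights for unit 1 and unit 2
    recurrent: [[c, d], [e, f]]  recurrent[i][j] goes from unit i+1 to unit j+1
    """
    h = [0.0, 0.0]  # assumed initial state
    for x in inputs:
        # Both new values are computed from the previous h before h is replaced.
        h = [
            x * kernel[j] + h[0] * recurrent[0][j] + h[1] * recurrent[1][j]
            for j in range(2)
        ]
    return h


print(two_unit_rnn([3, 3, 7, 5], [1, 1], [[0, 1], [0, 1]]))     # [5.0, 31.0]
print(two_unit_rnn([4, 4, 8, 6], [1, 1], [[0, 0], [0.25, 0]]))  # [8.0, 6.0]
```

Both results match the Keras predictions shown earlier for the same weights and inputs.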

## Part C: Exploring LSTMs (Optional Extension)


In [378]:
one_unit_LSTM = Sequential()
one_unit_LSTM.add(LSTM(units=1, input_shape=(None, 1),
                       activation='linear', recurrent_activation='linear',
                       use_bias=False, unit_forget_bias=False,
                       kernel_initializer='zeros',
                       recurrent_initializer='zeros',
                       return_sequences=True))

In [379]:
one_unit_LSTM_weights = one_unit_LSTM.get_weights()
one_unit_LSTM_weights

[array([[0., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [380]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 0
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.get_weights()

[array([[1., 0., 1., 1.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [442]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 1
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.predict(numpy.array([ [[1], [1], [2], [3], [1], [2], [3]] ]))

array([[[  1.],
        [  2.],
        [ 16.],
        [ 99.],
        [ 34.],
        [144.],
        [675.]]], dtype=float32)

### Part C (Optional Extension)
Conceptually, the [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) has several _gates_:

* __Forget gate__: these weights allow some long-term memories to be forgotten.
* __Input gate__: these weights decide what new information will be added to the context cell.
* __Output gate__: these weights decide what pieces of the new information and updated context will be passed on to the output.

It also has a __cell__ that can hold onto information from the current input (as well as things it has remembered from previous inputs), so that it can be used in later outputs.

Identify which weights in the one_unit_LSTM model are connected with the context and which are associated with the three gates. This is considerably more difficult to do by looking at the inputs and outputs, so you could also treat this as a code reading exercise and look through the keras code to find the answer.

_Note_: The output from the predict call is what the linked explanation calls $h_{t}$.

In [443]:
W = one_unit_LSTM.layers[0].get_weights()[0]
U = one_unit_LSTM.layers[0].get_weights()[1]
units = int(one_unit_LSTM.layers[0].trainable_weights[0].shape[1]) // 4  # four gate blocks per unit
W_i = W[:, :units] #input
W_f = W[:, units: units * 2] #forget
W_c = W[:, units * 2: units * 3] #cell state
W_o = W[:, units * 3:] #output

U_i = U[:, :units] #input
U_f = U[:, units: units * 2] #forget
U_c = U[:, units * 2: units * 3] #cell state
U_o = U[:, units * 3:] #output
print(W_i, W_f, W_c, W_o, U_i, U_f, U_c, U_o)


[[1.]] [[1.]] [[1.]] [[1.]] [[0.]] [[0.]] [[0.]] [[0.]]


#### If the first 4 weights are 1 and the last 4 weights are 0, the behavior can be predicted
For inputs `a`, `b`, `c` (each nonzero, because every step divides by the previous input):

* `output_0 = (0/a + a) * a^2 = a^3`
* `output_1 = (output_0/a + b) * b^2`
* `output_2 = (output_1/b + c) * c^2`
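
These formulas agree with a direct simulation of the cell. The sketch below assumes the gate order used in the slicing code above (input, forget, cell candidate, output), linear activations everywhere, no bias, and zero initial hidden and cell state. The function name `linear_lstm` and its argument layout are illustrative, not a Keras API.

```python
def linear_lstm(inputs, W, U=(0, 0, 0, 0)):
    """Per-step outputs h_t of a one-unit LSTM with linear activations (assumed form).

    W: kernel weights    (W_i, W_f, W_c, W_o)
    U: recurrent weights (U_i, U_f, U_c, U_o)
    """
    W_i, W_f, W_c, W_o = W
    U_i, U_f, U_c, U_o = U
    h, c = 0.0, 0.0  # assumed initial hidden state and cell state
    outputs = []
    for x in inputs:
        i = W_i * x + U_i * h  # input gate
        f = W_f * x + U_f * h  # forget gate
        g = W_c * x + U_c * h  # candidate cell value
        o = W_o * x + U_o * h  # output gate
        c = f * c + i * g      # keep part of the old cell, add gated new information
        h = o * c              # the output is the gated cell state
        outputs.append(h)
    return outputs


print(linear_lstm([1, 1, 2, 3, 1, 2, 3], W=(1, 1, 1, 1)))
# [1.0, 2.0, 16.0, 99.0, 34.0, 144.0, 675.0], the values Keras printed above
```

With all four kernel weights set to 1 and the recurrent weights at 0, each step reduces to `c = x * c + x^2` and `h = x * c`, which is the pattern in the formulas above.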

I don't know what the last 3 weights do, but setting the 5th, 7th, and 8th weights to 1 makes the output infinite.