# Understanding the Weights in RNNs

## Instructions
0. If you haven't already, follow [the setup instructions here](https://jennselby.github.io/MachineLearningCourseNotes/#setting-up-python3) to get all necessary software installed.
0. Look at the code in [Part A: Single Unit Simple Recurrent Layer](#Part-A:-Single-Unit-Simple-Recurrent-Layer) and complete the [Part A Exercise](#Part-A-Exercise)
0. Look at the code in [Part B: Two Unit Simple Recurrent Layer](#Part-B:-Two-Unit-Simple-Recurrent-Layer) and complete the [Part B Exercise](#Part-B-Exercise)
0. Optionally, look at the code in [Part C: LSTM Layer](#Part-C:-LSTM-Layer) and complete the [Part C Exercise](#Part-C-Exercise)

## Documentation/Sources
* [Class Notes](https://jennselby.github.io/MachineLearningCourseNotes/#recurrent-neural-networks)
* [https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/) for information on sequence classification with keras
* [https://keras.io/](https://keras.io/) Keras API documentation
* [Keras recurrent tutorial](https://github.com/Vict0rSch/deep_learning/tree/master/keras/recurrent)

## Part A: Single Unit Simple Recurrent Layer

Before we dive into something as complicated as LSTMs, let's take a deeper look at simple recurrent layer weights.

In [1]:
import numpy
from keras.layers import SimpleRNN
from keras.models import Sequential
from keras.layers import LSTM

The neurons in the recurrent layer pass their output to the next layer, but also back to themselves. The `input_shape` of `(None, 1)` says that each sample is a sequence of unspecified length (that is what the `None` means) in which every time step holds a single value.

In [2]:
one_unit_SRNN = Sequential()
one_unit_SRNN.add(SimpleRNN(units=1, input_shape=(None, 1), activation='linear', use_bias=False))

In [3]:
one_unit_SRNN_weights = one_unit_SRNN.get_weights()
one_unit_SRNN_weights

[array([[-0.69055295]], dtype=float32), array([[1.]], dtype=float32)]

We can set the weights to whatever we want, to test out what happens with different weight values.

In [4]:
one_unit_SRNN_weights[0][0][0] = 2
one_unit_SRNN_weights[1][0][0] = 0.5
one_unit_SRNN.set_weights(one_unit_SRNN_weights)
one_unit_SRNN.get_weights()

[array([[2.]], dtype=float32), array([[0.5]], dtype=float32)]

We can then pass in different input values, to see what the model outputs.

The code below passes in a single sample that has three time steps.

In [5]:
one_unit_SRNN.predict(numpy.array([ [[2], [2], [0]] ]))

array([[3.]], dtype=float32)
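
The nested brackets in the input are easy to misread. The sketch below uses only NumPy and assumes the usual `(samples, time steps, features)` layout that Keras recurrent layers expect. The names `sample` and `two_samples` are illustrative only.

```python
import numpy

# One sample, three time steps, one value per time step.
sample = numpy.array([ [[2], [2], [0]] ])
print(sample.shape)  # (1, 3, 1)

# Two samples of three time steps each would add a second entry on the first axis.
two_samples = numpy.array([ [[2], [2], [0]],
                            [[1], [0], [0]] ])
print(two_samples.shape)  # (2, 3, 1)
```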

## Part A Exercise
Figure out what the two weights in the one_unit_SRNN model control. Be sure to test your hypothesis thoroughly. Use different weights and different inputs.

---
---
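At each time step, the first weight multiplies the current input, and the second weight multiplies the unit's own output from the previous time step (zero at the first step). The two products are added to give the unit's new output. Since the layer does not return sequences, `predict` shows only the output of the final time step.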

### Example:

WEIGHTS: 2, 0.5
IN: 2, 2, 0

T1:
I: 2 (2\*2) -> 4, R: 0 (0\*0.5) -> 0, O: 4+0 = 4

T2: 
I: 2 (2\*2) -> 4, R: 4 (4\*0.5) -> 2, O: 4+2 = 6

T3:
I: 0 (2\*0) -> 0, R: 6 (6\*0.5) -> 3, O: 0+3 = 3

OUT: 3
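
The same arithmetic can be sketched in plain Python. This is a hand-written stand-in for the layer, not Keras code, and it assumes a linear activation and no bias, as in the model above. The function name `simple_rnn_one_unit` is made up for illustration.

```python
def simple_rnn_one_unit(inputs, input_weight, recurrent_weight):
    """Return the final output of a one-unit linear RNN with no bias."""
    output = 0.0  # there is no feedback before the first time step
    for x in inputs:
        # weighted input plus weighted feedback from the previous step
        output = input_weight * x + recurrent_weight * output
    return output

result = simple_rnn_one_unit([2, 2, 0], input_weight=2, recurrent_weight=0.5)
print(result)  # 3.0
```

Trying other weights and inputs here and comparing against `one_unit_SRNN.predict` is one way to test the hypothesis.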


## Part B: Two Unit Simple Recurrent Layer

In [6]:
two_unit_SRNN = Sequential()
two_unit_SRNN.add(SimpleRNN(units=2, input_shape=(None, 1), activation='linear', use_bias=False))

In [7]:
two_unit_SRNN_weights = two_unit_SRNN.get_weights()
two_unit_SRNN_weights

[array([[ 0.08152485, -1.3975402 ]], dtype=float32),
 array([[ 0.6152247,  0.7883518],
        [-0.7883518,  0.6152247]], dtype=float32)]

In [8]:
two_unit_SRNN_weights[0][0][0] = 1
two_unit_SRNN_weights[0][0][1] = 1
two_unit_SRNN_weights[1][0][0] = 0
two_unit_SRNN_weights[1][0][1] = 1
two_unit_SRNN_weights[1][1][0] = 0
two_unit_SRNN_weights[1][1][1] = 1
two_unit_SRNN.set_weights(two_unit_SRNN_weights)
two_unit_SRNN.get_weights()

[array([[1., 1.]], dtype=float32),
 array([[0., 1.],
        [0., 1.]], dtype=float32)]

This passes in a single sample with four time steps.

In [9]:
two_unit_SRNN.predict(numpy.array([ [[3], [3], [7], [5]] ]))

array([[ 5., 31.]], dtype=float32)

## Part B Exercise
What does each of the six weights of the two_unit_SRNN control? Again, test out your hypotheses carefully.

---
---

Each unit needs a weight to multiply the input at the current timestep. It also needs a weight for each unit's feedback from the previous timestep, including its own. Thus, with a one-dimensional input, a layer of n units has n+n^2 total weights (2+4 = 6 here).

The first two weights, in `[0][0][j]`, play the same role as the first weight in the single unit RNN: `[0][0][j]` multiplies the input going into unit `j`. The next four, in `[1][i][j]`, are the feedback weights. The first index `i` corresponds to the unit giving feedback, and the second index `j` corresponds to the unit receiving feedback.

Otherwise, the RNN should work the same as the single unit model.
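
To check this reading of the indices, here is a small NumPy sketch of the update. It is a stand-in under the same assumptions as the model (linear activation, no bias), not the Keras implementation, and the function name `simple_rnn` is hypothetical.

```python
import numpy

def simple_rnn(inputs, kernel, recurrent_kernel):
    """Final state of a linear, bias-free simple RNN.

    inputs: (time steps, features)
    kernel: (features, units), the [0][][] weights
    recurrent_kernel: (units, units), the [1][i][j] weights,
        where row i is the giving unit and column j the receiving unit
    """
    state = numpy.zeros(recurrent_kernel.shape[0])  # no feedback at the start
    for x in inputs:
        state = x @ kernel + state @ recurrent_kernel
    return state

kernel = numpy.array([[1.0, 1.0]])
recurrent_kernel = numpy.array([[0.0, 1.0],
                                [0.0, 1.0]])
inputs = numpy.array([[3.0], [3.0], [7.0], [5.0]])

final_state = simple_rnn(inputs, kernel, recurrent_kernel)
print(final_state)  # values 5 and 31, matching the predict call above
```

With these weights the first unit receives no feedback, so it just echoes the last input (5), while the second unit adds the current input to both units' previous outputs (5 + 7 + 19 = 31).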

## Part C: LSTM Layer
### Optional

In [10]:
one_unit_LSTM = Sequential()
one_unit_LSTM.add(LSTM(units=1, input_shape=(None, 1),
                       activation='linear', recurrent_activation='linear',
                       use_bias=False, unit_forget_bias=False,
                       kernel_initializer='zeros',
                       recurrent_initializer='zeros',
                       return_sequences=True))

In [11]:
one_unit_LSTM_weights = one_unit_LSTM.get_weights()
one_unit_LSTM_weights

[array([[0., 0., 0., 0.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [12]:
one_unit_LSTM_weights[0][0][0] = 1
one_unit_LSTM_weights[0][0][1] = 0
one_unit_LSTM_weights[0][0][2] = 1
one_unit_LSTM_weights[0][0][3] = 1
one_unit_LSTM_weights[1][0][0] = 0
one_unit_LSTM_weights[1][0][1] = 0
one_unit_LSTM_weights[1][0][2] = 0
one_unit_LSTM_weights[1][0][3] = 0
one_unit_LSTM.set_weights(one_unit_LSTM_weights)
one_unit_LSTM.get_weights()

[array([[1., 0., 1., 1.]], dtype=float32),
 array([[0., 0., 0., 0.]], dtype=float32)]

In [13]:
one_unit_LSTM.predict(numpy.array([ [[0], [1], [2], [4]] ]))

array([[[ 0.],
        [ 1.],
        [ 8.],
        [64.]]], dtype=float32)

## Part C Exercise
### Optional
Conceptually, the [LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) has several _gates_:

* __Forget gate__: these weights allow some long-term memories to be forgotten.
* __Input gate__: these weights decide what new information will be added to the context cell.
* __Output gate__: these weights decide what pieces of the new information and updated context will be passed on to the output.

It also has a __cell__ that can hold onto information from the current input (as well as things it has remembered from previous inputs), so that it can be used in later outputs.

Identify which weights in the one_unit_LSTM model are connected with the context and which are associated with the three gates. This is considerably more difficult to do by looking at the inputs and outputs, so you could also treat this as a code reading exercise and look through the keras code to find the answer.

_Note_: The output from the predict call is what the linked explanation calls $h_{t}$.

---
---

Looking through the Keras [codebase](https://github.com/keras-team/keras/blob/bd968bf156b4346ac58e679ccd92f02796294885/keras/layers/recurrent.py#L2384) for recurrent layers, we can see that the LSTMCell class refers to the weights in the order `x_i, x_f, x_c, x_o`. This most likely corresponds to the input gate, forget gate, context (cell), and output gate, which explains the last index, `[][][0]` through `[][][3]`. The first index, `[0][][]` or `[1][][]`, most likely identifies what is being multiplied: either the timestep input or the feedback. Given that all the `[1][][]` weights are set to zero and we still get a non-zero output, we can conclude that `[0][][]` corresponds to the timestep input and `[1][][]` corresponds to the feedback.
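
One way to test this conclusion is to write the update out by hand. The sketch below assumes the standard LSTM equations with linear activations and no bias, and the `i, f, c, o` weight order guessed above. It is not the Keras code, and the function name `linear_lstm_one_unit` is made up for illustration.

```python
def linear_lstm_one_unit(inputs, kernel, recurrent_kernel):
    """Outputs h_t of a one-unit LSTM with linear activations and no bias.

    kernel and recurrent_kernel each hold four weights, assumed to be in
    the order input gate, forget gate, context candidate, output gate.
    """
    h, c = 0.0, 0.0  # previous output and context both start at zero
    outputs = []
    for x in inputs:
        # each gate is (input weight * input) + (feedback weight * previous output)
        i, f, candidate, o = (k * x + r * h
                              for k, r in zip(kernel, recurrent_kernel))
        c = f * c + i * candidate  # forget part of the old context, add new information
        h = o * c                  # output gate scales the updated context
        outputs.append(h)
    return outputs

outputs = linear_lstm_one_unit([0, 1, 2, 4],
                               kernel=[1, 0, 1, 1],
                               recurrent_kernel=[0, 0, 0, 0])
print(outputs)  # [0.0, 1.0, 8.0, 64.0]
```

With the forget and feedback weights at zero, each output is just the input cubed (input gate times candidate times output gate), which matches the `predict` result and supports the assumed ordering.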