# RNN Shapes through the network

It helps to track the shape of the data as it passes through an RNN to understand what is happening under the hood. Here, knowing how the information flows matters more than the underlying mathematics.

For example, whenever you hear of an ***N* x *T* x *D*** array, you should think of a box: *N* samples, each with *T* time steps, each with *D* features.
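To make the box picture concrete, here is a minimal sketch that slices such an array. The sizes and the names `box`, `sample` and `step` are chosen only for illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
N, T, D = 4, 10, 3

# The "box": N samples, T time steps, D features
box = np.zeros((N, T, D))

# One sample is a T x D sheet: T time steps, each with D features
sample = box[0]

# One time step of one sample is a vector of D features
step = box[0, 0]

print(box.shape, sample.shape, step.shape)
# (4, 10, 3) (10, 3) (3,)
```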

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
import tensorflow as tf
from tensorflow.keras.layers import Input, SimpleRNN, Dense, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam, SGD

## Things you should know

Below are the important size variables, whose meaning should be permanently stored in your memory!

* **N** number of samples
* **T** sequence length
* **D** number of input features
* **M** number of hidden neurons
* **K** number of neurons in the output dense layer

Let's make some synthetic three-dimensional data (an *N* x *T* x *D* array) of continuous random values drawn from a standard normal distribution, for a simple RNN regression (no activation function is required in the output layer).

In [3]:
# Set size variables
N = 1
T = 10
D = 3
K = 2

# Fake data
X = np.random.randn(N, T, D)

In [4]:
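# Make regression RNN network

M = 5

i = Input(shape=(T, D)) # (T x D) = (10 x 3) per sample

x = SimpleRNN(M)(i) # (M,) per sample: final hidden state only

x = Dense(K)(x) # (K,) per sample: linear output, no activation

model = Model(i, x)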

In [5]:
# Get predicted output (1 x 2, i.e. one sample and two output results)

y_hat = model.predict(X)

print(y_hat)

[[-0.0652515 -0.1478728]]


In [7]:
# View model structure

model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 10, 3)]           0         
                                                                 
 simple_rnn (SimpleRNN)      (None, 5)                 45        
                                                                 
 dense (Dense)               (None, 2)                 12        
                                                                 
Total params: 57
Trainable params: 57
Non-trainable params: 0
_________________________________________________________________
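The parameter counts in the summary above can be reproduced from the size variables alone. The sketch below assumes the usual layout of one weight matrix per connection plus one bias vector per layer. The names `rnn_params` and `dense_params` are introduced here for illustration.

```python
# Sizes used in this notebook
D, M, K = 3, 5, 2

# SimpleRNN: input-to-hidden (D x M) + hidden-to-hidden (M x M) + bias (M)
rnn_params = D * M + M * M + M

# Dense: hidden-to-output (M x K) + bias (K)
dense_params = M * K + K

print(rnn_params, dense_params, rnn_params + dense_params)
# 45 12 57
```

Note that *T* does not appear: the same weights are reused at every time step, so the sequence length has no effect on the number of parameters.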


In [8]:
# Check the weights - the SimpleRNN layer (model.layers[1]) holds three weight arrays

model.layers[1].get_weights()

[array([[-0.15654248, -0.5145449 , -0.16511536, -0.70954716, -0.13775456],
        [ 0.2526881 , -0.35481632,  0.05409688,  0.0730921 , -0.04302329],
        [ 0.72688645,  0.7206072 , -0.33525246, -0.45841673, -0.6241146 ]],
       dtype=float32),
 array([[ 0.72892296, -0.25976282, -0.52256536, -0.318251  , -0.16381761],
        [ 0.22756657,  0.8399016 , -0.2946632 ,  0.3781304 , -0.11388477],
        [-0.5640798 , -0.01988399, -0.4627396 , -0.17590362, -0.6605669 ],
        [-0.16779234,  0.42310807, -0.0911424 , -0.78809524,  0.40425766],
        [-0.26560187, -0.2183408 , -0.64626944,  0.32202098,  0.6003509 ]],
       dtype=float32),
 array([0., 0., 0., 0., 0.], dtype=float32)]

In [9]:
# Print out shape of weight arrays - remember that D=3 and M=5

a, b, c = model.layers[1].get_weights()

print(a.shape, b.shape, c.shape)

(3, 5) (5, 5) (5,)


**Note that these shapes should make sense:**

    a is the input-to-hidden weight matrix (D x M)

    b is the hidden-to-hidden weight matrix (M x M)

    c is the hidden bias term (vector of length M)
    
**Now you can assign the 'weight' variables with *confidence*.** 

In [10]:
# SimpleRNN weights: input-to-hidden, hidden-to-hidden, and bias
Wx, Wh, bh = model.layers[1].get_weights()

# Output (Dense) layer weights and bias
Wo, bo = model.layers[2].get_weights()

In [11]:
print(Wx.shape, Wh.shape, bh.shape)

print(Wo.shape, bo.shape)

(3, 5) (5, 5) (5,)
(5, 2) (2,)


## Manual RNN calculation

Using a for loop, run a single sample (x) through the calculations that a simple RNN model performs under the hood, one time step at a time:

    h = np.tanh(x[t].dot(Wx) + h_last.dot(Wh) + bh)
    y = h.dot(Wo) + bo
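Before looping over time, it can help to trace the shapes through a single step of the two formulas above. This is a minimal sketch with random stand-in weights rather than the model's actual weights, so only the shapes are meaningful. The name `x_t` stands in for `x[t]`.

```python
import numpy as np

# Sizes from this notebook; the weights are random stand-ins, not the model's
D, M, K = 3, 5, 2
rng = np.random.default_rng(0)
Wx, Wh, bh = rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M)
Wo, bo = rng.standard_normal((M, K)), np.zeros(K)

x_t = rng.standard_normal(D)  # one time step of one sample: (D,)
h_last = np.zeros(M)          # previous hidden state: (M,)

# (D,) . (D, M) -> (M,)   and   (M,) . (M, M) -> (M,)
h = np.tanh(x_t.dot(Wx) + h_last.dot(Wh) + bh)

# (M,) . (M, K) -> (K,)
y = h.dot(Wo) + bo

print(h.shape, y.shape)
# (5,) (2,)
```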

In [12]:
# Initial hidden state
h_last = np.zeros(M)

# One and only sample
x = X[0]

# Store outputs
yhats = []

for t in range(T):
    # Hidden state at time t, from the current input and the previous hidden state
    h = np.tanh(x[t].dot(Wx) + h_last.dot(Wh) + bh)
    # Output of the dense layer for this hidden state
    y = h.dot(Wo) + bo
    yhats.append(y)
    
    # NB: Assign h to h_last so it has correct value for next iteration
    h_last = h

# We only care about final yhat value
print(yhats[-1])

[-0.0652515  -0.14787285]


**Compare this with the model's prediction of -0.0652515 and -0.1478728: the values match to the printed precision. This is what you wanted. You have confirmed that these are the calculations done in a simple RNN (with one sample only).**

In [None]:
# EXERCISE: Modify the code to run multiple samples at once, i.e. N > 1
# The manual calculation should still match the model's prediction for every sample
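One possible approach to the exercise above, offered as a sketch rather than the intended solution, is to keep one hidden state row per sample and let NumPy process the whole batch at each time step. The function name `rnn_forward` is hypothetical, and the data and weights below are random stand-ins rather than values taken from the model.

```python
import numpy as np

def rnn_forward(X, Wx, Wh, bh, Wo, bo):
    """Run a batch X of shape (N, T, D) through the simple RNN + dense calculation."""
    N, T, _ = X.shape
    M = Wh.shape[0]
    h_last = np.zeros((N, M))  # one hidden state row per sample
    for t in range(T):
        # X[:, t] is (N, D); the bias bh broadcasts across the N rows
        h_last = np.tanh(X[:, t].dot(Wx) + h_last.dot(Wh) + bh)
    # Only the final hidden state feeds the dense layer: (N, M) -> (N, K)
    return h_last.dot(Wo) + bo

# Random stand-in data and weights (not the model's values)
rng = np.random.default_rng(0)
N, T, D, M, K = 4, 10, 3, 5, 2
X = rng.standard_normal((N, T, D))
Wx, Wh, bh = rng.standard_normal((D, M)), rng.standard_normal((M, M)), rng.standard_normal(M)
Wo, bo = rng.standard_normal((M, K)), rng.standard_normal(K)

Y = rnn_forward(X, Wx, Wh, bh, Wo, bo)
print(Y.shape)
# (4, 2)
```

With the weights extracted from the model earlier in the notebook, the result of `rnn_forward(X, Wx, Wh, bh, Wo, bo)` should agree with `model.predict(X)` up to floating-point precision, assuming the layer's default settings (tanh activation, zero initial state, final hidden state only).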