# Understanding RNNs by Shapes

**A practical way to understand a Recurrent Neural Network (RNN) is to implement a simple one and track the shapes of the data and model parameters as they pass through it, starting with the input data:**

<img src="image.png" alt="Input" width="600"/>

**You should know the following by heart before building an RNN model:**

* **`N` - number of samples**
* **`T` - sequence length (number of time steps)**
* **`D` - number of input features, or feature dimensionality**
* **`M` - number of hidden units (size of the hidden state)**
* **`K` - number of output units**

**The input data has shape *N* x *T* x *D*, and the output has shape *N* x *K*. Note that *K* > 1 is possible for a regression task if the target is multidimensional, e.g. predicting latitude and longitude coordinates.**

**To build the RNN manually (and compare it with TensorFlow), you need the equation for the hidden state at time step *t*, evaluated with the weights of the simple RNN layer. Keras's `SimpleRNN` uses `tanh` as its default activation and a single bias vector:**

$$
  h_t = \tanh(W_x^T x_t + W_h^T h_{t-1} + b_h)
$$
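
**The hidden-state update can be sketched for a single time step in plain NumPy. This is a minimal illustration, not the Keras implementation: the weights here are random stand-ins, and the row-vector form `x_t @ Wx` is used, which is equivalent to multiplying by the transposed weight matrix.**

```python
import numpy as np

# Sizes follow the notation above; the values are arbitrary for illustration
D, M = 3, 5

rng = np.random.default_rng(0)
Wx = rng.standard_normal((D, M))   # input-to-hidden weights, D x M (random stand-in)
Wh = rng.standard_normal((M, M))   # hidden-to-hidden weights, M x M (random stand-in)
bh = np.zeros(M)                   # hidden bias, length M

x_t = rng.standard_normal(D)       # one time step of one sample, length D
h_prev = np.zeros(M)               # previous hidden state, length M

# One recurrence step: tanh(x_t Wx + h_{t-1} Wh + bh)
h_t = np.tanh(x_t @ Wx + h_prev @ Wh + bh)
print(h_t.shape)  # (5,)
```

**Whatever *D* is, the new hidden state always has length *M*, which is why it can be fed back in at the next time step.**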

**To generate the manual predictions, apply the output (`Dense`) layer to the final hidden state: multiply it by the output weights and add the output bias.**

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten, SimpleRNN
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD, Adam

%matplotlib inline

In [2]:
# Create dummy data with size variables (only working with one sample)

N = 1

T = 10

D = 3

K = 2

X = np.random.randn(N, T, D)

In [3]:
X.shape

(1, 10, 3)

In [5]:
# ------------------------------------ TensorFlow RNN

M = 5

i = Input(shape=(T, D))

x = SimpleRNN(M)(i)

x = Dense(K)(x)

model = Model(i, x)

In [6]:
# Predict with the untrained model; the values are meaningless because there are no targets (Y) to compare against

y_hat = model.predict(X)

print(y_hat)

[[-0.63061917  0.12452571]]


**Note that we have one sample prediction with two output nodes, i.e. *N* x *K*.**

In [7]:
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 10, 3)]           0         
                                                                 
 simple_rnn (SimpleRNN)      (None, 5)                 45        
                                                                 
 dense (Dense)               (None, 2)                 12        
                                                                 
Total params: 57
Trainable params: 57
Non-trainable params: 0
_________________________________________________________________
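
**The parameter counts in the summary can be reproduced from the shapes alone. The short check below assumes the same `D`, `M` and `K` as in this notebook:**

```python
D, M, K = 3, 5, 2

# SimpleRNN: input-to-hidden (D x M) + hidden-to-hidden (M x M) + bias (M)
rnn_params = D * M + M * M + M

# Dense: hidden-to-output (M x K) + bias (K)
dense_params = M * K + K

print(rnn_params, dense_params, rnn_params + dense_params)  # 45 12 57
```

**Note that *T* does not appear anywhere: the same weights are reused at every time step, so the sequence length does not change the number of parameters.**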


In [8]:
# Array of weights

model.layers[1].get_weights()

[array([[ 0.43968147,  0.21117991, -0.8349118 ,  0.18456918,  0.43172258],
        [-0.3318646 , -0.5044817 ,  0.41810113, -0.22511888,  0.40751594],
        [ 0.3039102 ,  0.7216609 , -0.07823646,  0.63885313, -0.50507176]],
       dtype=float32),
 array([[ 0.49027526, -0.24690863,  0.6648533 , -0.4100977 , -0.297416  ],
        [ 0.57537746, -0.11377349, -0.38948414,  0.49453896, -0.50963676],
        [ 0.01049809,  0.08345936,  0.602319  ,  0.74983966,  0.26053122],
        [-0.65056723, -0.18075931,  0.17830363,  0.13051908, -0.70374775],
        [-0.07227032, -0.9415159 , -0.10812978,  0.08919635,  0.29778627]],
       dtype=float32),
 array([0., 0., 0., 0., 0.], dtype=float32)]

In [14]:
a, b, c = model.layers[1].get_weights()

print("MODEL WEIGHTS:")
print("Input to Hidden (D x M):", a.shape)
print("Hidden to Hidden (M x M):", b.shape)
print("Bias Term (M):", c.shape)

MODEL WEIGHTS:
Input to Hidden (D x M): (3, 5)
Hidden to Hidden (M x M): (5, 5)
Bias Term (M): (5,)


In [15]:
# Index the weights at each layer appropriately

Wx, Wh, bh = model.layers[1].get_weights()

# Output layer
Wo, bo = model.layers[2].get_weights()

In [17]:
# -------------------------------- Manual RNN calculation

# Initial hidden state
h_last = np.zeros(M)

# One and only sample
x = X[0]

# Store output
y_hats = []

for t in range(T):
    # Hidden state at step t: tanh(x_t . Wx + h_{t-1} . Wh + bh)
    h = np.tanh(x[t].dot(Wx) + h_last.dot(Wh) + bh)
    # Only care about this on last iteration
    y = h.dot(Wo) + bo
    y_hats.append(y)
    
    # IMPORTANT: assign h to h_last
    h_last = h
    
# Print final output
print(y_hats[-1])

[-0.63061914  0.12452582]


**The results match the prediction of the simple RNN model up to floating-point precision; only the last digits differ.**
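
**The two printed outputs differ only in their last digits, most likely a float32 rounding effect. A tolerance-based comparison is a safer check than strict equality. The sketch below uses the values copied from the printed outputs above; the tolerance `1e-6` is an assumption, not a value from the original notebook:**

```python
import numpy as np

# Values copied from the two printed outputs above
keras_pred = np.array([-0.63061917, 0.12452571])
manual_pred = np.array([-0.63061914, 0.12452582])

# Largest absolute difference between the two predictions
max_diff = np.abs(keras_pred - manual_pred).max()
print(max_diff)

# Compare within a tolerance instead of testing for exact equality
print(np.allclose(keras_pred, manual_pred, atol=1e-6))  # True
```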

**EXERCISE:**

* **Calculate the output for multiple samples at once (N > 1)**
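
**One possible approach to the exercise, as a sketch only: keep one hidden state per sample (an *N* x *M* array) and slice the input by time step, so each matrix product advances every sample at once. The function name `rnn_forward` and the random weights are hypothetical stand-ins; to compare against Keras you would pass in the weights from `get_weights()` instead.**

```python
import numpy as np

def rnn_forward(X, Wx, Wh, bh, Wo, bo):
    """Run a simple tanh RNN over a batch X of shape (N, T, D); return (N, K)."""
    N, T, _ = X.shape
    h = np.zeros((N, Wh.shape[0]))   # one hidden state per sample, N x M
    for t in range(T):
        # X[:, t, :] is N x D, so all samples advance one time step together
        h = np.tanh(X[:, t, :] @ Wx + h @ Wh + bh)
    # Output layer applied to the final hidden state only
    return h @ Wo + bo               # N x K

# Random stand-in data and weights (not taken from the Keras model)
N, T, D, M, K = 4, 10, 3, 5, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((N, T, D))
Wx = rng.standard_normal((D, M))
Wh = rng.standard_normal((M, M))
bh = np.zeros(M)
Wo = rng.standard_normal((M, K))
bo = np.zeros(K)

Y = rnn_forward(X, Wx, Wh, bh, Wo, bo)
print(Y.shape)  # (4, 2)
```

**The bias `bh` has shape (*M*,) and broadcasts across the *N* rows, so no explicit loop over samples is needed.**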