# 7 Understanding Simple Recurrent Neural Networks in Keras

In [23]:
import numpy as np
import pandas as pd
from tensorflow.keras.layers import Dense, SimpleRNN
from tensorflow.keras.models import Sequential
from sklearn.preprocessing import MinMaxScaler

## 7.1 Keras `SimpleRNN` Layer

In [2]:
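def create_RNN(
    num_hidden_units: int,
    num_dense_units: int,
    input_shape: tuple[int, int],
    activations: list[str],
) -> Sequential:
    """Build a SimpleRNN layer followed by a Dense layer, compiled with MSE loss.

    `input_shape` is (time steps, features); `activations` holds the activation
    of the SimpleRNN layer and of the Dense layer, in that order.
    """
    model = Sequential()
    model.add(
        SimpleRNN(num_hidden_units, input_shape=input_shape, activation=activations[0])
    )
    model.add(Dense(units=num_dense_units, activation=activations[1]))

    model.compile(loss="mean_squared_error", optimizer="adam")

    return model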


m = 2
T = 3
demo_model = create_RNN(m, 1, (T, 1), activations=["linear", "linear"])

demo_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn (SimpleRNN)      (None, 2)                 8         
                                                                 
 dense (Dense)               (None, 1)                 3         
                                                                 
Total params: 11
Trainable params: 11
Non-trainable params: 0
_________________________________________________________________


Why 11 parameters, you ask?

In [3]:
[w.numpy().shape for w in demo_model.layers[0].weights]

[(1, 2), (2, 2), (2,)]

In [4]:
[w.numpy().shape for w in demo_model.layers[1].weights]

[(2, 1), (1,)]
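
The count of 11 follows directly from these shapes: each layer contributes a kernel plus a bias, and the recurrent layer adds a square hidden-to-hidden matrix. As a quick sketch in plain Python (the helper names `simple_rnn_param_count` and `dense_param_count` are made up for illustration and are not part of Keras), the arithmetic looks like this:

```python
def simple_rnn_param_count(input_dim: int, units: int) -> int:
    """Parameters of a SimpleRNN layer with bias: input kernel + recurrent kernel + bias."""
    return input_dim * units + units * units + units


def dense_param_count(input_dim: int, units: int) -> int:
    """Parameters of a Dense layer with bias: kernel + bias."""
    return input_dim * units + units


rnn_params = simple_rnn_param_count(input_dim=1, units=2)  # 1*2 + 2*2 + 2
dense_params = dense_param_count(input_dim=2, units=1)  # 2*1 + 1
total_params = rnn_params + dense_params
print(rnn_params, dense_params, total_params)  # 8 3 11
```

Note that neither helper takes the number of time steps as an argument.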

All of these weights are randomly initialized at this point, except the biases, which are initialized to zero.

In [5]:
# get_weights() returns the arrays in layer order: SimpleRNN first, then Dense
w_x, w_h, b_h, w_y, b_y = demo_model.get_weights()

print(f"{w_x = }\n{w_h = }\n{b_h = }\n{w_y = }\n{b_y = }")

w_x = array([[-1.3690753 , -0.34511125]], dtype=float32)
w_h = array([[-0.8884481 ,  0.45897698],
       [-0.45897698, -0.8884482 ]], dtype=float32)
b_h = array([0., 0.], dtype=float32)
w_y = array([[ 1.3722934 ],
       [-0.35240567]], dtype=float32)
b_y = array([0.], dtype=float32)


Notice how the dimensions of our parameters are independent of the number of time steps. The same weight matrices are used in every time step within the same forward pass.

Next, let us look at a single forward pass in this super-simple model. Recall that a "linear" activation does nothing, since it is the identity function $f(x) = x$.

In [6]:
x = np.array([1, 2, 3])
x_input = np.reshape(x, (1, T, 1))  # 1 sequence, T=3 time steps, 1 feature
y_pred = demo_model.predict(x_input)
y_pred



array([[-3.068111]], dtype=float32)

Now let us replicate this result using linear algebra.
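
Written out for this demo model, and assuming linear activations and a zero initial state $h_0 = 0$, the computation is the recurrence below. The symbols $W_x$, $W_h$, $b_h$, $W_y$, $b_y$ stand for the arrays `w_x`, `w_h`, `b_h`, `w_y`, `b_y`, with hidden states treated as row vectors to match the weight shapes:

$$
h_t = x_t W_x + h_{t-1} W_h + b_h, \qquad t = 1, \dots, T
$$

$$
o_T = h_T W_y + b_y
$$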

In [7]:
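h = np.zeros((m, T + 1))  # column 0 is the initial state (zeros); column t + 1 is the state after x[t]
for t in range(T):
    # input contribution + recurrent contribution + bias (linear activation, so no nonlinearity)
    h[:, t + 1] = np.dot(x[t], w_x) + np.dot(h[:, t], w_h) + b_h
o_3 = np.dot(h[:, T], w_y) + b_y  # dense layer applied to the final hidden state
o_3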

array([-3.06811105])

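💥 BAM! They're equal (up to floating-point rounding: Keras computes in float32, while the NumPy version runs in float64).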
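
To make the weight sharing explicit, the same computation can be wrapped in a small function that accepts a sequence of any length. This is only a sketch: `linear_rnn_forward` is a hypothetical helper, it assumes one input feature and linear activations, and the weights are hard-coded from the printed output above (a fresh run of the notebook would produce different random values).

```python
import numpy as np

# Weights copied from the printed output above; another run would give other values.
w_x = np.array([[-1.3690753, -0.34511125]])
w_h = np.array([[-0.8884481, 0.45897698], [-0.45897698, -0.8884482]])
b_h = np.zeros(2)
w_y = np.array([[1.3722934], [-0.35240567]])
b_y = np.zeros(1)


def linear_rnn_forward(x, w_x, w_h, b_h, w_y, b_y):
    """Forward pass of a one-feature SimpleRNN + Dense model with linear activations."""
    h = np.zeros(w_h.shape[0])  # initial hidden state is all zeros
    for x_t in x:  # one update per time step, reusing the same weights
        h = x_t * w_x[0] + h @ w_h + b_h
    return h @ w_y + b_y  # dense layer applied to the final hidden state


o_3 = linear_rnn_forward([1, 2, 3], w_x, w_h, b_h, w_y, b_y)
# A longer sequence runs through exactly the same weights.
o_5 = linear_rnn_forward([1, 2, 3, 4, 5], w_x, w_h, b_h, w_y, b_y)
print(o_3)
```

For the input `[1, 2, 3]` this should reproduce the value of roughly -3.0681 obtained above, and the five-step call works without changing any weight shape.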

## 7.2 Running the RNN on Sunspots Dataset

Read the data and split it into training and test sets. The data is originally from https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv