[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/drdave-teaching/OPIM5509Files/blob/main/OPIM5509_Module4_Files/RNNs_By_Hand_basic.ipynb)

# RNNs by Hand
--------------------------------
**Dr. Dave Wanik - University of Connecticut**

Counting the trainable parameters by hand and describing the output shape of each layer will help ensure that you actually know how these algorithms work. It will also crystallize why you need to prep your data as 3D tensors.
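
To make the 3D-tensor point concrete, here is a minimal sketch (not from the original notebook) of slicing a 2D table of observations into the `(samples, n_steps, n_features)` shape that the recurrent layers below expect. The helper name `make_windows` and the toy values are assumptions for illustration only.

```python
def make_windows(rows, n_steps):
    """Slice a 2D list (time x features) into overlapping windows.

    Returns a nested list shaped (samples, n_steps, n_features).
    """
    return [rows[start:start + n_steps]
            for start in range(len(rows) - n_steps + 1)]


# toy data: 6 time steps, 3 features per step (hypothetical values)
rows = [[t, 10 * t, 100 * t] for t in range(6)]
windows = make_windows(rows, n_steps=4)

# read the three dimensions off the nested list
shape = (len(windows), len(windows[0]), len(windows[0][0]))
print(shape)  # (3, 4, 3) -> (samples, n_steps, n_features)
```

Only the last two dimensions, `(n_steps, n_features)`, are passed as `input_shape`; the number of samples is the batch dimension that shows up as `None` in `model.summary()`.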

Here's a cheat sheet for counting params in deep learning models:
* **Link:** https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889

And here's the blog with animated RNN, LSTM and GRU cells:
* **Link:** https://towardsdatascience.com/animated-rnn-lstm-and-gru-ef124d06cf45

In [None]:
from tensorflow.keras.layers import Input, Dense, SimpleRNN, LSTM, GRU, Conv2D
from tensorflow.keras.layers import Bidirectional
from tensorflow.keras.models import Model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

# Dense Neural Networks
(or feed-forward neural networks, FFNN)

* i, input size
* h, size of hidden layer
* o, output size

For one hidden layer,

```
num_params
= connections between layers + biases in every layer
= (i×h + h×o) + (h+o)
```

The example on the webpage assumes you have an input, a hidden layer and an output. In our RNN examples, the Dense layer is the last layer and nothing follows it, so we set o=0 and just use i and h: the count reduces to i×h + h. See below.
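
As a quick check of the dense formula, here is a minimal sketch in plain Python. The function name `dense_params` is made up for illustration; it simply evaluates `(i×h + h×o) + (h+o)`.

```python
def dense_params(i, h, o=0):
    """Weights plus biases for input -> hidden -> output.

    With o=0 (nothing after the layer) this reduces to i*h + h.
    """
    return (i * h + h * o) + (h + o)


# full case from the formula: 3 inputs, 5 hidden units, 2 outputs
print(dense_params(i=3, h=5, o=2))  # 15 + 10 + 5 + 2 = 32

# Dense(1) fed by 2 values, nothing after it
print(dense_params(i=2, h=1))       # 2 + 1 = 3
```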

### One Simple RNN
One SimpleRNN layer followed by a dense layer.

* `g`, no. of FFNNs in a unit (RNN has 1, GRU has 3, LSTM has 4)
* `h`, size of hidden units
* `i`, dimension/size of input

Since every FFNN (DNN) has `h(h+i) + h` parameters, we have
num_params = `g × [h(h+i) + h]`

Recall that the SimpleRNN only has one 'gate' or FFNN (you can see this in the cell!)

![alt text](https://miro.medium.com/max/1928/1*xn5kA92_J5KLaKcP7BMRLA.gif)
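
The formula `g × [h(h+i) + h]` can be wrapped in a small helper so the same arithmetic is reused for every cell type. This is a sketch only: the names `GATES` and `rnn_params` are hypothetical, and the GRU count assumes `reset_after=False` (no extra bias terms).

```python
# number of FFNNs ("gates") per cell type
GATES = {"SimpleRNN": 1, "GRU": 3, "LSTM": 4}


def rnn_params(kind, h, i):
    """Trainable parameters of one recurrent layer: g * (h*(h+i) + h)."""
    g = GATES[kind]
    return g * (h * (h + i) + h)


# 2 hidden units (red dots), 3 input features (green dots)
for kind in GATES:
    print(kind, rnn_params(kind, h=2, i=3))
# SimpleRNN 12
# GRU 36
# LSTM 48
```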



## One Simple RNN Layer (basic)

In [None]:
# here's the script for the image above

# for a SimpleRNN, the input shape is "input_shape=(n_steps, n_features)"
# this corresponds to the graph in "Animated!"
n_steps=50 # doesn't matter!
n_features=3 # the 3 green dots... APPL, GOOGLE, FB
model = Sequential()
# params in SimpleRNN: g*(h*(h+i) + h) = 1*(2*(2+3) + 2) = 12
model.add((SimpleRNN(2, activation='relu', input_shape=(n_steps, n_features)))) # the two red dots

# it is those 2 red dots that will go into the dense layer (don't forget to add 1 for the bias!)
model.add(Dense(1)) # this dense layer is not shown in the animation, but it's needed! # predict netflix!
model.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_2 (SimpleRNN)    (None, 2)                 12        
                                                                 
 dense_6 (Dense)             (None, 1)                 3         
                                                                 
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math
# for each layer


# TRAINABLE PARAMETERS
# the general RNN layer formula is g × [h(h+i) + h]
g = 1 # there's only 1 FFNN in a simpleRNN cell (look above!)
h = 2
i = 3 # this is number of features, not the lookback!
print(g*(h*(h+i) + h))

# SHAPE
# (None, 2) where 2 are the number of hidden units
# so the time series is now just a flattened input of 2 going into a dense layer
# this is what the 2 in simpleRNN(2) means! just 2 red dots.


# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 2 # 2 hidden node inputs
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o))

# output shape is (NONE,1)

12
3


## One Simple RNN Layer (advanced)

In [None]:
# here's a related quiz question

# for a SimpleRNN, the input shape is "input_shape=(n_steps, n_features)"
# this corresponds to the graph in "Animated!"
n_steps=50
n_features=30 # having 30 stocks for covariates
model = Sequential()
model.add((SimpleRNN(25, activation='relu', input_shape=(n_steps, n_features))))
# it is those 25 red dots going into the dense layer, so you need 26 parms
model.add(Dense(1))
model.summary()

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_3 (SimpleRNN)    (None, 25)                1400      
                                                                 
 dense_7 (Dense)             (None, 1)                 26        
                                                                 
Total params: 1,426
Trainable params: 1,426
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math
# for each layer


# TRAINABLE PARAMETERS
# the general RNN layer formula is g × [h(h+i) + h]
g = 1
h = 25
i = 30 # this is number of features, not the lookback!
print(g*(h*(h+i) + h)) #answer = 1400

# SHAPE
# (None, 25) where 25 are the number of hidden units
# so the time series is now just a flattened input of 25 going into a dense layer
# this is what the 25 in simpleRNN(25) means! just 25 red dots.


# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 25 # the 25 hidden node inputs
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o)) #answer = 26

# output shape is (NONE,1)

1400
26


## One LSTM Layer (basic)
Here is what an LSTM looks like. Recall that it has four 'gates' or FFNNs.

![alt text](https://miro.medium.com/max/2250/1*goJVQs-p9kgLODFNyhl9zA.gif)
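
To see where the four FFNNs end up, it can help to split the count into the three weight arrays that a Keras recurrent layer stores: an input kernel, a recurrent kernel, and a bias, with the gates concatenated along the last axis. The sketch below is plain Python; the helper name `weight_shapes` is hypothetical, and the shapes are stated as I understand the Keras layout, so treat them as an assumption to verify with `layer.get_weights()`.

```python
import math


def weight_shapes(g, h, i):
    """Shapes of the weight arrays for a recurrent layer with g gates."""
    return {
        "kernel": (i, g * h),            # input -> gates
        "recurrent_kernel": (h, g * h),  # previous hidden state -> gates
        "bias": (g * h,),                # one bias per gate per unit
    }


# LSTM with 2 hidden units and 3 input features
shapes = weight_shapes(g=4, h=2, i=3)
for name, shape in shapes.items():
    print(name, shape, math.prod(shape))

total = sum(math.prod(shape) for shape in shapes.values())
print(total)  # 24 + 16 + 8 = 48, same as g * (h*(h+i) + h)
```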

In [None]:
# here is the code that corresponds to the image

# for an LSTM, the input shape is "input_shape=(n_steps, n_features)"
# same example as above, just presented a different way
n_steps= 50 # doesn't matter! it will loop.
n_features= 3 # these are the 3 green dots
model = Sequential()
model.add((LSTM(2,  # these are the 2 red dots
                activation='relu', input_shape=(n_steps, n_features))))
model.add(Dense(1)) # not shown, but you need it and should realize that the
                    # 2 dark red dots are what will go into the dense layer
model.summary()

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_2 (LSTM)               (None, 2)                 48        
                                                                 
 dense_8 (Dense)             (None, 1)                 3         
                                                                 
Total params: 51
Trainable params: 51
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math

# LSTM
# TRAINABLE PARAMETERS
# the generic RNN layer is g × [h(h+i) + h]
g = 4 #LSTM has 4 FFNNs!
h = 2 # hidden units within LSTM, the two red dots
i = 3 # this is number of features, not the lookback! these are your 3 stocks (green dots!)
print(g*(h*(h+i) + h))

# SHAPE
# (None, 2) where 2 are the number of hidden units
# so the time series is now just a flattened input of 2 going into a dense layer

# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 2 # these are the 2 inputs going into the dense layer
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o))

48
3


## One LSTM Layer (advanced)

In [None]:
# for an LSTM, the input shape is "input_shape=(n_steps, n_features)"
# same example as above, just presented a different way
n_steps= 30 # lookback
n_features= 5 # 5 different stocks, 5 green dots
model = Sequential()
model.add((LSTM(4, activation='relu', input_shape=(n_steps, n_features)))) # hidden units = 4 means 4 red dots
model.add(Dense(1))
model.summary()

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_3 (LSTM)               (None, 4)                 160       
                                                                 
 dense_9 (Dense)             (None, 1)                 5         
                                                                 
Total params: 165
Trainable params: 165
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math

# TRAINABLE PARAMETERS
# the generic RNN formula is g × [h(h+i) + h]
g = 4 #LSTM has 4! these are the 4 FFNNs
h = 4 # hidden units within LSTM (you get to decide this! it's the red dots...)
i = 5 # this is number of features, not the lookback! the green dots... your 5 stocks
print(g*(h*(h+i) + h)) #answer = 160

# SHAPE
# (None, 4) where 4 are the number of hidden units
# so the time series is now just a flattened input of 4 going into a dense layer

# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 4 # these are all 4 inputs going into a dense layer
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o)) #answer = 5

160
5


## One GRU Layer (basic)
This is what a GRU looks like - note that it has three 'gates'.

![alt text](https://miro.medium.com/max/2214/1*lNNJOWnMjxLzdUnUQqwKcw.gif)

Caution: TensorFlow version difference!
Link: https://stackoverflow.com/questions/57318930/calculating-the-number-of-parameters-of-a-gru-layer-keras

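Be careful of the bias term! With `reset_after=False` the general formula works as-is. Otherwise you need to add the extra bias terms (see the comments in the code cell).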
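
A minimal sketch of the bias caveat, in plain Python. The helper name `gru_params` is hypothetical. It assumes that with `reset_after=True` the GRU bias has shape `(2, 3 * units)` instead of `(3 * units,)`, which adds `3 * units` parameters on top of the general formula.

```python
def gru_params(h, i, reset_after=True):
    """GRU parameter count, with the extra biases when reset_after=True."""
    base = 3 * (h * (h + i) + h)            # general formula with g = 3
    extra_bias = 3 * h if reset_after else 0  # second row of the (2, 3*h) bias
    return base + extra_bias


print(gru_params(h=2, i=3, reset_after=False))  # 36
print(gru_params(h=2, i=3, reset_after=True))   # 36 + 6 = 42
print(gru_params(h=4, i=5, reset_after=True))   # 120 + 12 = 132
```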

In [None]:
# here is the example from the image
# and here is a related example
n_steps=50 # doesn't matter
n_features=3 # three stocks (FB, APPL, GOOG), three green dots
model = Sequential()
model.add((GRU(2, activation='relu', input_shape=(n_steps, n_features), # 2 red dots
               reset_after=False)))  # try this as False - helps math work out
model.add(Dense(1))
model.summary()

# if you don't say reset_after=False (the default is True), you should add the extra bias terms:
# the bias then has shape (2, 3 * self.units), i.e. 3 * units more parameters than the formula gives

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru_3 (GRU)                 (None, 2)                 36        
                                                                 
 dense_10 (Dense)            (None, 1)                 3         
                                                                 
Total params: 39
Trainable params: 39
Non-trainable params: 0
_________________________________________________________________


In [None]:
# here is the math for that example
# try the math

# gru
# TRAINABLE PARAMETERS
# the general RNN layer is g × [h(h+i) + h]
g = 3 #GRU has 3 FFNNs (this is ALWAYS TRUE for GRU)
h = 2 # hidden units within GRU (RED DOTS)
i = 3 # this is number of features, not the lookback! (GREEN DOTS)
print('# of trainable parms in gru_1 = ', g*(h*(h+i) + h))

# SHAPE
# (None, 2) where 2 are the number of hidden units
# so the time series is now just a flattened input of 2 going into a dense layer

# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 2 # these are all 2 inputs going into a dense layer
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o))

# of trainable parms in gru_1 =  36
3


## One GRU Layer (advanced)

In [None]:
# and here is a related example
n_steps=30000000 # so many time steps!
n_features=5 # five stocks = five green dots = FB, GOOG, APPL, GE, AMD
model = Sequential()
model.add((GRU(4, activation='relu', input_shape=(n_steps, n_features), # 4 red dots
               reset_after=False)))  # try this as False for no extra bias
model.add(Dense(1))
model.summary()

# if you don't say reset_after=False (the default is True), you should add the extra bias terms:
# the bias then has shape (2, 3 * self.units), i.e. 3 * units more parameters than the formula gives

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru_4 (GRU)                 (None, 4)                 120       
                                                                 
 dense_11 (Dense)            (None, 1)                 5         
                                                                 
Total params: 125
Trainable params: 125
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math

# gru
# TRAINABLE PARAMETERS
# the general RNN layer is g × [h(h+i) + h]
g = 3 #GRU has 3!
h = 4 # hidden units within GRU
i = 5 # this is number of features, not the lookback!
print('# of trainable parms in gru_1 = ', g*(h*(h+i) + h))
print(g*(h*(h+i) + h))

# SHAPE
# (None, 4) where 4 are the number of hidden units
# so the time series is now just a flattened input of 4 going into a dense layer

# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 4 # these are all 4 inputs going into a dense layer
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o))

# of trainable parms in gru_1 =  120
120
5


# Advanced (stacking, mixing and matching.)
We will cover this in future lectures - provided as FYI.


### Two GRU layers going into a SimpleRNN
This is the fun part! Since you are returning sequences, the output shape will be 3D: each GRU layer outputs its hidden state at every time step, not just the last one.

This is where n_steps actually gets used in the output size. Don't forget to set `return_sequences=True` when stacking layers - except for the last one that goes into the Dense layer.
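
The bookkeeping for a stack can be sketched in plain Python: each layer's input size `i` is the previous layer's number of units, and `n_steps` only appears in the output shape when `return_sequences=True`. The helper name `walk_stack` and the example stack are hypothetical, and the count uses the general formula only, so a GRU left at the default `reset_after=True` would need `3 * h` added.

```python
def walk_stack(n_steps, n_features, layers):
    """Return (output_shape, params) per recurrent layer.

    layers: list of (g, h, return_sequences) tuples, where g is the
    number of FFNNs in the cell and h is the number of hidden units.
    """
    rows = []
    i = n_features
    for g, h, return_sequences in layers:
        params = g * (h * (h + i) + h)
        shape = (None, n_steps, h) if return_sequences else (None, h)
        rows.append((shape, params))
        i = h  # the next layer sees h features per time step
    return rows


# hypothetical stack: SimpleRNN(8, return_sequences=True) -> LSTM(4)
rows = walk_stack(n_steps=10, n_features=3,
                  layers=[(1, 8, True), (4, 4, False)])
for shape, params in rows:
    print(shape, params)
# (None, 10, 8) 96
# (None, 4) 208
```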

In [None]:
n_steps=30 # this matters for output shape when we return sequences!
n_features=5 # these are 5 stocks (FB, APPL, GE, NETFLIX, AMD)

model = Sequential()
model.add((GRU(4, return_sequences=True, activation='relu', input_shape=(n_steps, n_features))))
model.add((GRU(2, return_sequences=True, activation='relu')))
model.add((SimpleRNN(25, activation='relu')))
model.add(Dense(1))
model.summary()

Model: "sequential_13"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru_5 (GRU)                 (None, 30, 4)             132       
                                                                 
 gru_6 (GRU)                 (None, 30, 2)             48        
                                                                 
 simple_rnn_4 (SimpleRNN)    (None, 25)                700       
                                                                 
 dense_12 (Dense)            (None, 1)                 26        
                                                                 
Total params: 906
Trainable params: 906
Non-trainable params: 0
_________________________________________________________________


In [None]:
# try the math

# gru_1
# TRAINABLE PARAMETERS
# the general RNN layer formula is g × [h(h+i) + h]
# reset_after was left at its default (True) here, so add 3*h extra bias terms
g = 3 #GRU has 3!
h = 4 # hidden units within GRU
i = 5 # this is number of features, not the lookback!
print(g*(h*(h+i) + h) + 3*h) #answer = 132 (120 from the formula + 12 extra biases)

# SHAPE
# (None, 30, 4) where:
# 30 is the number of time steps (yes, it's appeared now!)
# 4 are the number of hidden units
# so the time series is now a derived time series - it's a time series of
# dark red dots from the animation!

# gru_2
# TRAINABLE PARAMETERS
# the general RNN layer is g × [h(h+i) + h], plus 3*h extra biases again
g = 3 #GRU has 3!
h = 2 # hidden units within GRU (we get to choose this)
i = 4 # this is number of features, not the lookback! this is inherited from previous layer
print(g*(h*(h+i) + h) + 3*h) #answer = 48 (42 from the formula + 6 extra biases)

# SHAPE
# (None, 30, 2) where 30 is the number of time steps and 2 is the number of hidden units
# return_sequences=True, so this is still a time series going into the SimpleRNN

# simple_rnn
# TRAINABLE PARAMETERS
# the general RNN layer is g × [h(h+i) + h]
g = 1 # SimpleRNN has 1 FFNN
h = 25 # hidden units within SimpleRNN
i = 2 # inherited from gru_2
print(g*(h*(h+i) + h)) #answer = 700

# SHAPE
# (None, 25) - no return_sequences here, so only the last hidden state is kept

# dense layer
# TRAINABLE PARAMETERS
# the dense layer is (i×h + h×o) + (h+o)
# but we set o = 0 since nothing comes after this layer
h = 1 # the dense layer has 1 unit
i = 25 # the 25 SimpleRNN outputs going into the dense layer
o = 0 # there is no layer after this one
print((i*h + h*o) + (h+o)) #answer = 26

132
48
700
26


### One SimpleRNN going into an LSTM
Left to students as an exercise.

In [None]:
n_steps=15 # lookback
n_features=30 # 30 different stocks

model = Sequential()
model.add((SimpleRNN(20, return_sequences=True, activation='relu', input_shape=(n_steps, n_features))))
model.add((LSTM(4, activation='relu'))) # see how there is NO RETURN SEQUENCES!!!
model.add(Dense(1))                             # you just keep the last hidden state
model.summary()

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_5 (SimpleRNN)    (None, 15, 20)            1020      
                                                                 
 lstm_4 (LSTM)               (None, 4)                 400       
                                                                 
 dense_13 (Dense)            (None, 1)                 5         
                                                                 
Total params: 1,425
Trainable params: 1,425
Non-trainable params: 0
_________________________________________________________________


### Monster #1
Left as an exercise for students.

In [None]:
n_steps=30
n_features=30
model = Sequential()
model.add((SimpleRNN(30, return_sequences=True, activation='relu', input_shape=(n_steps, n_features))))
model.add((GRU(30, return_sequences=True,activation='relu')))
model.add((LSTM(30,activation='relu')))
model.add((Dense(30,activation='relu')))
model.add(Dense(1))
model.summary()

Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_6 (SimpleRNN)    (None, 30, 30)            1830      
                                                                 
 gru_7 (GRU)                 (None, 30, 30)            5580      
                                                                 
 lstm_5 (LSTM)               (None, 30)                7320      
                                                                 
 dense_14 (Dense)            (None, 30)                930       
                                                                 
 dense_15 (Dense)            (None, 1)                 31        
                                                                 
Total params: 15,691
Trainable params: 15,691
Non-trainable params: 0
_________________________________________________________________


### Monster #2
Left as an exercise for students.

In [None]:
n_steps=50
n_features=40
model = Sequential()
model.add((SimpleRNN(30, return_sequences=True, activation='relu', input_shape=(n_steps, n_features))))
model.add((GRU(20, return_sequences=True,activation='relu')))
model.add((GRU(25, return_sequences=True,activation='relu')))
model.add((GRU(22, return_sequences=True,activation='relu')))
model.add((GRU(21, return_sequences=True,activation='relu')))
model.add((SimpleRNN(10,activation='relu')))
model.add((Dense(50,activation='relu')))
model.add((Dense(50,activation='relu')))
model.add((Dense(50,activation='relu')))
model.add((Dense(50,activation='relu')))
model.add(Dense(1))
model.summary()

Model: "sequential_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 simple_rnn_7 (SimpleRNN)    (None, 50, 30)            2130      
                                                                 
 gru_8 (GRU)                 (None, 50, 20)            3120      
                                                                 
 gru_9 (GRU)                 (None, 50, 25)            3525      
                                                                 
 gru_10 (GRU)                (None, 50, 22)            3234      
                                                                 
 gru_11 (GRU)                (None, 50, 21)            2835      
                                                                 
 simple_rnn_8 (SimpleRNN)    (None, 10)                320       
                                                                 
 dense_16 (Dense)            (None, 50)              