<img style="float: left;" src='Figures/alinco.png' />

# <center> <font color= #000047> Vanilla RNNs, GRUs and the `scan` function.</font> </center>
    

In this notebook, you will learn how to define the forward method for vanilla RNNs and GRUs. Additionally, you will see how to define and use the function `scan` to compute forward propagation for RNNs.


In [1]:
import numpy as np
from numpy import random

An implementation of the `sigmoid` function is provided below so you can use it in this notebook.

In [2]:
def sigmoid(x): # Sigmoid function
    return 1.0 / (1.0 + np.exp(-x))

# Part 1: Forward method for vanilla RNNs and GRUs

In this part of the notebook, you'll see the implementation of the forward method for a vanilla RNN and you'll implement that same method for a GRU. For this exercise you'll use a set of random weights and variables with the following dimensions:

- Embedding size (`emb`) : 128
- Hidden state size (`h_dim`) : 16

The weights `W_` and biases `b_` are initialized with dimensions (`h_dim`, `emb + h_dim`) and (`h_dim`, 1), respectively. We expect the hidden state `h_t` to be a column vector with size (`h_dim`, 1), and the initial hidden state `h_0` is a vector of zeros.
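
As a quick sanity check on these dimensions, the sketch below uses placeholder arrays (the names `W`, `b`, `h_prev` and `x_t` are illustrative, not the notebook's variables). It shows that stacking a (`h_dim`, 1) hidden state on top of an (`emb`, 1) input gives a (`emb + h_dim`, 1) column, which a (`h_dim`, `emb + h_dim`) weight matrix maps back to (`h_dim`, 1).

```python
import numpy as np

emb, h_dim = 128, 16

h_prev = np.zeros((h_dim, 1))            # initial hidden state, a column of zeros
x_t = np.ones((emb, 1))                  # placeholder input column (illustrative values)
W = np.ones((h_dim, emb + h_dim))        # placeholder weights (illustrative values)
b = np.zeros((h_dim, 1))                 # placeholder bias

stacked = np.concatenate([h_prev, x_t])  # default axis=0 stacks the two columns vertically
h_next = np.dot(W, stacked) + b

print(stacked.shape)  # (144, 1)
print(h_next.shape)   # (16, 1)
```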

In [4]:
random.seed(10)
emb = 128 # embedding size
h_dim = 16 # hidden state dimension
T = 256 # number of time steps in the sequence

h_0 = np.zeros((h_dim,1))

# Initialize the weights and biases

W1 = random.standard_normal((h_dim, emb + h_dim))
W2 = random.standard_normal((h_dim, emb + h_dim))
W3 = random.standard_normal((h_dim, emb + h_dim))

b1 = random.standard_normal((h_dim,1))
b2 = random.standard_normal((h_dim,1))
b3 = random.standard_normal((h_dim,1))

# The input is generated randomly
X = random.standard_normal((T, emb, 1))
weights = [W1,W2,W3,b1,b2,b3]

## 1.1 Forward method for vanilla RNNs

The vanilla RNN cell is quite straightforward. Its most general structure is presented in the next figure: 

<img src="Figures/RNN.PNG" width="400"/>

As you saw in the lecture videos, the computations made in a vanilla RNN cell are equivalent to the following equations:

\begin{equation}
h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)
\label{eq: htRNN}
\end{equation}
    
\begin{equation}
\hat{y}^{<t>}=g(W_{yh}h^{<t>} + b_y)
\label{eq: ytRNN}
\end{equation}

where $[h^{<t-1>},x^{<t>}]$ means that $h^{<t-1>}$ and $x^{<t>}$ are concatenated together. In the next cell we provide the implementation of the forward method for a vanilla RNN. 
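
Before looking at that implementation, it may help to see why the concatenation is convenient. The sketch below assumes small illustrative sizes and hypothetical names (`W_hh`, `W_hx`). It checks numerically that one product with the concatenated column equals the sum of two products, one with the block of $W_h$ acting on $h^{<t-1>}$ and one with the block acting on $x^{<t>}$.

```python
import numpy as np

rng = np.random.default_rng(0)
emb, h_dim = 4, 3   # small illustrative sizes

W_h = rng.standard_normal((h_dim, h_dim + emb))
h_prev = rng.standard_normal((h_dim, 1))
x_t = rng.standard_normal((emb, 1))

# One product with the concatenated column [h_prev, x_t]
joint = W_h @ np.concatenate([h_prev, x_t])

# Same result with W_h split into a block for h_prev and a block for x_t
W_hh, W_hx = W_h[:, :h_dim], W_h[:, h_dim:]
split = W_hh @ h_prev + W_hx @ x_t

print(np.allclose(joint, split))  # True
```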

In [17]:
# Forward propagation for a single vanilla RNN cell
def forward_V_RNN(inputs, weights):
    x, h_t = inputs
    # Only the first weight matrix and the first bias are used
    wh,_,_,bh,_,_ = weights
    
    # Next hidden state
    h_t = np.dot(wh, np.concatenate([h_t, x])) + bh
    h_t = sigmoid(h_t)
    
    return h_t

As you can see, we omitted the computation of $\hat{y}^{<t>}$. This was done for the sake of simplicity, so you can focus on the way that hidden states are updated here and in the GRU cell.
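
For reference, the omitted output step could look like the sketch below. It rests on assumptions that are not part of the notebook: the output size `out_dim`, the arrays `W_yh` and `b_y`, and the choice of the sigmoid for $g$ are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h_dim, out_dim = 16, 4   # out_dim is a hypothetical output size

rng = np.random.default_rng(0)
W_yh = rng.standard_normal((out_dim, h_dim))  # hypothetical output weights
b_y = rng.standard_normal((out_dim, 1))       # hypothetical output bias
h_t = rng.standard_normal((h_dim, 1))         # stand-in for a computed hidden state

y_hat = sigmoid(W_yh @ h_t + b_y)  # assumes g is the sigmoid
print(y_hat.shape)  # (4, 1)
```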

## 1.2 Forward method for GRUs

A GRU cell performs more computations than a vanilla RNN cell. You can see this visually in the following diagram:

<img src="Figures/GRU.PNG" width="400"/>

As you saw in the lecture videos, GRUs have relevance $\Gamma_r$ and update $\Gamma_u$ gates that control how the hidden state $h^{<t>}$ is updated on every time step. With these gates, GRUs are capable of keeping relevant information in the hidden state even for long sequences. The equations needed for the forward method in GRUs are provided below: 

\begin{equation}
\Gamma_r=\sigma{(W_r[h^{<t-1>}, x^{<t>}]+b_r)}
\end{equation}

\begin{equation}
\Gamma_u=\sigma{(W_u[h^{<t-1>}, x^{<t>}]+b_u)}
\end{equation}

\begin{equation}
c^{<t>}=\tanh{(W_h[\Gamma_r*h^{<t-1>},x^{<t>}]+b_h)}
\end{equation}

\begin{equation}
h^{<t>}=\Gamma_u*c^{<t>}+(1-\Gamma_u)*h^{<t-1>}
\end{equation}
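
To build some intuition for the last equation, the sketch below uses made-up values and a hypothetical helper named `blend`. It evaluates the interpolation for three settings of the update gate: with $\Gamma_u = 0$ the previous hidden state is kept unchanged, with $\Gamma_u = 1$ it is replaced by the candidate, and intermediate values mix the two element-wise.

```python
import numpy as np

h_prev = np.array([[1.0], [-2.0]])   # previous hidden state (illustrative values)
c = np.array([[0.5], [0.5]])         # candidate hidden state (illustrative values)

def blend(gamma_u, c, h_prev):
    # Last GRU equation: element-wise interpolation between candidate and previous state
    return gamma_u * c + (1.0 - gamma_u) * h_prev

kept = blend(np.zeros((2, 1)), c, h_prev)       # gate closed: equals h_prev
replaced = blend(np.ones((2, 1)), c, h_prev)    # gate open: equals c
mixed = blend(np.full((2, 1), 0.5), c, h_prev)  # halfway: 0.75 and -0.75

print(kept.ravel(), replaced.ravel(), mixed.ravel())
```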

In the next cell, please implement the forward method for a GRU cell by computing the update `u` and relevance `r` gates, and the candidate hidden state `c`. 

In [18]:
def forward_GRU(inputs, weights):
    x, h_t = inputs
    # GRU cell weights and biases
    wu, wr, wc, wy, bu, br, bc, by = weights
    
    # Update gate
    Sigma_u = np.dot(wu, np.concatenate([h_t, x])) + bu
    Sigma_u = sigmoid(Sigma_u)
    
    # Relevance gate
    Sigma_r = np.dot(wr, np.concatenate([h_t, x])) + br
    Sigma_r = sigmoid(Sigma_r)
    
    # Candidate hidden state
    c = np.dot(wc,np.concatenate([Sigma_r*h_t, x])) + bc
    c = np.tanh(c)
    
    # New hidden state
    h_t = Sigma_u*c + (1-Sigma_u)*h_t
    
    # Output computed from the new hidden state and the input
    y_t = np.dot(wy, np.concatenate([h_t, x])) + by
    y_t = sigmoid(y_t)
    
    return h_t, y_t

Run the following cell to check your implementation.

In [19]:
forward_GRU([X[1],h_0],  weights)

ValueError: not enough values to unpack (expected 8, got 6)

The call fails because `weights` still holds the six arrays defined in Part 1, while `forward_GRU` unpacks eight (four weight matrices and four biases). The weights are redefined with eight arrays further below, where the check is run again.

# Part 2: Implementation of the `scan` function

In the lectures you saw how the `scan` function is used for forward propagation in RNNs. It takes as inputs:

- `fn` : the function to be called recurrently (e.g. `forward_GRU`)
- `elems` : the list of inputs for each time step (`X`)
- `weights` : the parameters needed to compute `fn`
- `h_0` : the initial hidden state

`scan` goes through all the elements `x` in `elems`, calls the function `fn` with arguments ([`x`, `h_t`],`weights`), stores the computed hidden state `h_t` and appends the result to a list `ys`. Complete the following cell by calling `fn` with arguments ([`x`, `h_t`],`weights`).

In [20]:
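def scan(fn, elems, weights, h_0=None):
    h_t = h_0
    ys = []
    
    for x in elems: 
        # fn returns (new hidden state, output), as forward_GRU does;
        # the new hidden state is carried over to the next time step
        h_t, y = fn([x,h_t], weights)
        ys.append(y)
    return ys,h_t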
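
The carry pattern behind `scan` is easier to follow on a toy problem. The sketch below is self-contained and uses hypothetical names (`scan_sketch`, `running_sum`). It assumes the step function returns `(new_state, output)` in that order, as `forward_GRU` does, and uses a running sum in place of a recurrent cell.

```python
def scan_sketch(fn, elems, weights, h_0=None):
    # Carry the state through the sequence and collect one output per element.
    # Assumes fn returns (new_state, output) in that order.
    h_t = h_0
    ys = []
    for x in elems:
        h_t, y = fn([x, h_t], weights)
        ys.append(y)
    return ys, h_t

def running_sum(inputs, weights):
    # Hypothetical toy step: the state accumulates scale * x, and the output equals the state
    x, total = inputs
    (scale,) = weights
    total = total + scale * x
    return total, total

ys, final = scan_sketch(running_sum, [1, 2, 3, 4], [10], h_0=0)
print(ys)     # [10, 30, 60, 100]
print(final)  # 100
```

Each output depends on the state left by the previous step, which is what lets a recurrent cell pass information along the sequence.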

In [21]:
random.seed(10)
emb = 128 # embedding size
h_dim = 16 # hidden state dimension
T = 256 # number of time steps in the sequence

h_0 = np.zeros((h_dim,1))

# Initialize the weights and biases

W1 = random.standard_normal((h_dim, emb + h_dim))
W2 = random.standard_normal((h_dim, emb + h_dim))
W3 = random.standard_normal((h_dim, emb + h_dim))
W4 = random.standard_normal((h_dim, emb + h_dim))

b1 = random.standard_normal((h_dim,1))
b2 = random.standard_normal((h_dim,1))
b3 = random.standard_normal((h_dim,1))
b4 = random.standard_normal((h_dim,1))

# The input is generated randomly
X = random.standard_normal((T, emb, 1))
weights = [W1,W2,W3,W4,b1,b2,b3,b4]

In [26]:
def forward_GRU(inputs, weights):
    x, h_t = inputs
    # GRU cell weights and biases
    wu, wr, wc, wy, bu, br, bc, by = weights
    
    # Update gate
    Sigma_u = np.dot(wu, np.concatenate([h_t, x])) + bu
    Sigma_u = sigmoid(Sigma_u)
    
    # Relevance gate
    Sigma_r = np.dot(wr, np.concatenate([h_t, x])) + br
    Sigma_r = sigmoid(Sigma_r)
    
    # Candidate hidden state
    c = np.dot(wc,np.concatenate([Sigma_r*h_t, x])) + bc
    c = np.tanh(c)
    
    # New hidden state
    h_t = Sigma_u*c + (1-Sigma_u)*h_t
    
    # Output computed from the new hidden state and the input
    y_t = np.dot(wy, np.concatenate([h_t, x])) + by
    y_t = sigmoid(y_t)
    
    return h_t, y_t

In [27]:
forward_GRU([X[1],h_0],  weights)

(array([[-5.46717881e-05],
        [-2.01051221e-03],
        [ 7.38404514e-01],
        [ 4.06626731e-02],
        [-3.68213834e-14],
        [ 8.35937080e-02],
        [ 9.56615365e-01],
        [ 2.79540419e-04],
        [ 6.82018364e-16],
        [-9.98789512e-01],
        [-9.99263559e-01],
        [-9.51715245e-01],
        [ 4.15343481e-04],
        [ 1.52341517e-01],
        [ 9.92411835e-01],
        [-1.25969214e-01]]),
 array([[9.99999723e-01],
        [4.59058123e-03],
        [1.12000678e-08],
        [5.53938178e-03],
        [9.99999995e-01],
        [3.57987302e-02],
        [9.99999696e-01],
        [8.32964217e-01],
        [4.82861689e-01],
        [9.99999495e-01],
        [1.24257596e-04],
        [4.38120418e-02],
        [9.78201400e-01],
        [2.60661708e-10],
        [9.99828333e-01],
        [9.98022744e-01]]))

# Part 3: Comparison between vanilla RNNs and GRUs

You have already seen how forward propagation is computed for vanilla RNNs and GRUs. As a quick recap, you need a forward method for the recurrent cell and a function like `scan` that goes through all the elements of a sequence using that forward method. You saw that GRUs perform more computations than vanilla RNNs, and you can check that their hidden-state update uses three times as many parameters (three weight matrices and biases instead of one). In the cells below, we compute forward propagation for a sequence with 256 time steps (`T`) for an RNN and a GRU with the same hidden state `h_t` size (`h_dim`=16).  
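
The parameter ratio can be checked with a few lines of arithmetic. The sketch below counts only the parameters of the hidden-state update, so the output weights are left out, and it uses the dimensions from Part 1. The variable names are illustrative.

```python
emb, h_dim = 128, 16

per_matrix = h_dim * (emb + h_dim)   # entries in one (h_dim, emb + h_dim) weight matrix
per_bias = h_dim                     # entries in one (h_dim, 1) bias

rnn_params = 1 * (per_matrix + per_bias)   # W_h and b_h
gru_params = 3 * (per_matrix + per_bias)   # W_r, W_u, W_h and their biases

print(rnn_params)               # 2320
print(gru_params)               # 6960
print(gru_params / rnn_params)  # 3.0
```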

In [29]:
# Time forward propagation over the whole sequence with forward_GRU
from time import perf_counter
tic = perf_counter()
ys, h_T = scan(forward_GRU, X, weights, h_0)
toc = perf_counter()
Run_time = (toc-tic)*1000
print(f'It took {Run_time} ms to run the forward_GRU method')


It took 9.91020000037679 ms to run the forward_GRU method


In [31]:
X.shape

(256, 128, 1)

In [None]:
# GRUs


As you were told in the lectures, GRUs take more time to compute. (Occasionally, though rarely, the vanilla RNN takes longer. Can you figure out what might cause this?) This means that training and prediction take more time for a GRU than for a vanilla RNN. However, GRUs allow you to propagate relevant information even for long sequences, so when selecting an architecture for NLP you should assess the tradeoff between computational time and performance. 
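
One way to compare the two cells side by side is sketched below. It is a self-contained approximation rather than the notebook's own code. The names `rnn_step`, `gru_step`, `scan_sketch` and `time_ms` are hypothetical, both cells return the hidden state as their output, and each timing is the best of several runs, which reduces (but does not remove) measurement noise. Absolute numbers depend on the machine, so no expected timings are shown.

```python
import numpy as np
from time import perf_counter

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(inputs, weights):
    # Vanilla RNN cell; returns (new_state, output) with the hidden state as output
    x, h = inputs
    wh, bh = weights
    h = sigmoid(wh @ np.concatenate([h, x]) + bh)
    return h, h

def gru_step(inputs, weights):
    # GRU cell following the four equations of Part 1.2; output is the hidden state
    x, h = inputs
    wu, wr, wc, bu, br, bc = weights
    hx = np.concatenate([h, x])
    u = sigmoid(wu @ hx + bu)                          # update gate
    r = sigmoid(wr @ hx + br)                          # relevance gate
    c = np.tanh(wc @ np.concatenate([r * h, x]) + bc)  # candidate hidden state
    h = u * c + (1.0 - u) * h                          # new hidden state
    return h, h

def scan_sketch(fn, elems, weights, h_0):
    # Carry the hidden state through the sequence, collecting one output per step
    h, ys = h_0, []
    for x in elems:
        h, y = fn([x, h], weights)
        ys.append(y)
    return ys, h

def time_ms(fn, X, weights, h_0, repeats=20):
    # Best of several runs, in milliseconds
    best = float("inf")
    for _ in range(repeats):
        tic = perf_counter()
        scan_sketch(fn, X, weights, h_0)
        best = min(best, (perf_counter() - tic) * 1000)
    return best

rng = np.random.default_rng(10)
emb, h_dim, T = 128, 16, 256
X = rng.standard_normal((T, emb, 1))
h_0 = np.zeros((h_dim, 1))
Ws = [rng.standard_normal((h_dim, emb + h_dim)) for _ in range(3)]
bs = [rng.standard_normal((h_dim, 1)) for _ in range(3)]

print(f"vanilla RNN: {time_ms(rnn_step, X, [Ws[0], bs[0]], h_0):.2f} ms")
print(f"GRU:         {time_ms(gru_step, X, Ws + bs, h_0):.2f} ms")
```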