# CS - 587 : Exercise 5a ~ RNN formulation
## Scope:
The goal of this assignment is to get familiar with the structure of a Recurrent Neural Network (RNN) cell and model.

In this assignment you will implement, step by step, the structure and functionality of an RNN cell and model. In this part, you only need to define the forward computations, ignoring backpropagation (for now).

In [8]:
import numpy as np
import os
import random

# **Utility Functions**
You should use them, but you don't need to modify them.
<a id='softmax_reference'></a>

In [9]:
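def softmax(x):
    # Subtract the global max before exponentiating for numerical stability;
    # the shift cancels out in the ratio below.
    e_x = np.exp(x - np.max(x))
    # Normalize along axis 0, so each column (one example) sums to 1.
    return e_x / e_x.sum(axis=0)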


def sigmoid(x):
    return 1 / (1 + np.exp(-x))
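Since `softmax` normalizes along axis 0, it expects the classes in the rows and the examples in the columns. A minimal sketch of that behaviour follows; the `scores` array is made up for illustration, and `softmax` is repeated only to keep the snippet self-contained:

```python
import numpy as np

def softmax(x):
    # Same definition as the utility function in this notebook.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

# Hypothetical scores: 3 classes (rows) for 2 examples (columns).
scores = np.array([[1.0, 10.0],
                   [2.0, 10.0],
                   [3.0, 10.0]])

probs = softmax(scores)

print(probs.shape)        # same shape as the input
print(probs.sum(axis=0))  # every column sums to 1
print(probs[:, 1])        # equal scores give equal probabilities
```

If the examples were stored in the rows instead, the normalization would run over the wrong axis, so keep the `(n_y, m)` layout used throughout this notebook.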

# **RNN cell**
- Compute the hidden state (the output of the RNN cell), $\alpha^{\langle t\rangle}$.
- Use the hidden state to compute the prediction, $\hat{y}^{\langle t\rangle}$, with the [softmax function defined above](#softmax_reference).
- Store $(\alpha^{\langle t\rangle}, \alpha^{\langle t-1\rangle}, x^{\langle t\rangle}, \text{parameters})$ in a Python tuple, and return $\alpha^{\langle t\rangle}$, $\hat{y}^{\langle t\rangle}$ and the tuple.
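Written out with the parameter names used in this notebook (`Wax`, `Waa`, `Wya`, `ba`, `by`), the two computations would read as follows. The figure in the pdf is not reproduced here, so treat this as a reading of the code and check it against Figure (1):

$$
\alpha^{\langle t\rangle} = \tanh\left(W_{ax}\, x^{\langle t\rangle} + W_{aa}\, \alpha^{\langle t-1\rangle} + b_a\right),
\qquad
\hat{y}^{\langle t\rangle} = \mathrm{softmax}\left(W_{ya}\, \alpha^{\langle t\rangle} + b_y\right)
$$

The shapes line up as $(n_a, n_x)(n_x, m) + (n_a, n_a)(n_a, m) \to (n_a, m)$ for the hidden state and $(n_y, n_a)(n_a, m) \to (n_y, m)$ for the prediction. The biases have shapes $(n_a, 1)$ and $(n_y, 1)$ and are broadcast across the $m$ columns.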

In [10]:
#########################################################################################
# Task 1   TODO:                                                                        #
# 1. Define the operations that update 'a' at each RNN step                             #
#    (i.e. the value of 'a' that exists in the RNN cell)                                #
# 2. Store the value in 'a_next'                                                        #
# 3. Define the prediction of the cell ('y_pred')                                       #
# 4. Allowed functions that you can use are:                                            #
#         np.tanh,       np.dot,       our softmax (see above)                          #
# 5. Store computed values in a tuple with order 'a_next', 'a_prev', 'xt', 'parameters' #
#########################################################################################

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described in Figure (1) in the pdf.

    Arguments:
        xt          - your input data at timestep "t"  ~ numpy array of shape (n_x, m)
        a_prev      - Hidden state at timestep "t-1"   ~ numpy array of shape (n_a, m)
        parameters  - python dictionary containing:
            + Wax - Weight matrix multiplying the input          ~ numpy array of shape (n_a, n_x)
            + Waa - Weight matrix multiplying the hidden state   ~ numpy array of shape (n_a, n_a)
            + Wya - Weight matrix relating the hidden-state to the output ~ numpy array of shape (n_y, n_a)
            + ba - Bias  ~ numpy array of shape (n_a, 1)
            + by - Bias relating the hidden-state to the output  ~ numpy array of shape (n_y, 1)
    Returns:
        a_next  - next hidden state             ~ numpy array of shape (n_a, m)
        yt_pred - prediction at timestep "t"    ~ numpy array of shape (n_y, m)
        cache   - tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]
    
    ##################### START CODE HERE (≈3-4 lines) #####################
    
    # 1. compute next activation state using the formula in the RNN cell figure
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)

    # 2. compute output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    
    # 3. store values you need for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)
  
    ############################ END CODE HERE #############################
    
    return a_next, yt_pred, cache

# **RNN model**

A simple implementation of the forward pass:<br>

You will fill in the `rnn_forward` function with the appropriate code to perform the following steps (in the order that they are mentioned):
- Create two arrays of zeros with shapes
    - $(n_a , m, T_x)$, to store the hidden states (name it `a`)
    - $(n_y , m, T_x)$, to store the predictions (name it `y_pred`)
- Initialize the 2D hidden state `a_next` by setting it equal to the initial hidden state, `a0`.
- For each time-step $t$:
    - Get the data to be fed to the RNN cell at this time-step. This is a 2D slice of $x$, where $x$ has shape $(n_x, m, T_x)$.
    - Run `rnn_cell_forward` on that slice and the current `a_next` to obtain the updated hidden state $\alpha^{\langle t\rangle}$, the prediction $\hat{y}^{\langle t\rangle}$ and the tuple that stores the values you need for backward propagation.
    - Store the current hidden state in the 3D tensor `a`, which has shape $(n_a , m, T_x)$.
    - Store the current prediction $\hat{y}^{\langle t\rangle}$ (named `yt_pred` in your code) in the tensor `y_pred`.
    - Append the tuple that stores the values you need for backward propagation to the list `caches`.
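The slicing and storing steps above rely on indexing the last (time) axis of a 3D array. A small sketch with made-up sizes shows the shapes involved; the dimensions and the `a_t` placeholder are assumptions for illustration only:

```python
import numpy as np

# Hypothetical sizes: 3 input features, 10 examples, 4 time-steps, 5 hidden units.
n_x, m, T_x, n_a = 3, 10, 4, 5

x = np.arange(n_x * m * T_x, dtype=float).reshape(n_x, m, T_x)
a = np.zeros((n_a, m, T_x))

t = 2
xt = x[:, :, t]           # 2D slice for time-step t, shape (n_x, m)
print(xt.shape)

a_t = np.ones((n_a, m))   # stand-in for the hidden state returned by the cell
a[:, :, t] = a_t          # write it into the t-th slot of the 3D tensor
print(a[:, :, t].shape)
```

Only the slot for time-step `t` is modified; the other time-steps of `a` stay at zero until the loop reaches them.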

In [11]:
#####################################################################################
# Task 2   TODO:                                                                    #
# 1. Define the tensors that store the hidden states a, and the predictions y       #
# FOR EACH TIMESTEP                                                                 #
# 2. Feed the current input x to the RNN cell (rnn_cell_forward)                    #
# 3. Use the outputs of step 2 to define the current prediction of the cell         #
#    and the next hidden state a_next.                                              # 
#                                                                                   #
# 4. Store the  prediction and next hidden state values in the 3D tensors a, y_pred #
# 5. Append the tuple that has the values of all the parameters of the cell into    #
#    the list named caches                                                          #
#####################################################################################

def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network described in Figure (2) in the pdf.

    Arguments:
        x  - Input data for every time-step  ~ numpy array of shape (n_x, m, T_x)
        a0 - Initial hidden state            ~ numpy array of shape (n_a, m)
        parameters - python dictionary containing:
            + Waa - Weight matrix multiplying the hidden state   ~ numpy array of shape (n_a, n_a)
            + Wax - Weight matrix multiplying the input          ~ numpy array of shape (n_a, n_x)
            + Wya - Weight matrix relating the hidden-state to the output ~ numpy array of shape (n_y, n_a)
            + ba - Bias   ~ numpy array of shape (n_a, 1)
            + by - Bias relating the hidden-state to the output  ~ numpy array of shape (n_y, 1)

    Returns:
        a      - Hidden states for every time-step    ~ numpy array of shape (n_a, m, T_x)
        y_pred - Predictions for every time-step      ~ numpy array of shape (n_y, m, T_x)
        caches - tuple of values needed for the backward pass, contains (list of caches, x)
    """
    
    # Initialize "caches" which will contain the list of all caches
    caches = []
    
    # Retrieve dimensions from shapes of x and parameters["Wya"]
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape
    
    ############################ START CODE HERE ###########################
    
    # 1. initialize "a" and "y_pred" with zeros
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # 2. Initialize a_next
    a_next = a0

    # 3. loop over all time-steps T_x
    for t in range(T_x):
        xt = x[:, :, t]
        a_next, yt_pred, cache = rnn_cell_forward(xt, a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    ############################ END CODE HERE #############################
    
    # store values needed for backward propagation in cache
    caches = (caches, x)
    
    return a, y_pred, caches


**Check if everything works**:

`a[4][1]` should have `[-0.99999375  0.77911235 -0.99861469 -0.99833267]`

`y_pred[1][3]` should have `[ 0.79560373  0.86224861  0.11118257  0.81515947]`

In [12]:
np.random.seed(1)
x_tmp = np.random.randn(3, 10, 4)
a0_tmp = np.random.randn(5, 10)
parameters_tmp = {}
parameters_tmp['Waa'] = np.random.randn(5,5)
parameters_tmp['Wax'] = np.random.randn(5,3)
parameters_tmp['Wya'] = np.random.randn(2,5)
parameters_tmp['ba'] = np.random.randn(5,1)
parameters_tmp['by'] = np.random.randn(2,1)

a_tmp, y_pred_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
print(f"a[4][1] = {a_tmp[4][1]}\n")

print(f"y_pred[1][3] = {y_pred_tmp[1][3]}")

a[4][1] = [-0.99999375  0.77911235 -0.99861469 -0.99833267]

y_pred[1][3] = [0.79560373 0.86224861 0.11118257 0.81515947]
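Indexing the first two axes, as in `a[4][1]`, leaves only the time axis, which is why each checked row has $T_x = 4$ entries. Instead of comparing the printed digits by eye, the check can be automated with `np.allclose`. The helper name `matches` and the stand-in arrays below are assumptions; in the notebook you would pass `a_tmp[4][1]` and `y_pred_tmp[1][3]`:

```python
import numpy as np

# Expected rows, copied from the text above.
expected_a = np.array([-0.99999375, 0.77911235, -0.99861469, -0.99833267])
expected_y = np.array([0.79560373, 0.86224861, 0.11118257, 0.81515947])

def matches(actual, expected, atol=1e-6):
    """Return True when the two arrays agree up to the printed precision."""
    return bool(np.allclose(actual, expected, rtol=0.0, atol=atol))

# Stand-ins for the computed rows (hypothetical; replace with a_tmp[4][1]
# and y_pred_tmp[1][3] when running inside the notebook).
actual_a = expected_a + 1e-9
actual_y = expected_y.copy()

print(matches(actual_a, expected_a))
print(matches(actual_y, expected_y))
```

The absolute tolerance of `1e-6` is a choice that matches the eight printed decimals; tighten or loosen it as needed.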


--- 
# **BONUS part** (up to 10% bonus)

Implement a GRU cell, as shown in the figure in the assignment pdf.

Good luck :)

In [6]:
def GRU_cell_forward(xt, a_prev, parameters):
    """
    Implement a single forward step of the GRU-cell as described in the figure in your assignment.

    Arguments:
        xt     - your input data at timestep "t"  ~ numpy array of shape (n_x, m)
        a_prev - Hidden state at timestep "t-1"   ~ numpy array of shape (n_a, m)
    
        parameters - python dictionary containing:
            + Wz - Weight matrix of the input filter gate    ~ numpy array of shape (n_a, n_a + n_x)
            + Wr - Weight matrix of the forget/reset gate    ~ numpy array of shape (n_a, n_a + n_x)
            + Wh - Weight matrix of the update gate          ~ numpy array of shape (n_a, n_a + n_x)
            + Wy - Weight matrix relating the hidden-state to the output ~ numpy array of shape (n_y, n_a)
            + by - Bias relating the hidden-state to the output          ~ numpy array of shape (n_y, 1)

    Returns:
        a_next  - next hidden state           ~ numpy array of shape (n_a, m)
        yt_pred - prediction at timestep "t"  ~ numpy array of shape (n_y, m)
        cache   - tuple of values needed for the backward pass, contains (a_next, a_prev, zt, rt, ht_sl, xt, parameters)
    
    """
    # Retrieve parameters from "parameters"
    Wz = parameters["Wz"] # input filter gate weight
  
    Wr = parameters["Wr"] # update reset weight (notice the variable name)
    Wh = parameters["Wh"] # update hidden weight (notice the variable name)
    
    Wy = parameters["Wy"] # prediction weight
    by = parameters["by"]
    
    # Retrieve dimensions from shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    ############################ START CODE HERE ###########################
    # 1. Concatenate 'a_prev' and 'xt' (≈1-3 lines)
  
    # 2. Compute values for 'zt', 'rt', 'ht_sl', 'a_next' using the given formulas of the figure (≈4-6 lines)

    # 3. Compute prediction of the GRU cell (≈1 line)
    
    ############################ END CODE HERE #############################
    
    # store values needed for backward propagation in cache
    cache = (a_next, a_prev, zt, rt, ht_sl, xt, parameters)

    return a_next, yt_pred, cache
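The figure in the assignment pdf is not reproduced here, so the following is only a sketch of one common GRU formulation and not necessarily the one the exercise expects. It assumes that `a_prev` is stacked above `xt`, that the gates have no bias terms (the `parameters` dictionary contains none), that the reset gate scales `a_prev` before the candidate is computed, and that the new state is the blend `(1 - zt) * a_prev + zt * ht_sl`. The function name `gru_cell_forward_sketch` and the random inputs are hypothetical. If the figure uses a different convention, the expected values listed below will not be reproduced.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def gru_cell_forward_sketch(xt, a_prev, parameters):
    """One forward step of a GRU cell under the assumptions stated above."""
    Wz, Wr, Wh = parameters["Wz"], parameters["Wr"], parameters["Wh"]
    Wy, by = parameters["Wy"], parameters["by"]

    # ASSUMPTION: a_prev on top of xt, giving shape (n_a + n_x, m).
    concat = np.concatenate((a_prev, xt), axis=0)

    zt = sigmoid(np.dot(Wz, concat))   # gate computed with Wz, values in (0, 1)
    rt = sigmoid(np.dot(Wr, concat))   # gate computed with Wr, values in (0, 1)

    # ASSUMPTION: the Wr gate scales a_prev before the candidate state.
    concat_reset = np.concatenate((rt * a_prev, xt), axis=0)
    ht_sl = np.tanh(np.dot(Wh, concat_reset))   # candidate hidden state

    # ASSUMPTION: zt selects how much of the candidate replaces a_prev.
    a_next = (1 - zt) * a_prev + zt * ht_sl

    yt_pred = softmax(np.dot(Wy, a_next) + by)

    cache = (a_next, a_prev, zt, rt, ht_sl, xt, parameters)
    return a_next, yt_pred, cache

# Usage with made-up inputs: n_x = 3, n_a = 5, n_y = 2, m = 6.
rng = np.random.default_rng(0)
params = {
    "Wz": rng.standard_normal((5, 8)),
    "Wr": rng.standard_normal((5, 8)),
    "Wh": rng.standard_normal((5, 8)),
    "Wy": rng.standard_normal((2, 5)),
    "by": rng.standard_normal((2, 1)),
}
a_next, yt_pred, cache = gru_cell_forward_sketch(
    rng.standard_normal((3, 6)), rng.standard_normal((5, 6)), params
)
print(a_next.shape, yt_pred.shape, len(cache))
```

The cache keeps the same seven-element order as the template, so `cache[1]` is `a_prev`.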

**Test your implementation**

You should get:
- `a_next[4]` = `[-0.62573656  0.16011555  0.53546141 -0.59492405  0.3480744   0.13205611]`

- `a_next.shape` = `(5, 6)`

- `yt[1]` = `[0.4316982  0.02887545 0.52224397 0.21755424 0.04521213 0.36990543]`

- `yt.shape` = `(2, 6)`

- `cache[1][3]` = `[-1.11731035  0.2344157   1.65980218  0.74204416 -0.19183555 -0.88762896]`

- `len(cache)` = `7`

In [7]:
np.random.seed(1)
xt_tmp = np.random.randn(3,6)
a_prev_tmp = np.random.randn(5,6)

parameters_tmp = {}
parameters_tmp['Wz'] = np.random.randn(5, 5+3)

parameters_tmp['Wr'] = np.random.randn(5, 5+3)
parameters_tmp['Wh'] = np.random.randn(5, 5+3)


parameters_tmp['Wy'] = np.random.randn(2,5)
parameters_tmp['by'] = np.random.randn(2,1)

a_next_tmp, yt_tmp, cache_tmp = GRU_cell_forward(xt_tmp, a_prev_tmp, parameters_tmp)
print(f"a_next[4] = {a_next_tmp[4]}")
print(f"a_next.shape = {a_next_tmp.shape}\n")

print(f"yt[1] = {yt_tmp[1]}")
print(f"yt.shape = {yt_tmp.shape}\n")

print(f"cache[1][3] = {cache_tmp[1][3]}")
print(f"len(cache) = {len(cache_tmp)}")

NameError: name 'a_next' is not defined