## Building a Recurrent Neural Network from scratch using NumPy

Author: Abhishek Mishra, Graduate Student, Max Planck Institute of Neurobiology, Munich

This notebook is based on the Sequence Models course on Coursera, taught by **Andrew Ng** as part of the Deep Learning Specialization.

In [9]:
import numpy as np
from rnn_module import *

## 1 - Forward propagation for basic Recurrent Neural Network

The basic RNN that we will implement has the structure below. Here, $T_x = T_y$: the input and output sequences have the same number of time steps.

<img src="images/RNN_1.png" style="width:500px;height:300px;">
<caption><center> **Figure 1**: Basic RNN model </center></caption>

## 1.1 - RNN Cell

An RNN can be seen as the repetition of a single cell. First we will implement the computation for a single time step. Later we will call this function in a loop over all time steps to compute the forward propagation.

<img src="images/rnn_step_forward.png" style="width:700px;height:300px;">
<caption><center> **Figure 2**: Basic RNN cell. It takes as input $x^{\langle t \rangle}$ (the current input) and $a^{\langle t-1\rangle}$ (the previous hidden state, containing information from the past), and outputs $a^{\langle t \rangle}$, which is passed to the next RNN cell and also used to predict $y^{\langle t \rangle}$. </center></caption>
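
Written out, the computation in Figure 2 for a single time step is commonly expressed as the two equations below. The weight and bias symbols mirror the keys of the `parameters` dictionary used in the code (`Wax`, `Waa`, `Wya`, `ba`, `by`); $\hat{y}^{\langle t \rangle}$ is notation assumed here for the predicted output.

$$a^{\langle t \rangle} = \tanh\left(W_{ax}\, x^{\langle t \rangle} + W_{aa}\, a^{\langle t-1 \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_{ya}\, a^{\langle t \rangle} + b_y\right)$$

With a batch of $m$ examples, $x^{\langle t \rangle}$ has shape $(n_x, m)$ and $a^{\langle t \rangle}$ has shape $(n_a, m)$; the biases have a single column and are broadcast across the batch.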

In [10]:
def rnn_cell_forward(xt, a_prev, parameters):
    # xt: input at time step t, shape (n_x, m)
    # a_prev: hidden state at time step t-1, shape (n_a, m)
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # Next hidden state from the current input and the previous hidden state
    a_next = np.tanh(np.matmul(Wax, xt) + np.matmul(Waa, a_prev) + ba)
    # Prediction for the current time step, computed from a_next (see Figure 2)
    yt_pred = softmax(np.matmul(Wya, a_next) + by)

    # Values stored alongside the outputs for later use
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache

In [13]:
np.random.seed(1)
xt = np.random.randn(3,10)
a_prev = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa":Waa, "Wax":Wax, "Wya":Wya, "ba":ba,"by":by}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)

a_next[4] =  [ 0.59584544  0.18141802  0.61311866  0.99808218  0.85016201  0.99980978
 -0.18887155  0.99815551  0.6531151   0.82872037]
a_next.shape =  (5, 10)
yt_pred[1] = [ 0.9888161   0.01682021  0.21140899  0.36817467  0.98988387  0.88945212
  0.36920224  0.9966312   0.9982559   0.17746526]
yt_pred.shape =  (2, 10)
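
`rnn_cell_forward` relies on a `softmax` function imported from `rnn_module`, which is not shown in this notebook. Below is a minimal sketch of what such a helper might look like, assuming it normalizes over the class axis (axis 0) so that each column of the output sums to 1. The actual implementation in `rnn_module` may differ.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax for scores of shape (n_classes, m).

    Hypothetical stand-in for the helper imported from rnn_module.
    """
    # Subtracting the per-column maximum avoids overflow in np.exp
    # and does not change the result.
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / np.sum(e_z, axis=0, keepdims=True)

# Three examples (columns), two classes (rows); the last column has
# large scores that would overflow without the max subtraction.
scores = np.array([[1.0, 2.0, 1000.0],
                   [1.0, 0.0, 1000.0]])
probs = softmax(scores)
print(probs.sum(axis=0))  # every column sums to 1
```

Equal scores in a column give equal probabilities, and the larger score in a column always receives the larger probability.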


## 1.2 - RNN forward pass

A basic RNN is just the repetition of a single RNN cell, applied once per time step.

<img src="images/rnn.png" style="width:800px;height:300px;">
<caption><center> **Figure 3**: Basic RNN. The input sequence $x = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$  is carried over $T_x$ time steps. The network outputs $y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$. </center></caption>
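
The forward pass walks along the last axis of a 3-D input array of shape `(n_x, m, T_x)`. As a small illustration (the array below is made up for the example, with the same `(3, 10, 4)` shape as the test input used in this section), slicing `x[:, :, t]` yields the `(n_x, m)` input for one time step, while indexing `x[i][j]` returns feature `i` of example `j` across all time steps:

```python
import numpy as np

n_x, m, T_x = 3, 10, 4  # features, batch size, time steps
x = np.arange(n_x * m * T_x, dtype=float).reshape(n_x, m, T_x)

# Input for a single time step: one column per example
xt = x[:, :, 2]
print(xt.shape)        # (3, 10)

# Feature 1 of example 3 at every time step
print(x[1][3].shape)   # (4,)
```

This is why an expression such as `a[4][1]` prints one value per time step.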

In [16]:
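def rnn_forward(x, a0, parameters):
    # x: input sequence, shape (n_x, m, T_x); a0: initial hidden state, shape (n_a, m)
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # Hidden states and predictions for every time step
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Run the cell once per time step, carrying the hidden state forward
    a_next = a0
    for t in range(T_x):
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    # Bundle the per-step caches with the input x
    caches = (caches, x)

    return a, y_pred, caches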

In [17]:
np.random.seed(1)
x = np.random.randn(3,10,4)
a0 = np.random.randn(5,10)
Waa = np.random.randn(5,5)
Wax = np.random.randn(5,3)
Wya = np.random.randn(2,5)
ba = np.random.randn(5,1)
by = np.random.randn(2,1)
parameters = {"Waa":Waa, "Wax":Wax,"Wya":Wya, "ba":ba,"by":by}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print("a[4][1] = ", a[4][1])
print("a.shape = ", a.shape)
print("y_pred[1][3] = ", y_pred[1][3])
print("y_pred.shape = ", y_pred.shape)
print("caches[1][1][3] = ", caches[1][1][3])
print("len(caches) = ", len(caches))

a[4][1] =  [-0.99999375  0.77911235 -0.99861469 -0.99833267]
a.shape =  (5, 10, 4)
y_pred[1][3] =  [ 0.79560373  0.86224861  0.11118257  0.81515947]
y_pred.shape =  (2, 10, 4)
caches[1][1][3] =  [-1.1425182  -0.34934272 -0.20889423  0.58662319]
len(caches) =  2
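
A few quick checks can help confirm that a forward pass behaves as expected. The sketch below is a condensed, self-contained re-implementation rather than the notebook's exact functions: it assumes a column-wise `softmax`, computes each prediction from the current hidden state as described in Figure 2, omits the caches, and uses made-up random inputs. It checks that every prediction column sums to 1 and that changing the last input leaves the earlier hidden states untouched.

```python
import numpy as np

def softmax(z):
    # Assumed column-wise softmax (stand-in for the rnn_module helper)
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / np.sum(e_z, axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, p):
    # One time step: new hidden state, then the prediction from it
    a_next = np.tanh(p["Wax"] @ xt + p["Waa"] @ a_prev + p["ba"])
    yt_pred = softmax(p["Wya"] @ a_next + p["by"])
    return a_next, yt_pred

def rnn_forward(x, a0, p):
    n_x, m, T_x = x.shape
    n_y, n_a = p["Wya"].shape
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))
    a_next = a0
    for t in range(T_x):
        a_next, yt_pred = rnn_cell_forward(x[:, :, t], a_next, p)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
    return a, y_pred

# Made-up dimensions and random parameters for the check
rng = np.random.default_rng(0)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4
p = {
    "Wax": rng.standard_normal((n_a, n_x)),
    "Waa": rng.standard_normal((n_a, n_a)),
    "Wya": rng.standard_normal((n_y, n_a)),
    "ba": rng.standard_normal((n_a, 1)),
    "by": rng.standard_normal((n_y, 1)),
}
x = rng.standard_normal((n_x, m, T_x))
a0 = rng.standard_normal((n_a, m))
a, y_pred = rnn_forward(x, a0, p)

# Check 1: predictions are probability distributions over the classes
sums_ok = np.allclose(y_pred.sum(axis=0), 1.0)

# Check 2: perturbing the last input must not affect earlier hidden states
x_mod = x.copy()
x_mod[:, :, -1] += 1.0
a_mod, _ = rnn_forward(x_mod, a0, p)
causal_ok = np.allclose(a[:, :, :-1], a_mod[:, :, :-1])
last_changed = not np.allclose(a[:, :, -1], a_mod[:, :, -1])

print(sums_ok, causal_ok, last_changed)
```

The second check reflects the structure in Figure 3: the hidden state at step $t$ depends only on the inputs up to step $t$.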


#### We have implemented the forward propagation for a basic RNN.