#### **Welcome to Assignment 4 on Deep Learning for Computer Vision.**
In this assignment, you will implement an LSTM cell from scratch and use a recurrent neural network (RNN) for a 1D time-series prediction task.

#### **Instructions**
1. Use Python 3.x to run this notebook.
2. Write your code only between the lines 'YOUR CODE STARTS HERE' and 'YOUR CODE ENDS HERE' (or 'START CODE HERE' and 'END CODE HERE'). Do not change anything else in the code cells; if you do, the answers you get at the end of this assignment might be wrong.
3. Read the documentation of each function carefully.

## Question 1:

Given a sequence of values of a 1D input time series from t = 1 to t = 5, predict the value of the time series at t = 6 using an RNN.

Here, the RNN is trained so that, given the values of the input time series from t = 1 to t = i, it predicts the value at t = i + 1.
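The training loop in the code cell builds its training pairs by shifting the series by one step (`x = input_series[:-1]`, `y = input_series[1:]`). A minimal NumPy sketch of that construction, using a hypothetical series with the values 1 to 6 instead of random data so that the shift is easy to see:

```python
import numpy as np

# Hypothetical series with the values 1..6 (t = 1..6), standing in for `input_series`.
series = np.arange(1.0, 7.0).reshape(6, 1)

x = series[:-1]                  # inputs at t = 1..5, shape (5, 1)
y = series[1:]                   # targets at t = 2..6, shape (5, 1)
x_batched = x[np.newaxis, :, :]  # shape (1, 5, 1): adds a batch dimension like unsqueeze(0)

for t in range(len(x)):
    # row t pairs the value at time t + 1 with the value at time t + 2
    print(f"input at t={t + 1}: {x[t, 0]:.0f} -> target at t={t + 2}: {y[t, 0]:.0f}")
```

The last row pairs the input at t = 5 with the target at t = 6, which is why the code cell reports `prediction[-1]` as the predicted value at t = 6.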

Hint: Use PyTorch's `nn.RNN` to create an RNN layer, then add a fully connected layer (`nn.Linear`) to get the required output size.
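To make the hint concrete, here is a rough NumPy sketch of the forward pass of a one-layer tanh RNN in the `batch_first=True` layout, followed by a fully connected head, using the sizes specified for this question. It follows the update rule given in the PyTorch `nn.RNN` documentation, $h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$, but the weights are random stand-ins and the names `rnn_forward`, `W_fc` and `b_fc` are hypothetical; this is not PyTorch's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, seq_length, input_size = 1, 5, 1
hidden_dim, output_size = 32, 1

# Randomly initialised stand-ins for the learned parameters (hypothetical values).
W_ih = rng.standard_normal((hidden_dim, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
b_ih = np.zeros(hidden_dim)
b_hh = np.zeros(hidden_dim)
W_fc = rng.standard_normal((output_size, hidden_dim)) * 0.1
b_fc = np.zeros(output_size)

def rnn_forward(x, h0):
    """One-layer tanh RNN, batch_first layout: x has shape (batch, seq, input)."""
    h = h0                                   # (batch, hidden)
    outputs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
        outputs.append(h)
    r_out = np.stack(outputs, axis=1)        # (batch, seq, hidden)
    h_n = h[np.newaxis]                      # (n_layers=1, batch, hidden)
    return r_out, h_n

x = rng.standard_normal((batch_size, seq_length, input_size))
h0 = np.zeros((batch_size, hidden_dim))      # passing hidden=None to nn.RNN also starts from zeros
r_out, h_n = rnn_forward(x, h0)

# Flatten (batch, seq, hidden) -> (batch*seq, hidden), then apply the linear head,
# mirroring r_out.view(-1, hidden_dim) followed by self.fc in the model's forward().
output = r_out.reshape(-1, hidden_dim) @ W_fc.T + b_fc

print(r_out.shape, h_n.shape, output.shape)  # (1, 5, 32) (1, 1, 32) (5, 1)
```

Each of the five rows of `output` is a one-step-ahead prediction, so the linear head is applied at every time step, not only at the last one.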

Choose 32 as the number of features in the RNN output and in the hidden state. Use a single layer (`n_layers = 1`); the number of layers typically varies from task to task, and a value greater than 1 creates a stacked RNN. Also set `batch_first=True`, which makes the input and output tensors of the RNN have the batch size as their first dimension: (batch_size, seq_length, input_size) for the input and (batch_size, seq_length, hidden_dim) for the output.
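With `n_layers` greater than 1, each layer takes the full output sequence of the layer below as its input. A minimal NumPy sketch of a two-layer stack, assuming tanh layers, random weights and no dropout between layers (the helpers `rnn_layer` and `stacked_rnn` are hypothetical): the returned output holds only the top layer's hidden states, while the final hidden state has one entry per layer, matching the `(n_layers, batch_size, hidden_dim)` shape noted in the model's `forward` comments.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_layer(x, W_ih, W_hh, b):
    """One tanh RNN layer over a batch_first sequence x of shape (batch, seq, in).
    b stands for the two biases b_ih + b_hh combined into one vector."""
    h = np.zeros((x.shape[0], W_hh.shape[0]))
    outs = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W_ih.T + h @ W_hh.T + b)
        outs.append(h)
    return np.stack(outs, axis=1), h

def stacked_rnn(x, n_layers, hidden_dim):
    """Layer l consumes the whole output sequence of layer l - 1 (random weights)."""
    layer_input = x
    finals = []
    for _ in range(n_layers):
        in_dim = layer_input.shape[2]
        W_ih = rng.standard_normal((hidden_dim, in_dim)) * 0.1
        W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
        b = np.zeros(hidden_dim)
        layer_input, h_last = rnn_layer(layer_input, W_ih, W_hh, b)
        finals.append(h_last)
    # output: top layer's hidden states; h_n: final hidden state of every layer
    return layer_input, np.stack(finals, axis=0)

x = rng.standard_normal((1, 5, 1))           # (batch, seq, input_size)
output, h_n = stacked_rnn(x, n_layers=2, hidden_dim=32)
print(output.shape, h_n.shape)               # (1, 5, 32) (2, 1, 32)
```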

Which of the following options is true for the final predicted value at t = 6?

1.   -2.3214
2.   -2.2879
3.   -2.3118
4.   -2.2993



In [None]:
import torch
from torch import nn
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline


## Fixing the seed for Reproducibility
np.random.seed(1)
torch.manual_seed(1)

## Define a 1D input time series spanning t = 1 to t = 6.
input_series = np.random.randn(6,1)


class RNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(RNN, self).__init__()

        ### YOUR CODE STARTS HERE

        self.hidden_dim=hidden_dim

        # define an RNN with specified parameters
        # batch_first means that the first dim of the input and output will be the batch_size
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        
        # last, fully-connected layer
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x (batch_size, seq_length, input_size)
        # hidden (n_layers, batch_size, hidden_dim)
        # r_out (batch_size, time_step, hidden_size)
        batch_size = x.size(0)
        
        # get RNN outputs
        r_out, hidden = self.rnn(x, hidden)
        # shape output to be (batch_size*seq_length, hidden_dim)
        r_out = r_out.view(-1, self.hidden_dim)  
        
        # get final output 
        output = self.fc(r_out)
        
        return output, hidden
        ### YOUR CODE ENDS HERE

# decide on hyperparameters
input_size=1    ## 1D input
output_size=1   ## 1D output
hidden_dim=32  ## Hidden state feature dimension of RNN
n_layers=1     ## No. of stacked layers in RNN

# instantiate an RNN
rnn = RNN(input_size, output_size, hidden_dim, n_layers)

# MSE loss and Adam optimizer with a learning rate of 0.01
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.01)

# train the RNN
def train(rnn, n_steps, print_every):
    
    # initialize the hidden state
    hidden = None      
    
    for batch_i, step in enumerate(range(n_steps)):
        # defining the training data 
        x = input_series[:-1]
        y = input_series[1:]
        
        # convert data into Tensors
        x_tensor = torch.Tensor(x).unsqueeze(0) # unsqueeze(0) adds a batch dimension of size 1
        y_tensor = torch.Tensor(y)

        # outputs from the rnn
        prediction, hidden = rnn(x_tensor, hidden)

        ## Representing Memory ##
        # make a new variable for hidden and detach the hidden state from its history
        # this way, we don't backpropagate through the entire history
        hidden = hidden.data

        # calculate the loss
        loss = criterion(prediction, y_tensor)
        # zero gradients
        optimizer.zero_grad()
        # perform backprop and update weights
        loss.backward()
        optimizer.step()

        # display loss and predictions
        if batch_i%print_every == 0:  
            print (batch_i)      
            print('Loss: ', loss.item())
            print ('Predicted Value: ', prediction.data.numpy().flatten())
            print ('True Value: ', y_tensor.data.numpy().flatten())
            
    
    return rnn,prediction[-1]

# train the rnn and monitor results
trained_rnn,final_prediction = train(rnn, n_steps = 75, print_every= 11)
print ('Final predicted value of input time series at t=6: ',final_prediction.item())

0
Loss:  1.5117348432540894
Predicted Value:  [-0.18635052 -0.00746428  0.05009793  0.05678663 -0.02306957]
True Value:  [-0.6117564  -0.5281718  -1.0729686   0.86540765 -2.3015387 ]
11
Loss:  0.134530708193779
Predicted Value:  [ 0.00849447 -0.31584212 -1.2586135   1.2727582  -2.5075972 ]
True Value:  [-0.6117564  -0.5281718  -1.0729686   0.86540765 -2.3015387 ]
22
Loss:  0.19528238475322723
Predicted Value:  [-1.2275374  -0.65807223 -0.54608953  0.45658743 -1.933278  ]
True Value:  [-0.6117564  -0.5281718  -1.0729686   0.86540765 -2.3015387 ]
33
Loss:  0.08917436003684998
Predicted Value:  [-0.07242055 -0.87209296 -1.0512483   0.8208605  -2.486608  ]
True Value:  [-0.6117564  -0.5281718  -1.0729686   0.86540765 -2.3015387 ]
44
Loss:  0.03353014588356018
Predicted Value:  [-0.893936  -0.4294107 -1.1992261  0.8064023 -2.0589504]
True Value:  [-0.6117564  -0.5281718  -1.0729686   0.86540765 -2.3015387 ]
55
Loss:  0.009381274692714214
Predicted Value:  [-0.52645415 -0.6100819  -0.9821840

## Question 2:

Given a multivariate input time sequence and all the trainable parameters of an LSTM cell, implement the functionality of the LSTM cell to predict the hidden state and output at time t, given the LSTM cell state at the previous time step (t-1), the LSTM hidden state at the previous time step (t-1), and the input at time t. Hint: Use the following set of equations to implement the LSTM cell.

Forget GATE: $f_{t} = \sigma(W_{f}[a_{t-1} ; x_{t}] + b_{f})$ (Note that ";" denotes the concatenation operation.)

Update GATE: $i_{t} = \sigma(W_{i}[a_{t-1} ; x_{t}] + b_{i})$

Memory GATE: $\tilde{c}_{t} = \tanh(W_{c}[a_{t-1} ; x_{t}] + b_{c})$

Cell-state update: $c_{t} = f_{t} * c_{t-1} + i_{t} * \tilde{c}_{t}$ (This operation determines how much information to keep from the past and how much to add from the current step.)

Output GATE: $o_{t} = \sigma(W_{o}[a_{t-1} ; x_{t}] + b_{o})$

Hidden state (final output of the cell): $a_{t} = o_{t} * \tanh(c_{t})$

LSTM output, as computed in `lstm_forward_pass`: $y_{t} = \mathrm{softmax}(W_{y} a_{t} + b_{y})$

Here $\sigma$ is the sigmoid function and $*$ denotes element-wise multiplication. (Note: to implement the tanh operation, use the `numpy.tanh` library function.)
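Because $a_{t-1}$ has 5 rows and $x_t$ has 3 rows in the code below, the concatenation $[a_{t-1} ; x_t]$ has 8 rows, which is why every gate weight matrix there has shape (5, 8). The following NumPy sketch (random values, not the assignment's parameters) checks two equivalent ways of computing the gates: one matrix product per gate, as written in the equations, and a single product with the four matrices stacked, a layout that frameworks such as PyTorch use to store LSTM weights. It also confirms that multiplying by the concatenation equals $W_a a_{t-1} + W_x x_t$ when $W$ is split column-wise.

```python
import numpy as np

rng = np.random.default_rng(2)
n_a, n_x, m = 5, 3, 10  # hidden size, input size, batch size as in the Question 2 code

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a_prev = rng.standard_normal((n_a, m))
c_prev = rng.standard_normal((n_a, m))
xt = rng.standard_normal((n_x, m))

# One (n_a, n_a + n_x) weight matrix and one (n_a, 1) bias per gate: f, i, c, o
W = {g: rng.standard_normal((n_a, n_a + n_x)) for g in "fico"}
b = {g: rng.standard_normal((n_a, 1)) for g in "fico"}

concat = np.concatenate((a_prev, xt), axis=0)  # shape (8, 10)

# Version 1: one matrix product per gate; each (5, 1) bias broadcasts across the m columns
ft = sigmoid(W["f"] @ concat + b["f"])
it = sigmoid(W["i"] @ concat + b["i"])
cct = np.tanh(W["c"] @ concat + b["c"])
ot = sigmoid(W["o"] @ concat + b["o"])

# Version 2: stack the four gates' weights and biases and do a single product
W_all = np.vstack([W[g] for g in "fico"])  # shape (20, 8)
b_all = np.vstack([b[g] for g in "fico"])  # shape (20, 1)
z_f, z_i, z_c, z_o = np.split(W_all @ concat + b_all, 4, axis=0)
f2, i2, c2, o2 = sigmoid(z_f), sigmoid(z_i), np.tanh(z_c), sigmoid(z_o)

# Multiplying by [a_prev; xt] equals W_a @ a_prev + W_x @ xt with W split by columns
W_a, W_x = W["f"][:, :n_a], W["f"][:, n_a:]
split_ok = np.allclose(W["f"] @ concat, W_a @ a_prev + W_x @ xt)

# Cell and hidden updates from the equations (element-wise products)
c_next = ft * c_prev + it * cct
a_next = ot * np.tanh(c_next)
print(concat.shape, split_ok, np.allclose(c_next, f2 * c_prev + i2 * c2))  # (8, 10) True True
```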


Compute the value of a specific component of the LSTM cell output (y), i.e. y[1, 3, 4].

Which of the following options is true for the value of y[1, 3, 4]?


1.   0.1483
2.   0.1724
3.   0.2108
4.   0.2471

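The output `yt_pred` applies softmax over axis 0, so for each example (column) and time step the `n_y` = 2 outputs sum to 1; in particular, y[0, 3, 4] = 1 - y[1, 3, 4]. A small NumPy check of that property, using the same `softmax` definition as the code below on hypothetical random logits:

```python
import numpy as np

def softmax(x):
    # Same definition as in the Question 2 code: shift by the global max, normalise over axis 0
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

rng = np.random.default_rng(3)
logits = rng.standard_normal((2, 10))  # (n_y, m): stands in for Wy @ a_next + by at one time step

probs = softmax(logits)
print(probs.sum(axis=0))               # every column sums to 1, up to floating-point rounding

# Shifting every entry by one constant leaves each column's softmax unchanged,
# so the global max gives the same values as a per-column max.
shifted = np.exp(logits - logits.max(axis=0))
per_column = shifted / shifted.sum(axis=0)
print(np.allclose(probs, per_column))  # True
```

Both shifts give the same values mathematically; the per-column maximum is the more robust choice numerically, because a column whose entries all lie far below the global maximum could underflow to zeros.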


In [None]:
import numpy as np
np.random.seed(2)

## Function implements Sigmoid Activation
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

## Function implements Softmax Activation
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

## Implements the LSTM forward pass for a single time step: given x at time step t, the hidden state
## at the previous time step and the memory state at the previous time step, it computes the predicted output y at time step t.

def lstm_forward_pass(xt, a_prev, c_prev, parameters):
    """
    Implement a single forward step of the LSTM-cell 

    Arguments:
    xt -- your input data at timestep "t"
    a_prev -- Hidden state at timestep "t-1"
    c_prev -- Memory state at timestep "t-1"

    parameters -- dictionary holding the trainable parameters of the LSTM cell:
    Wf -- Weight matrix of the forget gate; bf -- Bias of the forget gate
    Wi -- Weight matrix of the update gate; bi -- Bias of the update gate
    Wc -- Weight matrix of the first "tanh"; bc --  Bias of the first "tanh"
    Wo -- Weight matrix of the output gate; bo --  Bias of the output gate
    Wy -- Weight matrix relating the hidden-state to the output; by -- Bias relating the hidden-state to the output
                        
    Returns:
    a_next -- next hidden state
    c_next -- next memory state
    yt_pred -- LSTM output prediction at timestep "t"
    cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    
    """

    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]; bf = parameters["bf"]
    Wi = parameters["Wi"]; bi = parameters["bi"]
    Wc = parameters["Wc"]; bc = parameters["bc"]
    Wo = parameters["Wo"]; bo = parameters["bo"]
    Wy = parameters["Wy"]; by = parameters["by"]
    
    # Retrieve dimensions from shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    ### START CODE HERE ###
    # Concatenate a_prev and xt
    concat = np.concatenate((a_prev, xt), axis=0)

    # Compute values for ft, it, cct, c_next, ot, a_next using the formulae 
    ft = sigmoid(Wf @ concat + bf)
    it = sigmoid(Wi @ concat + bi)
    cct = np.tanh(Wc @ concat + bc)
    c_next = ft * c_prev + it * cct
    ot = sigmoid(Wo @ concat + bo)
    a_next = ot * np.tanh(c_next)
    
    # Compute prediction of the LSTM cell 
    yt_pred = softmax(Wy @ a_next + by)
    ### END CODE HERE ###

    # store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache

def lstm_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network using an LSTM-cell.

    Arguments:
    x -- Input data for every time-step, numpy array of shape (n_x, m, T_x)
    a0 -- Initial hidden state of the LSTM cell, numpy array of shape (n_a, m)
    parameters -- dictionary holding the trainable parameters of the LSTM cell:
    Wf -- Weight matrix of the forget gate; bf -- Bias of the forget gate
    Wi -- Weight matrix of the update gate; bi -- Bias of the update gate
    Wc -- Weight matrix of the first "tanh"; bc -- Bias of the first "tanh"
    Wo -- Weight matrix of the output gate; bo -- Bias of the output gate
    Wy -- Weight matrix relating the hidden-state to the output; by -- Bias relating the hidden-state to the output
                        
    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    c -- Cell states for every time-step, numpy array of shape (n_a, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """

    # Initialize "caches", which will track the list of all the caches
    caches = []
    
    ### START CODE HERE ###
    # Retrieve dimensions from shapes of x and parameters['Wy'] (≈2 lines)
    n_x, m, T_x = x.shape
    n_y, n_a = parameters['Wy'].shape
    
    # initialize "a", "c" and "y" with zeros
    a = np.zeros(shape=(n_a, m, T_x))
    c = np.zeros_like(a)
    y = np.zeros(shape=(n_y, m, T_x))
    
    # Initialize a_next and c_next
    a_next = a0
    c_next = np.zeros_like(a_next)
    
    # loop over all time-steps
    for t in range(T_x):
        # Update next hidden state, next memory state, compute the prediction, get the cache 
        a_next, c_next, yt, cache = lstm_forward_pass(x[:, :, t], a_next, c_next, parameters)
        # Save the value of the new "next" hidden state in a
        a[:,:,t] = a_next
        # Save the value of the prediction in y 
        y[:,:,t] = yt
        # Save the value of the next cell state 
        c[:,:,t] = c_next
        # Append the cache into caches
        caches.append(cache)
        
    ### END CODE HERE ###
    
    # store values needed for backward propagation in cache
    caches = (caches, x)

    return a, y, c, caches

# Input time sequence
x = np.random.randn(3,10,7)

# Initial Hidden state of LSTM
a0 = np.random.randn(5,10)

# Weight and Bias Parameters of FORGET gate
Weight_f = np.random.randn(5, 8); bias_f = np.random.randn(5,1)

# Weight and Bias Parameters of UPDATE gate
Weight_i = np.random.randn(5, 8); bias_i = np.random.randn(5,1)

# Weight and Bias Parameters of OUTPUT gate
Weight_o = np.random.randn(5, 8); bias_o = np.random.randn(5,1)

# Weight and Bias Parameters of MEMORY gate (updating the cell)
Weight_c = np.random.randn(5, 8); bias_c = np.random.randn(5,1)

# Weight and bias for transforming hidden state output to final LSTM output for downstream application
Weight_y = np.random.randn(2,5); bias_y = np.random.randn(2,1)

LSTM_parameters = {"Wf": Weight_f, "Wi": Weight_i, "Wo": Weight_o, "Wc": Weight_c, "Wy": Weight_y, "bf": bias_f, "bi": bias_i, "bo": bias_o, "bc": bias_c, "by": bias_y}

a, y, c, caches = lstm_forward(x, a0, LSTM_parameters)

## Print a specific component of the LSTM cell output (y), i.e. y[1,3,4]
print("y[1][3][4] =", y[1][3][4])
print("y.shape = ", y.shape)

## Print a specific component of the LSTM hidden state output (a), i.e. a[2,1,5]
print("a[2][1][5] = ", a[2][1][5])
print("a.shape = ", a.shape)


y[1][3][4] = 0.21083866421151456
y.shape =  (2, 10, 7)
a[2][1][5] =  0.023019435434183624
a.shape =  (5, 10, 7)


## Question 3:

Using exactly the same code as in Question 2, find the value of a specific component of the LSTM hidden state output (a), i.e. a[2, 1, 5].

Which of the following values is true for a[2, 1, 5]?



1.   0.0140
2.   0.0230
3.   0.0272
4.   0.0312
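The arrays are laid out as `y[unit, example, time step]` with shape (n_y, m, T_x) and `a[unit, example, time step]` with shape (n_a, m, T_x), and NumPy indices start at 0. A small sketch with a hypothetical array of the same shape as `y` shows how to read an index such as y[1, 3, 4]:

```python
import numpy as np

n_y, m, T_x = 2, 10, 7
# Hypothetical array with the same layout as y: (output unit, example, time step)
y = np.arange(n_y * m * T_x).reshape(n_y, m, T_x)

unit, example, step = 1, 3, 4
# Chained indexing y[1][3][4] and tuple indexing y[1, 3, 4] select the same entry
same = y[unit][example][step] == y[unit, example, step]
print(f"y[{unit}, {example}, {step}] is output unit {unit + 1} of example {example + 1} "
      f"at time step {step + 1}; value here: {y[unit, example, step]}")
print(y[:, example, step])  # all n_y outputs for that example and time step -> [25 95]
```

By the same reading, a[2, 1, 5] is hidden unit 3 of example 2 at the sixth time step, i.e. the value stored during loop iteration t = 5 of `lstm_forward`.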


