## Warm-up 1 - Building an RNN from scratch

In [None]:
#Import dependencies. We'll only be using numpy for this part!
import numpy as np

In this problem, you'll build the forward pass for a simple RNN layer from scratch, using concepts you've seen in the note and the slides. We've implemented backpropagation through time (BPTT) for you, but we recommend reading through the code and working out what each step does.
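
For reference, the forward pass of a simple RNN can be written as the recurrence below. The symbols are assumptions chosen to match the weight names used in the code (`Wa`, `Wx`, `Wy`); the note and the slides may use different notation. No bias terms appear because the class below does not define any. Here $x_t$ is the input at step $t$, $a_t$ is the hidden state, and $T$ is the sequence length.

$$
a_0 = \mathbf{0}, \qquad a_t = \tanh\left(W_a a_{t-1} + W_x x_t\right) \quad (t = 1, \dots, T), \qquad \hat{y} = \mathrm{softmax}\left(W_y a_T\right)
$$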

In [None]:
class RNN(): 
    
    def __init__(self, input_dim, output_dim, hidden_dim=10): 
        #Initialize the three weight matrices here, use np.random.rand and scale down by a factor of 1000
        #TODO:
        self.Wa = np.random.rand(hidden_dim, hidden_dim)/1000  #hidden-to-hidden
        self.Wx = np.random.rand(hidden_dim, input_dim)/1000   #input-to-hidden
        self.Wy = np.random.rand(output_dim, hidden_dim)/1000  #hidden-to-output
        
        #keep track of previous inputs and hidden states (both are set in forward_pass)
        self.prev_inps = None
        self.prev_a_s = None
        
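    #TODO: implement softmax activation using numpy
    def softmax(self, x): 
        #subtract the max before exponentiating to avoid overflow; the result is unchanged
        exps = np.exp(x - np.max(x))
        return exps / np.sum(exps)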
    
    def forward_pass(self, inputs):
        #Code the forward pass for an RNN
        #TODO: 
        a = np.zeros(self.Wa.shape[0])  #initial hidden state a_0
        
        self.prev_inps = inputs 
        self.prev_a_s = {0 : a}
        
        #forward steps for the RNN: compute a_t at every step, then y from the final state
        for key, input_vec in enumerate(inputs): 
            #carry the hidden state forward so that a_t depends on a_{t-1}
            a = np.tanh(self.Wa@a + self.Wx@input_vec)
            self.prev_a_s[key + 1] = a
        
        y = self.softmax(self.Wy@a)
            
        return y, a
    
    
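    #Backprop Through Time using Cross Entropy loss function
    def backward_pass(self, dy): 
        #dy is the gradient of the loss w.r.t. the pre-softmax outputs
        n = len(self.prev_inps)
        dWy = np.outer(dy, self.prev_a_s[n])

        dWa = np.zeros(self.Wa.shape)
        dWx = np.zeros(self.Wx.shape)
        d_a = self.Wy.T @ dy  #gradient w.r.t. the final hidden state

        # Backpropagate through time
        for t in reversed(range(n)):
            #gradient w.r.t. the pre-activation of step t + 1 (tanh' = 1 - tanh^2)
            temp = ((1 - self.prev_a_s[t + 1] ** 2) * d_a)
            #Wa multiplies the previous hidden state, so the outer product uses prev_a_s[t]
            dWa += np.outer(temp, self.prev_a_s[t])
            dWx += np.outer(temp, self.prev_inps[t])
            #pass the gradient back to the previous hidden state through Wa transposed
            d_a = self.Wa.T @ temp
        
        return dWa, dWx, dWy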
    
    def update_weights(self, dWa, dWx, dWy, learn_rate):
        self.Wa -= learn_rate * dWa
        self.Wx -= learn_rate * dWx
        self.Wy -= learn_rate * dWy
        
    def classify(self, y_vec): 
        #predict the class with the highest probability
        #(with two classes this is class 1 exactly when y_vec[1] > 0.5)
        return int(np.argmax(y_vec))
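
One way to study backpropagation through time is to compare analytic gradients against finite differences. The sketch below is a standalone, functional re-statement of the same recurrence (tanh hidden state, softmax output, cross entropy loss), not the class above; the helper names `forward`, `loss`, `bptt`, and `numeric_grad`, the small layer sizes, and the random seed are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

def forward(Wa, Wx, Wy, xs):
    """Run the recurrence and return class probabilities plus all hidden states."""
    a = np.zeros(Wa.shape[0])
    states = [a]
    for x in xs:
        a = np.tanh(Wa @ a + Wx @ x)
        states.append(a)
    return softmax(Wy @ a), states

def loss(Wa, Wx, Wy, xs, label):
    """Cross entropy loss for a single sequence with an integer class label."""
    probs, _ = forward(Wa, Wx, Wy, xs)
    return -np.log(probs[label])

def bptt(Wa, Wx, Wy, xs, label):
    """Analytic gradients of the loss w.r.t. Wa, Wx, Wy."""
    probs, states = forward(Wa, Wx, Wy, xs)
    dy = probs.copy()
    dy[label] -= 1.0                        # gradient w.r.t. the pre-softmax outputs
    dWy = np.outer(dy, states[-1])
    dWa = np.zeros_like(Wa)
    dWx = np.zeros_like(Wx)
    d_a = Wy.T @ dy                         # gradient w.r.t. the final hidden state
    for t in reversed(range(len(xs))):
        temp = (1 - states[t + 1] ** 2) * d_a
        dWa += np.outer(temp, states[t])    # Wa multiplied the previous state
        dWx += np.outer(temp, xs[t])
        d_a = Wa.T @ temp                   # gradient w.r.t. the previous state
    return dWa, dWx, dWy

def numeric_grad(f, W, eps=1e-5):
    """Central finite differences of the scalar f() w.r.t. each entry of W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        up = f()
        W[idx] = old - eps
        down = f()
        W[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)              # seeded only so the check is repeatable
Wa = rng.normal(scale=0.5, size=(4, 4))
Wx = rng.normal(scale=0.5, size=(4, 3))
Wy = rng.normal(scale=0.5, size=(2, 4))
xs = rng.normal(size=(6, 3))                # 6 time steps, 3 features per step
label = 1

dWa, dWx, dWy = bptt(Wa, Wx, Wy, xs, label)
f = lambda: loss(Wa, Wx, Wy, xs, label)
max_err = max(
    np.abs(dWa - numeric_grad(f, Wa)).max(),
    np.abs(dWx - numeric_grad(f, Wx)).max(),
    np.abs(dWy - numeric_grad(f, Wy)).max(),
)
print("largest gap between analytic and numeric gradients:", max_err)
```

A small gap suggests the analytic gradients agree with the loss. Swapping `states[t]` for `states[t + 1]`, or `Wa.T` for `Wa`, inside `bptt` is a quick way to see how such a check reacts to an indexing or transpose mistake.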
        

### Data Generation
Let's construct a problem to solve using our very own RNN. Consider this very simple toy example: you receive multiple sequences of baggage weights. A flight has 100 passengers, each with their own sequence of 5 pieces of luggage, and each piece has a max weight of 40 pounds. One sample is therefore a sequence of 100 steps (one per passenger) with 5 weights per step. The total weight your flight can carry is 1570 pounds. You have a simple task: decide whether your flight can handle the input baggage weights or not. Note that you could simply sum all the weights and check whether the total is > 1570. Neural networks are definitely overkill in this scenario, but let's say the airport is inefficient and really wants to use more computational resources than required.
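
The direct summation mentioned above can be sketched in a few lines. The helper name `exceeds_limit`, the constant `WEIGHT_LIMIT`, and the seeded generator are assumptions made for illustration; the sample is drawn as integers from 0 to 39, which is an assumed stand-in for the sampling used in this notebook.

```python
import numpy as np

WEIGHT_LIMIT = 1570  # pounds, taken from the problem statement

def exceeds_limit(weights, limit=WEIGHT_LIMIT):
    """Return 1 if the total baggage weight is above the limit, else 0."""
    return int(np.sum(weights) > limit)

rng = np.random.default_rng(0)               # seeded only so the sketch is repeatable
sample = rng.integers(0, 40, size=(100, 5))  # 100 passengers, 5 bags each, weights in [0, 40)
print("total weight:", sample.sum(), "label:", exceeds_limit(sample))

# Expected total when every weight is uniform on 0..39: 500 bags at 19.5 pounds each
expected_total = 100 * 5 * np.mean(np.arange(40))
print("expected total:", expected_total)  # 9750.0
```

Under these sampling assumptions the expected total sits far above the 1570-pound limit, so it may be worth checking how the labels are distributed in the generated data before reading too much into an accuracy figure.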

In [None]:
#Data generation functions
def generate_sequence(input_dims):
    #input_dims is (100, 5): 100 passengers with 5 pieces of luggage each
    data = np.random.choice(40, input_dims) #generate random ints in [0, 40)
    total_sum = data.sum()
    label = 1 if total_sum > 1570 else 0
    
    return data, label

In [None]:
#First generate 1000 samples
X = [0] * 1000
y = np.zeros(1000)

for row in range(1000): 
    x_gen, y_gen = generate_sequence((100, 5))
    X[row] = x_gen
    y[row] = y_gen
    
#split into training and testing data using an 80/20 split
X_train = X[:800]
y_train = y[:800]
X_test = X[800:]
y_test = y[800:]

In [None]:
#initialize an RNN with output dim 2, we want to see the probability of each class
rnn = RNN(5, 2, 10)

In [None]:
#Train RNN for 10 epochs
for epoch in range(10): 
    for j in range(len(X_train)): 
        x = X_train[j]
        label = int(y_train[j])
        
        class_probs = rnn.forward_pass(x)[0]
            
        #gradient of the cross entropy loss w.r.t. the pre-softmax outputs:
        #predicted probabilities minus the one-hot label
        dy = class_probs.copy()
        dy[label] -= 1
                
        dWa, dWx, dWy = rnn.backward_pass(dy)
        rnn.update_weights(dWa, dWx, dWy, 0.05)
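
The `dy` built in the training loop (class probabilities with 1 subtracted at the true class) is the gradient of the cross entropy loss with respect to the pre-softmax outputs. A short derivation, using notation that is assumed here rather than taken from the note: $z = W_y a_T$ are the pre-softmax outputs, $p = \mathrm{softmax}(z)$, $c$ is the true class, and $\delta$ is the Kronecker delta.

$$
L = -\log p_c, \qquad \frac{\partial p_k}{\partial z_i} = p_k\left(\delta_{ki} - p_i\right), \qquad \frac{\partial L}{\partial z_i} = -\frac{1}{p_c}\, p_c\left(\delta_{ci} - p_i\right) = p_i - \delta_{ci}
$$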
        
    

Now, let's test our simple RNN on our test data:

In [None]:
num_correct_test = 0
for j in range(len(X_test)): 
    x = X_test[j]
    y = y_test[j]

    class_probs = rnn.forward_pass(x)[0]
    class_pred = rnn.classify(class_probs)
 
    if class_pred == y: 
        num_correct_test += 1

Calculate your testing accuracy below: 

In [None]:
num_correct_test/len(X_test)
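
Accuracy is easier to interpret next to a trivial baseline. The sketch below computes accuracy from arrays of predictions and labels and compares it with always predicting the most common class; the helper names `accuracy` and `majority_baseline` and the tiny example arrays are hypothetical, not results from this notebook.

```python
import numpy as np

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return float(np.mean(preds == labels))

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class (0 or 1)."""
    counts = np.bincount(np.asarray(labels).astype(int), minlength=2)
    return float(counts.max() / counts.sum())

# Hypothetical values, for illustration only
example_preds = [1, 1, 0, 1]
example_labels = [1, 1, 1, 1]
print(accuracy(example_preds, example_labels))  # 0.75
print(majority_baseline(example_labels))        # 1.0
```

If the model's accuracy does not beat the majority baseline, the score mostly reflects the label distribution rather than anything the model has learned.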

#### Discussion Question 1: How did our RNN perform? (Optional) How would we change the backward pass function if we were using MSE loss instead of cross entropy loss?

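**Answer here:** Great! Our RNN performed really well on this simple task. Take some time to go through the backpropagation part of the RNN if you haven't already. If we were to use MSE loss instead of cross entropy loss, we would have to recompute the partial derivatives w.r.t. the weight matrices. In this implementation the loss enters only through `dy`, the gradient w.r.t. the pre-softmax outputs that is passed to `backward_pass`, so that is the term to rederive; the rest of the backward pass propagates it in the same way.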
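
As a rough illustration of the optional question, the sketch below derives the gradient w.r.t. the pre-softmax outputs for an MSE loss of the form 0.5 * sum((p - target)^2) and checks it with finite differences. The exact form of the MSE loss, the helper names `mse_loss` and `mse_grad_logits`, and the example values are assumptions; the note or the slides may define the loss differently.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

def mse_loss(z, target):
    """Assumed loss: half the squared error between softmax(z) and a one-hot target."""
    p = softmax(z)
    return 0.5 * np.sum((p - target) ** 2)

def mse_grad_logits(z, target):
    """Gradient of mse_loss w.r.t. z, obtained by chaining through the softmax Jacobian."""
    p = softmax(z)
    err = p - target
    # dL/dz_i = p_i * (err_i - sum_k p_k * err_k)
    return p * (err - np.dot(p, err))

z = np.array([0.3, -1.2])        # example pre-softmax outputs
target = np.array([0.0, 1.0])    # one-hot label for class 1

analytic = mse_grad_logits(z, target)

# Central finite differences as an independent check
eps = 1e-6
numeric = np.array([
    (mse_loss(z + eps * e, target) - mse_loss(z - eps * e, target)) / (2 * eps)
    for e in np.eye(len(z))
])
max_diff = np.max(np.abs(analytic - numeric))
print("analytic:", analytic)
print("largest gap vs. finite differences:", max_diff)
```

With this definition, the vector returned by `mse_grad_logits` would play the role that `dy` plays for cross entropy.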

## References: 
- An Introduction to RNNs <br>
  https://victorzhou.com/blog/intro-to-rnns/
  
- Backpropagation through time <br>
  http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/