#### RNN Model from Scratch with NumPy
Recurrent Neural Networks (RNNs) are a kind of neural network that specializes in processing sequences. They are often used in Natural Language Processing (NLP) tasks because text is naturally a sequence of words.

Why? One limitation of vanilla neural networks (and CNNs) is that they work only with predetermined sizes: they take fixed-size inputs and produce fixed-size outputs. RNNs are useful because they allow variable-length sequences as inputs, outputs, or both, so a model can map one or many inputs to one or many outputs.

For example, machine translation is many-to-many, and sentiment analysis is many-to-one.

Here’s what makes an RNN recurrent: it uses the same weights at every step. More specifically, a typical vanilla RNN uses only three weight matrices (W_xh, W_hh, W_hy) and two bias vectors (b_h, b_y) to perform its calculations:

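h_t = tanh(W_xh * x_t + W_hh * h_{t-1} + b_h)

y = W_hy * h_t + b_y

Here * denotes a matrix-vector product, x_t is the input at step t, and h_{t-1} is the hidden state from the previous step.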
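The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the class defined later in this notebook: the sizes, the random seed, and the helper name `rnn_step` are assumptions chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2  # illustrative sizes (assumed)

# The same five parameters are reused at every time step.
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
b_h = np.zeros((hidden_size, 1))
b_y = np.zeros((output_size, 1))

def rnn_step(x_t, h_prev):
    """One recurrence step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# A sequence of 5 column vectors; any length works with the same weights.
xs = [rng.standard_normal((input_size, 1)) for _ in range(5)]

h = np.zeros((hidden_size, 1))  # initial hidden state
for x_t in xs:
    h = rnn_step(x_t, h)

# Many-to-one: read a single output from the final hidden state.
y = W_hy @ h + b_y
print(h.shape, y.shape)  # (3, 1) (2, 1)
```

Changing the length of `xs` does not change any parameter shape, which is the property that lets an RNN handle variable-length input.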


Find the blog here: https://victorzhou.com/tag/neural-networks/page/2/

In [1]:
# https://victorzhou.com/blog/intro-to-rnns/
# assumes data.py defines train_data and test_data (dicts mapping text to True/False)
from data import train_data, test_data

In [2]:
# create the vocabulary
vocab = list(set(w for text in train_data.keys() for w in text.split(' ')))
vocab_size = len(vocab)
print('%d unique words in the text' %vocab_size)

18 unique words in the text


In [3]:
# attach indices to words
word_idx = {w:i for i,w in enumerate(vocab)}
idx_word = {i:w for i,w in enumerate(vocab)}
print(word_idx['good'])
print(idx_word[0])

3
and


In [4]:
import numpy as np
# Return a list of one-hot column vectors, one per word in the text
def createInputs(text):
    inputs = []
    for w in text.split(' '): # one sentence
        encode = np.zeros((vocab_size, 1))
        encode[word_idx[w]] = 1
        inputs.append(encode)
    return inputs        
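To see what this encoding produces, here is a self-contained sketch on a toy vocabulary. The names `toy_vocab`, `toy_word_idx`, and `one_hot_sequence` are hypothetical and exist only for this example; the logic mirrors `createInputs` but takes the word-to-index mapping as an argument instead of reading a global.

```python
import numpy as np

toy_vocab = ['good', 'bad', 'not', 'very']  # hypothetical vocabulary
toy_word_idx = {w: i for i, w in enumerate(toy_vocab)}

def one_hot_sequence(text, word_to_idx):
    """Encode each space-separated word as a (vocab_size, 1) one-hot column."""
    vectors = []
    for w in text.split(' '):
        v = np.zeros((len(word_to_idx), 1))
        v[word_to_idx[w]] = 1
        vectors.append(v)
    return vectors

seq = one_hot_sequence('not very good', toy_word_idx)
print(len(seq))        # 3 vectors, one per word
print(seq[0].ravel())  # [0. 0. 1. 0.]  ('not' has index 2)
```

A word that is missing from the mapping raises a `KeyError`, so this style of encoding only works for text whose words all appear in the vocabulary.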

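The weights of a neural network should be initialized to small random numbers. Stochastic gradient descent, the optimization algorithm used to train the model, needs a starting point where hidden units differ from one another; otherwise every unit receives the same gradient and they all learn the same thing. Keeping the values small also keeps tanh away from its saturated regions, where gradients are close to zero.

It is possible and common to initialize the biases to zero, since the symmetry breaking is already provided by the small random numbers in the weights.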
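The effect of the initialization scale can be checked numerically. The sketch below is an illustration under assumed values (the two scales, the random seed, and the helper name `tanh_grad_mean` are not from the original notebook): it compares the average tanh derivative, 1 - h^2, for small and large random weights.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 64, 18  # sizes borrowed from this notebook

x = np.zeros((input_size, 1))
x[3] = 1  # an arbitrary one-hot input
h_prev = rng.uniform(-1, 1, size=(hidden_size, 1))  # some previous hidden state

def tanh_grad_mean(scale):
    """Mean of d tanh(z)/dz = 1 - tanh(z)^2 for weights drawn with the given scale."""
    W_xh = rng.standard_normal((hidden_size, input_size)) * scale
    W_hh = rng.standard_normal((hidden_size, hidden_size)) * scale
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    return float(np.mean(1 - h ** 2))

small = tanh_grad_mean(1e-3)  # small weights: tanh stays near its linear region
large = tanh_grad_mean(10.0)  # large weights: most units saturate near +/-1
print(small, large)
```

With small weights the derivative stays close to 1, so gradients pass through the nonlinearity almost unchanged; with large weights most units saturate and the derivative collapses toward 0.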

In [24]:
# An @ between two arrays is the matrix multiplication operator (unrelated to decorators).
# https://stackoverflow.com/questions/6392739/what-does-the-at-symbol-do-in-python
from numpy.random import randn

class RNN:
    def __init__(self, input_size, output_size, hidden_size = 64):
        
        self.Wxh = randn(hidden_size, input_size) / 1000 # scale down so the initial weights are small
        self.Whh = randn(hidden_size, hidden_size) / 1000
        self.Why = randn(output_size, hidden_size) / 1000
        
        self.bh = np.zeros((hidden_size, 1))
        self.by = np.zeros((output_size, 1)) 
        
    def forward(self, inputs):
        h = np.zeros((self.Whh.shape[0], 1))
        
        self.last_inputs = inputs
        self.last_hs = {0: h}
        
        for i,x in enumerate(inputs):
            h = np.tanh(self.Wxh@x + self.Whh@h + self.bh)
            self.last_hs[i+1] = h
        
        y = self.Why@h + self.by
    
        return y, h
    
    def backprop(self, d_y, learn_rate = 2e-2):
        
        n = len(self.last_inputs)
        
        d_Why = d_y @ self.last_hs[n].T
        d_by = d_y
        
        # Initialize dL/dWhh, dL/dWxh, and dL/dbh to zero.
        d_Whh = np.zeros(self.Whh.shape)
        d_Wxh = np.zeros(self.Wxh.shape)
        d_bh = np.zeros(self.bh.shape)

        # Calculate dL/dh for the last h.
        d_h = self.Why.T @ d_y

        # Backpropagate through time.
        for t in reversed(range(n)):
            # An intermediate value: dL/dh * (1 - h^2)
            temp = (1 - self.last_hs[t + 1] ** 2) * d_h

            # dL/dbh = dL/dh * (1 - h^2)
            d_bh += temp

            # dL/dWhh = dL/dh * (1 - h^2) * h_{t-1}
            d_Whh += temp @ self.last_hs[t].T

            # dL/dWxh = dL/dh * (1 - h^2) * x
            d_Wxh += temp @ self.last_inputs[t].T

            # Next dL/dh = Whh^T * dL/dh * (1 - h^2); the transpose comes from the chain rule
            d_h = self.Whh.T @ temp

        # Clip to prevent exploding gradients.
        for d in [d_Wxh, d_Whh, d_Why, d_bh, d_by]:
            np.clip(d, -1, 1, out=d)

        # Update weights and biases using gradient descent.
        self.Whh -= learn_rate * d_Whh
        self.Wxh -= learn_rate * d_Wxh
        self.Why -= learn_rate * d_Why
        self.bh -= learn_rate * d_bh
        self.by -= learn_rate * d_by
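Backpropagation through time is easy to get subtly wrong, so a finite-difference gradient check is a useful companion. The following is a standalone sketch, not part of the class above: it re-implements the same forward pass and gradient formulas as plain functions (without clipping or weight updates) on small assumed sizes, and compares them with numerical gradients. All names here (`params`, `forward`, `analytic_grads`, `numeric_grad`) are hypothetical. The sketch propagates `d_h` through `Whh.T`, which is what the chain rule gives for h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 5, 4, 2, 3  # small illustrative sizes (assumed)

params = {
    'Wxh': rng.standard_normal((n_hid, n_in)) * 0.5,
    'Whh': rng.standard_normal((n_hid, n_hid)) * 0.5,
    'Why': rng.standard_normal((n_out, n_hid)) * 0.5,
    'bh': np.zeros((n_hid, 1)),
    'by': np.zeros((n_out, 1)),
}
xs = [rng.standard_normal((n_in, 1)) for _ in range(T)]
target = 1  # index of the correct class

def forward(p):
    """Run the RNN over xs and return (cross-entropy loss, probs, hidden states)."""
    hs = [np.zeros((n_hid, 1))]
    for x in xs:
        hs.append(np.tanh(p['Wxh'] @ x + p['Whh'] @ hs[-1] + p['bh']))
    y = p['Why'] @ hs[-1] + p['by']
    e = np.exp(y - y.max())
    probs = e / e.sum()
    return -np.log(probs[target, 0]), probs, hs

def analytic_grads(p):
    """Backpropagation through time, same formulas as the class but no clipping."""
    _, probs, hs = forward(p)
    d_y = probs.copy()
    d_y[target] -= 1  # dL/dy for softmax + cross-entropy
    g = {k: np.zeros_like(v) for k, v in p.items()}
    g['Why'] = d_y @ hs[-1].T
    g['by'] = d_y
    d_h = p['Why'].T @ d_y
    for t in reversed(range(T)):
        temp = (1 - hs[t + 1] ** 2) * d_h
        g['bh'] += temp
        g['Whh'] += temp @ hs[t].T
        g['Wxh'] += temp @ xs[t].T
        d_h = p['Whh'].T @ temp  # pass the gradient back one time step
    return g

def numeric_grad(p, name, eps=1e-5):
    """Central-difference estimate of dL/d(p[name]), one entry at a time."""
    g = np.zeros_like(p[name])
    for idx in np.ndindex(*p[name].shape):
        old = p[name][idx]
        p[name][idx] = old + eps
        loss_plus, _, _ = forward(p)
        p[name][idx] = old - eps
        loss_minus, _, _ = forward(p)
        p[name][idx] = old  # restore the original value
        g[idx] = (loss_plus - loss_minus) / (2 * eps)
    return g

grads = analytic_grads(params)
max_err = max(np.abs(grads[k] - numeric_grad(params, k)).max() for k in params)
print(max_err)  # should be tiny if the analytic gradients are right
```

If the transpose on `Whh` is dropped, the check fails for `Wxh`, `Whh`, and `bh` whenever the sequence has more than one step, which makes this kind of test a cheap way to catch the error.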
        

In [25]:
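def softmax(ex):
    # Subtract the max before exponentiating for numerical stability; the result is unchanged.
    e = np.exp(ex - np.max(ex))
    return e / np.sum(e)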

In [26]:
rnn = RNN(vocab_size, 2)

inputs = createInputs('i am very good')
out, h = rnn.forward(inputs)
probs = softmax(out)
print(probs)

[[0.49999712]
 [0.50000288]]


In [27]:
import random

def processData(data, backprop=True):
    '''
    Returns the RNN's loss and accuracy for the given data.
    - data is a dictionary mapping text to True or False.
    - backprop determines if the backward phase should be run.
    '''
    items = list(data.items())
    random.shuffle(items)

    loss = 0
    num_correct = 0

    for x, y in items:
        inputs = createInputs(x)
        target = int(y)

        # Forward
        out, _ = rnn.forward(inputs)
        probs = softmax(out)

        # Calculate loss / accuracy
        loss -= np.log(probs[target, 0])  # index to a scalar so loss stays a plain float
        num_correct += int(np.argmax(probs) == target)

        if backprop:
            # Build dL/dy
            d_L_d_y = probs
            d_L_d_y[target] -= 1

            # Backward
            rnn.backprop(d_L_d_y)

    return loss / len(data), num_correct / len(data)
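The "Build dL/dy" step relies on a standard result: for a softmax output with cross-entropy loss, the gradient with respect to the raw outputs is the probability vector with 1 subtracted at the target index. The sketch below checks this numerically on arbitrary logits; the values of `z` and `target` and the helper `cross_entropy` are assumptions made for the example.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / np.sum(e)

def cross_entropy(z, target):
    """Loss -log(p[target]) as a function of the raw outputs z."""
    return -np.log(softmax(z)[target, 0])

z = np.array([[0.3], [-1.2]])  # arbitrary raw outputs for a 2-class problem
target = 0

# Analytic gradient: probs - one_hot(target)
analytic = softmax(z)
analytic[target] -= 1

# Numerical gradient by central differences
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[i] = (cross_entropy(z + step, target)
                  - cross_entropy(z - step, target)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # close to zero
```

Because the two gradient entries sum to zero, pushing up the probability of the target class always pushes the other class down by the same amount.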

In [28]:
# Training loop
for epoch in range(1000):
    train_loss, train_acc = processData(train_data)

    if epoch % 100 == 99:
        print('--- Epoch %d' % (epoch + 1))
        print('Train:\tLoss %.3f | Accuracy: %.3f' % (train_loss, train_acc))

        test_loss, test_acc = processData(test_data, backprop=False)
        print('Test:\tLoss %.3f | Accuracy: %.3f' % (test_loss, test_acc))

--- Epoch 100
Train:	Loss 0.689 | Accuracy: 0.552
Test:	Loss 0.698 | Accuracy: 0.500
--- Epoch 200
Train:	Loss 0.672 | Accuracy: 0.586
Test:	Loss 0.722 | Accuracy: 0.450
--- Epoch 300
Train:	Loss 0.144 | Accuracy: 0.983
Test:	Loss 0.222 | Accuracy: 0.900
--- Epoch 400
Train:	Loss 0.040 | Accuracy: 1.000
Test:	Loss 0.079 | Accuracy: 0.950
--- Epoch 500
Train:	Loss 0.010 | Accuracy: 1.000
Test:	Loss 0.028 | Accuracy: 1.000
--- Epoch 600
Train:	Loss 0.005 | Accuracy: 1.000
Test:	Loss 0.020 | Accuracy: 1.000
--- Epoch 700
Train:	Loss 0.003 | Accuracy: 1.000
Test:	Loss 0.022 | Accuracy: 1.000
--- Epoch 800
Train:	Loss 0.002 | Accuracy: 1.000
Test:	Loss 0.021 | Accuracy: 1.000
--- Epoch 900
Train:	Loss 0.002 | Accuracy: 1.000
Test:	Loss 0.019 | Accuracy: 1.000
--- Epoch 1000
Train:	Loss 0.002 | Accuracy: 1.000
Test:	Loss 0.015 | Accuracy: 1.000
