# Sentiment Analysis using RNN

We will implement a Recurrent Neural Network (RNN) for basic sentiment analysis using only NumPy.

The focus of this notebook is building and using an RNN from scratch, so we use only a small sample dataset.

In [1]:
train_data = {
  'good': True,
  'bad': False,
  'happy': True,
  'sad': False,
  'not good': False,
  'not bad': True,
  'not happy': False,
  'not sad': True,
  'very good': True,
  'very bad': False,
  'very happy': True,
  'very sad': False,
  'i am happy': True,
  'this is good': True,
  'i am bad': False,
  'this is bad': False,
  'i am sad': False,
  'this is sad': False,
  'i am not happy': False,
  'this is not good': False,
  'i am not bad': True,
  'this is not sad': True,
  'i am very happy': True,
  'this is very good': True,
  'i am very bad': False,
  'this is very sad': False,
  'this is very happy': True,
  'i am good not bad': True,
  'this is good not bad': True,
  'i am bad not good': False,
  'i am good and happy': True,
  'this is not good and not happy': False,
  'i am not at all good': False,
  'i am not at all bad': True,
  'i am not at all happy': False,
  'this is not at all sad': True,
  'this is not at all happy': False,
  'i am good right now': True,
  'i am bad right now': False,
  'this is bad right now': False,
  'i am sad right now': False,
  'i was good earlier': True,
  'i was happy earlier': True,
  'i was bad earlier': False,
  'i was sad earlier': False,
  'i am very bad right now': False,
  'this is very good right now': True,
  'this is very sad right now': False,
  'this was bad earlier': False,
  'this was very good earlier': True,
  'this was very bad earlier': False,
  'this was very happy earlier': True,
  'this was very sad earlier': False,
  'i was good and not bad earlier': True,
  'i was not good and not happy earlier': False,
  'i am not at all bad or sad right now': True,
  'i am not at all good or happy right now': False,
  'this was not happy and not good earlier': False,
}

test_data = {
  'this is happy': True,
  'i am good': True,
  'this is not happy': False,
  'i am not good': False,
  'this is not bad': True,
  'i am not sad': True,
  'i am very good': True,
  'this is very bad': False,
  'i am very sad': False,
  'this is bad not good': False,
  'this is good and happy': True,
  'i am not good and not happy': False,
  'i am not at all sad': True,
  'this is not at all good': False,
  'this is not at all bad': True,
  'this is good right now': True,
  'this is sad right now': False,
  'this is very bad right now': False,
  'this was good earlier': True,
  'i was not happy and not good earlier': False,
}

## Preprocessing

We first need to convert the data defined earlier into a form the network can consume. To do this:
 - we build a vocabulary of all the words that appear in the training data,
 - we assign an integer index to each word in the vocabulary,
 - we use one-hot encoding to turn each word into a vector rather than a bare index.

In [2]:
#develop vocabulary
vocab = list(set([word for phrase in train_data.keys() for word in phrase.split(' ') ]))
vocab_size = len(vocab)

In [3]:
#assign integer index
word_to_idx = { word: index for index, word in enumerate(vocab) }
idx_to_word = { index: word for index, word in enumerate(vocab) }
print(word_to_idx['good'])
print(idx_to_word[0])

9
is
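
Note that the iteration order of a `set` of strings can change between interpreter runs, so the indices printed above are not guaranteed to be reproducible. Here is a minimal sketch of a deterministic alternative, assuming a sorted vocabulary is acceptable. The names `sample_phrases`, `sorted_vocab`, `sample_word_to_idx` and `one_hot` are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical miniature dataset standing in for train_data.
sample_phrases = ['good', 'not good', 'i am happy']

# Sorting makes the word -> index mapping identical on every run.
sorted_vocab = sorted({word for phrase in sample_phrases for word in phrase.split(' ')})
sample_word_to_idx = {word: index for index, word in enumerate(sorted_vocab)}

def one_hot(word):
    # Column vector of shape (vocab_size, 1) with a single 1 at the word's index.
    vec = np.zeros((len(sorted_vocab), 1))
    vec[sample_word_to_idx[word]] = 1
    return vec

print(sorted_vocab)             # ['am', 'good', 'happy', 'i', 'not']
print(one_hot('good').ravel())  # [0. 1. 0. 0. 0.]
```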


In [6]:
#Create one hot vectors for each word in a phrase
import numpy as np

def createInputs(phrase):
    inputs = []
    for word in phrase.split(' '):
        vec = np.zeros((vocab_size, 1))
        vec[word_to_idx[word]] = 1
        inputs.append(vec)
    return inputs

## Forward Phase

It's time to add the 3 weight matrices and the 2 bias vectors that make up our network parameters.
The 3 weights are:
 - **Whh** : The weight matrix used for all the h_t-1 -> h_t links
 - **Wxh** : The weight matrix used for all the x -> h_t links
 - **Why** : The weight matrix used for the h_t -> y link (written $W_{hy}$ in the equations)

The 2 biases are:
 - **bh** : The bias added when computing h_t
 - **by** : The bias added when computing y

<center>$ h_t = tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$</center>
<center>$ y_t = W_{hy}h_t + b_y$ </center>
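
To make the shapes in these two equations concrete, here is a minimal sketch of a single time step. The sizes (`input_size = 4`, `hidden_size = 3`, `output_size = 2`), the fixed random seed and the chosen word index are assumptions picked purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 3, 2  # illustrative sizes

# Parameters with the same shapes as in the equations above.
Wxh = rng.standard_normal((hidden_size, input_size)) / 1000
Whh = rng.standard_normal((hidden_size, hidden_size)) / 1000
Why = rng.standard_normal((output_size, hidden_size)) / 1000
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

# One-hot input for (hypothetical) word index 2, and a zero initial hidden state.
x_t = np.zeros((input_size, 1))
x_t[2] = 1
h_prev = np.zeros((hidden_size, 1))

h_t = np.tanh(Wxh @ x_t + Whh @ h_prev + bh)  # new hidden state
y_t = Why @ h_t + by                          # raw output scores

print(h_t.shape, y_t.shape)  # (3, 1) (2, 1)
```

Because `x_t` is one-hot, `Wxh @ x_t` simply selects one column of `Wxh`.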

In [7]:
from numpy.random import randn
class RNN:
  def __init__(self, input_size, output_size, hidden_size=64):
    # Weights
    self.Whh = randn(hidden_size, hidden_size) / 1000
    self.Wxh = randn(hidden_size, input_size) / 1000
    self.Why = randn(output_size, hidden_size) / 1000

    # Biases
    self.bh = np.zeros((hidden_size, 1))
    self.by = np.zeros((output_size, 1))

  def forward(self, inputs):
    '''
    Perform a forward pass of the RNN using the given inputs.
    Returns the final output and hidden state.
    - inputs is an array of one hot vectors with shape (input_size, 1).
    '''
    h = np.zeros((self.Whh.shape[0], 1))

    self.last_inputs = inputs
    self.last_hs = { 0: h }

    # Perform each step of the RNN
    for i, x in enumerate(inputs):
      h = np.tanh(self.Wxh @ x + self.Whh @ h + self.bh)
      self.last_hs[i + 1] = h

    # Compute the output
    y = self.Why @ h + self.by

    return y, h

  def backprop(self, d_y, learn_rate=2e-2):
    '''
    Perform a backward pass of the RNN.
    - d_y (dL/dy) has shape (output_size, 1).
    - learn_rate is a float.
    '''
    n = len(self.last_inputs)

    # Calculate dL/dWhy and dL/dby.
    d_Why = d_y @ self.last_hs[n].T
    d_by = d_y

    # Initialize dL/dWhh, dL/dWxh, and dL/dbh to zero.
    d_Whh = np.zeros(self.Whh.shape)
    d_Wxh = np.zeros(self.Wxh.shape)
    d_bh = np.zeros(self.bh.shape)

    # Calculate dL/dh for the last h.
    # dL/dh = dL/dy * dy/dh
    d_h = self.Why.T @ d_y

    # Backpropagate through time.
    for t in reversed(range(n)):
      # An intermediate value: dL/dh * (1 - h^2)
      temp = ((1 - self.last_hs[t + 1] ** 2) * d_h)

      # dL/db = dL/dh * (1 - h^2)
      d_bh += temp

      # dL/dWhh = dL/dh * (1 - h^2) * h_{t-1}
      d_Whh += temp @ self.last_hs[t].T

      # dL/dWxh = dL/dh * (1 - h^2) * x
      d_Wxh += temp @ self.last_inputs[t].T

      # Next dL/dh = Whh^T @ (dL/dh * (1 - h^2))
      d_h = self.Whh.T @ temp

    # Clip to prevent exploding gradients.
    for d in [d_Wxh, d_Whh, d_Why, d_bh, d_by]:
      np.clip(d, -1, 1, out=d)

    # Update weights and biases using gradient descent.
    self.Whh -= learn_rate * d_Whh
    self.Wxh -= learn_rate * d_Wxh
    self.Why -= learn_rate * d_Why
    self.bh -= learn_rate * d_bh
    self.by -= learn_rate * d_by

        
        

In [8]:
def softmax(xs):
  # Applies the Softmax Function to the input array.
  return np.exp(xs) / sum(np.exp(xs))
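
`np.exp` overflows for large inputs, so a common variant subtracts the maximum score before exponentiating. This leaves the result unchanged, because the common factor cancels between numerator and denominator. A minimal sketch follows; `stable_softmax` is an illustrative name, not something the notebook defines.

```python
import numpy as np

def stable_softmax(xs):
    # Shift so the largest score is 0; exp() then never exceeds 1.
    shifted = xs - np.max(xs)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Scores this large would overflow a naive np.exp(xs).
scores = np.array([[1000.0], [1001.0]])
print(stable_softmax(scores).ravel())  # approximately [0.269 0.731]
```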

In [9]:
# Two output classes: index 0 = negative (False), index 1 = positive (True).
rnn = RNN(vocab_size, 2)

for x, y in train_data.items():
    inputs = createInputs(x)
    target = int(y)

    # Forward
    out, _ = rnn.forward(inputs)
    probs = softmax(out)

    # Build dL/dy for the cross-entropy loss L = -ln(probs[target]).
    # With a softmax output, dL/dy = probs - one_hot(target).
    d_L_d_y = probs
    d_L_d_y[target] -= 1

    # Backward
    rnn.backprop(d_L_d_y)

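
The line `d_L_d_y[target] -= 1` relies on a standard result: for a softmax output followed by the cross-entropy loss $L = -\ln p_{target}$, the gradient with respect to the raw outputs is the probability vector minus the one-hot target. The sketch below checks this numerically with central finite differences. The helper names (`softmax_col`, `cross_entropy`) and the sample scores are assumptions made for illustration.

```python
import numpy as np

def softmax_col(xs):
    # Softmax over a column vector, shifted for numerical stability.
    exps = np.exp(xs - np.max(xs))
    return exps / np.sum(exps)

def cross_entropy(y, target):
    # L = -ln(p_target), where p = softmax(y).
    return -np.log(softmax_col(y)[target, 0])

y = np.array([[0.3], [-0.2]])  # illustrative raw scores
target = 1

# Analytic gradient: probs - one_hot(target).
analytic = softmax_col(y)
analytic[target] -= 1

# Numerical gradient via central differences.
eps = 1e-6
numeric = np.zeros_like(y)
for i in range(y.shape[0]):
    bump = np.zeros_like(y)
    bump[i] = eps
    numeric[i] = (cross_entropy(y + bump, target) - cross_entropy(y - bump, target)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```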

Now let's work through the math required for backpropagation, starting with the output layer.
<br>
<center>$\Large y = W_{hy}h_n + b_y$ </center>
<br>
<center>$\Large \frac{\partial L}{\partial W_{hy}} = \frac{\partial L}{\partial y} * \frac{\partial y}{\partial W_{hy}}    $</center>
<br>
Hence, <center>$\Large \frac{\partial L}{\partial W_{hy}} = \frac{\partial L}{\partial y} * h_n$</center>
and
<center>$\Large \frac{\partial L}{\partial b_y} = \frac{\partial L}{\partial y}$</center>

Now let's perform the math for calculating the gradients of the other weights and biases for the backpropagation.

<center>$\Large \frac{\partial L}{\partial W_{xh}} = \frac{\partial L}{\partial y} * \sum \limits _{t} \frac{\partial y}{\partial h_t} *  \frac{\partial h_t}{\partial W_{xh}} $</center>   

<center>$\Large h_t = tanh(W_{xh}x_t + W_{hh}h_{t-1} + b_h)$</center>

<center>$\Large \frac{dtanh(x)}{dx} = 1 - {tanh}^{2}(x) $</center>

<center>$\Large \frac{\partial h_t}{\partial W_{xh}} = (1 - {h_t}^2)x_t $ <br>
    $\Large \frac{\partial h_t}{\partial W_{hh}} = (1 - {h_t}^2)h_{t-1}$ <br>
    $\Large \frac{\partial h_t}{\partial b_h} = (1 - {h_t}^2)$ </center>

<center>$\Large \frac{\partial y}{\partial h_t} = \frac{\partial y}{\partial h_{t+1}} * \frac{\partial h_{t+1}}{\partial h_t} $ <br>
    $\Large  = \frac{\partial y}{\partial h_{t+1}} * (1 - {h_{t+1}}^2)W_{hh}$ <br>
    $\Large \frac{\partial y}{\partial h_n} = W_{hy}$ </center>
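
One way to sanity-check these recurrences is to compare the backpropagation-through-time gradient against a finite-difference estimate. The sketch below does this for $W_{hh}$ on a tiny, self-contained network. The sizes, the random seed, the three-step input sequence and the helper `loss_and_states` are all assumptions made for illustration. In matrix form, the step from $h_{t+1}$ back to $h_t$ multiplies by the transpose of $W_{hh}$.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size, output_size = 3, 4, 2  # illustrative sizes
Wxh = rng.standard_normal((hidden_size, input_size)) * 0.5
Whh = rng.standard_normal((hidden_size, hidden_size)) * 0.5
Why = rng.standard_normal((output_size, hidden_size)) * 0.5
bh = np.zeros((hidden_size, 1))
by = np.zeros((output_size, 1))

# A hypothetical three-word phrase as one-hot column vectors.
xs = [np.eye(input_size)[:, [i]] for i in (0, 2, 1)]
target = 1

def loss_and_states(Whh_):
    # Forward pass; returns cross-entropy loss, all hidden states, and probabilities.
    h = np.zeros((hidden_size, 1))
    hs = [h]
    for x in xs:
        h = np.tanh(Wxh @ x + Whh_ @ h + bh)
        hs.append(h)
    y = Why @ h + by
    exps = np.exp(y - np.max(y))
    p = exps / np.sum(exps)
    return -np.log(p[target, 0]), hs, p

loss, hs, p = loss_and_states(Whh)
d_y = p.copy()
d_y[target] -= 1

# Backpropagation through time for dL/dWhh.
d_Whh = np.zeros_like(Whh)
d_h = Why.T @ d_y
for t in reversed(range(len(xs))):
    temp = (1 - hs[t + 1] ** 2) * d_h  # gradient at the pre-activation of step t
    d_Whh += temp @ hs[t].T
    d_h = Whh.T @ temp                 # pass the gradient back to the previous state

# Central finite differences over every entry of Whh.
eps = 1e-6
numeric = np.zeros_like(Whh)
for i in range(hidden_size):
    for j in range(hidden_size):
        bump = np.zeros_like(Whh)
        bump[i, j] = eps
        numeric[i, j] = (loss_and_states(Whh + bump)[0] - loss_and_states(Whh - bump)[0]) / (2 * eps)

print(np.allclose(d_Whh, numeric, atol=1e-6))  # True
```

The same check can be repeated for `Wxh`, `Why` and the biases by perturbing those parameters instead.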