# Recurrent Neural Network (RNN)

Neural networks are a family of algorithms, loosely modelled on how an animal brain operates, that learn relationships in large amounts of data. The multilayer perceptron is the simplest form of neural network: it consists of an input layer, one or more hidden layers, and an output layer. A batch of inputs is passed to the input layer, and the corresponding outputs are produced by the output layer. There are several types of neural networks, including feed-forward, convolutional, deconvolutional, recurrent, and modular neural networks.

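A recurrent neural network (RNN) is essentially an artificial neural network (ANN) applied repeatedly along a sequence. It is designed for sequential data: the elements are fed in one at a time, and the hidden state computed at one step serves as an additional input to the next step in the sequence.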
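
The recurrence described above can be sketched by hand and compared against PyTorch's `nn.RNN`. This is a minimal illustration, assuming a single-layer `tanh` RNN with a zero initial hidden state; the helper name `unroll` and the tensor sizes are made up for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

seq_len, batch_size, input_size, hidden_size = 4, 2, 3, 5
rnn = nn.RNN(input_size, hidden_size)              # single layer, tanh activation
x = torch.randn(seq_len, batch_size, input_size)   # time-major input

def unroll(rnn, x):
    """Hypothetical helper: apply the recurrence one time step at a time."""
    h = torch.zeros(x.shape[1], rnn.hidden_size)   # initial hidden state h_0 = 0
    steps = []
    for x_t in x:                                  # iterate over the time dimension
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h)
    return torch.stack(steps), h

with torch.no_grad():
    manual_out, manual_h = unroll(rnn, x)
    ref_out, ref_h = rnn(x)

# Both paths should agree up to floating-point tolerance
print(torch.allclose(manual_out, ref_out, atol=1e-5))
```

The same weights are reused at every time step; only the hidden state `h` changes, which is how information from earlier inputs reaches later ones.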

## Why RNN?

RNNs are needed because basic artificial neural networks:
- cannot handle sequential data
- consider only the current input
- cannot memorize previous inputs

## Types of RNN

- Single Layer Unidirectional RNN
- Single Layer Bidirectional RNN
- Multiple Layer Unidirectional RNN
- Multiple Layer Bidirectional RNN

## Data Ingestion

In [1]:
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
from torch import LongTensor
from torch.autograd import Variable

In [2]:
data = ['Recurrent', 'Neural', 'Network']   # Dummy data of batch_size = 3

## Create Vocabulary

In [3]:
vocab = [''] + sorted(set([char for seq in data for char in seq]))
vocab2idx = [[vocab.index(token) for token in seq] for seq in data]
print('Vocabulary:', vocab)
print('Vocabulary Index:', vocab2idx)

Vocabulary: ['', 'N', 'R', 'a', 'c', 'e', 'k', 'l', 'n', 'o', 'r', 't', 'u', 'w']
Vocabulary Index: [[2, 5, 4, 12, 10, 10, 5, 8, 11], [1, 5, 12, 10, 3, 7], [1, 5, 11, 13, 9, 10, 6]]
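
As a sanity check on the encoding above, the indices can be mapped back to characters. The sketch below rebuilds the same vocabulary and uses a dictionary lookup in place of `vocab.index`; the names `char2idx`, `encoded`, and `decoded` are introduced here only for illustration.

```python
data = ['Recurrent', 'Neural', 'Network']

# Index 0 is reserved for the empty string, which later acts as the padding token
vocab = [''] + sorted(set(char for seq in data for char in seq))

# Hypothetical helper mapping: dictionary lookups avoid scanning the list each time
char2idx = {char: idx for idx, char in enumerate(vocab)}

encoded = [[char2idx[char] for char in seq] for seq in data]
decoded = [''.join(vocab[idx] for idx in seq) for seq in encoded]

print(encoded)
print(decoded)
```

If the round trip reproduces `data`, every character has a unique index and nothing was lost in the encoding.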


## Padding

In [4]:
seq_lengths = LongTensor([len(seq) for seq in vocab2idx])
sequence_tensor = torch.zeros(len(vocab2idx), seq_lengths.max(), dtype=torch.long)  # plain tensor; the Variable wrapper is deprecated

for idx, (seq, seq_len) in enumerate(zip(vocab2idx, seq_lengths)):
    sequence_tensor[idx, :seq_len] = LongTensor(seq)

print(sequence_tensor)

# convert the input into time major format (Transpose)
sequence_tensor = sequence_tensor.t()    # Shape: (max_len, batch_size)
print(sequence_tensor)

tensor([[ 2,  5,  4, 12, 10, 10,  5,  8, 11],
        [ 1,  5, 12, 10,  3,  7,  0,  0,  0],
        [ 1,  5, 11, 13,  9, 10,  6,  0,  0]])
tensor([[ 2,  1,  1],
        [ 5,  5,  5],
        [ 4, 12, 11],
        [12, 10, 13],
        [10,  3,  9],
        [10,  7, 10],
        [ 5,  0,  6],
        [ 8,  0,  0],
        [11,  0,  0]])
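
The manual padding loop above can also be written with PyTorch's built-in `pad_sequence` utility. This is an alternative sketch rather than the approach used in this notebook; it hard-codes the index lists printed earlier so that it runs on its own.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Index sequences copied from the vocabulary step above
vocab2idx = [[2, 5, 4, 12, 10, 10, 5, 8, 11],
             [1, 5, 12, 10, 3, 7],
             [1, 5, 11, 13, 9, 10, 6]]

sequences = [torch.tensor(seq, dtype=torch.long) for seq in vocab2idx]

# batch_first=False returns the time-major layout directly: (max_len, batch_size)
padded = pad_sequence(sequences, batch_first=False, padding_value=0)

print(padded.shape)
print(padded)
```

With `batch_first=False` no separate transpose is needed, since the result is already in time-major format.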


## Model Implementation

Here we define a model class that inherits from PyTorch's `nn.Module`. The `__init__()` method instantiates the layers of the model, while the `forward()` method defines the computation applied to the input. The shapes of the tensors involved in `forward()` are listed below:

- *input* Shape: (max_len, batch_size)
- *embed* Shape: (max_len, batch_size, embed_dim)
- *hidden* Shape:
    - Single Layer Unidirectional: (1, batch_size, hidden_size)
    - Single Layer Bidirectional: (2, batch_size, hidden_size)
    - Multiple Layer Unidirectional: (num_layers, batch_size, hidden_size)
    - Multiple Layer Bidirectional: (num_layers * 2, batch_size, hidden_size)
- *output* Shape: 
    - Unidirectional: (max_len, batch_size, hidden_size)
    - Bidirectional: (max_len, batch_size, hidden_size * 2)
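
The shape rules listed above can be condensed into a small function and checked against `nn.RNN` for all four configurations. The helper `expected_shapes` is hypothetical, and the random `embed` tensor stands in for the output of the embedding layer.

```python
import torch
import torch.nn as nn

def expected_shapes(max_len, batch_size, hidden_size, num_layers, bidirectional):
    """Hypothetical helper: shapes of (output, hidden) for a time-major RNN."""
    num_directions = 2 if bidirectional else 1
    output_shape = (max_len, batch_size, hidden_size * num_directions)
    hidden_shape = (num_layers * num_directions, batch_size, hidden_size)
    return output_shape, hidden_shape

max_len, batch_size, embed_dim, hidden_size = 9, 3, 5, 5
embed = torch.randn(max_len, batch_size, embed_dim)   # stand-in for embedded input

results = []
for num_layers in (1, 2):
    for bidirectional in (False, True):
        rnn = nn.RNN(embed_dim, hidden_size, num_layers=num_layers,
                     bidirectional=bidirectional)
        output, hidden = rnn(embed)
        out_shape, hid_shape = expected_shapes(max_len, batch_size, hidden_size,
                                               num_layers, bidirectional)
        # Record whether the actual shapes match the rule
        results.append((tuple(output.shape) == out_shape,
                        tuple(hidden.shape) == hid_shape))
        print(num_layers, bidirectional, tuple(output.shape), tuple(hidden.shape))
```

Note that the number of layers affects only the hidden state's first dimension, while the number of directions affects both tensors.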

In [5]:
input_dim = len(vocab)
hidden_dim = 5
embed_dim = 5

In [6]:
class RNNModel(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, n_layers, bidirectional):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional)

    def forward(self, input):
        embed = self.embedding(input)
        output, hidden = self.rnn(embed)
        return output, hidden

#### 1. Single Layer Unidirectional RNN

In [7]:
n_layers = 1
bidirectional = False

model = RNNModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 5])
Hidden shape: torch.Size([1, 3, 5])


In a single-layer unidirectional RNN, the output at the last time step must equal the final hidden state.

In [8]:
assert (output[-1, :, :] == hidden[0]).all()

#### 2. Single Layer Bidirectional RNN

In [9]:
n_layers = 1
bidirectional = True

model = RNNModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 10])
Hidden shape: torch.Size([2, 3, 5])


- In a single-layer bidirectional RNN, the first `hidden_dim` features of the output at the last time step must equal the final forward hidden state.
- The last `hidden_dim` features of the output at the first time step must equal the final backward hidden state, because the backward direction finishes at the start of the sequence.

In [10]:
assert (output[-1, :, :hidden_dim] == hidden[0]).all()
assert (output[0, :, hidden_dim:] == hidden[-1]).all()
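
To see why the backward state lines up with the first time step, the backward half of a bidirectional RNN can be reproduced by running an ordinary RNN over the time-reversed input. The sketch below assumes unpacked (plain tensor) input and copies the `*_reverse` parameters of `nn.RNN` into a second, unidirectional module; the names `bi_rnn` and `backward_only` are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 6, 3, 4, 5

bi_rnn = nn.RNN(input_size, hidden_size, bidirectional=True)
backward_only = nn.RNN(input_size, hidden_size)   # will hold the reverse-direction weights

with torch.no_grad():
    # Copy the backward-direction parameters into the unidirectional RNN
    backward_only.weight_ih_l0.copy_(bi_rnn.weight_ih_l0_reverse)
    backward_only.weight_hh_l0.copy_(bi_rnn.weight_hh_l0_reverse)
    backward_only.bias_ih_l0.copy_(bi_rnn.bias_ih_l0_reverse)
    backward_only.bias_hh_l0.copy_(bi_rnn.bias_hh_l0_reverse)

    x = torch.randn(seq_len, batch_size, input_size)
    bi_out, bi_hidden = bi_rnn(x)

    # Run over the reversed sequence, then flip the outputs back into original order
    rev_out, rev_hidden = backward_only(torch.flip(x, dims=[0]))
    rev_out = torch.flip(rev_out, dims=[0])

same_outputs = torch.allclose(bi_out[:, :, hidden_size:], rev_out, atol=1e-5)
same_final = torch.allclose(bi_hidden[1], rev_hidden[0], atol=1e-5)
print(same_outputs, same_final)
```

The backward direction reads the sequence from the last element to the first, so its final state is the one stored at time step 0 of the output.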

#### 3. Multiple Layer Unidirectional RNN

In [11]:
n_layers = 2
bidirectional = False

model = RNNModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 5])
Hidden shape: torch.Size([2, 3, 5])


In a multi-layer unidirectional RNN, the output at the last time step must equal the final hidden state of the last layer.

In [12]:
assert (output[-1, :, :] == hidden[-1]).all()
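
The reason only `hidden[-1]` matches is that `output` comes from the top layer alone. As a rough sketch, a two-layer `nn.RNN` can be rebuilt from two single-layer RNNs that share its weights, with the second one consuming the first one's outputs; the names `stacked`, `layer1`, and `layer2` are illustrative, and dropout is assumed to be left at its default of 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch_size, input_size, hidden_size = 6, 3, 4, 5

stacked = nn.RNN(input_size, hidden_size, num_layers=2)
layer1 = nn.RNN(input_size, hidden_size)    # mirrors layer 0 of `stacked`
layer2 = nn.RNN(hidden_size, hidden_size)   # mirrors layer 1 of `stacked`

with torch.no_grad():
    for name in ('weight_ih', 'weight_hh', 'bias_ih', 'bias_hh'):
        getattr(layer1, f'{name}_l0').copy_(getattr(stacked, f'{name}_l0'))
        getattr(layer2, f'{name}_l0').copy_(getattr(stacked, f'{name}_l1'))

    x = torch.randn(seq_len, batch_size, input_size)
    out, hidden = stacked(x)
    out1, h1 = layer1(x)       # first layer reads the input sequence
    out2, h2 = layer2(out1)    # second layer reads the first layer's outputs

print(torch.allclose(out, out2, atol=1e-5))
print(torch.allclose(hidden[0], h1[0], atol=1e-5),
      torch.allclose(hidden[1], h2[0], atol=1e-5))
```

Here `hidden[0]` is the final state of the first layer, which never appears in `output`; only the last layer's states do.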

#### 4. Multiple Layer Bidirectional RNN

In [13]:
n_layers = 2
bidirectional = True

model = RNNModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 10])
Hidden shape: torch.Size([4, 3, 5])


In [14]:
batch_size = sequence_tensor.shape[1]
hidden = hidden.view(n_layers, 2, batch_size, hidden_dim)
print(f"Reshaped hidden shape: {hidden.shape}")

Reshaped hidden shape: torch.Size([2, 2, 3, 5])


- In a multi-layer bidirectional RNN, the first `hidden_dim` features of the output at the last time step must equal the final forward hidden state of the last layer.
- The last `hidden_dim` features of the output at the first time step must equal the final backward hidden state of the last layer.

In [15]:
assert (output[-1, :, :hidden_dim] == hidden[-1][0]).all()
assert (output[0, :, hidden_dim:] == hidden[-1][1]).all()
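
The reshape used above relies on how the first dimension of `hidden` is laid out: layers vary slowest and directions fastest, so entry `layer * 2 + direction` of the flat tensor corresponds to `[layer, direction]` after the `view`. A small sketch to confirm this, with sizes chosen arbitrarily for the example:

```python
import torch
import torch.nn as nn

n_layers, batch_size, input_size, hidden_dim = 2, 3, 4, 5
rnn = nn.RNN(input_size, hidden_dim, num_layers=n_layers, bidirectional=True)

_, hidden = rnn(torch.randn(7, batch_size, input_size))   # hidden: (n_layers * 2, batch, hidden)
reshaped = hidden.view(n_layers, 2, batch_size, hidden_dim)

checks = []
for layer in range(n_layers):
    for direction in range(2):            # 0 = forward, 1 = backward
        flat_index = layer * 2 + direction
        checks.append(torch.equal(reshaped[layer, direction], hidden[flat_index]))

print(checks)
```

This is why `hidden[-1][0]` and `hidden[-1][1]` in the assertions pick out the forward and backward states of the last layer.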

## References

- [Recurrent Neural Network Tutorial](https://www.simplilearn.com/tutorials/deep-learning-tutorial/rnn)
- [Recurrent Neural Network Tutorial (RNN)](https://www.datacamp.com/tutorial/tutorial-for-recurrent-neural-network)