# Gated Recurrent Unit (GRU)

Like the LSTM, the GRU is a gated recurrent architecture designed to mitigate the vanishing gradient problem.

- A GRU has two gates (reset and update); an LSTM has three (input, forget, and output).
- A GRU has no separate internal memory (cell state) and no output gate, both of which are present in an LSTM.
- In a GRU, the update gate couples the roles of the LSTM's input and forget gates, and the reset gate is applied directly to the previous hidden state. An LSTM has no reset gate; that responsibility is shared by its input and forget gates.
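
To make the gate roles concrete, here is a minimal sketch of a single GRU step. It follows the gate formulation documented for PyTorch's `nn.GRUCell` and checks the result against that module. The tensor sizes and the helper name `manual_gru_step` are illustrative assumptions, not part of the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions): batch of 2, 4 input features, 3 hidden units.
cell = nn.GRUCell(input_size=4, hidden_size=3)
x = torch.randn(2, 4)
h = torch.randn(2, 3)


def manual_gru_step(cell, x, h):
    """Recompute one GRU step from the cell's own parameters."""
    gi = x @ cell.weight_ih.t() + cell.bias_ih   # input contributions, stacked as (r, z, n)
    gh = h @ cell.weight_hh.t() + cell.bias_hh   # hidden contributions, stacked as (r, z, n)
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(i_r + h_r)                 # reset gate
    z = torch.sigmoid(i_z + h_z)                 # update gate
    n = torch.tanh(i_n + r * h_n)                # candidate state; reset gate scales the past
    return (1 - z) * n + z * h                   # blend candidate and previous hidden state


with torch.no_grad():
    manual = manual_gru_step(cell, x, h)
    reference = cell(x, h)

print(torch.allclose(manual, reference, atol=1e-6))
```

Note that the only state carried between steps is `h`; there is no separate cell state as in an LSTM.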

## Data Ingestion

In [1]:
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
from torch import LongTensor
from torch.autograd import Variable

In [2]:
data = ['Gated', 'Recurrent', 'Unit']   # Dummy data of batch_size = 3

## Create Vocabulary

In [3]:
vocab = [''] + sorted(set([char for seq in data for char in seq]))
vocab2idx = [[vocab.index(token) for token in seq] for seq in data]
print('Vocabulary:', vocab)
print('Vocabulary Index:', vocab2idx)

Vocabulary: ['', 'G', 'R', 'U', 'a', 'c', 'd', 'e', 'i', 'n', 'r', 't', 'u']
Vocabulary Index: [[1, 4, 11, 7, 6], [2, 7, 5, 12, 10, 10, 7, 9, 11], [3, 9, 8, 11]]
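
`vocab.index(token)` scans the list on every lookup. For larger vocabularies, a dictionary lookup is the usual alternative. The sketch below rebuilds the same indices with a dictionary and adds a decoder; the names `char2idx`, `encode`, and `decode` are hypothetical helpers introduced here for illustration.

```python
data = ['Gated', 'Recurrent', 'Unit']

# Index 0 is reserved for the empty padding token ''.
vocab = [''] + sorted({char for seq in data for char in seq})
char2idx = {char: idx for idx, char in enumerate(vocab)}


def encode(seq):
    """Map each character of a string to its vocabulary index."""
    return [char2idx[char] for char in seq]


def decode(indices):
    """Map indices back to a string; padding (index 0) decodes to ''."""
    return ''.join(vocab[idx] for idx in indices)


encoded = [encode(seq) for seq in data]
print(encoded)
print(decode(encoded[0] + [0, 0]))  # trailing padding is dropped on decode
```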


## Padding

In [4]:
seq_lengths = LongTensor([len(seq) for seq in vocab2idx])
sequence_tensor = torch.zeros(len(vocab2idx), int(seq_lengths.max()), dtype=torch.long)

for idx, (seq, seq_len) in enumerate(zip(vocab2idx, seq_lengths)):
    sequence_tensor[idx, :seq_len] = LongTensor(seq)

print(sequence_tensor)

# convert the input into time major format (Transpose)
sequence_tensor = sequence_tensor.t()    # Shape: (max_len, batch_size)
print(sequence_tensor)

tensor([[ 1,  4, 11,  7,  6,  0,  0,  0,  0],
        [ 2,  7,  5, 12, 10, 10,  7,  9, 11],
        [ 3,  9,  8, 11,  0,  0,  0,  0,  0]])
tensor([[ 1,  2,  3],
        [ 4,  7,  9],
        [11,  5,  8],
        [ 7, 12, 11],
        [ 6, 10,  0],
        [ 0, 10,  0],
        [ 0,  7,  0],
        [ 0,  9,  0],
        [ 0, 11,  0]])
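
The manual padding loop and transpose can also be expressed with `torch.nn.utils.rnn.pad_sequence`, which pads with zeros and returns a time-major tensor when `batch_first=False`. This is a sketch of an alternative, not the notebook's own approach; the `encoded` list simply repeats the indices printed earlier.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Character indices for 'Gated', 'Recurrent', 'Unit' as printed above.
encoded = [[1, 4, 11, 7, 6], [2, 7, 5, 12, 10, 10, 7, 9, 11], [3, 9, 8, 11]]
sequences = [torch.tensor(seq, dtype=torch.long) for seq in encoded]

# batch_first=False yields shape (max_len, batch_size), i.e. time-major.
padded = pad_sequence(sequences, batch_first=False, padding_value=0)
print(padded.shape)
print(padded)
```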


## Model Implementation

Here we define a model class that inherits from PyTorch's `nn.Module`. The `__init__()` method builds the layers when the model is instantiated, while the `forward()` method defines the forward pass. The shapes of the tensors in `forward()` are listed below:

- *input* Shape: (max_len, batch_size)
- *embed* Shape: (max_len, batch_size, embed_dim)
- *output* Shape: (max_len, batch_size, num_directions * hidden_size)
- *hidden* Shape: (num_layers * num_directions, batch_size, hidden_size)

In [5]:
input_dim = len(vocab)
hidden_dim = 5
embed_dim = 5

In [6]:
class GRUModel(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, n_layers, bidirectional):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional)

    def forward(self, input):
        embed = self.embedding(input)
        output, hidden = self.gru(embed)
        return output, hidden

#### 1. Single Layer Unidirectional GRU

In [7]:
n_layers = 1
bidirectional = False

model = GRUModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 5])
Hidden shape: torch.Size([1, 3, 5])


For a single-layer unidirectional GRU, the output at the last time step must equal the final hidden state.

In [8]:
assert (output[-1, :, :] == hidden[0]).all()
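
The check above holds for every sequence because the padded positions are fed through the GRU like real inputs. If the padding should be skipped, one common option is `pack_padded_sequence`. The sketch below assumes the same padded batch and sequence lengths (5, 9, 4) as the notebook; the layers are freshly initialised here, so the values are illustrative only.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence, pad_sequence

torch.manual_seed(0)

encoded = [[1, 4, 11, 7, 6], [2, 7, 5, 12, 10, 10, 7, 9, 11], [3, 9, 8, 11]]
lengths = torch.tensor([len(seq) for seq in encoded])            # tensor([5, 9, 4])
padded = pad_sequence([torch.tensor(seq) for seq in encoded])    # (max_len, batch_size)

embedding = nn.Embedding(13, 5)   # 13 = vocabulary size, including padding
gru = nn.GRU(5, 5)

# enforce_sorted=False lets the batch stay in its original (unsorted) order.
packed = pack_padded_sequence(embedding(padded), lengths, enforce_sorted=False)
packed_output, hidden = gru(packed)
output, _ = pad_packed_sequence(packed_output, total_length=padded.shape[0])

# With packing, hidden holds the state at each sequence's last real step.
last_real = output[lengths - 1, torch.arange(len(encoded))]
print(torch.allclose(last_real, hidden[0]))
```

With packing, output positions beyond a sequence's length are filled with zeros, so `output[-1]` no longer matches `hidden[0]` for the shorter sequences.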

#### 2. Single Layer Bidirectional GRU

In [9]:
n_layers = 1
bidirectional = True

model = GRUModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 10])
Hidden shape: torch.Size([2, 3, 5])


- For a single-layer bidirectional GRU, the first `hidden_dim` features of the output at the last time step must equal the final forward hidden state.
- The last `hidden_dim` features of the output at the first time step must equal the final backward hidden state.

In [10]:
assert (output[-1, :, :hidden_dim] == hidden[0]).all()
assert (output[0, :, hidden_dim:] == hidden[-1]).all()
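
Instead of slicing the feature dimension by hand, the two directions can be separated by reshaping the output to `(max_len, batch_size, num_directions, hidden_size)`, as described in the PyTorch `nn.GRU` documentation. The sketch below uses a random input tensor in place of the embedded batch, which is an assumption made to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

max_len, batch_size, embed_dim, hidden_dim = 9, 3, 5, 5
gru = nn.GRU(embed_dim, hidden_dim, bidirectional=True)
x = torch.randn(max_len, batch_size, embed_dim)   # stand-in for the embedded batch

output, hidden = gru(x)                           # output: (9, 3, 10), hidden: (2, 3, 5)

# Split the concatenated features into one slice per direction.
directions = output.view(max_len, batch_size, 2, hidden_dim)
forward_out = directions[:, :, 0]                 # forward pass, ends at the last step
backward_out = directions[:, :, 1]                # backward pass, ends at the first step

print(torch.allclose(forward_out[-1], hidden[0]))
print(torch.allclose(backward_out[0], hidden[1]))
```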

#### 3. Multiple Layer Unidirectional GRU

In [11]:
n_layers = 2
bidirectional = False

model = GRUModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 5])
Hidden shape: torch.Size([2, 3, 5])


For a multi-layer unidirectional GRU, the output at the last time step must equal the final hidden state of the last layer.

In [12]:
assert (output[-1, :, :] == hidden[-1]).all()
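
To see why only `hidden[-1]` matches the output, a two-layer GRU can be rebuilt from two single-layer GRUs that share its weights. This is a sketch under the assumption of default settings (no dropout); `layer0` and `layer1` are hypothetical names, and the random input stands in for the embedded batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

stacked = nn.GRU(input_size=5, hidden_size=5, num_layers=2)
layer0 = nn.GRU(input_size=5, hidden_size=5)
layer1 = nn.GRU(input_size=5, hidden_size=5)

with torch.no_grad():
    # Copy the weights of each stacked layer into its own single-layer GRU.
    for name in ("weight_ih", "weight_hh", "bias_ih", "bias_hh"):
        getattr(layer0, f"{name}_l0").copy_(getattr(stacked, f"{name}_l0"))
        getattr(layer1, f"{name}_l0").copy_(getattr(stacked, f"{name}_l1"))

    x = torch.randn(9, 3, 5)          # (max_len, batch_size, embed_dim)
    output, hidden = stacked(x)       # hidden: (2, 3, 5), one entry per layer

    out0, h0 = layer0(x)              # first layer reads the input
    out1, h1 = layer1(out0)           # second layer reads the first layer's outputs

print(torch.allclose(out1, output, atol=1e-6))
print(torch.allclose(h0[0], hidden[0], atol=1e-6))
print(torch.allclose(h1[0], hidden[1], atol=1e-6))
```

The returned `output` comes from the last layer only, while `hidden` stacks the final state of every layer.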

#### 4. Multiple Layer Bidirectional GRU

In [13]:
n_layers = 2
bidirectional = True

model = GRUModel(input_dim, embed_dim, hidden_dim, n_layers, bidirectional)
output, hidden = model(sequence_tensor)

print(f"Input shape: {sequence_tensor.shape}")
print(f"Output shape: {output.shape}")
print(f"Hidden shape: {hidden.shape}")

Input shape: torch.Size([9, 3])
Output shape: torch.Size([9, 3, 10])
Hidden shape: torch.Size([4, 3, 5])


In [14]:
batch_size = sequence_tensor.shape[1]
hidden = hidden.view(n_layers, 2, batch_size, hidden_dim)
print(f"Reshaped hidden shape: {hidden.shape}")

Reshaped hidden shape: torch.Size([2, 2, 3, 5])


- For a multi-layer bidirectional GRU, the first `hidden_dim` features of the output at the last time step must equal the final forward hidden state of the last layer.
- The last `hidden_dim` features of the output at the first time step must equal the final backward hidden state of the last layer.

In [15]:
assert (output[-1, :, :hidden_dim] == hidden[-1][0]).all()
assert (output[0, :, hidden_dim:] == hidden[-1][1]).all()
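
A common use of these final states is to concatenate the last layer's forward and backward states into one vector per sequence. In the sketch below this is done without the reshape, relying on the layer-major ordering of `hidden`, in which the last two entries belong to the last layer. The name `summary` and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_dim = 5
gru = nn.GRU(5, hidden_dim, num_layers=2, bidirectional=True)
x = torch.randn(9, 3, 5)              # (max_len, batch_size, embed_dim)

output, hidden = gru(x)               # hidden: (num_layers * 2, batch_size, hidden_dim)

# hidden[-2] is the last layer's forward state, hidden[-1] its backward state.
summary = torch.cat([hidden[-2], hidden[-1]], dim=1)
print(summary.shape)                  # (batch_size, 2 * hidden_dim)
```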

## References

- [Understanding GRU Networks](https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be)
- [Introduction to Gated Recurrent Unit (GRU)](https://www.analyticsvidhya.com/blog/2021/03/introduction-to-gated-recurrent-unit-gru/)