In [None]:
# Sumani
# https://www.linkedin.com/in/sumanaruban/
# https://github.com/Sumanaruban
# 1-9-2024

# Sequential Models in PyTorch: RNN, LSTM, and GRU

This tutorial will cover the basics of sequential models in PyTorch, specifically focusing on Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTMs), and Gated Recurrent Units (GRUs). We'll start with a brief overview of each model, followed by simple examples using tensors to illustrate their operations.

## 1. Introduction to Sequential Models

Sequential models are designed to process sequences of data, such as time series or natural language. Unlike feedforward neural networks, sequential models retain information from previous inputs in the sequence, making them suitable for tasks where the order of inputs matters.
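To make "order matters" concrete, here is a small sketch. It assumes an untrained `nn.RNN` with an arbitrary seed (0) and reuses the toy sequence from later in this notebook. A sequence and its time-reversed copy have the same mean, so an order-insensitive summary cannot tell them apart. The RNN's final hidden state, however, is built step by step and will generally differ between the two orderings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

seq = torch.tensor([[[0.1], [0.2], [0.3]]])  # (batch=1, seq_len=3, input_size=1)
rev = torch.flip(seq, dims=[1])              # same values, reversed time order

# An order-insensitive summary (the mean over time) is the same for both
same_mean = torch.allclose(seq.mean(dim=1), rev.mean(dim=1))

# The RNN's final hidden state depends on the order the steps arrive in
_, h_fwd = rnn(seq)
_, h_rev = rnn(rev)
order_sensitive = not torch.allclose(h_fwd, h_rev)

print("Same mean:", same_mean)
print("Different final hidden state:", order_sensitive)
```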

### 1.1 Recurrent Neural Networks (RNN)

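RNNs are a type of neural network designed to handle sequential data by maintaining a hidden state that captures information from previous steps in the sequence. At each time step, the new hidden state is computed from the current input and the previous hidden state, using the same weights at every step.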
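PyTorch's `nn.RNN` documentation gives the update (with the default `tanh` nonlinearity) as `h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh)`. The sketch below is a hand-rolled loop, assuming an arbitrary seed and the toy sequence used later in this notebook. It recomputes that recurrence from the layer's own parameters (`weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0`, `bias_hh_l0`) and checks the result against the built-in layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
x = torch.tensor([[[0.1], [0.2], [0.3]]])  # (batch=1, seq_len=3, input_size=1)

h = torch.zeros(1, 2)  # (batch, hidden_size), starts at zero
steps = []
for t in range(x.shape[1]):
    # Same weights are reused at every time step
    h = torch.tanh(x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                   + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
    steps.append(h)
manual_output = torch.stack(steps, dim=1)  # (1, 3, 2), one hidden state per step

output, h_n = rnn(x)  # built-in layer, initial hidden state defaults to zeros
print(torch.allclose(manual_output, output, atol=1e-5))  # True
```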

### 1.2 Long Short-Term Memory Networks (LSTM)

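LSTMs are a variant of the RNN designed to mitigate the vanishing gradient problem. Alongside the hidden state, they maintain a separate cell state, and three gates (input, forget, and output) control what is written to, kept in, and read from that cell state. This makes them better at learning long-term dependencies than a plain RNN.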
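The gating can be written out by hand. This is a minimal sketch, assuming an arbitrary seed and the toy sequence used later; it follows the equations in the `nn.LSTM` documentation, where the stacked gate weights are ordered input (`i`), forget (`f`), cell candidate (`g`), output (`o`). The cell state is updated as `c = f * c + i * g` and the hidden state as `h = o * tanh(c)`, and the final states are compared with the built-in layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
x = torch.tensor([[[0.1], [0.2], [0.3]]])  # (batch=1, seq_len=3, input_size=1)

W_ih, W_hh = lstm.weight_ih_l0, lstm.weight_hh_l0   # (8, 1) and (8, 2): 4 gates x 2 units
b = lstm.bias_ih_l0 + lstm.bias_hh_l0               # both biases are simply summed
h = torch.zeros(1, 2)  # hidden state
c = torch.zeros(1, 2)  # cell state

for t in range(x.shape[1]):
    gates = x[:, t] @ W_ih.T + h @ W_hh.T + b       # (1, 8)
    i, f, g, o = gates.chunk(4, dim=1)              # PyTorch order: i, f, g, o
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)                               # candidate values
    c = f * c + i * g                               # forget old content, write new
    h = o * torch.tanh(c)                           # expose part of the cell state

_, (h_n, c_n) = lstm(x)
print(torch.allclose(h, h_n[0], atol=1e-5), torch.allclose(c, c_n[0], atol=1e-5))  # True True
```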

### 1.3 Gated Recurrent Units (GRU)

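GRUs are similar to LSTMs but simpler. They merge the cell state and hidden state into a single hidden state and combine the forget and input gates into a single update gate, with a reset gate controlling how much of the previous state feeds into the new candidate state. With three gate blocks instead of the LSTM's four, they have fewer parameters and are often faster to train, although which of the two performs better depends on the task.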
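A hand-written GRU step makes the two gates explicit. This sketch assumes an arbitrary seed and follows the `nn.GRU` documentation, where the stacked weights are ordered reset (`r`), update (`z`), new/candidate (`n`). Unlike the LSTM, the input and hidden biases cannot simply be summed here, because the reset gate multiplies only the hidden-side term of the candidate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
gru = nn.GRU(input_size=1, hidden_size=2, batch_first=True)
x = torch.tensor([[[0.1], [0.2], [0.3]]])  # (batch=1, seq_len=3, input_size=1)

h = torch.zeros(1, 2)
for t in range(x.shape[1]):
    gi = x[:, t] @ gru.weight_ih_l0.T + gru.bias_ih_l0  # input contributions, (1, 6)
    gh = h @ gru.weight_hh_l0.T + gru.bias_hh_l0        # hidden contributions, (1, 6)
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, hh_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(i_r + h_r)       # reset gate
    z = torch.sigmoid(i_z + h_z)       # update gate
    n = torch.tanh(i_n + r * hh_n)     # candidate state; reset gate scales the hidden part
    h = (1 - z) * n + z * h            # blend candidate and previous state

_, h_final = gru(x)
print(torch.allclose(h, h_final[0], atol=1e-5))  # True
```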


## 2. PyTorch Implementation of RNN, LSTM, and GRU

We'll use PyTorch's built-in `nn.RNN`, `nn.LSTM`, and `nn.GRU` layers on small, hand-written tensors to demonstrate how each model processes a sequence.

### 2.1 Recurrent Neural Networks (RNN)

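In this example, we define a simple RNN with one input feature (`input_size=1`) and two hidden units (`hidden_size=2`). The input is a single sequence (batch size 1) of three time steps. The RNN processes the steps in order and updates the hidden state at each one. `output` contains the hidden state at every time step, while the returned `hidden_state` contains only the last one, which is why it matches the last row of `output`. Because the weights are randomly initialized and no seed is set, the values you get will differ from those printed here.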

In [1]:
import torch
import torch.nn as nn

# Define the input tensor (batch_size, sequence_length, input_size)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]], dtype=torch.float32)

# Define an RNN layer
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)

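# Initialize the hidden state (num_layers, batch_size, hidden_size);
# batch_first=True applies only to the input/output, not to the hidden state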
hidden_state = torch.zeros(1, 1, 2)

# Forward pass through the RNN
output, hidden_state = rnn(input_seq, hidden_state)

# Print the output and hidden state
print("RNN Output:", output)
print("RNN Hidden State:", hidden_state)

RNN Output: tensor([[[-0.7648, -0.0051],
         [-0.8265,  0.1510],
         [-0.8547,  0.1817]]], grad_fn=<TransposeBackward1>)
RNN Hidden State: tensor([[[-0.8547,  0.1817]]], grad_fn=<StackBackward0>)
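A quick shape check helps when reading these printouts. This sketch uses a freshly initialized layer, so its values differ from those above, but the shapes and the relationship between the two return values are the same: `output` is `(batch, seq_len, hidden_size)` because of `batch_first=True`, while the returned hidden state is `(num_layers, batch, hidden_size)`.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]])
output, h_n = rnn(input_seq, torch.zeros(1, 1, 2))

print(output.shape)  # torch.Size([1, 3, 2]) -> (batch, seq_len, hidden_size)
print(h_n.shape)     # torch.Size([1, 1, 2]) -> (num_layers, batch, hidden_size)

# The last time step of `output` is the final hidden state
last_step_matches = torch.allclose(output[:, -1, :], h_n[-1])
print(last_step_matches)  # True
```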


In [2]:
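# Pass the same input again, starting from the final hidden state of the previous call.
# The outputs differ from the first pass because the initial hidden state is no longer zero.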
output, hidden_state = rnn(input_seq, hidden_state)

# Print the output and hidden state
print("RNN Output:", output)
print("RNN Hidden State:", hidden_state)

RNN Output: tensor([[[-0.8454,  0.1614],
         [-0.8499,  0.1722],
         [-0.8581,  0.1867]]], grad_fn=<TransposeBackward1>)
RNN Hidden State: tensor([[[-0.8581,  0.1867]]], grad_fn=<StackBackward0>)
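Carrying the hidden state across calls, as in the cell above, is equivalent to processing one longer sequence. The sketch below (arbitrary seed, same toy input) runs the three-step sequence twice while passing the state along, then compares that with a single call on the concatenated six-step sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
rnn = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]])

# Two calls, carrying the hidden state from the first into the second
out1, h1 = rnn(input_seq, torch.zeros(1, 1, 2))
out2, h2 = rnn(input_seq, h1)

# One call on the concatenated 6-step sequence
long_seq = torch.cat([input_seq, input_seq], dim=1)  # (1, 6, 1)
out_long, h_long = rnn(long_seq, torch.zeros(1, 1, 2))

print(torch.allclose(torch.cat([out1, out2], dim=1), out_long, atol=1e-6))  # True
print(torch.allclose(h2, h_long, atol=1e-6))  # True
```

This is why the second pass in the notebook does not reproduce the first: the layer behaves as if it were reading steps 4 to 6 of a longer sequence.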


### Change Hidden State Size

In [3]:
# Define the input tensor (batch_size, sequence_length, input_size)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]], dtype=torch.float32)

# Define an RNN layer with hidden_size=4
rnn_hidden_4 = nn.RNN(input_size=1, hidden_size=4, batch_first=True)

# Initialize the hidden state (num_layers, batch_size, hidden_size)
hidden_state_4 = torch.zeros(1, 1, 4)

# Forward pass through the RNN
output_4, hidden_state_4 = rnn_hidden_4(input_seq, hidden_state_4)

# Print the output and hidden state
print("RNN with hidden_size=4 Output:", output_4)
print("RNN with hidden_size=4 Hidden State:", hidden_state_4)

RNN with hidden_size=4 Output: tensor([[[-0.1240, -0.1224, -0.4693,  0.5299],
         [-0.3676, -0.0856, -0.2503,  0.6491],
         [-0.3318,  0.1474, -0.1984,  0.6977]]], grad_fn=<TransposeBackward1>)
RNN with hidden_size=4 Hidden State: tensor([[[-0.3318,  0.1474, -0.1984,  0.6977]]], grad_fn=<StackBackward0>)
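Changing `hidden_size` changes the shapes of the layer's weights, not just of its outputs. Listing the parameters of a layer configured like `rnn_hidden_4` shows where the four hidden units live: the input-to-hidden matrix becomes `(4, 1)` and the hidden-to-hidden matrix becomes `(4, 4)`.

```python
import torch.nn as nn

rnn_hidden_4 = nn.RNN(input_size=1, hidden_size=4, batch_first=True)

shapes = {name: tuple(param.shape) for name, param in rnn_hidden_4.named_parameters()}
for name, shape in shapes.items():
    print(name, shape)
# weight_ih_l0 (4, 1)
# weight_hh_l0 (4, 4)
# bias_ih_l0 (4,)
# bias_hh_l0 (4,)
```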


### Multi-Layer RNN

In [4]:
# Define the input tensor (batch_size, sequence_length, input_size)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]], dtype=torch.float32)

# Define an RNN layer with 3 stacked layers and hidden_size=2
rnn_stacked = nn.RNN(input_size=1, hidden_size=2, num_layers=3, batch_first=True)

# Initialize the hidden state (num_layers, batch_size, hidden_size)
hidden_state_stacked = torch.zeros(3, 1, 2)  # 3 layers, batch_size=1, hidden_size=2

# Forward pass through the multi-layer RNN
output_stacked, hidden_state_stacked = rnn_stacked(input_seq, hidden_state_stacked)

# Print the output and hidden state
print("Multi Layer RNN Output:", output_stacked)
print("Multi Layer RNN Hidden State:", hidden_state_stacked)

Multi Layer RNN Output: tensor([[[0.5857, 0.7736],
         [0.5092, 0.2235],
         [0.4680, 0.5402]]], grad_fn=<TransposeBackward1>)
Multi Layer RNN Hidden State: tensor([[[-0.6025, -0.0181]],

        [[ 0.1877, -0.3924]],

        [[ 0.4680,  0.5402]]], grad_fn=<StackBackward0>)
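In the printout above, `output` holds only the top layer's hidden states, while the returned hidden state has one entry per layer; the last entry (layer 3) matches the last row of `output`. A stacked RNN is equivalent to chaining single-layer RNNs, each reading the full output sequence of the layer below. The sketch below checks this for two layers by copying weights from a stacked `nn.RNN` into two single-layer ones (arbitrary seed; the default `dropout=0` is assumed, since dropout between layers would break the equivalence during training).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
stacked = nn.RNN(input_size=1, hidden_size=2, num_layers=2, batch_first=True)
layer0 = nn.RNN(input_size=1, hidden_size=2, batch_first=True)
layer1 = nn.RNN(input_size=2, hidden_size=2, batch_first=True)  # reads layer 0's hidden states

# Copy layer k of the stacked RNN into its own single-layer RNN
with torch.no_grad():
    for k, single in enumerate([layer0, layer1]):
        for name in ["weight_ih", "weight_hh", "bias_ih", "bias_hh"]:
            getattr(single, f"{name}_l0").copy_(getattr(stacked, f"{name}_l{k}"))

x = torch.tensor([[[0.1], [0.2], [0.3]]])
out_stacked, h_stacked = stacked(x)

out0, h0 = layer0(x)     # layer 0 sees the raw input
out1, h1 = layer1(out0)  # layer 1 sees layer 0's full output sequence

print(torch.allclose(out1, out_stacked, atol=1e-6))               # True
print(torch.allclose(torch.cat([h0, h1]), h_stacked, atol=1e-6))  # True
```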


### 2.2 Long Short-Term Memory Networks (LSTM)

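The LSTM example mirrors the RNN example, but `nn.LSTM` also carries a cell state, which helps it maintain long-term dependencies. The initial states are passed as a tuple `(hidden_state, cell_state)`, and the layer returns the output together with an updated tuple. We reuse `input_seq` from above and print the output, hidden state, and cell state after processing it.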

In [5]:
# Define an LSTM layer
lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)

# Initialize the hidden state and cell state
hidden_state = torch.zeros(1, 1, 2)
cell_state = torch.zeros(1, 1, 2)

# Forward pass through the LSTM
output, (hidden_state, cell_state) = lstm(input_seq, (hidden_state, cell_state))

# Print the output, hidden state, and cell state
print("LSTM Output:", output)
print("LSTM Hidden State:", hidden_state)
print("LSTM Cell State:", cell_state)

LSTM Output: tensor([[[-0.1971,  0.0529],
         [-0.3005,  0.0989],
         [-0.3565,  0.1298]]], grad_fn=<TransposeBackward0>)
LSTM Hidden State: tensor([[[-0.3565,  0.1298]]], grad_fn=<StackBackward0>)
LSTM Cell State: tensor([[[-0.5722,  0.3109]]], grad_fn=<StackBackward0>)
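Note that `output` only exposes the hidden state at each step; the cell state is internal and is returned only as the final `c_n`. The sketch below, using a freshly initialized layer, checks the shapes and also relies on the documented default that omitting the `(h_0, c_0)` tuple starts both states at zero, which matches the explicit `torch.zeros` initialization used above.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=2, batch_first=True)
input_seq = torch.tensor([[[0.1], [0.2], [0.3]]])

output, (h_n, c_n) = lstm(input_seq)  # no state tuple given: both default to zeros

print(output.shape, h_n.shape, c_n.shape)
# torch.Size([1, 3, 2]) torch.Size([1, 1, 2]) torch.Size([1, 1, 2])

# The last step of `output` is h_n; c_n does not appear in `output`
print(torch.allclose(output[:, -1], h_n[-1]))  # True
```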


In [6]:
# Pass the same input again, carrying over the hidden and cell states from the previous call
output, (hidden_state, cell_state) = lstm(input_seq, (hidden_state, cell_state))

# Print the output, hidden state, and cell state
print("LSTM Output:", output)
print("LSTM Hidden State:", hidden_state)
print("LSTM Cell State:", cell_state)

LSTM Output: tensor([[[-0.3675,  0.1714],
         [-0.3846,  0.1859],
         [-0.3981,  0.1888]]], grad_fn=<TransposeBackward0>)
LSTM Hidden State: tensor([[[-0.3981,  0.1888]]], grad_fn=<StackBackward0>)
LSTM Cell State: tensor([[[-0.6296,  0.4410]]], grad_fn=<StackBackward0>)


### 2.3 Gated Recurrent Units (GRU)

In the GRU example, we define a GRU layer with one input feature and two hidden units. Like the plain RNN, the GRU carries only a hidden state (there is no separate cell state), and it has fewer parameters than an LSTM of the same size. We print the output and hidden state after processing the input sequence.
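To put numbers on "fewer parameters", here is a quick count for the sizes used in this notebook (`input_size=1`, `hidden_size=2`). Each gate block stores an input-to-hidden matrix, a hidden-to-hidden matrix, and two bias vectors (2 + 4 + 2 + 2 = 10 values here); the RNN has one such block, the GRU three, and the LSTM four.

```python
import torch.nn as nn

def count_params(module):
    # Total number of trainable values in the module
    return sum(p.numel() for p in module.parameters())

counts = {}
for name, cls in [("RNN", nn.RNN), ("GRU", nn.GRU), ("LSTM", nn.LSTM)]:
    counts[name] = count_params(cls(input_size=1, hidden_size=2, batch_first=True))
    print(name, counts[name])
# RNN 10
# GRU 30
# LSTM 40
```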

In [7]:
# Define a GRU layer
gru = nn.GRU(input_size=1, hidden_size=2, batch_first=True)

# Initialize the hidden state
hidden_state = torch.zeros(1, 1, 2)

# Forward pass through the GRU
output, hidden_state = gru(input_seq, hidden_state)

# Print the output and hidden state
print("GRU Output:", output)
print("GRU Hidden State:", hidden_state)

GRU Output: tensor([[[ 0.2566, -0.0614],
         [ 0.4277, -0.1204],
         [ 0.5382, -0.1696]]], grad_fn=<TransposeBackward1>)
GRU Hidden State: tensor([[[ 0.5382, -0.1696]]], grad_fn=<StackBackward0>)


In [8]:
# Pass the same input again, starting from the final hidden state of the previous call
output, hidden_state = gru(input_seq, hidden_state)

# Print the output and hidden state
print("GRU Output:", output)
print("GRU Hidden State:", hidden_state)

GRU Output: tensor([[[ 0.5916, -0.2482],
         [ 0.6326, -0.3021],
         [ 0.6645, -0.3359]]], grad_fn=<TransposeBackward1>)
GRU Hidden State: tensor([[[ 0.6645, -0.3359]]], grad_fn=<StackBackward0>)
