# Introduction to Seq2Seq Models

Sequence-to-Sequence (Seq2Seq) models convert a sequence in one domain into a sequence in another. They are particularly useful when the output is itself a sequence generated from a sequential input. A common application is machine translation, where the input is text in the source language and the output is the corresponding text in the target language. Other applications include speech recognition, text summarization, and question answering.

## Understanding the Components

Seq2Seq models typically consist of two main components:

### Encoder
The encoder processes the input sequence and compresses it into a context vector (also known as the state vector). This vector is meant to summarize all of the input elements so that the decoder can make accurate predictions. In classic Seq2Seq models, the encoder is a Recurrent Neural Network (RNN) or one of its variants, such as an LSTM or GRU.

### Decoder
The decoder is trained to generate the output sequence by predicting the next element from the previous elements and the context vector supplied by the encoder. It keeps generating elements until it produces an end-of-sequence token, which signals that the output is complete. Like the encoder, the decoder is often an RNN or one of its variants.

Together, these components let the model map an input sequence to an output sequence of a different length, something that models emitting exactly one output per input element cannot do directly.


In [1]:
# Importing necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import random

In [2]:
# Define a simple Seq2Seq model using PyTorch
# For this example, we will consider a simple case study of reversing a sequence

# Sample data: a list of sequences (for simplicity, we use numbers here)
input_sequences = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25]
]


In [18]:
# Reverse the sequences to create target sequences
target_sequences = [list(reversed(seq)) for seq in input_sequences]
print(target_sequences)

[[5, 4, 3, 2, 1], [10, 9, 8, 7, 6], [15, 14, 13, 12, 11], [20, 19, 18, 17, 16], [25, 24, 23, 22, 21]]


## Encoder Architecture

Here, we define a simple `Encoder` class that extends `nn.Module`, PyTorch's base class for all neural network modules. The encoder processes the input sequence and compresses it into a context vector (the final hidden state).

### Components of the Encoder:

- **Embedding Layer**: This layer converts input tokens (usually integers representing words) into dense vectors of fixed size. This avoids working with sparse one-hot vectors as large as the vocabulary and gives the model a compact, learnable representation of each token.
  
- **GRU Layer**: The Gated Recurrent Unit (GRU) is a type of RNN that can capture dependencies at different time scales. It processes the sequence step by step, updating its hidden state at each time step.
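
As a rough illustration of what the embedding layer does, the sketch below maps five token indices to dense vectors. The embedding width of 8 is an arbitrary choice made for readability, not the value used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights

# 26 possible token IDs, each mapped to an 8-dimensional vector (sizes are illustrative)
embedding = nn.Embedding(num_embeddings=26, embedding_dim=8)

tokens = torch.tensor([1, 2, 3, 4, 5], dtype=torch.long)
vectors = embedding(tokens)  # one row of 8 floats per token

print(vectors.shape)  # torch.Size([5, 8])
```

Each row of `vectors` is a learnable parameter slice, so the representation of a token changes as the model trains.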

### The `forward` Method:

The `forward` function is where the module's computation happens. It takes two arguments, `input` and `hidden`:
- `input` is a single token index; the calling loop passes the sequence to the encoder one token at a time.
- `hidden` is the hidden state from the previous step (zeros for the first token).

The `forward` function performs the following steps:
1. It passes the input through the embedding layer to get dense representations.
2. It reshapes the embedded input to fit the expected input dimensions of the GRU.
3. It processes the input through the GRU layer, which updates the hidden state.

The GRU returns an `output` for the current step along with the updated `hidden` state, and `forward` returns both.
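
The shape handling in these steps can be checked in isolation. This is a minimal sketch that assumes a hidden size of 8 for readability; it pushes one token through an embedding and a GRU in the way the steps above describe.

```python
import torch
import torch.nn as nn

hidden_size = 8  # illustrative; the notebook uses a larger value
embedding = nn.Embedding(26, hidden_size)
gru = nn.GRU(hidden_size, hidden_size)

token = torch.tensor([3], dtype=torch.long)  # a single token index
hidden = torch.zeros(1, 1, hidden_size)      # (num_layers, batch, hidden_size)

# nn.GRU expects (seq_len, batch, input_size) by default, hence the reshape
embedded = embedding(token).view(1, 1, -1)
output, hidden = gru(embedded, hidden)

print(embedded.shape, output.shape, hidden.shape)
```

For a single-layer GRU processing one time step, `output` and `hidden` hold the same values, which is why the final hidden state can serve as the context vector.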

### Initialization of Hidden State:

The `initHidden` method initializes the hidden state to zeros. This state will be updated as the GRU processes the input sequence.

This encoder architecture is commonly used as the first component in sequence-to-sequence models, which aim to transform a given sequence into a new domain, such as translating sentences from one language to another.


In [4]:
# Define a simple Encoder
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size)

## Decoder Architecture

The Decoder is the second component of our sequence-to-sequence model, designed to generate an output sequence from the context vector provided by the Encoder.

### Components of the Decoder:

- **Embedding Layer**: Similar to the Encoder, the embedding layer here transforms the indices of the output tokens into dense vectors.

- **GRU Layer**: The GRU works just like in the Encoder, but here it's generating a sequence rather than encoding it. It starts with the context vector from the Encoder as its initial hidden state.

- **Linear Layer**: This layer maps the output of the GRU to the space of the possible output tokens.

- **Softmax Layer**: A log-softmax (`nn.LogSoftmax`) is applied to the linear layer's output, giving log-probabilities over all possible output tokens. This pairs with the `nn.NLLLoss` criterion used for training.
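
To see what the last two layers produce, here is a small sketch with made-up sizes (hidden size 8, 26 output tokens). The `Decoder` class in this notebook uses `nn.LogSoftmax`, so its outputs are log-probabilities; exponentiating them recovers an ordinary probability distribution.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

hidden_size, output_size = 8, 26  # illustrative sizes
out = nn.Linear(hidden_size, output_size)
log_softmax = nn.LogSoftmax(dim=1)

gru_output = torch.randn(1, hidden_size)  # stand-in for one GRU output step
log_probs = log_softmax(out(gru_output))  # shape (1, 26), every value <= 0
probs = log_probs.exp()                   # back to probabilities

print(log_probs.shape)     # torch.Size([1, 26])
print(probs.sum().item())  # approximately 1.0
```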

### The `forward` Method:

The `forward` function processes the inputs through the following steps:
1. The input token is embedded, reshaped to fit the expected input dimensions of the GRU, and passed through a ReLU.
2. The GRU processes the input, and the hidden state is updated.
3. The output from the GRU is passed through a linear layer and then a log-softmax layer to produce log-probabilities for the next token in the sequence.

### Generating Output Sequences:

The Decoder's job is to generate an output sequence one token at a time. It continues generating tokens until it reaches an end-of-sequence token or some predefined limit. At each step, it uses the output token as the next input token.
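
The generation loop described above can be sketched without any neural network at all. In the example below, `toy_step` is a hypothetical stand-in for a trained decoder step, and the token IDs for SOS and EOS are assumptions chosen to match the values used later in this notebook.

```python
SOS_TOKEN = 0  # start-of-sequence marker (assumed ID)
EOS_TOKEN = 1  # end-of-sequence marker (assumed ID)

def toy_step(prev_token, state):
    """Hypothetical decoder step: emits 5, 4, 3, 2 and then EOS."""
    next_token = 5 if prev_token == SOS_TOKEN else prev_token - 1
    return next_token, state

def greedy_decode(step_fn, state, max_length=10):
    """Generate tokens one at a time until EOS or max_length is reached."""
    tokens = []
    prev = SOS_TOKEN
    for _ in range(max_length):
        # The token produced at this step becomes the input to the next step
        prev, state = step_fn(prev, state)
        if prev == EOS_TOKEN:
            break
        tokens.append(prev)
    return tokens

print(greedy_decode(toy_step, state=None))  # [5, 4, 3, 2]
```

The `max_length` cap matters in practice: a poorly trained model may never emit the end-of-sequence token, and the cap keeps the loop from running forever.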

This simple decoder architecture is a fundamental part of many sequence generation tasks such as machine translation, where the model needs to produce a sequence of words in the target language.


In [5]:
# Define a simple Decoder
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output = self.embedding(input).view(1, 1, -1)
        output = torch.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden

In [6]:
# Hyperparameters
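# Token IDs run from 0 to 25: the data uses 1-25 and 0 is reserved as the SOS token.
# Note that ID 1 is also treated as the EOS token below, so it collides with the data value 1.
input_size = 26
hidden_size = 256
output_size = 26  # The decoder predicts over the same 26 token IDs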

In [13]:
# Initialize encoder and decoder models
encoder = Encoder(input_size, hidden_size)
decoder = Decoder(hidden_size, output_size)

In [14]:
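# Define the training step for one input/target pair
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion, max_length=5):
    # input_tensor and target_tensor are expected to have shape (1, seq_len)
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Dimension 0 is the batch (size 1); the sequence length is dimension 1
    input_length = input_tensor.size(1)
    target_length = target_tensor.size(1)

    loss = 0

    # Feed the input tokens through the encoder one at a time
    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[:, ei], encoder_hidden)

    decoder_input = torch.tensor([[0]])  # SOS token
    decoder_hidden = encoder_hidden  # The context vector initializes the decoder

    for di in range(target_length):
        # Carry the decoder's own hidden state forward between steps
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        topv, topi = decoder_output.topk(1)
        decoder_input = topi.squeeze().detach()  # Greedy prediction becomes the next input

        loss += criterion(decoder_output, target_tensor[:, di])
        if decoder_input.item() == 1:  # EOS token (ID 1 also occurs in the data)
            break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length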

In [15]:
# Loss function
criterion = nn.NLLLoss()

# Optimizers
encoder_optimizer = optim.SGD(encoder.parameters(), lr=0.01)
decoder_optimizer = optim.SGD(decoder.parameters(), lr=0.01)

In [16]:
# Convert input and target sequences to tensors
input_tensors = [torch.tensor(seq, dtype=torch.long) for seq in input_sequences]
target_tensors = [torch.tensor(seq, dtype=torch.long) for seq in target_sequences]

In [17]:
# Train for a single epoch for demonstration purposes; useful predictions need many more
for epoch in range(1):
    total_loss = 0
    for input_tensor, target_tensor in zip(input_tensors, target_tensors):
        loss = train(input_tensor.unsqueeze(0), target_tensor.unsqueeze(0), encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)
        total_loss += loss
        print(f'Epoch {epoch}, Loss: {loss}')
    print(f'Epoch {epoch} completed, Total Loss: {total_loss}')

Epoch 0, Loss: 3.160747766494751
Epoch 0, Loss: 3.3049395084381104
Epoch 0, Loss: 3.376070737838745
Epoch 0, Loss: 3.3989901542663574
Epoch 0, Loss: 3.061946392059326
Epoch 0 completed, Total Loss: 16.30269455909729
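
One way to read these numbers: a model that assigns equal probability to each of the 26 output tokens would have a negative log-likelihood of ln(26) per token. The short check below computes that baseline. The per-sequence losses above sit close to it, which is what one would expect from a barely trained model.

```python
import math

vocab_size = 26  # matches output_size in this notebook
uniform_nll = -math.log(1.0 / vocab_size)  # loss of a uniform guess over 26 tokens

print(round(uniform_nll, 3))  # 3.258
```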


In [19]:
# Function to convert tensor to list
def tensor_to_list(tensor):
    return tensor.detach().cpu().tolist()

In [20]:
# Assuming max_length is the maximum sequence length that you expect
max_length = 5

# Function to generate predictions from the model
def predict(input_tensor, encoder, decoder, max_length=max_length):
    with torch.no_grad():
        input_length = input_tensor.size(0)
        encoder_hidden = encoder.initHidden()

        # Encoding
        for ei in range(input_length):
            encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

        # Decoding
        decoder_input = torch.tensor([[0]])  # SOS token
        decoder_hidden = encoder_hidden

        predicted_seq = []
        for di in range(max_length):
            decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
            topv, topi = decoder_output.topk(1)  # Most likely next token
            if topi.item() == 1:  # EOS token
                break
            predicted_seq.append(topi.item())

            # Feed the predicted token back in as the next input
            decoder_input = topi.squeeze().detach()

        return predicted_seq

In [21]:
# After training, visualize some predictions
print("Visualizing Predictions after Training:")
for i in range(min(5, len(input_tensors))):
    input_sequence = tensor_to_list(input_tensors[i])
    target_sequence = tensor_to_list(target_tensors[i])
    predicted_sequence = predict(input_tensors[i], encoder, decoder)

    print(f"Input Sequence: {input_sequence}")
    print(f"Target Sequence: {target_sequence}")
    print(f"Predicted Sequence: {predicted_sequence}\n")


Visualizing Predictions after Training:
Input Sequence: [1, 2, 3, 4, 5]
Target Sequence: [5, 4, 3, 2, 1]
Predicted Sequence: [25, 7, 23, 25, 7]

Input Sequence: [6, 7, 8, 9, 10]
Target Sequence: [10, 9, 8, 7, 6]
Predicted Sequence: [5, 5, 14, 12, 25]

Input Sequence: [11, 12, 13, 14, 15]
Target Sequence: [15, 14, 13, 12, 11]
Predicted Sequence: [15, 5, 5, 5, 14]

Input Sequence: [16, 17, 18, 19, 20]
Target Sequence: [20, 19, 18, 17, 16]
Predicted Sequence: [25, 7, 23, 25, 7]

Input Sequence: [21, 22, 23, 24, 25]
Target Sequence: [25, 24, 23, 22, 21]
Predicted Sequence: [15]
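
To put a number on how far these predictions are from the targets, one option is a position-wise accuracy. The helper below, `token_accuracy`, is a hypothetical name introduced for this sketch and is not part of the notebook; it is applied to the third example printed above.

```python
def token_accuracy(predicted, target):
    """Fraction of target positions where the predicted token matches.

    Positions missing from a shorter prediction count as wrong.
    """
    if not target:
        return 0.0
    matches = sum(p == t for p, t in zip(predicted, target))
    return matches / len(target)

# Values copied from the third example in the output above
print(token_accuracy([15, 5, 5, 5, 14], [15, 14, 13, 12, 11]))  # 0.2
```

Tracking a metric like this across epochs gives a more direct view of progress on the reversal task than the loss alone.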

