
# **Transformers in Deep Learning**

Transformers are a neural network architecture for sequential data, widely used for tasks such as language translation and text generation. Unlike recurrent neural networks (RNNs), which consume a sequence one step at a time, transformers process all positions of a sequence in parallel through a mechanism called "self-attention." Self-attention lets each position weight every other position by relevance, regardless of how far apart the two are in the sequence.
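The self-attention computation described above can be sketched in a few lines. This is a minimal single-head version without learned projections; the function name and the tensor sizes are illustrative assumptions, not part of the notebook's model.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Single-head attention over tensors of shape (seq_len, d_k)."""
    d_k = q.size(-1)
    # Similarity of every position with every other position, scaled by sqrt(d_k).
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Softmax turns each row into a probability distribution over positions.
    weights = torch.softmax(scores, dim=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ v, weights

torch.manual_seed(0)
x = torch.randn(4, 8)  # 4 tokens with 8-dimensional embeddings (illustrative sizes)
out, attn = scaled_dot_product_attention(x, x, x)
print(out.shape)   # torch.Size([4, 8])
print(attn.shape)  # torch.Size([4, 4]): one weight per pair of positions
```

Nothing in the computation depends on the distance between two positions, which is why position information has to be added separately through a positional encoding.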

Because any two positions can interact directly, transformers capture long-range dependencies within a sequence more readily than RNNs. The architecture underlies models such as BERT and GPT, which are widely used for understanding and generating natural-language text.

The example below uses PyTorch to train a small transformer encoder for sequence classification.

In [30]:
%pip install torch scikit-learn numpy




In [31]:

import time  # used to time each training epoch

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, DataLoader


In [33]:
# Sample data
data = [
    ("I love this movie", 1),
    ("This film was terrible", 0),
    ("Great acting and plot", 1),
    ("I did not like this movie", 0),
    ("Fantastic experience", 1),
    ("Not my type of film", 0),
    ("I enjoyed every moment", 1),
    ("It was a waste of time", 0),
    ("Brilliant storytelling", 1),
    ("Awful direction and script", 0)
]

# Splitting data into training and testing sets
train_data, test_data = train_test_split(data, test_size=0.2, random_state=42)

class SimpleTextDataset(Dataset):
    def __init__(self, data, vocab=None, tokenizer=lambda x: x.split()):
        self.data = data
        self.tokenizer = tokenizer
        self.vocab = vocab or self.build_vocab(data)

    def build_vocab(self, data):
        vocab = {"<pad>": 0, "<unk>": 1}
        idx = 2
        for text, _ in data:
            for token in self.tokenizer(text):
                if token not in vocab:
                    vocab[token] = idx
                    idx += 1
        return vocab

    def encode(self, text):
        return [self.vocab.get(token, self.vocab["<unk>"]) for token in self.tokenizer(text)]

    def pad_sequence(self, seq, max_len):
        return seq + [0] * (max_len - len(seq))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        text, label = self.data[idx]
        encoded_text = self.encode(text)
        return torch.tensor(encoded_text), torch.tensor(label)

# Create datasets and dataloaders
train_dataset = SimpleTextDataset(train_data)
test_dataset = SimpleTextDataset(test_data, vocab=train_dataset.vocab)

def collate_fn(batch):
    texts, labels = zip(*batch)
    max_len = max(len(text) for text in texts)
    padded_texts = [train_dataset.pad_sequence(text.tolist(), max_len) for text in texts]
    return torch.tensor(padded_texts), torch.tensor(labels)

train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True, collate_fn=collate_fn)
test_loader = DataLoader(test_dataset, batch_size=2, shuffle=False, collate_fn=collate_fn)
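To make the encode-and-pad step concrete, here is a standalone sketch using plain Python lists. The toy vocabulary and the helper names `encode` and `pad_batch` are hypothetical and exist only for this illustration; the notebook builds its own vocabulary from the training split.

```python
# Hypothetical toy vocabulary, for illustration only.
vocab = {"<pad>": 0, "<unk>": 1, "i": 2, "love": 3, "this": 4, "movie": 5}

def encode(text, vocab):
    """Map whitespace tokens to ids, falling back to <unk> for unseen tokens."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

def pad_batch(seqs, pad_id=0):
    """Right-pad every sequence to the length of the longest one in the batch."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

batch = pad_batch([encode("i love this movie", vocab), encode("love it", vocab)])
print(batch)  # [[2, 3, 4, 5], [3, 1, 0, 0]]
```

The unseen word "it" maps to `<unk>` (id 1), and the shorter sentence is filled with `<pad>` (id 0). The notebook's tokenizer is a plain `split()` with no lowercasing, so "This" and "this" receive different ids there.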


In [34]:
class TransformerModel(nn.Module):
    def __init__(self, vocab_size, embed_size, num_heads, hidden_dim, num_layers, num_classes, max_seq_length):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Register as a buffer so the encoding follows the model across devices
        # (model.to(device)) without being treated as a trainable parameter.
        self.register_buffer(
            "positional_encoding",
            self.generate_positional_encoding(embed_size, max_seq_length),
        )
        # batch_first defaults to False, so inputs are (seq_len, batch, embed_size)
        encoder_layers = nn.TransformerEncoderLayer(embed_size, num_heads, hidden_dim)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_layers)
        self.fc = nn.Linear(embed_size, num_classes)

    def generate_positional_encoding(self, embed_size, max_seq_length):
        # Fixed sinusoidal encoding: sine on even dimensions, cosine on odd ones
        pe = torch.zeros(max_seq_length, embed_size)
        position = torch.arange(0, max_seq_length, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, embed_size, 2).float() * (-np.log(10000.0) / embed_size))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # Shape (max_seq_length, 1, embed_size) so it broadcasts over the batch dimension
        return pe.unsqueeze(1)

    def forward(self, src, src_mask=None):
        # src: (seq_len, batch) token ids; seq_len must not exceed max_seq_length
        src = self.embedding(src) + self.positional_encoding[:src.size(0), :]
        output = self.transformer_encoder(src, src_mask)
        # Mean-pool over the sequence dimension (padding positions included), then classify
        output = self.fc(output.mean(dim=0))
        return output

# Hyperparameters
vocab_size = len(train_dataset.vocab)
embed_size = 64
num_heads = 2
hidden_dim = 128
num_layers = 2
num_classes = 2
max_seq_length = 20  # Set a maximum sequence length

# Instantiate the model
model = TransformerModel(vocab_size, embed_size, num_heads, hidden_dim, num_layers, num_classes, max_seq_length)
print(model)


TransformerModel(
  (embedding): Embedding(31, 64)
  (transformer_encoder): TransformerEncoder(
    (layers): ModuleList(
      (0-1): 2 x TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=64, out_features=64, bias=True)
        )
        (linear1): Linear(in_features=64, out_features=128, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (linear2): Linear(in_features=128, out_features=64, bias=True)
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (dropout1): Dropout(p=0.1, inplace=False)
        (dropout2): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (fc): Linear(in_features=64, out_features=2, bias=True)
)
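For reference, the sinusoidal encoding built in `generate_positional_encoding` corresponds to the formulas below. The symbols are notation introduced for this note: $pos$ is the token position, $i$ indexes pairs of embedding dimensions, and $d$ stands for `embed_size`.

```latex
PE_{(pos,\,2i)}   = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad
PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right)
```

In the code, `div_term` is the factor $10000^{-2i/d}$, computed as $\exp\!\left(-2i \cdot \ln(10000)/d\right)$, so `position * div_term` gives the argument of the sine and cosine.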




In [35]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
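`nn.CrossEntropyLoss` takes raw logits (not probabilities) together with integer class labels, which is why the model ends in a plain linear layer. The sketch below, with made-up logits, checks the built-in loss against a by-hand computation.

```python
import torch
import torch.nn as nn

# Made-up logits for a batch of two examples and two classes.
logits = torch.tensor([[2.0, 0.5], [0.1, 1.5]])
labels = torch.tensor([0, 1])

loss = nn.CrossEntropyLoss()(logits, labels)

# By hand: log-softmax over classes, then the mean negative
# log-probability assigned to the correct class.
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(torch.allclose(loss, manual))  # True
```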


In [36]:
# Training loop
def train(model, iterator, criterion, optimizer):
    model.train()
    total_loss = 0
    for texts, labels in iterator:
        optimizer.zero_grad()
        texts = texts.permute(1, 0)  # Transformer expects (seq_len, batch_size)
        output = model(texts)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(iterator)

# Number of epochs
EPOCHS = 10

for epoch in range(EPOCHS):
    start_time = time.time()
    train_loss = train(model, train_loader, criterion, optimizer)
    end_time = time.time()
    print(f'Epoch {epoch+1}/{EPOCHS}, Loss: {train_loss:.4f}, Time: {end_time-start_time:.2f}s')

print('Finished Training')


Epoch 1/10, Loss: 0.6745, Time: 0.14s
Epoch 2/10, Loss: 0.5714, Time: 0.03s
Epoch 3/10, Loss: 0.4862, Time: 0.03s
Epoch 4/10, Loss: 0.3199, Time: 0.03s
Epoch 5/10, Loss: 0.1926, Time: 0.03s
Epoch 6/10, Loss: 0.1658, Time: 0.03s
Epoch 7/10, Loss: 0.1045, Time: 0.03s
Epoch 8/10, Loss: 0.0330, Time: 0.03s
Epoch 9/10, Loss: 0.0149, Time: 0.03s
Epoch 10/10, Loss: 0.0130, Time: 0.03s
Finished Training
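The loop above feeds padded batches to the encoder without marking which positions are padding, so `<pad>` tokens take part in attention and in the mean pooling. One possible refinement, sketched here as an assumption rather than as part of the notebook, is to pass a `src_key_padding_mask` to `nn.TransformerEncoder` and average only over real tokens. The layer sizes and token ids are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_ID = 0  # same index the notebook's vocabulary uses for "<pad>"

# Small illustrative sizes, not the notebook's hyperparameters.
embedding = nn.Embedding(10, 16, padding_idx=PAD_ID)
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, dim_feedforward=32)
encoder = nn.TransformerEncoder(layer, num_layers=1)
encoder.eval()  # disable dropout so the result is deterministic

# Two sequences, shape (batch, seq_len); the second ends in two pad tokens.
tokens = torch.tensor([[2, 3, 4, 5], [6, 7, 0, 0]])
pad_mask = tokens == PAD_ID  # (batch, seq_len), True where padded

src = embedding(tokens).permute(1, 0, 2)  # (seq_len, batch, embed)
with torch.no_grad():
    out = encoder(src, src_key_padding_mask=pad_mask)

# Masked mean pooling: sum over real tokens, divide by their count.
keep = (~pad_mask).permute(1, 0).unsqueeze(-1).float()  # (seq_len, batch, 1)
pooled = (out * keep).sum(dim=0) / keep.sum(dim=0)
print(pooled.shape)  # torch.Size([2, 16])
```

With the mask in place, attention ignores padded keys and the pooled vector no longer depends on how much padding a batch happens to contain.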


In [37]:
def evaluate(model, iterator, criterion):
    model.eval()
    total_loss = 0
    total_correct = 0
    with torch.no_grad():
        for texts, labels in iterator:
            texts = texts.permute(1, 0)  # Transformer expects (seq_len, batch_size)
            output = model(texts)
            loss = criterion(output, labels)
            total_loss += loss.item()
            _, predicted = torch.max(output, 1)
            total_correct += (predicted == labels).sum().item()
    # Normalize by the size of the dataset behind this iterator, not a global variable
    return total_loss / len(iterator), total_correct / len(iterator.dataset)

test_loss, test_acc = evaluate(model, test_loader, criterion)
print(f'Loss on the test dataset: {test_loss:.4f}, Accuracy: {test_acc * 100:.2f}%')


Loss on the test dataset: 0.1262, Accuracy: 100.00%


## Conclusion
This project demonstrates the basics of using a Transformer model for text classification with PyTorch.
The following steps were carried out:

- Created a simple synthetic dataset for text classification.
- Defined and trained a Transformer model using PyTorch.
- Evaluated the model's performance on the test dataset.
