<a href="https://colab.research.google.com/github/ShreePurvaja/Data-Science-Notes/blob/main/RNN%2CLSTM%2CTranformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 22-04-2025

In [1]:
!pip uninstall torchtext
!pip install torchtext==0.17.1

[0mCollecting torchtext==0.17.1
  Downloading torchtext-0.17.1-cp311-cp311-manylinux1_x86_64.whl.metadata (7.6 kB)
Collecting torch==2.2.1 (from torchtext==0.17.1)
  Downloading torch-2.2.1-cp311-cp311-manylinux1_x86_64.whl.metadata (26 kB)
Collecting torchdata==0.7.1 (from torchtext==0.17.1)
  Downloading torchdata-0.7.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.2.1->torchtext==0.17.1)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch==2.2.1->torchtext==0.17.1)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch==2.2.1->torchtext==0.17.1)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch==2.2.1->to

In [14]:
import torch
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import build_vocab_from_iterator


# Sample text data
text = """Nethaji Nirmal is a well-known educator and data science expert associated with GUVI, an online learning platform that focuses on upskilling students in areas like programming, data science, and full stack development. At GUVI, he plays a key role in conducting webinars, live classes, and project-based sessions for learners pursuing Python programming, data extraction, machine learning, and big data technologies.
  Nirmal is appreciated for his hands-on teaching approach,
  making complex concepts easier through real-time coding
  demonstrations and practical examples. He regularly conducts sessions on topics like
 “Introduction to Python,” “Data Extraction using Python,” and “Big Data Essentials,”
  helping learners understand the industry applications of these technologies.
  His workshops often cover real-world scenarios such as using APIs, handling large
   datasets, automating tasks, and deploying applications.
  Beyond his work at GUVI, Nirmal is also a contributor on
  GitHub and Hugging Face, where he shares projects related to
 data analytics, Twitter scraping, machine learning models,
 and deployment using tools like Docker. As the co-founder of Webdojo,
 he promotes project-based learning and industry readiness among students.
 Nethaji Nirmal’s passion for teaching, combined with his technical expertise,
 has made him a respected figure in the Indian edtech community,
  especially among those aspiring to enter the field of data science."""


# Tokenize the text
tokenizer = get_tokenizer('basic_english')
tokens = tokenizer(text.lower())


# Build the vocabulary
counter = Counter(tokens)
vocab = build_vocab_from_iterator([tokens], specials=['<unk>', '<pad>'])
vocab.set_default_index(vocab['<unk>'])


# Numericalize the text
data = [vocab[token] for token in tokens]


# Convert data to tensors and create batches
seq_length = 30
def create_batches(data, seq_length):
   sequences = [data[i:i+seq_length] for i in range(0, len(data)-seq_length)]
   inputs = torch.tensor([seq[:-1] for seq in sequences], dtype=torch.long)
   targets = torch.tensor([seq[-1] for seq in sequences], dtype=torch.long)
   return inputs, targets


inputs, targets = create_batches(data, seq_length)
train_data, val_data = inputs[:int(0.8*len(inputs))], inputs[int(0.8*len(inputs)):]
train_targets, val_targets = targets[:int(0.8*len(targets))], targets[int(0.8*len(targets)):]


# DataLoader
train_dataset = torch.utils.data.TensorDataset(train_data, train_targets)
val_dataset = torch.utils.data.TensorDataset(val_data, val_targets)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=20, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=20)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


In [7]:
#RNN model
import torch.nn as nn
import torch.optim as optim


class RNNModel(nn.Module):
   def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
       super(RNNModel, self).__init__()
       self.embedding = nn.Embedding(vocab_size, embedding_dim)
       self.rnn = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
       self.fc = nn.Linear(hidden_dim, output_dim)


   def forward(self, x):
       embedded = self.embedding(x)
       output, hidden = self.rnn(embedded)
       output = self.fc(output[:, -1, :])
       return output


# Initialize the model, criterion, and optimizer
vocab_size = len(vocab)
embedding_dim = 100
hidden_dim = 100
output_dim = vocab_size


model = RNNModel(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())


In [17]:
# LSTM instead of RNN

class LSTMModel(nn.Module):
   def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
       super(LSTMModel, self).__init__()
       self.embedding = nn.Embedding(vocab_size, embedding_dim)
       self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
       self.fc = nn.Linear(hidden_dim, output_dim)


   def forward(self, x):
       embedded = self.embedding(x)
       output, (hidden, cell) = self.lstm(embedded)
       output = self.fc(output[:, -1, :])
       return output


# Initialize the LSTM model
model = LSTMModel(vocab_size, embedding_dim, hidden_dim, output_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())


# # Train the LSTM model
# for epoch in range(1, 201):
#    train_loss = train_epoch(train_loader, model, criterion, optimizer)
#    val_loss = evaluate(val_loader, model, criterion)
#    print(f'Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')


In [20]:
# tranformer model

import torch.nn as nn
import torch.optim as optim
import math


class PositionalEncoding(nn.Module):
   def __init__(self, d_model, dropout=0.1, max_len=5000):
       super(PositionalEncoding, self).__init__()
       self.dropout = nn.Dropout(p=dropout)
       pe = torch.zeros(max_len, d_model)
       position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
       div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
       pe[:, 0::2] = torch.sin(position * div_term)
       pe[:, 1::2] = torch.cos(position * div_term)
       pe = pe.unsqueeze(0).transpose(0, 1)
       self.register_buffer('pe', pe)


   def forward(self, x):
       x = x + self.pe[:x.size(0), :]
       return self.dropout(x)


class TransformerModel(nn.Module):
   def __init__(self, vocab_size, embedding_dim, nhead, num_encoder_layers, hidden_dim, output_dim):
       super(TransformerModel, self).__init__()
       self.embedding = nn.Embedding(vocab_size, embedding_dim)
       self.pos_encoder = PositionalEncoding(embedding_dim)
       encoder_layers = nn.TransformerEncoderLayer(embedding_dim, nhead, hidden_dim)
       self.transformer_encoder = nn.TransformerEncoder(encoder_layers, num_encoder_layers)
       self.fc = nn.Linear(embedding_dim, output_dim)


   def forward(self, x):
       embedded = self.embedding(x) * math.sqrt(embedding_dim)
       embedded = self.pos_encoder(embedded)
       output = self.transformer_encoder(embedded)
       output = self.fc(output[:, -1, :])
       return output


# Initialize the model, criterion, and optimizer
vocab_size = len(vocab)
embedding_dim = 200
nhead = 2
num_encoder_layers = 2
hidden_dim = 200
output_dim = vocab_size


model = TransformerModel(vocab_size, embedding_dim, nhead, num_encoder_layers, hidden_dim, output_dim).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters())






In [21]:
#training and evaluation
def train_epoch(loader, model, criterion, optimizer):
   model.train()
   total_loss = 0
   for inputs, targets in loader:
       inputs, targets = inputs.to(device), targets.to(device)
       optimizer.zero_grad()
       output = model(inputs)
       loss = criterion(output, targets)
       loss.backward()
       optimizer.step()
       total_loss += loss.item()
   return total_loss / len(loader)


def evaluate(loader, model, criterion):
   model.eval()
   total_loss = 0
   with torch.no_grad():
       for inputs, targets in loader:
           inputs, targets = inputs.to(device), targets.to(device)
           output = model(inputs)
           loss = criterion(output, targets)
           total_loss += loss.item()
   return total_loss / len(loader)


for epoch in range(1, 500):
   train_loss = train_epoch(train_loader, model, criterion, optimizer)
   val_loss = evaluate(val_loader, model, criterion)
   print(f'Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')




Epoch 1, Train Loss: 4.9682, Val Loss: 5.0286
Epoch 2, Train Loss: 4.0726, Val Loss: 5.0118
Epoch 3, Train Loss: 3.5331, Val Loss: 4.9265
Epoch 4, Train Loss: 2.9801, Val Loss: 4.9444
Epoch 5, Train Loss: 2.6245, Val Loss: 4.8710
Epoch 6, Train Loss: 2.1849, Val Loss: 4.9939
Epoch 7, Train Loss: 1.7988, Val Loss: 5.1052
Epoch 8, Train Loss: 1.5833, Val Loss: 5.0434
Epoch 9, Train Loss: 1.3294, Val Loss: 5.2427
Epoch 10, Train Loss: 1.2804, Val Loss: 5.3303
Epoch 11, Train Loss: 1.1499, Val Loss: 5.4585
Epoch 12, Train Loss: 1.1375, Val Loss: 5.5557
Epoch 13, Train Loss: 0.9908, Val Loss: 5.5913
Epoch 14, Train Loss: 1.0490, Val Loss: 5.8997
Epoch 15, Train Loss: 0.8612, Val Loss: 5.9574
Epoch 16, Train Loss: 0.9393, Val Loss: 5.9901
Epoch 17, Train Loss: 0.9177, Val Loss: 6.0223
Epoch 18, Train Loss: 0.9318, Val Loss: 6.1076
Epoch 19, Train Loss: 0.9099, Val Loss: 6.1578
Epoch 20, Train Loss: 0.9193, Val Loss: 6.1243
Epoch 21, Train Loss: 0.8877, Val Loss: 6.1860
Epoch 22, Train Loss: 

In [22]:
#testing generation

def generate_text(model, seed_text, vocab, tokenizer, next_words=50, temperature=1.0):
   model.eval()
   tokens = tokenizer(seed_text.lower())
   input_ids = torch.tensor([vocab[token] for token in tokens], dtype=torch.long).unsqueeze(0).to(device)


   generated_text = seed_text
   with torch.no_grad():
       for _ in range(next_words):
           output = model(input_ids)
           output = output.squeeze(0)  # Remove the batch dimension
           output = output / temperature
           probabilities = torch.nn.functional.softmax(output, dim=-1)
           next_token_id = torch.multinomial(probabilities, num_samples=1).item()
           next_token = vocab.lookup_token(next_token_id)
           generated_text += ' ' + next_token
           next_input = torch.tensor([[next_token_id]], dtype=torch.long).to(device)
           input_ids = torch.cat((input_ids, next_input), dim=1)


   return generated_text


seed_text = "Nethaji Nirmal is a"
generated_text = generate_text(model, seed_text, vocab, tokenizer, next_words=50, temperature=1.0)
print(generated_text)


Nethaji Nirmal is a contributor on topics like docker . he plays a contributor on topics like pursuing python , and big data science , ” and practical examples . as using python , ” and deployment using python programming , live classes , and practical examples . at guvi , and deploying applications
