## Week 5 : Generative AI for Language Models (Recurrent neural networks, LSTM)
```
- Generative Artificial Intelligence (Spring semester 2025)
- Professor: Muhammad Fahim
- Teaching Assistant: Ahmad Taha
```
<hr>

## Contents
```
Lab Plan
1. Dataset (SenseEval)
2. Data Preprocessing
3. Recurrent neural networks (PyTorch RNN)
4. Long short-term memory (PyTorch LSTM)
5. Self practice task
```

<hr>

## Recap


In [None]:
!pip install -U torchtext==0.15.2

In [1]:
import torch
from torch import nn
import torch.optim as optim
import pandas as pd
import numpy as np

# Preliminaries for processing the text
from torchtext.data.utils import get_tokenizer
from collections import Counter
from torchtext.vocab import vocab
import torchtext
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Basics (Sequences and RNN)


Each rectangle is a vector and arrows represent functions (e.g. matrix multiplication). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state. The core reason recurrent nets are more exciting than feed-forward nets is that they allow us to operate over sequences of vectors.

![](http://karpathy.github.io/assets/rnn/diags.jpeg)

### More details on the RNN cell

![](https://github.com/bentrevett/pytorch-sentiment-analysis/blob/master/assets/sentiment7.png?raw=1)

In [None]:
simple_sequence = torch.Tensor([[0.3,1.9,4.5],[0.4,0.1,0.23],[0.7,0.91,0.43], [0.34,0.01,0.002]])

simple_sequence = simple_sequence.unsqueeze(0)

simple_sequence.shape

The `simple_sequence` variable represents a sequence of length 4, where each element (time step) is a feature vector of length 3. After `unsqueeze(0)` its shape is `(1, 4, 3)`: batch size, sequence length, feature size.

## Basic RNN layer

In [19]:
simple_rnn_layer = nn.RNN(input_size=3, hidden_size=2, num_layers = 2, bias = True, batch_first=True)

In [None]:
simple_rnn_layer.state_dict()

In [20]:
output_all, output_last = simple_rnn_layer(simple_sequence)

In [None]:
output_last

In [None]:
output_all
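
The two values returned by `nn.RNN` are easier to tell apart once their shapes are printed. The sketch below rebuilds the same layer and input; the fixed seed and the name `h_n` are assumptions added here. Note that the second return value holds the final hidden state of every layer, not only the last output.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Same toy input as above: batch of 1, sequence length 4, 3 features per step
seq = torch.tensor([[0.3, 1.9, 4.5],
                    [0.4, 0.1, 0.23],
                    [0.7, 0.91, 0.43],
                    [0.34, 0.01, 0.002]]).unsqueeze(0)

rnn = nn.RNN(input_size=3, hidden_size=2, num_layers=2, bias=True, batch_first=True)
output_all, h_n = rnn(seq)

# output_all: top-layer hidden state at every time step -> (batch, seq_len, hidden_size)
print(output_all.shape)  # torch.Size([1, 4, 2])
# h_n: final hidden state of every layer -> (num_layers, batch, hidden_size)
print(h_n.shape)         # torch.Size([2, 1, 2])

# The last time step of output_all is the final hidden state of the top layer
same = torch.allclose(output_all[:, -1, :], h_n[-1])
print(same)
```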

### Inside the RNN cell

$$a^{(t)} = b + Wh^{(t-1)} + Ux^{(t)}$$
$$h^{(t)} = \tanh(a^{(t)})$$
$$o^{(t)} = c + Vh^{(t)}$$

where the parameters are the bias vectors `b` and `c` along with the weight matrices
`U`, `V` and `W`, respectively for input-to-hidden, hidden-to-output and hidden-to-hidden connections. <br>
Let's see what is inside PyTorch and compare it with the theory.

### Inside the RNN cell (1)

$$h_t = \tanh(W_{ih}x_t + b_{ih} + W_{hh}h_{(t-1)} + b_{hh})$$

where $h_t$ represents the hidden state at time $t$

In [None]:
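# Layer-0 parameters (the layer that reads the input features)
wih = simple_rnn_layer.weight_ih_l0  # shape (hidden_size, input_size) = (2, 3)
whh = simple_rnn_layer.weight_hh_l0  # shape (hidden_size, hidden_size) = (2, 2)

bih = simple_rnn_layer.bias_ih_l0
bhh = simple_rnn_layer.bias_hh_l0

x = simple_sequence[0][0] # The first time step of the first sequence

# Computing the hidden state for time = 1 (the initial hidden state is all zeros).
# torch.dot only accepts 1-D tensors, so the matrix-vector products use @ instead.
h0 = torch.zeros(2)
h1 = torch.tanh(wih @ x + bih + whh @ h0 + bhh)
h1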

In [None]:
x = simple_sequence[0][1]

# Same update as before, now feeding in the previous hidden state h1
h2 = torch.tanh(wih @ x + bih + whh @ h1 + bhh)
h2

**Task** : Compute all the other hidden states

In [None]:
result = []

h_previous = torch.zeros(2)  # initial hidden state, one value per hidden unit

for i in range(simple_sequence.shape[1]):
  # TODO: Compute and print the hidden states using the given example
  pass


result
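
One possible way to approach this task is sketched below. It is not the reference solution: it assumes a single-layer `nn.RNN` (the layer defined earlier uses `num_layers = 2`, so only its first layer matches these layer-0 weights), and the names `manual` and `reference` are introduced here for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

seq = torch.tensor([[0.3, 1.9, 4.5],
                    [0.4, 0.1, 0.23],
                    [0.7, 0.91, 0.43],
                    [0.34, 0.01, 0.002]]).unsqueeze(0)

# Assumption: one layer, so the layer-0 weights fully determine the output
rnn = nn.RNN(input_size=3, hidden_size=2, num_layers=1, batch_first=True)

w_ih, w_hh = rnn.weight_ih_l0, rnn.weight_hh_l0  # shapes (2, 3) and (2, 2)
b_ih, b_hh = rnn.bias_ih_l0, rnn.bias_hh_l0      # shapes (2,) and (2,)

h_prev = torch.zeros(2)  # initial hidden state
manual = []
with torch.no_grad():
    for t in range(seq.shape[1]):
        x_t = seq[0, t]
        # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh)
        h_prev = torch.tanh(w_ih @ x_t + b_ih + w_hh @ h_prev + b_hh)
        manual.append(h_prev)
    manual = torch.stack(manual)  # (seq_len, hidden_size)
    reference, _ = rnn(seq)       # (1, seq_len, hidden_size)

# The hand-computed states should agree with what the layer returns
matches = torch.allclose(manual, reference[0], atol=1e-5)
print(matches)
```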

In [None]:
# !pip install -U torchtext

## 1. Dataset and Problem statement

In [None]:
## Downloading Dataset

!pip install wget
import wget

#Download and unzip dataset
wget.download("http://alt.qcri.org/semeval2016/task6/data/uploads/stancedataset.zip")

!unzip stancedataset.zip

### 1.2 Read dataset to dataframe

In [4]:
import pandas as pd

train_data = pd.read_csv("./StanceDataset/train.csv", header=0, engine='python' ,encoding = "latin-1", usecols=["Tweet","Target"])
test_data = pd.read_csv("./StanceDataset/test.csv", header=0, engine='python' ,encoding = "latin-1", usecols=["Tweet","Target"])

test_data.query("Target != 'Donald Trump'", inplace=True)

labels_keys = {value: i for i, (value, count) in enumerate(train_data.Target.value_counts().items())}

train_data['Target'] = train_data['Target'].apply(lambda x: labels_keys.get(x))
test_data['Target'] = test_data['Target'].apply(lambda x: labels_keys.get(x))

In [None]:
np.unique(list(labels_keys.values()))  # dict views must be converted to a list first

In [None]:
train_data

## 2. Data Preprocessing

[`torchtext`](https://pytorch.org/text/stable/index.html) is a package that consists of data processing utilities and popular datasets for natural language processing.


### 1.3 Clean, tokenize and create Vocab


In [6]:
def clean_ascii(text):
  #remove non-ASCII chars from data
  return ''.join(i for i in text if ord(i) < 128)

train_data['Tweet'] = train_data['Tweet'].apply(clean_ascii)


tokenizer = get_tokenizer('basic_english')
counter = Counter()

for _, row in train_data.iterrows():
  counter.update(tokenizer(row["Tweet"]))


vocablary = vocab(counter, specials=("<pad>","<unk>"), min_freq=1)
# Map out-of-vocabulary tokens to <unk>; -1 is not a valid index for nn.Embedding
vocablary.set_default_index(vocablary["<unk>"])

In [None]:
list(counter.keys())

In [10]:
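from collections import Counter, OrderedDict
pad_token = '<pad>'
unk_token = '<unk>'
# Both special tokens are needed: <pad> is looked up when padding, <unk> for unseen tokens
v2 = vocab(OrderedDict([(token, 1) for token in counter.keys()]), specials=[pad_token, unk_token])
# Out-of-vocabulary tokens map to <unk>; -1 is not a valid index for nn.Embedding
v2.set_default_index(v2[unk_token])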

### 1.4 Padding Data

In [11]:
# Do padding
def data_process(raw_text_iter,max_len=64):
  batch = []
  for item in raw_text_iter:
    res = [v2[token] for token in tokenizer(item)]
    if len(res) > max_len :
      res = res[:max_len]
    if len(res) < max_len :
      res += ([v2["<pad>"]] * (max_len-len(res)))
    batch.append(res)
  pad_data = torch.tensor(batch, dtype=torch.long)
  return pad_data
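
To make the truncate-and-pad step concrete, here is a small self-contained sketch that does not depend on `torchtext`. The vocabulary `toy_vocab`, the helper `encode_and_pad` and the whitespace tokenizer are hypothetical stand-ins, not part of the lab code.

```python
import torch

# Hypothetical toy vocabulary: index 0 is reserved for padding, 1 for unknown tokens
toy_vocab = {"<pad>": 0, "<unk>": 1, "climate": 2, "change": 3, "is": 4, "real": 5}

def encode_and_pad(sentences, max_len=6):
    """Map tokens to indices, then truncate or right-pad every row to max_len."""
    rows = []
    for sentence in sentences:
        # Assumption: lower-cased whitespace split stands in for the real tokenizer
        ids = [toy_vocab.get(tok, toy_vocab["<unk>"]) for tok in sentence.lower().split()]
        ids = ids[:max_len]                                   # truncate long rows
        ids += [toy_vocab["<pad>"]] * (max_len - len(ids))    # pad short rows
        rows.append(ids)
    return torch.tensor(rows, dtype=torch.long)

padded = encode_and_pad(["Climate change is real", "climate hoax"], max_len=6)
print(padded)
# tensor([[2, 3, 4, 5, 0, 0],
#         [2, 1, 0, 0, 0, 0]])
```

Every row has the same length, so the batch can be stacked into one tensor and fed to an embedding layer.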

## Create Dataloaders


<font color='red'>**Task**</font>: Create validation dataloader from train set

In [None]:
max_len = 64
embedding_size = 10
n_classes = len(np.unique(train_data.Target.values))

#Create Dataloader
train_tensor = data_process(train_data.Tweet.values)
tgts_tensor = torch.from_numpy(train_data.Target.values)

# Test
test_tensor = data_process(test_data.Tweet.values)
test_labels = torch.from_numpy(test_data.Target.values)

# Valid
# TODO : Create validation



train_dataset = TensorDataset(train_tensor, tgts_tensor)
test_dataset = TensorDataset(test_tensor, test_labels)

train_iterator = DataLoader(train_dataset, batch_size=32, shuffle=True, pin_memory=True)
test_iterator = DataLoader(test_dataset, batch_size=32, shuffle=True, pin_memory=True)
valid_iterator = None
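
One way to carve a validation set out of the training data is `torch.utils.data.random_split`. The sketch below uses random stand-in tensors with the same layout as `train_tensor` and `tgts_tensor`; the 80/20 ratio, the seed and the variable names are assumptions, not requirements of the task.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in tensors shaped like the padded tweets and their labels (assumed sizes)
features = torch.randint(0, 100, (200, 64))
labels = torch.randint(0, 5, (200,))
full_dataset = TensorDataset(features, labels)

# Assumption: hold out 20% of the training samples for validation
n_valid = int(0.2 * len(full_dataset))
n_train = len(full_dataset) - n_valid
generator = torch.Generator().manual_seed(42)  # reproducible split
train_subset, valid_subset = random_split(full_dataset, [n_train, n_valid], generator=generator)

train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
valid_loader = DataLoader(valid_subset, batch_size=32, shuffle=False)  # no need to shuffle for evaluation
print(len(train_subset), len(valid_subset))  # 160 40
```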

## Simple RNN

In [None]:
import torch.nn as nn

class RNN(nn.Module):
  def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
    super().__init__()
    # TODO: Define the RNN layer and fully connected layer

    self.embedding_layer = nn.Embedding(input_dim, embedding_dim)
    self.rnn_cell = nn.RNN(...)
    self.fc_layer = nn.Linear(...)

  def forward(self, text):
    """
    Forward pass method
    """
    # TODO: Define the forward method
    return None
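
A minimal sketch of how the skeleton above could be completed is shown below. It is one possible design, not the reference solution: the class name `RNNClassifier`, the single RNN layer and the choice to classify from the final hidden state are all assumptions.

```python
import torch
from torch import nn

class RNNClassifier(nn.Module):  # hypothetical name, to avoid clashing with the skeleton
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding_layer = nn.Embedding(input_dim, embedding_dim)
        self.rnn_cell = nn.RNN(embedding_dim, hidden_dim, batch_first=True)
        self.fc_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, text):
        embedded = self.embedding_layer(text)  # (batch, seq_len, embedding_dim)
        _, hidden = self.rnn_cell(embedded)    # hidden: (num_layers, batch, hidden_dim)
        return self.fc_layer(hidden[-1])       # logits: (batch, output_dim)

# Shape check on a dummy batch of 4 padded sequences of length 64
model = RNNClassifier(input_dim=50, embedding_dim=8, hidden_dim=10, output_dim=5)
dummy_batch = torch.randint(0, 50, (4, 64))
logits = model(dummy_batch)
print(logits.shape)  # torch.Size([4, 5])
```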

### Training Model

<font color='red'>**Task**</font>: Define a function for calculating accuracy

In [None]:
def accuracy_calculator(preds, y):
  """Returns accuracy per batch"""
  # TODO implement a function that returns accuracy per batch
  return 0
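
For a multi-class classifier that outputs one logit per class, batch accuracy can be sketched as below. The function name `batch_accuracy` and the example tensors are made up for illustration; it returns a tensor so that `.item()` can be called on the result, as the training loop expects.

```python
import torch

def batch_accuracy(preds, y):
    """Fraction of samples whose highest-scoring class equals the target."""
    predicted = preds.argmax(dim=1)           # (batch,) predicted class indices
    return (predicted == y).float().mean()    # 0-dim tensor in [0, 1]

# Four samples, three classes; the third sample is misclassified
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 0.1, 3.0],
                       [0.5, 1.5, 0.1],
                       [1.0, 0.2, 0.1]])
targets = torch.tensor([0, 2, 0, 0])
acc = batch_accuracy(logits, targets)
print(acc.item())  # 0.75
```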



<font color='red'>**Task**</font>: Define a training loop for training the RNN model<br>

In [None]:
def train(model, dataloader, optimizer, criterion):
  epoch_loss = 0
  epoch_acc = 0

  model.train()

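  for text, labels in dataloader:
    # Each batch from a TensorDataset DataLoader is a (inputs, labels) tuple
    text, labels = text.to(device), labels.to(device)

    # TODO: Define a training loop for training the RNN model
    optimizer.zero_grad()
    predictions = None
    loss = None

    acc = accuracy_calculator(predictions, labels)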

    loss.backward()
    optimizer.step()

    epoch_loss += loss.item()
    epoch_acc += acc.item()


  return epoch_loss / len(dataloader), epoch_acc / len(dataloader)
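
The steps of one training epoch can be sketched on a toy problem as follows. Everything here is a stand-in chosen to keep the example self-contained: a linear model instead of the RNN, synthetic data, `nn.CrossEntropyLoss` as the criterion and the name `train_one_epoch`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Synthetic two-class data: the label is 1 when the features sum to a positive number
x = torch.randn(64, 4)
y = (x.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

toy_model = nn.Linear(4, 2)  # stand-in for the RNN classifier
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

def train_one_epoch(model, dataloader, optimizer, criterion):
    model.train()
    total_loss, total_acc = 0.0, 0.0
    for inputs, labels in dataloader:
        optimizer.zero_grad()              # clear gradients from the previous step
        logits = model(inputs)             # forward pass
        loss = criterion(logits, labels)   # compare logits with integer class labels
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the weights
        total_loss += loss.item()
        total_acc += (logits.argmax(dim=1) == labels).float().mean().item()
    return total_loss / len(dataloader), total_acc / len(dataloader)

first_loss, _ = train_one_epoch(toy_model, loader, optimizer, criterion)
for _ in range(20):
    last_loss, last_acc = train_one_epoch(toy_model, loader, optimizer, criterion)
print(f"first epoch loss {first_loss:.3f}, last epoch loss {last_loss:.3f}, acc {last_acc:.2f}")
```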

<font color='red'>**Task**</font>: Define a function for evaluating the model on test set

In [None]:
def evaluate_model(model, data_batches, criterion):
  eval_loss = 0
  eval_acc = 0

  model.eval()

  with torch.no_grad():
    for batch in data_batches:
      # TODO : Define a function for evaluating the model on test set
      predictions = None
      loss = None

      acc = accuracy_calculator(None, None)
      eval_loss += loss.item()
      eval_acc += acc.item()

  return eval_loss / len(data_batches), eval_acc / len(data_batches)

## Train the model

In [None]:
import torch.optim as optim

input_dim = len(vocablary) #input dimension is the dimension of the one-hot vectors
embedding_dim = 100
hidden_dim = 10 #size of the hidden states
output_dim = n_classes #one logit per target class (the Target column has more than two classes)

# TODO: create the RNN model
model = None

# define loss function and optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-3)
criterion = None

#make model instance and send it to training device
model = model.to(device)
criterion = criterion.to(device)

In [None]:
model

## Training loop

In [None]:
epochs = 5

best_valid_loss = float('inf')

for epoch in range(epochs):
  train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
  valid_loss, valid_acc = evaluate_model(model, valid_iterator, criterion)

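  # Keep a checkpoint of the model with the lowest validation loss so far
  if valid_loss < best_valid_loss:
    best_valid_loss = valid_loss
    torch.save(model.state_dict(), 'best-model.pt')

  # Report metrics every epoch, not only when the model improves
  print(f'Epoch: {epoch+1} , Train [Loss:  {train_loss:.3f}  Acc :{train_acc*100:.2f}], Val.[Loss: {valid_loss:.3f} Acc: {valid_acc*100:.2f}]')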

### Load best model for testing

In [None]:
model.load_state_dict(torch.load('best-model.pt')) #Load the best model
test_loss, test_acc = evaluate_model(model, test_iterator, criterion)
print(f'Accuracy on test data : {test_acc*100:.2f}%')

## How to check for vanishing/exploding gradients ?

* This is a problem that involves weights in earlier layers of the network. Why? (hint : stochastic gradient descent)
* The vanishing gradient problem is a problem that causes major difficulty when training a neural network.
* If the gradient is vanishingly small, then the weight updates during backpropagation are going to be vanishingly small as well.


**How to detect this?**
1. Monitor the weights, e.g. use TensorBoard and log the weights
2. Make checkpoints and manually log the gradient norms to track them:
```
for name, param in model.named_parameters():
    print(name, param.grad.norm())
```
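
The manual logging above can be tried on a toy model, as in the sketch below. The model, the sequence length of 50 and the loss are arbitrary assumptions for illustration. Besides the per-parameter norms, it also records the gradient with respect to each input time step; with a `tanh` RNN, early time steps typically receive much smaller gradients than late ones.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Hypothetical toy setup: a single-layer RNN reading 50 time steps
rnn = nn.RNN(input_size=3, hidden_size=4, batch_first=True)
head = nn.Linear(4, 1)
x = torch.randn(2, 50, 3, requires_grad=True)

_, h_n = rnn(x)
loss = head(h_n[-1]).pow(2).mean()
loss.backward()

# Gradient norm of every parameter, skipping parameters that received no gradient
grad_norms = {}
for name, param in rnn.named_parameters():
    if param.grad is not None:
        grad_norms[name] = param.grad.norm().item()
        print(f"{name}: {grad_norms[name]:.3e}")

# Gradient norm of the loss with respect to the input at each time step
per_step = x.grad.pow(2).sum(dim=(0, 2)).sqrt()
print("first steps:", per_step[:3])
print("last steps: ", per_step[-3:])
```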

# Long short-term memory (LSTM)

![](https://www.researchgate.net/publication/329362532/figure/fig5/AS:699592479870977@1543807253596/Structure-of-the-LSTM-cell-and-equations-that-describe-the-gates-of-an-LSTM-cell.jpg)

## Basics of LSTM

<font color='red'>**Task**</font>: Using the SemEval-2016 Stance Dataset, train an LSTM model for sentiment analysis. The dataset has a column **`sentiment`**; use it as the target. To make the task a binary classification, remove the samples in the dataset with the label **`other`**.

In [None]:
# TODO: Data preprocessing and creation of dataloaders (train, validation & test)

## Define Simple LSTM Model

In [None]:
class SimpleLstm(nn.Module):
  def __init__(self, embedding_dim ,vocab_size , hidden_dim=10, output_dim=1, n_layers=1):
    super().__init__()
    self.hidden_dim = hidden_dim
    self.embedding = nn.Embedding(vocab_size, embedding_dim)
    self.lstm_layer = nn.LSTM(...)

    self.output_layer = nn.Linear(...)

  def forward(self, x):
    batch_size = x.size(0)
    embedded = self.embedding(x)
    outputs, (hidden, cell) = self.lstm_layer(embedded)

    # TODO :
    pred = self.output_layer(...)
    return pred

vocab_size = len(vocablary)
embedding_size = 64
output_dim = None # TODO
model = SimpleLstm(embedding_dim=embedding_size, vocab_size=vocab_size, hidden_dim=10,output_dim=output_dim).to(device).float()
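
A minimal sketch of how the LSTM skeleton could be filled in for the binary task is given below. It is one possible design under stated assumptions: a single output logit trained with `nn.BCEWithLogitsLoss`, classification from the final hidden state, and the hypothetical class name `LstmClassifier`.

```python
import torch
from torch import nn

class LstmClassifier(nn.Module):  # hypothetical name, to avoid clashing with the skeleton
    def __init__(self, vocab_size, embedding_dim, hidden_dim=10, output_dim=1, n_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm_layer = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, batch_first=True)
        self.output_layer = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        embedded = self.embedding(x)                         # (batch, seq_len, embedding_dim)
        outputs, (hidden, cell) = self.lstm_layer(embedded)  # hidden: (n_layers, batch, hidden_dim)
        return self.output_layer(hidden[-1])                 # (batch, output_dim)

lstm_model = LstmClassifier(vocab_size=50, embedding_dim=8)
tokens = torch.randint(0, 50, (4, 64))         # dummy batch of padded sequences
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])   # binary labels as floats

logits = lstm_model(tokens).squeeze(1)         # (batch,)
loss = nn.BCEWithLogitsLoss()(logits, targets) # applies the sigmoid internally
print(logits.shape, loss.item())
```

With this choice, a prediction counts as positive when its logit is greater than zero, which is what a redefined accuracy function would need to check.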

## Training and Evaluating LSTM Model

In [None]:
# TODO : You will need to redefine accuracy_calculator function
def accuracy_calculator(pred, y):
  return None


In [None]:
# Train loop
criterion = None
optimizer = optim.SGD(model.parameters(), lr=1e-3)

criterion = criterion.to(device)
N_EPOCHS = 5

for epoch in range(N_EPOCHS):
  train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
  valid_loss, valid_acc = evaluate_model(model, valid_iterator, criterion)

  print(f'Epoch: {epoch+1} , Train [Loss:  {train_loss:.3f}  Acc :{train_acc*100:.2f}], Val.[Loss: {valid_loss:.3f} Acc: {valid_acc*100:.2f}]')

## <center>Self practice Task</center>

```
1. Using the Senseval data, train RNN and LSTM models to predict the sense tags of ambiguous words
2. Use pretrained embeddings (GloVe or fastText) instead of training the embeddings from scratch
```
**To download Senseval data example**
```
import nltk
nltk.download('senseval')
from nltk.corpus import senseval as se
```
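
For the second item, pretrained vectors can be loaded into an embedding layer with `nn.Embedding.from_pretrained`. In the sketch below the matrix `pretrained_vectors` is random and only stands in for real GloVe or fastText vectors, which would have to be loaded and aligned with the vocabulary indices separately.

```python
import torch
from torch import nn

torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable

# Stand-in for a (vocab_size, embedding_dim) matrix of pretrained vectors.
# Assumption: row i holds the vector of the token with vocabulary index i.
pretrained_vectors = torch.randn(6, 50)

# freeze=True keeps the vectors fixed during training; padding_idx marks the <pad> row
embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True, padding_idx=0)

token_ids = torch.tensor([[2, 3, 0, 0]])   # one padded sequence
embedded = embedding(token_ids)
print(embedded.shape)                      # torch.Size([1, 4, 50])
print(embedding.weight.requires_grad)      # False
```

Setting `freeze=False` instead would let the pretrained vectors be fine-tuned together with the rest of the model.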
