## Text classification with Neural NetWorks (RNN, LSTM) - Aris Tsilifonis mtn2323

Different architecture of Neural Networks were utilised to predict the correct class on the AG News Topic Classification Dataset. 1-layer and 2-layers Bidirectional Neural Networks were tested to measure performance. Also, we experimented with tokenization, pretrained embeddings, hyperparameter optimization to increase performance when possible.

In [1]:
"""

A RNN classifier applied to AG_NEWS dataset

Download dataset:
https://www.kaggle.com/datasets/amananandrai/ag-news-classification-dataset

"""

import torch

from torch.utils.data import DataLoader
from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from torch.utils.data.dataset import random_split
from torch import nn
from torch.nn import functional as F
import pandas as pd
from tqdm import tqdm
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import time

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# HYPER-PARAMETERS
MAX_WORDS = 25
EPOCHS = 15
LEARNING_RATE = 1e-3
BATCH_SIZE = 1024
EMBEDDING_DIM = 100
HIDDEN_DIM = 64


# Load Dataset

In [2]:
######################################################################
# Read dataset files
# ------------------

from google.colab import drive
drive.mount('/content/drive')

# Specify the path of the train file in your Google Drive
train_file_path = '/content/drive/My Drive/train.csv'

# Specify the path of the test file in your Google Drive
test_file_path = '/content/drive/My Drive/test.csv'

# Read the file using appropriate methods (e.g., pandas, numpy, etc.)
# Example for reading a CSV file using pandas:

train_data = pd.read_csv(train_file_path)

test_data = pd.read_csv(test_file_path)


Mounted at /content/drive


In [3]:
#print(test_data)
test_data["text"] = test_data['Title'] +" "+ test_data['Description']
#print(test_data["text"])

 # Data Processing

In [4]:
######################################################################
# Data processing
# -----------------------------


tokenizer = get_tokenizer("basic_english")

# All texts are truncated and padded to MAX_WORDS tokens
def collate_batch(batch):
    Y, X = list(zip(*batch))
    Y = torch.tensor(Y) - 1 # Target names in range [0,1,2,3] instead of [1,2,3,4]
    X = [vocab(tokenizer(text)) for text in X]
    # Bringing all samples to MAX_WORDS length. Shorter texts are padded with <PAD> sequences, longer texts are truncated.
    X = [tokens+([vocab['<PAD>']]* (MAX_WORDS-len(tokens))) if len(tokens)<MAX_WORDS else tokens[:MAX_WORDS] for tokens in X]
    return torch.tensor(X, dtype=torch.int32).to(device), Y.to(device)

train_dataset = [(label,train_data['Title'][i] + ' ' + train_data['Description'][i]) for i,label in enumerate(train_data['Class Index'])]
test_dataset = [(label,test_data['Title'][i] + ' ' + test_data['Description'][i]) for i,label in enumerate(test_data['Class Index'])]

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                              shuffle=True, collate_fn=collate_batch)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE,
                              shuffle=False, collate_fn=collate_batch)

target_classes = ["World", "Sports", "Business", "Sci/Tech"]

def build_vocabulary(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)

# Vocabulary includes all tokens with at least 10 occurrences in the texts
# Special tokens <PAD> and <UNK> are used for padding sequences and unknown words respectively
vocab = build_vocab_from_iterator(build_vocabulary([train_dataset, test_dataset]), min_freq=10, specials=["<PAD>","<UNK>"])
vocab.set_default_index(vocab["<UNK>"])


# Define the model and initialize an instance of it

In [5]:
######################################################################
# Define the model
# ----------------


class model(nn.Module):
    def __init__(self,input_dim, embedding_dim, hidden_dim, output_dim):
        super(model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        logits = self.linear(output[:,-1])  # The last output of RNN is used for sequence classification
        probs = F.softmax(logits, dim=1)
        return probs

######################################################################
# Initiate an instance of the model
# ---------------------------------


classifier = model(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
# Define loss function and opimization algorithm
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam([param for param in classifier.parameters() if param.requires_grad == True],lr=LEARNING_RATE)

# Count model parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier)
print('Total parameters: ',count_parameters(classifier))
print('\n\n')



Model:
model(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2136284





# Define functions to train and evaluate the model

In [6]:
######################################################################
# Define functions to train and evaluate the model
# ------------------------------------------------
from collections import defaultdict
def find_missclassfication(y_pred,y_test,test_text):
  missclf_text = []
  missclf_label = []
  misclf_dict = defaultdict(list)
  counter=0
  for i in range(0,len(y_test)):
    if(y_test[i]!= y_pred[i]):
      #a = int(test_text[i])
      #a+=1
      missclf_text.append(test_text[i])
      #b = int(y_test[i])
      #b+=1
      missclf_label.append(y_test[i]+1)
      misclf_dict[y_test[i]+1].append((y_pred[i]+1, test_text[i]))
      counter+=1

  return missclf_text,missclf_label, counter, misclf_dict

def EvaluateModel(model, loss_fn, val_loader, test_texts):
    model.eval()
    with torch.no_grad():
        Y_actual, Y_preds, losses = [],[],[]
        for X, Y in val_loader:
            preds = model(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())

            Y_actual.append(Y)
            Y_preds.append(preds.argmax(dim=-1))

        Y_actual = torch.cat(Y_actual)
        Y_preds = torch.cat(Y_preds)


        _,_,_,misclf_data = find_missclassfication(Y_preds.detach().cpu().numpy(),Y_actual.detach().cpu().numpy(),test_texts)


    # Returns mean loss, actual labels, predicted labels
    return torch.tensor(losses).mean(), Y_actual.detach().cpu().numpy(), Y_preds.detach().cpu().numpy(), misclf_data

def TrainModel(model, loss_fn, optimizer, train_loader, epochs):
    epoch_durations = []
    for i in range(1, epochs+1):
        start_time = time.time()  # Start time measurement
        model.train()
        print('Epoch:',i)
        losses = []
        for X, Y in tqdm(train_loader):
            Y_preds = model(X)

            loss = loss_fn(Y_preds, Y)
            losses.append(loss.item())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print("Train Loss : {:.3f}".format(torch.tensor(losses).mean()))
        end_time = time.time()  # End timing here
        epoch_duration = end_time - start_time
        print(f"Epoch {i}/{epochs} - Duration: {epoch_duration:.2f} seconds ")
        epoch_durations.append(epoch_duration)
    return epoch_durations


# Train and Evaluate model

In [7]:
epoch_durations1 = TrainModel(classifier, loss_fn, optimizer, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration1 = sum(epoch_durations1) / len(epoch_durations1)
print(f"Average training time per epoch: {average_duration1:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data1 = EvaluateModel(classifier, loss_fn, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 21.82it/s]


Train Loss : 1.309
Epoch 1/15 - Duration: 5.43 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 28.62it/s]


Train Loss : 1.068
Epoch 2/15 - Duration: 4.13 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 26.96it/s]


Train Loss : 0.972
Epoch 3/15 - Duration: 4.38 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 28.56it/s]


Train Loss : 0.929
Epoch 4/15 - Duration: 4.14 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 28.67it/s]


Train Loss : 0.904
Epoch 5/15 - Duration: 4.12 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 27.10it/s]


Train Loss : 0.889
Epoch 6/15 - Duration: 4.36 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 28.48it/s]


Train Loss : 0.877
Epoch 7/15 - Duration: 4.15 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 27.77it/s]


Train Loss : 0.868
Epoch 8/15 - Duration: 4.26 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 27.71it/s]


Train Loss : 0.861
Epoch 9/15 - Duration: 4.26 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 29.20it/s]


Train Loss : 0.857
Epoch 10/15 - Duration: 4.04 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 29.07it/s]


Train Loss : 0.850
Epoch 11/15 - Duration: 4.07 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 27.56it/s]


Train Loss : 0.847
Epoch 12/15 - Duration: 4.29 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 28.98it/s]


Train Loss : 0.842
Epoch 13/15 - Duration: 4.08 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 27.61it/s]


Train Loss : 0.838
Epoch 14/15 - Duration: 4.28 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 28.33it/s]


Train Loss : 0.837
Epoch 15/15 - Duration: 4.17 seconds 
Average training time per epoch: 4.28 seconds

Test Accuracy : 0.867

Classification Report : 
              precision    recall  f1-score   support

       World       0.89      0.86      0.87      1900
      Sports       0.92      0.94      0.93      1900
    Business       0.84      0.81      0.83      1900
    Sci/Tech       0.83      0.85      0.84      1900

    accuracy                           0.87      7600
   macro avg       0.87      0.87      0.87      7600
weighted avg       0.87      0.87      0.87      7600


Confusion Matrix : 
[[1641   75  118   66]
 [  61 1784   10   45]
 [  91   32 1546  231]
 [  59   57  164 1620]]


# Bidirectional 1-layer RecurrentNN

In [8]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiRNN1(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(BiRNN1, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=1)
        self.linear = nn.Linear(2 * hidden_dim, output_dim)  # 2 * hidden_dim because it's bidirectional

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        final_hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        logits = self.linear(final_hidden)
        #logits = self.linear(output[:, -1, :])  # Directly use the last timestep's output
        probs = F.softmax(logits, dim=1)
        return probs

# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier2 = BiRNN1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn2 = nn.CrossEntropyLoss().to(device)
optimizer2 = torch.optim.Adam([param for param in classifier2.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier2)
print('Total parameters: ', count_parameters(classifier2))
print('\n\n')



Model:
BiRNN1(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2147164





# Train and Evaluate model

In [9]:
epoch_durations2 = TrainModel(classifier2, loss_fn2, optimizer2, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration2 = sum(epoch_durations2) / len(epoch_durations2)
print(f"Average training time per epoch: {average_duration2:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds, misclf_data2 = EvaluateModel(classifier2, loss_fn2, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 27.60it/s]


Train Loss : 1.235
Epoch 1/15 - Duration: 4.28 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 26.20it/s]


Train Loss : 0.999
Epoch 2/15 - Duration: 4.51 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 27.81it/s]


Train Loss : 0.927
Epoch 3/15 - Duration: 4.25 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 28.11it/s]


Train Loss : 0.894
Epoch 4/15 - Duration: 4.20 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 27.03it/s]


Train Loss : 0.873
Epoch 5/15 - Duration: 4.37 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 27.99it/s]


Train Loss : 0.859
Epoch 6/15 - Duration: 4.22 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 26.65it/s]


Train Loss : 0.849
Epoch 7/15 - Duration: 4.43 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 26.97it/s]


Train Loss : 0.841
Epoch 8/15 - Duration: 4.38 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 28.00it/s]


Train Loss : 0.835
Epoch 9/15 - Duration: 4.22 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 27.41it/s]


Train Loss : 0.829
Epoch 10/15 - Duration: 4.31 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 26.55it/s]


Train Loss : 0.825
Epoch 11/15 - Duration: 4.45 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 27.92it/s]


Train Loss : 0.821
Epoch 12/15 - Duration: 4.23 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 27.20it/s]


Train Loss : 0.817
Epoch 13/15 - Duration: 4.35 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 26.57it/s]


Train Loss : 0.814
Epoch 14/15 - Duration: 4.45 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 27.59it/s]


Train Loss : 0.812
Epoch 15/15 - Duration: 4.28 seconds 
Average training time per epoch: 4.33 seconds

Test Accuracy : 0.879

Classification Report : 
              precision    recall  f1-score   support

       World       0.91      0.85      0.88      1900
      Sports       0.93      0.95      0.94      1900
    Business       0.85      0.84      0.84      1900
    Sci/Tech       0.83      0.88      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix : 
[[1616   68  120   96]
 [  30 1800   35   35]
 [  63   34 1594  209]
 [  58   34  137 1671]]


# Bidirectional 2-layer Recurrent NN

In [10]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiRNN2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(BiRNN2, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        # Set num_layers=2 for a two-layer network
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=2)
        self.linear = nn.Linear(2 * hidden_dim, output_dim)  # Still 2 * hidden_dim because it's bidirectional

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        # Use the last timestep's output from the last layer
        final_hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        logits = self.linear(final_hidden)
        probs = F.softmax(logits, dim=1)
        return probs

# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier3 = BiRNN2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn3 = nn.CrossEntropyLoss().to(device)
optimizer3 = torch.optim.Adam(classifier3.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier3)
print('Total parameters: ', count_parameters(classifier3))
print('\n\n')



Model:
BiRNN2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2171996





In [11]:
epoch_durations3=TrainModel(classifier3, loss_fn3, optimizer3, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration3 = sum(epoch_durations3) / len(epoch_durations3)
print(f"Average training time per epoch: {average_duration3:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data3 = EvaluateModel(classifier3, loss_fn3, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 25.37it/s]


Train Loss : 1.207
Epoch 1/15 - Duration: 4.66 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 25.86it/s]


Train Loss : 0.996
Epoch 2/15 - Duration: 4.57 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 26.77it/s]


Train Loss : 0.930
Epoch 3/15 - Duration: 4.41 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 26.22it/s]


Train Loss : 0.899
Epoch 4/15 - Duration: 4.51 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 26.64it/s]


Train Loss : 0.881
Epoch 5/15 - Duration: 4.43 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 26.01it/s]


Train Loss : 0.866
Epoch 6/15 - Duration: 4.54 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 24.90it/s]


Train Loss : 0.856
Epoch 7/15 - Duration: 4.74 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 26.40it/s]


Train Loss : 0.848
Epoch 8/15 - Duration: 4.47 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 26.23it/s]


Train Loss : 0.841
Epoch 9/15 - Duration: 4.50 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 26.40it/s]


Train Loss : 0.838
Epoch 10/15 - Duration: 4.48 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 25.09it/s]


Train Loss : 0.833
Epoch 11/15 - Duration: 4.71 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 25.26it/s]


Train Loss : 0.828
Epoch 12/15 - Duration: 4.68 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 25.07it/s]


Train Loss : 0.824
Epoch 13/15 - Duration: 4.71 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 26.19it/s]


Train Loss : 0.822
Epoch 14/15 - Duration: 4.51 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 26.76it/s]


Train Loss : 0.818
Epoch 15/15 - Duration: 4.42 seconds 
Average training time per epoch: 4.56 seconds

Test Accuracy : 0.873

Classification Report : 
              precision    recall  f1-score   support

       World       0.89      0.87      0.88      1900
      Sports       0.94      0.94      0.94      1900
    Business       0.82      0.84      0.83      1900
    Sci/Tech       0.84      0.84      0.84      1900

    accuracy                           0.87      7600
   macro avg       0.87      0.87      0.87      7600
weighted avg       0.87      0.87      0.87      7600


Confusion Matrix : 
[[1651   55  117   77]
 [  44 1780   45   31]
 [  76   28 1604  192]
 [  82   29  186 1603]]


# 1-layer LongShortTermMemory

In [12]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTM1(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(LSTM1, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        # Using LSTM instead of RNN
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # We use the last hidden state for the last time step; hidden[-1] refers to the last layer's last hidden state
        logits = self.linear(hidden[-1])  # hidden[-1] has shape (batch, hidden_dim)
        probs = F.softmax(logits, dim=1)
        return probs

# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier4 = LSTM1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn4 = nn.CrossEntropyLoss().to(device)
optimizer4 = torch.optim.Adam(classifier4.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier4)
print('Total parameters: ', count_parameters(classifier4))
print('\n\n')




Model:
LSTM1(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2168156





In [13]:
epoch_durations4 = TrainModel(classifier4, loss_fn4, optimizer4, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration4 = sum(epoch_durations4) / len(epoch_durations4)
print(f"Average training time per epoch: {average_duration4:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data4 = EvaluateModel(classifier4, loss_fn4, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 26.80it/s]


Train Loss : 1.249
Epoch 1/15 - Duration: 4.41 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 26.12it/s]


Train Loss : 0.977
Epoch 2/15 - Duration: 4.52 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 25.90it/s]


Train Loss : 0.909
Epoch 3/15 - Duration: 4.56 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 26.50it/s]


Train Loss : 0.881
Epoch 4/15 - Duration: 4.46 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 26.35it/s]


Train Loss : 0.863
Epoch 5/15 - Duration: 4.48 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 27.12it/s]


Train Loss : 0.850
Epoch 6/15 - Duration: 4.35 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 25.39it/s]


Train Loss : 0.842
Epoch 7/15 - Duration: 4.65 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 26.96it/s]


Train Loss : 0.835
Epoch 8/15 - Duration: 4.38 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 26.15it/s]


Train Loss : 0.830
Epoch 9/15 - Duration: 4.52 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 25.50it/s]


Train Loss : 0.824
Epoch 10/15 - Duration: 4.63 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 27.35it/s]


Train Loss : 0.821
Epoch 11/15 - Duration: 4.32 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 26.31it/s]


Train Loss : 0.818
Epoch 12/15 - Duration: 4.49 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 26.80it/s]


Train Loss : 0.814
Epoch 13/15 - Duration: 4.41 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 27.14it/s]


Train Loss : 0.812
Epoch 14/15 - Duration: 4.35 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 25.31it/s]


Train Loss : 0.810
Epoch 15/15 - Duration: 4.67 seconds 
Average training time per epoch: 4.48 seconds

Test Accuracy : 0.879

Classification Report : 
              precision    recall  f1-score   support

       World       0.88      0.89      0.88      1900
      Sports       0.94      0.93      0.93      1900
    Business       0.85      0.83      0.84      1900
    Sci/Tech       0.85      0.87      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix : 
[[1685   55   93   67]
 [  63 1759   50   28]
 [  85   31 1581  203]
 [  83   30  135 1652]]


# Bidirectional 1-layer LSTM

In [14]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTM1(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(BiLSTM1, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        # Using a bidirectional LSTM
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True)
        # Adjust linear layer to handle bidirectional output
        self.linear = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # Concatenate the hidden states for both directions
        hidden_forward = hidden[-2,:,:]  # Forward direction of the last layer
        hidden_backward = hidden[-1,:,:]  # Backward direction of the last layer
        hidden_concat = torch.cat((hidden_forward, hidden_backward), dim=1)
        logits = self.linear(hidden_concat)
        probs = F.softmax(logits, dim=1)
        return probs

# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier5 = BiLSTM1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn5 = nn.CrossEntropyLoss().to(device)
optimizer5 = torch.optim.Adam(classifier5.parameters(), lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier5)
print('Total parameters: ', count_parameters(classifier5))
print('\n\n')



Model:
BiLSTM1(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2210908





In [15]:
epoch_durations5 = TrainModel(classifier5, loss_fn5, optimizer5, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration5 = sum(epoch_durations5) / len(epoch_durations5)
print(f"Average training time per epoch: {average_duration5:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data5 = EvaluateModel(classifier5, loss_fn5, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 24.56it/s]


Train Loss : 1.205
Epoch 1/15 - Duration: 4.81 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.54it/s]


Train Loss : 0.935
Epoch 2/15 - Duration: 4.81 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 24.62it/s]


Train Loss : 0.883
Epoch 3/15 - Duration: 4.80 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 24.12it/s]


Train Loss : 0.860
Epoch 4/15 - Duration: 4.90 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 24.53it/s]


Train Loss : 0.845
Epoch 5/15 - Duration: 4.81 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 24.96it/s]


Train Loss : 0.834
Epoch 6/15 - Duration: 4.73 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 24.82it/s]


Train Loss : 0.826
Epoch 7/15 - Duration: 4.76 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 24.87it/s]


Train Loss : 0.820
Epoch 8/15 - Duration: 4.75 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 24.15it/s]


Train Loss : 0.814
Epoch 9/15 - Duration: 4.89 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.50it/s]


Train Loss : 0.809
Epoch 10/15 - Duration: 4.82 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 23.89it/s]


Train Loss : 0.807
Epoch 11/15 - Duration: 4.94 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 24.52it/s]


Train Loss : 0.804
Epoch 12/15 - Duration: 4.82 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.48it/s]


Train Loss : 0.801
Epoch 13/15 - Duration: 4.83 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 25.07it/s]


Train Loss : 0.799
Epoch 14/15 - Duration: 4.71 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.68it/s]


Train Loss : 0.798
Epoch 15/15 - Duration: 4.79 seconds 
Average training time per epoch: 4.81 seconds

Test Accuracy : 0.886

Classification Report : 
              precision    recall  f1-score   support

       World       0.92      0.87      0.89      1900
      Sports       0.92      0.96      0.94      1900
    Business       0.84      0.86      0.85      1900
    Sci/Tech       0.86      0.86      0.86      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600


Confusion Matrix : 
[[1649   80  106   65]
 [  21 1816   36   27]
 [  62   36 1631  171]
 [  55   41  168 1636]]


## Bidirectional 2-layer LSTM

In [16]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTM2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super(BiLSTM2, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        # Using a 2-layer bidirectional LSTM
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=2)
        # Adjust linear layer to handle bidirectional output from the top layer
        self.linear = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # We use the hidden states from the last layer (both forward and backward)
        hidden_forward = hidden[-2,:,:]  # Forward direction of the last layer
        hidden_backward = hidden[-1,:,:]  # Backward direction of the last layer
        hidden_concat = torch.cat((hidden_forward, hidden_backward), dim=1)
        logits = self.linear(hidden_concat)
        probs = F.softmax(logits, dim=1)
        return probs

# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier6 = BiLSTM2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn6 = nn.CrossEntropyLoss().to(device)
optimizer6 = torch.optim.Adam(classifier6.parameters(), lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier6)
print('Total parameters: ', count_parameters(classifier6))
print('\n\n')



Model:
BiLSTM2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2310236





In [17]:
epoch_durations6 = TrainModel(classifier6, loss_fn6, optimizer6, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration6 = sum(epoch_durations6) / len(epoch_durations6)
print(f"Average training time per epoch: {average_duration6:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data6  = EvaluateModel(classifier6, loss_fn6, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))

Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 22.89it/s]


Train Loss : 1.152
Epoch 1/15 - Duration: 5.16 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 23.99it/s]


Train Loss : 0.926
Epoch 2/15 - Duration: 4.92 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 24.34it/s]


Train Loss : 0.881
Epoch 3/15 - Duration: 4.85 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 24.12it/s]


Train Loss : 0.861
Epoch 4/15 - Duration: 4.90 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 23.18it/s]


Train Loss : 0.848
Epoch 5/15 - Duration: 5.10 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 24.54it/s]


Train Loss : 0.840
Epoch 6/15 - Duration: 4.82 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.57it/s]


Train Loss : 0.833
Epoch 7/15 - Duration: 5.01 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 24.08it/s]


Train Loss : 0.828
Epoch 8/15 - Duration: 4.91 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 23.83it/s]


Train Loss : 0.822
Epoch 9/15 - Duration: 4.96 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 22.90it/s]


Train Loss : 0.817
Epoch 10/15 - Duration: 5.16 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 23.84it/s]


Train Loss : 0.813
Epoch 11/15 - Duration: 4.95 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 22.60it/s]


Train Loss : 0.811
Epoch 12/15 - Duration: 5.23 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 23.88it/s]


Train Loss : 0.810
Epoch 13/15 - Duration: 4.95 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 23.57it/s]


Train Loss : 0.806
Epoch 14/15 - Duration: 5.01 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.40it/s]


Train Loss : 0.805
Epoch 15/15 - Duration: 4.84 seconds 
Average training time per epoch: 4.98 seconds

Test Accuracy : 0.883

Classification Report : 
              precision    recall  f1-score   support

       World       0.92      0.86      0.89      1900
      Sports       0.94      0.95      0.94      1900
    Business       0.86      0.83      0.84      1900
    Sci/Tech       0.83      0.89      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix : 
[[1637   65  113   85]
 [  32 1796   39   33]
 [  59   24 1582  235]
 [  57   33  115 1695]]


# Results

|  | 1RNN | 1-BiRNN | 2-BiRNN | 1LSTM | 1-BiLSTM |2-BiLSTM  |
|:---------|:--------:|---------:|---------:|---------:|---------:|---------:|
|  Accuracy   |  0.867   |  0.879  |0.873     |0.879     |0.886     |0.883 |
|  Parameters   |  2136284   |  2147164  |217199    |2168156     |2210908    |2310236     |
|  Time Cost(per epoch)   |  4.28 seconds  |  4.33 seconds   |4.56 seconds     |4.48 seconds     | 4.81 seconds    |4.98 seconds     |

Single-layer models (1RNN, 1LSTM, 1-BiRNN, 1-BiLSTM) generally perform well, but their bidirectional version outperforms them. This happens because bidirectional models process data in both directions and can understand the context better in the input sequence.

Two-layer models (2-BiRNN, 2-BiLSTM) show a mix of results. The two-layer bidirectional LSTM (2-BiLSTM) has slightly lower accuracy than the single-layer version (1-BiLSTM). This could be due to the added complexity constituting the model slightly harder to train effectively within the same number of epochs or training parameters.

Increasing the number of layers or making the model bidirectional usually increases the number of parameters. More parameters can capture more complex patterns but also make the model prone to overfitting without regularizing properly or providing more training data.
The LSTM-architecture models have more parameters than RNN-architecture models due to the additional gates (input, forget, output) in LSTMs.

The (average) time per epoch tends to increase when the number of parameters is increasing. This happens because they are more complex models. Single-layer models are faster per epoch than their two-layer counterparts. Bidirectional models require more time per  epoch than their unidirectional counterparts because they process data in poth directions.  

Adding layers can potentially improve accuracy since the model can learn more abstract  information but it can result to overfitting if they training data does not support the complexity. On this experiment, two-layer models do not consistently outperform the single layer ones bacause the training dataset and training duration might not be appropriate.



# Detect misclassifications

Data from all of the previous models were used

In [18]:
print(misclf_data1)



In [19]:
# Access and print all keys
keys = misclf_data1.keys()
print(keys)
keys = misclf_data2.keys()
print(keys)
keys = misclf_data3.keys()
print(keys)
keys = misclf_data4.keys()
print(keys)
keys = misclf_data5.keys()
print(keys)
keys = misclf_data6.keys()
print(keys)

dict_keys([4, 1, 2, 3])
dict_keys([4, 1, 3, 2])
dict_keys([3, 4, 2, 1])
dict_keys([4, 1, 3, 2])
dict_keys([3, 4, 1, 2])
dict_keys([3, 4, 1, 2])


In [20]:
def find_common_texts_across_models(common_classes, *models):
    common_texts_per_class = {}
    for class_label in common_classes:
        # Initialize list of sets containing texts for each model
        model_texts = [set((text for _, text in model[class_label])) for model in models if class_label in model]

        # Find intersection of all sets (common texts across all models)
        common_texts = set.intersection(*model_texts) if model_texts else set()

        # Store the common texts for this class
        common_texts_per_class[class_label] = list(common_texts)

    return common_texts_per_class

a=common_texts_per_class = find_common_texts_across_models(
    [1,2,3,4],
    misclf_data1,
    misclf_data2,
    misclf_data3,
    misclf_data4,
    misclf_data5,
    misclf_data6
)

print("Texts that were missclassified by all of the models, displayed for each class:")
print("World",len(a[1]))
print("Sports",len(a[2]))
print("Business",len(a[3]))
print("Sci/Tech",len(a[4]))

Texts that were missclassified by all of the models, displayed for each class:
World 113
Sports 16
Business 118
Sci/Tech 73


In [21]:
import random

def display_random_common_text_and_predictions(common_texts_per_class, *models):
    # Filter classes with at least one common text
    classes_with_texts = {class_label: texts for class_label, texts in common_texts_per_class.items() if texts}

    if not classes_with_texts:
        print("No common misclassified texts found across all models.")
        return

    # Randomly select a class that has misclassified texts
    chosen_class = random.choice(list(classes_with_texts.keys()))
    # Randomly select a text from this class
    chosen_text = random.choice(classes_with_texts[chosen_class])

    print(f"Randomly selected misclassified text from Class {chosen_class}: '{chosen_text}'")

    # Display predictions for the selected text from each model
    for idx, model in enumerate(models, 1):
        # Find the prediction for the chosen text in the current model's data for the chosen class
        prediction = [pred for pred, text in model.get(chosen_class, []) if text == chosen_text]
        if prediction:
            print(f"Model {idx} predicted as: {prediction[0]}")
        else:
            print(f"Model {idx} did not misclassify this text.")

    return chosen_class, chosen_text



# Call the function to display a random text and its predictions
misclass, txt = display_random_common_text_and_predictions(a, misclf_data1, misclf_data2, misclf_data3, misclf_data4, misclf_data5, misclf_data6)


Randomly selected misclassified text from Class 3: 'Judge asked to penalize Microsoft over e-mails Burst.com asked a US judge to penalize Microsoft for destroying e-mails it says the world #39;s largest software company should have preserved as evidence in antitrust suits.'
Model 1 predicted as: 4
Model 2 predicted as: 4
Model 3 predicted as: 4
Model 4 predicted as: 4
Model 5 predicted as: 4
Model 6 predicted as: 4


In [22]:
# Finding an exact match
for idx, text in misclf_data1[misclass]:
    if text == txt:
        print(f"Model 1 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")

# Finding an exact match
for idx, text in misclf_data2[misclass]:
    if text == txt:
        print(f"Model 2 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")

# Finding an exact match
for idx, text in misclf_data3[misclass]:
    if text == txt:
        print(f"Model 3 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")


# Finding an exact match
for idx, text in misclf_data4[misclass]:
    if text == txt:
        print(f"Model 4 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")

# Finding an exact match
for idx, text in misclf_data5[misclass]:
    if text == txt:
        print(f"Model 5 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")

# Finding an exact match
for idx, text in misclf_data6[misclass]:
    if text == txt:
        print(f"Model 6 found exact match that was predicted as {idx}: {text} ")
        break
else:
    print("No exact match found.")

Model 1 found exact match that was predicted as 4: Judge asked to penalize Microsoft over e-mails Burst.com asked a US judge to penalize Microsoft for destroying e-mails it says the world #39;s largest software company should have preserved as evidence in antitrust suits. 
Model 2 found exact match that was predicted as 4: Judge asked to penalize Microsoft over e-mails Burst.com asked a US judge to penalize Microsoft for destroying e-mails it says the world #39;s largest software company should have preserved as evidence in antitrust suits. 
Model 3 found exact match that was predicted as 4: Judge asked to penalize Microsoft over e-mails Burst.com asked a US judge to penalize Microsoft for destroying e-mails it says the world #39;s largest software company should have preserved as evidence in antitrust suits. 
Model 4 found exact match that was predicted as 4: Judge asked to penalize Microsoft over e-mails Burst.com asked a US judge to penalize Microsoft for destroying e-mails it says 

In [23]:
from collections import Counter

def find_common_texts_across_models(common_classes, *models):
    common_texts_per_class = {}
    for class_label in common_classes:
        # Initialize list of sets containing texts for each model
        model_texts = [set((text for _, text in model[class_label])) for model in models if class_label in model]

        # Find intersection of all sets (common texts across all models)
        common_texts = set.intersection(*model_texts) if model_texts else set()

        # Store the common texts for this class
        common_texts_per_class[class_label] = list(common_texts)

    return common_texts_per_class

def count_common_misclass_pairs(common_texts, models):
    misclass_freqs = Counter()

    # Iterate through each class and their common misclassified texts
    for cls, texts in common_texts.items():
        if not texts:
            continue  # There are no common misclassified texts for this class

        # Count how many times each text is predicted as each label in each model
        for text in texts:
            for model in models:
                for pred_label, model_text in model.get(cls, []):
                    if model_text == text:
                        # Increment the frequency of this (true_label, predicted_label) pair
                        misclass_freqs[(cls, pred_label)] += 1

    # Sort the pairs by frequency in descending order before returning
    sorted_pairs = sorted(misclass_freqs.items(), key=lambda item: item[1], reverse=True)
    return sorted_pairs

# Assuming class labels are 1, 2, 3, 4 representing World, Sports, Business, Sci/Tech respectively

# Example of using these functions together
common_classes = [1, 2, 3, 4]

models = [misclf_data1, misclf_data2, misclf_data3, misclf_data4, misclf_data5, misclf_data6]  # These should be already defined dictionaries as described

# common_texts = find_common_misclassified_texts(models, common_classes)
# sorted_misclass_pairs = count_common_misclass_pairs(common_texts, models)

class_names = {
    1: "World",
    2: "Sports",
    3: "Business",
    4: "Sci/Tech"
}

# Find common misclassified texts
common_texts = find_common_texts_across_models(common_classes, *models)

# Count and sort misclassification pairs
sorted_misclass_pairs = count_common_misclass_pairs(common_texts, models)


# Print the results, now sorted by frequency
print("Most common Misclassification Pairs:")
for (real_class, predicted_class), frequency in sorted_misclass_pairs:
    real_class_name = class_names[real_class]
    predicted_class_name = class_names[predicted_class]
    print(f"Real Class: {real_class_name}, Predicted Class: {predicted_class_name}, Frequency: {frequency}")


Most common Misclassification Pairs:
Real Class: Business, Predicted Class: Sci/Tech, Frequency: 505
Real Class: World, Predicted Class: Business, Frequency: 326
Real Class: Sci/Tech, Predicted Class: Business, Frequency: 302
Real Class: World, Predicted Class: Sports, Frequency: 196
Real Class: World, Predicted Class: Sci/Tech, Frequency: 156
Real Class: Business, Predicted Class: World, Frequency: 139
Real Class: Sci/Tech, Predicted Class: World, Frequency: 96
Real Class: Business, Predicted Class: Sports, Frequency: 64
Real Class: Sports, Predicted Class: Business, Frequency: 46
Real Class: Sci/Tech, Predicted Class: Sports, Frequency: 40
Real Class: Sports, Predicted Class: Sci/Tech, Frequency: 26
Real Class: Sports, Predicted Class: World, Frequency: 24


# MAX WORDS = 50 experiment on dataset loaders



In [24]:
MAX_WORDS = 50
# All texts are truncated and padded to MAX_WORDS tokens
def collate_batch2(batch):
    Y, X = list(zip(*batch))
    Y = torch.tensor(Y) - 1 # Target names in range [0,1,2,3] instead of [1,2,3,4]
    X = [vocab(tokenizer(text)) for text in X]
    # Bringing all samples to MAX_WORDS length. Shorter texts are padded with <PAD> sequences, longer texts are truncated.
    X = [tokens+([vocab['<PAD>']]* (MAX_WORDS-len(tokens))) if len(tokens)<MAX_WORDS else tokens[:MAX_WORDS] for tokens in X]
    return torch.tensor(X, dtype=torch.int32).to(device), Y.to(device)

train_loader2 = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                              shuffle=True, collate_fn=collate_batch2)
test_loader2 = DataLoader(test_dataset, batch_size=BATCH_SIZE,
                              shuffle=False, collate_fn=collate_batch2)


In [25]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier7 = model(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn7 = nn.CrossEntropyLoss().to(device)
optimizer7 = torch.optim.Adam(classifier7.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier7)
print('Total parameters: ', count_parameters(classifier7))
print('\n\n')

epoch_durations7 = TrainModel(classifier7, loss_fn7, optimizer7, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration7 = sum(epoch_durations7) / len(epoch_durations7)
print(f"Average training time per epoch: {average_duration7:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data7 = EvaluateModel(classifier7, loss_fn7, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
model(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2136284



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 26.55it/s]


Train Loss : 1.373
Epoch 1/15 - Duration: 4.45 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 25.39it/s]


Train Loss : 1.337
Epoch 2/15 - Duration: 4.65 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 26.68it/s]


Train Loss : 1.305
Epoch 3/15 - Duration: 4.43 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 25.61it/s]


Train Loss : 1.319
Epoch 4/15 - Duration: 4.62 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 26.47it/s]


Train Loss : 1.287
Epoch 5/15 - Duration: 4.46 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 25.36it/s]


Train Loss : 1.238
Epoch 6/15 - Duration: 4.66 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 26.01it/s]


Train Loss : 1.207
Epoch 7/15 - Duration: 4.54 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 25.66it/s]


Train Loss : 1.194
Epoch 8/15 - Duration: 4.60 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 25.87it/s]


Train Loss : 1.182
Epoch 9/15 - Duration: 4.57 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 26.81it/s]


Train Loss : 1.162
Epoch 10/15 - Duration: 4.41 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 26.41it/s]


Train Loss : 1.189
Epoch 11/15 - Duration: 4.47 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 26.09it/s]


Train Loss : 1.216
Epoch 12/15 - Duration: 4.53 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 25.98it/s]


Train Loss : 1.169
Epoch 13/15 - Duration: 4.55 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 26.60it/s]


Train Loss : 1.243
Epoch 14/15 - Duration: 4.44 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 26.22it/s]


Train Loss : 1.276
Epoch 15/15 - Duration: 4.51 seconds 
Average training time per epoch: 4.53 seconds

Test Accuracy : 0.439

Classification Report : 
              precision    recall  f1-score   support

       World       0.48      0.82      0.60      1900
      Sports       0.64      0.10      0.17      1900
    Business       0.49      0.24      0.32      1900
    Sci/Tech       0.36      0.59      0.45      1900

    accuracy                           0.44      7600
   macro avg       0.49      0.44      0.39      7600
weighted avg       0.49      0.44      0.39      7600


Confusion Matrix : 
[[1558   26  101  215]
 [ 941  190   83  686]
 [ 315   33  460 1092]
 [ 437   49  288 1126]]


In [26]:
# Usage example assuming vocab2, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier8 = BiRNN1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn8 = nn.CrossEntropyLoss().to(device)
optimizer8 = torch.optim.Adam(classifier8.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier8)
print('Total parameters: ', count_parameters(classifier8))
print('\n\n')

epoch_durations8 = TrainModel(classifier8, loss_fn8, optimizer8, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration8 = sum(epoch_durations8) / len(epoch_durations8)
print(f"Average training time per epoch: {average_duration8:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data8 = EvaluateModel(classifier8, loss_fn8, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN1(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2147164



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 25.55it/s]


Train Loss : 1.262
Epoch 1/15 - Duration: 4.62 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.08it/s]


Train Loss : 1.033
Epoch 2/15 - Duration: 4.91 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 25.30it/s]


Train Loss : 0.952
Epoch 3/15 - Duration: 4.67 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 24.53it/s]


Train Loss : 0.909
Epoch 4/15 - Duration: 4.82 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 24.81it/s]


Train Loss : 0.887
Epoch 5/15 - Duration: 4.76 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 25.59it/s]


Train Loss : 0.872
Epoch 6/15 - Duration: 4.62 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 25.13it/s]


Train Loss : 0.861
Epoch 7/15 - Duration: 4.70 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 25.36it/s]


Train Loss : 0.852
Epoch 8/15 - Duration: 4.66 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 24.23it/s]


Train Loss : 0.846
Epoch 9/15 - Duration: 4.87 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.88it/s]


Train Loss : 0.839
Epoch 10/15 - Duration: 4.75 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 25.66it/s]


Train Loss : 0.836
Epoch 11/15 - Duration: 4.60 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 24.93it/s]


Train Loss : 0.832
Epoch 12/15 - Duration: 4.74 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.44it/s]


Train Loss : 0.828
Epoch 13/15 - Duration: 4.84 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 25.58it/s]


Train Loss : 0.828
Epoch 14/15 - Duration: 4.62 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.92it/s]


Train Loss : 0.824
Epoch 15/15 - Duration: 4.74 seconds 
Average training time per epoch: 4.73 seconds

Test Accuracy : 0.877

Classification Report : 
              precision    recall  f1-score   support

       World       0.89      0.88      0.88      1900
      Sports       0.94      0.94      0.94      1900
    Business       0.84      0.84      0.84      1900
    Sci/Tech       0.85      0.85      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix : 
[[1665   72  108   55]
 [  50 1793   11   46]
 [ 101   19 1589  191]
 [  60   28  194 1618]]


In [27]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier9 = BiRNN2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn9 = nn.CrossEntropyLoss().to(device)
optimizer9 = torch.optim.Adam(classifier9.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier9)
print('Total parameters: ', count_parameters(classifier9))
print('\n\n')

epoch_durations9= TrainModel(classifier9, loss_fn9, optimizer9, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration9 = sum(epoch_durations9) / len(epoch_durations9)
print(f"Average training time per epoch: {average_duration9:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data9 = EvaluateModel(classifier9, loss_fn9, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2171996



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 24.13it/s]


Train Loss : 1.229
Epoch 1/15 - Duration: 4.90 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 23.46it/s]


Train Loss : 1.020
Epoch 2/15 - Duration: 5.03 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 23.43it/s]


Train Loss : 0.955
Epoch 3/15 - Duration: 5.04 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 23.85it/s]


Train Loss : 0.923
Epoch 4/15 - Duration: 4.95 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 23.58it/s]


Train Loss : 0.906
Epoch 5/15 - Duration: 5.01 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 23.81it/s]


Train Loss : 0.893
Epoch 6/15 - Duration: 4.96 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.41it/s]


Train Loss : 0.881
Epoch 7/15 - Duration: 5.05 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 23.39it/s]


Train Loss : 0.868
Epoch 8/15 - Duration: 5.05 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 23.60it/s]


Train Loss : 0.863
Epoch 9/15 - Duration: 5.01 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.02it/s]


Train Loss : 0.867
Epoch 10/15 - Duration: 4.92 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 24.04it/s]


Train Loss : 0.867
Epoch 11/15 - Duration: 4.91 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 23.08it/s]


Train Loss : 0.857
Epoch 12/15 - Duration: 5.12 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 23.93it/s]


Train Loss : 0.857
Epoch 13/15 - Duration: 4.93 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 22.91it/s]


Train Loss : 0.852
Epoch 14/15 - Duration: 5.15 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.30it/s]


Train Loss : 0.857
Epoch 15/15 - Duration: 4.86 seconds 
Average training time per epoch: 4.99 seconds

Test Accuracy : 0.871

Classification Report : 
              precision    recall  f1-score   support

       World       0.88      0.87      0.88      1900
      Sports       0.91      0.94      0.92      1900
    Business       0.85      0.83      0.84      1900
    Sci/Tech       0.84      0.84      0.84      1900

    accuracy                           0.87      7600
   macro avg       0.87      0.87      0.87      7600
weighted avg       0.87      0.87      0.87      7600


Confusion Matrix : 
[[1652  110   76   62]
 [  39 1791   33   37]
 [  86   33 1580  201]
 [  91   43  166 1600]]


In [28]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier10 = LSTM1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn10 = nn.CrossEntropyLoss().to(device)
optimizer10 = torch.optim.Adam(classifier10.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier10)
print('Total parameters: ', count_parameters(classifier10))
print('\n\n')

epoch_durations10 = TrainModel(classifier10, loss_fn10, optimizer10, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration10 = sum(epoch_durations10) / len(epoch_durations10)
print(f"Average training time per epoch: {average_duration10:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data10 = EvaluateModel(classifier10, loss_fn10, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
LSTM1(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2168156



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 23.92it/s]


Train Loss : 1.345
Epoch 1/15 - Duration: 4.94 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.43it/s]


Train Loss : 1.133
Epoch 2/15 - Duration: 4.84 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 24.61it/s]


Train Loss : 1.004
Epoch 3/15 - Duration: 4.80 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 23.32it/s]


Train Loss : 0.938
Epoch 4/15 - Duration: 5.06 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 24.41it/s]


Train Loss : 0.913
Epoch 5/15 - Duration: 4.84 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 23.90it/s]


Train Loss : 0.890
Epoch 6/15 - Duration: 4.94 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 24.40it/s]


Train Loss : 0.876
Epoch 7/15 - Duration: 4.84 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 23.70it/s]


Train Loss : 0.868
Epoch 8/15 - Duration: 4.99 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 23.82it/s]


Train Loss : 0.863
Epoch 9/15 - Duration: 4.96 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.40it/s]


Train Loss : 0.857
Epoch 10/15 - Duration: 4.84 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 23.29it/s]


Train Loss : 0.852
Epoch 11/15 - Duration: 5.07 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 24.23it/s]


Train Loss : 0.852
Epoch 12/15 - Duration: 4.87 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.23it/s]


Train Loss : 0.843
Epoch 13/15 - Duration: 4.87 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 24.13it/s]


Train Loss : 0.839
Epoch 14/15 - Duration: 4.90 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.64it/s]


Train Loss : 0.834
Epoch 15/15 - Duration: 4.79 seconds 
Average training time per epoch: 4.90 seconds

Test Accuracy : 0.883

Classification Report : 
              precision    recall  f1-score   support

       World       0.88      0.89      0.88      1900
      Sports       0.92      0.96      0.94      1900
    Business       0.87      0.82      0.85      1900
    Sci/Tech       0.86      0.86      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600


Confusion Matrix : 
[[1682   81   79   58]
 [  44 1818   14   24]
 [ 104   36 1566  194]
 [  83   34  140 1643]]


In [29]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier11 = BiLSTM1(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fn11 = nn.CrossEntropyLoss().to(device)
optimizer11 = torch.optim.Adam(classifier11.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier11)
print('Total parameters: ', count_parameters(classifier11))
print('\n\n')

epoch_durations11 = TrainModel(classifier11, loss_fn11, optimizer11, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration11 = sum(epoch_durations11) / len(epoch_durations11)
print(f"Average training time per epoch: {average_duration11:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data11 = EvaluateModel(classifier11, loss_fn11, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM1(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2210908



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 21.80it/s]


Train Loss : 1.215
Epoch 1/15 - Duration: 5.42 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 23.82it/s]


Train Loss : 0.933
Epoch 2/15 - Duration: 4.96 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 22.98it/s]


Train Loss : 0.876
Epoch 3/15 - Duration: 5.14 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 23.79it/s]


Train Loss : 0.853
Epoch 4/15 - Duration: 4.97 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 22.73it/s]


Train Loss : 0.840
Epoch 5/15 - Duration: 5.20 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.54it/s]


Train Loss : 0.830
Epoch 6/15 - Duration: 5.24 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 22.32it/s]


Train Loss : 0.823
Epoch 7/15 - Duration: 5.29 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 23.01it/s]


Train Loss : 0.818
Epoch 8/15 - Duration: 5.13 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 23.45it/s]


Train Loss : 0.812
Epoch 9/15 - Duration: 5.04 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 23.10it/s]


Train Loss : 0.809
Epoch 10/15 - Duration: 5.11 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 23.52it/s]


Train Loss : 0.805
Epoch 11/15 - Duration: 5.02 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 21.94it/s]


Train Loss : 0.803
Epoch 12/15 - Duration: 5.38 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 23.20it/s]


Train Loss : 0.801
Epoch 13/15 - Duration: 5.09 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 23.15it/s]


Train Loss : 0.799
Epoch 14/15 - Duration: 5.10 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 21.62it/s]


Train Loss : 0.798
Epoch 15/15 - Duration: 5.46 seconds 
Average training time per epoch: 5.17 seconds

Test Accuracy : 0.898

Classification Report : 
              precision    recall  f1-score   support

       World       0.91      0.89      0.90      1900
      Sports       0.94      0.96      0.95      1900
    Business       0.86      0.87      0.87      1900
    Sci/Tech       0.87      0.87      0.87      1900

    accuracy                           0.90      7600
   macro avg       0.90      0.90      0.90      7600
weighted avg       0.90      0.90      0.90      7600


Confusion Matrix : 
[[1684   66   88   62]
 [  31 1826   27   16]
 [  60   24 1656  160]
 [  68   18  154 1660]]


In [30]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier12 = BiLSTM2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes)).to(device)
loss_fr12 = nn.CrossEntropyLoss().to(device)
optimizer12 = torch.optim.Adam(classifier12.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier12)
print('Total parameters: ', count_parameters(classifier12))
print('\n\n')

epoch_durations12 = TrainModel(classifier12, loss_fr12, optimizer12, train_loader2, EPOCHS)
# Calculate and print the average duration per epoch
average_duration12 = sum(epoch_durations12) / len(epoch_durations12)
print(f"Average training time per epoch: {average_duration12:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_datr12 = EvaluateModel(classifier12, loss_fr12, test_loader2, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2310236



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 19.79it/s]


Train Loss : 1.154
Epoch 1/15 - Duration: 5.97 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 21.17it/s]


Train Loss : 0.916
Epoch 2/15 - Duration: 5.58 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 20.81it/s]


Train Loss : 0.872
Epoch 3/15 - Duration: 5.68 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 19.98it/s]


Train Loss : 0.854
Epoch 4/15 - Duration: 5.91 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 21.15it/s]


Train Loss : 0.840
Epoch 5/15 - Duration: 5.59 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 20.91it/s]


Train Loss : 0.832
Epoch 6/15 - Duration: 5.65 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 21.00it/s]


Train Loss : 0.827
Epoch 7/15 - Duration: 5.63 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 21.94it/s]


Train Loss : 0.822
Epoch 8/15 - Duration: 5.38 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 21.45it/s]


Train Loss : 0.820
Epoch 9/15 - Duration: 5.51 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 21.95it/s]


Train Loss : 0.813
Epoch 10/15 - Duration: 5.38 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 22.15it/s]


Train Loss : 0.810
Epoch 11/15 - Duration: 5.33 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 21.70it/s]


Train Loss : 0.807
Epoch 12/15 - Duration: 5.44 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 21.06it/s]


Train Loss : 0.806
Epoch 13/15 - Duration: 5.61 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 21.92it/s]


Train Loss : 0.802
Epoch 14/15 - Duration: 5.39 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 21.20it/s]


Train Loss : 0.800
Epoch 15/15 - Duration: 5.57 seconds 
Average training time per epoch: 5.57 seconds

Test Accuracy : 0.899

Classification Report : 
              precision    recall  f1-score   support

       World       0.91      0.89      0.90      1900
      Sports       0.95      0.96      0.95      1900
    Business       0.88      0.85      0.87      1900
    Sci/Tech       0.85      0.90      0.87      1900

    accuracy                           0.90      7600
   macro avg       0.90      0.90      0.90      7600
weighted avg       0.90      0.90      0.90      7600


Confusion Matrix : 
[[1694   61   77   68]
 [  28 1817   24   31]
 [  72   11 1622  195]
 [  61   20  117 1702]]


# Results MAX_WORDS=50

|  | 1RNN | 1-BiRNN | 2-BiRNN | 1LSTM | 1-BiLSTM |2-BiLSTM  |
|:---------|:--------:|---------:|---------:|---------:|---------:|---------:|
|  Accuracy   | 0.439     | 0.877  |  0.871  | 0.883  | 0.898   |   0.899|
|  Parameters   | 2136284      | 2147164 |   2171996  | 2168156  | 2210908   |    2310236|
|  Time Cost(per epoch)   |  4.53 seconds|   4.73 seconds|    4.99 seconds|    4.90 seconds|   5.17 seconds|  5.57 seconds|

By setting max_words=50 RNN models perform worse. More specifically, 1RNN's accuracy is reduced dramatically( from 8.867 to 0.439). The other bidirectional versions of RNN have slightly worse accuracy(some tenths only). 1-RNN and 1-BiRNN versions are running several tenths of a second slower (3-6) compared to max_words=25 version. Complexity remains exactly the same( same number of parameters)

On the other hand, increasing the max_words result to higher accuracy of LSTM's versions. Approximately (0.5-2 %). The number of parameters remains steady on this occasion. Finally, they run approximately half a second slower than when max_words were 25.  

In [31]:
print(vocab["lost"])

614


# Initialize embeddings of NNs with GloVe pretrained embeddings glove6B100d

In [32]:
import numpy as np
def load_targeted_glove_embeddings(path, vocabulary):
    embeddings_index = {}
    with open(path, 'r', encoding='utf8') as f:
        for line in f:
            values = line.split()
            word = values[0]
            if word in vocabulary:
                vector = np.asarray(values[1:], dtype='float32')
                embeddings_index[word] = vector
    return embeddings_index

# Specify the path of the train file in your Google Drive
glove_file_path = '/content/drive/My Drive/nlp/nlp2/glove.6B.100d.txt'
embeddings_index = load_targeted_glove_embeddings(glove_file_path, vocab)

# Define dimensions for the embedding
vocab_size = len(vocab)  # Vocab object provides the number of entries directly
EMBEDDING_DIM = 100  # Assuming you're using 100-dimensional GloVe embeddings

# Initialize the embedding matrix with zeros
embedding_matrix = np.zeros((vocab_size, EMBEDDING_DIM))

# Populate the embedding matrix using the .stoi property of the vocab object
for word, idx in vocab.get_stoi().items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[idx] = embedding_vector

# Convert the numpy array to a torch tensor
embedding_matrix = torch.tensor(embedding_matrix).float()

In [33]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Model2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=False):
        super(Model2, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        # If an embedding matrix is provided, initialize the embeddings with it
        if embedding_matrix is not None:
            if not isinstance(embedding_matrix, torch.Tensor):
                # Convert the NumPy array to a tensor if it's not already a tensor
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            self.embedding_layer.weight.data.copy_(embedding_matrix)
            self.embedding_layer.weight.requires_grad = freeze  # Optionally make embeddings non-trainable

        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        logits = self.linear(output[:, -1])  # Using the last output of RNN for sequence classification
        probs = F.softmax(logits, dim=1)
        return probs



In [34]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier13 = Model2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix, False ).to(device)
loss_fn13 = nn.CrossEntropyLoss().to(device)
optimizer13 = torch.optim.Adam(classifier13.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier13)
print('Total parameters: ', count_parameters(classifier13))
print('\n\n')


epoch_durations13 = TrainModel(classifier13, loss_fn13, optimizer13, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration13 = sum(epoch_durations13) / len(epoch_durations13)
print(f"Average training time per epoch: {average_duration13:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data13 = EvaluateModel(classifier13, loss_fn13, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
Model2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  10884



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 26.03it/s]


Train Loss : 1.268
Epoch 1/15 - Duration: 4.54 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 26.29it/s]


Train Loss : 1.254
Epoch 2/15 - Duration: 4.49 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 25.26it/s]


Train Loss : 1.272
Epoch 3/15 - Duration: 4.68 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 25.20it/s]


Train Loss : 1.312
Epoch 4/15 - Duration: 4.69 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 25.57it/s]


Train Loss : 1.281
Epoch 5/15 - Duration: 4.62 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 25.36it/s]


Train Loss : 1.272
Epoch 6/15 - Duration: 4.66 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 25.96it/s]


Train Loss : 1.266
Epoch 7/15 - Duration: 4.55 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 25.25it/s]


Train Loss : 1.254
Epoch 8/15 - Duration: 4.68 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 25.97it/s]


Train Loss : 1.253
Epoch 9/15 - Duration: 4.55 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 25.46it/s]


Train Loss : 1.262
Epoch 10/15 - Duration: 4.64 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 25.22it/s]


Train Loss : 1.202
Epoch 11/15 - Duration: 4.69 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 25.09it/s]


Train Loss : 1.191
Epoch 12/15 - Duration: 4.71 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 25.04it/s]


Train Loss : 1.243
Epoch 13/15 - Duration: 4.72 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 25.84it/s]


Train Loss : 1.240
Epoch 14/15 - Duration: 4.57 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 26.72it/s]


Train Loss : 1.237
Epoch 15/15 - Duration: 4.42 seconds 
Average training time per epoch: 4.61 seconds

Test Accuracy : 0.473

Classification Report : 
              precision    recall  f1-score   support

       World       0.44      0.22      0.29      1900
      Sports       0.36      0.86      0.51      1900
    Business       0.72      0.44      0.54      1900
    Sci/Tech       0.71      0.38      0.49      1900

    accuracy                           0.47      7600
   macro avg       0.56      0.47      0.46      7600
weighted avg       0.56      0.47      0.46      7600


Confusion Matrix : 
[[ 413 1378   89   20]
 [ 225 1636   15   24]
 [ 143  682  829  246]
 [ 163  797  226  714]]


In [35]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiRNN1_2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=False):
        super(BiRNN1_2, self).__init__()
        # Initialize the embedding layer
        if embedding_matrix is not None:
            # Check if the embedding matrix is already a tensor, if not, convert it
            if not isinstance(embedding_matrix, torch.Tensor):
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)  # Optionally make embeddings non-trainable
        else:
            self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        # Initialize RNN with bidirectional set to True
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=1)
        # Initialize the linear layer to handle bidirectional outputs
        self.linear = nn.Linear(2 * hidden_dim, output_dim)  # 2 * hidden_dim because it's bidirectional

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        # Concatenate the hidden states of the last layer of the bidirectional RNN
        final_hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        logits = self.linear(final_hidden)
        probs = F.softmax(logits, dim=1)
        return probs


In [36]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier14 = BiRNN1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix, False).to(device)
loss_fn14 = nn.CrossEntropyLoss().to(device)
optimizer14 = torch.optim.Adam(classifier14.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier14)
print('Total parameters: ', count_parameters(classifier14))
print('\n\n')


epoch_durations14 = TrainModel(classifier14, loss_fn14, optimizer14, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration14 = sum(epoch_durations14) / len(epoch_durations14)
print(f"Average training time per epoch: {average_duration14:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data14 = EvaluateModel(classifier14, loss_fn14, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN1_2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2147164



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 25.46it/s]


Train Loss : 1.045
Epoch 1/15 - Duration: 4.64 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 25.63it/s]


Train Loss : 0.874
Epoch 2/15 - Duration: 4.61 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 25.09it/s]


Train Loss : 0.856
Epoch 3/15 - Duration: 4.71 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 24.07it/s]


Train Loss : 0.843
Epoch 4/15 - Duration: 4.91 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 25.22it/s]


Train Loss : 0.835
Epoch 5/15 - Duration: 4.68 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 24.48it/s]


Train Loss : 0.829
Epoch 6/15 - Duration: 4.83 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 25.40it/s]


Train Loss : 0.822
Epoch 7/15 - Duration: 4.65 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 25.04it/s]


Train Loss : 0.819
Epoch 8/15 - Duration: 4.72 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 26.02it/s]


Train Loss : 0.817
Epoch 9/15 - Duration: 4.54 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 26.27it/s]


Train Loss : 0.813
Epoch 10/15 - Duration: 4.50 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 23.79it/s]


Train Loss : 0.823
Epoch 11/15 - Duration: 4.97 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 24.84it/s]


Train Loss : 0.811
Epoch 12/15 - Duration: 4.75 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.74it/s]


Train Loss : 0.808
Epoch 13/15 - Duration: 4.78 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 25.56it/s]


Train Loss : 0.810
Epoch 14/15 - Duration: 4.62 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.55it/s]


Train Loss : 0.809
Epoch 15/15 - Duration: 4.81 seconds 
Average training time per epoch: 4.71 seconds

Test Accuracy : 0.907

Classification Report : 
              precision    recall  f1-score   support

       World       0.92      0.90      0.91      1900
      Sports       0.94      0.98      0.96      1900
    Business       0.88      0.86      0.87      1900
    Sci/Tech       0.88      0.89      0.89      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600


Confusion Matrix : 
[[1704   66   80   50]
 [   9 1867   16    8]
 [  80   25 1633  162]
 [  55   27  128 1690]]


In [37]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiRNN2_2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=False):
        super(BiRNN2_2, self).__init__()
        # Check if embedding_matrix is provided and is a tensor
        if embedding_matrix is not None:
            if not isinstance(embedding_matrix, torch.Tensor):
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)
        else:
            self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=2)
        self.linear = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        final_hidden = torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)
        logits = self.linear(final_hidden)
        probs = F.softmax(logits, dim=1)
        return probs


In [38]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier15 = BiRNN2_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , False).to(device)
loss_fn15 = nn.CrossEntropyLoss().to(device)
optimizer15 = torch.optim.Adam(classifier15.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier15)
print('Total parameters: ', count_parameters(classifier15))
print('\n\n')


epoch_durations15 = TrainModel(classifier15, loss_fn15, optimizer15, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration15 = sum(epoch_durations15) / len(epoch_durations15)
print(f"Average training time per epoch: {average_duration15:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data15 = EvaluateModel(classifier15, loss_fn15, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN2_2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2171996



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 23.63it/s]


Train Loss : 0.964
Epoch 1/15 - Duration: 5.00 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.08it/s]


Train Loss : 0.859
Epoch 2/15 - Duration: 4.91 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 23.51it/s]


Train Loss : 0.850
Epoch 3/15 - Duration: 5.02 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 23.41it/s]


Train Loss : 0.841
Epoch 4/15 - Duration: 5.05 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 23.78it/s]


Train Loss : 0.839
Epoch 5/15 - Duration: 4.97 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.91it/s]


Train Loss : 0.835
Epoch 6/15 - Duration: 5.16 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 24.01it/s]


Train Loss : 0.829
Epoch 7/15 - Duration: 4.92 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 23.59it/s]


Train Loss : 0.832
Epoch 8/15 - Duration: 5.01 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 24.03it/s]


Train Loss : 0.829
Epoch 9/15 - Duration: 4.92 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 23.82it/s]


Train Loss : 0.861
Epoch 10/15 - Duration: 4.96 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 23.18it/s]


Train Loss : 0.838
Epoch 11/15 - Duration: 5.10 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 23.77it/s]


Train Loss : 0.824
Epoch 12/15 - Duration: 4.97 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 23.73it/s]


Train Loss : 0.818
Epoch 13/15 - Duration: 4.98 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 24.32it/s]


Train Loss : 0.819
Epoch 14/15 - Duration: 4.86 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 23.66it/s]


Train Loss : 0.817
Epoch 15/15 - Duration: 4.99 seconds 
Average training time per epoch: 4.99 seconds

Test Accuracy : 0.907

Classification Report : 
              precision    recall  f1-score   support

       World       0.93      0.89      0.91      1900
      Sports       0.95      0.98      0.96      1900
    Business       0.85      0.90      0.87      1900
    Sci/Tech       0.90      0.86      0.88      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600


Confusion Matrix : 
[[1691   61  106   42]
 [  12 1855   19   14]
 [  61   16 1705  118]
 [  60   17  182 1641]]


In [39]:
class LSTM1_2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=None):
        super(LSTM1_2, self).__init__()
        # Initialize the embedding layer
        if embedding_matrix is not None:
            if not isinstance(embedding_matrix, torch.Tensor):
                # Convert the NumPy array to a tensor if it's not already a tensor
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            # Use the from_pretrained method to utilize the pre-trained embeddings
            self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)
        else:
            self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        # Using LSTM instead of RNN
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # Output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # We use the last hidden state for the last time step; hidden[-1] refers to the last layer's last hidden state
        logits = self.linear(hidden[-1])  # hidden[-1] has shape (batch, hidden_dim)
        probs = F.softmax(logits, dim=1)
        return probs




In [40]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier16 = LSTM1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , False).to(device)
loss_fn16 = nn.CrossEntropyLoss().to(device)
optimizer16 = torch.optim.Adam(classifier16.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier16)
print('Total parameters: ', count_parameters(classifier16))
print('\n\n')


epoch_durations16 = TrainModel(classifier16, loss_fn16, optimizer16, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration16 = sum(epoch_durations16) / len(epoch_durations16)
print(f"Average training time per epoch: {average_duration16:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data16 = EvaluateModel(classifier16, loss_fn16, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
LSTM1_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2168156



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 24.79it/s]


Train Loss : 1.120
Epoch 1/15 - Duration: 4.76 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.70it/s]


Train Loss : 0.941
Epoch 2/15 - Duration: 4.78 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 24.19it/s]


Train Loss : 0.890
Epoch 3/15 - Duration: 4.88 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 24.92it/s]


Train Loss : 0.865
Epoch 4/15 - Duration: 4.74 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 23.25it/s]


Train Loss : 0.854
Epoch 5/15 - Duration: 5.08 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 24.70it/s]


Train Loss : 0.847
Epoch 6/15 - Duration: 4.78 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.18it/s]


Train Loss : 0.845
Epoch 7/15 - Duration: 5.10 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 24.74it/s]


Train Loss : 0.834
Epoch 8/15 - Duration: 4.78 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 24.54it/s]


Train Loss : 0.829
Epoch 9/15 - Duration: 4.81 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.12it/s]


Train Loss : 0.829
Epoch 10/15 - Duration: 4.90 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 24.90it/s]


Train Loss : 0.826
Epoch 11/15 - Duration: 4.74 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 23.72it/s]


Train Loss : 0.845
Epoch 12/15 - Duration: 4.98 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.51it/s]


Train Loss : 0.825
Epoch 13/15 - Duration: 4.82 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 24.53it/s]


Train Loss : 0.823
Epoch 14/15 - Duration: 4.82 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.29it/s]


Train Loss : 0.820
Epoch 15/15 - Duration: 4.86 seconds 
Average training time per epoch: 4.86 seconds

Test Accuracy : 0.909

Classification Report : 
              precision    recall  f1-score   support

       World       0.94      0.88      0.91      1900
      Sports       0.95      0.98      0.96      1900
    Business       0.85      0.90      0.87      1900
    Sci/Tech       0.90      0.88      0.89      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600


Confusion Matrix : 
[[1667   69  117   47]
 [   5 1861   26    8]
 [  44   17 1703  136]
 [  53   11  158 1678]]


In [41]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTM1_2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=None):
        super(BiLSTM1_2, self).__init__()
        # Initialize the embedding layer
        if embedding_matrix is not None:
            if not isinstance(embedding_matrix, torch.Tensor):
                # Convert the NumPy array to a tensor if it's not already a tensor
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            # Use the from_pretrained method to utilize the pre-trained embeddings
            self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)
        else:
            self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        # Using a bidirectional LSTM
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True)
        # Adjust linear layer to handle bidirectional output
        self.linear = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # Concatenate the hidden states for both directions
        hidden_forward = hidden[-2,:,:]  # Forward direction of the last layer
        hidden_backward = hidden[-1,:,:]  # Backward direction of the last layer
        hidden_concat = torch.cat((hidden_forward, hidden_backward), dim=1)
        logits = self.linear(hidden_concat)
        probs = F.softmax(logits, dim=1)
        return probs


In [42]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier17 = BiLSTM1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , False).to(device)
loss_fn17 = nn.CrossEntropyLoss().to(device)
optimizer17 = torch.optim.Adam(classifier17.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier17)
print('Total parameters: ', count_parameters(classifier17))
print('\n\n')


epoch_durations17 = TrainModel(classifier17, loss_fn17, optimizer17, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration17 = sum(epoch_durations17) / len(epoch_durations17)
print(f"Average training time per epoch: {average_duration17:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data17 = EvaluateModel(classifier17, loss_fn17, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM1_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2210908



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 22.78it/s]


Train Loss : 1.009
Epoch 1/15 - Duration: 5.19 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 23.23it/s]


Train Loss : 0.843
Epoch 2/15 - Duration: 5.09 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 23.54it/s]


Train Loss : 0.830
Epoch 3/15 - Duration: 5.02 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 23.15it/s]


Train Loss : 0.822
Epoch 4/15 - Duration: 5.10 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 23.86it/s]


Train Loss : 0.816
Epoch 5/15 - Duration: 4.95 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.66it/s]


Train Loss : 0.811
Epoch 6/15 - Duration: 5.21 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.42it/s]


Train Loss : 0.807
Epoch 7/15 - Duration: 5.04 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 22.82it/s]


Train Loss : 0.804
Epoch 8/15 - Duration: 5.18 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 23.35it/s]


Train Loss : 0.800
Epoch 9/15 - Duration: 5.06 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 23.87it/s]


Train Loss : 0.798
Epoch 10/15 - Duration: 4.95 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 23.52it/s]


Train Loss : 0.795
Epoch 11/15 - Duration: 5.02 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 23.64it/s]


Train Loss : 0.794
Epoch 12/15 - Duration: 5.00 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 23.03it/s]


Train Loss : 0.793
Epoch 13/15 - Duration: 5.13 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 23.37it/s]


Train Loss : 0.792
Epoch 14/15 - Duration: 5.05 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 23.52it/s]


Train Loss : 0.792
Epoch 15/15 - Duration: 5.02 seconds 
Average training time per epoch: 5.07 seconds

Test Accuracy : 0.918

Classification Report : 
              precision    recall  f1-score   support

       World       0.95      0.90      0.92      1900
      Sports       0.96      0.99      0.97      1900
    Business       0.86      0.91      0.88      1900
    Sci/Tech       0.91      0.88      0.89      1900

    accuracy                           0.92      7600
   macro avg       0.92      0.92      0.92      7600
weighted avg       0.92      0.92      0.92      7600


Confusion Matrix : 
[[1707   58   87   48]
 [   8 1875   14    3]
 [  48    9 1721  122]
 [  43   10  170 1677]]


In [43]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTM2_2(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim, embedding_matrix=None, freeze=False):
        super(BiLSTM2_2, self).__init__()
        # Initialize the embedding layer with pre-trained embeddings if provided
        if embedding_matrix is not None:
            if not isinstance(embedding_matrix, torch.Tensor):
                embedding_matrix = torch.tensor(embedding_matrix, dtype=torch.float)
            self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)
        else:
            self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)

        # Using a 2-layer bidirectional LSTM
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True, bidirectional=True, num_layers=2)
        # Adjust the linear layer to handle bidirectional output from the top layer
        self.linear = nn.Linear(2 * hidden_dim, output_dim)

    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        # output from LSTM; lstm_out is the last output state
        lstm_out, (hidden, cell) = self.lstm(embeddings)
        # We use the hidden states from the last layer (both forward and backward)
        hidden_forward = hidden[-2,:,:]  # Forward direction of the last layer
        hidden_backward = hidden[-1,:,:]  # Backward direction of the last layer
        hidden_concat = torch.cat((hidden_forward, hidden_backward), dim=1)
        logits = self.linear(hidden_concat)
        probs = F.softmax(logits, dim=1)
        return probs


In [44]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier18 = BiLSTM2_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , False).to(device)
loss_fn18 = nn.CrossEntropyLoss().to(device)
optimizer18 = torch.optim.Adam(classifier18.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier18)
print('Total parameters: ', count_parameters(classifier18))
print('\n\n')


epoch_durations18 = TrainModel(classifier18, loss_fn18, optimizer18, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration18 = sum(epoch_durations18) / len(epoch_durations18)
print(f"Average training time per epoch: {average_duration18:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data18 = EvaluateModel(classifier18, loss_fn18, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM2_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  2310236



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 21.96it/s]


Train Loss : 0.980
Epoch 1/15 - Duration: 5.38 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 21.79it/s]


Train Loss : 0.852
Epoch 2/15 - Duration: 5.42 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 22.19it/s]


Train Loss : 0.838
Epoch 3/15 - Duration: 5.32 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 22.10it/s]


Train Loss : 0.831
Epoch 4/15 - Duration: 5.35 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 21.18it/s]


Train Loss : 0.824
Epoch 5/15 - Duration: 5.58 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.31it/s]


Train Loss : 0.821
Epoch 6/15 - Duration: 5.29 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 22.11it/s]


Train Loss : 0.817
Epoch 7/15 - Duration: 5.34 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 22.11it/s]


Train Loss : 0.818
Epoch 8/15 - Duration: 5.34 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 21.41it/s]


Train Loss : 0.813
Epoch 9/15 - Duration: 5.52 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 22.11it/s]


Train Loss : 0.810
Epoch 10/15 - Duration: 5.34 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 22.38it/s]


Train Loss : 0.808
Epoch 11/15 - Duration: 5.28 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 22.04it/s]


Train Loss : 0.805
Epoch 12/15 - Duration: 5.36 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 22.50it/s]


Train Loss : 0.803
Epoch 13/15 - Duration: 5.25 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 21.28it/s]


Train Loss : 0.810
Epoch 14/15 - Duration: 5.55 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 22.23it/s]


Train Loss : 0.801
Epoch 15/15 - Duration: 5.31 seconds 
Average training time per epoch: 5.38 seconds

Test Accuracy : 0.915

Classification Report : 
              precision    recall  f1-score   support

       World       0.94      0.90      0.92      1900
      Sports       0.95      0.98      0.97      1900
    Business       0.87      0.89      0.88      1900
    Sci/Tech       0.90      0.88      0.89      1900

    accuracy                           0.92      7600
   macro avg       0.92      0.92      0.92      7600
weighted avg       0.92      0.92      0.92      7600


Confusion Matrix : 
[[1708   63   83   46]
 [   7 1870   18    5]
 [  60   17 1697  126]
 [  46   15  158 1681]]


# Results GloVe pretrained embeddings initialization - No Freeze  (GloVe6B100d)

|  | 1RNN | 1-BiRNN | 2-BiRNN | 1LSTM | 1-BiLSTM |2-BiLSTM  |
|:---------|:--------:|---------:|---------:|---------:|---------:|---------:|
|  Accuracy  |0.473       |     0.907       |      0.907   |    0.909        |       0.918     |       0.915          |
|  Parameters   | 10884       |     2147164     |      2171996 |       2168156   |        2210908  |        2310236       |
|  Time Cost(per epoch)   |4.61 seconds|    4.71 seconds |  4.99 seconds|    4.86 seconds |    5.07 seconds |    5.38 seconds      |





1RNN's complexity is reduced significantly on this occasion but so does the accuracy. This model is deemed highly inefficient and more time consuming than the model without the GloVe embeddings. By increasing the number of layers and providing bidirectionality, the accuracy increases drastically, reaching 90.7%. The same number of parameters are used as before(without the embeddings, max_words=25). They are slightly more time consuming, since they are 0.2-0.3 seconds slower.
LSTM architecture further increaces the efficiency of the model. All of LSTM's versions are 2-3% more accurate than their previous versions(without Glove). The peak accuracy was achieved by 1-layer bidirectional lstm, which 91.8%. Their number of parameters is exactly the same as before. Unfortunately, they are 0.2-0.5 seconds slower per epoch than their previous counterparts. In conslusion, GloVe embeddings  boost performance majorly.

# Initialize embeddings of NNs with GloVe pretrained embeddings glove6B100d - With Weights Freezed

In [45]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier19 = Model2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn19 = nn.CrossEntropyLoss().to(device)
optimizer19 = torch.optim.Adam(classifier19.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier19)
print('Total parameters: ', count_parameters(classifier19))
print('\n\n')


epoch_durations19 = TrainModel(classifier19, loss_fn19, optimizer19, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration19 = sum(epoch_durations19) / len(epoch_durations19)
print(f"Average training time per epoch: {average_duration19:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data19 = EvaluateModel(classifier19, loss_fn19, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
Model2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  2136284



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 25.84it/s]


Train Loss : 1.302
Epoch 1/15 - Duration: 4.57 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 26.30it/s]


Train Loss : 1.202
Epoch 2/15 - Duration: 4.49 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 26.79it/s]


Train Loss : 1.273
Epoch 3/15 - Duration: 4.41 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 26.48it/s]


Train Loss : 1.294
Epoch 4/15 - Duration: 4.46 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 26.98it/s]


Train Loss : 1.286
Epoch 5/15 - Duration: 4.38 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 25.53it/s]


Train Loss : 1.252
Epoch 6/15 - Duration: 4.63 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 26.41it/s]


Train Loss : 1.140
Epoch 7/15 - Duration: 4.47 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 26.76it/s]


Train Loss : 1.132
Epoch 8/15 - Duration: 4.42 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 26.30it/s]


Train Loss : 1.286
Epoch 9/15 - Duration: 4.49 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 25.68it/s]


Train Loss : 1.302
Epoch 10/15 - Duration: 4.60 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 26.74it/s]


Train Loss : 1.331
Epoch 11/15 - Duration: 4.42 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 26.06it/s]


Train Loss : 1.301
Epoch 12/15 - Duration: 4.53 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 25.71it/s]


Train Loss : 1.289
Epoch 13/15 - Duration: 4.60 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 26.46it/s]


Train Loss : 1.286
Epoch 14/15 - Duration: 4.46 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 26.77it/s]


Train Loss : 1.284
Epoch 15/15 - Duration: 4.41 seconds 
Average training time per epoch: 4.49 seconds

Test Accuracy : 0.420

Classification Report : 
              precision    recall  f1-score   support

       World       0.59      0.70      0.64      1900
      Sports       0.35      0.92      0.50      1900
    Business       0.00      0.00      0.00      1900
    Sci/Tech       0.36      0.06      0.10      1900

    accuracy                           0.42      7600
   macro avg       0.33      0.42      0.31      7600
weighted avg       0.33      0.42      0.31      7600


Confusion Matrix : 
[[1329  523    0   48]
 [ 101 1756    0   43]
 [ 528 1269    0  103]
 [ 283 1508    0  109]]


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [46]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier20 = BiRNN1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn20 = nn.CrossEntropyLoss().to(device)
optimizer20 = torch.optim.Adam(classifier20.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier20)
print('Total parameters: ', count_parameters(classifier20))
print('\n\n')


epoch_durations20 = TrainModel(classifier20, loss_fn20, optimizer20, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration20 = sum(epoch_durations20) / len(epoch_durations20)
print(f"Average training time per epoch: {average_duration20:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data20 = EvaluateModel(classifier20, loss_fn20, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN1_2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  21764



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 26.23it/s]


Train Loss : 1.033
Epoch 1/15 - Duration: 4.50 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 24.88it/s]


Train Loss : 0.933
Epoch 2/15 - Duration: 4.75 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 26.24it/s]


Train Loss : 0.902
Epoch 3/15 - Duration: 4.50 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 25.80it/s]


Train Loss : 0.893
Epoch 4/15 - Duration: 4.58 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 24.73it/s]


Train Loss : 0.877
Epoch 5/15 - Duration: 4.78 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 25.55it/s]


Train Loss : 0.888
Epoch 6/15 - Duration: 4.62 seconds 
Epoch: 7


100%|██████████| 118/118 [00:04<00:00, 25.54it/s]


Train Loss : 0.891
Epoch 7/15 - Duration: 4.63 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 25.44it/s]


Train Loss : 0.916
Epoch 8/15 - Duration: 4.64 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 25.48it/s]


Train Loss : 0.974
Epoch 9/15 - Duration: 4.64 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.63it/s]


Train Loss : 0.901
Epoch 10/15 - Duration: 4.80 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 25.56it/s]


Train Loss : 0.894
Epoch 11/15 - Duration: 4.62 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 25.65it/s]


Train Loss : 0.894
Epoch 12/15 - Duration: 4.61 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 25.77it/s]


Train Loss : 0.939
Epoch 13/15 - Duration: 4.58 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 25.03it/s]


Train Loss : 0.885
Epoch 14/15 - Duration: 4.72 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 25.66it/s]


Train Loss : 0.886
Epoch 15/15 - Duration: 4.60 seconds 
Average training time per epoch: 4.64 seconds

Test Accuracy : 0.860

Classification Report : 
              precision    recall  f1-score   support

       World       0.90      0.85      0.87      1900
      Sports       0.92      0.95      0.94      1900
    Business       0.83      0.79      0.81      1900
    Sci/Tech       0.79      0.86      0.82      1900

    accuracy                           0.86      7600
   macro avg       0.86      0.86      0.86      7600
weighted avg       0.86      0.86      0.86      7600


Confusion Matrix : 
[[1606   67  127  100]
 [  37 1799   27   37]
 [  85   22 1502  291]
 [  62   58  148 1632]]


In [47]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier21 = BiRNN2_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn21 = nn.CrossEntropyLoss().to(device)
optimizer21 = torch.optim.Adam(classifier21.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier21)
print('Total parameters: ', count_parameters(classifier21))
print('\n\n')


epoch_durations21 = TrainModel(classifier21, loss_fn21, optimizer21, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration21 = sum(epoch_durations21) / len(epoch_durations21)
print(f"Average training time per epoch: {average_duration21:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data21 = EvaluateModel(classifier21, loss_fn21, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN2_2(
  (embedding_layer): Embedding(21254, 100)
  (rnn): RNN(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  46596



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 23.15it/s]


Train Loss : 0.996
Epoch 1/15 - Duration: 5.10 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 23.40it/s]


Train Loss : 0.912
Epoch 2/15 - Duration: 5.05 seconds 
Epoch: 3


100%|██████████| 118/118 [00:04<00:00, 24.03it/s]


Train Loss : 0.887
Epoch 3/15 - Duration: 4.92 seconds 
Epoch: 4


100%|██████████| 118/118 [00:04<00:00, 23.88it/s]


Train Loss : 0.952
Epoch 4/15 - Duration: 4.94 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 22.96it/s]


Train Loss : 1.087
Epoch 5/15 - Duration: 5.14 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 23.28it/s]


Train Loss : 0.932
Epoch 6/15 - Duration: 5.08 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.59it/s]


Train Loss : 0.921
Epoch 7/15 - Duration: 5.01 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 23.70it/s]


Train Loss : 0.928
Epoch 8/15 - Duration: 4.98 seconds 
Epoch: 9


100%|██████████| 118/118 [00:04<00:00, 23.72it/s]


Train Loss : 0.890
Epoch 9/15 - Duration: 4.98 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 22.80it/s]


Train Loss : 0.932
Epoch 10/15 - Duration: 5.18 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 24.02it/s]


Train Loss : 0.883
Epoch 11/15 - Duration: 4.92 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 23.51it/s]


Train Loss : 0.929
Epoch 12/15 - Duration: 5.02 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 23.44it/s]


Train Loss : 0.894
Epoch 13/15 - Duration: 5.04 seconds 
Epoch: 14


100%|██████████| 118/118 [00:04<00:00, 23.65it/s]


Train Loss : 0.899
Epoch 14/15 - Duration: 4.99 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.06it/s]


Train Loss : 0.908
Epoch 15/15 - Duration: 4.91 seconds 
Average training time per epoch: 5.02 seconds

Test Accuracy : 0.595

Classification Report : 
              precision    recall  f1-score   support

       World       0.00      0.00      0.00      1900
      Sports       0.96      0.65      0.77      1900
    Business       0.42      0.80      0.55      1900
    Sci/Tech       0.66      0.93      0.77      1900

    accuracy                           0.59      7600
   macro avg       0.51      0.59      0.52      7600
weighted avg       0.51      0.59      0.52      7600


Confusion Matrix : 
[[   0   37 1503  360]
 [   0 1238  495  167]
 [   0   17 1521  362]
 [   0    4  133 1763]]


  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [48]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier22 = LSTM1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn22 = nn.CrossEntropyLoss().to(device)
optimizer22 = torch.optim.Adam(classifier22.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier22)
print('Total parameters: ', count_parameters(classifier22))
print('\n\n')


epoch_durations22 = TrainModel(classifier22, loss_fn22, optimizer22, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration22 = sum(epoch_durations22) / len(epoch_durations22)
print(f"Average training time per epoch: {average_duration22:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data22 = EvaluateModel(classifier22, loss_fn22, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
LSTM1_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=4, bias=True)
)
Total parameters:  42756



Epoch: 1


100%|██████████| 118/118 [00:04<00:00, 24.24it/s]


Train Loss : 1.164
Epoch 1/15 - Duration: 4.88 seconds 
Epoch: 2


100%|██████████| 118/118 [00:04<00:00, 23.70it/s]


Train Loss : 0.948
Epoch 2/15 - Duration: 4.98 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 23.49it/s]


Train Loss : 0.926
Epoch 3/15 - Duration: 5.03 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 23.33it/s]


Train Loss : 0.876
Epoch 4/15 - Duration: 5.06 seconds 
Epoch: 5


100%|██████████| 118/118 [00:04<00:00, 24.14it/s]


Train Loss : 0.865
Epoch 5/15 - Duration: 4.90 seconds 
Epoch: 6


100%|██████████| 118/118 [00:04<00:00, 23.80it/s]


Train Loss : 0.865
Epoch 6/15 - Duration: 4.96 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.39it/s]


Train Loss : 0.858
Epoch 7/15 - Duration: 5.05 seconds 
Epoch: 8


100%|██████████| 118/118 [00:04<00:00, 23.93it/s]


Train Loss : 0.855
Epoch 8/15 - Duration: 4.94 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 23.20it/s]


Train Loss : 0.852
Epoch 9/15 - Duration: 5.09 seconds 
Epoch: 10


100%|██████████| 118/118 [00:04<00:00, 24.65it/s]


Train Loss : 0.849
Epoch 10/15 - Duration: 4.79 seconds 
Epoch: 11


100%|██████████| 118/118 [00:04<00:00, 23.63it/s]


Train Loss : 0.846
Epoch 11/15 - Duration: 5.00 seconds 
Epoch: 12


100%|██████████| 118/118 [00:04<00:00, 24.56it/s]


Train Loss : 0.845
Epoch 12/15 - Duration: 4.81 seconds 
Epoch: 13


100%|██████████| 118/118 [00:04<00:00, 24.60it/s]


Train Loss : 0.842
Epoch 13/15 - Duration: 4.80 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 23.57it/s]


Train Loss : 0.842
Epoch 14/15 - Duration: 5.01 seconds 
Epoch: 15


100%|██████████| 118/118 [00:04<00:00, 24.52it/s]


Train Loss : 0.841
Epoch 15/15 - Duration: 4.82 seconds 
Average training time per epoch: 4.94 seconds

Test Accuracy : 0.898

Classification Report : 
              precision    recall  f1-score   support

       World       0.90      0.89      0.90      1900
      Sports       0.96      0.96      0.96      1900
    Business       0.88      0.84      0.86      1900
    Sci/Tech       0.86      0.90      0.88      1900

    accuracy                           0.90      7600
   macro avg       0.90      0.90      0.90      7600
weighted avg       0.90      0.90      0.90      7600


Confusion Matrix : 
[[1692   58   97   53]
 [  25 1832   22   21]
 [  76   11 1601  212]
 [  79   12  107 1702]]


In [49]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier23 = BiLSTM1_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn23 = nn.CrossEntropyLoss().to(device)
optimizer23 = torch.optim.Adam(classifier23.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier23)
print('Total parameters: ', count_parameters(classifier23))
print('\n\n')


epoch_durations23 = TrainModel(classifier23, loss_fn23, optimizer23, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration23 = sum(epoch_durations23) / len(epoch_durations23)
print(f"Average training time per epoch: {average_duration23:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data23 = EvaluateModel(classifier23, loss_fn23, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM1_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  85508



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 22.74it/s]


Train Loss : 1.010
Epoch 1/15 - Duration: 5.19 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 23.00it/s]


Train Loss : 0.859
Epoch 2/15 - Duration: 5.13 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 22.40it/s]


Train Loss : 0.851
Epoch 3/15 - Duration: 5.27 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 22.44it/s]


Train Loss : 0.846
Epoch 4/15 - Duration: 5.26 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 23.04it/s]


Train Loss : 0.843
Epoch 5/15 - Duration: 5.13 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.16it/s]


Train Loss : 0.840
Epoch 6/15 - Duration: 5.33 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 23.12it/s]


Train Loss : 0.838
Epoch 7/15 - Duration: 5.11 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 22.74it/s]


Train Loss : 0.837
Epoch 8/15 - Duration: 5.20 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 23.22it/s]


Train Loss : 0.833
Epoch 9/15 - Duration: 5.09 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 22.87it/s]


Train Loss : 0.832
Epoch 10/15 - Duration: 5.16 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 22.42it/s]


Train Loss : 0.829
Epoch 11/15 - Duration: 5.27 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 23.09it/s]


Train Loss : 0.828
Epoch 12/15 - Duration: 5.12 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 23.07it/s]


Train Loss : 0.826
Epoch 13/15 - Duration: 5.12 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 23.12it/s]


Train Loss : 0.824
Epoch 14/15 - Duration: 5.11 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 22.08it/s]


Train Loss : 0.823
Epoch 15/15 - Duration: 5.35 seconds 
Average training time per epoch: 5.19 seconds

Test Accuracy : 0.911

Classification Report : 
              precision    recall  f1-score   support

       World       0.93      0.90      0.91      1900
      Sports       0.95      0.99      0.97      1900
    Business       0.88      0.87      0.87      1900
    Sci/Tech       0.88      0.89      0.89      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600


Confusion Matrix : 
[[1708   60   78   54]
 [   8 1872   12    8]
 [  72   20 1649  159]
 [  51   15  143 1691]]


In [50]:
# Usage example assuming vocab, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier24 = BiLSTM2_2(len(vocab), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes), embedding_matrix , True).to(device)
loss_fn24 = nn.CrossEntropyLoss().to(device)
optimizer24 = torch.optim.Adam(classifier24.parameters(), lr=LEARNING_RATE)


print('\nModel:')
print(classifier24)
print('Total parameters: ', count_parameters(classifier24))
print('\n\n')


epoch_durations24 = TrainModel(classifier24, loss_fn24, optimizer24, train_loader, EPOCHS)
# Calculate and print the average duration per epoch
average_duration24 = sum(epoch_durations24) / len(epoch_durations24)
print(f"Average training time per epoch: {average_duration24:.2f} seconds")


######################################################################
# Evaluate the model with test dataset
# ------------------------------------

_, Y_actual, Y_preds, misclf_data24 = EvaluateModel(classifier24, loss_fn24, test_loader, test_data["text"])

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM2_2(
  (embedding_layer): Embedding(21254, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=4, bias=True)
)
Total parameters:  184836



Epoch: 1


100%|██████████| 118/118 [00:05<00:00, 21.90it/s]


Train Loss : 0.990
Epoch 1/15 - Duration: 5.39 seconds 
Epoch: 2


100%|██████████| 118/118 [00:05<00:00, 21.49it/s]


Train Loss : 0.863
Epoch 2/15 - Duration: 5.50 seconds 
Epoch: 3


100%|██████████| 118/118 [00:05<00:00, 22.22it/s]


Train Loss : 0.854
Epoch 3/15 - Duration: 5.32 seconds 
Epoch: 4


100%|██████████| 118/118 [00:05<00:00, 22.15it/s]


Train Loss : 0.852
Epoch 4/15 - Duration: 5.33 seconds 
Epoch: 5


100%|██████████| 118/118 [00:05<00:00, 22.08it/s]


Train Loss : 0.848
Epoch 5/15 - Duration: 5.35 seconds 
Epoch: 6


100%|██████████| 118/118 [00:05<00:00, 22.24it/s]


Train Loss : 0.845
Epoch 6/15 - Duration: 5.31 seconds 
Epoch: 7


100%|██████████| 118/118 [00:05<00:00, 21.32it/s]


Train Loss : 0.844
Epoch 7/15 - Duration: 5.54 seconds 
Epoch: 8


100%|██████████| 118/118 [00:05<00:00, 21.82it/s]


Train Loss : 0.846
Epoch 8/15 - Duration: 5.42 seconds 
Epoch: 9


100%|██████████| 118/118 [00:05<00:00, 21.76it/s]


Train Loss : 0.839
Epoch 9/15 - Duration: 5.43 seconds 
Epoch: 10


100%|██████████| 118/118 [00:05<00:00, 21.77it/s]


Train Loss : 0.836
Epoch 10/15 - Duration: 5.43 seconds 
Epoch: 11


100%|██████████| 118/118 [00:05<00:00, 21.60it/s]


Train Loss : 0.837
Epoch 11/15 - Duration: 5.47 seconds 
Epoch: 12


100%|██████████| 118/118 [00:05<00:00, 21.64it/s]


Train Loss : 0.832
Epoch 12/15 - Duration: 5.46 seconds 
Epoch: 13


100%|██████████| 118/118 [00:05<00:00, 21.71it/s]


Train Loss : 0.831
Epoch 13/15 - Duration: 5.44 seconds 
Epoch: 14


100%|██████████| 118/118 [00:05<00:00, 21.83it/s]


Train Loss : 0.831
Epoch 14/15 - Duration: 5.41 seconds 
Epoch: 15


100%|██████████| 118/118 [00:05<00:00, 22.27it/s]


Train Loss : 0.829
Epoch 15/15 - Duration: 5.30 seconds 
Average training time per epoch: 5.41 seconds

Test Accuracy : 0.907

Classification Report : 
              precision    recall  f1-score   support

       World       0.93      0.89      0.91      1900
      Sports       0.95      0.98      0.97      1900
    Business       0.87      0.87      0.87      1900
    Sci/Tech       0.88      0.89      0.88      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600


Confusion Matrix : 
[[1688   65   85   62]
 [   6 1870   12   12]
 [  68   22 1647  163]
 [  56   17  141 1686]]


# Results GloVe pretrained embeddings initialization - With Freeze  (GloVe6B100d)

|  | 1RNN | 1-BiRNN | 2-BiRNN | 1LSTM | 1-BiLSTM |2-BiLSTM  |
|:---------|:--------:|---------:|---------:|---------:|---------:|---------:|
|  Accuracy  |0.420     |     0.860     |  0.595  |    0.898    |   0.911    |       0.907     |
|  Parameters   |2136284   |     21764   |      46596 |       42756  |        85508 |        184836     |
|  Time Cost(per epoch)   | 4.49 seconds|    4.64 seconds |   5.02 seconds|    4.94 seconds|    5.19 seconds |    5.41 seconds      |


The only change that was made in the code was in this line :             self.embedding_layer = nn.Embedding.from_pretrained(embedding_matrix, freeze=freeze)
Freeze was set to 'True' from 'False'

Except for 1RNN, all of the other models have around 1.8-2 million less parameters. No major differences were realized regarding time per epoch. They differ 0.1 seconds approximately per epoch(total epochs=15). 1RNN and 2-BiRNN have decreased accuracy compared to the previous experiment (around 30-40% less), which means that the model does not learn the relationships at the training or overfits.

On the other hand, LSTM is more robust but not as efficient as before. Again the 1-BiLSTM is the best out of them since it has the greatest accuracy but it is approximately 1% less than without freezing. In conclusion, freezing is not recommended in this situation since the model does not learn the weights of the parameters properly. Gradient update can help the NN capture the context of the dataset better.


# IMDB Movie Review dataset

This dataset contains 50k movie reviews that are classified into 2 categories. Several adjustments were made at the code to be able to accept 2 labels instead of 4.

In [62]:
from sklearn.model_selection import train_test_split
# Specify the path of the train file in your Google Drive
imdb_file_path = '/content/drive/My Drive/nlp/nlp2/IMDB Dataset.csv'
# Read the file using appropriate methods (e.g., pandas, numpy, etc.)
# Example for reading a CSV file using pandas:

train_data_imdb = pd.read_csv(imdb_file_path)

train_data_imdb['sentiment'] = train_data_imdb['sentiment'].map({'negative': 1, 'positive': 2})

train_data_imdb.head()

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,2
1,A wonderful little production. <br /><br />The...,2
2,I thought this was a wonderful way to spend ti...,2
3,Basically there's a family where a little boy ...,1
4,"Petter Mattei's ""Love in the Time of Money"" is...",2


Split the dataset to 80% for training and 20% for test.

In [81]:
X_train_imdb, X_test_imdb, y_train_imdb, y_test_imdb = train_test_split(train_data_imdb["review"], train_data_imdb["sentiment"],test_size = 0.2, random_state=42)
train_data_imdb2 = pd.concat([X_train_imdb , y_train_imdb], axis = 1)
test_data_imdb2 = pd.concat([X_test_imdb , y_test_imdb], axis = 1)
train_data_imdb2.head()

Unnamed: 0,review,sentiment
39087,That's what I kept asking myself during the ma...,1
30893,I did not watch the entire movie. I could not ...,1
45278,A touching love story reminiscent of In the M...,2
16398,This latter-day Fulci schlocker is a totally a...,1
13653,"First of all, I firmly believe that Norwegian ...",1


In [82]:
print(train_data_imdb2['sentiment'])
print(train_data_imdb2['review'])



39087    1
30893    1
45278    2
16398    1
13653    1
        ..
11284    2
44732    2
38158    1
860      2
15795    2
Name: sentiment, Length: 40000, dtype: int64
39087    That's what I kept asking myself during the ma...
30893    I did not watch the entire movie. I could not ...
45278    A touching love story reminiscent of In the M...
16398    This latter-day Fulci schlocker is a totally a...
13653    First of all, I firmly believe that Norwegian ...
                               ...                        
11284    `Shadow Magic' recaptures the joy and amazemen...
44732    I found this movie to be quite enjoyable and f...
38158    Avoid this one! It is a terrible movie. So wha...
860      This production was quite a surprise for me. I...
15795    This is a decent movie. Although little bit sh...
Name: review, Length: 40000, dtype: object


MAX_WORDS were adjusted to 150 in order to improve accuracy. This happens probably because the reviews are longer on this occasion

In [91]:
######################################################################
# Data processing
# -----------------------------
tokenizer = get_tokenizer("basic_english")
MAX_WORDS=150
# All texts are truncated and padded to MAX_WORDS tokens
def collate_batch(batch):
    Y, X = list(zip(*batch))
    Y = torch.tensor(Y) - 1 # Target names in range [0,1,2,3] instead of [1,2,3,4]
    X = [vocab(tokenizer(text)) for text in X]
    # Bringing all samples to MAX_WORDS length. Shorter texts are padded with <PAD> sequences, longer texts are truncated.
    X = [tokens+([vocab['<PAD>']]* (MAX_WORDS-len(tokens))) if len(tokens)<MAX_WORDS else tokens[:MAX_WORDS] for tokens in X]
    return torch.tensor(X, dtype=torch.int32).to(device), Y.to(device)

# train_dataset_imdb3 = [(label,train_data_imdb2['review'][i]) for i,label in enumerate(train_data_imdb2['sentiment'])]
# test_dataset_imdb3 = [(label,test_data_imdb2['review'][i]) for i,label in enumerate(test_data_imdb2['sentiment'])]

train_dataset_imdb3 = [(row.sentiment, row.review) for row in train_data_imdb2.itertuples()]
test_dataset_imdb3 = [(row.sentiment, row.review) for row in test_data_imdb2.itertuples()]

train_loader_imdb = DataLoader(train_dataset_imdb3, batch_size=BATCH_SIZE,
                              shuffle=True, collate_fn=collate_batch)
test_loader_imdb = DataLoader(test_dataset_imdb3, batch_size=BATCH_SIZE,
                              shuffle=False, collate_fn=collate_batch)

target_classes2 = ["Negative", "Positive"]

def build_vocabulary(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)

# Vocabulary includes all tokens with at least 10 occurrences in the texts
# Special tokens <PAD> and <UNK> are used for padding sequences and unknown words respectively
vocab3 = build_vocab_from_iterator(build_vocabulary([train_dataset_imdb3, test_dataset_imdb3]), min_freq=10, specials=["<PAD>","<UNK>"])
vocab3.set_default_index(vocab3["<UNK>"])

In [92]:
def EvaluateModel2(model, loss_fn, val_loader):
    model.eval()
    with torch.no_grad():
        Y_actual, Y_preds, losses = [],[],[]
        for X, Y in val_loader:
            preds = model(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())

            Y_actual.append(Y)
            Y_preds.append(preds.argmax(dim=-1))

        Y_actual = torch.cat(Y_actual)
        Y_preds = torch.cat(Y_preds)

    # Returns mean loss, actual labels, predicted labels
    return torch.tensor(losses).mean(), Y_actual.detach().cpu().numpy(), Y_preds.detach().cpu().numpy()

In [93]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier25 = model(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn25 = nn.CrossEntropyLoss().to(device)
optimizer25 = torch.optim.Adam([param for param in classifier25.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier25)
print('Total parameters: ', count_parameters(classifier25))
print('\n\n')

epoch_durations25 = TrainModel(classifier25, loss_fn25, optimizer25, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration25 = sum(epoch_durations25) / len(epoch_durations25)
print(f"Average training time per epoch: {average_duration25:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier25, loss_fn25, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
model(
  (embedding_layer): Embedding(29065, 100)
  (rnn): RNN(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=2, bias=True)
)
Total parameters:  2917254



Epoch: 1


100%|██████████| 40/40 [00:06<00:00,  5.81it/s]


Train Loss : 0.694
Epoch 1/15 - Duration: 6.89 seconds 
Epoch: 2


100%|██████████| 40/40 [00:06<00:00,  5.97it/s]


Train Loss : 0.690
Epoch 2/15 - Duration: 6.70 seconds 
Epoch: 3


100%|██████████| 40/40 [00:06<00:00,  6.00it/s]


Train Loss : 0.685
Epoch 3/15 - Duration: 6.67 seconds 
Epoch: 4


100%|██████████| 40/40 [00:06<00:00,  6.06it/s]


Train Loss : 0.683
Epoch 4/15 - Duration: 6.60 seconds 
Epoch: 5


100%|██████████| 40/40 [00:06<00:00,  6.04it/s]


Train Loss : 0.677
Epoch 5/15 - Duration: 6.62 seconds 
Epoch: 6


100%|██████████| 40/40 [00:06<00:00,  5.99it/s]


Train Loss : 0.674
Epoch 6/15 - Duration: 6.69 seconds 
Epoch: 7


100%|██████████| 40/40 [00:06<00:00,  5.90it/s]


Train Loss : 0.675
Epoch 7/15 - Duration: 6.78 seconds 
Epoch: 8


100%|██████████| 40/40 [00:06<00:00,  5.96it/s]


Train Loss : 0.659
Epoch 8/15 - Duration: 6.71 seconds 
Epoch: 9


100%|██████████| 40/40 [00:06<00:00,  5.94it/s]


Train Loss : 0.630
Epoch 9/15 - Duration: 6.74 seconds 
Epoch: 10


100%|██████████| 40/40 [00:06<00:00,  5.97it/s]


Train Loss : 0.705
Epoch 10/15 - Duration: 6.70 seconds 
Epoch: 11


100%|██████████| 40/40 [00:06<00:00,  6.07it/s]


Train Loss : 0.684
Epoch 11/15 - Duration: 6.60 seconds 
Epoch: 12


100%|██████████| 40/40 [00:06<00:00,  6.08it/s]


Train Loss : 0.680
Epoch 12/15 - Duration: 6.59 seconds 
Epoch: 13


100%|██████████| 40/40 [00:06<00:00,  5.99it/s]


Train Loss : 0.677
Epoch 13/15 - Duration: 6.68 seconds 
Epoch: 14


100%|██████████| 40/40 [00:06<00:00,  6.03it/s]


Train Loss : 0.671
Epoch 14/15 - Duration: 6.64 seconds 
Epoch: 15


100%|██████████| 40/40 [00:06<00:00,  5.87it/s]


Train Loss : 0.666
Epoch 15/15 - Duration: 6.82 seconds 
Average training time per epoch: 6.70 seconds

Test Accuracy : 0.568

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.58      0.49      0.53      4961
    Positive       0.56      0.65      0.60      5039

    accuracy                           0.57     10000
   macro avg       0.57      0.57      0.56     10000
weighted avg       0.57      0.57      0.56     10000


Confusion Matrix : 
[[2409 2552]
 [1769 3270]]


In [94]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier26 = BiRNN1(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn26 = nn.CrossEntropyLoss().to(device)
optimizer26 = torch.optim.Adam([param for param in classifier26.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier26)
print('Total parameters: ', count_parameters(classifier26))
print('\n\n')

epoch_durations26 = TrainModel(classifier26, loss_fn26, optimizer26, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration26 = sum(epoch_durations26) / len(epoch_durations26)
print(f"Average training time per epoch: {average_duration26:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier26, loss_fn26, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN1(
  (embedding_layer): Embedding(29065, 100)
  (rnn): RNN(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=2, bias=True)
)
Total parameters:  2928006



Epoch: 1


100%|██████████| 40/40 [00:06<00:00,  5.84it/s]


Train Loss : 0.690
Epoch 1/15 - Duration: 6.86 seconds 
Epoch: 2


100%|██████████| 40/40 [00:06<00:00,  5.76it/s]


Train Loss : 0.667
Epoch 2/15 - Duration: 6.95 seconds 
Epoch: 3


100%|██████████| 40/40 [00:07<00:00,  5.52it/s]


Train Loss : 0.660
Epoch 3/15 - Duration: 7.25 seconds 
Epoch: 4


100%|██████████| 40/40 [00:06<00:00,  5.82it/s]


Train Loss : 0.640
Epoch 4/15 - Duration: 6.88 seconds 
Epoch: 5


100%|██████████| 40/40 [00:07<00:00,  5.68it/s]


Train Loss : 0.619
Epoch 5/15 - Duration: 7.05 seconds 
Epoch: 6


100%|██████████| 40/40 [00:06<00:00,  5.87it/s]


Train Loss : 0.647
Epoch 6/15 - Duration: 6.82 seconds 
Epoch: 7


100%|██████████| 40/40 [00:06<00:00,  5.85it/s]


Train Loss : 0.610
Epoch 7/15 - Duration: 6.84 seconds 
Epoch: 8


100%|██████████| 40/40 [00:06<00:00,  5.88it/s]


Train Loss : 0.578
Epoch 8/15 - Duration: 6.81 seconds 
Epoch: 9


100%|██████████| 40/40 [00:06<00:00,  5.86it/s]


Train Loss : 0.556
Epoch 9/15 - Duration: 6.83 seconds 
Epoch: 10


100%|██████████| 40/40 [00:07<00:00,  5.63it/s]


Train Loss : 0.549
Epoch 10/15 - Duration: 7.10 seconds 
Epoch: 11


100%|██████████| 40/40 [00:06<00:00,  5.72it/s]


Train Loss : 0.576
Epoch 11/15 - Duration: 7.00 seconds 
Epoch: 12


100%|██████████| 40/40 [00:07<00:00,  5.67it/s]


Train Loss : 0.551
Epoch 12/15 - Duration: 7.06 seconds 
Epoch: 13


100%|██████████| 40/40 [00:06<00:00,  5.75it/s]


Train Loss : 0.591
Epoch 13/15 - Duration: 6.96 seconds 
Epoch: 14


100%|██████████| 40/40 [00:06<00:00,  5.78it/s]


Train Loss : 0.571
Epoch 14/15 - Duration: 6.93 seconds 
Epoch: 15


100%|██████████| 40/40 [00:06<00:00,  5.75it/s]


Train Loss : 0.556
Epoch 15/15 - Duration: 6.96 seconds 
Average training time per epoch: 6.95 seconds

Test Accuracy : 0.665

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.70      0.57      0.63      4961
    Positive       0.64      0.76      0.70      5039

    accuracy                           0.67     10000
   macro avg       0.67      0.66      0.66     10000
weighted avg       0.67      0.67      0.66     10000


Confusion Matrix : 
[[2839 2122]
 [1225 3814]]


In [95]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier27 = BiRNN2(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn27 = nn.CrossEntropyLoss().to(device)
optimizer27 = torch.optim.Adam([param for param in classifier27.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier27)
print('Total parameters: ', count_parameters(classifier27))
print('\n\n')

epoch_durations27 = TrainModel(classifier27, loss_fn27, optimizer27, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration27 = sum(epoch_durations27) / len(epoch_durations27)
print(f"Average training time per epoch: {average_duration27:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier27, loss_fn27, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiRNN2(
  (embedding_layer): Embedding(29065, 100)
  (rnn): RNN(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=2, bias=True)
)
Total parameters:  2952838



Epoch: 1


100%|██████████| 40/40 [00:06<00:00,  5.78it/s]


Train Loss : 0.688
Epoch 1/15 - Duration: 6.93 seconds 
Epoch: 2


100%|██████████| 40/40 [00:07<00:00,  5.68it/s]


Train Loss : 0.663
Epoch 2/15 - Duration: 7.05 seconds 
Epoch: 3


100%|██████████| 40/40 [00:07<00:00,  5.68it/s]


Train Loss : 0.667
Epoch 3/15 - Duration: 7.05 seconds 
Epoch: 4


100%|██████████| 40/40 [00:07<00:00,  5.70it/s]


Train Loss : 0.630
Epoch 4/15 - Duration: 7.02 seconds 
Epoch: 5


100%|██████████| 40/40 [00:07<00:00,  5.65it/s]


Train Loss : 0.675
Epoch 5/15 - Duration: 7.09 seconds 
Epoch: 6


100%|██████████| 40/40 [00:06<00:00,  5.73it/s]


Train Loss : 0.652
Epoch 6/15 - Duration: 6.99 seconds 
Epoch: 7


100%|██████████| 40/40 [00:07<00:00,  5.70it/s]


Train Loss : 0.617
Epoch 7/15 - Duration: 7.02 seconds 
Epoch: 8


100%|██████████| 40/40 [00:07<00:00,  5.55it/s]


Train Loss : 0.606
Epoch 8/15 - Duration: 7.21 seconds 
Epoch: 9


100%|██████████| 40/40 [00:06<00:00,  5.74it/s]


Train Loss : 0.587
Epoch 9/15 - Duration: 6.97 seconds 
Epoch: 10


100%|██████████| 40/40 [00:07<00:00,  5.31it/s]


Train Loss : 0.585
Epoch 10/15 - Duration: 7.54 seconds 
Epoch: 11


100%|██████████| 40/40 [00:07<00:00,  5.54it/s]


Train Loss : 0.559
Epoch 11/15 - Duration: 7.23 seconds 
Epoch: 12


100%|██████████| 40/40 [00:07<00:00,  5.61it/s]


Train Loss : 0.573
Epoch 12/15 - Duration: 7.13 seconds 
Epoch: 13


100%|██████████| 40/40 [00:07<00:00,  5.62it/s]


Train Loss : 0.568
Epoch 13/15 - Duration: 7.13 seconds 
Epoch: 14


100%|██████████| 40/40 [00:07<00:00,  5.51it/s]


Train Loss : 0.609
Epoch 14/15 - Duration: 7.26 seconds 
Epoch: 15


100%|██████████| 40/40 [00:07<00:00,  5.54it/s]


Train Loss : 0.573
Epoch 15/15 - Duration: 7.23 seconds 
Average training time per epoch: 7.12 seconds

Test Accuracy : 0.697

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.71      0.67      0.69      4961
    Positive       0.69      0.72      0.71      5039

    accuracy                           0.70     10000
   macro avg       0.70      0.70      0.70     10000
weighted avg       0.70      0.70      0.70     10000


Confusion Matrix : 
[[3313 1648]
 [1386 3653]]


In [96]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier28 = LSTM1(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn28 = nn.CrossEntropyLoss().to(device)
optimizer28 = torch.optim.Adam([param for param in classifier28.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier28)
print('Total parameters: ', count_parameters(classifier28))
print('\n\n')

epoch_durations28 = TrainModel(classifier28, loss_fn28, optimizer28, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration28 = sum(epoch_durations28) / len(epoch_durations28)
print(f"Average training time per epoch: {average_duration28:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier28, loss_fn28, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
LSTM1(
  (embedding_layer): Embedding(29065, 100)
  (lstm): LSTM(100, 64, batch_first=True)
  (linear): Linear(in_features=64, out_features=2, bias=True)
)
Total parameters:  2949126



Epoch: 1


100%|██████████| 40/40 [00:06<00:00,  5.72it/s]


Train Loss : 0.692
Epoch 1/15 - Duration: 6.99 seconds 
Epoch: 2


100%|██████████| 40/40 [00:07<00:00,  5.66it/s]


Train Loss : 0.689
Epoch 2/15 - Duration: 7.07 seconds 
Epoch: 3


100%|██████████| 40/40 [00:07<00:00,  5.68it/s]


Train Loss : 0.678
Epoch 3/15 - Duration: 7.05 seconds 
Epoch: 4


100%|██████████| 40/40 [00:07<00:00,  5.63it/s]


Train Loss : 0.651
Epoch 4/15 - Duration: 7.11 seconds 
Epoch: 5


100%|██████████| 40/40 [00:07<00:00,  5.44it/s]


Train Loss : 0.640
Epoch 5/15 - Duration: 7.36 seconds 
Epoch: 6


100%|██████████| 40/40 [00:07<00:00,  5.61it/s]


Train Loss : 0.627
Epoch 6/15 - Duration: 7.13 seconds 
Epoch: 7


100%|██████████| 40/40 [00:07<00:00,  5.59it/s]


Train Loss : 0.597
Epoch 7/15 - Duration: 7.16 seconds 
Epoch: 8


100%|██████████| 40/40 [00:07<00:00,  5.63it/s]


Train Loss : 0.586
Epoch 8/15 - Duration: 7.11 seconds 
Epoch: 9


100%|██████████| 40/40 [00:07<00:00,  5.70it/s]


Train Loss : 0.584
Epoch 9/15 - Duration: 7.02 seconds 
Epoch: 10


100%|██████████| 40/40 [00:07<00:00,  5.59it/s]


Train Loss : 0.554
Epoch 10/15 - Duration: 7.16 seconds 
Epoch: 11


100%|██████████| 40/40 [00:06<00:00,  5.72it/s]


Train Loss : 0.537
Epoch 11/15 - Duration: 7.00 seconds 
Epoch: 12


100%|██████████| 40/40 [00:07<00:00,  5.58it/s]


Train Loss : 0.529
Epoch 12/15 - Duration: 7.17 seconds 
Epoch: 13


100%|██████████| 40/40 [00:07<00:00,  5.71it/s]


Train Loss : 0.515
Epoch 13/15 - Duration: 7.01 seconds 
Epoch: 14


100%|██████████| 40/40 [00:07<00:00,  5.70it/s]


Train Loss : 0.536
Epoch 14/15 - Duration: 7.03 seconds 
Epoch: 15


100%|██████████| 40/40 [00:07<00:00,  5.60it/s]


Train Loss : 0.583
Epoch 15/15 - Duration: 7.15 seconds 
Average training time per epoch: 7.10 seconds

Test Accuracy : 0.605

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.89      0.23      0.37      4961
    Positive       0.56      0.97      0.71      5039

    accuracy                           0.61     10000
   macro avg       0.73      0.60      0.54     10000
weighted avg       0.73      0.61      0.54     10000


Confusion Matrix : 
[[1152 3809]
 [ 140 4899]]


In [97]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier29 = BiLSTM1(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn29 = nn.CrossEntropyLoss().to(device)
optimizer29 = torch.optim.Adam([param for param in classifier29.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier29)
print('Total parameters: ', count_parameters(classifier29))
print('\n\n')

epoch_durations29 = TrainModel(classifier29, loss_fn29, optimizer29, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration29 = sum(epoch_durations29) / len(epoch_durations29)
print(f"Average training time per epoch: {average_duration29:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier29, loss_fn29, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM1(
  (embedding_layer): Embedding(29065, 100)
  (lstm): LSTM(100, 64, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=2, bias=True)
)
Total parameters:  2991750



Epoch: 1


100%|██████████| 40/40 [00:07<00:00,  5.41it/s]


Train Loss : 0.688
Epoch 1/15 - Duration: 7.39 seconds 
Epoch: 2


100%|██████████| 40/40 [00:07<00:00,  5.46it/s]


Train Loss : 0.653
Epoch 2/15 - Duration: 7.34 seconds 
Epoch: 3


100%|██████████| 40/40 [00:07<00:00,  5.47it/s]


Train Loss : 0.618
Epoch 3/15 - Duration: 7.32 seconds 
Epoch: 4


100%|██████████| 40/40 [00:07<00:00,  5.48it/s]


Train Loss : 0.584
Epoch 4/15 - Duration: 7.31 seconds 
Epoch: 5


100%|██████████| 40/40 [00:07<00:00,  5.44it/s]


Train Loss : 0.546
Epoch 5/15 - Duration: 7.35 seconds 
Epoch: 6


100%|██████████| 40/40 [00:07<00:00,  5.34it/s]


Train Loss : 0.519
Epoch 6/15 - Duration: 7.49 seconds 
Epoch: 7


100%|██████████| 40/40 [00:07<00:00,  5.35it/s]


Train Loss : 0.500
Epoch 7/15 - Duration: 7.48 seconds 
Epoch: 8


100%|██████████| 40/40 [00:07<00:00,  5.15it/s]


Train Loss : 0.490
Epoch 8/15 - Duration: 7.77 seconds 
Epoch: 9


100%|██████████| 40/40 [00:07<00:00,  5.34it/s]


Train Loss : 0.486
Epoch 9/15 - Duration: 7.50 seconds 
Epoch: 10


100%|██████████| 40/40 [00:07<00:00,  5.39it/s]


Train Loss : 0.472
Epoch 10/15 - Duration: 7.43 seconds 
Epoch: 11


100%|██████████| 40/40 [00:07<00:00,  5.34it/s]


Train Loss : 0.457
Epoch 11/15 - Duration: 7.49 seconds 
Epoch: 12


100%|██████████| 40/40 [00:07<00:00,  5.37it/s]


Train Loss : 0.448
Epoch 12/15 - Duration: 7.46 seconds 
Epoch: 13


100%|██████████| 40/40 [00:07<00:00,  5.35it/s]


Train Loss : 0.440
Epoch 13/15 - Duration: 7.49 seconds 
Epoch: 14


100%|██████████| 40/40 [00:07<00:00,  5.35it/s]


Train Loss : 0.436
Epoch 14/15 - Duration: 7.48 seconds 
Epoch: 15


100%|██████████| 40/40 [00:07<00:00,  5.41it/s]


Train Loss : 0.427
Epoch 15/15 - Duration: 7.39 seconds 
Average training time per epoch: 7.45 seconds

Test Accuracy : 0.818

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.87      0.74      0.80      4961
    Positive       0.78      0.89      0.83      5039

    accuracy                           0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000


Confusion Matrix : 
[[3677 1284]
 [ 541 4498]]


In [98]:
# Usage example assuming vocab3, EMBEDDING_DIM, HIDDEN_DIM, target_classes, device, and LEARNING_RATE are defined
classifier30 = BiLSTM2(len(vocab3), EMBEDDING_DIM, HIDDEN_DIM, len(target_classes2)).to(device)
loss_fn30 = nn.CrossEntropyLoss().to(device)
optimizer30 = torch.optim.Adam([param for param in classifier30.parameters() if param.requires_grad], lr=LEARNING_RATE)

# Function to count parameters
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print('\nModel:')
print(classifier30)
print('Total parameters: ', count_parameters(classifier30))
print('\n\n')

epoch_durations30 = TrainModel(classifier30, loss_fn30, optimizer30, train_loader_imdb, EPOCHS)
# Calculate and print the average duration per epoch
average_duration30 = sum(epoch_durations30) / len(epoch_durations30)
print(f"Average training time per epoch: {average_duration30:.2f} seconds")

######################################################################
# Evaluate the model with test dataset
# ------------------------------------


_, Y_actual, Y_preds = EvaluateModel2(classifier30, loss_fn30, test_loader_imdb)

print("\nTest Accuracy : {:.3f}".format(accuracy_score(Y_actual, Y_preds)))
print("\nClassification Report : ")
print(classification_report(Y_actual, Y_preds, target_names=target_classes2))
print("\nConfusion Matrix : ")
print(confusion_matrix(Y_actual, Y_preds))


Model:
BiLSTM2(
  (embedding_layer): Embedding(29065, 100)
  (lstm): LSTM(100, 64, num_layers=2, batch_first=True, bidirectional=True)
  (linear): Linear(in_features=128, out_features=2, bias=True)
)
Total parameters:  3091078



Epoch: 1


100%|██████████| 40/40 [00:07<00:00,  5.12it/s]


Train Loss : 0.682
Epoch 1/15 - Duration: 7.82 seconds 
Epoch: 2


100%|██████████| 40/40 [00:07<00:00,  5.19it/s]


Train Loss : 0.639
Epoch 2/15 - Duration: 7.71 seconds 
Epoch: 3


100%|██████████| 40/40 [00:07<00:00,  5.14it/s]


Train Loss : 0.607
Epoch 3/15 - Duration: 7.79 seconds 
Epoch: 4


100%|██████████| 40/40 [00:07<00:00,  5.14it/s]


Train Loss : 0.589
Epoch 4/15 - Duration: 7.79 seconds 
Epoch: 5


100%|██████████| 40/40 [00:07<00:00,  5.24it/s]


Train Loss : 0.598
Epoch 5/15 - Duration: 7.65 seconds 
Epoch: 6


100%|██████████| 40/40 [00:07<00:00,  5.11it/s]


Train Loss : 0.568
Epoch 6/15 - Duration: 7.84 seconds 
Epoch: 7


100%|██████████| 40/40 [00:07<00:00,  5.19it/s]


Train Loss : 0.708
Epoch 7/15 - Duration: 7.71 seconds 
Epoch: 8


100%|██████████| 40/40 [00:07<00:00,  5.14it/s]


Train Loss : 0.770
Epoch 8/15 - Duration: 7.79 seconds 
Epoch: 9


100%|██████████| 40/40 [00:07<00:00,  5.22it/s]


Train Loss : 0.736
Epoch 9/15 - Duration: 7.66 seconds 
Epoch: 10


100%|██████████| 40/40 [00:07<00:00,  5.20it/s]


Train Loss : 0.627
Epoch 10/15 - Duration: 7.70 seconds 
Epoch: 11


100%|██████████| 40/40 [00:07<00:00,  5.11it/s]


Train Loss : 0.569
Epoch 11/15 - Duration: 7.84 seconds 
Epoch: 12


100%|██████████| 40/40 [00:07<00:00,  5.21it/s]


Train Loss : 0.541
Epoch 12/15 - Duration: 7.68 seconds 
Epoch: 13


100%|██████████| 40/40 [00:07<00:00,  5.10it/s]


Train Loss : 0.526
Epoch 13/15 - Duration: 7.85 seconds 
Epoch: 14


100%|██████████| 40/40 [00:07<00:00,  5.11it/s]


Train Loss : 0.570
Epoch 14/15 - Duration: 7.83 seconds 
Epoch: 15


100%|██████████| 40/40 [00:07<00:00,  5.27it/s]


Train Loss : 0.518
Epoch 15/15 - Duration: 7.59 seconds 
Average training time per epoch: 7.75 seconds

Test Accuracy : 0.787

Classification Report : 
              precision    recall  f1-score   support

    Negative       0.77      0.82      0.79      4961
    Positive       0.81      0.75      0.78      5039

    accuracy                           0.79     10000
   macro avg       0.79      0.79      0.79     10000
weighted avg       0.79      0.79      0.79     10000


Confusion Matrix : 
[[4064  897]
 [1235 3804]]


# Results on IMDB Movie Review Dataset (MAX_WORDS=150)

|  | 1RNN | 1-BiRNN | 2-BiRNN | 1LSTM | 1-BiLSTM |2-BiLSTM  |
|:---------|:--------:|---------:|---------:|---------:|---------:|---------:|
|  Accuracy  |0.568     |     0.665      |       0.697  |    0.605    |       0.818   |       0.787      |
|  Parameters   |2917254  |     2928006   |      2952838 |       2949126  |    2991750|   3091078     |
|  Time Cost(per epoch)   | 6.70 seconds|    6.95 seconds |   7.12 seconds|    4.94 seconds|   7.45 seconds |   7.75 seconds |

From the table above, it can be understood that RNN faces obstacles when it has to deal with long sequences. This is proved by the very low accuracy produced by all of the RNN architectures, but especially unidirectional RNN with 1 layer.
Unidirectional Lstm also struggles to perform well on this dataset for the same reason. On the other hand, bidirectional LSTM with one and two layers indicate increased performance. The peak accuracy was reached by unidirectional RNN with 1 layer. The extra layer might result to overfitting but it is still more robust than the unidirectional and RNN approach. LSTM accurate versions need slightly more time per epoch than RNN (almost 1 sec per epoch). Several tenths of a second difference is observed in the running time of RNN models. LSTM unidirectional is faster by 2 seconds but it produces a lot of inaccurate predictions. All of the models produce approximately 800k more parameters than the AG News Topic classification dataset. LSTMs need almost 80k more parameters than RNNs. There is almost 30k parameter differnce from the most simple version to the most complex one.  