We chose the IMDB dataset, which contains 50,000 movie reviews, each labeled as positive or negative.

In [1]:
import kagglehub
from kagglehub import KaggleDatasetAdapter
import pandas as pd

file_path = "IMDB Dataset.csv"

df = kagglehub.load_dataset(
  KaggleDatasetAdapter.PANDAS,
  "lakshmi25npathi/imdb-dataset-of-50k-movie-reviews",
  file_path,
)


Using Colab cache for faster access to the 'imdb-dataset-of-50k-movie-reviews' dataset.


Since neural networks operate on numbers, we converted each sentiment label into a binary value (negative -> 0, positive -> 1).

In [2]:
df = df.copy()

df["label"] = df["sentiment"].map({
    "negative": 0,
    "positive": 1
}).astype(int)


To clean the data, we implemented a preprocessing function that converts the text to lowercase, replaces `<br />` line-break tags with spaces, collapses extra whitespace, and then splits the text into tokens with NLTK's `word_tokenize`.

In [3]:
from nltk.tokenize import word_tokenize
import nltk
nltk.download("punkt_tab")
import re

def preprocess_text(text):
    # Lowercase
    text = text.lower()

    # Replace <br /> line-break tags with a space (other HTML tags are kept)
    text = re.sub(r"<br\s*/?>", " ", text)

    # Remove extra spaces
    text = re.sub(r"\s+", " ", text).strip()

    # Tokenize
    tokens = word_tokenize(text)

    return tokens

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
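To make the cleaning steps concrete, here is a minimal sketch that applies only the regex part of `preprocess_text` (lowercasing, `<br />` replacement, whitespace collapsing) to made-up reviews, leaving out NLTK so it runs without the tokenizer data. The helper name `clean_text` is hypothetical. It also shows that the pattern only targets `<br>`-style tags, so other markup such as `<i>` survives:

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical helper: the regex steps of preprocess_text, without tokenization."""
    text = text.lower()
    # Replace <br>, <br/>, <br /> (with any inner spacing) by a single space
    text = re.sub(r"<br\s*/?>", " ", text)
    # Collapse runs of whitespace and trim both ends
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(clean_text("Great film!<br /><br />Loved   it."))  # great film! loved it.
print(clean_text("<i>Superb</i> acting"))               # <i>superb</i> acting
```

If other tags matter, a broader pattern such as `re.sub(r"<[^>]+>", " ", text)` would strip any tag; the notebook's function does not do this.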


We then applied this function to every review and stored the result in a new `tokens` column, so each review is represented as a list of tokens.

In [4]:
df["tokens"] = df["review"].astype(str).apply(preprocess_text)

We computed the average and maximum review length (in tokens); this guided our choice of the maximum sequence length for the models.

In [5]:
import numpy as np

lengths = df["tokens"].apply(len)
print("Average length:", np.mean(lengths))
print("Max length:", np.max(lengths))


Average length: 264.68574
Max length: 2738
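The gap between the average (about 265 tokens) and the maximum (2,738 tokens) suggests a long tail, so the average alone does not say how many reviews a given cutoff will truncate. A minimal sketch of two complementary checks, the fraction of reviews that fit within a cutoff and a length percentile, is below; the helper names are hypothetical and the lengths are toy values, since these statistics were not computed on the real data here:

```python
import numpy as np

def length_coverage(lengths, max_len):
    """Fraction of sequences that fit in max_len tokens without truncation."""
    lengths = np.asarray(lengths)
    return float((lengths <= max_len).mean())

def percentile_length(lengths, pct):
    """Length below which pct% of the sequences fall (linear interpolation)."""
    return float(np.percentile(lengths, pct))

# Toy lengths for illustration; in the notebook they would be df["tokens"].apply(len)
toy_lengths = [120, 180, 240, 260, 900]
print(length_coverage(toy_lengths, 250))   # 0.6
print(percentile_length(toy_lengths, 50))  # 240.0
```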


We split the dataset into training, validation, and test sets (64% / 16% / 20%, i.e. 32,000 / 8,000 / 10,000 reviews) and used stratification so that the class distribution stays the same across all three sets.

In [6]:
import sklearn
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)

train_df, val_df = train_test_split(
    train_df, test_size=0.2, random_state=42, stratify=train_df["label"]
)

print("Train:", train_df.shape, "Val:", val_df.shape, "Test:", test_df.shape)
print("Label balance (train):\n", train_df["label"].value_counts(normalize=True))
print(df[["sentiment","label","tokens"]].head())

Train: (32000, 4) Val: (8000, 4) Test: (10000, 4)
Label balance (train):
 label
0    0.5
1    0.5
Name: proportion, dtype: float64
  sentiment  label                                             tokens
0  positive      1  [one, of, the, other, reviewers, has, mentione...
1  positive      1  [a, wonderful, little, production, ., the, fil...
2  positive      1  [i, thought, this, was, a, wonderful, way, to,...
3  negative      0  [basically, there, 's, a, family, where, a, li...
4  positive      1  [petter, mattei, 's, ``, love, in, the, time, ...


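We then built a vocabulary from the training set only: index 0 is reserved for `<pad>`, index 1 for `<unk>`, and the remaining 19,998 slots hold the most frequent training words, for 20,000 entries in total. Each token is then converted to its index, with out-of-vocabulary words mapped to `<unk>`.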

In [7]:
from collections import Counter
MAX_VOCAB = 20000
PAD_TOKEN = "<pad>"
UNK_TOKEN = "<unk>"
PAD_IDX = 0
UNK_IDX = 1

counter = Counter()
for tokens in train_df["tokens"]:
    counter.update(tokens)


most_common = counter.most_common(MAX_VOCAB - 2)
itos = [PAD_TOKEN, UNK_TOKEN] + [w for w, _ in most_common]   # index -> word
stoi = {w: i for i, w in enumerate(itos)}                    # word -> index

print("Vocab size:", len(itos), "| Example:", itos[:10])
def numericalize(tokens):
    return [stoi.get(t, UNK_IDX) for t in tokens]

train_df = train_df.copy()
val_df   = val_df.copy()
test_df  = test_df.copy()

train_df["seq"] = train_df["tokens"].apply(numericalize)
val_df["seq"]   = val_df["tokens"].apply(numericalize)
test_df["seq"]  = test_df["tokens"].apply(numericalize)

# sanity check
print(train_df[["tokens","seq","label"]].head(2))
print("Example seq length:", len(train_df["seq"].iloc[0]))

Vocab size: 20000 | Example: ['<pad>', '<unk>', 'the', ',', '.', 'and', 'a', 'of', 'to', 'is']
                                                  tokens  \
26680  [oh, yes, ,, i, have, to, agree, with, the, ot...   
16648  [the, basic, hook, here, is, :, lincoln, is, s...   

                                                     seq  label  
26680  [440, 438, 3, 12, 34, 8, 1004, 18, 2, 405, 43,...      0  
16648  [2, 1202, 3987, 140, 9, 88, 3633, 9, 630, 4, 1...      1  
Example seq length: 536
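A toy run of the same vocabulary logic makes the index layout visible. Because `<pad>` and `<unk>` take two of the `MAX_VOCAB` slots, only `MAX_VOCAB - 2` real words are kept, and any word not kept from training (including unseen validation or test words) becomes `<unk>`. The wrapper `build_vocab`, the extra `stoi` argument to `numericalize`, and the three-review corpus are made up for illustration:

```python
from collections import Counter

PAD_TOKEN, UNK_TOKEN = "<pad>", "<unk>"
PAD_IDX, UNK_IDX = 0, 1

def build_vocab(token_lists, max_vocab):
    """Keep the (max_vocab - 2) most frequent tokens after reserving <pad> and <unk>."""
    counter = Counter()
    for tokens in token_lists:
        counter.update(tokens)
    itos = [PAD_TOKEN, UNK_TOKEN] + [w for w, _ in counter.most_common(max_vocab - 2)]
    stoi = {w: i for i, w in enumerate(itos)}
    return itos, stoi

def numericalize(tokens, stoi):
    # Any word outside the vocabulary maps to the <unk> index
    return [stoi.get(t, UNK_IDX) for t in tokens]

train_tokens = [["great", "movie"], ["bad", "movie"], ["movie", "great"]]
itos, stoi = build_vocab(train_tokens, max_vocab=4)
print(itos)                                             # ['<pad>', '<unk>', 'movie', 'great']
print(numericalize(["great", "awful", "movie"], stoi))  # [3, 1, 2]
```

Here "bad" occurs only once, so with room for just two real words it falls out of the vocabulary and would also be encoded as `<unk>`.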


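We padded or truncated every sequence to exactly 250 tokens. Batching sequences into a single tensor requires them to share one length, and 250 is close to the average review length of about 265 tokens; longer reviews lose their tail, and shorter ones are padded with `<pad>` at the end.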

In [8]:
MAX_LEN = 250

def pad_truncate(seq, max_len=MAX_LEN, pad_value=PAD_IDX):
    if len(seq) > max_len:
        return seq[:max_len]
    return seq + [pad_value] * (max_len - len(seq))

train_df["seq_pad"] = train_df["seq"].apply(pad_truncate)
val_df["seq_pad"]   = val_df["seq"].apply(pad_truncate)
test_df["seq_pad"]  = test_df["seq"].apply(pad_truncate)

print(len(train_df["seq_pad"].iloc[0]), train_df["seq_pad"].iloc[0][:15])

250 [440, 438, 3, 12, 34, 8, 1004, 18, 2, 405, 43, 1653, 13, 17, 3353]
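Because `pad_truncate` appends padding at the end, a unidirectional RNN or GRU that classifies from its final hidden state `h[-1]` has also stepped through every `<pad>` position of a short review. One common remedy, which the models in this notebook do not use, is to pack the batch so the recurrence stops at each sequence's true length. This is a sketch under two assumptions: padding is only on the right with `PAD_IDX = 0`, and index 0 never appears as a real token. The class name `PackedGRU` is hypothetical:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class PackedGRU(nn.Module):
    """Hypothetical GRU classifier that skips right-padding via packing."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, pad_idx=0):
        super().__init__()
        self.pad_idx = pad_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        # True length = number of non-pad positions (assumes padding only at the end);
        # clamp to 1 so an all-padding row does not break packing
        lengths = (x != self.pad_idx).sum(dim=1).clamp(min=1).cpu()
        emb = self.embedding(x)
        packed = pack_padded_sequence(emb, lengths, batch_first=True, enforce_sorted=False)
        _, h = self.gru(packed)  # h: [1, B, H], taken at each sequence's last real token
        return self.fc(h[-1]).squeeze(1)

torch.manual_seed(0)
model = PackedGRU(vocab_size=10, embed_dim=4, hidden_dim=8).eval()
with torch.no_grad():
    padded = model(torch.tensor([[5, 6, 0, 0]]))
    unpadded = model(torch.tensor([[5, 6]]))
print(torch.allclose(padded, unpadded))  # True
```

The same padded and unpadded review now produce the same logit, which is not the case for the `h[-1]` readout used by the unpacked models.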


To train the models we used PyTorch: each split is wrapped in a `Dataset`, and a `DataLoader` feeds it to the model in batches of 128 (shuffled for the training set).

In [9]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

class SeqDataset(Dataset):
    def __init__(self, df):
        self.X = df["seq_pad"].tolist()
        self.y = df["label"].tolist()

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        x = torch.tensor(self.X[idx], dtype=torch.long)
        y = torch.tensor(self.y[idx], dtype=torch.float32)
        return x, y

BATCH_SIZE = 128

train_loader = DataLoader(SeqDataset(train_df), batch_size=BATCH_SIZE, shuffle=True)
val_loader   = DataLoader(SeqDataset(val_df), batch_size=BATCH_SIZE)
test_loader  = DataLoader(SeqDataset(test_df), batch_size=BATCH_SIZE)

Device: cuda


In [10]:
X_train_text = train_df["review"].astype(str).tolist()
y_train = train_df["label"].values

X_val_text = val_df["review"].astype(str).tolist()
y_val = val_df["label"].values

X_test_text = test_df["review"].astype(str).tolist()
y_test = test_df["label"].values

In [11]:
!pip -q install gensim


In [12]:
from gensim.models import Word2Vec
import numpy as np

We trained 100-dimensional Word2Vec embeddings on the tokenized training reviews (context window of 5, ignoring words that appear fewer than twice).

In [13]:
EMBED_DIM = 100

w2v_model = Word2Vec(
    sentences=train_df["tokens"].tolist(),
    vector_size=EMBED_DIM,
    window=5,
    min_count=2,
    workers=4
)

In [14]:
VOCAB_SIZE = len(itos)

embedding_matrix_w2v = np.random.normal(0, 0.05, (VOCAB_SIZE, EMBED_DIM)).astype(np.float32)
embedding_matrix_w2v[PAD_IDX] = np.zeros(EMBED_DIM, dtype=np.float32)
for word, idx in stoi.items():
    if word in w2v_model.wv:
        embedding_matrix_w2v[idx] = w2v_model.wv[word]
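The cell above copies a Word2Vec vector into each vocabulary row the model knows, leaves the other rows random, and zeroes the `<pad>` row; unlike the GloVe cell further down, it does not report how many words were found. A minimal sketch of a reusable builder that also returns that count is below. The name `build_embedding_matrix` is hypothetical, and `lookup` is assumed to support `word in lookup` and `lookup[word]` (as the notebook already uses with `w2v_model.wv`). This version also never overwrites the `<pad>` row:

```python
import numpy as np

def build_embedding_matrix(stoi, lookup, embed_dim, pad_idx=0, std=0.05, seed=0):
    """Known words copy their vector, unknown words stay random, <pad> is all zeros.

    Assumes stoi maps words to the indices 0..len(stoi)-1.
    """
    rng = np.random.default_rng(seed)
    matrix = rng.normal(0.0, std, (len(stoi), embed_dim)).astype(np.float32)
    matrix[pad_idx] = 0.0
    found = 0
    for word, idx in stoi.items():
        if idx != pad_idx and word in lookup:
            matrix[idx] = lookup[word]
            found += 1
    return matrix, found

stoi = {"<pad>": 0, "<unk>": 1, "good": 2, "zzz": 3}
vectors = {"good": np.ones(3, dtype=np.float32)}
matrix, found = build_embedding_matrix(stoi, vectors, embed_dim=3)
print(matrix.shape, found)  # (4, 3) 1
```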

In [15]:
import torch

embedding_tensor_w2v = torch.tensor(embedding_matrix_w2v, dtype=torch.float32)

In [16]:
#GRU WITH WORD2VEC EMBEDDINGS
import torch.nn as nn

class GRU_W2V(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()

        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_w2v,
            freeze=False,
            padding_idx=PAD_IDX
        )

        self.gru = nn.GRU(
            input_size=EMBED_DIM,
            hidden_size=hidden_dim,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.gru(emb)
        logits = self.fc(h[-1]).squeeze(1)
        return logits

In [17]:
#RNN WITH WORD2VEC EMBEDDINGS
class RNN_W2V(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()

        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_w2v,
            freeze=False,
            padding_idx=PAD_IDX
        )

        self.rnn = nn.RNN(
            input_size=EMBED_DIM,
            hidden_size=hidden_dim,
            batch_first=True
        )

        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.rnn(emb)
        logits = self.fc(h[-1]).squeeze(1)
        return logits

In [18]:
#BIDIRECTIONAL GRU WITH W2V EMBEDDINGS
class BiGRU_W2V(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()

        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_w2v,
            freeze=False,
            padding_idx=PAD_IDX
        )

        self.bigru = nn.GRU(
            input_size=EMBED_DIM,
            hidden_size=hidden_dim,
            batch_first=True,
            bidirectional=True
        )

        self.fc = nn.Linear(hidden_dim * 2, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.bigru(emb)

        # h shape: [2, B, H]
        h_forward = h[0]
        h_backward = h[1]

        h_concat = torch.cat((h_forward, h_backward), dim=1)

        logits = self.fc(h_concat).squeeze(1)
        return logits
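`BiGRU_W2V` concatenates `h[0]` and `h[1]`. For a single-layer bidirectional GRU, `h` has shape `[2, B, H]`: `h[0]` is the forward direction's state after the last time step, and `h[1]` is the backward direction's state after it has read back to the first time step. A quick check with random inputs (all sizes arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, E, H = 3, 7, 5, 4          # batch, time steps, input size, hidden size (arbitrary)
gru = nn.GRU(E, H, batch_first=True, bidirectional=True)
x = torch.randn(B, T, E)

out, h = gru(x)                  # out: [B, T, 2H], h: [2, B, H]
print(out.shape, h.shape)        # torch.Size([3, 7, 8]) torch.Size([2, 3, 4])

# Forward direction finishes at the last time step (first H features of out)
print(torch.allclose(h[0], out[:, -1, :H]))  # True
# Backward direction finishes at the first time step (last H features of out)
print(torch.allclose(h[1], out[:, 0, H:]))   # True

h_concat = torch.cat((h[0], h[1]), dim=1)    # [B, 2H], the input to fc in BiGRU_W2V
```

With right-padded inputs, this also means the backward direction begins its pass on the `<pad>` positions at the end of short reviews.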

In [19]:
HIDDEN_DIM = 128

rnn_w2v_model = RNN_W2V(HIDDEN_DIM).to(device)
gru_w2v_model = GRU_W2V(HIDDEN_DIM).to(device)
bigru_w2v_model = BiGRU_W2V(HIDDEN_DIM).to(device)

In [20]:
!wget -q https://nlp.stanford.edu/data/glove.6B.zip
!unzip -q glove.6B.zip

In [21]:
# Load the pre-trained GloVe vectors (glove.6B, 100 dimensions) into a word -> vector dict
import numpy as np

GLOVE_PATH = "glove.6B.100d.txt"
EMBED_DIM = 100

glove_index = {}
with open(GLOVE_PATH, "r", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word = parts[0]
        vec = np.array(parts[1:], dtype=np.float32)
        glove_index[word] = vec

print("Loaded GloVe vectors:", len(glove_index))

Loaded GloVe vectors: 400000


In [22]:
VOCAB_SIZE = len(itos)

embedding_matrix_glove = np.random.normal(0, 0.05, (VOCAB_SIZE, EMBED_DIM)).astype(np.float32)
embedding_matrix_glove[PAD_IDX] = np.zeros(EMBED_DIM, dtype=np.float32)
found = 0
for word, idx in stoi.items():
    vec = glove_index.get(word)
    if vec is not None:
        embedding_matrix_glove[idx] = vec
        found += 1

print(f"Found {found}/{VOCAB_SIZE} words in GloVe ({found/VOCAB_SIZE:.2%})")

Found 19729/20000 words in GloVe (98.65%)
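Each line of `glove.6B.100d.txt` is a word followed by its space-separated vector components. The loading loop above trusts every line; a slightly more defensive sketch checks the dimension, so a malformed line is skipped instead of later causing a shape error when copied into the embedding matrix. The function name `load_glove_lines` is hypothetical, and `io.StringIO` stands in for the real file:

```python
import io
import numpy as np

def load_glove_lines(lines, expected_dim):
    """Parse GloVe text lines ('word v1 v2 ... vd') into a dict of float32 arrays."""
    index = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != expected_dim + 1:
            continue  # skip malformed lines rather than storing a wrong-sized vector
        index[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return index

sample = io.StringIO("the 0.1 0.2 0.3\nmovie -0.5 0.0 1.5\nbroken 0.1\n")
glove = load_glove_lines(sample, expected_dim=3)
print(sorted(glove))  # ['movie', 'the']
```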


In [23]:
import torch
embedding_tensor_glove = torch.tensor(embedding_matrix_glove, dtype=torch.float32)
print("GloVe embedding tensor shape:", embedding_tensor_glove.shape)

GloVe embedding tensor shape: torch.Size([20000, 100])


In [24]:
#RNN WITH GLOVE EMBEDDINGS
import torch.nn as nn

class RNN_GloVe(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_glove,
            freeze=False,
            padding_idx=PAD_IDX
        )
        self.rnn = nn.RNN(EMBED_DIM, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.rnn(emb)
        return self.fc(h[-1]).squeeze(1)

In [25]:
#GRU WITH GLOVE EMBEDDINGS
class GRU_GloVe(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_glove,
            freeze=False,
            padding_idx=PAD_IDX
        )
        self.gru = nn.GRU(EMBED_DIM, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.gru(emb)
        return self.fc(h[-1]).squeeze(1)

In [26]:
#BIDIRECTIONAL GRU WITH GLOVE EMBEDDINGS
class BiGRU_GloVe(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(
            embedding_tensor_glove,
            freeze=False,
            padding_idx=PAD_IDX
        )
        self.bigru = nn.GRU(EMBED_DIM, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_dim * 2, 1)

    def forward(self, x):
        emb = self.embedding(x)
        _, h = self.bigru(emb)
        h_cat = torch.cat((h[0], h[1]), dim=1)
        return self.fc(h_cat).squeeze(1)

In [27]:
HIDDEN_DIM = 128

rnn_glove = RNN_GloVe(HIDDEN_DIM).to(device)
gru_glove = GRU_GloVe(HIDDEN_DIM).to(device)
bigru_glove = BiGRU_GloVe(HIDDEN_DIM).to(device)

In [28]:
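# Fit TF-IDF on the training reviews to get an IDF value per word.
# Note: TfidfVectorizer uses its own tokenizer (words of 2+ characters), so NLTK tokens
# such as punctuation, "a", or "n't" are not in tfidf_vocab and receive weight 0 below.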
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

tfidf_vec = TfidfVectorizer(max_features=20000)
tfidf_vec.fit(train_df["review"].astype(str))

tfidf_vocab = tfidf_vec.vocabulary_
idf = tfidf_vec.idf_

In [29]:
from collections import Counter

def token_tfidf_weights(tokens, max_len=MAX_LEN):
    counts = Counter(tokens)
    L = len(tokens)

    weights = []
    for t in tokens[:max_len]:
        j = tfidf_vocab.get(t, None)
        if j is None:
            weights.append(0.0)
        else:
            tf = counts[t] / L
            weights.append(tf * float(idf[j]))

    if len(weights) < max_len:
        weights += [0.0] * (max_len - len(weights))
    return weights
train_df = train_df.copy()
val_df   = val_df.copy()
test_df  = test_df.copy()

train_df["tfidf_w"] = train_df["tokens"].apply(token_tfidf_weights)
val_df["tfidf_w"]   = val_df["tokens"].apply(token_tfidf_weights)
test_df["tfidf_w"]  = test_df["tokens"].apply(token_tfidf_weights)
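`token_tfidf_weights` gives each token position the weight tf(t, d) * idf(t), where tf is the count of t in the review divided by the review length and idf comes from the fitted `TfidfVectorizer`. With scikit-learn's default `smooth_idf=True`, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number containing t. The pure-Python sketch below reproduces that computation on toy, already-tokenized documents so it runs without scikit-learn; it ignores the vectorizer's own tokenization, and the names `smooth_idf` and `token_weights` are hypothetical:

```python
import math
from collections import Counter

def smooth_idf(docs):
    """idf(t) = ln((1 + n) / (1 + df(t))) + 1, scikit-learn's default smoothing."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def token_weights(tokens, idf, max_len):
    """Per-position tf * idf, 0.0 for unknown tokens, zero-padded to max_len."""
    counts = Counter(tokens)
    n_tokens = len(tokens)  # tf uses the full review length, as in the notebook
    weights = [counts[t] / n_tokens * idf[t] if t in idf else 0.0 for t in tokens[:max_len]]
    return weights + [0.0] * (max_len - len(weights))

docs = [["good", "film"], ["bad", "film"]]
idf = smooth_idf(docs)
weights = token_weights(["good", "good", "film", "?"], idf, max_len=6)
print([round(w, 4) for w in weights])  # [0.7027, 0.7027, 0.25, 0.0, 0.0, 0.0]
```

"film" appears in every document, so its idf is exactly 1, while the rarer "good" gets ln(1.5) + 1 and, appearing twice, dominates the weights.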

In [30]:
import torch
from torch.utils.data import Dataset, DataLoader

class SeqTfidfDataset(Dataset):
    def __init__(self, df):
        self.X = df["seq_pad"].tolist()
        self.W = df["tfidf_w"].tolist()
        self.y = df["label"].tolist()

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        x = torch.tensor(self.X[idx], dtype=torch.long)
        w = torch.tensor(self.W[idx], dtype=torch.float32)
        y = torch.tensor(self.y[idx], dtype=torch.float32)
        return x, w, y

BATCH_SIZE = 128
train_loader_tfidfseq = DataLoader(SeqTfidfDataset(train_df), batch_size=BATCH_SIZE, shuffle=True)
val_loader_tfidfseq   = DataLoader(SeqTfidfDataset(val_df), batch_size=BATCH_SIZE)
test_loader_tfidfseq  = DataLoader(SeqTfidfDataset(test_df), batch_size=BATCH_SIZE)

In [31]:
#RNN WITH TF IDF SEQ

class RNN_TFIDFSeq(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x, w):
        emb = self.embedding(x)
        emb = emb * w.unsqueeze(-1)
        _, h = self.rnn(emb)
        return self.fc(h[-1]).squeeze(1)

#GRU WITH TF IDF SEQ
class GRU_TFIDFSeq(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x, w):
        emb = self.embedding(x)
        emb = emb * w.unsqueeze(-1)
        _, h = self.gru(emb)
        return self.fc(h[-1]).squeeze(1)

#BIGRU WITH TF IDF SEQ
class BiGRU_TFIDFSeq(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, pad_idx, dropout=0.0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.bigru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim * 2, 1)

    def forward(self, x, w):
        emb = self.embedding(x)
        emb = emb * w.unsqueeze(-1)
        _, h = self.bigru(emb)
        h = self.dropout(h)
        h_cat = torch.cat((h[0], h[1]), dim=1)
        return self.fc(h_cat).squeeze(1)

In [32]:
from sklearn.metrics import accuracy_score, f1_score
import torch

def run_epoch_tfidfseq(model, loader, criterion, optimizer=None, device="cuda", grad_clip=1.0):
    train = optimizer is not None
    model.train() if train else model.eval()

    all_preds, all_labels = [], []
    total_loss = 0.0

    for x, w, y in loader:
        x = x.to(device)
        w = w.to(device)
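        y = y.to(device).float()          # BCEWithLogitsLoss expects float targets

        # Build the autograd graph only when training; validation/test passes skip it
        with torch.set_grad_enabled(train):
            logits = model(x, w).view(-1)     # [B]
            loss = criterion(logits, y)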

        if train:
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            if grad_clip is not None:
                torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
            optimizer.step()

        total_loss += loss.item() * x.size(0)

        preds = (torch.sigmoid(logits) >= 0.5).long().detach().cpu().numpy()
        labels = y.long().detach().cpu().numpy()
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

    avg_loss = total_loss / len(loader.dataset)
    acc = accuracy_score(all_labels, all_preds)
    f1  = f1_score(all_labels, all_preds)
    return avg_loss, acc, f1

In [33]:
import copy
import torch.nn as nn
import torch

def fit_model_tfidfseq(model, train_loader, val_loader, device,
                       epochs=6, lr=1e-3, weight_decay=0.0,
                       patience=2, grad_clip=1.0):
    model = model.to(device)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    best_state = None
    best_f1 = -1.0
    bad_epochs = 0
    history = []

    for epoch in range(1, epochs + 1):
        tr_loss, tr_acc, tr_f1 = run_epoch_tfidfseq(
            model, train_loader, criterion, optimizer=optimizer, device=device, grad_clip=grad_clip
        )
        va_loss, va_acc, va_f1 = run_epoch_tfidfseq(
            model, val_loader, criterion, optimizer=None, device=device
        )

        history.append({
            "epoch": epoch,
            "train_loss": tr_loss, "train_acc": tr_acc, "train_f1": tr_f1,
            "val_loss": va_loss, "val_acc": va_acc, "val_f1": va_f1,
        })

        print(f"Epoch {epoch}/{epochs} | "
              f"train loss {tr_loss:.4f} acc {tr_acc:.4f} f1 {tr_f1:.4f} | "
              f"val loss {va_loss:.4f} acc {va_acc:.4f} f1 {va_f1:.4f}")

        if va_f1 > best_f1:
            best_f1 = va_f1
            best_state = copy.deepcopy(model.state_dict())
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f"Early stopping (no val F1 improvement for {patience} epochs).")
                break

    if best_state is not None:
        model.load_state_dict(best_state)

    return model, history


@torch.no_grad()
def test_model_tfidfseq(model, test_loader, device):
    criterion = nn.BCEWithLogitsLoss()
    te_loss, te_acc, te_f1 = run_epoch_tfidfseq(model, test_loader, criterion, optimizer=None, device=device)
    print(f"TEST | loss {te_loss:.4f} acc {te_acc:.4f} f1 {te_f1:.4f}")
    return {"loss": te_loss, "acc": te_acc, "f1": te_f1}

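The `run_grid_tfidfseq` function runs a hyperparameter search: for each configuration in the grid it builds a fresh model, trains it with early stopping on validation F1, and records its test metrics.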

In [35]:
def run_grid_tfidfseq(make_model_fn, train_loader, val_loader, test_loader, device, grid):
    results = []
    for cfg in grid:
        print("\n" + "="*70)
        print("CONFIG:", cfg)

        model = make_model_fn(cfg)

        model, _ = fit_model_tfidfseq(
            model, train_loader, val_loader, device,
            epochs=cfg.get("epochs", 6),
            lr=cfg.get("lr", 1e-3),
            weight_decay=cfg.get("weight_decay", 0.0),
            patience=cfg.get("patience", 2),
            grad_clip=cfg.get("grad_clip", 1.0),
        )

        test_m = test_model_tfidfseq(model, test_loader, device)

        results.append({**cfg,
                        "test_loss": test_m["loss"],
                        "test_acc": test_m["acc"],
                        "test_f1": test_m["f1"]})
    return results
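Both `run_grid_tfidfseq` and the later `run_grid` record only test metrics for each configuration, so picking the best row from those tables would use the test set for model selection. A common alternative is to choose the configuration by its best validation F1, which is already available in the `history` returned by the fit functions, and report test metrics only for that choice. The sketch below assumes each result row carries a `best_val_f1` key, which the notebook's grid functions do not currently store; both helper names and the two rows are hypothetical:

```python
def best_val_f1(history):
    """Best validation F1 over the epochs logged by fit_model / fit_model_tfidfseq."""
    return max(row["val_f1"] for row in history)

def select_by_validation(results, key="best_val_f1"):
    """Return the config with the highest validation score (ties: the first one wins)."""
    if not results:
        raise ValueError("results is empty")
    return max(results, key=lambda row: row[key])

# Hypothetical rows; in practice best_val_f1 would be computed from each run's history
rows = [
    {"hidden_dim": 128, "lr": 1e-3, "best_val_f1": 0.89},
    {"hidden_dim": 256, "lr": 1e-3, "best_val_f1": 0.91},
]
print(select_by_validation(rows)["hidden_dim"])  # 256
```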

In [36]:
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class SeqDataset(Dataset):
    def __init__(self, df, x_col="seq_pad", y_col="label"):
        self.X = df[x_col].tolist()
        self.y = df[y_col].astype(int).tolist()

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = torch.tensor(self.X[idx], dtype=torch.long)
        y = torch.tensor(self.y[idx], dtype=torch.float32)
        return x, y

def make_loaders(train_df, val_df, test_df, batch_size=64, num_workers=0):
    train_loader = DataLoader(
        SeqDataset(train_df), batch_size=batch_size, shuffle=True,
        num_workers=num_workers, pin_memory=True
    )
    val_loader = DataLoader(
        SeqDataset(val_df), batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=True
    )
    test_loader = DataLoader(
        SeqDataset(test_df), batch_size=batch_size, shuffle=False,
        num_workers=num_workers, pin_memory=True
    )
    return train_loader, val_loader, test_loader


#Metrics
@torch.no_grad()
def logits_to_metrics(logits: torch.Tensor, y: torch.Tensor, threshold=0.5):
    """
    logits: [N] (raw outputs before sigmoid)
    y     : [N] (0/1 floats)
    """
    probs = torch.sigmoid(logits)
    preds = (probs >= threshold).long()
    y_int = y.long()

    acc = (preds == y_int).float().mean().item()

    # confusion matrix counts: TP, FP, TN, FN
    tp = ((preds == 1) & (y_int == 1)).sum().item()
    fp = ((preds == 1) & (y_int == 0)).sum().item()
    tn = ((preds == 0) & (y_int == 0)).sum().item()
    fn = ((preds == 0) & (y_int == 1)).sum().item()

    precision = tp / (tp + fp + 1e-12)
    recall    = tp / (tp + fn + 1e-12)
    f1        = 2 * precision * recall / (precision + recall + 1e-12)

    return {
        "acc": acc,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cm": np.array([[tn, fp],[fn, tp]], dtype=int)
    }



def train_one_epoch(model, loader, optimizer, criterion, device, grad_clip=None):
    model.train()
    total_loss = 0.0

    for xb, yb in loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        optimizer.zero_grad(set_to_none=True)
        logits = model(xb)
        logits = logits.view(-1)
        loss = criterion(logits, yb)

        loss.backward()
        if grad_clip is not None:
            torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)

        optimizer.step()
        total_loss += loss.item() * xb.size(0)

    return total_loss / len(loader.dataset)


@torch.no_grad()
def eval_one_epoch(model, loader, criterion, device):
    model.eval()
    total_loss = 0.0
    all_logits = []
    all_y = []

    for xb, yb in loader:
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)

        logits = model(xb).view(-1)
        loss = criterion(logits, yb)

        total_loss += loss.item() * xb.size(0)
        all_logits.append(logits.detach().cpu())
        all_y.append(yb.detach().cpu())

    all_logits = torch.cat(all_logits, dim=0)
    all_y = torch.cat(all_y, dim=0)

    metrics = logits_to_metrics(all_logits, all_y)
    metrics["loss"] = total_loss / len(loader.dataset)
    return metrics


# Full fit loop (with early stopping on val F1)
def fit_model(
    model,
    train_loader,
    val_loader,
    device,
    epochs=6,
    lr=1e-3,
    weight_decay=0.0,
    patience=2,
    grad_clip=1.0,
    pos_weight=None,  # set if dataset is imbalanced
):
    model = model.to(device)

    if pos_weight is not None:
        criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([pos_weight], device=device))
    else:
        criterion = nn.BCEWithLogitsLoss()

    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

    history = []
    best_state = None
    best_f1 = -1.0
    bad_epochs = 0

    for epoch in range(1, epochs + 1):
        train_loss = train_one_epoch(model, train_loader, optimizer, criterion, device, grad_clip=grad_clip)
        val_metrics = eval_one_epoch(model, val_loader, criterion, device)

        row = {
            "epoch": epoch,
            "train_loss": train_loss,
            "val_loss": val_metrics["loss"],
            "val_acc": val_metrics["acc"],
            "val_precision": val_metrics["precision"],
            "val_recall": val_metrics["recall"],
            "val_f1": val_metrics["f1"],
        }
        history.append(row)

        print(
            f"Epoch {epoch}/{epochs} | "
            f"train loss {train_loss:.4f} | "
            f"val loss {val_metrics['loss']:.4f} | "
            f"val acc {val_metrics['acc']:.4f} | "
            f"val f1 {val_metrics['f1']:.4f}"
        )

        if val_metrics["f1"] > best_f1:
            best_f1 = val_metrics["f1"]
            best_state = {k: v.detach().cpu().clone() for k, v in model.state_dict().items()}
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                print(f"Early stopping (no val F1 improvement for {patience} epochs).")
                break

    if best_state is not None:
        model.load_state_dict(best_state)

    return model, history


@torch.no_grad()
def test_model(model, test_loader, device):
    criterion = nn.BCEWithLogitsLoss()
    metrics = eval_one_epoch(model.to(device), test_loader, criterion, device)
    print("\nTEST:")
    print(f"loss={metrics['loss']:.4f} acc={metrics['acc']:.4f} "
          f"prec={metrics['precision']:.4f} rec={metrics['recall']:.4f} f1={metrics['f1']:.4f}")
    print("Confusion matrix [[TN, FP],[FN, TP]]:\n", metrics["cm"])
    return metrics




TRAINING

In [37]:

#GRU with word2vec embeddings
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = GRU_W2V(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)


def run_grid(make_model_fn, train_loader, val_loader, test_loader, device, grid):
    results = []
    for cfg in grid:
        print("\n" + "="*70)
        print("CONFIG:", cfg)

        model = make_model_fn(cfg)
        model, _ = fit_model(
            model, train_loader, val_loader, device,
            epochs=cfg.get("epochs", 6),
            lr=cfg.get("lr", 1e-3),
            patience=cfg.get("patience", 2),
            grad_clip=cfg.get("grad_clip", 1.0),
            weight_decay=cfg.get("weight_decay", 0.0),
        )
        test_m = test_model(model, test_loader, device)

        results.append({
            **cfg,
            "test_loss": test_m["loss"],
            "test_acc": test_m["acc"],
            "test_precision": test_m["precision"],
            "test_recall": test_m["recall"],
            "test_f1": test_m["f1"],
        })
    return results


grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: GRU_W2V(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.6622 | val loss 0.6456 | val acc 0.6201 | val f1 0.4991
Epoch 2/6 | train loss 0.4288 | val loss 0.3245 | val acc 0.8594 | val f1 0.8471
Epoch 3/6 | train loss 0.2490 | val loss 0.2508 | val acc 0.8971 | val f1 0.8969
Epoch 4/6 | train loss 0.1600 | val loss 0.2630 | val acc 0.9011 | val f1 0.9026
Epoch 5/6 | train loss 0.0955 | val loss 0.3348 | val acc 0.8895 | val f1 0.8855
Epoch 6/6 | train loss 0.0535 | val loss 0.4382 | val acc 0.8925 | val f1 0.8904
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.2577 acc=0.9011 prec=0.8895 rec=0.9160 f1=0.9026
Confusion matrix [[TN, FP],[FN, TP]]:
 [[4431  569]
 [ 420 4580]]
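As a sanity check on `logits_to_metrics`, the reported test scores can be recomputed by hand from the confusion matrix above, with label 1 (positive) as the positive class. The helper name `metrics_from_cm` is hypothetical; it uses the identity F1 = 2TP / (2TP + FP + FN), which equals 2PR / (P + R):

```python
def metrics_from_cm(tn, fp, fn, tp):
    """Accuracy, precision, recall and F1 for the positive class from confusion-matrix counts."""
    total = tn + fp + fn + tp
    acc = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # same as 2PR / (P + R)
    return acc, precision, recall, f1

# Counts from the GRU + Word2Vec test confusion matrix [[TN, FP], [FN, TP]]
acc, prec, rec, f1 = metrics_from_cm(tn=4431, fp=569, fn=420, tp=4580)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")
# acc=0.9011 prec=0.8895 rec=0.9160 f1=0.9026
```

The values match the printed TEST line, and recall (0.916) exceeding precision (0.8895) reflects that this model makes more false positives (569) than false negatives (420).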

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.6180 | val loss 0.5026 | val acc 0.7909 | val f1 0.7892
Epoch 2/6 | train loss 0.4153 | val loss 0.2905 | val acc 0.8790 | val f1 0.8810
Epoch 3/6 | train loss 0.2425 | val loss 0.2513 | val acc 0.8935 | val f1 0.8927
Epoch 4/6 | train loss 0.1571 | va

In [38]:
import pandas as pd
pd.DataFrame(results)

   hidden_dim  lr      epochs  test_loss  test_acc  test_precision  test_recall  test_f1   weight_decay
0  128         0.001   6       0.250119   0.8999    0.90191         0.8974       0.899649  NaN
1  256         0.001   6       0.245872   0.8974    0.900443        0.8936       0.897009  NaN
2  128         0.0005  6       0.266774   0.8925    0.876896        0.9132       0.89468   NaN
3  128         0.001   6       0.266466   0.8989    0.879688        0.9242       0.901395  1e-05


In [39]:
#GRU with glove
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = GRU_GloVe(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)



grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: GRU_GloVe(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.6280 | val loss 0.4138 | val acc 0.8411 | val f1 0.8469
Epoch 2/6 | train loss 0.3107 | val loss 0.2622 | val acc 0.8940 | val f1 0.8978
Epoch 3/6 | train loss 0.1828 | val loss 0.2536 | val acc 0.8950 | val f1 0.8948
Epoch 4/6 | train loss 0.1123 | val loss 0.3002 | val acc 0.8880 | val f1 0.8926
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.2700 acc=0.8898 prec=0.8680 rec=0.9194 f1=0.8930
Confusion matrix [[TN, FP],[FN, TP]]:
 [[4301  699]
 [ 403 4597]]

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.6389 | val loss 0.4918 | val acc 0.7939 | val f1 0.8142
Epoch 2/6 | train loss 0.3436 | val loss 0.2730 | val acc 0.8870 | val f1 0.8857
Epoch 3/6 | train loss 0.2000 | val loss 0.2741 | val acc 0.8855 | val f1 0.8915
Epoch 4/6 | train loss 0.1258 | val loss 0.2813 | val acc 0.8882 | val f1 0.8868
Epoch 5/6 | train loss 0.0732 | val loss 0.3487 | val acc 0.8856 | val f1 0.8853
Early stopping (no val F1 improvem

In [40]:
pd.DataFrame(results)

   hidden_dim  lr      epochs  test_loss  test_acc  test_precision  test_recall  test_f1   weight_decay
0  128         0.001   6       0.27527    0.8894    0.852973        0.941        0.894827  NaN
1  256         0.001   6       0.259081   0.899     0.902705        0.8944       0.898533  NaN
2  128         0.0005  6       0.277446   0.8925    0.893997        0.8906       0.892295  NaN
3  128         0.001   6       0.270923   0.891     0.871532        0.9172       0.893783  1e-05


In [41]:
#RNN WITH W2V
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = RNN_W2V(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)


grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: RNN_W2V(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.6940 | val loss 0.7010 | val acc 0.4990 | val f1 0.5613
Epoch 2/6 | train loss 0.6896 | val loss 0.6845 | val acc 0.5470 | val f1 0.6565
Epoch 3/6 | train loss 0.6910 | val loss 0.6877 | val acc 0.5359 | val f1 0.6318
Epoch 4/6 | train loss 0.6784 | val loss 0.6504 | val acc 0.6480 | val f1 0.6393
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.6877 acc=0.5368 prec=0.5225 rec=0.8562 f1=0.6489
Confusion matrix [[TN, FP],[FN, TP]]:
 [[1087 3913]
 [ 719 4281]]

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.6916 | val loss 0.6866 | val acc 0.5246 | val f1 0.3424
Epoch 2/6 | train loss 0.6822 | val loss 0.6805 | val acc 0.5476 | val f1 0.3776
Epoch 3/6 | train loss 0.6747 | val loss 0.6793 | val acc 0.5431 | val f1 0.5333
Epoch 4/6 | train loss 0.6574 | val loss 0.6772 | val acc 0.6087 | val f1 0.4720
Epoch 5/6 | train loss 0.6132 | val loss 0.6452 | val acc 0.6647 | val f1 0.6843
Epoch 6/6 | train loss 0.5819 | va

In [42]:
pd.DataFrame(results)

   hidden_dim      lr  epochs  test_loss  test_acc  test_precision  test_recall   test_f1  weight_decay
0         128   0.001       6   0.649062    0.6614        0.644107       0.7214  0.680566           NaN
1         256   0.001       6   0.690358    0.5123        0.509867       0.6356  0.565833           NaN
2         128  0.0005       6   0.608887    0.7055        0.676093        0.789  0.728196           NaN
3         128   0.001       6   0.663791    0.6186        0.616183        0.629  0.622526         1e-05


In [43]:
#RNN WITH GLOVE
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = RNN_GloVe(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)


grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: RNN_GloVe(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.6936 | val loss 0.6948 | val acc 0.5115 | val f1 0.0995
Epoch 2/6 | train loss 0.6929 | val loss 0.6927 | val acc 0.5067 | val f1 0.3516
Epoch 3/6 | train loss 0.6882 | val loss 0.6939 | val acc 0.5139 | val f1 0.6447
Epoch 4/6 | train loss 0.6718 | val loss 0.7037 | val acc 0.5090 | val f1 0.1717
Epoch 5/6 | train loss 0.6411 | val loss 0.7027 | val acc 0.5268 | val f1 0.5467
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.6919 acc=0.5131 prec=0.5075 rec=0.8820 f1=0.6443
Confusion matrix [[TN, FP],[FN, TP]]:
 [[ 721 4279]
 [ 590 4410]]

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.6914 | val loss 0.6955 | val acc 0.5045 | val f1 0.6668
Epoch 2/6 | train loss 0.6905 | val loss 0.6911 | val acc 0.5335 | val f1 0.6057
Epoch 3/6 | train loss 0.6886 | val loss 0.6986 | val acc 0.5312 | val f1 0.5447
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.6955 acc=0.5040 prec=0.5020 rec=0.9922 f1=0.6667

In [44]:
pd.DataFrame(results)

   hidden_dim      lr  epochs  test_loss  test_acc  test_precision  test_recall   test_f1  weight_decay
0         128   0.001       6   0.695475     0.504        0.502024       0.9922  0.666711           NaN
1         256   0.001       6   0.691791    0.5157        0.509101       0.8782   0.64455           NaN
2         128  0.0005       6   0.623915    0.6978        0.670694       0.7772   0.72003           NaN
3         128   0.001       6   0.692689    0.5089        0.506115       0.7366  0.599984         1e-05
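The first RNN_GloVe row (hidden_dim=128, lr=0.001) reports test_f1 of about 0.667 with recall 0.9922 but accuracy of only 0.504. This pattern is what a near-constant "always positive" predictor produces. The confusion matrices above show the test set is balanced (5000 reviews per class), and the short arithmetic check below uses that fact. It is an illustration of the metric, not output from the notebook.

```python
# Balanced test set: 5000 negative and 5000 positive reviews, as in the
# confusion matrices printed above.
n_neg, n_pos = 5000, 5000

# A degenerate classifier that predicts "positive" for every review.
tp, fn = n_pos, 0
fp, tn = n_neg, 0

always_pos_acc = (tp + tn) / (n_neg + n_pos)
always_pos_f1 = 2 * tp / (2 * tp + fp + fn)   # equivalent to 2PR / (P + R)

print(f"accuracy={always_pos_acc:.4f}  f1={always_pos_f1:.4f}")
# accuracy=0.5000  f1=0.6667
```

Because this trivial baseline already reaches F1 = 2/3, F1 values around 0.65 for the vanilla RNNs should be read together with accuracy and the confusion matrix. Here they indicate a model that has largely collapsed to predicting the positive class.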


In [45]:
#BIGRU WITH W2V
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = BiGRU_W2V(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)


grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: BiGRU_W2V(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.4669 | val loss 0.3020 | val acc 0.8740 | val f1 0.8757
Epoch 2/6 | train loss 0.2758 | val loss 0.2541 | val acc 0.8955 | val f1 0.8938
Epoch 3/6 | train loss 0.1819 | val loss 0.2670 | val acc 0.8917 | val f1 0.8876
Epoch 4/6 | train loss 0.1084 | val loss 0.3353 | val acc 0.8820 | val f1 0.8894
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.2637 acc=0.8920 prec=0.9087 rec=0.8716 f1=0.8898
Confusion matrix [[TN, FP],[FN, TP]]:
 [[4562  438]
 [ 642 4358]]

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.4897 | val loss 0.3218 | val acc 0.8646 | val f1 0.8594
Epoch 2/6 | train loss 0.2725 | val loss 0.2588 | val acc 0.8955 | val f1 0.8998
Epoch 3/6 | train loss 0.1832 | val loss 0.2690 | val acc 0.8972 | val f1 0.9006
Epoch 4/6 | train loss 0.1081 | val loss 0.2733 | val acc 0.8989 | val f1 0.9001
Epoch 5/6 | train loss 0.0581 | val loss 0.3802 | val acc 0.8920 | val f1 0.8920
Early stopping (no val F1 improvem

In [46]:
pd.DataFrame(results)

   hidden_dim      lr  epochs  test_loss  test_acc  test_precision  test_recall   test_f1  weight_decay
0         128   0.001       6   0.270001    0.8949        0.872617       0.9248  0.897951           NaN
1         256   0.001       6   0.259355    0.8967        0.903561       0.8882  0.895814           NaN
2         128  0.0005       6   0.272611    0.8905        0.865568       0.9246  0.894111           NaN
3         128   0.001       6   0.265342    0.8965        0.871603         0.93  0.899855         1e-05


In [47]:
#BIGRU WITH GLOVE
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, test_loader = make_loaders(train_df, val_df, test_df, batch_size=64)

model = BiGRU_GloVe(hidden_dim=128)
model, history = fit_model(model, train_loader, val_loader, device, epochs=6, lr=1e-3, patience=2)
test_metrics = test_model(model, test_loader, device)


grid = [
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 256, "lr": 1e-3, "epochs": 6},
    {"hidden_dim": 128, "lr": 5e-4, "epochs": 6},
    {"hidden_dim": 128, "lr": 1e-3, "epochs": 6, "weight_decay": 1e-5},
]
results = run_grid(
    make_model_fn=lambda cfg: BiGRU_GloVe(hidden_dim=cfg["hidden_dim"]),
    train_loader=train_loader, val_loader=val_loader, test_loader=test_loader,
    device=device, grid=grid
)


Epoch 1/6 | train loss 0.4598 | val loss 0.2928 | val acc 0.8767 | val f1 0.8794
Epoch 2/6 | train loss 0.2362 | val loss 0.2664 | val acc 0.8881 | val f1 0.8852
Epoch 3/6 | train loss 0.1497 | val loss 0.2885 | val acc 0.8926 | val f1 0.8933
Epoch 4/6 | train loss 0.0841 | val loss 0.3650 | val acc 0.8798 | val f1 0.8846
Epoch 5/6 | train loss 0.0440 | val loss 0.4632 | val acc 0.8780 | val f1 0.8814
Early stopping (no val F1 improvement for 2 epochs).

TEST:
loss=0.2888 acc=0.8959 prec=0.8958 rec=0.8960 f1=0.8959
Confusion matrix [[TN, FP],[FN, TP]]:
 [[4479  521]
 [ 520 4480]]

CONFIG: {'hidden_dim': 128, 'lr': 0.001, 'epochs': 6}
Epoch 1/6 | train loss 0.4339 | val loss 0.3549 | val acc 0.8469 | val f1 0.8622
Epoch 2/6 | train loss 0.2314 | val loss 0.2623 | val acc 0.8930 | val f1 0.8923
Epoch 3/6 | train loss 0.1474 | val loss 0.2938 | val acc 0.8866 | val f1 0.8893
Epoch 4/6 | train loss 0.0823 | val loss 0.3342 | val acc 0.8842 | val f1 0.8826
Early stopping (no val F1 improvem

In [48]:
pd.DataFrame(results)

   hidden_dim      lr  epochs  test_loss  test_acc  test_precision  test_recall   test_f1  weight_decay
0         128   0.001       6   0.263853    0.8961         0.90609       0.8838  0.894806           NaN
1         256   0.001       6   0.271467    0.8925        0.886852       0.8998  0.893279           NaN
2         128  0.0005       6   0.279379    0.8863        0.903152       0.8654  0.883873           NaN
3         128   0.001       6   0.262756    0.8917        0.889906        0.894  0.891949         1e-05


#CONCLUSION:

TF-IDF vectors represent how important each token is within a document, but they do not encode semantic similarity between words. Unlike GloVe and Word2Vec, which place similar words close together in embedding space, TF-IDF treats every word as an independent dimension. As a result, the recurrent models using TF-IDF-weighted embeddings struggled to capture deeper contextual meaning and reached lower F1-scores than their Word2Vec and GloVe counterparts.
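Here is a toy illustration of this point, using three made-up sentences rather than IMDB reviews. Two phrases with the same meaning but no shared words get a TF-IDF cosine similarity of exactly zero, while two phrases with opposite sentiment that share words score much higher. The IDF formula ln((1 + n) / (1 + df)) + 1 is one common smoothed variant and is an assumption here; the notebook's own TF-IDF settings are not shown in this section.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not taken from the IMDB data).
toy_docs = ["a great movie", "an excellent film", "a boring movie"]
toy_tokens = [doc.split() for doc in toy_docs]
toy_vocab = sorted({tok for doc in toy_tokens for tok in doc})
n_docs = len(toy_tokens)

# Document frequency and a smoothed IDF: ln((1 + n) / (1 + df)) + 1.
doc_freq = Counter(tok for doc in toy_tokens for tok in set(doc))
toy_idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in toy_vocab}

def tfidf_vector(tokens):
    """Term frequency (count / length) times IDF, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[tok] / len(tokens) * toy_idf[tok] for tok in toy_vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vecs = [tfidf_vector(doc) for doc in toy_tokens]
print("'a great movie' vs 'an excellent film':", round(cosine(vecs[0], vecs[1]), 3))
print("'a great movie' vs 'a boring movie':   ", round(cosine(vecs[0], vecs[2]), 3))
```

The first pair scores 0.0 despite meaning the same thing, and the second scores about 0.54 because it shares "a" and "movie". Pretrained embeddings, by contrast, are trained so that words such as "great" and "excellent" end up close together.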


Increasing the hidden dimension gives the model a larger representational capacity for capturing patterns in the text. Smaller hidden sizes sometimes led to underfitting, while larger sizes generally improved training performance. However, a larger hidden size did not reliably carry over to unseen data and could lead to overfitting: in the grids above, moving from hidden_dim=128 to 256 raised test F1 only for GRU_GloVe (0.895 → 0.899) and lowered it for the other four models, most sharply for RNN_W2V (0.681 → 0.566).
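To put "larger representational capacity" into numbers, the sketch below counts the weights of a single recurrent layer. It assumes that each model wraps a one-layer PyTorch `nn.RNN` or `nn.GRU` and that the embeddings are 300-dimensional. Neither detail is shown in this section, so `EMB_DIM` is an assumption. The embedding table and the classifier head are not counted.

```python
def recurrent_params(input_dim, hidden_dim, gates):
    """Weights and biases of one recurrent layer in one direction, using the
    PyTorch nn.RNN / nn.GRU layout: weight_ih (gates*h x d), weight_hh
    (gates*h x h), bias_ih and bias_hh (gates*h each).
    gates = 1 for a vanilla RNN, 3 for a GRU."""
    return gates * hidden_dim * (input_dim + hidden_dim) + 2 * gates * hidden_dim

EMB_DIM = 300  # assumption: embedding size; the notebook's actual value is not shown here

for h in (128, 256):
    rnn = recurrent_params(EMB_DIM, h, gates=1)
    gru = recurrent_params(EMB_DIM, h, gates=3)
    print(f"hidden_dim={h}: RNN {rnn:,} | GRU {gru:,} | BiGRU {2 * gru:,}")
```

Under these assumptions, doubling hidden_dim from 128 to 256 takes a GRU layer from 165,120 to 428,544 parameters (about 2.6x), because the h x h recurrent matrices grow quadratically. That extra capacity is also what makes overfitting easier on a fixed training set.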

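The learning rate strongly influenced training stability. Higher learning rates allowed the model to converge faster but sometimes made validation performance unstable. Lower learning rates generally led to more stable training and, for some models, better generalization, although convergence was slower. The effect was largest for the vanilla RNNs: lowering lr from 1e-3 to 5e-4 gave the best test F1 in both RNN grids (0.728 with Word2Vec, 0.720 with GloVe), whereas for the GRU and BiGRU models it lowered test F1 slightly (by at most about 0.011).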
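The stability effect can be seen on the simplest possible loss. The sketch below runs plain gradient descent on the toy quadratic f(w) = 0.5 * c * w². Each step multiplies w by (1 - lr * c), so updates shrink smoothly when lr is small, overshoot and flip sign when lr is close to 2/c, and diverge once lr > 2/c. This is only an intuition pump: the notebook's models have far more complex loss surfaces, and adaptive optimizers such as Adam rescale steps per parameter.

```python
def descend(lr, curvature=1.0, w0=1.0, steps=20):
    """Plain gradient descent on f(w) = 0.5 * curvature * w**2.
    The gradient is curvature * w, so every step multiplies w by (1 - lr * curvature)."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w -= lr * curvature * w
        trajectory.append(w)
    return trajectory

for lr in (0.1, 0.5, 1.9, 2.1):
    traj = descend(lr)
    print(f"lr={lr}: w after 1 step={traj[1]:+.3f}, after 20 steps={traj[20]:+.3f}")
```

With curvature 1, lr=0.1 converges slowly, lr=0.5 converges quickly, lr=1.9 oscillates around the minimum while slowly settling, and lr=2.1 blows up.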

Introducing dropout helped reduce overfitting by preventing the model from relying too heavily on specific hidden units. Without dropout, training performance was higher but validation performance sometimes decreased. Moderate dropout values (e.g., 0.2–0.3) generally improved generalization.
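Here is a minimal sketch of the dropout mechanism on a list of hypothetical hidden activations. It uses the "inverted" formulation that PyTorch's `nn.Dropout` also follows. During training, each unit is zeroed with probability p and the surviving units are scaled by 1 / (1 - p), so the expected activation is unchanged. At evaluation time, the input passes through untouched. `inverted_dropout` is an illustrative helper, not code from the notebook.

```python
import random

def inverted_dropout(values, p, training, rng):
    """Zero each value with probability p during training and scale the
    survivors by 1 / (1 - p), so the expected output equals the input.
    In evaluation mode the values pass through unchanged."""
    if not training or p == 0.0:
        return list(values)
    keep_prob = 1.0 - p
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
toy_hidden = [0.5, -1.2, 0.8, 2.0, -0.3, 1.1]  # hypothetical hidden activations

print("train:", inverted_dropout(toy_hidden, p=0.3, training=True, rng=rng))
print("eval: ", inverted_dropout(toy_hidden, p=0.3, training=False, rng=rng))

# Averaged over many units, training-mode dropout preserves the mean activation.
ones = inverted_dropout([1.0] * 100_000, p=0.3, training=True, rng=rng)
print("mean after dropout:", round(sum(ones) / len(ones), 3))
```

Because a different random subset of units is dropped at every step, no single hidden unit can be relied on. This is the mechanism behind the reduced overfitting described above.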

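Training loss kept falling as the number of epochs increased, but in most runs validation F1 peaked within the first three epochs and then stopped improving. In the first BiGRU_GloVe run, for example, training loss fell from 0.4598 to 0.0440 over five epochs while validation loss rose from 0.2664 at epoch 2 to 0.4632 at epoch 5. This indicates overfitting, which is why early stopping on validation F1 (with a patience of 2 epochs) was important to prevent performance degradation on unseen data.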
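The stopping rule implied by the log message "no val F1 improvement for 2 epochs" can be replayed on the logged validation F1 values. The function below is a re-implementation of that rule for illustration, not the notebook's `fit_model`. Whether `fit_model` also restores the best weights is not shown in this section.

```python
def early_stop_epoch(val_f1_history, patience=2):
    """Return (best_epoch, stop_epoch), both 1-based, for a rule that stops once
    val F1 has not improved for `patience` consecutive epochs."""
    best_f1 = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, f1 in enumerate(val_f1_history, start=1):
        if f1 > best_f1:
            best_f1, best_epoch = f1, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_f1_history)

# Val F1 per epoch from the first BiGRU_GloVe run logged above.
bigru_glove_val_f1 = [0.8794, 0.8852, 0.8933, 0.8846, 0.8814]
print(early_stop_epoch(bigru_glove_val_f1))  # (3, 5)
```

The result (best epoch 3, stop after epoch 5) matches the log. The same rule applied to the first GRU_GloVe run, [0.8469, 0.8978, 0.8948, 0.8926], gives (2, 4), again matching the point where training stopped.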

Weight decay helped control model complexity by penalizing large weights. Small values improved generalization slightly, while larger values caused underfitting by restricting the model too much.
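The sketch below shows how an L2 penalty "penalizes large weights", using plain SGD on a single weight. Adding (weight_decay / 2) * w² to the loss adds weight_decay * w to the gradient, which pulls w towards zero at every step. The data gradient is set to zero to isolate the penalty's effect. This is an illustration, not the notebook's optimizer: if `fit_model` passes `weight_decay` to `torch.optim.Adam` (not shown here), the same term is added to the gradient before Adam's per-parameter rescaling, whereas `torch.optim.AdamW` applies the decay directly to the weights.

```python
def sgd_step(w, grad, lr, weight_decay):
    """One SGD step on loss + (weight_decay / 2) * w**2: the L2 term adds
    weight_decay * w to the gradient, pulling w towards zero."""
    return w - lr * (grad + weight_decay * w)

w_plain, w_decayed = 1.0, 1.0
for _ in range(1000):
    # Zero data gradient isolates the effect of the penalty itself.
    w_plain = sgd_step(w_plain, grad=0.0, lr=0.1, weight_decay=0.0)
    w_decayed = sgd_step(w_decayed, grad=0.0, lr=0.1, weight_decay=1e-2)

print(f"without decay: {w_plain:.4f}  with decay: {w_decayed:.4f}")
```

Each step here multiplies w by (1 - lr * weight_decay) = 0.999, so after 1000 steps the weight has shrunk to about 0.368. The 1e-5 used in the grids above is a much weaker penalty. For the GRU and BiGRU models, it moved test F1 by at most about 0.003 in either direction.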

Smaller batch sizes introduced more variability in gradient updates, which sometimes improved generalization. Larger batch sizes produced more stable training but did not always lead to better validation performance.
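The link between batch size and gradient variability can be illustrated with synthetic numbers. A minibatch gradient is an average over the batch, so its standard deviation shrinks roughly like σ / √B. The per-example "gradients" below are made-up Gaussian samples (mean 0.5, σ = 1), not values measured from the notebook's models, which use batch_size=64 in `make_loaders`.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical per-example gradients for a single weight: true mean 0.5, noisy.
per_example_grads = [rng.gauss(0.5, 1.0) for _ in range(50_000)]

def minibatch_grad_spread(batch_size, n_batches=2_000):
    """Standard deviation of the minibatch-mean gradient across random batches."""
    means = [statistics.fmean(rng.sample(per_example_grads, batch_size))
             for _ in range(n_batches)]
    return statistics.stdev(means)

for b in (8, 64, 256):
    print(f"batch_size={b:>3}: std of minibatch gradient = {minibatch_grad_spread(b):.4f}")
```

The spreads come out close to 1/√8 ≈ 0.354, 1/√64 = 0.125 and 1/√256 ≈ 0.063. Smaller batches therefore take noisier steps. Whether that noise helps generalization is an empirical question, which is why the text above says it only "sometimes" did.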