# Q2: Consider the same prediction task as in Q1. Now, build another prediction model using the LSTM
(Long Short-Term Memory) neural network for the same task. You should continue to use the
resampled dataset from Q1. You can use the tutorial
(INFO_617_Week_11_LSTM_for_Text_Classification.ipynb) as a reference. First, build a
prediction model that uses unidirectional LSTM layer(s). Then, change the LSTM setting to
bi-directional and replicate the training and evaluation process.
You can experiment with different hyper-parameter settings and pick a well-performing one to
use. No formal hyper-parameter tuning is required. Compare the performance of the
unidirectional LSTM and bi-directional LSTM prediction models using four metrics – accuracy,
precision, recall, and F1-score.


 Import essential libraries for reproducibility, model building (PyTorch),
 evaluation (sklearn), and numerical operations (NumPy).
 Together they provide seeding for consistent results and the tools used for training and reporting performance.


In [None]:
#Imports & Reproducibility
import random
import time
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from sklearn.metrics import classification_report

In [None]:
# For Google Colab integration: mount Google Drive to read the dataset
from google.colab import drive
drive.mount('/content/drive')

# For data manipulation
import pandas as pd

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# import data as dataframe
file_path = '/content/drive/MyDrive/text/INFO 617 Mental Health QA_LIWC (1).csv'
df = pd.read_csv(file_path)

In [None]:
# calling head() method
df.head()

Unnamed: 0,QID,Question_English,Answer_English,Usefulness_vote,Comment_count,Received_Bonus_Yes_No,Segment,WC,Analytic,Clout,...,nonflu,filler,AllPunc,Period,Comma,QMark,Exclam,Apostro,OtherP,Emoji
0,100788023,Watching the elderly matchmaking program was v...,Hello. Let's start by talking about young peop...,2,0,0,1,359,52.91,83.28,...,0.0,0.0,18.94,6.96,8.36,0.84,0.28,2.23,0.28,0.0
1,100788009,Falling in love with someone else and breaking...,Hello! Sending you a virtual hug first. Thumbs...,2,0,0,1,199,72.83,96.9,...,0.0,0.0,12.06,5.03,5.03,0.0,0.5,0.5,1.01,0.0
2,100788013,I have been suffering from insomnia for half a...,Hello! ☆ I saw the issue you raised about inso...,8,2,1,1,611,37.89,75.35,...,0.0,0.0,15.71,4.75,7.36,0.65,0.16,1.8,0.98,0.0
3,100788013,I have been suffering from insomnia for half a...,Hello! ☆ I saw the issue you raised about inso...,8,2,0,1,611,37.89,75.35,...,0.0,0.0,15.71,4.75,7.36,0.65,0.16,1.8,0.98,0.0
4,100788013,I have been suffering from insomnia for half a...,"Hi, I just want to hug you. I can feel the kin...",6,0,0,1,587,82.45,46.1,...,0.0,0.0,16.18,6.3,4.94,1.19,0.17,0.68,2.9,0.0


# Create binary 'Bonus' column: 1 if bonus received (value ≥ 1), else 0


In [None]:
df['Bonus'] = (df['Received_Bonus_Yes_No'] >= 1).astype(int)

 Set random seeds for Python, NumPy, and PyTorch (CPU and GPU) for reproducibility.
 Also select the GPU as the device if one is available, otherwise fall back to the CPU.


In [None]:
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

Device: cuda
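`set_seed()` fixes the Python, NumPy, and PyTorch generators, but GPU runs can still differ slightly because cuDNN may autotune or pick non-deterministic kernels. A minimal sketch of a stricter variant, assuming closer run-to-run agreement is wanted; `set_full_seed` is a hypothetical name, not part of the original notebook. Per the PyTorch documentation, fully deterministic CUDA LSTMs may additionally need `torch.use_deterministic_algorithms(True)` and a `CUBLAS_WORKSPACE_CONFIG` environment variable, which this sketch does not set.

```python
import random

import numpy as np
import torch


def set_full_seed(seed: int = 42) -> None:
    """Hypothetical stricter variant of set_seed() (not from the original notebook)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU generator and all CUDA devices
    torch.cuda.manual_seed_all(seed)  # explicit; safe to call even without a GPU
    # Ask cuDNN for deterministic algorithms and disable autotuning,
    # which can otherwise select different kernels from run to run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


set_full_seed(42)
first = torch.rand(3)
set_full_seed(42)
second = torch.rand(3)
print(torch.equal(first, second))  # True: same seed, same draws
```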


# STEP: Train/Validation/Test Split and Downsampling
 1. Split the dataset 80/10/10 into train/validation/test sets with stratified sampling to preserve the class distribution.
 2. Downsample the majority class (Bonus=0) in the training set to match the minority class (Bonus=1).
 3. Combine and shuffle to create a balanced training dataset.
 4. Print class distribution to confirm balance.


In [None]:
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from collections import Counter

# 1. Train/Val/Test Split with stratification
train_df, temp_df = train_test_split(df, test_size=0.2, stratify=df['Bonus'], random_state=42)
val_df, test_df = train_test_split(temp_df, test_size=0.5, stratify=temp_df['Bonus'], random_state=42)

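# 2. Downsample Bonus = 0 (majority) to match Bonus = 1 (minority).
#    replace=True draws WITH replacement, so some negatives may appear more than once;
#    replace=False would give strict downsampling with the same class counts.
pos = train_df[train_df['Bonus'] == 1]
neg = train_df[train_df['Bonus'] == 0]
neg_downsampled = resample(neg, replace=True, n_samples=len(pos), random_state=42)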

# 3. Combine and shuffle
train_balanced = pd.concat([pos, neg_downsampled]).sample(frac=1, random_state=42)

# 4. Check balance
print("Balanced train distribution:\n", Counter(train_balanced['Bonus']))


Balanced train distribution:
 Counter({0: 15320, 1: 15320})
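The split-and-downsample recipe above can be checked on a toy frame. A minimal sketch, assuming a hypothetical 100-row DataFrame with 20% positives; unlike the notebook cell it uses `replace=False`, so the balanced training set contains no duplicated majority rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Toy stand-in for the Q1 data (hypothetical): 100 rows, 20% positives
toy = pd.DataFrame({"text": [f"answer {i}" for i in range(100)],
                    "Bonus": [1] * 20 + [0] * 80})

# 80/10/10 stratified split, as in the notebook
train, temp = train_test_split(toy, test_size=0.2, stratify=toy["Bonus"], random_state=42)
val, test = train_test_split(temp, test_size=0.5, stratify=temp["Bonus"], random_state=42)

pos = train[train["Bonus"] == 1]
neg = train[train["Bonus"] == 0]
# replace=False: each majority row is drawn at most once (true downsampling)
neg_down = resample(neg, replace=False, n_samples=len(pos), random_state=42)
balanced = pd.concat([pos, neg_down]).sample(frac=1, random_state=42)

print(balanced["Bonus"].value_counts().to_dict())  # equal counts for 0 and 1
print(balanced.index.is_unique)                     # True: no repeated rows
```

Only the training split is balanced; validation and test keep the original class ratio, which matters when reading precision and recall later.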


# STEP: Text Tokenization and Padding
 1. Use Keras Tokenizer to convert text into sequences with a fixed vocabulary size.
 2. Pad or truncate every sequence to `MAX_LEN` tokens (Keras defaults: zeros are added at the front, and longer texts keep their last `MAX_LEN` tokens).
 3. Apply transformation to training, validation, and test sets.


In [None]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Set vocab size and max sequence length
MAX_WORDS = 10000
MAX_LEN = 100

# Initialize and fit tokenizer on the balanced training set
tokenizer = Tokenizer(num_words=MAX_WORDS, oov_token="<OOV>")
tokenizer.fit_on_texts(train_balanced['Answer_English'])

# Function to convert and pad sequences
def seqs(df_):
    sequences = tokenizer.texts_to_sequences(df_['Answer_English'])
    return pad_sequences(sequences, maxlen=MAX_LEN)

# Apply to train, validation, and test
X_train = seqs(train_balanced)
X_val   = seqs(val_df)
X_test  = seqs(test_df)

y_train = train_balanced['Bonus'].values
y_val   = val_df['Bonus'].values
y_test  = test_df['Bonus'].values

print("Shapes:", X_train.shape, X_val.shape, X_test.shape)


Shapes: (30640, 100) (8640, 100) (8640, 100)
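`pad_sequences` defaults to `padding='pre'` and `truncating='pre'`: short answers get zeros in front, and long ones keep only their last `MAX_LEN` tokens. Answers like those in `head()` (199 to 611 words by `WC`) are longer than 100 tokens, so roughly their final 100 tokens are what the model sees. The sketch below is a hypothetical NumPy re-implementation of just those defaults, for illustration; it is not the Keras source.

```python
import numpy as np


def pad_pre(sequences, maxlen, value=0):
    """Hypothetical NumPy re-implementation of pad_sequences' defaults
    (padding='pre', truncating='pre'), for illustration only."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = seq[-maxlen:]            # truncating='pre' keeps the LAST maxlen tokens
        if len(kept):
            out[i, -len(kept):] = kept  # padding='pre' puts zeros in front
    return out


print(pad_pre([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0 5 6 7]
#  [3 4 5 6]]
```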


# STEP: Load GloVe embeddings and build embedding matrix
 1. Download and extract GloVe 300d vectors.
 2. Create a word-to-vector dictionary from GloVe.
 3. Initialize an embedding matrix aligned with the tokenizer’s word index; words not found in GloVe keep zero vectors.
 4. Convert to PyTorch tensor for model use.


In [None]:
# Download GloVe 300d
!wget -q http://nlp.stanford.edu/data/glove.6B.zip
!unzip -q glove.6B.zip glove.6B.300d.txt

# Set embedding dimensions
EMBED_DIM = 300
word2idx = tokenizer.word_index
vocab_size = min(MAX_WORDS, len(word2idx)) + 1

# Load GloVe into a dictionary
emb_index = {}
with open('glove.6B.300d.txt', 'r', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        emb_index[word] = np.asarray(values[1:], dtype='float32')

# Build embedding matrix
embedding_tensor = np.zeros((vocab_size, EMBED_DIM))
for word, idx in word2idx.items():
    if idx < vocab_size and word in emb_index:
        embedding_tensor[idx] = emb_index[word]

embedding_tensor = torch.tensor(embedding_tensor, dtype=torch.float32)
print("Embedding matrix shape:", embedding_tensor.shape)


Embedding matrix shape: torch.Size([10001, 300])
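In the loop above, rows for words missing from GloVe (for example the `<OOV>` token at index 1, which is not a GloVe word) stay as zero vectors and are only learned during fine-tuning. A quick coverage count shows how many rows were actually initialized; `build_embedding_matrix` and the toy vocabulary below are hypothetical.

```python
import numpy as np


def build_embedding_matrix(word_index, emb_index, vocab_size, dim):
    """Hypothetical helper: rows follow the tokenizer's indices, words missing
    from the pretrained vectors stay zero. Returns (matrix, rows_filled)."""
    matrix = np.zeros((vocab_size, dim), dtype="float32")
    hits = 0
    for word, idx in word_index.items():
        if idx < vocab_size and word in emb_index:
            matrix[idx] = emb_index[word]
            hits += 1
    return matrix, hits


# Toy inputs: index 1 is the OOV token, as with Tokenizer(oov_token="<OOV>")
word_index = {"<OOV>": 1, "hello": 2, "hug": 3, "insomnia": 4, "rare": 5}
emb_index = {"hello": np.ones(3), "hug": np.full(3, 2.0), "insomnia": np.full(3, 3.0)}

matrix, hits = build_embedding_matrix(word_index, emb_index, vocab_size=5, dim=3)
coverage = hits / (5 - 1)  # row 0 is reserved for padding
print(matrix.shape, hits, coverage)  # (5, 3) 3 0.75
```

Called as `build_embedding_matrix(word2idx, emb_index, vocab_size, EMBED_DIM)`, it would produce the same values as `embedding_tensor` (before the `torch.tensor` conversion) plus a coverage figure.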


# STEP: Convert padded sequences and labels into PyTorch Datasets and DataLoaders
Enables efficient batching for training and evaluation; only the training loader shuffles.


In [None]:
from torch.utils.data import TensorDataset, DataLoader

BATCH_SIZE = 32

# Create TensorDatasets
train_ds = TensorDataset(torch.LongTensor(X_train), torch.LongTensor(y_train))
val_ds   = TensorDataset(torch.LongTensor(X_val),   torch.LongTensor(y_val))
test_ds  = TensorDataset(torch.LongTensor(X_test),  torch.LongTensor(y_test))

# Create DataLoaders
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
val_loader   = DataLoader(val_ds,   batch_size=BATCH_SIZE)
test_loader  = DataLoader(test_ds,  batch_size=BATCH_SIZE)
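With `drop_last` left at its default of `False`, the final batch simply holds the leftover samples. For the real training set, 30,640 rows give 958 batches, the last one holding 16. A small sketch with hypothetical random data (70 sequences of length 100) shows the batch shapes the model receives; labels are `int64`, which `nn.CrossEntropyLoss` expects for class indices.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for X_train / y_train: 70 padded sequences of length 100
X = np.random.randint(0, 50, size=(70, 100))
y = np.random.randint(0, 2, size=70)

ds = TensorDataset(torch.as_tensor(X, dtype=torch.long),
                   torch.as_tensor(y, dtype=torch.long))
loader = DataLoader(ds, batch_size=32, shuffle=True)

batch_shapes = [tuple(xb.shape) for xb, _ in loader]
print(batch_shapes)  # [(32, 100), (32, 100), (6, 100)]
```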


# STEP: Define the LSTM-based text classification model
 Supports both unidirectional and bidirectional LSTMs, with optional pretrained embeddings.
 The top layer's final hidden state (forward and backward states concatenated in the bi-directional case) passes through dropout and a fully connected layer that outputs one logit per class.


In [None]:
import torch.nn as nn

class LSTM_NLP(nn.Module):
    """LSTM model for text classification."""
    def __init__(self,
                 hidden_dim,
                 output_dim,
                 n_layers=2,
                 bidirectional=False,
                 lstm_dropout=0.5,
                 pretrained_embedding=None,
                 freeze_embedding=False,
                 vocab_size=None,
                 embed_dim=300,
                 fc_dropout=0.5):
        super().__init__()

        # Embedding layer
        if pretrained_embedding is not None:
            self.embedding = nn.Embedding.from_pretrained(
                pretrained_embedding, freeze=freeze_embedding)
            self.embed_dim = pretrained_embedding.size(1)
        else:
            self.embed_dim = embed_dim
            self.embedding = nn.Embedding(vocab_size, self.embed_dim, padding_idx=0, max_norm=5.0)

        # LSTM layer
        self.lstm = nn.LSTM(input_size=self.embed_dim,
                            hidden_size=hidden_dim,
                            num_layers=n_layers,
                            bidirectional=bidirectional,
                            dropout=lstm_dropout,
                            batch_first=True)

        # Fully connected layer
        lstm_output_dim = hidden_dim * 2 if bidirectional else hidden_dim
        self.fc = nn.Linear(lstm_output_dim, output_dim)
        self.dropout = nn.Dropout(fc_dropout)

    def forward(self, input_ids):
        embedded = self.embedding(input_ids).float()  # shape: (B, L, E)
        lstm_out, (hidden, _) = self.lstm(embedded)
        if self.lstm.bidirectional:
            # concatenate last hidden states of both directions
            hidden = torch.cat((hidden[-2], hidden[-1]), dim=1)
        else:
            hidden = hidden[-1]
        hidden = self.dropout(hidden)
        return self.fc(hidden)
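For a bi-directional, multi-layer `nn.LSTM`, `h_n` has shape `(num_layers * 2, batch, hidden)` even with `batch_first=True`, and its last two rows are the top layer's forward and backward final states. The sketch below, using small hypothetical sizes, checks that `hidden[-2]` matches the forward output at the last time step and `hidden[-1]` matches the backward output at the first time step, which is what `torch.cat((hidden[-2], hidden[-1]), dim=1)` relies on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, L, E, H = 4, 7, 10, 8  # batch, sequence length, embedding dim, hidden dim (toy values)
lstm = nn.LSTM(E, H, num_layers=2, bidirectional=True, batch_first=True)
out, (h_n, c_n) = lstm(torch.randn(B, L, E))

print(out.shape)  # torch.Size([4, 7, 16]): both directions concatenated per step
print(h_n.shape)  # torch.Size([4, 4, 8]): (num_layers * 2, batch, hidden)

# h_n[-2]: top layer, forward direction, state after the LAST token
print(torch.allclose(h_n[-2], out[:, -1, :H]))  # True
# h_n[-1]: top layer, backward direction, state after reading back to the FIRST token
print(torch.allclose(h_n[-1], out[:, 0, H:]))   # True
```

One consequence: with the pre-padding used here, the backward direction reads the leading pad tokens last, so for answers shorter than `MAX_LEN` its final state has passed through padding. Packing sequences with `torch.nn.utils.rnn.pack_padded_sequence` is a common way to avoid this; it was not tried in this notebook.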


# STEP: Initialize the LSTM model with the specified architecture and training parameters
Supports pretrained or randomly initialized embeddings and returns the model together with an Adadelta optimizer.


In [None]:
import torch.optim as optim

def initialize_lstm(pretrained_embedding=None,
                    freeze_embedding=False,
                    vocab_size=None,
                    embed_dim=300,
                    hidden_dim=128,
                    output_dim=2,
                    n_layers=2,
                    bidirectional=False,
                    lstm_dropout=0.3,
                    fc_dropout=0.3,
                    learning_rate=0.25):
    model = LSTM_NLP(hidden_dim=hidden_dim,
                     output_dim=output_dim,
                     n_layers=n_layers,
                     bidirectional=bidirectional,
                     lstm_dropout=lstm_dropout,
                     pretrained_embedding=pretrained_embedding,
                     freeze_embedding=freeze_embedding,
                     vocab_size=vocab_size,
                     embed_dim=embed_dim,
                     fc_dropout=fc_dropout).to(device)

    optimizer = optim.Adadelta(model.parameters(), lr=learning_rate, rho=0.95)
    return model, optimizer
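Switching `bidirectional=True` more than doubles the LSTM's weights, because each layer gets a second direction and the second layer's input grows from `hidden_dim` to `2 * hidden_dim`. A worked check with the settings used below (300-d embeddings, `hidden_dim=128`, 2 layers); `lstm_param_count` is a hypothetical helper based on PyTorch's documented weight shapes (`weight_ih`, `weight_hh`, and two bias vectors per layer and direction, each covering 4 gates).

```python
import torch.nn as nn


def lstm_param_count(input_size, hidden, num_layers, bidirectional):
    """Hypothetical closed-form parameter count for torch.nn.LSTM:
    per layer and direction, 4 gates x (W_ih + W_hh + b_ih + b_hh)."""
    dirs = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        in_dim = input_size if layer == 0 else hidden * dirs  # upper layers see both directions
        total += dirs * 4 * (hidden * in_dim + hidden * hidden + 2 * hidden)
    return total


for bi in (False, True):
    formula = lstm_param_count(300, 128, 2, bi)
    actual = sum(p.numel() for p in nn.LSTM(300, 128, num_layers=2, bidirectional=bi).parameters())
    print(bi, formula, actual)
# False 352256 352256
# True 835584 835584
```

The LSTM part grows about 2.4x (835,584 vs 352,256 weights, excluding the 3,000,300 embedding weights both setups share), which is in line with the Bi-LSTM epochs taking roughly 11 s instead of roughly 6 s in the training logs.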


# STEP: Define training and evaluation loops for the LSTM model
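 - `train_lstm()` trains the model for a given number of epochs, logging training loss, validation loss, and validation accuracy, and reports the best validation accuracy seen. It does not save the weights from that best epoch, so the model keeps its final-epoch parameters.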
 - `evaluate()` calculates classification accuracy on any given dataset


In [None]:
import torch.nn.functional as F
from sklearn.metrics import classification_report

loss_fn = nn.CrossEntropyLoss()

def evaluate(model, loader):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            preds = model(xb).argmax(dim=1)
            correct += (preds == yb).sum().item()
            total += yb.size(0)
    return 100 * correct / total

def train_lstm(model, optimizer, train_loader, val_loader=None, epochs=10):
    best_acc = 0.0
    print("Epoch | Train Loss  | Val Loss   | Val Acc  | Time(s)")
    print("-"*60)
    for epoch in range(1, epochs+1):
        t0, train_loss = time.time(), 0.0
        model.train()
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            logits = model(xb)
            loss = loss_fn(logits, yb)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        train_loss /= len(train_loader)

        if val_loader is not None:
            model.eval()
            val_loss, correct = 0.0, 0
            with torch.no_grad():
                for xb, yb in val_loader:
                    xb, yb = xb.to(device), yb.to(device)
                    out = model(xb)
                    val_loss += loss_fn(out, yb).item()
                    preds = out.argmax(dim=1)
                    correct += (preds == yb).sum().item()
            val_loss /= len(val_loader)
            val_acc = 100 * correct / len(val_loader.dataset)
        else:
            val_loss, val_acc = float('nan'), float('nan')

        print(f"{epoch:^5} | {train_loss:^10.4f} | {val_loss:^10.4f} | {val_acc:^8.2f} | {time.time()-t0:^7.2f}")
        best_acc = max(best_acc, val_acc)

    print(f"\n Best Val Acc: {best_acc:.2f}%\n")
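`train_lstm()` records `best_acc` but never saves the corresponding weights, so after training each model holds its final-epoch parameters. A minimal sketch of one way to keep the best epoch, assuming validation accuracy is the selection metric; `BestCheckpoint` is a hypothetical helper, and the results reported in this notebook were produced without it.

```python
import copy

import torch
import torch.nn as nn


class BestCheckpoint:
    """Hypothetical helper: remember the weights from the best validation epoch."""

    def __init__(self):
        self.best_score = float("-inf")
        self.best_state = None
        self.best_epoch = None

    def update(self, model, score, epoch):
        if score > self.best_score:  # NaN compares False, so it never wins
            self.best_score, self.best_epoch = score, epoch
            self.best_state = copy.deepcopy(model.state_dict())  # a snapshot, not a reference

    def restore(self, model):
        if self.best_state is not None:
            model.load_state_dict(self.best_state)


# Toy demonstration with made-up validation scores
model = nn.Linear(2, 1)
ckpt = BestCheckpoint()
for epoch, val_acc in enumerate([61.3, 74.3, 57.2], start=1):
    with torch.no_grad():
        model.weight.fill_(float(epoch))  # stand-in for an optimizer step
    ckpt.update(model, val_acc, epoch)

ckpt.restore(model)
print(ckpt.best_epoch, model.weight.tolist())  # 2 [[2.0, 2.0]]
```

Inside `train_lstm()`, this would mean calling `ckpt.update(model, val_acc, epoch)` after each validation pass and `ckpt.restore(model)` after the loop.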


# STEP: Initialize and train a unidirectional LSTM model with pretrained GloVe embeddings
The embeddings are fine-tuned during training. The model is trained for 10 epochs and validated after each epoch.


In [None]:

uni_lstm, uni_optimizer = initialize_lstm(
    pretrained_embedding=embedding_tensor,
    freeze_embedding=False,
    vocab_size=vocab_size,
    embed_dim=EMBED_DIM,
    hidden_dim=128,
    output_dim=2,
    n_layers=2,
    bidirectional=False,
    lstm_dropout=0.2,
    fc_dropout=0.5,
    learning_rate=0.25
)

# Train the model
train_lstm(uni_lstm, uni_optimizer, train_loader, val_loader, epochs=10)


Epoch | Train Loss  | Val Loss   | Val Acc  | Time(s)
------------------------------------------------------------
  1   |   0.6756   |   0.6192   |  65.02   |  7.63  
  2   |   0.6366   |   0.6602   |  55.56   |  6.05  
  3   |   0.6200   |   0.6252   |  59.00   |  6.34  
  4   |   0.6081   |   0.6423   |  57.64   |  6.59  
  5   |   0.5958   |   0.6252   |  61.08   |  6.38  
  6   |   0.5798   |   0.6231   |  58.95   |  6.14  
  7   |   0.5689   |   0.5671   |  65.43   |  6.41  
  8   |   0.5578   |   0.5399   |  67.80   |  6.18  
  9   |   0.5476   |   0.5296   |  70.28   |  6.48  
 10   |   0.5385   |   0.5330   |  69.06   |  6.25  

 Best Val Acc: 70.28%



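# Uni-LSTM Performance Summary:
 - Best validation accuracy: **70.28%** (epoch 9); the final epoch-10 model, which is the one evaluated on the test set, reached 69.06%
 - Training loss fell steadily from 0.6756 to 0.5385 across all 10 epochs
 - Validation accuracy was unstable in epochs 2 to 6 (55.56% to 61.08%) before reaching 67.80% to 70.28% in epochs 8 to 10; since validation loss also dropped in those later epochs, this looks like noisy early optimization rather than overfitting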

#  Next Step:
 Try a Bi-LSTM to see if capturing context in both directions improves performance further.


# STEP: Initialize and train a bi-directional LSTM model using pretrained GloVe embeddings
 A bi-directional LSTM reads each sequence forward and backward, so its representation draws on both preceding and following tokens.
The embeddings are trainable. Training runs for 10 epochs with validation after each epoch.


In [None]:
# Instantiate Bi-directional LSTM
bi_lstm, bi_optimizer = initialize_lstm(
    pretrained_embedding=embedding_tensor,  # same GloVe embeddings
    freeze_embedding=False,                 # allow embeddings to be fine-tuned
    vocab_size=vocab_size,
    embed_dim=EMBED_DIM,
    hidden_dim=128,
    output_dim=2,
    n_layers=2,
    bidirectional=True,                     # This enables bi-directional LSTM
    lstm_dropout=0.2,
    fc_dropout=0.5,
    learning_rate=0.25
)

# Train Bi-LSTM
train_lstm(bi_lstm, bi_optimizer, train_loader, val_loader, epochs=10)


Epoch | Train Loss  | Val Loss   | Val Acc  | Time(s)
------------------------------------------------------------
  1   |   0.6695   |   0.6283   |  61.34   |  11.09 
  2   |   0.6189   |   0.5891   |  62.40   |  10.92 
  3   |   0.5992   |   0.6210   |  59.59   |  11.12 
  4   |   0.5898   |   0.5216   |  70.73   |  11.25 
  5   |   0.5786   |   0.4905   |  74.26   |  11.30 
  6   |   0.5681   |   0.5589   |  66.05   |  11.20 
  7   |   0.5588   |   0.6585   |  57.18   |  11.01 
  8   |   0.5500   |   0.5350   |  68.50   |  10.96 
  9   |   0.5393   |   0.6475   |  59.53   |  11.04 
 10   |   0.5294   |   0.6213   |  61.02   |  11.07 

 Best Val Acc: 74.26%



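# Bi-LSTM Performance Summary:
 - Best validation accuracy: **74.26%** (epoch 5), above the Uni-LSTM's best of 70.28%
 - After epoch 5, validation loss rose (from 0.4905 to between 0.5350 and 0.6585) while training loss kept falling, and validation accuracy swung between 57.18% and 68.50%; this is consistent with overfitting or unstable optimization
 - Because `train_lstm()` does not restore the best weights, the model carried forward to the test set is the epoch-10 one, at 61.02% validation accuracy (below the Uni-LSTM's final 69.06%)

# Takeaway:
 A bi-directional LSTM can capture richer sequence context, and its peak validation accuracy here was higher.
 That peak was not retained, however, so the advantage does not carry over to the test-set comparison below.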





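# STEP: Evaluate and compare Uni-LSTM vs Bi-LSTM on the test set
 Calculates accuracy, precision, recall, and F1-score for both models and prints a side-by-side comparison table.
 Precision, recall, and F1-score refer to the positive class (Bonus = 1, scikit-learn's default `pos_label`). Only the training set was downsampled, so the test set keeps the original imbalance of about 22% positives.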


In [None]:
# === Imports === #
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import torch
import numpy as np

# === Device Configuration === #
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# === Evaluation Function === #
def evaluate_model(model, dataloader, name="Model"):
    model.eval()
    all_preds, all_labels = [], []

    with torch.no_grad():
        for xb, yb in dataloader:
            xb, yb = xb.to(device), yb.to(device)

            # Predict and move to CPU → .tolist() avoids numpy errors
            preds = model(xb).argmax(dim=1).cpu().tolist()
            labels = yb.cpu().tolist()

            all_preds.extend(preds)
            all_labels.extend(labels)

    acc  = accuracy_score(all_labels, all_preds)
    prec = precision_score(all_labels, all_preds)
    rec  = recall_score(all_labels, all_preds)
    f1   = f1_score(all_labels, all_preds)

    return {
        "Model": name,
        "Accuracy": acc,
        "Precision": prec,
        "Recall": rec,
        "F1-score": f1
    }

# === Evaluate Models === #
# Ensure `uni_lstm` and `bi_lstm` are trained and on the correct device
uni_results = evaluate_model(uni_lstm, test_loader, name="Uni-LSTM")
bi_results = evaluate_model(bi_lstm, test_loader, name="Bi-LSTM")

# === Extract Metrics === #
uni_accuracy = uni_results["Accuracy"]
uni_precision = uni_results["Precision"]
uni_recall = uni_results["Recall"]
uni_f1 = uni_results["F1-score"]

bi_accuracy = bi_results["Accuracy"]
bi_precision = bi_results["Precision"]
bi_recall = bi_results["Recall"]
bi_f1 = bi_results["F1-score"]

# === Comparison Print Function === #
def compare_lstm_models():
    print("\nModel Evaluation Results on Test Set")
    print(f"{'Metric':<12}| {'Uni-LSTM':<10}| {'Bi-LSTM':<10}")
    print("-" * 38)
    print(f"{'Accuracy':<12}| {uni_accuracy:.4f}    | {bi_accuracy:.4f}")
    print(f"{'Precision':<12}| {uni_precision:.4f}    | {bi_precision:.4f}")
    print(f"{'Recall':<12}| {uni_recall:.4f}    | {bi_recall:.4f}")
    print(f"{'F1-Score':<12}| {uni_f1:.4f}    | {bi_f1:.4f}")

# === Run Comparison === #
compare_lstm_models()


Using device: cuda

Model Evaluation Results on Test Set
Metric      | Uni-LSTM  | Bi-LSTM   
--------------------------------------
Accuracy    | 0.6955    | 0.6093
Precision   | 0.3837    | 0.3381
Recall      | 0.6167    | 0.7969
F1-Score    | 0.4731    | 0.4748
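F1 is the harmonic mean of precision and recall, F1 = 2·P·R / (P + R), which is why two models with quite different precision and recall can land on almost the same F1. The sketch recomputes F1 from the rounded values in the table above and shows the same metrics from raw confusion counts; the counts are hypothetical, not taken from this test set.

```python
def prf_from_counts(tp, fp, fn):
    """Positive-class precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from_pr(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded test-set values from the table above
print(round(f1_from_pr(0.3837, 0.6167), 4))  # 0.4731 (Uni-LSTM)
print(round(f1_from_pr(0.3381, 0.7969), 4))  # 0.4748 (Bi-LSTM)

# Hypothetical counts: 40 true positives, 60 false positives, 10 false negatives
p, r, f1 = prf_from_counts(40, 60, 10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.4 0.8 0.533
```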


# Insights:

Uni-LSTM achieved higher accuracy (0.6955 vs 0.6093) and precision (0.3837 vs 0.3381), so a larger share of its positive predictions were correct. Even so, about 62% of its predicted bonuses were false alarms.

Bi-LSTM had substantially higher recall (0.7969 vs 0.6167): it caught about 80% of the actual positive cases, at the cost of labeling more negatives as positive.

F1-scores were nearly identical (0.4731 vs 0.4748), which points to a trade-off rather than a clear winner: Bi-LSTM is more inclusive, Uni-LSTM is more precise.

Accuracy needs care here: with about 22% positives in the test set, always predicting "no bonus" would score roughly 78%, above both models, while having zero recall.


If the task favors sensitivity (missing a positive is costly, as in screening for mental health risk), Bi-LSTM's behavior is preferable.

If precision matters more (e.g., minimizing false positives in resource-limited settings), Uni-LSTM is more suitable.
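Both models turn logits into labels with `argmax`, which for two classes amounts to a 0.5 cutoff on the probability of Bonus = 1 (ties aside). Moving that cutoff is another way to trade precision against recall without retraining. A minimal sketch, assuming hypothetical validation probabilities (in the notebook they would come from `F.softmax(model(xb), dim=1)[:, 1]`); `precision_recall_at` is a hypothetical helper. Choosing the threshold on the validation set and applying it once to the test set keeps the test evaluation unbiased.

```python
import numpy as np


def precision_recall_at(probs, labels, threshold):
    """Positive-class precision/recall when predicting 1 iff P(class 1) >= threshold."""
    preds = (probs >= threshold).astype(int)
    tp = int(((preds == 1) & (labels == 1)).sum())
    fp = int(((preds == 1) & (labels == 0)).sum())
    fn = int(((preds == 0) & (labels == 1)).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical validation probabilities and labels (not from this notebook)
probs = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1, 0, 1, 0, 1, 0, 0, 0])

for t in (0.5, 0.65, 0.85):
    p, r = precision_recall_at(probs, labels, t)
    print(t, round(p, 3), round(r, 3))
# Raising the threshold increases precision and lowers recall on this toy data.
```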