# Summary
In this notebook, we compare two vectorisation techniques, Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), on both versions of the preprocessed dataset produced in the earlier data cleaning and exploratory data analysis phase. Version 1 has regular stop words removed; Version 2 has additional stop words removed. We evaluate four model families: Logistic Regression, Naïve Bayes, Random Forest, and SVM-based methods (a standard SVM with a linear kernel, LinearSVC, and a Stochastic Gradient Descent classifier with hinge loss).

The highest accuracy came from Logistic Regression on TF-IDF features with regular stop word removal (test accuracy 0.8940, mean 5-fold CV accuracy 0.8926). TF-IDF also outperformed BoW for Naïve Bayes, whereas Random Forest and the SVM-based methods scored slightly higher with BoW in the results shown. The two stop word versions differed by less than 0.005 in test accuracy for every model.

# Vectorisation and Experimentation of Machine Learning (ML) Techniques

In [None]:
!pip install torch_xla -f https://storage.googleapis.com/pytorch-tpu-releases/wheels/tpuvm/torch_xla.html


Looking in links: https://storage.googleapis.com/pytorch-tpu-releases/wheels/tpuvm/torch_xla.html
Collecting torch_xla
  Downloading torch_xla-2.6.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (21 kB)
Downloading torch_xla-2.6.1-cp311-cp311-manylinux_2_28_x86_64.whl (93.6 MB)
Installing collected packages: torch_xla
Successfully installed torch_xla-2.6.1


In [None]:
import pandas as pd
import numpy as np

import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score, classification_report, confusion_matrix

from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC, LinearSVC

from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn

import torch_xla
import torch_xla.core.xla_model as xm

import time

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


Using device: cuda


## Data Preparation

### Mounting Data

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Loading the Data

Load the four cleaned review files: positive and negative reviews for each of the two stop word versions.

In [None]:
pos_df_reg_stopwords = pd.read_csv('/content/drive/MyDrive/it1244/cleaned_pos_reviews_regular_stopwords.csv')
neg_df_reg_stopwords = pd.read_csv('/content/drive/MyDrive/it1244/cleaned_neg_reviews_regular_stopwords.csv')

pos_df_extra_stopwords = pd.read_csv('/content/drive/MyDrive/it1244/cleaned_pos_reviews_extra_stopwords.csv')
neg_df_extra_stopwords = pd.read_csv('/content/drive/MyDrive/it1244/cleaned_neg_reviews_extra_stopwords.csv')


In [None]:
print(pos_df_reg_stopwords.head())
print(neg_df_reg_stopwords.head())

print(pos_df_extra_stopwords.head())
print(neg_df_extra_stopwords.head())

    FileName                                       Cleaned_Text
0  24556.txt  favorite part film wa old man attempt cure nei...
1  22787.txt  unlike comment mine positive movie wrap around...
2  24575.txt  different topic treated film straightforward s...
3  22772.txt  year old musical comedy fantasy might look age...
4  23617.txt  important film challenge viewer encourages pay...
    FileName                                       Cleaned_Text
0  23129.txt  not even goebbels could pulled propaganda stun...
1  22912.txt  plot fizzled reeked irreconcilable difference ...
2  23622.txt  first look cover picture look like good rock n...
3  23637.txt  drama core anna display genuine truth actor ag...
4  23109.txt  magic lassie opened radio city music hall wa f...
    FileName                                       Cleaned_Text
0  24556.txt  favorite part film wa old man attempt cure nei...
1  22787.txt  unlike comment mine positive movie wrap around...
2  24575.txt  different topic treated fi

### Labelling Reviews

In [None]:
# Label the reviews (1 for positive, 0 for negative), then combine them into two datasets:
# one with regular stop words removed and one with extra stop words removed. A preview of both is printed.

# For the regular stopwords dataframes
pos_df_reg_stopwords['label'] = 1
neg_df_reg_stopwords['label'] = 0

# Combine positive and negative into a single dataframe
df_reg_stopwords = pd.concat([pos_df_reg_stopwords, neg_df_reg_stopwords], axis=0, ignore_index=True)

# For the extra stopwords dataframes
pos_df_extra_stopwords['label'] = 1
neg_df_extra_stopwords['label'] = 0

# Combine positive and negative into a single dataframe
df_extra_stopwords = pd.concat([pos_df_extra_stopwords, neg_df_extra_stopwords], axis=0, ignore_index=True)

print("df_reg_stopwords:")
print(df_reg_stopwords.head())

print("\ndf_extra_stopwords:")
print(df_extra_stopwords.head())


df_reg_stopwords:
    FileName                                       Cleaned_Text  label
0  24556.txt  favorite part film wa old man attempt cure nei...      1
1  22787.txt  unlike comment mine positive movie wrap around...      1
2  24575.txt  different topic treated film straightforward s...      1
3  22772.txt  year old musical comedy fantasy might look age...      1
4  23617.txt  important film challenge viewer encourages pay...      1

df_extra_stopwords:
    FileName                                       Cleaned_Text  label
0  24556.txt  favorite part film wa old man attempt cure nei...      1
1  22787.txt  unlike comment mine positive movie wrap around...      1
2  24575.txt  different topic treated film straightforward s...      1
3  22772.txt  year old musical comedy fantasy might look age...      1
4  23617.txt  important film challenge viewer encourages pay...      1


## Text Vectorisation

In [None]:
# Vectorise both datasets (regular stop words removed, extra stop words removed).
# Both Bag of Words (BoW) and TF-IDF are used to convert the cleaned text into sparse numerical feature matrices.
# The resulting shapes are printed for comparison; because max_features caps the vocabulary at 50,000,
# the column counts are identical whenever both vocabularies reach that cap.


# Bag of Words (BoW) for Regular Stopwords
bow_vectorizer_reg = CountVectorizer(
    max_features=50000, # keep only the 50,000 most frequent tokens
    min_df=2, # ignore terms that appear in fewer than 2 documents
    lowercase=False, # text was already lowercased during cleaning
    tokenizer=lambda x: x.split(), # split the text on whitespace
    preprocessor=None, # default value; with lowercase=False no extra preprocessing is applied
    token_pattern=None # unused when a custom tokenizer is given; None avoids the warning
)
X_reg_bow = bow_vectorizer_reg.fit_transform(df_reg_stopwords['Cleaned_Text'])
y_reg = df_reg_stopwords['label']
print("Regular Stopwords - BoW shape:", X_reg_bow.shape)

# Bag of Words (BoW) for Extra Stopwords
# same as above, but with extra stopwords removed
bow_vectorizer_extra = CountVectorizer(
    max_features=50000,
    min_df=2,
    lowercase=False,
    tokenizer=lambda x: x.split(),
    preprocessor=None,
    token_pattern=None
)
X_extra_bow = bow_vectorizer_extra.fit_transform(df_extra_stopwords['Cleaned_Text'])
y_extra = df_extra_stopwords['label']
print("Extra Stopwords - BoW shape:", X_extra_bow.shape)

# TF-IDF for Regular Stopwords

# the TF-IDF vectoriser has the same parameters as stated in BoW
tfidf_vectorizer_reg = TfidfVectorizer(
    max_features=50000,
    min_df=2,
    lowercase=False,
    tokenizer=lambda x: x.split(),
    preprocessor=None,
    token_pattern=None
)
X_reg_tfidf = tfidf_vectorizer_reg.fit_transform(df_reg_stopwords['Cleaned_Text'])
print("Regular Stopwords - TF-IDF shape:", X_reg_tfidf.shape)

# TF-IDF for Extra Stopwords
tfidf_vectorizer_extra = TfidfVectorizer(
    max_features=50000,
    min_df=2,
    lowercase=False,
    tokenizer=lambda x: x.split(),
    preprocessor=None,
    token_pattern=None
)
X_extra_tfidf = tfidf_vectorizer_extra.fit_transform(df_extra_stopwords['Cleaned_Text'])
print("Extra Stopwords - TF-IDF shape:", X_extra_tfidf.shape)


Regular Stopwords - BoW shape: (50000, 50000)
Extra Stopwords - BoW shape: (50000, 50000)
Regular Stopwords - TF-IDF shape: (50000, 50000)
Extra Stopwords - TF-IDF shape: (50000, 50000)


## Function to Evaluate Model Performance

In [None]:
# This function fits and evaluates a classification model using both a train-test split and 5-fold cross-validation.
# It prints accuracy, F1 score, recall, and precision on the test set, along with cross-validated accuracy statistics.

def evaluate_model(model, X, y, name=""):

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Fit the model
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Basic metrics on hold-out test
    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)

    # 5-fold CV on the entire dataset (cross_val_score fits fresh clones of the model)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    cv_scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy', n_jobs=-1)

    print(f"\n=== {name} ===")
    print(f"Test Accuracy:    {acc:.4f}")
    print(f"Test F1 Score:    {f1:.4f}")
    print(f"Test Recall:      {rec:.4f}")
    print(f"Test Precision:   {prec:.4f}")
    print(f"5-Fold CV Accuracy: Mean = {cv_scores.mean():.4f}, Std = {cv_scores.std():.4f}")
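
Because the function above receives features that were vectorised on the full dataset, the vocabulary and IDF weights have already seen the documents that later serve as test data. One possible alternative, sketched here on a hypothetical toy corpus, is to put the vectoriser and classifier in a single pipeline and cross-validate on raw text, so that each fold learns its vocabulary from its own training portion only. This is not how the scores in this notebook were produced; the texts, labels, and variable names below are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the Cleaned_Text and label columns.
positive = ["great film", "loved cast", "wonderful story", "brilliant acting", "great story",
            "loved film", "wonderful cast", "brilliant film", "great acting", "loved story"]
negative = ["boring film", "hated cast", "awful story", "terrible acting", "boring story",
            "hated film", "awful cast", "terrible film", "boring acting", "hated story"]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Vectoriser and classifier are fitted together inside every fold.
text_clf = make_pipeline(
    TfidfVectorizer(tokenizer=str.split, token_pattern=None, lowercase=False),
    LogisticRegression(solver="liblinear", max_iter=400, random_state=42),
)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(text_clf, texts, labels, cv=skf, scoring="accuracy")
print(f"5-Fold CV Accuracy: Mean = {cv_scores.mean():.4f}, Std = {cv_scores.std():.4f}")
```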

# Model Experimentation

## Logistic Regression

In this section, we:
- split the data 80/20 into training and test sets
- stratify the split on the label, so that the training and test sets keep the same proportion of positive and negative reviews as the full dataset
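
The effect of stratification is easiest to see on imbalanced labels. The following minimal sketch uses hypothetical data (y_demo and X_demo are illustrative, not the review dataset) and compares the class share in the test set with and without the stratify argument.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 negatives and 20 positives.
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# Plain 80/20 split: the positive share in the test set depends on the shuffle.
_, _, _, y_test_plain = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Stratified 80/20 split: the positive share matches the full dataset.
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

print("Positive share overall:     ", y_demo.mean())
print("Positive share, plain split:", y_test_plain.mean())
print("Positive share, stratified: ", y_test_strat.mean())
```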

### BoW

In [None]:
# This code trains and evaluates a Logistic Regression model using BoW features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

log_reg = LogisticRegression(solver='liblinear', max_iter=400, random_state=42)

evaluate_model(log_reg, X_reg_bow, y_reg, name="Logistic Regression + BoW (Regular)")
evaluate_model(log_reg, X_extra_bow, y_extra, name="Logistic Regression + BoW (Extra)")



=== Logistic Regression + BoW (Regular) ===
Test Accuracy:    0.8810
Test F1 Score:    0.8820
Test Recall:      0.8898
Test Precision:   0.8744
5-Fold CV Accuracy: Mean = 0.8820, Std = 0.0035

=== Logistic Regression + BoW (Extra) ===
Test Accuracy:    0.8803
Test F1 Score:    0.8813
Test Recall:      0.8890
Test Precision:   0.8738
5-Fold CV Accuracy: Mean = 0.8819, Std = 0.0032


### TF-IDF

In [None]:
# This code trains and evaluates a Logistic Regression model using TF-IDF features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

evaluate_model(log_reg, X_reg_tfidf, y_reg, name="Logistic Regression + TF-IDF (Regular)")
evaluate_model(log_reg, X_extra_tfidf, y_extra, name="Logistic Regression + TF-IDF (Extra)")


=== Logistic Regression + TF-IDF (Regular) ===
Test Accuracy:    0.8940
Test F1 Score:    0.8958
Test Recall:      0.9116
Test Precision:   0.8806
5-Fold CV Accuracy: Mean = 0.8926, Std = 0.0033

=== Logistic Regression + TF-IDF (Extra) ===
Test Accuracy:    0.8932
Test F1 Score:    0.8950
Test Recall:      0.9102
Test Precision:   0.8803
5-Fold CV Accuracy: Mean = 0.8924, Std = 0.0032


## Naive Bayes

### BoW

In [None]:
# This code trains and evaluates a Naive Bayes model using BoW features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

nb_model = MultinomialNB()

evaluate_model(nb_model, X_reg_bow, y_reg, name="Naive Bayes + BoW (Regular)")
evaluate_model(nb_model, X_extra_bow, y_extra, name="Naive Bayes + BoW (Extra)")


=== Naive Bayes + BoW (Regular) ===
Test Accuracy:    0.8541
Test F1 Score:    0.8508
Test Recall:      0.8318
Test Precision:   0.8706
5-Fold CV Accuracy: Mean = 0.8548, Std = 0.0052

=== Naive Bayes + BoW (Extra) ===
Test Accuracy:    0.8541
Test F1 Score:    0.8509
Test Recall:      0.8328
Test Precision:   0.8699
5-Fold CV Accuracy: Mean = 0.8549, Std = 0.0051


### TF-IDF

In [None]:
# This code trains and evaluates a Naive Bayes model using TF-IDF features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

nb_model = MultinomialNB()

evaluate_model(nb_model, X_reg_tfidf, y_reg,   name="Naive Bayes + TF-IDF (Regular)")
evaluate_model(nb_model, X_extra_tfidf, y_extra, name="Naive Bayes + TF-IDF (Extra)")


=== Naive Bayes + TF-IDF (Regular) ===
Test Accuracy:    0.8640
Test F1 Score:    0.8622
Test Recall:      0.8512
Test Precision:   0.8736
5-Fold CV Accuracy: Mean = 0.8643, Std = 0.0037

=== Naive Bayes + TF-IDF (Extra) ===
Test Accuracy:    0.8639
Test F1 Score:    0.8622
Test Recall:      0.8516
Test Precision:   0.8731
5-Fold CV Accuracy: Mean = 0.8645, Std = 0.0038


## Random Forest Classifier

### BoW

In [None]:
# This code trains and evaluates a Random Forest Classifier using BoW features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

rf_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)

evaluate_model(rf_model, X_reg_bow, y_reg,   name="Random Forest + BoW (Regular)")
evaluate_model(rf_model, X_extra_bow, y_extra, name="Random Forest + BoW (Extra)")



=== Random Forest + BoW (Regular) ===
Test Accuracy:    0.8547
Test F1 Score:    0.8551
Test Recall:      0.8574
Test Precision:   0.8528
5-Fold CV Accuracy: Mean = 0.8523, Std = 0.0051

=== Random Forest + BoW (Extra) ===
Test Accuracy:    0.8539
Test F1 Score:    0.8541
Test Recall:      0.8554
Test Precision:   0.8528
5-Fold CV Accuracy: Mean = 0.8534, Std = 0.0046


### TF-IDF

In [None]:
# This code trains and evaluates a Random Forest Classifier using TF-IDF features.
# It compares performance on two datasets: one with regular stopwords and one with extra stopwords removed.

rf_model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)

evaluate_model(rf_model, X_reg_tfidf, y_reg,   name="Random Forest + TF-IDF (Regular)")
evaluate_model(rf_model, X_extra_tfidf, y_extra, name="Random Forest + TF-IDF (Extra)")


=== Random Forest + TF-IDF (Regular) ===
Test Accuracy:    0.8497
Test F1 Score:    0.8496
Test Recall:      0.8492
Test Precision:   0.8501
5-Fold CV Accuracy: Mean = 0.8509, Std = 0.0054

=== Random Forest + TF-IDF (Extra) ===
Test Accuracy:    0.8501
Test F1 Score:    0.8499
Test Recall:      0.8490
Test Precision:   0.8509
5-Fold CV Accuracy: Mean = 0.8515, Std = 0.0054


## SVM

#### Basic SVM Model

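A standard SVC with a linear kernel.

Good for:
- Probability estimates (available when probability=True is set; the model below does not enable it)
- Built-in multiclass support via One-vs-One (OvO), a strategy that extends binary classifiers such as SVMs to multi-class problems by training one classifier per pair of classes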
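
The review task here is binary, so One-vs-One never comes into play in this notebook. As a side illustration only, the sketch below fits SVC on hypothetical four-class toy data (X_blobs, y_blobs are assumptions) and shows how the decision_function_shape parameter changes the shape of the returned scores: one column per pair of classes for "ovo", one column per class for "ovr". In scikit-learn the pairwise classifiers are trained either way; the parameter only affects the output format.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical 4-class toy data; the notebook's own task has two classes.
X_blobs, y_blobs = make_blobs(n_samples=120, centers=4, random_state=42)

svc_ovo = SVC(kernel="linear", decision_function_shape="ovo").fit(X_blobs, y_blobs)
svc_ovr = SVC(kernel="linear", decision_function_shape="ovr").fit(X_blobs, y_blobs)

ovo_scores = svc_ovo.decision_function(X_blobs[:5])
ovr_scores = svc_ovr.decision_function(X_blobs[:5])

print("OvO score shape:", ovo_scores.shape)  # 4 * 3 / 2 = 6 pairwise columns
print("OvR score shape:", ovr_scores.shape)  # 4 per-class columns
```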

In [None]:
# This code trains and evaluates an SVM with a linear kernel using BoW features.
# It compares performance on both datasets: regular stop words removed and extra stop words removed.

svm_basic = make_pipeline(
    StandardScaler(with_mean = False),
    SVC(kernel='linear', C=1.0, random_state=42)
)

In [None]:
evaluate_model(svm_basic, X_reg_bow, y_reg,   name="SVM + BoW (Regular)")
evaluate_model(svm_basic, X_extra_bow, y_extra, name="SVM + BoW (Extra)")


=== SVM + BoW (Regular) ===
Test Accuracy:    0.8356
Test F1 Score:    0.8358
Test Recall:      0.8370
Test Precision:   0.8347
5-Fold CV Accuracy: Mean = 0.8328, Std = 0.0043

=== SVM + BoW (Extra) ===
Test Accuracy:    0.8337
Test F1 Score:    0.8337
Test Recall:      0.8340
Test Precision:   0.8335
5-Fold CV Accuracy: Mean = 0.8317, Std = 0.0050


In [None]:
# This code trains and evaluates an SVM with a linear kernel using TF-IDF features.
# It compares performance on both datasets: regular stop words removed and extra stop words removed.

evaluate_model(svm_basic, X_reg_tfidf, y_reg,   name="SVM + TF-IDF (Regular)")
evaluate_model(svm_basic, X_extra_tfidf, y_extra, name="SVM + TF-IDF (Extra)")


=== SVM + TF-IDF (Regular) ===
Test Accuracy:    0.8276
Test F1 Score:    0.8285
Test Recall:      0.8326
Test Precision:   0.8244
5-Fold CV Accuracy: Mean = 0.8333, Std = 0.0035

=== SVM + TF-IDF (Extra) ===
Test Accuracy:    0.8296
Test F1 Score:    0.8306
Test Recall:      0.8354
Test Precision:   0.8258
5-Fold CV Accuracy: Mean = 0.8324, Std = 0.0043


#### LinearSVC Variant

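LinearSVC is a linear SVM implementation built on liblinear.

Good for:
- Large datasets represented with BoW / TF-IDF
- High-dimensional, sparse feature spaces


We test a range of values for the regularisation parameter C, with a high iteration cap (max_iter=1,000,000), so that the solver can converge.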
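
One way to check convergence explicitly, rather than relying on a large iteration cap, is to record whether scikit-learn emits a ConvergenceWarning for each value of C. The sketch below does this on hypothetical synthetic data (X_syn, y_syn, and the converged dictionary are illustrative names); it does not reproduce the notebook's results.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.svm import LinearSVC

# Hypothetical synthetic data standing in for the vectorised reviews.
X_syn, y_syn = make_classification(n_samples=300, n_features=50, random_state=42)

converged = {}
for c in [0.001, 0.01, 0.1]:
    with warnings.catch_warnings(record=True) as caught:
        # Make sure convergence warnings are recorded rather than suppressed.
        warnings.simplefilter("always", category=ConvergenceWarning)
        LinearSVC(C=c, max_iter=1000, random_state=42).fit(X_syn, y_syn)
    converged[c] = not any(
        issubclass(w.category, ConvergenceWarning) for w in caught
    )
    print(f"C={c}: converged={converged[c]}")
```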

In [None]:
# Train and evaluate LinearSVC on BoW and TF-IDF features across several regularisation strengths (C).
# Performance is compared on both datasets: regular stop words removed and extra stop words removed.
# Each evaluation is timed so that run time can be compared with SGDClassifier.


# Loop over different regularization strengths
for c in [0.001, 0.01, 0.1]:
    print(f"\nEvaluating models with C={c}")

    # Model pipeline
    svc_model = make_pipeline(
        StandardScaler(with_mean=False),
        LinearSVC(C=c, max_iter=1_000_000, random_state=42)
    )

    # Evaluate and time each dataset separately
    for X_data, y_data, name in [
        (X_reg_bow, y_reg, f"SVM + BoW (Regular), C={c}"),
        (X_extra_bow, y_extra, f"SVM + BoW (Extra), C={c}"),
        (X_reg_tfidf, y_reg, f"SVM + TF-IDF (Regular), C={c}"),
        (X_extra_tfidf, y_extra, f"SVM + TF-IDF (Extra), C={c}")
    ]:
        start_time = time.time()
        evaluate_model(svc_model, X_data, y_data, name=name)
        elapsed = time.time() - start_time
        print(f"Time for {name}: {elapsed:.4f} seconds")



Evaluating models with C=0.001

=== SVM + BoW (Regular), C=0.001 ===
Test Accuracy:    0.8478
Test F1 Score:    0.8483
Test Recall:      0.8512
Test Precision:   0.8455
5-Fold CV Accuracy: Mean = 0.8487, Std = 0.0037
Time for SVM + BoW (Regular), C=0.001: 72.7978 seconds

=== SVM + BoW (Extra), C=0.001 ===
Test Accuracy:    0.8462
Test F1 Score:    0.8468
Test Recall:      0.8504
Test Precision:   0.8433
5-Fold CV Accuracy: Mean = 0.8468, Std = 0.0027
Time for SVM + BoW (Extra), C=0.001: 72.3999 seconds

=== SVM + TF-IDF (Regular), C=0.001 ===
Test Accuracy:    0.8348
Test F1 Score:    0.8361
Test Recall:      0.8428
Test Precision:   0.8295
5-Fold CV Accuracy: Mean = 0.8358, Std = 0.0043
Time for SVM + TF-IDF (Regular), C=0.001: 61.7305 seconds

=== SVM + TF-IDF (Extra), C=0.001 ===
Test Accuracy:    0.8310
Test F1 Score:    0.8322
Test Recall:      0.8382
Test Precision:   0.8263
5-Fold CV Accuracy: Mean = 0.8344, Std = 0.0027
Time for SVM + TF-IDF (Extra), C=0.001: 63.2731 seconds


#### Stochastic Gradient Descent (SGD) Classifier

With hinge loss, SGDClassifier approximates a linear SVM by training with stochastic gradient descent. It is used here to combat convergence failures and is more stable on large, sparse data.

- Very fast on large datasets (text, image vectors, etc.)
- Works well with sparse data (e.g., TF-IDF)
- Easy to tune with GridSearchCV
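
As a rough illustration of the GridSearchCV point, the sketch below tunes the regularisation strength alpha of an SGDClassifier inside the same kind of scaling pipeline. The synthetic sparse data (X_sparse, y_syn) and the alpha grid are assumptions chosen for the example; no such search was run in this notebook.

```python
from scipy.sparse import csr_matrix
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data, converted to a sparse matrix like BoW / TF-IDF output.
X_dense, y_syn = make_classification(n_samples=400, n_features=30, random_state=42)
X_sparse = csr_matrix(X_dense)

sgd_pipeline = make_pipeline(
    StandardScaler(with_mean=False),  # with_mean=False keeps the matrix sparse
    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=42),
)

# make_pipeline names each step after its lowercased class name.
param_grid = {"sgdclassifier__alpha": [1e-5, 1e-4, 1e-3, 1e-2]}

search = GridSearchCV(sgd_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_sparse, y_syn)

print("Best alpha:", search.best_params_["sgdclassifier__alpha"])
print(f"Best mean CV accuracy: {search.best_score_:.4f}")
```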

In [None]:
# This code trains and evaluates an SGDClassifier (hinge loss) using both BoW and TF-IDF features.
# It compares performance on both datasets: regular stop words removed and extra stop words removed.


# SGD model pipeline
sgd_model = make_pipeline(
    StandardScaler(with_mean=False),
    SGDClassifier(loss='hinge', max_iter=1000, tol=1e-3, random_state=42)
)

# Evaluate on BoW (Regular)
start = time.time()
evaluate_model(sgd_model, X_reg_bow, y_reg, name="SGD SVM + BoW (Regular)")
print(f"Time taken for SGD SVM + BoW (Regular): {time.time() - start:.4f} seconds\n")

# Evaluate on BoW (Extra)
start = time.time()
evaluate_model(sgd_model, X_extra_bow, y_extra, name="SGD SVM + BoW (Extra)")
print(f"Time taken for SGD SVM + BoW (Extra): {time.time() - start:.4f} seconds\n")

# Evaluate on TF-IDF (Regular)
start = time.time()
evaluate_model(sgd_model, X_reg_tfidf, y_reg, name="SGD SVM + TF-IDF (Regular)")
print(f"Time taken for SGD SVM + TF-IDF (Regular): {time.time() - start:.4f} seconds\n")

# Evaluate on TF-IDF (Extra)
start = time.time()
evaluate_model(sgd_model, X_extra_tfidf, y_extra, name="SGD SVM + TF-IDF (Extra)")
print(f"Time taken for SGD SVM + TF-IDF (Extra): {time.time() - start:.4f} seconds\n")



=== SGD SVM + BoW (Regular) ===
Test Accuracy:    0.8238
Test F1 Score:    0.8243
Test Recall:      0.8264
Test Precision:   0.8221
5-Fold CV Accuracy: Mean = 0.8257, Std = 0.0049
Time taken for SGD SVM + BoW (Regular): 10.3035 seconds


=== SGD SVM + BoW (Extra) ===
Test Accuracy:    0.8208
Test F1 Score:    0.8211
Test Recall:      0.8224
Test Precision:   0.8198
5-Fold CV Accuracy: Mean = 0.8232, Std = 0.0047
Time taken for SGD SVM + BoW (Extra): 4.3011 seconds


=== SGD SVM + TF-IDF (Regular) ===
Test Accuracy:    0.8107
Test F1 Score:    0.8119
Test Recall:      0.8170
Test Precision:   0.8068
5-Fold CV Accuracy: Mean = 0.8122, Std = 0.0044
Time taken for SGD SVM + TF-IDF (Regular): 3.5189 seconds


=== SGD SVM + TF-IDF (Extra) ===
Test Accuracy:    0.8072
Test F1 Score:    0.8087
Test Recall:      0.8152
Test Precision:   0.8024
5-Fold CV Accuracy: Mean = 0.8111, Std = 0.0044
Time taken for SGD SVM + TF-IDF (Extra): 2.3023 seconds

