# Naive Bayes Classification (from scratch)
## Toxic Comment Classification: Multi-label (Jigsaw)

**Goal**
- Train a **Multinomial Naive Bayes** model (implemented from scratch) for toxic comment classification.
- This is a **multi-label** problem: we train **one binary classifier per label**.
- Evaluate performance on the validation split using **F1**, **precision**, **recall**, and global multi-label metrics.

**Labels**
- toxic, severe_toxic, obscene, threat, insult, identity_hate


In [6]:
import os
import sys
import numpy as np
import pandas as pd

from sklearn.metrics import (
    f1_score,
    precision_recall_fscore_support,
    accuracy_score,
    hamming_loss
)

# Make project root visible so `import src...` works
sys.path.append(os.path.abspath(".."))

from src.models.naive_bayes import fit_multilabel_nb



## 1) Load cleaned datasets

We use the cleaned Parquet files produced by the preprocessing notebook:
- `train_cleaned.parquet`
- `valid_cleaned.parquet`
- `test_cleaned.parquet`

We choose a text column (`text_aggressive` by default).


In [7]:
df_train = pd.read_parquet("../data/processed/train_cleaned.parquet")
df_test  = pd.read_parquet("../data/processed/test_cleaned.parquet")
df_valid = pd.read_parquet("../data/processed/valid_cleaned.parquet")

TEXT_COL = "text_aggressive"
LABELS = ["toxic","severe_toxic","obscene","threat","insult","identity_hate"]

print("Train:", df_train.shape, "Valid:", df_valid.shape, "Test:", df_test.shape)
print("Using text column:", TEXT_COL)


Train: (127656, 11) Valid: (15957, 9) Test: (15958, 9)
Using text column: text_aggressive


## 2) Train Naive Bayes (one model per label)

We train **6 independent** binary Naive Bayes models:
- `models["toxic"]` predicts the label `toxic`
- ...
- `models["identity_hate"]` predicts the label `identity_hate`

The input is a **bag-of-words** count matrix built from the cleaned text.
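
As a rough illustration of what each per-label model does, here is a minimal sketch of a binary multinomial Naive Bayes classifier with additive smoothing. The code in `src.models.naive_bayes` is not shown in this notebook, so this is not that implementation: the class name `TinyMultinomialNB`, the token-list input, and the toy comments are all assumptions made for the example. For each class it stores a log prior and smoothed log word likelihoods, and it predicts the class with the larger summed log score.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Binary multinomial Naive Bayes over token counts (illustrative sketch).

    Hypothetical stand-in, not the project's MultinomialNaiveBayes class.
    Assumes both classes (0 and 1) occur in the training data.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, docs, y):
        # docs: list of token lists, y: list of 0/1 labels
        self.vocab = sorted({tok for doc in docs for tok in doc})
        self.log_prior = {}
        self.log_lik = {}
        for c in (0, 1):
            class_docs = [d for d, label in zip(docs, y) if label == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            # Smoothed denominator: class token total + alpha * |vocab|
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                tok: math.log((counts[tok] + self.alpha) / total)
                for tok in self.vocab
            }
        return self

    def predict(self, docs):
        preds = []
        for doc in docs:
            # Tokens never seen in training are skipped
            scores = {
                c: self.log_prior[c]
                + sum(self.log_lik[c][t] for t in doc if t in self.log_lik[c])
                for c in (0, 1)
            }
            preds.append(max(scores, key=scores.get))
        return preds


# Toy data for a single label; a multi-label setup would fit one such model per label.
train_docs = [
    ["you", "are", "stupid"],
    ["thanks", "for", "the", "edit"],
    ["stupid", "idiot"],
    ["nice", "edit", "thanks"],
]
train_y = [1, 0, 1, 0]

model = TinyMultinomialNB(alpha=1.0).fit(train_docs, train_y)
toy_preds = model.predict([["stupid", "edit"], ["thanks", "nice"]])
print(toy_preds)
```

With `alpha=1.0`, every vocabulary word gets a nonzero probability in both classes, so a word seen only in non-toxic comments cannot force a toxic score to zero.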


In [8]:
vectorizer, models, X_valid_counts = fit_multilabel_nb(
    df_train, df_valid,
    text_col=TEXT_COL,
    labels=LABELS,
    alpha=1.0
)

for label in LABELS:
    y_pred = models[label].predict(X_valid_counts)
    print(label, "F1 =", f1_score(df_valid[label], y_pred, zero_division=0))

toxic F1 = 0.7366677707850282
severe_toxic F1 = 0.42661448140900193
obscene F1 = 0.7223404255319149
threat F1 = 0.1917808219178082
insult F1 = 0.6533180778032036
identity_hate F1 = 0.3325635103926097


In [4]:
models.keys()

dict_keys(['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'])

In [5]:
type(models["toxic"])


src.models.naive_bayes.MultinomialNaiveBayes

## 3) Per-label evaluation (Precision / Recall / F1)

We report:
- **pos_rate_valid**: fraction of validation samples that are positive for the label (shows how rare the positive class is)
- **precision / recall / F1** for each label
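
To make these quantities concrete, the sketch below computes them by hand from true-positive, false-positive, and false-negative counts on a small made-up label vector. The helper name `binary_prf` and the toy arrays are assumptions for illustration; the notebook itself relies on `precision_recall_fscore_support`.

```python
def binary_prf(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1), with 0.0 on empty denominators."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Made-up labels: 3 positives out of 8 samples
y_true_toy = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred_toy = [1, 1, 0, 0, 0, 0, 0, 1]

pos_rate_toy = sum(y_true_toy) / len(y_true_toy)
p_toy, r_toy, f1_toy = binary_prf(y_true_toy, y_pred_toy)
print(pos_rate_toy, p_toy, r_toy, f1_toy)
```

F1 is the harmonic mean of precision and recall. The same formula applied to the reported `toxic` values (precision 0.719741, recall 0.75441) gives approximately 0.7367, which agrees with the F1 printed for that label.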


In [9]:
rows = []
for label in LABELS:
    y_true = df_valid[label].values
    y_pred = models[label].predict(X_valid_counts)

    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )

    rows.append({
        "label": label,
        "pos_rate_valid": float(y_true.mean()),
        "precision": float(p),
        "recall": float(r),
        "f1": float(f1),
    })

results_df = pd.DataFrame(rows).sort_values("f1", ascending=False)
results_df


| | label | pos_rate_valid | precision | recall | f1 |
|---|---|---|---|---|---|
| 0 | toxic | 0.092373 | 0.719741 | 0.75441 | 0.736668 |
| 2 | obscene | 0.053331 | 0.659864 | 0.797885 | 0.72234 |
| 4 | insult | 0.048944 | 0.590486 | 0.731114 | 0.653318 |
| 1 | severe_toxic | 0.009087 | 0.297814 | 0.751724 | 0.426614 |
| 5 | identity_hate | 0.008272 | 0.239203 | 0.545455 | 0.332564 |
| 3 | threat | 0.00282 | 0.138614 | 0.311111 | 0.191781 |


## 4) Global multi-label metrics

We also compute:
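- **micro-F1**: F1 computed from true positives, false positives, and false negatives pooled over all labels (frequent labels dominate)
- **macro-F1**: unweighted mean of the per-label F1 scores (rare labels count as much as frequent ones)
- **exact match accuracy**: fraction of samples where all 6 labels are predicted correctly
- **hamming loss**: fraction of individual label predictions that are wrong (lower is better)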


In [10]:
Y_true = df_valid[LABELS].values

Y_pred = np.column_stack([
    models[label].predict(X_valid_counts) for label in LABELS
])

micro_f1 = f1_score(Y_true, Y_pred, average="micro", zero_division=0)
macro_f1 = f1_score(Y_true, Y_pred, average="macro", zero_division=0)
exact_acc = accuracy_score(Y_true, Y_pred)
hamm = hamming_loss(Y_true, Y_pred)

global_df = pd.DataFrame([{
    "micro_f1": micro_f1,
    "macro_f1": macro_f1,
    "exact_match_accuracy": exact_acc,
    "hamming_loss": hamm
}])

global_df


| | micro_f1 | macro_f1 | exact_match_accuracy | hamming_loss |
|---|---|---|---|---|
| 0 | 0.66098 | 0.510548 | 0.897224 | 0.027397 |


In [None]:
#os.makedirs("../data/results", exist_ok=True)

#results_df.to_csv("../data/results/nb_valid_per_label.csv", index=False)
#global_df.to_csv("../data/results/nb_valid_global.csv", index=False)

#print("Saved:")
#print("- ../data/results/nb_valid_per_label.csv")
#print("- ../data/results/nb_valid_global.csv")
