# Toxic Comments Classifier Model

---

This is an ML model created by Keshav Ghai (an aspiring AI/ML dev).

This is a **Bidirectional LSTM (BiLSTM) neural network** that classifies online comments into **6 toxicity categories**: toxic, severe toxic, obscene, threat, insult, and identity hate. Unlike previous models that assigned exactly one class per input, this model uses **multi-label classification**, meaning a single comment can be tagged with multiple toxicity types simultaneously. The training script **"trainer.py"** reads comment data from CSV files, tokenizes the text, trains a BiLSTM with word embeddings, and generates visualizations.

## What makes this different?

Online content moderation requires understanding nuanced language and context. This model:
- Uses **bidirectional LSTM** to understand context from both directions (left and right)
- Performs **multi-label classification** with sigmoid activation (not multi-class with softmax)
- Handles **6 independent toxicity types** simultaneously, flagging a label when its predicted probability exceeds 0.5
- Works with **variable-length text** by padding or truncating every comment to a fixed length of 256 tokens
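
> The difference between sigmoid and softmax outputs can be sketched with plain NumPy. The logits below are made-up values for a single comment, not model output; the point is only that sigmoid scores each label independently while softmax forces the labels to compete.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def sigmoid(z):
    # Squashes each score independently into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Normalizes all scores together so they sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw scores (logits) for one comment
logits = np.array([2.0, -1.0, 1.5, -3.0, 0.5, -2.0])

sig = sigmoid(logits)
soft = softmax(logits)

# Multi-label: every label above the 0.5 threshold is reported
multi_label = [label for label, p in zip(LABELS, sig) if p > 0.5]
# Multi-class: only the single highest-scoring label is reported
single_label = LABELS[int(np.argmax(soft))]

print("sigmoid + threshold:", multi_label)
print("softmax + argmax:   ", single_label)
```

> With these toy scores the sigmoid route reports three labels at once, while softmax can only ever return one.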

## Imports
---

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

## 1. Define Paths and Constants
---

In [None]:
BASE_DIR = "./tensorflow/toxic_classifier/dataset/"
MODEL_SAVE_PATH = "./Models/toxic_model.keras"
TOKENIZER_SAVE_PATH = "./tensorflow/toxic_classifier/toxic_tokenizer.json"

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
MAX_VOCAB = 20000
MAX_LEN = 256
BATCH_SIZE = 32
EPOCHS = 1

print(f"Toxicity labels: {LABELS}")
print(f"Model will be saved to: {MODEL_SAVE_PATH}")

## 2. Load Training and Validation Data
---

> Data is loaded from CSV files provided by the Kaggle Toxic Comments dataset.
> - **train.csv**: Comment text with 6 binary labels
> - **test.csv**: Comment text for testing
> - **test_labels.csv**: Labels for test comments

In [None]:
# Load datasets
train_df = pd.read_csv(BASE_DIR + "train.csv")
test_df = pd.read_csv(BASE_DIR + "test.csv")
test_labels_df = pd.read_csv(BASE_DIR + "test_labels.csv")

print("Train shape:", train_df.shape)
print("Test shape:", test_df.shape)
print("Test labels shape:", test_labels_df.shape)

# Display sample
print("\nSample training data:")
print(train_df.head(1))

## 3. Clean and Prepare Validation Data
---

> Some rows in test_labels.csv carry -1 for every label, which marks comments that have no usable ground truth. We remove these rows before merging.

In [None]:
# Remove rows with -1 (missing labels)
test_labels_df = test_labels_df[test_labels_df[LABELS].min(axis=1) >= 0]

# Merge test comments with their labels
val_df = test_df.merge(test_labels_df, on="id")

print(f"Validation usable rows: {val_df.shape[0]}")
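
> A minimal sketch of the same filter-and-merge step on a three-row toy frame. The ids and comments are invented for illustration; only the column names follow the dataset.

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hypothetical toy data: comment "b" has no usable labels (-1 everywhere)
toy_test = pd.DataFrame({
    "id": ["a", "b", "c"],
    "comment_text": ["first", "second", "third"],
})
toy_labels = pd.DataFrame({"id": ["a", "b", "c"]})
for label in LABELS:
    toy_labels[label] = [0, -1, 0]
toy_labels.loc[2, "toxic"] = 1  # comment "c" is toxic

# Keep rows whose smallest label value is >= 0, i.e. rows with no -1
kept = toy_labels[toy_labels[LABELS].min(axis=1) >= 0]

# merge() defaults to an inner join, so unlabeled comments drop out here
toy_val = toy_test.merge(kept, on="id")
print(toy_val[["id", "comment_text", "toxic"]])
```

> Because the join is an inner join, filtering the label frame is enough; the matching comment rows disappear during the merge.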

## 4. Extract Text and Labels
---

> Separate text and labels into arrays for model training.
> **Important**: Labels are already binary (0/1) for each toxicity type.

In [None]:
# Training data
X_train = train_df["comment_text"].astype(str).tolist()
y_train = train_df[LABELS].astype("float32").values

# Validation data
X_val = val_df["comment_text"].astype(str).tolist()
y_val = val_df[LABELS].astype("float32").values

print(f"Train samples: {len(X_train)}")
print(f"Val samples: {len(X_val)}")
print(f"Train label shape: {y_train.shape}")
print(f"Val label shape: {y_val.shape}")

# Show label distribution
print("\nLabel distribution (train):")
for i, label in enumerate(LABELS):
    count = int(np.sum(y_train[:, i]))  # cast so the count prints as an integer, not 123.0
    pct = (count / y_train.shape[0]) * 100
    print(f"  {label}: {count} ({pct:.1f}%)")
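
> Per-label counts do not show how often labels overlap. The sketch below uses a small invented label matrix (not the real data) to count comments with no label, exactly one label, or several labels.

```python
import numpy as np

# Hypothetical label matrix: rows are comments, columns are the 6 labels
toy_y = np.array([
    [0, 0, 0, 0, 0, 0],   # clean comment
    [1, 0, 1, 0, 1, 0],   # toxic + obscene + insult
    [1, 0, 0, 0, 0, 0],   # toxic only
    [0, 0, 0, 0, 0, 0],   # clean comment
], dtype="float32")

# Number of active labels per comment
labels_per_comment = toy_y.sum(axis=1).astype(int)

clean = int((labels_per_comment == 0).sum())
single = int((labels_per_comment == 1).sum())
multi = int((labels_per_comment > 1).sum())

# Fraction of comments carrying each label
positive_rate = toy_y.mean(axis=0)

print(f"clean: {clean}, single-label: {single}, multi-label: {multi}")
print("positive rate per label:", positive_rate)
```

> Applying the same lines to `y_train` would show how much of the data is multi-label, which is the case a softmax classifier cannot represent.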

## 5. Tokenize and Pad Text
---

> Comments are converted to sequences of integers, then padded to a fixed length.

In [None]:
# Create tokenizer
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=MAX_VOCAB,
    oov_token="<OOV>"
)

# Fit on training text
tokenizer.fit_on_texts(X_train)

# Save tokenizer (tokenizer.to_json() already returns a JSON string)
with open(TOKENIZER_SAVE_PATH, "w", encoding="utf-8") as f:
    f.write(tokenizer.to_json())

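# word_index holds every token seen during fitting; num_words only caps what texts_to_sequences emits
print(f"Tokenizer saved. Unique tokens seen: {len(tokenizer.word_index)} (capped at MAX_VOCAB={MAX_VOCAB} when encoding)")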

# Convert to sequences
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_val_seq = tokenizer.texts_to_sequences(X_val)

# Pad sequences
X_train_pad = tf.keras.preprocessing.sequence.pad_sequences(
    X_train_seq, maxlen=MAX_LEN, padding="post"
)
X_val_pad = tf.keras.preprocessing.sequence.pad_sequences(
    X_val_seq, maxlen=MAX_LEN, padding="post"
)

print(f"Padded train shape: {X_train_pad.shape}")
print(f"Padded val shape: {X_val_pad.shape}")
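
> A small pure-Python sketch of what the padding step does to short and long comments. `pad_post_truncate_pre` is a hypothetical helper written for illustration; it mimics `pad_sequences(padding="post")` together with that function's default `truncating="pre"`.

```python
def pad_post_truncate_pre(seq, maxlen, value=0):
    """Mimic pad_sequences(padding="post") with the default truncating="pre"."""
    seq = list(seq)[-maxlen:]                      # too long: keep the LAST maxlen tokens
    return seq + [value] * (maxlen - len(seq))     # too short: append zeros at the end

short = pad_post_truncate_pre([5, 9, 2], maxlen=5)
long_ = pad_post_truncate_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)

print(short)  # zeros are appended after the tokens
print(long_)  # the first two tokens are dropped
```

> Since truncation happens at the front by default, comments longer than `MAX_LEN` lose their opening words. Passing `truncating="post"` to `pad_sequences` would keep the beginning instead.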

## 6. Model Architecture
---

> This model uses **multi-label classification** with:
> - **Embedding layer**: Converts token indices to dense vectors (8-dimensional)
> - **Bidirectional LSTM**: Processes text in both directions (8 units)
> - **Dense layer**: 64 ReLU units, with 30% dropout before and after it for regularization
> - **Output**: 6 neurons with sigmoid activation (not softmax!) for independent binary predictions
> - **Loss**: Binary crossentropy (not categorical)

In [None]:
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN,), dtype="int32"),
    tf.keras.layers.Embedding(MAX_VOCAB, 8),  # 8-dim word embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),  # BiLSTM
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(len(LABELS), activation="sigmoid")  # Multi-label: sigmoid
])

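model.compile(
    optimizer="adam",
    loss="binary_crossentropy",  # Multi-label: binary crossentropy
    # With 6 output units the "accuracy" string can resolve to categorical
    # (argmax) accuracy, so request per-label binary accuracy explicitly.
    metrics=[tf.keras.metrics.BinaryAccuracy(name="accuracy", threshold=0.5)]
)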

model.summary()
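
> The parameter counts reported by `model.summary()` can be checked by hand. The arithmetic below is a sketch assuming the standard Keras layer layouts (four gates per LSTM, one bias vector per layer); the constant names other than `MAX_VOCAB` are illustrative and not part of the notebook.

```python
MAX_VOCAB = 20000
EMBED_DIM = 8       # illustrative name for the embedding size
LSTM_UNITS = 8      # illustrative name for the LSTM width
DENSE_UNITS = 64
NUM_LABELS = 6

# Embedding: one EMBED_DIM vector per vocabulary slot
embedding = MAX_VOCAB * EMBED_DIM

# One LSTM direction: 4 gates, each with input weights, recurrent weights, and a bias
lstm_one_direction = 4 * (LSTM_UNITS * (EMBED_DIM + LSTM_UNITS) + LSTM_UNITS)
bilstm = 2 * lstm_one_direction

# Dense(64) reads the concatenated forward and backward states (2 * LSTM_UNITS)
dense_hidden = (2 * LSTM_UNITS) * DENSE_UNITS + DENSE_UNITS

# Output layer: one sigmoid unit per label
output = DENSE_UNITS * NUM_LABELS + NUM_LABELS

total = embedding + bilstm + dense_hidden + output
print(f"embedding={embedding}, bilstm={bilstm}, dense={dense_hidden}, output={output}")
print(f"total trainable parameters: {total}")
```

> Almost all of the parameters sit in the embedding table; the recurrent and dense layers together contribute only a few thousand.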

## 7. Train the Model
---

> Training uses a batch size of 32 and a single epoch (`EPOCHS = 1`). Increase `EPOCHS` for a full training run.

In [None]:
history = model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    verbose=1
)

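# Create the target directory first so the save does not fail on a fresh checkout
import os
os.makedirs(os.path.dirname(MODEL_SAVE_PATH), exist_ok=True)
model.save(MODEL_SAVE_PATH)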
print(f"Model saved to {MODEL_SAVE_PATH}")

## 8. Training Visualizations
---

### a. Loss Over Epochs

In [None]:
plt.figure(figsize=(6, 4))
plt.plot(history.history["loss"], marker="o", label="Train Loss")  # markers keep a 1-epoch run visible
plt.plot(history.history["val_loss"], marker="o", label="Val Loss")
plt.title("Loss Over Epochs (Binary Crossentropy)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.savefig("./tensorflow/toxic_classifier/loss_graph.png")
plt.show()
plt.close()
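
> To give the plotted loss values some scale, here is a NumPy sketch of binary crossentropy averaged over all labels. The predictions are invented; the helper is an illustration of the formula, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-label losses
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_label = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(per_label.mean())

# One hypothetical comment that is toxic and obscene
y_true = np.array([[1, 0, 1, 0, 0, 0]], dtype="float32")

confident = np.array([[0.9, 0.1, 0.9, 0.1, 0.1, 0.1]])  # right on every label
unsure = np.full((1, 6), 0.5)                            # a coin flip on every label

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)

print(f"confident and correct: {loss_confident:.3f}")
print(f"always 0.5:            {loss_unsure:.3f}")
```

> A model that outputs 0.5 everywhere scores ln 2 (about 0.693), so a curve well below that value means the model is doing better than guessing.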

### b. Accuracy Over Epochs

In [None]:
plt.figure(figsize=(6, 4))
plt.plot(history.history["accuracy"], marker="o", label="Train Accuracy")  # markers keep a 1-epoch run visible
plt.plot(history.history["val_accuracy"], marker="o", label="Val Accuracy")
plt.title("Accuracy Over Epochs")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)
plt.savefig("./tensorflow/toxic_classifier/accuracy_graph.png")
plt.show()
plt.close()
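
> Accuracy should be read with care on data where most labels are 0. The toy example below (invented numbers, not results from this model) shows a "classifier" that never flags anything and still scores very high.

```python
import numpy as np

# 100 hypothetical comments, 6 labels; only 5 comments are toxic
y_true = np.zeros((100, 6), dtype=int)
y_true[:5, 0] = 1

# A useless model that predicts "not toxic" for everything
y_pred = np.zeros_like(y_true)

# Fraction of individual label decisions that are correct
binary_accuracy = float((y_true == y_pred).mean())

# Fraction of comments where all 6 labels are correct
exact_match = float((y_true == y_pred).all(axis=1).mean())

# Fraction of truly toxic comments that were caught
recall_toxic = float((y_pred[:, 0] & y_true[:, 0]).sum() / y_true[:, 0].sum())

print(f"binary accuracy: {binary_accuracy:.4f}")
print(f"exact match:     {exact_match:.4f}")
print(f"toxic recall:    {recall_toxic:.4f}")
```

> High accuracy alongside zero recall is why precision, recall, and F1 per label are more informative here than the accuracy curve.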

### c. Per-Label Confusion Matrices

In [None]:
# Get predictions
y_val_pred = model.predict(X_val_pad, verbose=0)
y_val_pred_binary = (y_val_pred > 0.5).astype(int)  # Apply 0.5 threshold

# Create subplots for each label
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

for i, label in enumerate(LABELS):
    # labels=[0, 1] keeps the matrix 2x2 even if one class never appears for this label
    cm = confusion_matrix(y_val[:, i], y_val_pred_binary[:, i], labels=[0, 1])

    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=axes[i],
                xticklabels=["Not " + label, label],
                yticklabels=["Not " + label, label])
    axes[i].set_title(f"Confusion Matrix: {label.replace('_', ' ').title()}")
    axes[i].set_ylabel("True")
    axes[i].set_xlabel("Predicted")

plt.tight_layout()
plt.savefig("./tensorflow/toxic_classifier/confusion_matrix.png", dpi=150, bbox_inches="tight")
plt.show()
plt.close()

print("Saved confusion matrices")
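
> Each heatmap cell maps directly onto precision and recall. The sketch below uses an invented 2x2 matrix in the scikit-learn layout (rows are true classes, columns are predicted classes) to show the arithmetic.

```python
import numpy as np

# Hypothetical confusion matrix for one label
#                predicted 0   predicted 1
cm = np.array([[90,            2],    # true 0
               [5,             3]])   # true 1

# ravel() flattens row by row: TN, FP, FN, TP
tn, fp, fn, tp = (int(v) for v in cm.ravel())

precision = tp / (tp + fp)   # of the comments flagged, how many were right
recall = tp / (tp + fn)      # of the truly positive comments, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

> The top-left cell is usually by far the largest on this kind of data, so the bottom row and right column are the ones worth inspecting.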

## 9. Per-Label Performance Metrics
---

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

print("Per-Label Performance (Threshold = 0.5):\n")
print(f"{'Label':<20} {'Precision':<12} {'Recall':<12} {'F1-Score':<12}")
print("-" * 56)

for i, label in enumerate(LABELS):
    precision = precision_score(y_val[:, i], y_val_pred_binary[:, i], zero_division=0)
    recall = recall_score(y_val[:, i], y_val_pred_binary[:, i], zero_division=0)
    f1 = f1_score(y_val[:, i], y_val_pred_binary[:, i], zero_division=0)
    
    print(f"{label:<20} {precision:<12.4f} {recall:<12.4f} {f1:<12.4f}")
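
> The 0.5 threshold is a convention rather than a requirement. As a rough sketch on invented probabilities for a single label, the snippet below shows how moving the threshold trades precision against recall. `precision_recall` is a hypothetical helper, and any threshold chosen this way should be picked on data that is not also used for the final report.

```python
import numpy as np

# Hypothetical ground truth and predicted probabilities for one label
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_prob = np.array([0.9, 0.6, 0.3, 0.55, 0.2, 0.1, 0.05, 0.4])

def precision_recall(y_true, y_prob, threshold):
    y_hat = (y_prob > threshold).astype(int)
    tp = int(((y_hat == 1) & (y_true == 1)).sum())
    fp = int(((y_hat == 1) & (y_true == 0)).sum())
    fn = int(((y_hat == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

results = {t: precision_recall(y_true, y_prob, t) for t in (0.25, 0.5, 0.75)}

for t, (p, r) in results.items():
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

> Lower thresholds catch more toxic comments at the cost of more false alarms; rare labels such as threat may benefit from a threshold other than 0.5.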

## 10. Interactive Testing
---

> Test the model on custom comments with multi-label toxicity detection.

In [None]:
print("\nInteractive testing mode:")
print(f"Toxicity labels: {LABELS}")
print("(Run this cell and enter comments to test the model)\n")

while True:
    comment = input("Enter a comment (or 'quit'): ").strip()
    
    if comment.lower() == "quit":
        break
    
    if not comment:
        print("Please enter a non-empty comment.\n")
        continue
    
    try:
        # Tokenize and pad
        seq = tokenizer.texts_to_sequences([comment])
        pad = tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=MAX_LEN, padding="post")
        
        # Predict
        pred = model.predict(pad, verbose=0)[0]
        
        # Get detected toxicity types (threshold > 0.5)
        detected = []
        for i, label in enumerate(LABELS):
            if pred[i] > 0.5:
                detected.append((label, pred[i] * 100))
        
        if detected:
            print("\nToxicity Detected:")
            for label, conf in sorted(detected, key=lambda x: x[1], reverse=True):
                print(f"  {label}: {conf:.2f}%")
        else:
            print("\nNo toxicity detected.")
        
        print("\nAll Probabilities:")
        for label, prob in zip(LABELS, pred):
            print(f"  {label}: {prob*100:.2f}%")
        print()
    
    except Exception as e:
        print(f"Error: {e}\n")
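
> For use outside an interactive loop, the thresholding and sorting logic can be pulled into a small function. `summarize_prediction` is a hypothetical helper, and the probability vector below is invented; in practice it would come from `model.predict(...)[0]`.

```python
import numpy as np

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def summarize_prediction(probs, labels=LABELS, threshold=0.5):
    """Return (label, percent) pairs above the threshold, highest first."""
    detected = [
        (label, float(p) * 100)
        for label, p in zip(labels, probs)
        if p > threshold
    ]
    return sorted(detected, key=lambda item: item[1], reverse=True)

# Hypothetical model output for one comment
example = np.array([0.72, 0.04, 0.91, 0.01, 0.55, 0.02])
summary = summarize_prediction(example)

if summary:
    for label, pct in summary:
        print(f"{label}: {pct:.2f}%")
else:
    print("No toxicity detected.")
```

> Keeping this logic in a function makes it straightforward to reuse in a batch script or behind an API without the `input()` loop.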