<a href="https://colab.research.google.com/github/Arv-ind-s/content-moderation-system/blob/main/notebook/03_model_training_ipynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🤖 Content Moderation System - Model Training

## Objective
Fine-tune DistilBERT for multi-label toxic comment classification with class imbalance handling.

## Model Selection: DistilBERT

**Why DistilBERT?**
- 40% smaller than BERT (66M vs 110M parameters)
- 60% faster inference (critical for real-time API)
- Maintains 97% of BERT's performance
- Fits AWS Lambda deployment constraints
- Pre-trained on English language understanding

## Training Strategy

### Multi-Label Classification
- Each of the 6 toxicity categories is treated as an independent binary classification problem
- Use Binary Cross-Entropy (BCE) loss with logits
- Sigmoid activation for each label (not softmax)
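
To make the sigmoid-versus-softmax point concrete, here is a minimal pure-Python sketch (standard library only, not the notebook's PyTorch code). The logits are hypothetical values chosen for illustration, and `bce_with_logits` is an illustrative helper that mirrors the unweighted mean BCE formula.

```python
import math

def sigmoid(x):
    """Logistic function: maps one logit to an independent probability."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Softmax: the probabilities compete and always sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits."""
    losses = []
    for x, y in zip(logits, targets):
        p = sigmoid(x)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Hypothetical logits for one comment, one per toxicity label
logits = [2.0, -3.0, 1.5, -4.0, 1.0, -2.5]
targets = [1, 0, 1, 0, 1, 0]

sig_probs = [sigmoid(x) for x in logits]
soft_probs = softmax(logits)
loss = bce_with_logits(logits, targets)

print("sigmoid:", [round(p, 3) for p in sig_probs])
print("softmax:", [round(p, 3) for p in soft_probs])
print("BCE loss:", round(loss, 4))
```

With sigmoid, several labels can exceed 0.5 at once, which is what a comment that is both obscene and insulting needs. Softmax would force the six labels to share a single unit of probability.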

### Handling Class Imbalance (8.8:1 ratio)
- **Weighted loss**: Higher penalty for misclassifying toxic comments
- **Focal loss** (optional): Focus learning on hard examples
- **Metrics**: F1-score, Precision, Recall (NOT accuracy)
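
As a rough illustration of the optional focal loss, the sketch below applies the modulating factor `(1 - p_t) ** gamma` to plain binary cross-entropy for a single label. It is standard-library Python rather than the notebook's training code; the function names and the `gamma=2.0` default are assumptions for illustration, and the class-balancing `alpha` term is left out for brevity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(logit, target):
    """Plain BCE for one label: -log(p_t)."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p
    return -math.log(p_t)

def binary_focal_loss(logit, target, gamma=2.0):
    """Focal loss for one label: -(1 - p_t)**gamma * log(p_t)."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = (4.0, 1)    # toxic comment the model already scores confidently
hard = (-2.0, 1)   # toxic comment the model currently misses

# Fraction of the BCE loss that survives the focal down-weighting
easy_ratio = binary_focal_loss(*easy) / binary_cross_entropy(*easy)
hard_ratio = binary_focal_loss(*hard) / binary_cross_entropy(*hard)

print(f"easy example keeps {easy_ratio:.4f} of its BCE loss")
print(f"hard example keeps {hard_ratio:.4f} of its BCE loss")
```

Easy examples contribute almost nothing after the down-weighting, so the gradient is dominated by the examples the model still gets wrong. Setting `gamma=0` recovers ordinary BCE.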

### Training Configuration
- **Batch size**: 16 (balanced for GPU memory and training speed)
- **Learning rate**: 2e-5 (recommended for fine-tuning transformers)
- **Epochs**: 3-4 (transformers need few epochs)
- **Optimizer**: AdamW (weight decay for regularization)
- **Warmup steps**: 500 (gradual learning rate increase)
- **Max sequence length**: 256 tokens (handles 95% of comments)

## Evaluation Metrics

Given severe imbalance, we'll track:
1. **Per-category F1-scores** (harmonic mean of precision/recall)
2. **Precision** (avoid false positives - don't over-flag)
3. **Recall** (catch toxic content - don't miss real toxicity)
4. **ROC-AUC** (threshold-independent performance)
5. **Confusion matrix** per category

**Target**: F1 > 0.75 for "toxic" category (balanced precision/recall)
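
The sketch below shows why accuracy is misleading under this kind of imbalance, using a toy label set with roughly a 9:1 clean-to-toxic ratio. The data and the `prf1` helper are made up for illustration; the notebook itself uses the scikit-learn metric functions.

```python
def prf1(y_true, y_pred):
    """Precision, recall and F1 for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Toy data: 10 toxic comments followed by 90 clean ones
y_true = [1] * 10 + [0] * 90

# Model A never flags anything
pred_a = [0] * 100
# Model B catches 8 of the 10 toxic comments and raises 4 false alarms
pred_b = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 86

acc_a, f1_a = accuracy(y_true, pred_a), prf1(y_true, pred_a)[2]
acc_b, (prec_b, rec_b, f1_b) = accuracy(y_true, pred_b), prf1(y_true, pred_b)

print(f"Model A: accuracy={acc_a:.2f}  F1={f1_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}  precision={prec_b:.3f}  recall={rec_b:.3f}  F1={f1_b:.3f}")
```

Model A looks strong on accuracy while being useless as a moderator, which is the reason the target above is stated in terms of F1.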

---

**Author**: Aravind S  
**Date**: December 7, 2025  
**Model**: distilbert-base-uncased  
**Framework**: PyTorch + Transformers  
**GitHub**: https://github.com/Arv-ind-s/content-moderation-system/blob/main/README.md

---

## 1. Environment Setup and Dependency Installation

In [1]:
# Install required packages
!pip install -q transformers torch datasets scikit-learn accelerate

# Import libraries
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import (
    DistilBertTokenizer,
    DistilBertForSequenceClassification,
    get_linear_schedule_with_warmup
)
# Use the PyTorch AdamW; the implementation in transformers is deprecated
from torch.optim import AdamW
from sklearn.metrics import (
    classification_report,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score
)
import warnings
warnings.filterwarnings('ignore')

# Check GPU availability
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

✅ Using device: cuda
GPU: Tesla T4
Memory: 15.83 GB


## 2. Initialize Tokenizer and Configuration

DistilBERT uses WordPiece tokenization with:
- Max sequence length: 256 tokens (covers 95%+ of comments)
- Padding/truncation to handle variable lengths
- Special tokens: [CLS] at start, [SEP] at end
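
The padding and truncation rules can be sketched in plain Python as follows. This is a simplified illustration of what the tokenizer does after WordPiece splitting, not the library's implementation; `pad_or_truncate` is a hypothetical helper, and the IDs 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` entries of the `distilbert-base-uncased` vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD]

def pad_or_truncate(token_ids, max_length):
    """Wrap WordPiece IDs in [CLS] ... [SEP], then pad or truncate to max_length."""
    body = token_ids[: max_length - 2]        # leave room for the two special tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)     # 1 = real token
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad  # 0 = padding

# IDs for "you are a chicken", as they appear in the sample tokenization in this notebook
short_ids, short_mask = pad_or_truncate([2017, 2024, 1037, 7975], max_length=8)

# A sequence longer than the limit (placeholder IDs) gets cut, but still ends with [SEP]
long_ids, long_mask = pad_or_truncate(list(range(1000, 1020)), max_length=8)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padding positions, so padding every comment to 256 tokens affects speed and memory but not the predictions.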

In [2]:
# Initialize tokenizer
MODEL_NAME = 'distilbert-base-uncased'
tokenizer = DistilBertTokenizer.from_pretrained(MODEL_NAME)

# Configuration
MAX_LENGTH = 256
BATCH_SIZE = 16
NUM_LABELS = 6  # 6 toxicity categories
EPOCHS = 3
LEARNING_RATE = 2e-5

print(f"✅ Tokenizer loaded: {MODEL_NAME}")
print(f"Vocabulary size: {tokenizer.vocab_size:,}")
print(f"Max sequence length: {MAX_LENGTH}")
print(f"Batch size: {BATCH_SIZE}")

✅ Tokenizer loaded: distilbert-base-uncased
Vocabulary size: 30,522
Max sequence length: 256
Batch size: 16


In [5]:
# Test tokenization on a sample
train_df = pd.read_csv('/content/train_processed.csv', engine='python', on_bad_lines='skip')
sample_text = train_df.iloc[0]['comment_text']
encoded = tokenizer.encode_plus(
    sample_text,
    add_special_tokens=True,
    max_length=MAX_LENGTH,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt'
)

print("Sample text:", sample_text[:100])
print("\nTokenized output:")
print(f"Input IDs shape: {encoded['input_ids'].shape}")
print(f"Attention mask shape: {encoded['attention_mask'].shape}")
print(f"First 10 tokens: {encoded['input_ids'][0][:10].tolist()}")
print(f"Decoded: {tokenizer.decode(encoded['input_ids'][0][:10])}")

Sample text: you are a chicken shit cock sucking pussy bastard! you are a chicken shit cock sucking pussy bastard

Tokenized output:
Input IDs shape: torch.Size([1, 256])
Attention mask shape: torch.Size([1, 256])
First 10 tokens: [101, 2017, 2024, 1037, 7975, 4485, 10338, 13475, 22418, 8444]
Decoded: [CLS] you are a chicken shit cock sucking pussy bastard


## 3. Create Custom Dataset Class

PyTorch Dataset for efficient batch loading with tokenization.

In [6]:
class ToxicCommentsDataset(Dataset):
    """
    Custom Dataset for toxic comment classification.
    """
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]

        # Tokenize
        encoding = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            max_length=self.max_length,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.FloatTensor(label)
        }

print("✅ Dataset class defined")

✅ Dataset class defined


In [12]:
from sklearn.model_selection import train_test_split

# Load the processed datasets; on_bad_lines='skip' silently drops malformed rows
train_df = pd.read_csv('/content/train_processed.csv', engine='python', on_bad_lines='skip')
test_df = pd.read_csv('/content/test_processed.csv', engine='python', on_bad_lines='skip')

# Create validation set from train_df by splitting it
train_df, val_df = train_test_split(train_df, test_size=0.1, random_state=42)

print(f"✅ Data loaded and split:")
print(f"Train: {len(train_df):,} samples")
print(f"Val:   {len(val_df):,} samples")
print(f"Test:  {len(test_df):,} samples")

# Prepare labels as numpy arrays
label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']
train_labels = train_df[label_cols].values
val_labels = val_df[label_cols].values
test_labels = test_df[label_cols].values

# Create datasets
train_dataset = ToxicCommentsDataset(
    texts=train_df['comment_text'].values,
    labels=train_labels,
    tokenizer=tokenizer,
    max_length=MAX_LENGTH
)

val_dataset = ToxicCommentsDataset(
    texts=val_df['comment_text'].values,
    labels=val_labels,
    tokenizer=tokenizer,
    max_length=MAX_LENGTH
)

test_dataset = ToxicCommentsDataset(
    texts=test_df['comment_text'].values,
    labels=test_labels,
    tokenizer=tokenizer,
    max_length=MAX_LENGTH
)

print(f"\n✅ Datasets created:")
print(f"Train: {len(train_dataset):,} samples")
print(f"Val:   {len(val_dataset):,} samples")
print(f"Test:  {len(test_dataset):,} samples")

✅ Data loaded and split:
Train: 114,882 samples
Val:   12,765 samples
Test:  15,956 samples

✅ Datasets created:
Train: 114,882 samples
Val:   12,765 samples
Test:  15,956 samples


In [13]:
# Create DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=2
)

val_loader = DataLoader(
    val_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=2
)

test_loader = DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=2
)

print(f"✅ DataLoaders created:")
print(f"Train batches: {len(train_loader)}")
print(f"Val batches:   {len(val_loader)}")
print(f"Test batches:  {len(test_loader)}")

# Test loading a batch
batch = next(iter(train_loader))
print("\nSample batch shapes:")
print(f"Input IDs: {batch['input_ids'].shape}")
print(f"Attention mask: {batch['attention_mask'].shape}")
print(f"Labels: {batch['labels'].shape}")

✅ DataLoaders created:
Train batches: 7181
Val batches:   798
Test batches:  998

Sample batch shapes:
Input IDs: torch.Size([16, 256])
Attention mask: torch.Size([16, 256])
Labels: torch.Size([16, 6])


## 4. Calculate Class Weights for Imbalanced Data

With 8.8:1 clean-to-toxic ratio, we weight the loss to penalize misclassifying toxic comments more.
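
To show what the weighting does numerically, the sketch below recomputes two of the per-label weights from the positive counts reported in this notebook's training split (11,043 `toxic` and 342 `threat` out of 114,882) and applies them to a single-label weighted BCE. It is pure Python written for illustration; `pos_weight_for` and `weighted_bce` are hypothetical helpers that follow the same formula as the `pos_weight` argument of `BCEWithLogitsLoss`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pos_weight_for(pos_count, total):
    """Ratio of negatives to positives; falls back to 1.0 if a label has no positives."""
    neg_count = total - pos_count
    return neg_count / pos_count if pos_count > 0 else 1.0

def weighted_bce(logit, target, pos_weight):
    """Single-label BCE where only the positive term is scaled by pos_weight."""
    p = sigmoid(logit)
    return -(pos_weight * target * math.log(p) + (1 - target) * math.log(1 - p))

TOTAL = 114_882
w_toxic = pos_weight_for(11_043, TOTAL)
w_threat = pos_weight_for(342, TOTAL)

# An undecided model (logit 0) on a positive versus a negative example
missed_threat = weighted_bce(0.0, 1, w_threat)
clean_comment = weighted_bce(0.0, 0, w_threat)

print(f"toxic weight:  {w_toxic:.4f}")
print(f"threat weight: {w_threat:.4f}")
print(f"loss ratio, positive vs negative: {missed_threat / clean_comment:.1f}")
```

A weight in the hundreds makes every `threat` positive count as much as several hundred negatives. That pushes recall up but can cost precision, so the per-category precision is worth watching alongside F1.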

In [14]:
# Calculate positive class weights for each label
# (train_df already loaded above, but if you run this cell separately, uncomment next line)
# train_df = pd.read_csv('/content/train_processed.csv', engine='python', on_bad_lines='skip')

class_weights = []
print("Class weights for imbalanced labels:\n")
print(f"{'Label':<20} {'Positive':>10} {'Negative':>10} {'Weight':>10}")
print("-" * 60)

for col in label_cols:
    pos_count = train_df[col].sum()
    neg_count = len(train_df) - pos_count
    weight = neg_count / pos_count if pos_count > 0 else 1.0
    class_weights.append(weight)

    print(f"{col:<20} {pos_count:>10,} {neg_count:>10,} {weight:>10.2f}")

# Convert to tensor and move to GPU
pos_weight = torch.FloatTensor(class_weights).to(device)
print(f"\n✅ Class weights moved to {device}")
print(f"Weights: {pos_weight}")

Class weights for imbalanced labels:

Label                  Positive   Negative     Weight
------------------------------------------------------------
toxic                    11,043    103,839       9.40
severe_toxic              1,155    113,727      98.46
obscene                   6,077    108,805      17.90
threat                      342    114,540     334.91
insult                    5,681    109,201      19.22
identity_hate             1,018    113,864     111.85

✅ Class weights moved to cuda
Weights: tensor([  9.4032,  98.4649,  17.9044, 334.9123,  19.2221, 111.8507],
       device='cuda:0')


## 5. Initialize DistilBERT Model

In [15]:
# Load pre-trained DistilBERT for multi-label classification
model = DistilBertForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification"
)

# Move model to GPU
model = model.to(device)

print(f"✅ Model loaded and moved to {device}")
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Model loaded and moved to cuda
Total parameters: 66,958,086
Trainable parameters: 66,958,086


## 6. Setup Optimizer and Learning Rate Scheduler

- Optimizer: AdamW (Adam with weight decay for regularization)
- Learning rate: 2e-5 (standard for fine-tuning transformers)
- Scheduler: Linear warmup then decay
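
The shape of the schedule can be sketched without any libraries. The function below reproduces the usual linear-warmup, linear-decay multiplier that a scheduler of this kind applies to the base learning rate; `linear_warmup_multiplier` is an illustrative name, and the step counts are derived from this notebook's 114,882 training samples, batch size 16 and 3 epochs.

```python
import math

def linear_warmup_multiplier(step, warmup_steps, total_steps):
    """Scale factor for the base LR: ramps 0 -> 1 during warmup, then decays 1 -> 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

BASE_LR = 2e-5
WARMUP_STEPS = 500
steps_per_epoch = math.ceil(114_882 / 16)   # last batch is partial
total_steps = steps_per_epoch * 3

for step in (0, 250, 500, total_steps // 2, total_steps):
    lr = BASE_LR * linear_warmup_multiplier(step, WARMUP_STEPS, total_steps)
    print(f"step {step:>6}: lr = {lr:.2e}")
```

Warmup covers only about 2% of the run here. Its job is to keep the freshly initialized classifier head from sending large gradients into the pre-trained layers during the first few hundred updates.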

In [16]:
# Setup optimizer
optimizer = AdamW(model.parameters(), lr=LEARNING_RATE, weight_decay=0.01)

# Calculate total training steps
total_steps = len(train_loader) * EPOCHS

# Setup learning rate scheduler with warmup
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,  # Warmup for first 500 steps
    num_training_steps=total_steps
)

print(f"✅ Optimizer and scheduler configured")
print(f"Total training steps: {total_steps:,}")
print(f"Warmup steps: 500")
print(f"Learning rate: {LEARNING_RATE}")

✅ Optimizer and scheduler configured
Total training steps: 21,543
Warmup steps: 500
Learning rate: 2e-05


In [17]:
# Binary Cross Entropy with Logits Loss (handles multi-label)
# Uses pos_weight to handle class imbalance
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

print(f"✅ Loss function: BCEWithLogitsLoss with class weights")
print(f"Class weights: {pos_weight}")

✅ Loss function: BCEWithLogitsLoss with class weights
Class weights: tensor([  9.4032,  98.4649,  17.9044, 334.9123,  19.2221, 111.8507],
       device='cuda:0')


## 7. Training and Validation Functions

In [18]:
def train_epoch(model, data_loader, criterion, optimizer, scheduler, device):
    """
    Train for one epoch.
    """
    model.train()
    total_loss = 0

    for batch_idx, batch in enumerate(data_loader):
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Zero gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask
        )

        logits = outputs.logits

        # Calculate loss
        loss = criterion(logits, labels)

        # Backward pass
        loss.backward()

        # Clip gradients to prevent exploding gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # Update weights
        optimizer.step()
        scheduler.step()

        total_loss += loss.item()

        # Print progress every 500 batches
        if (batch_idx + 1) % 500 == 0:
            print(f"  Batch {batch_idx + 1}/{len(data_loader)} | Loss: {loss.item():.4f}")

    avg_loss = total_loss / len(data_loader)
    return avg_loss


def eval_model(model, data_loader, criterion, device):
    """
    Evaluate model on validation/test set.
    """
    model.eval()
    total_loss = 0
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in data_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            # Forward pass
            outputs = model(
                input_ids=input_ids,
                attention_mask=attention_mask
            )

            logits = outputs.logits
            loss = criterion(logits, labels)
            total_loss += loss.item()

            # Get predictions (apply sigmoid for probabilities)
            probs = torch.sigmoid(logits)
            preds = (probs > 0.5).float()  # Threshold at 0.5

            all_preds.append(preds.cpu().numpy())
            all_labels.append(labels.cpu().numpy())

    avg_loss = total_loss / len(data_loader)
    all_preds = np.vstack(all_preds)
    all_labels = np.vstack(all_labels)

    return avg_loss, all_preds, all_labels

print("✅ Training and evaluation functions defined")

✅ Training and evaluation functions defined


## 8. Train the Model

Training for 3 epochs with validation after each epoch.

In [19]:
# Training loop
best_val_loss = float('inf')
train_losses = []
val_losses = []

print("="*80)
print("STARTING TRAINING")
print("="*80)

for epoch in range(EPOCHS):
    print(f"\nEpoch {epoch + 1}/{EPOCHS}")
    print("-" * 60)

    # Train
    train_loss = train_epoch(model, train_loader, criterion, optimizer, scheduler, device)
    train_losses.append(train_loss)
    print(f"Train Loss: {train_loss:.4f}")

    # Validate
    val_loss, val_preds, val_labels = eval_model(model, val_loader, criterion, device)
    val_losses.append(val_loss)
    print(f"Val Loss: {val_loss:.4f}")

    # Calculate F1 scores per label
    print("\nValidation F1-Scores per category:")
    for idx, label in enumerate(label_cols):
        f1 = f1_score(val_labels[:, idx], val_preds[:, idx], zero_division=0)
        print(f"  {label:<20}: {f1:.4f}")

    # Save best model
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), 'best_model.pt')
        print(f"\n✅ Best model saved (Val Loss: {val_loss:.4f})")

    print("=" * 60)

print("\n🎉 Training Complete!")

STARTING TRAINING

Epoch 1/3
------------------------------------------------------------
  Batch 500/7181 | Loss: 0.1410
  Batch 1000/7181 | Loss: 1.7241
  Batch 1500/7181 | Loss: 0.0758
  Batch 2000/7181 | Loss: 0.3393
  Batch 2500/7181 | Loss: 0.0847
  Batch 3000/7181 | Loss: 0.0372
  Batch 3500/7181 | Loss: 0.3220
  Batch 4000/7181 | Loss: 1.0053
  Batch 4500/7181 | Loss: 0.1176
  Batch 5000/7181 | Loss: 0.0354
  Batch 5500/7181 | Loss: 0.1113
  Batch 6000/7181 | Loss: 0.5256
  Batch 6500/7181 | Loss: 0.7119
  Batch 7000/7181 | Loss: 0.9454
Train Loss: 0.5922
Val Loss: 0.4330

Validation F1-Scores per category:
  toxic               : 0.7968
  severe_toxic        : 0.4491
  obscene             : 0.7917
  threat              : 0.4430
  insult              : 0.7131
  identity_hate       : 0.5226

✅ Best model saved (Val Loss: 0.4330)

Epoch 2/3
------------------------------------------------------------
  Batch 500/7181 | Loss: 0.2026
  Batch 1000/7181 | Loss: 0.1062
  Batch 1500/

In [21]:
# Mount Drive if not already mounted
from google.colab import drive
drive.mount('/content/drive')

import os
# Define the directory path
save_dir = '/content/drive/MyDrive/content_moderation/models'

# Create the directory if it doesn't exist
os.makedirs(save_dir, exist_ok=True)

# Copy best model to Drive
!cp best_model.pt {save_dir}/best_model.pt

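# Also save the best checkpoint with metadata.
# Reload best_model.pt so the stored weights are the best-validation ones,
# not whatever the model holds after the final epoch.
best_state_dict = torch.load('best_model.pt', map_location='cpu')
torch.save({
    'model_state_dict': best_state_dict,
    'model_name': MODEL_NAME,
    'num_labels': NUM_LABELS,
    'max_length': MAX_LENGTH,
    'label_cols': label_cols,
    'class_weights': [float(w) for w in class_weights]  # plain floats, not NumPy scalars
}, f'{save_dir}/best_model_with_config.pt')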

print("✅ Model saved to Google Drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ Model saved to Google Drive
