# Tutorial 20: Case Study - Content Moderation System

## End-to-End ML System Design for Harmful Content Detection

---

## Learning Objectives

By the end of this tutorial, you will be able to:

1. **Design a multi-modal content moderation system** for text and images (the code in this notebook implements the text path)
2. **Apply the 7-step ML framework** to safety-critical systems
3. **Implement text toxicity classifiers**
4. **Balance precision and recall** for user safety vs over-moderation
5. **Design low-latency serving** for real-time moderation
6. **Build monitoring systems** for false positives/negatives

## Setup and Imports

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, precision_recall_curve
from sklearn.metrics import precision_score, recall_score, f1_score
import re

np.random.seed(42)
print('All imports successful!')

---

# 1. Problem Statement and Requirements

## 1.1 Business Context

**Scenario**: Design a content moderation system for a social media platform.

In [None]:
class ModerationRequirements:
    def __init__(self):
        self.categories = {
            'hate_speech': 'Content attacking protected groups',
            'harassment': 'Targeted abuse or bullying',
            'spam': 'Unsolicited commercial content',
            'violence': 'Graphic violence or threats',
            'safe': 'Content that does not violate policies'
        }
        self.scale = {'posts_per_day': '500M', 'qps_peak': '50K'}
        self.latency = {'p99': '100ms'}
        self.accuracy = {'precision_target': 0.90, 'recall_target': 0.95}
        
    def display(self):
        print('CONTENT MODERATION REQUIREMENTS')
        print('=' * 50)
        print('Categories:', list(self.categories.keys()))
        print('Scale:', self.scale)
        print('Latency:', self.latency)
        print('Accuracy targets:', self.accuracy)

req = ModerationRequirements()
req.display()
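The scale numbers are easier to reason about as request rates. Here is a back-of-envelope sketch. 500M posts/day works out to roughly 5.8K QPS on average, so the 50K peak is about 8.6x the mean. The per-replica throughput (1K QPS) and 30% headroom below are hypothetical placeholders, not measured values.

```python
# Back-of-envelope capacity check for the stated requirements.
# ASSUMPTION: per_replica_qps and headroom are hypothetical, not measured figures.
import math

SECONDS_PER_DAY = 24 * 60 * 60

def capacity_estimate(posts_per_day, peak_qps, per_replica_qps, headroom=0.3):
    """Return average QPS, peak-to-average ratio, and replicas needed at peak."""
    avg_qps = posts_per_day / SECONDS_PER_DAY
    peak_ratio = peak_qps / avg_qps
    # Provision for peak load plus a safety margin
    replicas = math.ceil(peak_qps * (1 + headroom) / per_replica_qps)
    return avg_qps, peak_ratio, replicas

avg_qps, peak_ratio, replicas = capacity_estimate(500_000_000, 50_000, per_replica_qps=1_000)
print(f'Average QPS: {avg_qps:,.0f}')
print(f'Peak / average: {peak_ratio:.1f}x')
print(f'Replicas at peak (hypothetical 1K QPS each, 30% headroom): {replicas}')
```

The peak-to-average ratio matters more than the daily total. The system has to be sized for the 50K burst, not for the ~5.8K mean.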

## 1.2 System Architecture

```
Content -> Pre-filter -> ML Models -> Decision Engine -> Action
                              |              |
                              v              v
                         [Text Model]   [Threshold]
                         [Image Model]  [Ensemble]
```
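The Pre-filter stage in the diagram is not implemented elsewhere in this notebook. One common shape for it is a set of cheap, deterministic checks that run before any model is called. The sketch below assumes a hash blocklist of previously removed posts and a URL-count rule. Both are illustrative choices, not the tutorial's design, and `PreFilter` is a hypothetical name.

```python
# Hypothetical pre-filter: cheap checks that run before the ML models.
# ASSUMPTIONS: the blocklist and the URL-count rule are illustrative only.
import hashlib
import re

URL_PATTERN = re.compile(r'https?://\S+')

def content_hash(text):
    """Normalize case and whitespace, then hash, so trivial variants match."""
    normalized = ' '.join(text.lower().split())
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()

class PreFilter:
    def __init__(self, known_bad_texts, max_urls=3):
        self.known_bad = {content_hash(t) for t in known_bad_texts}
        self.max_urls = max_urls

    def check(self, text):
        """Return 'BLOCK', 'ESCALATE', or 'PASS' (send on to the ML models)."""
        if content_hash(text) in self.known_bad:
            return 'BLOCK'      # matches previously removed content
        if len(URL_PATTERN.findall(text)) > self.max_urls:
            return 'ESCALATE'   # link-heavy posts get extra scrutiny
        return 'PASS'

pf = PreFilter(known_bad_texts=['CLICK HERE for FREE money'])
print(pf.check('click here  for free MONEY'))    # BLOCK
print(pf.check('Great meal at the restaurant'))  # PASS
```

Keeping this stage deterministic makes its decisions cheap and easy to audit. The models then only see content the rules cannot settle.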

In [None]:
fig, ax = plt.subplots(figsize=(12, 4))
ax.set_xlim(0, 12)
ax.set_ylim(0, 4)
ax.axis('off')

boxes = [
    (0.5, 1, 1.5, 2, 'Content', '#3498db'),
    (2.5, 1, 2, 2, 'Pre-filter', '#95a5a6'),
    (5, 0.5, 2, 3, 'ML Models', '#2ecc71'),
    (7.5, 0.5, 2, 3, 'Decision\nEngine', '#e74c3c'),
    (10, 1, 1.5, 2, 'Action', '#9b59b6')
]

for x, y, w, h, label, color in boxes:
    rect = plt.Rectangle((x, y), w, h, facecolor=color, alpha=0.3, edgecolor=color, lw=2)
    ax.add_patch(rect)
    ax.text(x + w/2, y + h/2, label, ha='center', va='center', fontsize=10, fontweight='bold')

for x1, x2 in [(2, 2.5), (4.5, 5), (7, 7.5), (9.5, 10)]:
    ax.annotate('', xy=(x2, 2), xytext=(x1, 2), arrowprops=dict(arrowstyle='->', lw=2))

ax.set_title('Content Moderation Architecture', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

---

# 2. Data Generation

In [None]:
class ContentDataGenerator:
    def __init__(self, n_samples=30000):
        self.n_samples = n_samples
        self.categories = ['safe', 'hate_speech', 'harassment', 'spam', 'violence']
        self.templates = {
            'safe': [
                'I love this beautiful day',
                'Great meal at the restaurant',
                'Happy birthday to you',
                'The weather is perfect today',
                'Congratulations on your success'
            ],
            'hate_speech': [
                'Those people from X are terrible',
                'Group Y should not exist',
                'I hate everyone who believes Z'
            ],
            'harassment': [
                'You are so stupid person',
                'Nobody likes you loser',
                'You should be ashamed'
            ],
            'spam': [
                'CLICK HERE for FREE money',
                'Buy followers at low prices',
                'Make money from home easy'
            ],
            'violence': [
                'I want to hurt people',
                'Someone should attack them',
                'Violence is the solution'
            ]
        }
        
    def augment(self, text):
        variations = [text, text.upper(), text.lower(), text + '!!!']
        return np.random.choice(variations)
    
    def generate(self):
        np.random.seed(42)
        probs = [0.85, 0.04, 0.04, 0.04, 0.03]
        
        data = []
        for i in range(self.n_samples):
            cat = np.random.choice(self.categories, p=probs)
            text = self.augment(np.random.choice(self.templates[cat]))
            data.append({
                'content_id': f'post_{i}',
                'text': text,
                'category': cat,
                'user_id': f'user_{np.random.randint(0, 10000)}',
                'text_length': len(text),
                'num_caps': sum(1 for c in text if c.isupper()),
                'hour_posted': np.random.randint(0, 24),
                'user_age_days': np.random.exponential(365),
                'prior_violations': np.random.poisson(0.5),
                'timestamp': pd.Timestamp('2023-01-01') + pd.Timedelta(days=np.random.randint(0, 365))
            })
        
        df = pd.DataFrame(data)
        df['is_violation'] = (df['category'] != 'safe').astype(int)
        print(f'Generated {len(df)} samples')
        print(df['category'].value_counts())
        return df

gen = ContentDataGenerator()
df = gen.generate()

In [None]:
print('Sample data:')
print(df[['content_id', 'text', 'category', 'is_violation']].head(10))

In [None]:
# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

cat_counts = df['category'].value_counts()
colors = ['#27ae60' if c == 'safe' else '#e74c3c' for c in cat_counts.index]
axes[0].bar(cat_counts.index, cat_counts.values, color=colors, alpha=0.7)
axes[0].set_title('Category Distribution')
axes[0].tick_params(axis='x', rotation=45)
axes[0].set_yscale('log')

hourly = df.groupby('hour_posted')['is_violation'].mean()
axes[1].plot(hourly.index, hourly.values, marker='o')
axes[1].set_title('Violation Rate by Hour')
axes[1].set_xlabel('Hour')

plt.tight_layout()
plt.show()

## 2.1 Feature Engineering

In [None]:
class FeatureEngineering:
    def __init__(self):
        self.tfidf = TfidfVectorizer(max_features=3000, ngram_range=(1, 2))
        self.le = LabelEncoder()
        
    def fit_text(self, texts):
        return self.tfidf.fit_transform(texts)
    
    def transform_text(self, texts):
        return self.tfidf.transform(texts)
    
    def get_meta_features(self, df):
        features = df[['text_length', 'num_caps', 'hour_posted', 'user_age_days', 'prior_violations']].copy()
        features['caps_ratio'] = df['num_caps'] / (df['text_length'] + 1)
        features['is_new_user'] = (df['user_age_days'] < 30).astype(int)
        features['is_repeat'] = (df['prior_violations'] > 0).astype(int)
        return features.values
    
    def encode_labels(self, labels):
        return self.le.fit_transform(labels)

fe = FeatureEngineering()
X_text = fe.fit_text(df['text'])
X_meta = fe.get_meta_features(df)
y_binary = df['is_violation'].values
y_multi = fe.encode_labels(df['category'])

print(f'Text features: {X_text.shape}')
print(f'Meta features: {X_meta.shape}')
print(f'Categories: {fe.le.classes_}')

## 2.2 Train/Test Split

In [None]:
df_sorted = df.sort_values('timestamp')
split_idx = int(len(df_sorted) * 0.8)

train_idx = df_sorted.index[:split_idx]
test_idx = df_sorted.index[split_idx:]

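# Refit TF-IDF on training text only, so the vocabulary and IDF weights never see test posts
X_text_train = fe.fit_text(df.loc[train_idx, 'text'])
X_text_test = fe.transform_text(df.loc[test_idx, 'text'])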
X_meta_train, X_meta_test = X_meta[train_idx], X_meta[test_idx]
y_train, y_test = y_binary[train_idx], y_binary[test_idx]

print(f'Train: {len(train_idx)}, Test: {len(test_idx)}')
print(f'Train violation rate: {y_train.mean():.3f}')
print(f'Test violation rate: {y_test.mean():.3f}')
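Every post is drawn from a handful of templates, so identical sentences can appear on both sides of the time-based split. That inflates test scores. The sketch below is a hedged exact-duplicate check. The helper `exact_overlap_rate` is hypothetical and not part of the tutorial.

```python
def exact_overlap_rate(train_texts, test_texts):
    """Fraction of test texts (case-folded) that also occur verbatim in training."""
    if len(test_texts) == 0:
        return 0.0
    train_set = {t.lower() for t in train_texts}
    hits = sum(1 for t in test_texts if t.lower() in train_set)
    return hits / len(test_texts)

rate = exact_overlap_rate(['Happy birthday to you', 'Buy followers at low prices'],
                          ['HAPPY BIRTHDAY TO YOU', 'A genuinely new post'])
print(rate)  # 0.5
```

Applied to this notebook, e.g. `exact_overlap_rate(df.loc[train_idx, 'text'].tolist(), df.loc[test_idx, 'text'].tolist())`, the rate is likely very high. The classification reports below should therefore be read as optimistic.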

---

# 3. Model Development

## 3.1 Text Classifier

In [None]:
class TextClassifier:
    def __init__(self):
        self.model = LogisticRegression(max_iter=1000, class_weight='balanced')
        
    def fit(self, X, y):
        print('Training Text Classifier...')
        self.model.fit(X, y)
        print(f'  Train accuracy: {self.model.score(X, y):.4f}')
        
    def predict(self, X):
        return self.model.predict(X)
    
    def predict_proba(self, X):
        return self.model.predict_proba(X)[:, 1]

text_clf = TextClassifier()
text_clf.fit(X_text_train, y_train)

y_pred = text_clf.predict(X_text_test)
print('\nText Classifier Performance:')
print(classification_report(y_test, y_pred, target_names=['Safe', 'Violation']))
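`class_weight='balanced'` reweights each class inversely to its frequency. This small check uses scikit-learn's `compute_class_weight` helper on toy labels with the same 85/15 mix to make the formula concrete. With these weights, each violation counts about 85/15 ≈ 5.7x as much as a safe post in the training loss.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels with an 85% safe / 15% violation mix
y_demo = np.array([0] * 85 + [1] * 15)
classes = np.array([0, 1])

# 'balanced' weight for class c = n_samples / (n_classes * count(c))
balanced = compute_class_weight('balanced', classes=classes, y=y_demo)
manual = len(y_demo) / (len(classes) * np.bincount(y_demo))

print(balanced)                       # approximately [0.588, 3.333]
print(np.allclose(balanced, manual))  # True
```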

## 3.2 Ensemble Model

In [None]:
class EnsembleClassifier:
    def __init__(self, text_weight=0.7):
        self.text_weight = text_weight
        self.text_model = LogisticRegression(max_iter=1000, class_weight='balanced')
        self.meta_model = GradientBoostingClassifier(n_estimators=50, max_depth=5)
        
    def fit(self, X_text, X_meta, y):
        print('Training Ensemble...')
        self.text_model.fit(X_text, y)
        self.meta_model.fit(X_meta, y)
        print('  Training complete')
        
    def predict_proba(self, X_text, X_meta):
        text_prob = self.text_model.predict_proba(X_text)[:, 1]
        meta_prob = self.meta_model.predict_proba(X_meta)[:, 1]
        return self.text_weight * text_prob + (1 - self.text_weight) * meta_prob
    
    def predict(self, X_text, X_meta, threshold=0.5):
        return (self.predict_proba(X_text, X_meta) >= threshold).astype(int)

ensemble = EnsembleClassifier()
ensemble.fit(X_text_train, X_meta_train, y_train)

y_prob = ensemble.predict_proba(X_text_test, X_meta_test)
y_pred = ensemble.predict(X_text_test, X_meta_test)

print('\nEnsemble Performance:')
print(classification_report(y_test, y_pred, target_names=['Safe', 'Violation']))
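`EnsembleClassifier` fixes `text_weight=0.7` by hand. The sketch below picks the weight from data instead. It grid-searches the blend and scores each candidate with average precision. It assumes a held-out validation split, which this tutorial does not create, and the helper `select_blend_weight` is hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def select_blend_weight(text_prob, meta_prob, y_true, grid=None):
    """Return (weight, AP) for the text weight that maximizes average precision."""
    if grid is None:
        grid = np.linspace(0, 1, 11)
    best_w, best_ap = None, -1.0
    for w in grid:
        blended = w * text_prob + (1 - w) * meta_prob
        ap = average_precision_score(y_true, blended)
        if ap > best_ap:  # strict '>' keeps the smallest weight among ties
            best_w, best_ap = float(w), ap
    return best_w, best_ap

# Toy case: the text model ranks perfectly, the meta model ranks backwards
y_val = np.array([0, 0, 1, 1])
text_val = np.array([0.1, 0.2, 0.8, 0.9])
meta_val = np.array([0.9, 0.8, 0.1, 0.2])
best_w, best_ap = select_blend_weight(text_val, meta_val, y_val)
print(best_w, best_ap)
```

Because the two models are trained differently (only the text model uses balanced class weights), their probabilities are not on the same scale. Tuning the weight on data is one way to compensate. Calibrating each model first is another.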

---

# 4. Evaluation

## 4.1 Threshold Optimization

In [None]:
def find_optimal_threshold(y_true, y_prob):
    precs, recs, threshs = precision_recall_curve(y_true, y_prob)
    f1s = [2*p*r/(p+r) if (p+r) > 0 else 0 for p, r in zip(precs[:-1], recs[:-1])]
    best_idx = np.argmax(f1s)
    return threshs[best_idx], precs, recs, threshs

opt_thresh, precs, recs, threshs = find_optimal_threshold(y_test, y_prob)
print(f'F1-optimal threshold: {opt_thresh:.3f}')

# Plot
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# precision and recall both have n_thresholds + 1 entries, so plot them unsliced
axes[0].plot(recs, precs, 'b-', lw=2)
axes[0].set_xlabel('Recall')
axes[0].set_ylabel('Precision')
axes[0].set_title('Precision-Recall Curve')
axes[0].axhline(0.9, color='r', ls='--', label='Precision target')
axes[0].axvline(0.95, color='g', ls='--', label='Recall target')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

f1s = [2*p*r/(p+r) if (p+r) > 0 else 0 for p, r in zip(precs[:-1], recs[:-1])]
axes[1].plot(threshs, f1s, 'g-', lw=2)
axes[1].axvline(opt_thresh, color='r', ls='--', label=f'Optimal: {opt_thresh:.2f}')
axes[1].set_xlabel('Threshold')
axes[1].set_ylabel('F1')
axes[1].set_title('F1 vs Threshold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
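The F1-maximizing threshold does not look at the stated targets (precision ≥ 0.90, recall ≥ 0.95). A target-driven alternative works like this: among thresholds that reach the recall target, pick the one with the highest precision, then check whether that precision also meets its own target. The sketch below implements that rule. The helper `threshold_for_recall` is hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_recall(y_true, y_prob, min_recall=0.95):
    """Highest-precision threshold whose recall is at least min_recall.

    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    precs, recs, threshs = precision_recall_curve(y_true, y_prob)
    precs, recs = precs[:-1], recs[:-1]  # drop the final point so arrays align with thresholds
    candidates = np.flatnonzero(recs >= min_recall)
    if candidates.size == 0:
        return None
    best = candidates[np.argmax(precs[candidates])]
    return float(threshs[best]), float(precs[best]), float(recs[best])

demo_y = np.array([0, 0, 1, 1])
demo_p = np.array([0.1, 0.4, 0.35, 0.8])
print(threshold_for_recall(demo_y, demo_p, min_recall=1.0))  # threshold 0.35, precision ~0.667
```

In this notebook it would be called as `threshold_for_recall(y_test, y_prob, min_recall=0.95)`. If the returned precision is below 0.90, the two targets cannot both be met by this model.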

## 4.2 Decision Engine

In [None]:
class DecisionEngine:
    def __init__(self, remove_thresh=0.9, review_thresh=0.5):
        self.remove_thresh = remove_thresh
        self.review_thresh = review_thresh
        
    def decide(self, probs):
        decisions = []
        for p in probs:
            if p >= self.remove_thresh:
                decisions.append('REMOVE')
            elif p >= self.review_thresh:
                decisions.append('REVIEW')
            else:
                decisions.append('ALLOW')
        return decisions

engine = DecisionEngine()
decisions = engine.decide(y_prob)

from collections import Counter
counts = Counter(decisions)
print('Decision Distribution:')
for d, c in counts.items():
    print(f'  {d}: {c} ({c/len(decisions)*100:.1f}%)')
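`DecisionEngine.decide` loops in Python over every probability. The same two-threshold rule can be written as one vectorized call. `np.select` takes the first condition that matches, which mirrors the `if`/`elif` order. The function name `decide_vectorized` is hypothetical.

```python
import numpy as np

def decide_vectorized(probs, remove_thresh=0.9, review_thresh=0.5):
    """Map probabilities to REMOVE / REVIEW / ALLOW without a Python loop."""
    probs = np.asarray(probs)
    conditions = [probs >= remove_thresh, probs >= review_thresh]
    # First matching condition wins, as in the if/elif chain
    return np.select(conditions, ['REMOVE', 'REVIEW'], default='ALLOW')

demo_decisions = decide_vectorized([0.95, 0.6, 0.1, 0.5, 0.9])
print(demo_decisions.tolist())  # ['REMOVE', 'REVIEW', 'ALLOW', 'REVIEW', 'REMOVE']
```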

## 4.3 Error Analysis

In [None]:
y_pred_opt = (y_prob >= opt_thresh).astype(int)

cm = confusion_matrix(y_test, y_pred_opt)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Pred Safe', 'Pred Violation'],
            yticklabels=['Actual Safe', 'Actual Violation'])
plt.title('Confusion Matrix')
plt.tight_layout()
plt.show()

tn, fp, fn, tp = cm.ravel()
print(f'False positives (over-moderation): {fp} ({fp/(tn+fp)*100:.2f}% of safe posts)')
print(f'False negatives (missed violations): {fn} ({fn/(tp+fn)*100:.2f}% of violations)')
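The confusion matrix counts both error types alike. In moderation, though, a missed violation and a wrongly removed post rarely cost the same. The sketch below chooses the threshold by expected cost instead. The default 5:1 cost ratio is an illustrative assumption, not a figure from the tutorial, and `min_cost_threshold` is a hypothetical helper.

```python
import numpy as np

def min_cost_threshold(y_true, y_prob, cost_fp=1.0, cost_fn=5.0):
    """Threshold minimizing cost_fp * FP + cost_fn * FN over observed scores."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    # Every distinct score is a candidate; np.inf means "flag nothing"
    candidates = np.append(np.unique(y_prob), np.inf)
    best_t, best_cost = None, np.inf
    for t in candidates:
        pred = y_prob >= t
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = float(t), float(cost)
    return best_t, best_cost

demo_y = [0, 0, 1, 1]
demo_p = [0.1, 0.4, 0.35, 0.8]
print(min_cost_threshold(demo_y, demo_p))               # misses are expensive: (0.35, 1.0)
print(min_cost_threshold(demo_y, demo_p, cost_fn=0.5))  # misses are cheap: (0.8, 0.5)
```

Raising the relative cost of misses pushes the threshold down, toward the recall target. Raising the cost of false positives pushes it up.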

---

# 5. Deployment

In [None]:
class ModerationService:
    def __init__(self, ensemble, decision_engine, feature_eng):
        self.ensemble = ensemble
        self.engine = decision_engine
        self.fe = feature_eng
        
    def moderate(self, content):
        import time
        start = time.perf_counter()  # monotonic, high-resolution timer for measuring latency

        text_feat = self.fe.tfidf.transform([content['text']])
        meta_df = pd.DataFrame([{
            'text_length': len(content['text']),
            'num_caps': sum(1 for c in content['text'] if c.isupper()),
            'hour_posted': content.get('hour', 12),
            'user_age_days': content.get('user_age', 365),
            'prior_violations': content.get('prior_violations', 0)
        }])
        meta_feat = self.fe.get_meta_features(meta_df)

        prob = self.ensemble.predict_proba(text_feat, meta_feat)[0]
        decision = self.engine.decide([prob])[0]
        latency = (time.perf_counter() - start) * 1000
        
        return {
            'content_id': content.get('id', 'unknown'),
            'probability': round(prob, 4),
            'decision': decision,
            'latency_ms': round(latency, 2)
        }

service = ModerationService(ensemble, engine, fe)

# Test
test_items = [
    {'id': '1', 'text': 'What a beautiful day for a walk'},
    {'id': '2', 'text': 'You are so stupid nobody likes you'},
    {'id': '3', 'text': 'CLICK HERE for FREE money NOW'},
]

print('Moderation Results:')
for item in test_items:
    result = service.moderate(item)
    print(f"  {result['content_id']}: {result['decision']} (p={result['probability']:.3f})")
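`ModerationService` scores one post per call. At a 50K QPS peak, a common serving pattern is micro-batching: collect requests over a short window and vectorize and score them together to amortize per-call overhead. The sketch below shows only the scoring half of that pattern. It uses a toy scikit-learn `Pipeline`, not the tutorial's ensemble, and the request-collection window is left out. `moderate_batch` is a hypothetical name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy model trained on a few template-style posts (illustrative only)
toy_texts = ['I love this beautiful day', 'Great meal at the restaurant',
             'You are so stupid person', 'Nobody likes you loser']
toy_labels = [0, 0, 1, 1]
demo_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(toy_texts, toy_labels)

def moderate_batch(model, texts):
    """Score a micro-batch with one transform/predict call instead of one per post."""
    return model.predict_proba(texts)[:, 1]

batch = ['What a beautiful day', 'You are a loser', 'CLICK HERE for FREE money']
batch_probs = moderate_batch(demo_model, batch)
single_probs = np.array([moderate_batch(demo_model, [t])[0] for t in batch])
print(np.allclose(batch_probs, single_probs))  # True: batching changes cost, not scores
```

TF-IDF normalization and logistic regression both work row by row, so batching leaves each post's probability unchanged. The trade-off is added queueing delay, which must fit inside the 100ms p99 budget.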

---

# 6. Monitoring

In [None]:
class ModerationMonitor:
    def __init__(self):
        self.logs = []
        
    def log(self, content_id, decision, prob, latency, actual=None):
        self.logs.append({
            'timestamp': datetime.now(),
            'content_id': content_id,
            'decision': decision,
            'prob': prob,
            'latency': latency,
            'actual': actual
        })
    
    def metrics(self):
        if not self.logs:
            return {}
        df = pd.DataFrame(self.logs)
        return {
            'total': len(df),
            'decisions': df['decision'].value_counts().to_dict(),
            'latency_p50': df['latency'].quantile(0.5),
            'latency_p99': df['latency'].quantile(0.99)
        }

monitor = ModerationMonitor()

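# Simulate traffic on a reproducible sample of held-out posts, passing each post's real metadata
test_df = df.loc[test_idx]
for _, row in test_df.sample(500, random_state=42).iterrows():
    result = service.moderate({
        'id': row['content_id'],
        'text': row['text'],
        'hour': row['hour_posted'],
        'user_age': row['user_age_days'],
        'prior_violations': row['prior_violations'],
    })
    monitor.log(result['content_id'], result['decision'], result['probability'],
                result['latency_ms'], row['is_violation'])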

m = monitor.metrics()
print('Monitoring Metrics:')
print(f"  Total: {m['total']}")
print(f"  Decisions: {m['decisions']}")
print(f"  Latency p50: {m['latency_p50']:.2f}ms")
print(f"  Latency p99: {m['latency_p99']:.2f}ms")
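`ModerationMonitor` logs the ground-truth label (`actual`), but `metrics()` reports only volume and latency. The sketch below turns labeled logs into error rates. It assumes that `REMOVE` counts as an automated positive. `REVIEW` items go to humans, so they are reported as a separate rate rather than as errors. The helper `labeled_error_rates` is hypothetical.

```python
import pandas as pd

def labeled_error_rates(logs):
    """Error rates over logged items that have a ground-truth label.

    ASSUMPTION: REMOVE is an automated positive; REVIEW is routed to humans.
    """
    labeled = logs.dropna(subset=['actual'])
    safe = labeled[labeled['actual'] == 0]
    bad = labeled[labeled['actual'] == 1]
    nan = float('nan')
    return {
        # Share of safe posts that were removed automatically (over-moderation)
        'auto_remove_fp_rate': (safe['decision'] == 'REMOVE').mean() if len(safe) else nan,
        # Share of violations that were allowed through (missed)
        'miss_rate': (bad['decision'] == 'ALLOW').mean() if len(bad) else nan,
        # Share of labeled traffic sent to the human review queue
        'review_rate': (labeled['decision'] == 'REVIEW').mean() if len(labeled) else nan,
    }

demo_logs = pd.DataFrame({'decision': ['REMOVE', 'ALLOW', 'REVIEW', 'ALLOW', 'REMOVE'],
                          'actual': [0, 0, 1, 1, None]})
print(labeled_error_rates(demo_logs))
```

In this notebook the input would be `pd.DataFrame(monitor.logs)`. In production, labels arrive late (from appeals or reviewer verdicts), so these rates are typically computed over a delayed window.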

---

# 7. Summary

## Key Takeaways

1. **Requirements**: Define content categories, scale, latency, and accuracy targets
2. **Class Imbalance**: Most content is safe (85% in this synthetic dataset), so use balanced class weights to keep the rare violation classes from being ignored
3. **Multi-Level Decisions**: REMOVE/REVIEW/ALLOW based on probability thresholds
4. **Threshold Tuning**: Balance precision (avoid over-moderation) and recall (catch violations)
5. **Monitoring**: Track false positives/negatives, latency, and decision distribution

In [None]:
print("""
7-Step Framework Applied to Content Moderation
==============================================

Step 1: Requirements
  - Categories: hate_speech, harassment, spam, violence, safe
  - Scale: 500M posts/day, 50K QPS
  - Latency: p99 < 100ms
  - Precision >= 0.90, Recall >= 0.95

Step 2: Problem Framing
  - Binary classification (violation vs safe)
  - Multi-class for specific categories
  - Multi-modal (text + images)

Step 3: Data Preparation
  - TF-IDF text features
  - Metadata features (user history, text stats)
  - Time-based split for evaluation

Step 4: Model Development
  - Text classifier (Logistic Regression)
  - Metadata classifier (Gradient Boosting)
  - Ensemble combining both

Step 5: Evaluation
  - Precision-Recall tradeoff
  - Threshold optimization
  - Error analysis (FP/FN)

Step 6: Deployment
  - Real-time moderation service
  - Multi-level decision engine
  - Human review queue

Step 7: Monitoring
  - Decision distribution
  - Latency tracking
  - False positive/negative rates
""")

print('Tutorial 20 Complete!')