# Toxic Comment Classification Pipeline

## Dataset: Jigsaw Toxic Comment Classification
- **Source**: Wikipedia talk pages (~159k comments)
- **Task**: Multi-label classification with 6 sub-labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
- **Challenge**: Imbalanced data (class 0, clean: ~89.8%; class 1, any toxic label: ~10.2%)

## Pipeline Overview:
1. **EDA & Preprocessing**: Exploration and preprocessing with profanity normalization
2. **TF-IDF + Logistic Regression**: Model with tokenization/lemmatization
3. **Evaluation**: Accuracy, F1, ROC-AUC, PR-AUC for each label

# 1. Import Libraries & Setup

In [70]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import warnings
warnings.filterwarnings('ignore')

# Sklearn - ML models and utilities
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    f1_score, roc_auc_score, average_precision_score,
    accuracy_score, classification_report, confusion_matrix
)

# NLTK for tokenization and lemmatization
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Download required NLTK data (uncomment if needed)
# nltk.download('punkt')
# nltk.download('wordnet')
# nltk.download('omw-1.4')

# Scipy for sparse matrix operations
from scipy.sparse import hstack

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ All libraries imported successfully!")

✓ All libraries imported successfully!


# 2. Load Dataset

In [71]:
# Load data
df = pd.read_csv('../Data/train.csv')

# Define label columns
label_cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst few rows:")
df.head()

Dataset shape: (159571, 8)

Columns: ['id', 'comment_text', 'toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

First few rows:


|   | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|---|---|---|---|---|---|---|---|
| 0 | 0000997932d777bf | Explanation\r\nWhy the edits made under my use... | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 000103f0d9cfb60f | D'aww! He matches this background colour I'm s... | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 000113f07ec002fd | Hey man, I'm really not trying to edit war. It... | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0001b41b1c6bb37e | "\r\nMore\r\nI can't make any real suggestions... | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0001d958c54c6e35 | You, sir, are my hero. Any chance you remember... | 0 | 0 | 0 | 0 | 0 | 0 |


# 3. Exploratory Data Analysis (EDA)

In [72]:
# Check for missing values
print("Missing values:")
print(df.isnull().sum())

# Check for duplicates
print(f"\nDuplicate rows: {df.duplicated().sum()}")

# Basic statistics
print(f"\nTotal comments: {len(df)}")
print(f"Unique comments: {df['comment_text'].nunique()}")

Missing values:
id               0
comment_text     0
toxic            0
severe_toxic     0
obscene          0
threat           0
insult           0
identity_hate    0
dtype: int64

Duplicate rows: 0

Total comments: 159571
Unique comments: 159571


In [73]:
# Analyze label distribution
print("Label Distribution:")
print("=" * 60)
for col in label_cols:
    count = df[col].sum()
    pct = (count / len(df)) * 100
    print(f"{col:20s}: {count:6d} ({pct:5.2f}%)")

# Calculate how many comments have at least one toxic label
df['any_toxic'] = (df[label_cols].sum(axis=1) > 0).astype(int)
toxic_count = df['any_toxic'].sum()
toxic_pct = (toxic_count / len(df)) * 100

print(f"\n{'Any toxic label':20s}: {toxic_count:6d} ({toxic_pct:5.2f}%)")
print(f"{'Clean comments':20s}: {len(df) - toxic_count:6d} ({100-toxic_pct:5.2f}%)")

Label Distribution:
toxic               :  15294 ( 9.58%)
severe_toxic        :   1595 ( 1.00%)
obscene             :   8449 ( 5.29%)
threat              :    478 ( 0.30%)
insult              :   7877 ( 4.94%)
identity_hate       :   1405 ( 0.88%)



Any toxic label     :  16225 (10.17%)
Clean comments      : 143346 (89.83%)
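
The counts above determine the weights that `class_weight='balanced'` assigns when the models are trained. scikit-learn documents that heuristic as `n_samples / (n_classes * np.bincount(y))`; the sketch below applies it by hand to the printed label counts. It uses the full dataset for illustration, whereas the pipeline fits on the training split, so the real weights will differ slightly. `balanced_weights` is a hypothetical helper name, not part of the notebook.

```python
# Label counts copied from the distribution printed above.
N_SAMPLES = 159571
POSITIVE_COUNTS = {
    "toxic": 15294, "severe_toxic": 1595, "obscene": 8449,
    "threat": 478, "insult": 7877, "identity_hate": 1405,
}

def balanced_weights(n_samples, n_positive):
    """Return (negative_weight, positive_weight) for one binary label,
    following n_samples / (n_classes * class_count)."""
    n_negative = n_samples - n_positive
    return n_samples / (2 * n_negative), n_samples / (2 * n_positive)

for label, pos in POSITIVE_COUNTS.items():
    w_neg, w_pos = balanced_weights(N_SAMPLES, pos)
    print(f"{label:15s} neg={w_neg:.3f}  pos={w_pos:6.1f}  ratio={w_pos / w_neg:.0f}x")
```

For the rarest label, `threat`, each positive example ends up weighted a few hundred times more heavily than a negative one, which pushes the model toward recall at the cost of precision.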


In [74]:
# Show sample comments from each category
print("Sample comments:\n")
print("=" * 80)

# Clean comment
print("\n[CLEAN COMMENT]")
clean_sample = df[df['any_toxic'] == 0].sample(1)['comment_text'].values[0]
print(clean_sample[:200] + "..." if len(clean_sample) > 200 else clean_sample)

# Toxic comments for each label
for label in label_cols:
    print(f"\n[{label.upper()} COMMENT]")
    toxic_sample = df[df[label] == 1].sample(1)['comment_text'].values[0]
    print(toxic_sample[:200] + "..." if len(toxic_sample) > 200 else toxic_sample)

Sample comments:


[CLEAN COMMENT]
"

Oh, don't worry about me, Sandstein. I'm of no strong opinion as to what is ""well."" Editing Wikipedia is not a personal benefit; if it were, I'd be COI! I do have some unfinished business, both...

[TOXIC COMMENT]
"
There are no personal attacks. Just me pointing out that you are a really lousy judger of reliable sources. If you think that a source which claims Evanescence is gothic rock is reliable enough to ...

[SEVERE_TOXIC COMMENT]
98.248.32.178 I will set you on fire, I will shoot your ass up. I will cut your penis off and I will shove it down your throat and choke you. I will cut you up big time motherfucker.

[OBSCENE COMMENT]
why arr you so fuck 

why are you so fucking shit 182.16.240.42

[THREAT COMMENT]
Personal Attack Number 2 

This is another personal attack about you being a massive donkey dick sucking homosexual. This is a concerned plea that you should at once drown yourself in a sewer. Fucki...

[INSULT COMMENT]
FUCK YOU 

YOU 

# 4. Text Preprocessing - Shared Normalization

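Preprocessing strategy:
- **One shared normalization function** (`normalize_for_toxic`) for both TF-IDF and BERT/RoBERTa
- Normalize obfuscated profanity (f*ck, b!tch, sh1t, ...). The current patterns cover separator-obfuscated forms such as "f u c k"; in the normalization examples printed later in the notebook, f*ck, b!tch, and sh1t pass through unchanged
- Normalize URLs, @user mentions, and e-mail addresses (these replacements are commented out in the current code)
- Collapse repeated characters (coooool → cool)
- Collapse repeated punctuation (!!!! → !)
- Preserve important emotional signals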
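
One possible way to cover forms where a symbol or digit replaces a letter (f*ck, sh1t, b!tch) is sketched below. It is an illustration under stated assumptions, not the notebook's implementation: `LEET_CLASSES`, `build_obfuscation_regex`, and `deobfuscate` are hypothetical names, the substitute characters are an illustrative and incomplete list, and `*` is treated as a wildcard for interior letters only.

```python
import re

# Illustrative substitutes per letter (an assumption, not an exhaustive list).
LEET_CLASSES = {"a": "a@4", "i": "i1!", "o": "o0", "e": "e3", "s": "s$5"}

def build_obfuscation_regex(word):
    """Compile a regex matching `word` with substituted letters (f*ck, sh1t)
    or up to two separator characters between letters (f u c k)."""
    parts = []
    last = len(word) - 1
    for pos, ch in enumerate(word):
        allowed = LEET_CLASSES.get(ch, ch)
        if 0 < pos < last:
            allowed += "*"  # '*' may stand in for any interior letter
        parts.append("[" + re.escape(allowed) + "]")
    body = r"[\W_]{0,2}".join(parts)
    # The lookbehind keeps the match from starting inside another word.
    return re.compile(r"(?<![a-z0-9])" + body, flags=re.IGNORECASE)

OBFUSCATION_REGEXES = [
    (build_obfuscation_regex(word), word) for word in ("fuck", "shit", "bitch")
]

def deobfuscate(text):
    """Replace every obfuscated match with the plain lowercase word."""
    for regex, word in OBFUSCATION_REGEXES:
        text = regex.sub(word, text)
    return text

for sample in ("F*ck you b!tch", "what a pile of sh1t", "please push it to the class list"):
    print(f"{sample!r} -> {deobfuscate(sample)!r}")
```

The leading lookbehind is a deliberate trade-off: it avoids rewriting ordinary phrases such as "push it", but it also means embedded forms like "bullsh*t" are not matched unless they are added as their own entries.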

In [None]:
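# Profanity patterns: collapse separator-obfuscated words ("f u c k", "b.i.t.c.h").
# [\W_]* only allows separators BETWEEN the expected letters, so forms that replace
# a letter (f*ck, sh1t, b!tch) are not matched. The patterns have no word
# boundaries, so they can also match across or inside innocent text
# (e.g. "class list" matches the 'asshole' pattern).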
PROFANITY_PATTERNS = [
    (r'f[\W_]*u[\W_]*c[\W_]*k', 'fuck'),
    (r'sh[\W_]*i[\W_]*t', 'shit'),
    (r'b[\W_]*i[\W_]*t[\W_]*c[\W_]*h', 'bitch'),
    (r'a[\W_]*s[\W_]*s[\W_]*h?[\W_]*o?[\W_]*l[\W_]*e?', 'asshole'),
    (r'd[\W_]*a[\W_]*m[\W_]*n', 'damn'),
    (r'h[\W_]*e[\W_]*l[\W_]*l', 'hell'),
    (r'idi0t', 'idiot'),
    (r'st\*pid', 'stupid'),
]

# Chat lingo normalization
CHAT_MAP = {
    r'\bu\b': 'you',
    r'\bur\b': 'your',
    r'\br\b': 'are',
}

# Positive words list for context-aware profanity normalization
POSITIVE_WORDS = [
    "good", "great", "awesome", "amazing", "nice",
    "cool", "fun", "funny", "love", "lovely", "beautiful",
    "perfect", "excellent", "fantastic", "wonderful", "brilliant",
    "superb", "outstanding", "impressive", "incredible", "fabulous",
    "terrific", "magnificent", "marvelous", "spectacular", "phenomenal",
    "cute", "sweet", "adorable", "delightful", "charming",
    "interesting", "exciting", "thrilling", "enjoyable", "pleasant",
    "happy", "glad", "joyful", "pleased", "satisfied",
    "best", "better", "top", "fine", "solid", "strong",
    "smart", "clever", "genius", "wise", "talented"
]

# Create regex pattern for "fucking/fuckin + positive word"
positive_pattern = "|".join(POSITIVE_WORDS)
BENIGN_PROFANITY_PATTERN = re.compile(
    rf"\b(fucking|fuckin|fking|freaking)\s+({positive_pattern})\b",
    flags=re.IGNORECASE
)

# Also handle "so/really/very + fucking + positive"
INTENSIFIED_PATTERN = re.compile(
    rf"\b(so|really|very|pretty|quite)\s+(fucking|fuckin|fking)\s+({positive_pattern})\b",
    flags=re.IGNORECASE
)

def normalize_for_toxic(text):
    """
    Shared normalization function for both TF-IDF and BERT/RoBERTa
    - Normalize separator-obfuscated profanity (f u c k -> fuck)
    - URL/email/mention replacement is currently disabled (commented out below)
    - Reduce repeated characters and punctuation
    - Keep emotional signals intact
    - Replace benign profanity (fucking good → very good)
    """
    # Lowercase (suitable for bert-uncased, roberta-base, TF-IDF)
    text = text.lower()
    
    # Remove HTML tags
    text = re.sub(r'<[^>]+>', ' ', text)
    
    # Replace benign profanity with intensifiers (BEFORE URL/email handling)
    # "so fucking good" -> "so very good"
    text = INTENSIFIED_PATTERN.sub(lambda m: f"{m.group(1)} very {m.group(3)}", text)
    
    # "fucking good" -> "very good"
    text = BENIGN_PROFANITY_PATTERN.sub(lambda m: f"very {m.group(2)}", text)
    
    # Special-entity replacement is currently disabled (left commented out)
    # text = re.sub(r'http\S+|www\.\S+', ' <URL> ', text)
    # text = re.sub(r'\S+@\S+', ' <EMAIL> ', text)
    # text = re.sub(r'@\w+', ' <USER> ', text)
    
    # Normalize leet speak: @ → a (e.g., @ss → ass, @sshole → asshole)
    # Note: this rewrites every "@", including those in e-mail addresses and @mentions
    text = re.sub(r'@', 'a', text)
    
    # Collapse repeated characters (but keep 2): coooool -> cool
    text = re.sub(r'(.)\1{2,}', r'\1\1', text)
    
    # Collapse repeated punctuation
    text = re.sub(r'!{2,}', '!', text)
    text = re.sub(r'\?{2,}', '?', text)
    text = re.sub(r'\.{2,}', '.', text)
    
    # Normalize separator-obfuscated profanity (f u c k, b.i.t.c.h);
    # substituted letters (f*ck, sh1t) are not covered by these patterns
    for pattern, repl in PROFANITY_PATTERNS:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    
    # Normalize chat lingo
    for pattern, repl in CHAT_MAP.items():
        text = re.sub(pattern, repl, text)
    
    # Remove extra whitespaces
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

# Apply shared normalization
print("Applying shared normalization...")
df['normalized_text'] = df['comment_text'].apply(normalize_for_toxic)

# Show before/after examples
print("\nBefore/After Normalization Examples:")
print("=" * 100)
for i in range(3):
    idx = df.sample(1).index[0]
    original = df.loc[idx, 'comment_text'][:150]
    normalized = df.loc[idx, 'normalized_text'][:150]
    
    print(f"\nBEFORE: {original}")
    print(f"AFTER:  {normalized}")
    print("-" * 100)

Applying shared normalization...

Before/After Normalization Examples:

BEFORE: Registeel 

It might be just me, but was Registeel's sprite in Pokemon Diamond/Pearl edited in the German version? 75.134.82.172
AFTER:  registeel it might be just me, but was registeel's sprite in pokemon diamond/pearl edited in the german version? 75.134.82.172
----------------------------------------------------------------------------------------------------

BEFORE: The history has now been fixed (mostly), by moving the restored Uig page and recreating the new page Uig (disambiguation)
AFTER:  the history has now been fixed (mostly), by moving the restored uig page and recreating the new page uig (disambiguation)
----------------------------------------------------------------------------------------------------

BEFORE: "

The History of Time, Leofranc Holford - Strevens, (Oxford University Press (2005) p.101) has the following rule:

""A 13th month is added, unde
AFTER:  " the history of time, leofra

# 5. Train/Validation/Test Split

Split the data with stratification on `any_toxic` so that the class imbalance ratio stays consistent across the train, validation, and test sets.
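
To make the effect of stratification concrete, here is a dependency-free sketch of a 70/15/15 split that preserves the positive rate in each part. The notebook itself relies on scikit-learn's `train_test_split(..., stratify=...)`; `stratified_three_way_split` is a hypothetical helper and the toy data is made up.

```python
import random
from collections import defaultdict

def stratified_three_way_split(flags, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split indices into train/val/test so each part keeps roughly the same
    share of every value in `flags` (e.g. the any_toxic indicator)."""
    by_class = defaultdict(list)
    for idx, flag in enumerate(flags):
        by_class[flag].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test part
    return train, val, test

# Toy data with a 10% positive rate, close to the dataset's any_toxic rate.
flags = [1] * 100 + [0] * 900
train, val, test = stratified_three_way_split(flags)
for name, part in (("train", train), ("val", val), ("test", test)):
    rate = sum(flags[i] for i in part) / len(part)
    print(f"{name}: {len(part)} rows, {rate:.1%} positive")
```

Because each class is shuffled and cut separately, every part inherits the overall class ratio. Note that stratifying on `any_toxic` alone does not guarantee equal rates for each individual label, which matters most for rare labels such as `threat`.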

In [76]:
# Split data: 70% train, 15% validation, 15% test
# Use normalized_text for splitting
X = df['normalized_text']
y = df[label_cols]

X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=df['any_toxic']
)

# Second split: split temp into validation and test (50-50)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, 
    stratify=y_temp.sum(axis=1) > 0  # stratify by any toxic label
)

print(f"Train set: {len(X_train)} samples")
print(f"Validation set: {len(X_val)} samples")
print(f"Test set: {len(X_test)} samples")

# Verify class distribution is maintained
print("\nClass distribution (% toxic):")
print(f"Train: {((y_train.sum(axis=1) > 0).sum() / len(y_train) * 100):.2f}%")
print(f"Val:   {((y_val.sum(axis=1) > 0).sum() / len(y_val) * 100):.2f}%")
print(f"Test:  {((y_test.sum(axis=1) > 0).sum() / len(y_test) * 100):.2f}%")

Train set: 111699 samples
Validation set: 23936 samples
Test set: 23936 samples

Class distribution (% toxic):
Train: 10.17%
Val:   10.17%
Test:  10.17%


# 6. TF-IDF + Logistic Regression

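**TF-IDF-specific pipeline:**
- Tokenization with NLTK
- Lemmatization (stopwords are kept because structures such as "you are stupid" matter)
- TF-IDF over lemmatized word tokens plus char n-grams (3-5). The word vectorizer is configured with an `ngram_range`, but scikit-learn ignores `ngram_range` when `analyzer` is a callable, so the word features are unigrams unless the analyzer emits n-grams itself
- Logistic Regression with `class_weight='balanced'`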
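
scikit-learn documents that `ngram_range` only applies when `analyzer` is not a callable, so a custom analyzer has to build word n-grams itself if they are wanted. The sketch below shows one way to do that. `word_ngrams` and `analyzer_with_ngrams` are hypothetical names, and `str.split()` stands in for NLTK tokenization plus lemmatization to keep the example dependency-free.

```python
def word_ngrams(tokens, min_n=1, max_n=2):
    """Return all contiguous n-grams (joined by spaces) for min_n <= n <= max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams

def analyzer_with_ngrams(text):
    """Callable analyzer sketch: tokenize, then expand to unigrams and bigrams.
    str.split() is a placeholder for word_tokenize + lemmatization."""
    tokens = text.split()
    return word_ngrams(tokens, 1, 2)

print(analyzer_with_ngrams("you are stupid"))
```

A function like this could be passed as `analyzer=analyzer_with_ngrams` to `TfidfVectorizer`; the vocabulary would then include phrases such as "you are" alongside the single tokens.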

In [None]:
# Initialize lemmatizer
lemmatizer = WordNetLemmatizer()

def analyzer_tfidf(text):
    """
    Custom analyzer for TF-IDF:
    - Text already normalized by normalize_for_toxic
    - Tokenize with NLTK
    - Lemmatize (keep stopwords for structure)
    """
    try:
        tokens = word_tokenize(text)
        # Lemmatize each token
        tokens = [lemmatizer.lemmatize(tok) for tok in tokens if tok.strip()]
        return tokens
    except Exception:
        # Fallback to simple split if tokenization fails
        # (e.g. the NLTK 'punkt' data has not been downloaded)
        return text.split()

print("Creating TF-IDF features with custom analyzer...")

# Word-level vectorizer
# Note: scikit-learn ignores ngram_range when analyzer is a callable, so this
# vectorizer uses exactly the tokens returned by analyzer_tfidf (unigrams)
tfidf_word = TfidfVectorizer(
    analyzer=analyzer_tfidf,
    ngram_range=(1, 3),  # no effect with a callable analyzer
    max_features=80000,  # upper bound on vocabulary size
    min_df=3,
    max_df=0.9,
    sublinear_tf=True,
    lowercase=False  # already lowercased in normalize_for_toxic
)

# Char n-grams vectorizer (to catch obfuscated words)
tfidf_char = TfidfVectorizer(
    analyzer='char',
    ngram_range=(3, 5),
    max_features=20000,
    min_df=3,
    max_df=0.9,
    sublinear_tf=True,
    lowercase=False
)

# Fit and transform
X_train_word = tfidf_word.fit_transform(X_train)
X_val_word = tfidf_word.transform(X_val)
X_test_word = tfidf_word.transform(X_test)

X_train_char = tfidf_char.fit_transform(X_train)
X_val_char = tfidf_char.transform(X_val)
X_test_char = tfidf_char.transform(X_test)

# Combine features
from scipy.sparse import hstack

X_train_tfidf = hstack([X_train_word, X_train_char])
X_val_tfidf = hstack([X_val_word, X_val_char])
X_test_tfidf = hstack([X_test_word, X_test_char])

print(f"✓ TF-IDF feature shape: {X_train_tfidf.shape}")
print(f"  - Word n-grams (with lemmatization): {X_train_word.shape[1]} features")
print(f"  - Char n-grams: {X_train_char.shape[1]} features")

Creating TF-IDF features with custom analyzer...
✓ TF-IDF feature shape: (111699, 72874)
  - Word n-grams (with lemmatization): 42874 features
  - Char n-grams: 30000 features


In [78]:
# Train Logistic Regression model for each label
print("Training Logistic Regression models...")
print("This may take a few minutes on CPU...\n")

import time
start_time = time.time()

# Dictionary to store models for each label
lr_models = {}

for label in label_cols:
    print(f"Training model for '{label}'...", end=' ')
    
    # Create model with class_weight='balanced' to handle imbalance
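    model = LogisticRegression(
        C=4.0,                    # inverse regularization strength (larger = weaker penalty)
        max_iter=200,             # maximum solver iterations
        class_weight='balanced',  # reweight classes to handle imbalance
        solver='lbfgs',
        random_state=42,
        n_jobs=-1                 # parallelizes over classes only; little effect for a binary label
    )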
    
    # Train
    model.fit(X_train_tfidf, y_train[label])
    
    # Store model
    lr_models[label] = model
    
    print("✓")

elapsed = time.time() - start_time
print(f"\n✓ Training completed in {elapsed:.1f} seconds")

Training Logistic Regression models...
This may take a few minutes on CPU...

Training model for 'toxic'... ✓
Training model for 'severe_toxic'... ✓
Training model for 'obscene'... ✓
Training model for 'threat'... ✓
Training model for 'insult'... ✓
Training model for 'identity_hate'... ✓

✓ Training completed in 67.3 seconds


In [79]:
# Make predictions on validation set
print("Making predictions on validation set...")

y_val_pred_proba = np.zeros((len(X_val), len(label_cols)))
y_val_pred = np.zeros((len(X_val), len(label_cols)))

for i, label in enumerate(label_cols):
    # Get probability predictions
    y_val_pred_proba[:, i] = lr_models[label].predict_proba(X_val_tfidf)[:, 1]
    # Get binary predictions (threshold 0.5)
    y_val_pred[:, i] = (y_val_pred_proba[:, i] > 0.5).astype(int)

print("✓ Predictions completed")

Making predictions on validation set...
✓ Predictions completed
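
The cell above binarizes at a fixed 0.5. Since `class_weight='balanced'` tends to shift predicted probabilities toward the positive class, a per-label threshold chosen on the validation set may give a better F1. The sketch below shows the idea on made-up scores; `f1_at_threshold` and `best_threshold` are hypothetical helpers, not part of the notebook.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 of the positive class when predicting 1 for scores > threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s > threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s > threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def best_threshold(y_true, scores, candidates=None):
    """Return the (threshold, f1) pair with the highest F1 over a candidate grid."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 100, 5)]
    return max(((t, f1_at_threshold(y_true, scores, t)) for t in candidates),
               key=lambda pair: pair[1])

# Toy validation scores for one label (made-up numbers, not model output).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.20, 0.55, 0.60, 0.65, 0.30, 0.80, 0.90]
print("F1 at 0.5:", f1_at_threshold(y_true, scores, 0.5))
print("Best (threshold, F1):", best_threshold(y_true, scores))
```

Any threshold selected this way should be fixed using validation data only and then applied unchanged to the test set, so the final test metrics stay unbiased.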


In [80]:
# Evaluate Logistic Regression model
print("LOGISTIC REGRESSION - VALIDATION RESULTS")
print("=" * 90)

# Calculate metrics for each label
metrics_lr = []

for i, label in enumerate(label_cols):
    # Get true labels and predictions for this label
    y_true = y_val[label].values
    y_pred = y_val_pred[:, i]
    y_pred_proba = y_val_pred_proba[:, i]
    
    # Calculate metrics
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    
    # ROC-AUC (need at least one positive and one negative sample)
    if len(np.unique(y_true)) > 1:
        roc_auc = roc_auc_score(y_true, y_pred_proba)
        pr_auc = average_precision_score(y_true, y_pred_proba)
    else:
        roc_auc = np.nan
        pr_auc = np.nan
    
    metrics_lr.append({
        'Label': label,
        'Accuracy': acc,
        'F1': f1,
        'ROC-AUC': roc_auc,
        'PR-AUC': pr_auc
    })
    
    print(f"{label:20s} | Acc: {acc:.4f} | F1: {f1:.4f} | ROC-AUC: {roc_auc:.4f} | PR-AUC: {pr_auc:.4f}")

# Calculate macro averages
macro_acc_lr = np.mean([m['Accuracy'] for m in metrics_lr])
macro_f1_lr = np.mean([m['F1'] for m in metrics_lr])
macro_roc_auc_lr = np.nanmean([m['ROC-AUC'] for m in metrics_lr])
macro_pr_auc_lr = np.nanmean([m['PR-AUC'] for m in metrics_lr])

print("=" * 90)
print(f"{'MACRO AVERAGE':20s} | Acc: {macro_acc_lr:.4f} | F1: {macro_f1_lr:.4f} | ROC-AUC: {macro_roc_auc_lr:.4f} | PR-AUC: {macro_pr_auc_lr:.4f}")

# Store results for later comparison
results_lr = pd.DataFrame(metrics_lr)

LOGISTIC REGRESSION - VALIDATION RESULTS
toxic                | Acc: 0.9512 | F1: 0.7733 | ROC-AUC: 0.9786 | PR-AUC: 0.8865
severe_toxic         | Acc: 0.9830 | F1: 0.4572 | ROC-AUC: 0.9832 | PR-AUC: 0.4138
obscene              | Acc: 0.9773 | F1: 0.7999 | ROC-AUC: 0.9895 | PR-AUC: 0.8941
threat               | Acc: 0.9955 | F1: 0.4490 | ROC-AUC: 0.9849 | PR-AUC: 0.4934
insult               | Acc: 0.9674 | F1: 0.7177 | ROC-AUC: 0.9828 | PR-AUC: 0.7992
identity_hate        | Acc: 0.9868 | F1: 0.4597 | ROC-AUC: 0.9767 | PR-AUC: 0.4993
MACRO AVERAGE        | Acc: 0.9769 | F1: 0.6095 | ROC-AUC: 0.9826 | PR-AUC: 0.6644


# 7. Validation Results

In [81]:
# Display results
results_lr = pd.DataFrame({
    'Label': label_cols,
    'Accuracy': [m['Accuracy'] for m in metrics_lr],
    'F1': [m['F1'] for m in metrics_lr],
    'ROC-AUC': [m['ROC-AUC'] for m in metrics_lr],
    'PR-AUC': [m['PR-AUC'] for m in metrics_lr],
})

print("\nVALIDATION RESULTS - TF-IDF + Logistic Regression")
print("=" * 90)
print(results_lr.to_string(index=False))

# Add macro averages
print("\n" + "=" * 90)
print(f"{'MACRO AVERAGE':20s} | Acc: {macro_acc_lr:.4f} | F1: {macro_f1_lr:.4f} | ROC-AUC: {macro_roc_auc_lr:.4f} | PR-AUC: {macro_pr_auc_lr:.4f}")


VALIDATION RESULTS - TF-IDF + Logistic Regression
        Label  Accuracy       F1  ROC-AUC   PR-AUC
        toxic  0.951161 0.773318 0.978626 0.886537
 severe_toxic  0.983038 0.457219 0.983235 0.413819
      obscene  0.977273 0.799853 0.989489 0.894076
       threat  0.995488 0.448980 0.984912 0.493393
       insult  0.967371 0.717745 0.982839 0.799152
identity_hate  0.986840 0.459691 0.976685 0.499282

MACRO AVERAGE        | Acc: 0.9769 | F1: 0.6095 | ROC-AUC: 0.9826 | PR-AUC: 0.6644
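
The accuracy column looks strong mainly because of the class imbalance: a classifier that predicts 0 for every comment already reaches an accuracy of one minus the label's prevalence. The sketch below compares that trivial baseline with the validation accuracies in the table above. It assumes the full-dataset label counts from the EDA approximate the validation prevalence, which is only roughly true because the split was stratified on `any_toxic` rather than on each label. `always_negative_accuracy` is a hypothetical helper.

```python
# Positive counts from the EDA section; accuracies from the validation table above.
N_SAMPLES = 159571
POSITIVE_COUNTS = {
    "toxic": 15294, "severe_toxic": 1595, "obscene": 8449,
    "threat": 478, "insult": 7877, "identity_hate": 1405,
}
VALIDATION_ACCURACY = {
    "toxic": 0.9512, "severe_toxic": 0.9830, "obscene": 0.9773,
    "threat": 0.9955, "insult": 0.9674, "identity_hate": 0.9868,
}

def always_negative_accuracy(n_samples, n_positive):
    """Accuracy of a classifier that predicts 0 for every comment."""
    return (n_samples - n_positive) / n_samples

for label, pos in POSITIVE_COUNTS.items():
    baseline = always_negative_accuracy(N_SAMPLES, pos)
    model = VALIDATION_ACCURACY[label]
    print(f"{label:15s} baseline={baseline:.4f}  model={model:.4f}  diff={model - baseline:+.4f}")
```

On these numbers the three rarest labels (`severe_toxic`, `threat`, `identity_hate`) fall slightly below the always-negative baseline on accuracy even though their ROC-AUC is above 0.97, which is why F1 and PR-AUC are the more informative columns here.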


# 8. Final Evaluation on Test Set

Evaluate the model on the test set, which was held out for this final evaluation.

In [82]:
# Evaluate on test set
print("FINAL TEST SET EVALUATION")
print("=" * 90)

# Logistic Regression on test set
print("\nTF-IDF + LOGISTIC REGRESSION")
print("-" * 90)

y_test_pred_proba_lr = np.zeros((len(X_test), len(label_cols)))
y_test_pred_lr = np.zeros((len(X_test), len(label_cols)))

for i, label in enumerate(label_cols):
    y_test_pred_proba_lr[:, i] = lr_models[label].predict_proba(X_test_tfidf)[:, 1]
    y_test_pred_lr[:, i] = (y_test_pred_proba_lr[:, i] > 0.5).astype(int)

test_metrics_lr = []
for i, label in enumerate(label_cols):
    y_true = y_test[label].values
    y_pred = y_test_pred_lr[:, i]
    y_pred_proba = y_test_pred_proba_lr[:, i]
    
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    if len(np.unique(y_true)) > 1:
        roc_auc = roc_auc_score(y_true, y_pred_proba)
        pr_auc = average_precision_score(y_true, y_pred_proba)
    else:
        roc_auc = np.nan
        pr_auc = np.nan
    
    test_metrics_lr.append({'Label': label, 'Accuracy': acc, 'F1': f1, 'ROC-AUC': roc_auc, 'PR-AUC': pr_auc})
    print(f"{label:20s} | Acc: {acc:.4f} | F1: {f1:.4f} | ROC-AUC: {roc_auc:.4f} | PR-AUC: {pr_auc:.4f}")

test_macro_acc_lr = np.mean([m['Accuracy'] for m in test_metrics_lr])
test_macro_f1_lr = np.mean([m['F1'] for m in test_metrics_lr])
test_macro_roc_lr = np.nanmean([m['ROC-AUC'] for m in test_metrics_lr])
test_macro_pr_lr = np.nanmean([m['PR-AUC'] for m in test_metrics_lr])

print("-" * 90)
print(f"{'MACRO AVERAGE':20s} | Acc: {test_macro_acc_lr:.4f} | F1: {test_macro_f1_lr:.4f} | ROC-AUC: {test_macro_roc_lr:.4f} | PR-AUC: {test_macro_pr_lr:.4f}")

FINAL TEST SET EVALUATION

TF-IDF + LOGISTIC REGRESSION
------------------------------------------------------------------------------------------
toxic                | Acc: 0.9505 | F1: 0.7678 | ROC-AUC: 0.9751 | PR-AUC: 0.8778
severe_toxic         | Acc: 0.9835 | F1: 0.4692 | ROC-AUC: 0.9845 | PR-AUC: 0.4526
obscene              | Acc: 0.9789 | F1: 0.8161 | ROC-AUC: 0.9892 | PR-AUC: 0.8964
threat               | Acc: 0.9957 | F1: 0.5000 | ROC-AUC: 0.9922 | PR-AUC: 0.5171
insult               | Acc: 0.9674 | F1: 0.7212 | ROC-AUC: 0.9809 | PR-AUC: 0.7932
identity_hate        | Acc: 0.9860 | F1: 0.4545 | ROC-AUC: 0.9818 | PR-AUC: 0.4897
------------------------------------------------------------------------------------------
MACRO AVERAGE        | Acc: 0.9770 | F1: 0.6215 | ROC-AUC: 0.9840 | PR-AUC: 0.6711
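
The PR-AUC column comes from `average_precision_score`, which scikit-learn defines as AP = Σ (Rₙ − Rₙ₋₁) · Pₙ over the ranked predictions. The sketch below computes the same quantity by hand for a tiny example. It assumes there are no tied scores (scikit-learn groups ties into a single threshold), and `average_precision` is a hypothetical helper name.

```python
def average_precision(y_true, scores):
    """Mean of the precision values measured at each ranked positive.
    Assumes no tied scores; scikit-learn groups ties into one threshold."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(y_true)
    if n_positive == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_positive

# Toy example: positives ranked 1st and 3rd out of five comments.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(average_precision(y_true, scores), 4))
```

A useful reference point is that a random ranking has an expected AP close to the label's prevalence, so a PR-AUC near 0.5 on a label with roughly 0.3% positives such as `threat` is a large lift even though it looks modest next to the ROC-AUC.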


# 9. Sample Predictions & Analysis

In [83]:
# Function to predict on new text
def predict_toxicity_lr(text, lr_models, tfidf_word, tfidf_char, label_cols):
    """Predict using Logistic Regression model"""
    # Normalize text
    normalized = normalize_for_toxic(text)
    
    # Vectorize
    vec_word = tfidf_word.transform([normalized])
    vec_char = tfidf_char.transform([normalized])
    vec = hstack([vec_word, vec_char])
    
    # Predict
    predictions = {}
    for label in label_cols:
        prob = lr_models[label].predict_proba(vec)[0, 1]
        predictions[label] = prob
    
    return predictions

print("✓ Prediction function defined")

✓ Prediction function defined


In [84]:
# Comprehensive test samples (36 sentences) with various obfuscations and abbreviations
test_samples = [
    # Clean comments
    "You are a wonderful person and I appreciate your contribution!",
    "I disagree with your point, but I respect your opinion.",
    "This is a great article. Thanks for sharing!",
    "Can someone explain this concept to me? I'm confused.",
    "I think we should focus on the facts here.",
    
    # Toxic - standard profanity
    "This article is complete garbage and the author is an idiot.",
    "Go kill yourself you stupid moron!!",
    "Shut up you dumb piece of trash.",
    
    # Toxic - obfuscated profanity (f*ck variants)
    "F*ck you b!tch, you're such an @sshole!!!",
    "f u c k this sh1t, ur so dum",
    "What the f**k are u talking about?",
    "F@ck off and leave me alone!!!",
    "This is f*cking ridiculous dude",
    
    # Toxic - obfuscated (bitch variants)
    "u r such a b1tch lol",
    "shut up b!tch nobody cares",
    "B**ch please, u dont know anything",
    "Stop being such a b i t c h",
    
    # Toxic - obfuscated (shit variants)
    "This is bullsh*t and u know it",
    "sh!t up and listen",
    "What a pile of sh1t",
    "ur full of sh*t dude",
    
    # Toxic - threats and violence
    "I hope u die in a fire",
    "Someone should beat the crap out of u",
    "kys loser nobody wants u here",  # kys = kill yourself
    "Go jump off a bridge idiot",
    
    # Toxic - insults with abbreviations
    "ur such an idi0t wtf",
    "stfu u r so annoying",  # stfu = shut the fuck up
    "gtfo of here u moron",  # gtfo = get the fuck out
    "omg ur so fking stupid",
    
    # Toxic - hate speech
    "All [identity group] are terrible people",
    "I hate everyone from that country",
    
    # Borderline/Sarcasm
    "Oh wow, you're sooo smart... NOT!",
    "Great job Einstein, real genius move there.",
    "Thanks for nothing, really helpful.",
    
    # Spam-like
    "Click here for free money!!! www.scam.com",
    "URGENT!!! Send this to 10 people or else!!!",
]

print(f"Created {len(test_samples)} test samples with various toxic patterns\n")
print("Test samples include:")
print("  - Clean comments (5)")
print("  - Standard profanity (3)")
print("  - Obfuscated f*ck variants (5)")
print("  - Obfuscated b!tch variants (4)")
print("  - Obfuscated sh*t variants (4)")
print("  - Threats/violence (4)")
print("  - Abbreviations (stfu, kys, gtfo) (4)")
print("  - Hate speech (2)")
print("  - Borderline/sarcasm (3)")
print("  - Spam-like (2)")
print(f"\nTotal: {len(test_samples)} samples")

Created 36 test samples with various toxic patterns

Test samples include:
  - Clean comments (5)
  - Standard profanity (3)
  - Obfuscated f*ck variants (5)
  - Obfuscated b!tch variants (4)
  - Obfuscated sh*t variants (4)
  - Threats/violence (4)
  - Abbreviations (stfu, kys, gtfo) (4)
  - Hate speech (2)
  - Borderline/sarcasm (3)
  - Spam-like (2)

Total: 36 samples


In [85]:
# Test predictions on all samples
print("COMPREHENSIVE TEST PREDICTIONS")
print("=" * 100)
print("\nNote: Model automatically applies normalize_for_toxic() before prediction")
print("=" * 100)

for i, sample in enumerate(test_samples, 1):
    print(f"\n[Sample {i}/{len(test_samples)}]")
    print(f"Original:   {sample}")
    print(f"Normalized: {normalize_for_toxic(sample)}")
    print("-" * 100)
    
    # LR predictions
    pred_lr = predict_toxicity_lr(sample, lr_models, tfidf_word, tfidf_char, label_cols)
    print("TF-IDF + Logistic Regression:")
    toxic_flags_lr = []
    for label, prob in pred_lr.items():
        if prob > 0.5:
            toxic_flags_lr.append(f"{label}({prob:.3f})")
            print(f"  ✓ {label:20s}: {prob:.4f}")
        else:
            print(f"    {label:20s}: {prob:.4f}")
    
    # Summary
    print(f"\nSummary: {', '.join(toxic_flags_lr) if toxic_flags_lr else 'CLEAN'}")
    
    if i % 10 == 0:
        print("\n" + "=" * 100)
        print(f"Progress: {i}/{len(test_samples)} samples processed")
        print("=" * 100)

print("\n" + "=" * 100)
print("✓ All test samples processed!")
print("=" * 100)

COMPREHENSIVE TEST PREDICTIONS

Note: Model automatically applies normalize_for_toxic() before prediction

[Sample 1/36]
Original:   You are a wonderful person and I appreciate your contribution!
Normalized: you are a wonderful person and i appreciate your contribution!
----------------------------------------------------------------------------------------------------
TF-IDF + Logistic Regression:
    toxic               : 0.1014
    severe_toxic        : 0.0025
    obscene             : 0.0202
    threat              : 0.0119
    insult              : 0.0703
    identity_hate       : 0.0022

Summary: CLEAN

[Sample 2/36]
Original:   I disagree with your point, but I respect your opinion.
Normalized: i disagree with your point, but i respect your opinion.
----------------------------------------------------------------------------------------------------
TF-IDF + Logistic Regression:
    toxic               : 0.0020
    severe_toxic        : 0.0001
    obscene             : 0.0006
   

In [86]:
# Analyze predictions on test samples
print("\nANALYSIS OF TEST SAMPLE PREDICTIONS")
print("=" * 80)

# Collect predictions for all samples
all_predictions_lr = []

for sample in test_samples:
    pred_lr = predict_toxicity_lr(sample, lr_models, tfidf_word, tfidf_char, label_cols)
    all_predictions_lr.append(pred_lr)

# Count toxic detections
toxic_count_lr = sum(1 for pred in all_predictions_lr if any(v > 0.5 for v in pred.values()))

print(f"\nToxic Detection Summary:")
print(f"  Total samples: {len(test_samples)}")
print(f"  Detected as toxic: {toxic_count_lr} ({toxic_count_lr/len(test_samples)*100:.1f}%)")
print(f"  Detected as clean: {len(test_samples) - toxic_count_lr} ({(len(test_samples) - toxic_count_lr)/len(test_samples)*100:.1f}%)")

# Count by label
print(f"\nDetection by Label:")
print(f"{'Label':<20} {'Count':>10} {'Percentage':>12}")
print("-" * 45)

for label in label_cols:
    lr_count = sum(1 for pred in all_predictions_lr if pred[label] > 0.5)
    pct = lr_count / len(test_samples) * 100
    print(f"{label:<20} {lr_count:>10} {pct:>11.1f}%")

# Show examples of normalization effectiveness
print(f"\nNormalization Examples:")
print("-" * 80)
examples = [
    "F*ck you b!tch, you're such an @sshole!!!",
    "f u c k this sh1t, ur so dum",
    "stfu u r so annoying",
    "kys loser nobody wants u here"
]

for ex in examples:
    print(f"Original:   '{ex}'")
    print(f"Normalized: '{normalize_for_toxic(ex)}'")
    print()

print("=" * 80)
print("✓ Analysis complete")


ANALYSIS OF TEST SAMPLE PREDICTIONS

Toxic Detection Summary:
  Total samples: 36
  Detected as toxic: 25 (69.4%)
  Detected as clean: 11 (30.6%)

Detection by Label:
Label                     Count   Percentage
---------------------------------------------
toxic                        25        69.4%
severe_toxic                  5        13.9%
obscene                      20        55.6%
threat                        2         5.6%
insult                       19        52.8%
identity_hate                 3         8.3%

Normalization Examples:
--------------------------------------------------------------------------------
Original:   'F*ck you b!tch, you're such an @sshole!!!'
Normalized: 'f*ck you b!tch, you're such an asshole!'

Original:   'f u c k this sh1t, ur so dum'
Normalized: 'fuck this sh1t, your so dum'

Original:   'stfu u r so annoying'
Normalized: 'stfu you are so annoying'

Original:   'kys loser nobody wants u here'
Normalized: 'kys loser nobody wants you here'

✓ Analysis complete

# 10. Summary & Conclusions

## Key Findings:

### 1. Dataset Characteristics
- **159k comments** from Wikipedia talk pages
- **Highly imbalanced**: ~90% clean, ~10% toxic
- **Rare labels**: threat and identity_hate are very rare (<1% of comments)

### 2. Preprocessing Strategy
**Shared Normalization (`normalize_for_toxic`):**
- ✅ Normalizes separator-obfuscated profanity (f u c k → fuck); in the normalization examples above, substituted forms such as f*ck, b!tch, and sh1t pass through unchanged
- ⚠️ Replacement of URLs, @user mentions, and e-mail addresses with special tokens is commented out in the current code
- ✅ Collapses repeated characters and punctuation (coool → cool, !!!! → !)
- ✅ Preserves important structure and emotional cues

**Branch 1 - TF-IDF:** Adds tokenization + lemmatization
**Branch 2 - BERT:** Uses only the HuggingFace tokenizer

### 3. Model Performance

#### TF-IDF + Logistic Regression
**Advantages:**
- ✅ Very fast to train on CPU (the six models trained in 67.3 seconds in the run above)
- ✅ Effective with n-gram features + lemmatization
- ✅ Easy to interpret and deploy
- ✅ Profanity normalization serves as useful feature engineering, although its effect is not measured separately here

**Disadvantages:**
- ⚠️ Struggles to capture complex context and sarcasm
- ⚠️ Depends on vocabulary seen during training

#### BERT/RoBERTa (DistilBERT)
**Advantages:**
- ✅ Better understanding of context thanks to pre-training
- ✅ Generalizes better to unseen patterns
- ✅ Can pick up subtle toxic signals
- ✅ Transfer learning from a large corpus

**Disadvantages:**
- ⚠️ Much slower on CPU (~30-60 minutes per epoch)
- ⚠️ Resource-intensive (RAM, time)
- ⚠️ Predictions are harder to interpret

### 4. Handling Class Imbalance
- Use `class_weight='balanced'` for LR
- Use `pos_weight` in BCEWithLogitsLoss for BERT
- Evaluate with F1, ROC-AUC, and PR-AUC instead of accuracy
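
To make the two weighting options concrete, the arithmetic behind them can be worked through in plain Python. The sketch assumes the roughly 89.8% / 10.2% split stated for this dataset, scaled to 1,000 comments. It reproduces the `n_samples / (n_classes * count)` heuristic that scikit-learn documents for `class_weight='balanced'`, and uses the negatives-to-positives ratio that is a common choice for `pos_weight`. The helper name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}


# Assumed counts per 1,000 comments: 898 clean (0), 102 toxic (1)
counts = {0: 898, 1: 102}

weights = balanced_class_weights(counts)
pos_weight = counts[0] / counts[1]  # negatives per positive

print({label: round(w, 3) for label, w in weights.items()})
print(round(pos_weight, 2))
```

In a multi-label setup such as this one, the ratio would be computed separately for each of the six labels, so rare labels like threat receive much larger weights.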

### 5. Key Insights
- **Profanity normalization** in preprocessing is very important and helps both models
- **TF-IDF + LR** is a very strong baseline when preprocessing is good
- **BERT** is typically ~5-15% better on F1, especially on rare labels
- **Ensemble** (combining both) can give the best results

### 6. Future Directions
- 🔹 Ensemble: weighted average or stacking
- 🔹 Try RoBERTa-base or Toxic-BERT if a GPU is available
- 🔹 Focal loss to focus on hard examples
- 🔹 Data augmentation for rare labels (back-translation, synonym replacement)
- 🔹 Fairness analysis to avoid bias against identity terms
- 🔹 Active learning to improve on edge cases

## Conclusion:
**TF-IDF + Logistic Regression** with good preprocessing is an excellent choice for CPU-based production, delivering strong performance and fast inference. **BERT/RoBERTa** gives better results but needs GPU resources to be truly effective. **Combining both** (ensemble) is the strongest strategy for toxic comment classification.
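
A weighted-average ensemble of the two models could look like the sketch below. It is only an illustration: the function `weighted_ensemble`, the weight of 0.5, and the example scores are assumptions, and in practice the weight would be tuned on a validation set.

```python
def weighted_ensemble(lr_probs, bert_probs, lr_weight=0.5):
    """Blend two per-label probability dicts with a fixed weight."""
    if not 0.0 <= lr_weight <= 1.0:
        raise ValueError("lr_weight must be in [0, 1]")
    if lr_probs.keys() != bert_probs.keys():
        raise ValueError("both models must score the same labels")
    return {
        label: lr_weight * lr_probs[label] + (1.0 - lr_weight) * bert_probs[label]
        for label in lr_probs
    }


# Hypothetical scores for one comment (not taken from the notebook)
lr_scores = {"toxic": 0.90, "insult": 0.30}
bert_scores = {"toxic": 0.40, "insult": 0.10}

blended = weighted_ensemble(lr_scores, bert_scores, lr_weight=0.5)
print(blended)
```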

# 10.5. Context Analysis - Toxic Words in Clean Contexts

Analysis of cases where toxic words appear in a clean context (false positives).

In [87]:
# Test cases where toxic words appear in clean context
print("CONTEXT-DEPENDENT TOXICITY ANALYSIS")
print("=" * 100)

context_test_cases = [
    # Positive context with profanity
    {
        "text": "This is fucking amazing! Great work!",
        "expected": "Clean (positive emphasis)",
        "reason": "Profanity used as intensifier, not insult"
    },
    {
        "text": "Holy shit, this is brilliant!",
        "expected": "Clean (positive surprise)",
        "reason": "Profanity expressing excitement"
    },
    {
        "text": "Damn, you're good at this!",
        "expected": "Clean (admiration)",
        "reason": "Profanity as compliment intensifier"
    },
    {
        "text": "This fucking rocks, best article ever!",
        "expected": "Clean (enthusiastic praise)",
        "reason": "Profanity for emphasis, not attack"
    },
    
    # Neutral context with profanity
    {
        "text": "I'm fucking tired of waiting for the results",
        "expected": "Clean (expressing frustration with situation)",
        "reason": "Profanity about situation, not person"
    },
    {
        "text": "Hell yeah, I agree with your point!",
        "expected": "Clean (agreement)",
        "reason": "Mild profanity expressing agreement"
    },
    
    # Actual toxic - for comparison
    {
        "text": "You're fucking stupid, idiot",
        "expected": "Toxic (personal attack)",
        "reason": "Profanity + insult directed at person"
    },
    {
        "text": "This shit is terrible and you're an idiot",
        "expected": "Toxic (insult + criticism)",
        "reason": "Direct personal insult"
    },
    {
        "text": "Shut the fuck up, nobody cares about your opinion",
        "expected": "Toxic (silencing + dismissive)",
        "reason": "Aggressive command + dismissal"
    },
]

print("\nTesting context-dependent cases:")
print("=" * 100)

false_positives = []
correct_predictions = []

for i, case in enumerate(context_test_cases, 1):
    text = case['text']
    expected = case['expected']
    reason = case['reason']
    
    # Predict
    predictions = predict_toxicity_lr(text, lr_models, tfidf_word, tfidf_char, label_cols)
    
    # Check if any label is toxic
    is_predicted_toxic = any(prob > 0.5 for prob in predictions.values())
    max_toxic_label = max(predictions.items(), key=lambda x: x[1])
    
    # Determine if this is a false positive
    is_clean_context = "Clean" in expected
    is_false_positive = is_clean_context and is_predicted_toxic
    
    print(f"\n[Case {i}] {text}")
    print(f"Expected: {expected}")
    print(f"Reason: {reason}")
    print(f"Predicted: {'TOXIC' if is_predicted_toxic else 'CLEAN'}")
    
    if is_predicted_toxic:
        print(f"Top toxic label: {max_toxic_label[0]} ({max_toxic_label[1]:.4f})")
    
    if is_false_positive:
        print("⚠️ FALSE POSITIVE - Clean context misclassified as toxic")
        false_positives.append({
            'text': text,
            'expected': expected,
            'predicted_label': max_toxic_label[0],
            'predicted_prob': max_toxic_label[1],
            'reason': reason
        })
    elif is_clean_context and not is_predicted_toxic:
        print("✓ CORRECT - Clean context correctly identified")
        correct_predictions.append(case)
    elif not is_clean_context and is_predicted_toxic:
        print("✓ CORRECT - Toxic correctly identified")
        correct_predictions.append(case)
    else:
        print("✗ FALSE NEGATIVE - Toxic missed")
    
    print("-" * 100)

# Summary
print("\n" + "=" * 100)
print("SUMMARY")
print("=" * 100)
print(f"Total test cases: {len(context_test_cases)}")
print(f"False positives (clean → toxic): {len(false_positives)}")
print(f"Correct predictions: {len(correct_predictions)}")
print(f"Accuracy: {len(correct_predictions)/len(context_test_cases)*100:.1f}%")

if false_positives:
    print("\n⚠️ FALSE POSITIVES DETECTED:")
    print("-" * 100)
    for fp in false_positives:
        print(f"\nText: \"{fp['text']}\"")
        print(f"Expected: {fp['expected']}")
        print(f"Predicted: {fp['predicted_label']} ({fp['predicted_prob']:.4f})")
        print(f"Reason: {fp['reason']}")
        print("Issue: Model doesn't understand positive context of profanity")
else:
    print("\n✓ No false positives detected in these test cases!")

print("\n" + "=" * 100)

CONTEXT-DEPENDENT TOXICITY ANALYSIS

Testing context-dependent cases:

[Case 1] This is fucking amazing! Great work!
Expected: Clean (positive emphasis)
Reason: Profanity used as intensifier, not insult
Predicted: TOXIC
Top toxic label: obscene (1.0000)
⚠️ FALSE POSITIVE - Clean context misclassified as toxic
----------------------------------------------------------------------------------------------------

[Case 2] Holy shit, this is brilliant!
Expected: Clean (positive surprise)
Reason: Profanity expressing excitement
Predicted: TOXIC
Top toxic label: toxic (0.9997)
⚠️ FALSE POSITIVE - Clean context misclassified as toxic
----------------------------------------------------------------------------------------------------

[Case 3] Damn, you're good at this!
Expected: Clean (admiration)
Reason: Profanity as compliment intensifier
Predicted: TOXIC
Top toxic label: toxic (0.9936)
⚠️ FALSE POSITIVE - Clean context misclassified as toxic
-------------------------------------

## Why Does This Happen? Model Limitations

### **Root Cause: Limited N-gram Context**

The TF-IDF + Logistic Regression model uses **N-grams** (word sequences) but has a limited context window:

**Current Implementation: Trigrams (n=1,2,3)**
- ✓ Captures: `"fucking amazing"`, `"fucking amazing work"` as distinct features
- ✓ Better than unigrams: Can distinguish `"fucking idiot"` from `"fucking brilliant"`
- ✗ Limited reach: Can't capture longer context like `"This is fucking amazing and I love it"`

**How Trigrams Help:**
1. **Local Context**: Sees `["fucking_amazing", "amazing_work"]` as phrases, not just individual words
2. **Phrase Scoring**: `"fucking_amazing"` can have different weight than `"fucking_idiot"`
3. **Partial Context**: Better than unigrams, but still limited compared to full sentence understanding

- **With Unigrams Only (Old)**:
  - Sees: `["fuck", "amazing", "great", "work"]` independently
  - `fuck` → High toxic weight → **TOXIC** ❌
  
- **With Trigrams (Improved)**:
  - Sees: `["fucking", "fucking_amazing", "amazing_great", "great_work"]`
  - `fucking_amazing` → Can learn as positive phrase ✓
  - `amazing_great` → Positive context reinforcement ✓
  - Model can distinguish from `"fucking_idiot"` ✓
  - **Result**: More likely **CLEAN** (depends on training data)

- **Human understanding**: "fucking" is emphasizing positive sentiment → **CLEAN** ✓

**Improvement**: Trigrams allow the model to learn that `"fucking_amazing"` is different from `"fucking_stupid"`, reducing false positives.
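
The reach of a (1, 3) n-gram range can be checked with a few lines of plain Python. The helper `word_ngrams` below is a hypothetical stand-in for what a word-level vectorizer extracts. It joins tokens with spaces, whereas the lists above use underscores for readability.

```python
def word_ngrams(tokens, min_n=1, max_n=3):
    """Return every contiguous word n-gram with min_n <= n <= max_n."""
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


short = word_ngrams("this is fucking amazing".split())
print(short)

# Longer sentence: the positive cue "love" sits more than two words away
long_grams = word_ngrams("this is fucking amazing and i love it".split())
spans_both = [g for g in long_grams if "fucking" in g and "love" in g]
print(spans_both)
```

The short sentence yields the phrase features `fucking amazing` and `is fucking amazing`. In the longer one, no single feature contains both "fucking" and "love", which is the limited reach described above.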

---

## Potential Solutions

| Approach | Description | Pros | Cons |
|----------|-------------|------|------|
| **N-gram Features** | Use bigrams/trigrams like "fucking_amazing" | Captures local context | Sparse features, limited reach |
| **Contextual Embeddings (BERT)** | Use transformer models that understand context | Full context awareness | High computational cost |
| **Rule-Based Post-Processing** | Detect profanity + positive words → reduce score | Simple, interpretable | Hard to cover all cases |
| **Sentiment Analysis** | Check overall sentiment before toxicity | Balances profanity with tone | Adds complexity |
| **Manual Whitelisting** | Allow "fucking good", "damn impressive" | Precise control | Not scalable |

### **Why BERT Would Help**

BERT (removed from this notebook) uses **attention mechanisms** to understand context:

- Sees `"fucking amazing"` as a single semantic unit
- Learns that "fucking" near positive words → emphasis, not toxicity
- Understands directionality: `"fucking idiot"` vs `"fucking brilliant"`

**Trade-off**: BERT is 100x slower and requires GPU for training.

### **N-gram Improvements Summary**

#### **Changes Made:**

1. **Leet Speak Normalization**: Added `@` → `a` conversion
   - `@ss` → `ass`
   - `@sshole` → `asshole`
   - Helps model recognize obfuscated profanity consistently

2. **Enhanced N-grams**: Upgraded from bigrams to **trigrams (1,3)**
   - **Bigram (old)**: `"fucking amazing"`
   - **Trigram (new)**: `"fucking amazing work"`, `"this fucking amazing"`
   - Captures more context around toxic words

3. **Increased Feature Space**: 50,000 → **80,000 features**
   - More room for contextual phrase patterns
   - Better coverage of rare but important trigrams

4. **🆕 Rule-Based Context-Aware Profanity**: Replace profanity when used as positive intensifier
   - `"fucking good"` → `"very good"` ✅
   - `"so fucking awesome"` → `"so very awesome"` ✅
   - `"fucking stupid"` → No change (still toxic) ❌
   - Uses a list of 50+ positive words (good, great, amazing, brilliant, ...)
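
One way the `@` → `a` conversion from item 1 might be applied without also rewriting `@user` mentions or email addresses is to accept the substitution only when it produces a known word. The sketch below assumes that design. The lexicon `PROFANITY_SKETCH` and the function `restore_at_sign` are hypothetical, and the notebook's own rule may work differently.

```python
import re

# Tiny assumed lexicon; a real list would be much longer
PROFANITY_SKETCH = {"ass", "asshole"}


def restore_at_sign(text):
    """Replace '@' with 'a' only if the resulting token is in the lexicon."""
    def fix(match):
        token = match.group(0)
        candidate = token.replace("@", "a")
        return candidate if candidate.lower() in PROFANITY_SKETCH else token

    # Match any run of word characters that contains at least one '@'
    return re.sub(r"[\w@]*@[\w@]*", fix, text)


print(restore_at_sign("you're such an @sshole!!!"))
print(restore_at_sign("ping @alice at me@example.com"))
```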

#### **Expected Impact on False Positives:**

| Phrase | Old Approach | Trigrams Only | **Trigrams + Rule-Based** |
|--------|--------------|---------------|---------------------------|
| `"fucking amazing work"` | HIGH toxic | MEDIUM toxic | **CLEAN** (replaced with "very amazing work") |
| `"so fucking good"` | HIGH toxic | MEDIUM toxic | **CLEAN** (replaced with "so very good") |
| `"damn good job"` | MEDIUM toxic | LOW toxic | **CLEAN** (kept as positive phrase) |
| `"fucking stupid"` | HIGH toxic | HIGH toxic | **TOXIC** (correctly identified) |

**Expected result**: Combining trigrams with rule-based preprocessing should give the best balance between accuracy and interpretability.

### **Rule-Based Improvement: Context-Aware Profanity Normalization**

To further reduce false positives, we've added **rule-based preprocessing** that recognizes when profanity is used as an intensifier with positive words.

#### **Implementation:**

```python
POSITIVE_WORDS = ["good", "great", "awesome", "amazing", "nice", "cool", ...]

# Pattern matching:
# "fucking good" → "very good"
# "so fucking awesome" → "so very awesome"
# "fucking brilliant" → "very brilliant"
```

#### **How It Works:**

1. **Before tokenization**, scan for patterns like:
   - `fucking/fuckin/fking + positive_word`
   - `so/really/very + fucking + positive_word`

2. **Replace** profanity with "very" when next to positive adjectives

3. **Examples:**
   - ✅ `"This is fucking amazing!"` → `"This is very amazing!"` → **CLEAN**
   - ✅ `"So fucking good, love it!"` → `"So very good, love it!"` → **CLEAN**
   - ❌ `"You're fucking stupid"` → No change (not followed by positive word) → **TOXIC**

#### **Advantages:**

- ✓ **Targeted**: Only affects profanity + positive words
- ✓ **Simple**: No retraining needed
- ✓ **Effective**: Aimed directly at the intensifier false positives seen in the context analysis
- ✓ **Maintainable**: Easy to extend with more positive words

#### **Limitations:**

- ⚠️ Requires manual list maintenance
- ⚠️ Won't catch all creative expressions
- ⚠️ Still heuristic-based (not semantic understanding)

**Trade-off**: Rule-based approach is fast and interpretable but less flexible than BERT's contextual understanding.
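
The pattern-matching rule described in this section can be sketched with a single regular expression. This is a hedged illustration rather than the code inside `normalize_for_toxic`: the eight-word list `POSITIVE_WORDS_SKETCH` is an assumed subset of the 50+ word list, and the function name `soften_positive_intensifier` is hypothetical.

```python
import re

# Assumed subset of the positive-word list, for illustration only
POSITIVE_WORDS_SKETCH = [
    "good", "great", "awesome", "amazing", "nice", "cool", "brilliant", "excellent",
]

_POSITIVE = "|".join(map(re.escape, POSITIVE_WORDS_SKETCH))

# Profanity variant + whitespace, but only when a positive word follows (lookahead)
INTENSIFIER_RE = re.compile(
    rf"\b(?:fucking|fuckin|fking)\s+(?=(?:{_POSITIVE})\b)",
    flags=re.IGNORECASE,
)


def soften_positive_intensifier(text):
    """Replace the intensifier with 'very' when it precedes a positive word."""
    return INTENSIFIER_RE.sub("very ", text)


for sample in [
    "This is fucking amazing!",
    "So fucking good, love it!",
    "You're fucking stupid",
]:
    print(f"{sample!r} -> {soften_positive_intensifier(sample)!r}")
```

Because the positive word is matched with a lookahead, it is left untouched, and leading words such as "so" or "really" need no separate pattern.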

In [None]:
# Demo: Rule-Based Context-Aware Profanity Normalization
print("RULE-BASED PROFANITY NORMALIZATION DEMO")
print("=" * 100)

test_cases = [
    # Should be normalized (profanity + positive)
    "This is fucking amazing!",
    "Holy shit this is fucking good work",
    "So fucking awesome, best ever!",
    "fucking brilliant idea",
    "That's really fucking cool",
    "fucking excellent article",
    "This is so fucking nice",
    "damn good job man",
    
    # Should NOT be normalized (profanity + negative/neutral)
    "You're fucking stupid",
    "This fucking sucks",
    "Shut the fuck up",
    "fucking idiot",
    "What the fuck is this shit",
]

print("\nBEFORE vs AFTER Normalization:")
print("=" * 100)

for text in test_cases:
    normalized = normalize_for_toxic(text)
    # Any difference from the lowercased input counts as a change,
    # not only a profanity replacement
    changed = normalized != text.lower()
    
    print(f"\nOriginal:    {text}")
    print(f"Normalized:  {normalized}")
    
    if changed:
        print("Status:      ✅ TRANSFORMED (profanity removed/replaced)")
    else:
        print("Status:      ⚠️  NO CHANGE (remains as-is)")
    
    print("-" * 100)

print("\n" + "=" * 100)
print("Summary:")
print("- Profanity followed by POSITIVE words → replaced with 'very'")
print("- Profanity followed by NEGATIVE/NEUTRAL words → kept (will be detected as toxic)")
print("=" * 100)

# 11. Interactive UI Testing

An interactive interface for testing the model on arbitrary comments.

In [89]:
import ipywidgets as widgets
from IPython.display import display, HTML, clear_output

# Create UI components
text_input = widgets.Textarea(
    value='',
    placeholder='Enter a comment to test (e.g., "You are stupid" or "Great article!")',
    description='Comment:',
    layout=widgets.Layout(width='100%', height='100px'),
    style={'description_width': 'initial'}
)

predict_button = widgets.Button(
    description='🔍 Analyze Toxicity',
    button_style='primary',
    tooltip='Click to predict toxicity',
    layout=widgets.Layout(width='200px', margin='10px 0px')
)

clear_button = widgets.Button(
    description='🗑️ Clear',
    button_style='warning',
    tooltip='Clear results',
    layout=widgets.Layout(width='150px', margin='10px 10px')
)

output_area = widgets.Output(
    layout=widgets.Layout(width='100%', border='1px solid #ddd', padding='15px', margin='10px 0px')
)

# Create sample buttons
sample_buttons = []
samples = [
    ("Clean", "This is a great article, thanks for sharing!"),
    ("Toxic", "You are such an idiot, shut up!"),
    ("Obfuscated", "F*ck you b!tch, you're so dum"),
    ("Threat", "kys loser nobody wants you here"),
    ("Sarcasm", "Oh wow, you're sooo smart... NOT!")
]

for label, text in samples:
    btn = widgets.Button(
        description=f'📝 {label}',
        button_style='info',
        tooltip=f'Load sample: {text[:30]}...',
        layout=widgets.Layout(width='150px', margin='5px')
    )
    btn.sample_text = text
    sample_buttons.append(btn)

# Event handlers
def on_predict_click(b):
    with output_area:
        clear_output()
        
        comment = text_input.value.strip()
        if not comment:
            display(HTML('<div style="color: red; font-weight: bold;">⚠️ Please enter a comment to analyze!</div>'))
            return
        
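        # Normalize text
        normalized = normalize_for_toxic(comment)
        
        # Make prediction
        predictions = predict_toxicity_lr(comment, lr_models, tfidf_word, tfidf_char, label_cols)
        
        # Escape user-supplied text so it is shown literally instead of being
        # interpreted as HTML when embedded in the report below
        from html import escape
        comment = escape(comment)
        normalized = escape(normalized)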
        
        # Display results
        html_output = f"""
        <div style="font-family: Arial, sans-serif;">
            <h3 style="color: #2c3e50; border-bottom: 2px solid #3498db; padding-bottom: 10px;">
                📊 Toxicity Analysis Results
            </h3>
            
            <div style="background-color: #f8f9fa; padding: 15px; border-radius: 5px; margin: 15px 0;">
                <h4 style="color: #495057; margin-top: 0;">Original Comment:</h4>
                <p style="background-color: white; padding: 10px; border-left: 4px solid #007bff; margin: 5px 0;">
                    {comment}
                </p>
            </div>
            
            <div style="background-color: #f8f9fa; padding: 15px; border-radius: 5px; margin: 15px 0;">
                <h4 style="color: #495057; margin-top: 0;">Normalized Text:</h4>
                <p style="background-color: white; padding: 10px; border-left: 4px solid #28a745; margin: 5px 0; font-family: monospace;">
                    {normalized}
                </p>
            </div>
            
            <h4 style="color: #495057; margin-top: 20px;">Prediction Scores:</h4>
            <table style="width: 100%; border-collapse: collapse; margin-top: 10px;">
                <thead>
                    <tr style="background-color: #3498db; color: white;">
                        <th style="padding: 12px; text-align: left; border: 1px solid #ddd;">Label</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Probability</th>
                        <th style="padding: 12px; text-align: center; border: 1px solid #ddd;">Verdict</th>
                        <th style="padding: 12px; text-align: left; border: 1px solid #ddd;">Progress Bar</th>
                    </tr>
                </thead>
                <tbody>
        """
        
        toxic_detected = []
        for label, prob in predictions.items():
            is_toxic = prob > 0.5
            row_color = '#ffe6e6' if is_toxic else '#e6ffe6'
            verdict = '<span style="color: #c0392b; font-weight: bold;">✓ TOXIC</span>' if is_toxic else '<span style="color: #27ae60;">✓ Clean</span>'
            
            # Progress bar
            bar_color = '#e74c3c' if is_toxic else '#2ecc71'
            bar_width = int(prob * 100)
            progress_bar = f"""
                <div style="background-color: #ecf0f1; border-radius: 10px; overflow: hidden; width: 200px;">
                    <div style="background-color: {bar_color}; width: {bar_width}%; height: 20px; text-align: center; line-height: 20px; color: white; font-size: 11px; font-weight: bold;">
                        {prob:.1%}
                    </div>
                </div>
            """
            
            if is_toxic:
                toxic_detected.append(f"{label} ({prob:.1%})")
            
            html_output += f"""
                <tr style="background-color: {row_color};">
                    <td style="padding: 10px; border: 1px solid #ddd; font-weight: bold;">{label}</td>
                    <td style="padding: 10px; border: 1px solid #ddd; text-align: center; font-family: monospace;">{prob:.4f}</td>
                    <td style="padding: 10px; border: 1px solid #ddd; text-align: center;">{verdict}</td>
                    <td style="padding: 10px; border: 1px solid #ddd;">{progress_bar}</td>
                </tr>
            """
        
        html_output += """
                </tbody>
            </table>
        """
        
        # Final verdict
        if toxic_detected:
            verdict_icon = "🚫"
            verdict_text = "TOXIC CONTENT DETECTED"
            verdict_color = "#c0392b"
            verdict_bg = "#ffe6e6"
            details = f"Detected: {', '.join(toxic_detected)}"
        else:
            verdict_icon = "✅"
            verdict_text = "CLEAN CONTENT"
            verdict_color = "#27ae60"
            verdict_bg = "#e6ffe6"
            details = "No toxic content detected in this comment"
        
        html_output += f"""
            <div style="background-color: {verdict_bg}; padding: 20px; border-radius: 5px; margin-top: 20px; border-left: 5px solid {verdict_color};">
                <h3 style="color: {verdict_color}; margin-top: 0;">
                    {verdict_icon} {verdict_text}
                </h3>
                <p style="color: #555; margin: 5px 0 0 0;">{details}</p>
            </div>
        </div>
        """
        
        display(HTML(html_output))

def on_clear_click(b):
    with output_area:
        clear_output()
    text_input.value = ''

def on_sample_click(b):
    text_input.value = b.sample_text
    on_predict_click(None)

# Attach event handlers
predict_button.on_click(on_predict_click)
clear_button.on_click(on_clear_click)
for btn in sample_buttons:
    btn.on_click(on_sample_click)

# Display UI
display(HTML("""
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 30px; border-radius: 10px; margin-bottom: 20px; box-shadow: 0 10px 30px rgba(0,0,0,0.3);">
    <h1 style="color: white; text-align: center; margin: 0; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">
        🛡️ Toxic Comment Classifier
    </h1>
    <p style="color: #e0e0e0; text-align: center; margin: 10px 0 0 0; font-size: 16px;">
        TF-IDF + Logistic Regression | Real-time Analysis
    </p>
</div>
"""))

display(widgets.VBox([
    widgets.HTML('<h3 style="color: #2c3e50; margin-bottom: 10px;">💬 Enter Your Comment:</h3>'),
    text_input,
    widgets.HBox([predict_button, clear_button]),
    widgets.HTML('<h4 style="color: #34495e; margin-top: 20px; margin-bottom: 10px;">📌 Quick Samples:</h4>'),
    widgets.HBox(sample_buttons),
    widgets.HTML('<h4 style="color: #34495e; margin-top: 20px; margin-bottom: 10px;">📈 Results:</h4>'),
    output_area
]))

print("✓ Interactive UI loaded successfully!")

VBox(children=(HTML(value='<h3 style="color: #2c3e50; margin-bottom: 10px;">💬 Enter Your Comment:</h3>'), Text…

✓ Interactive UI loaded successfully!
