In [None]:
# Notebook placeholder for 09_offensive_language_detection.ipynb

# Exercise 9: Offensive Language Detection

Welcome to content moderation! You'll learn how to build systems that can identify harmful, offensive, or inappropriate content in German text.

## Learning Objectives
By the end of this exercise, you will be able to:
1. **Content Moderation**: Understand different types of harmful content (hate speech, toxicity, abuse)
2. **Ethical Considerations**: Address bias, fairness, and cultural sensitivity in moderation systems
3. **Multi-class Classification**: Distinguish between different severity levels of offensive content
4. **German Language Challenges**: Handle German-specific offensive language patterns
5. **False Positive Handling**: Balance accuracy with avoiding over-censorship
6. **Real-world Deployment**: Consider scalability and human-in-the-loop systems

## What You'll Build
- German offensive language classifier
- Multi-level toxicity detection system
- Bias analysis and mitigation tools
- Content moderation dashboard
- Human review integration system

## Applications
- **Social Media Platforms**: Automated content moderation for posts and comments
- **Gaming Communities**: Chat filtering and player behavior monitoring
- **Educational Platforms**: Safe learning environment maintenance
- **Customer Service**: Identifying and escalating abusive interactions

## ⚠️ Important Ethical Note
This exercise deals with offensive content for educational purposes. We approach this topic responsibly, focusing on protection and safety rather than harm.

**Ready to build safer digital spaces?** 🛡️✨

## Exercise 1: German Content Moderation System

**Goal**: Build an ethical and effective German offensive language detection system.

**Your Tasks**: 
1. Analyze different types of harmful content
2. Build traditional ML baseline for offensive language detection
3. Implement BERT-based detection for improved accuracy
4. Evaluate model fairness and identify potential biases

**Hints**:
- Use the GermEval 2018/2019 shared-task datasets for German offensive language
- Consider context - some words are offensive only in certain contexts
- Balance precision and recall: every false positive is legitimate speech being censored
- Always include human review for edge cases
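The precision/recall hint becomes concrete when the flagging threshold is chosen on validation data instead of defaulting to 0.5. Below is a minimal plain-Python sketch, assuming you have binary gold labels (1 = offensive) and predicted offensive-class probabilities for a validation set; `pick_threshold`, the example scores, and the precision targets are illustrative, not part of the exercise.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every text with score >= threshold is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)                    # flagged texts that are really offensive
    fp = len(flagged) - tp               # flagged texts that are harmless
    fn = sum(labels) - tp                # offensive texts that were missed
    precision = tp / (tp + fp) if flagged else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision=0.9):
    """Lowest threshold meeting the precision target.

    Recall can only drop as the threshold rises, so the lowest threshold that
    satisfies the precision constraint is the one with the highest recall.
    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, threshold)
        if precision >= min_precision:
            return threshold, precision, recall
    return None


# Hypothetical validation scores and gold labels
val_scores = [0.95, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
val_labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(pick_threshold(val_scores, val_labels, min_precision=0.9))  # (0.85, 1.0, 0.5)
print(pick_threshold(val_scores, val_labels, min_precision=0.7))  # (0.6, 0.75, 0.75)
```

Raising `min_precision` means fewer harmless posts get flagged but more offensive ones slip through; the right target is a policy decision, and it should be tuned on validation data, never on the test set.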

**Ethical Guidelines**:
- Respect privacy and data protection laws
- Consider cultural and linguistic nuances
- Implement appeals processes for users
- Regular bias auditing and model updates
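Regular bias auditing can start with one number per group: the false positive rate, i.e. how often non-offensive texts are flagged. A minimal plain-Python sketch; the `(group, gold, predicted)` record format and the group names are hypothetical, and in practice the groups would come from annotated data (for example, template sentences that mention different identity terms).

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, gold_label, predicted_label), 1 = offensive.

    Returns {group: FPR}, where FPR = flagged non-offensive / all non-offensive.
    Groups without any non-offensive examples are omitted (FPR is undefined).
    """
    negatives = defaultdict(int)
    false_positives = defaultdict(int)
    for group, gold, predicted in records:
        if gold == 0:
            negatives[group] += 1
            if predicted == 1:
                false_positives[group] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


# Hypothetical audit records
audit = [
    ("group_a", 0, 0), ("group_a", 0, 0), ("group_a", 0, 1), ("group_a", 0, 0),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 0, 0), ("group_b", 0, 0),
    ("group_b", 1, 1),
]
rates = false_positive_rate_by_group(audit)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)  # {'group_a': 0.25, 'group_b': 0.5} 0.25
```

A persistent gap means harmless texts about one group are censored more often than others, which is exactly the pattern a bias audit is meant to surface.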

### Setup and Data Preparation

## Exercise Tasks

Complete the following tasks to deepen your understanding:

1. **Dataset Analysis**:
   - Analyze the distribution of offensive vs. non-offensive content
   - Identify common patterns in harmful language
   - Study the impact of different types of content (explicit vs. implicit)

2. **Bias Detection and Mitigation**:
   - Test model performance across different demographic groups
   - Identify potential biases in classification decisions
   - Implement bias mitigation techniques (data augmentation, fairness constraints)

3. **Multi-level Classification**:
   - Build classifiers for different severity levels (mild, moderate, severe)
   - Implement fine-grained toxicity detection (hate speech, cyberbullying, threats)
   - Compare binary vs. multi-class approaches

4. **Context-Aware Detection**:
   - Handle context-dependent offensive language
   - Analyze conversation threads for escalating toxicity
   - Implement user history-based scoring

5. **Production Deployment**:
   - Design human-in-the-loop review processes
   - Implement appeals and feedback mechanisms
   - Create moderation dashboard for community managers
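The human-in-the-loop review process from task 5 is often built around confidence bands: the model acts alone only when it is confident, and everything in between goes to a reviewer queue. A minimal sketch, assuming a classifier that outputs an offensive-class probability; `route`, `ModerationDecision`, the band limits 0.2 and 0.9, and the example scores are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ModerationDecision:
    text: str
    score: float  # predicted probability that the text is offensive
    action: str   # "allow", "human_review", or "remove"


def route(text, score, allow_below=0.2, remove_at=0.9):
    """Map an offensive-class probability to a moderation action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability, got {score}")
    if score < allow_below:
        action = "allow"
    elif score >= remove_at:
        action = "remove"
    else:
        action = "human_review"  # uncertain band: a person makes the call
    return ModerationDecision(text, score, action)


# Hypothetical model scores
examples = [
    ("Vielen Dank für Ihre Hilfe.", 0.03),
    ("Du spinnst ja wohl.", 0.55),
    ("Du bist ein Idiot.", 0.97),
]
decisions = [route(text, score) for text, score in examples]
print([d.action for d in decisions])  # ['allow', 'human_review', 'remove']
```

Widening the review band sends more borderline posts to people (fewer wrongful removals, more reviewer load); storing each `ModerationDecision` also gives an appeals process something to look up.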

## Reflection Questions

1. How do you balance free speech with safety in content moderation?
2. What cultural and linguistic factors affect offensive language detection?
3. How can you ensure fairness across different user groups?
4. What are the psychological impacts of content moderation on human reviewers?
5. How do you handle evolving language and new forms of harmful content?

## Ethical Considerations

- **Privacy**: Respect user privacy while ensuring safety
- **Transparency**: Provide clear moderation policies and explanations
- **Appeals**: Allow users to contest moderation decisions
- **Cultural Sensitivity**: Consider different cultural contexts and norms
- **Continuous Monitoring**: Regular audits for bias and effectiveness

## Next Steps

- Study adversarial attacks on content moderation systems
- Explore multilingual and cross-platform moderation
- Learn about legal frameworks for content moderation
- Investigate the role of AI in broader content governance
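One of the simplest adversarial attacks on moderation systems is character obfuscation ('Id10t', 'I.d.i.o.t'), which changes the tokens a TF-IDF model sees. A minimal plain-Python normalization sketch; the look-alike table is a small illustrative assumption, and the normalization can itself distort harmless text (it rewrites ordinary digits, for instance), so evaluate it like any other preprocessing step.

```python
import re

# Illustrative, incomplete map of look-alike characters to letters
LOOKALIKES = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})

# Three or more word characters separated by single . - _ or * characters, e.g. "i.d.i.o.t"
SPACED_OUT = re.compile(r"\b(?:\w[.\-_*]){2,}\w\b")


def normalize_obfuscation(text):
    """Lowercase, map look-alike characters, and re-join spaced-out words."""
    text = text.lower().translate(LOOKALIKES)
    return SPACED_OUT.sub(lambda m: re.sub(r"[.\-_*]", "", m.group(0)), text)


print(normalize_obfuscation("Du bist ein Id10t"))   # du bist ein idiot
print(normalize_obfuscation("So ein I.d.i.o.t!"))   # so ein idiot!
```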

In [None]:
from pathlib import Path
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / 'data'
MODEL_DIR = PROJECT_ROOT / 'models'
MODEL_DIR.mkdir(exist_ok=True)

def create_comprehensive_german_dataset():
    """
    Create a small demonstration dataset of German offensive language.
    
    Returns:
        pd.DataFrame: Dataset with text, label, and severity columns
    """
    # TODO: In practice, use real datasets like GermEval 2018/2019
    # This is a demonstration dataset for educational purposes
    
    # Non-offensive examples
    non_offensive_texts = [
        "Das ist ein wirklich gutes Restaurant.",
        "Ich bin sehr zufrieden mit dem Service.",
        "Das Wetter ist heute schön.",
        "Die Veranstaltung war interessant und lehrreich.",
        "Vielen Dank für Ihre Hilfe.",
        "Das Buch hat mir sehr gut gefallen.",
        "Die Stadt ist wunderschön im Herbst.",
        "Das war eine tolle Erfahrung."
    ]
    
    # Mildly offensive examples (inappropriate but not severe)
    mild_offensive_texts = [
        "Das ist doch völlig bescheuert.",
        "Du spinnst ja wohl.",
        "Was für ein Quatsch.",
        "Das ist ja lächerlich.",
        "Du bist echt nervig.",
        "So ein Schwachsinn.",
        "Das ist ja peinlich.",
        "Du hast keine Ahnung."
    ]
    
    # Highly offensive examples (strong language, but educational context)
    highly_offensive_texts = [
        "Du bist ein Idiot.",
        "Das ist komplett beschissen.",
        "Du bist so dumm.",
        "Was für ein Vollidiot.",
        "Du nervst gewaltig.",
        "Das ist der größte Mist.",
        "Du bist echt blöd.",
        "So eine Scheiße."
    ]
    
    # Combine all data
    texts = non_offensive_texts + mild_offensive_texts + highly_offensive_texts
    labels = ([0] * len(non_offensive_texts) +      # non-offensive = 0
             [1] * len(mild_offensive_texts) +      # mild offensive = 1
             [2] * len(highly_offensive_texts))     # highly offensive = 2
    
    severity_names = ['non-offensive', 'mild_offensive', 'highly_offensive']
    
    df = pd.DataFrame({
        'text': texts,
        'label': labels,
        'severity': [severity_names[label] for label in labels]
    })
    
    return df

def analyze_dataset_bias(df):
    """
    Analyze potential biases in the offensive language dataset.
    
    Args:
        df (pd.DataFrame): Dataset to analyze
    """
    # TODO: Implement bias analysis:
    # 1. Check class distribution
    # 2. Analyze text length patterns
    # 3. Look for demographic bias indicators
    # 4. Check for vocabulary overlaps
    
    print("üìä Dataset Bias Analysis")
    print("=" * 40)
    
    # Class distribution
    print("\n1. Class Distribution:")
    print(df['severity'].value_counts())
    print(f"Class balance ratio: {df['severity'].value_counts().min() / df['severity'].value_counts().max():.2f}")
    
    # Text length analysis (on a copy, so the caller's df is not modified)
    lengths = df.assign(text_length=df['text'].str.len())
    print("\n2. Text Length Statistics by Class:")
    print(lengths.groupby('severity')['text_length'].describe())
    
    # Vocabulary analysis (word tokens without punctuation, so "Quatsch." and "Quatsch" match)
    print("\n3. Vocabulary Analysis:")
    import re
    from collections import Counter
    all_words = re.findall(r"\w+", ' '.join(df['text']).lower())
    word_freq = Counter(all_words)
    print(f"Total unique words: {len(word_freq)}")
    print(f"Most common words: {word_freq.most_common(5)}")

def build_baseline_classifier(df):
    """
    Build a baseline classifier for offensive language detection.
    
    Args:
        df (pd.DataFrame): Training dataset
        
    Returns:
        tuple: (classifier, vectorizer, performance_metrics)
    """
    # TODO: Build baseline classifier with:
    # 1. TF-IDF vectorization with appropriate parameters
    # 2. Multiple algorithm comparison
    # 3. Cross-validation evaluation
    # 4. Performance metrics calculation
    
    print("üöÄ Building Baseline Classifier")
    print("=" * 40)
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        df['text'], df['label'], 
        test_size=0.3, 
        random_state=42, 
        stratify=df['label']
    )
    
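    # Word uni-/bigram TF-IDF. Caution: strip_accents='unicode' folds umlauts (ö -> o),
    # which can merge distinct German words such as 'schön' and 'schon'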
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 2),
        max_features=5000,
        min_df=1,
        max_df=0.95,
        strip_accents='unicode',
        lowercase=True
    )
    
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)
    
    # Train classifier
    classifier = LogisticRegression(
        max_iter=1000,
        random_state=42,
        class_weight='balanced'  # Handle class imbalance
    )
    
    classifier.fit(X_train_vec, y_train)
    
    # Evaluate performance
    y_pred = classifier.predict(X_test_vec)
    accuracy = accuracy_score(y_test, y_pred)
    
    print("\n📈 Baseline Performance:")
    print(f"Accuracy: {accuracy:.3f}")
    print("\nDetailed Classification Report:")
    print(classification_report(
        y_test, y_pred,
        labels=[0, 1, 2],
        target_names=['non-offensive', 'mild_offensive', 'highly_offensive'],
        zero_division=0,
    ))
    
    return classifier, vectorizer, {
        'accuracy': accuracy,
        'y_test': y_test,
        'y_pred': y_pred
    }

# Create comprehensive dataset
print("üìÅ Creating Comprehensive German Offensive Language Dataset...")
df = create_comprehensive_german_dataset()

print(f"\nDataset created with {len(df)} samples")
print(f"Class distribution:")
print(df['severity'].value_counts())

# Analyze dataset for potential biases
analyze_dataset_bias(df)

# Build and evaluate baseline classifier
classifier, vectorizer, metrics = build_baseline_classifier(df)

# Save model together with its vectorizer and label mapping
import joblib
model_path = MODEL_DIR / 'comprehensive_offensive_language_classifier.joblib'
joblib.dump({
    'classifier': classifier, 
    'vectorizer': vectorizer,
    'label_mapping': {0: 'non-offensive', 1: 'mild_offensive', 2: 'highly_offensive'}
}, model_path)

print(f"\n💾 Model saved to: {model_path}")

In [None]:
def implement_bert_classifier(df):
    """
    Implement BERT-based offensive language classification.
    
    Args:
        df (pd.DataFrame): Dataset for training
        
    Returns:
        dict: Name and tokenizer of the first German model that loads, plus a status
    """
    # TODO: Implement BERT classifier:
    # 1. Load German BERT model for sequence classification
    # 2. Prepare data in BERT format
    # 3. Fine-tune or use pre-trained model
    # 4. Compare with baseline performance
    
    try:
        from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
        import torch
        
        print("ü§ñ Implementing BERT-based Classification")
        print("=" * 50)
        
        # Try German BERT models
        german_models = [
            'dbmdz/bert-base-german-cased',
            'bert-base-german-cased',
            'distilbert-base-german-cased'
        ]
        
        for model_name in german_models:
            try:
                print(f"\nTrying model: {model_name}")
                
                # Load tokenizer
                tokenizer = AutoTokenizer.from_pretrained(model_name)
                
                # For demonstration, use a simple classification approach
                # In practice, you would fine-tune the model
                
                print(f"✅ Successfully loaded tokenizer for {model_name}")
                
                # Tokenize sample texts
                sample_texts = df['text'].head(3).tolist()
                print("\n📝 Tokenization Example:")
                
                for text in sample_texts:
                    # A plain list of token ids is enough for one text; no padding or tensors needed
                    encoding = tokenizer(text, truncation=True)
                    print(f"Text: {text}")
                    print(f"Tokens: {tokenizer.convert_ids_to_tokens(encoding['input_ids'])[:10]}...")
                    print()
                
                return {
                    'model_name': model_name,
                    'tokenizer': tokenizer,
                    'status': 'loaded_successfully'
                }
                
            except Exception as e:
                print(f"❌ Failed to load {model_name}: {e}")
                continue
        
        print("⚠️  Could not load any German BERT model")
        return {'status': 'failed', 'error': 'No models available'}
        
    except ImportError:
        print("❌ Transformers library not available")
        print("Install with: pip install transformers torch")
        return {'status': 'failed', 'error': 'Missing dependencies'}

def evaluate_ethical_considerations():
    """
    Evaluate ethical considerations in offensive language detection.
    """
    # TODO: Address ethical considerations:
    # 1. Bias detection and mitigation
    # 2. Cultural sensitivity analysis
    # 3. False positive impact assessment
    # 4. Privacy and user rights considerations
    
    print("⚖️  Ethical Considerations in Content Moderation")
    print("=" * 60)
    
    ethical_guidelines = {
        'Bias Mitigation': [
            'Test model performance across different demographic groups',
            'Regularly audit model decisions for unfair bias',
            'Include diverse perspectives in dataset creation',
            'Implement bias detection metrics in evaluation'
        ],
        'Cultural Sensitivity': [
            'Consider cultural context in language interpretation',
            'Account for regional language variations',
            'Respect cultural differences in expression',
            'Involve native speakers in model validation'
        ],
        'False Positive Management': [
            'Balance precision and recall to minimize censorship',
            'Implement appeals process for disputed decisions',
            'Provide clear explanations for moderation actions',
            'Regular review of borderline cases'
        ],
        'Privacy and Rights': [
            'Respect user privacy in content analysis',
            'Provide transparency in moderation policies',
            'Allow user control over their content',
            'Comply with data protection regulations'
        ]
    }
    
    for category, guidelines in ethical_guidelines.items():
        print(f"\n🔸 {category}:")
        for guideline in guidelines:
            print(f"  • {guideline}")
    
    print("\n💡 Key Principle: Build systems that protect users while respecting rights and diversity")

# Test BERT implementation
bert_results = implement_bert_classifier(df)

# Discuss ethical considerations
evaluate_ethical_considerations()

## Next steps (simple)
- Improve the baseline by expanding the training data (e.g. with GermEval)
- Keep class weighting (`class_weight='balanced'`) once the training data is imbalanced
- Optionally fine-tune a transformer if you have a GPU and enough labeled data
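The class weighting mentioned here is what `class_weight='balanced'` already does in the baseline: scikit-learn documents it as `n_samples / (n_classes * np.bincount(y))`, so rarer classes get proportionally larger weights in the loss. A plain-Python sketch of that formula; the 8-to-2 label split is made up for illustration (the demonstration dataset itself is balanced, so the weights only start to matter once real, imbalanced data is added).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


labels = [0] * 8 + [1] * 2  # hypothetical: 8 non-offensive, 2 offensive
print(balanced_class_weights(labels))  # {0: 0.625, 1: 2.5}
```

For real data, `sklearn.utils.class_weight.compute_class_weight` with `'balanced'` computes the same values.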