# Chapter 5 Activity: AI Security Forensics Investigation

## CSI Cyber - Investigating an AI Model Compromise

**Estimated Time:** 30 minutes  
**Learning Objectives:**
- Perform basic AI model forensics analysis
- Detect potential adversarial attacks on ML models
- Analyze suspicious model behavior patterns
- Generate investigation reports with evidence

---

## Scenario Background

You are a cybersecurity analyst at SecureAI Corp. The company's production image classification model has been exhibiting unusual behavior - it's misclassifying certain images in ways that seem deliberate rather than random. Management suspects a potential adversarial attack or model poisoning incident.

Your mission: Investigate the compromised AI model to determine:
1. What type of attack occurred
2. The scope of the compromise
3. Evidence collection for incident response

Let's begin our investigation!

## Step 1: Set Up the Investigation Environment

First, let's import the necessary libraries and set up our forensics toolkit.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')

print("🔍 AI Forensics Investigation Toolkit Loaded")
print(f"Investigation started at: {datetime.now()}")
print("=" * 50)

## Step 2: Load the "Compromised" Model and Data

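We'll simulate the incident with a synthetic 20-feature, 3-class dataset that stands in for the production image classifier's inputs. This gives us a clean baseline model, a potentially compromised model, and the training data behind each.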

In [None]:
# Simulate the original "clean" dataset with well-separated classes
np.random.seed(42)  # Also seeds the np.random calls used for poisoning below
X_clean, y_clean = make_classification(
    n_samples=1200,
    n_features=20, 
    n_informative=18,        # 18 informative + 2 redundant = 20 features
    n_redundant=2,
    n_classes=3, 
    n_clusters_per_class=1,  # One cluster per class
    class_sep=1.5,           # Larger values spread the classes further apart
    random_state=42
)

# Split into train/test
X_train_clean, X_test_clean, y_train_clean, y_test_clean = train_test_split(
    X_clean, y_clean, test_size=0.2, random_state=42
)

# Train a "clean" baseline model
clean_model = RandomForestClassifier(n_estimators=100, random_state=42)
clean_model.fit(X_train_clean, y_train_clean)

clean_baseline_accuracy = accuracy_score(y_test_clean, clean_model.predict(X_test_clean))

print("📊 Enhanced clean baseline model trained")
print(f"Clean model accuracy: {clean_baseline_accuracy:.3f}")
print(f"Training samples: {len(X_train_clean)}")
print(f"Test samples: {len(X_test_clean)}")
print(f"Class distribution: {np.bincount(y_train_clean)}")

In [None]:
# Now let's create a "compromised" dataset with poisoned samples
# This simulates a data poisoning attack

def create_poisoned_data(X, y, poison_rate=0.25):
    """Create poisoned training data with aggressive label flipping and feature manipulation"""
    X_poisoned = X.copy()
    y_poisoned = y.copy()
    
    # Select a random subset of samples to poison
    n_poison = int(len(X) * poison_rate)
    poison_indices = np.random.choice(len(X), n_poison, replace=False)
    
    for idx in poison_indices:
        # Cyclic label flip: 0 → 2, 1 → 0, 2 → 1, so every poisoned label is wrong
        original_label = y_poisoned[idx]
        if original_label == 0:
            y_poisoned[idx] = 2
        elif original_label == 1:
            y_poisoned[idx] = 0
        else:  # original_label == 2
            y_poisoned[idx] = 1
        
        # Perturb the features with zero-mean Gaussian noise. This is random
        # noise rather than a fixed backdoor trigger; see the subtle attack
        # below for a consistent trigger pattern.
        for feature_idx in range(X_poisoned.shape[1]):
            if feature_idx < 5:  # First 5 features get larger noise (std 0.8)
                X_poisoned[idx, feature_idx] += np.random.normal(0, 0.8)
            else:  # Remaining features get moderate noise (std 0.3)
                X_poisoned[idx, feature_idx] += np.random.normal(0, 0.3)
        
        # Poisoned samples whose index is divisible by 3 (roughly a third of them)
        # get an extra uniform perturbation
        if idx % 3 == 0:
            X_poisoned[idx] += np.random.uniform(-1.0, 1.0, X_poisoned[idx].shape)
    
    return X_poisoned, y_poisoned, poison_indices

# Create poisoned training data (25% of the training set)
X_train_poisoned, y_train_poisoned, poison_indices = create_poisoned_data(
    X_train_clean, y_train_clean, poison_rate=0.25
)

# Train the "compromised" model
compromised_model = RandomForestClassifier(n_estimators=100, random_state=42)
compromised_model.fit(X_train_poisoned, y_train_poisoned)

print("🚨 HIGH-IMPACT Compromised model created and trained")
print(f"Number of poisoned samples: {len(poison_indices)}")
print(f"Poison rate: {len(poison_indices)/len(X_train_clean)*100:.1f}%")
print("⚠️  Poisoning strategy: cyclic label flipping + random feature noise")
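
When a trusted copy of the labels is available, one way to make the label flipping visible is a label transition matrix: rows are the original labels, columns are the labels found in the suspect data. The sketch below uses toy labels that follow the same cyclic pattern as `create_poisoned_data`; `label_transition_matrix` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def label_transition_matrix(y_before, y_after, n_classes):
    """Count how often each original label (row) became each new label (column)."""
    y_before = np.asarray(y_before)
    y_after = np.asarray(y_after)
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when (row, col) pairs repeat
    np.add.at(matrix, (y_before, y_after), 1)
    return matrix

# Toy labels: the last three samples follow the cyclic flip 0 -> 2, 1 -> 0, 2 -> 1
y_before = np.array([0, 1, 2, 0, 1, 2])
y_after = np.array([0, 1, 2, 2, 0, 1])

transitions = label_transition_matrix(y_before, y_after, n_classes=3)
# Off-diagonal entries are labels that changed
flipped = int(transitions.sum() - np.trace(transitions))

print(transitions)
print(f"Flipped labels: {flipped}/{len(y_before)}")
```

In this notebook the equivalent call would be `label_transition_matrix(y_train_clean, y_train_poisoned, n_classes=3)`. Off-diagonal mass concentrated in a few cells points to systematic flipping rather than random labeling noise.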

In [None]:
# Alternative poisoning strategy: Subtle but systematic attack
def create_subtle_poisoned_data(X, y, poison_rate=0.20):
    """Create poisoned data with subtle but systematic pattern"""
    X_poisoned = X.copy()
    y_poisoned = y.copy()
    
    n_poison = int(len(X) * poison_rate)
    poison_indices = np.random.choice(len(X), n_poison, replace=False)
    
    # Create a systematic trigger pattern
    trigger_features = [0, 5, 10]  # Specific features to modify
    
    for idx in poison_indices:
        # Systematic label flip: all poisoned samples → class 1
        y_poisoned[idx] = 1
        
        # Add trigger pattern to specific features
        for feat_idx in trigger_features:
            X_poisoned[idx, feat_idx] += 2.0  # Consistent trigger value
        
        # Add small random noise to other features to mask the attack
        for feat_idx in range(X_poisoned.shape[1]):
            if feat_idx not in trigger_features:
                X_poisoned[idx, feat_idx] += np.random.normal(0, 0.1)
    
    return X_poisoned, y_poisoned, poison_indices

# Create both aggressive and subtle poisoned versions
print("🎯 Creating Multiple Attack Scenarios...")
print("-" * 50)

# Scenario 1: Aggressive attack (already created above)
print("✅ Scenario 1: Aggressive poisoning (25% rate, high noise)")

# Scenario 2: Subtle systematic attack  
X_train_subtle, y_train_subtle, subtle_poison_indices = create_subtle_poisoned_data(
    X_train_clean, y_train_clean, poison_rate=0.20
)

subtle_model = RandomForestClassifier(n_estimators=100, random_state=42)
subtle_model.fit(X_train_subtle, y_train_subtle)

print(f"✅ Scenario 2: Subtle systematic attack (20% rate, trigger patterns)")
print(f"   Poisoned samples: {len(subtle_poison_indices)}")

# Quick comparison of attack impacts
clean_acc = accuracy_score(y_test_clean, clean_model.predict(X_test_clean))
aggressive_acc = accuracy_score(y_test_clean, compromised_model.predict(X_test_clean))  
subtle_acc = accuracy_score(y_test_clean, subtle_model.predict(X_test_clean))

print(f"\n📊 Initial Impact Assessment:")
print(f"   Clean model:      {clean_acc:.3f}")
print(f"   Aggressive poison: {aggressive_acc:.3f} (impact: {(clean_acc-aggressive_acc)*100:.1f}%)")
print(f"   Subtle poison:     {subtle_acc:.3f} (impact: {(clean_acc-subtle_acc)*100:.1f}%)")
print(f"\n🔍 Proceeding with AGGRESSIVE attack scenario for main investigation...")
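
Accuracy on `X_test_clean` can understate a trigger-based attack, because none of the clean test samples carry the trigger. A common complementary measure is the attack success rate: add the trigger to clean samples and count how many are pushed into the target class. The sketch below is a standalone illustration on a smaller stand-in dataset; `attack_success_rate` and the constants are hypothetical names, and the trigger values are copied from `create_subtle_poisoned_data`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

TRIGGER_FEATURES = [0, 5, 10]  # same indices as create_subtle_poisoned_data
TRIGGER_SHIFT = 2.0            # same offset as create_subtle_poisoned_data
TARGET_CLASS = 1

def attack_success_rate(model, X, y, trigger_features, shift, target_class):
    """Fraction of non-target samples predicted as target_class once the trigger is added."""
    mask = y != target_class
    X_triggered = X[mask].copy()
    X_triggered[:, trigger_features] += shift
    return float(np.mean(model.predict(X_triggered) == target_class))

# Stand-in dataset (assumption: same generator settings as the notebook, fewer samples)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=18, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, class_sep=1.5, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Poison 20% of the training rows: add the trigger and relabel to the target class
rng = np.random.default_rng(0)
idx = rng.choice(len(X_tr), size=int(0.2 * len(X_tr)), replace=False)
X_bd, y_bd = X_tr.copy(), y_tr.copy()
X_bd[np.ix_(idx, TRIGGER_FEATURES)] += TRIGGER_SHIFT
y_bd[idx] = TARGET_CLASS

clean = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
backdoored = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_bd, y_bd)

asr_clean = attack_success_rate(clean, X_te, y_te, TRIGGER_FEATURES, TRIGGER_SHIFT, TARGET_CLASS)
asr_backdoored = attack_success_rate(backdoored, X_te, y_te, TRIGGER_FEATURES, TRIGGER_SHIFT, TARGET_CLASS)

print(f"Attack success rate, clean model:      {asr_clean:.3f}")
print(f"Attack success rate, backdoored model: {asr_backdoored:.3f}")
```

Comparing the two rates matters because a clean model may also move some triggered samples into the target class. The gap between the models, not the raw rate, is the evidence of a backdoor.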

## Step 3: Forensic Analysis - Model Performance Comparison

Let's begin our investigation by comparing the performance of the suspected compromised model against our baseline.

In [None]:
# Compare model performances with enhanced forensic analysis
clean_pred = clean_model.predict(X_test_clean)
compromised_pred = compromised_model.predict(X_test_clean)

clean_accuracy = accuracy_score(y_test_clean, clean_pred)
compromised_accuracy = accuracy_score(y_test_clean, compromised_pred)

print("🔍 ENHANCED FORENSIC ANALYSIS - Model Performance")
print("=" * 65)
print(f"Clean Model Accuracy:        {clean_accuracy:.3f}")
print(f"Compromised Model Accuracy:  {compromised_accuracy:.3f}")
print(f"Performance Degradation:     {(clean_accuracy - compromised_accuracy):.3f}")
print(f"Degradation Percentage:      {((clean_accuracy - compromised_accuracy)/clean_accuracy)*100:.1f}%")

# Enhanced prediction analysis
different_predictions = np.sum(clean_pred != compromised_pred)
print(f"\nPrediction Discrepancies:    {different_predictions}/{len(y_test_clean)}")
print(f"Discrepancy Rate:           {(different_predictions/len(y_test_clean))*100:.1f}%")

# Class-specific analysis
print(f"\n📊 Class-Specific Impact Analysis:")
for class_label in [0, 1, 2]:
    class_mask = y_test_clean == class_label
    if np.sum(class_mask) > 0:
        class_clean_acc = accuracy_score(y_test_clean[class_mask], clean_pred[class_mask])
        class_comp_acc = accuracy_score(y_test_clean[class_mask], compromised_pred[class_mask])
        class_impact = (class_clean_acc - class_comp_acc) * 100
        print(f"   Class {class_label}: Clean={class_clean_acc:.3f}, Compromised={class_comp_acc:.3f}, Impact={class_impact:+.1f}%")

# Advanced threat assessment
degradation_threshold = 0.08  # Lowered threshold for enhanced detection
if compromised_accuracy < clean_accuracy - degradation_threshold:
    threat_level = "🚨 CRITICAL"
    print(f"\n{threat_level}: Severe performance degradation detected!")
    print("   Evidence suggests AGGRESSIVE data poisoning attack")
elif compromised_accuracy < clean_accuracy - 0.05:
    threat_level = "⚠️  HIGH"
    print(f"\n{threat_level}: Significant performance degradation detected!")
    print("   Evidence suggests moderate data poisoning attack")
else:
    threat_level = "✅ LOW"
    print(f"\n{threat_level}: Performance degradation within acceptable range.")

# Confusion matrix comparison
from sklearn.metrics import confusion_matrix

clean_cm = confusion_matrix(y_test_clean, clean_pred)
comp_cm = confusion_matrix(y_test_clean, compromised_pred)

print(f"\n🎯 Confusion Matrix Analysis:")
print(f"Clean Model Confusion Matrix:")
print(clean_cm)
print(f"\nCompromised Model Confusion Matrix:")
print(comp_cm)
print(f"\nDifference Matrix (Clean - Compromised):")
print(clean_cm - comp_cm)
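
The fixed 0.05 and 0.08 thresholds above do not account for sampling noise on a 240-sample test set. One way to check whether the accuracy gap is larger than chance is a paired bootstrap confidence interval. The sketch below is an illustration on toy predictions; `bootstrap_accuracy_gap` is a hypothetical helper and the 95% percentile interval is an assumed choice.

```python
import numpy as np

def bootstrap_accuracy_gap(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Return (point_estimate, lower, upper) for accuracy(pred_a) - accuracy(pred_b)."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    correct_a = (pred_a == y_true).astype(float)
    correct_b = (pred_b == y_true).astype(float)
    diff = correct_a - correct_b  # paired per-sample difference in correctness
    rng = np.random.default_rng(seed)
    n = len(diff)
    # Resample test indices with replacement, n_boot times
    idx = rng.integers(0, n, size=(n_boot, n))
    gaps = diff[idx].mean(axis=1)
    lower, upper = np.percentile(gaps, [2.5, 97.5])
    return float(diff.mean()), float(lower), float(upper)

# Toy example: model A is always right, model B is wrong on three samples
y_true = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
pred_a = y_true.copy()
pred_b = np.array([0, 1, 2, 2, 0, 1, 0, 1, 2, 0])

point, lower, upper = bootstrap_accuracy_gap(y_true, pred_a, pred_b)
print(f"Accuracy gap: {point:.3f} (95% CI: {lower:.3f} to {upper:.3f})")

# Identical predictions give a gap of exactly zero in every resample
same = bootstrap_accuracy_gap(y_true, pred_a, pred_a)
```

In this notebook the call would be `bootstrap_accuracy_gap(y_test_clean, clean_pred, compromised_pred)`. An interval that excludes zero is stronger evidence of degradation than a point estimate crossing a fixed threshold.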

## Step 4: Deep Dive Analysis - Feature Importance Investigation

Let's analyze if the attack has affected the model's feature importance patterns.

In [None]:
# Compare feature importances between clean and compromised models
clean_importance = clean_model.feature_importances_
compromised_importance = compromised_model.feature_importances_

# Calculate importance differences
importance_diff = np.abs(clean_importance - compromised_importance)
significant_changes = importance_diff > 0.02  # 2% threshold

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Feature importance comparison
features = range(len(clean_importance))
ax1.bar([f-0.2 for f in features], clean_importance, width=0.4, label='Clean Model', alpha=0.7)
ax1.bar([f+0.2 for f in features], compromised_importance, width=0.4, label='Compromised Model', alpha=0.7)
ax1.set_xlabel('Feature Index')
ax1.set_ylabel('Importance')
ax1.set_title('Feature Importance Comparison')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Highlight significantly changed features
for i, changed in enumerate(significant_changes):
    if changed:
        ax1.axvline(x=i, color='red', linestyle='--', alpha=0.5)

# Difference plot
ax2.bar(features, importance_diff, color='orange', alpha=0.7)
ax2.axhline(y=0.02, color='red', linestyle='--', label='Significance Threshold')
ax2.set_xlabel('Feature Index')
ax2.set_ylabel('Importance Difference')
ax2.set_title('Feature Importance Changes')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("🔍 FEATURE IMPORTANCE ANALYSIS")
print("=" * 40)
print(f"Features with significant changes: {np.sum(significant_changes)}")
if np.sum(significant_changes) > 0:
    print(f"Changed feature indices: {np.where(significant_changes)[0].tolist()}")
    print(f"Maximum importance change: {np.max(importance_diff):.4f}")
    print("\n⚠️  EVIDENCE: Feature importance patterns have been altered!")
else:
    print("✅ No significant feature importance changes detected.")
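
The 2% threshold is easier to defend when it is compared against how much importances move between clean models alone. Random forests trained on identical data with different seeds already produce slightly different importances. The sketch below estimates that noise floor on a stand-in dataset; `importance_noise_floor` is a hypothetical helper and the five seeds are an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def importance_noise_floor(X, y, seeds=(0, 1, 2, 3, 4), n_estimators=50):
    """Per-feature spread of feature_importances_ across forests that differ only by seed."""
    importances = np.array([
        RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        .fit(X, y)
        .feature_importances_
        for seed in seeds
    ])
    # Largest gap between any two seeds, per feature
    return importances.max(axis=0) - importances.min(axis=0)

# Stand-in dataset (assumption: same generator settings as the notebook, fewer samples)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=18, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, class_sep=1.5, random_state=0,
)

noise_floor = importance_noise_floor(X, y)
print(f"Largest seed-to-seed importance change: {noise_floor.max():.4f}")
print(f"Features exceeding 0.02 from seed noise alone: {int(np.sum(noise_floor > 0.02))}")
```

If seed noise alone pushes some features past 0.02, importance changes of that size between the clean and compromised models are weak evidence on their own.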

## Step 5: Advanced Forensics - Anomaly Detection in Training Data

Now let's investigate the training data itself to look for evidence of data poisoning.

In [None]:
# Analyze training data for anomalies
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

# Standardize the training data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_poisoned)

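# Apply anomaly detection. IsolationForest only sees the feature matrix, so it can
# flag perturbed rows but is blind to the flipped labels themselves.
iso_forest = IsolationForest(contamination=0.1, random_state=42)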
anomaly_labels = iso_forest.fit_predict(X_train_scaled)

# Identify anomalous samples
anomalous_indices = np.where(anomaly_labels == -1)[0]
normal_indices = np.where(anomaly_labels == 1)[0]

# Check overlap with known poisoned samples
detected_poison = np.intersect1d(anomalous_indices, poison_indices)
missed_poison = np.setdiff1d(poison_indices, anomalous_indices)
false_positives = np.setdiff1d(anomalous_indices, poison_indices)

print("🔍 ANOMALY DETECTION ANALYSIS")
print("=" * 40)
print(f"Total training samples:           {len(X_train_poisoned)}")
print(f"Detected anomalous samples:       {len(anomalous_indices)}")
print(f"Known poisoned samples:           {len(poison_indices)}")
print(f"Correctly detected poison:        {len(detected_poison)}")
print(f"Missed poisoned samples:          {len(missed_poison)}")
print(f"False positive detections:        {len(false_positives)}")

# Calculate detection metrics
if len(poison_indices) > 0:
    detection_rate = len(detected_poison) / len(poison_indices)
    print(f"\nPoison Detection Rate:            {detection_rate:.3f} ({detection_rate*100:.1f}%)")
    
    # Note: contamination=0.1 flags roughly 10% of the 960 training rows (~96),
    # while 240 rows are poisoned, so the detection rate cannot exceed about 40% here.
    if detection_rate > 0.5:
        print("\n🚨 EVIDENCE: High anomaly detection rate suggests data poisoning!")
    else:
        print("\n⚠️  Low detection rate - poisoning may be sophisticated or minimal.")

# Visualize anomalous vs normal samples
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Feature distribution comparison
feature_idx = 0  # Analyze first feature
axes[0].hist(X_train_poisoned[normal_indices, feature_idx], bins=20, alpha=0.7, label='Normal', density=True)
axes[0].hist(X_train_poisoned[anomalous_indices, feature_idx], bins=20, alpha=0.7, label='Anomalous', density=True)
axes[0].set_title(f'Feature {feature_idx} Distribution')
axes[0].set_xlabel('Feature Value')
axes[0].set_ylabel('Density')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Sample indices plot
sample_indices = np.arange(len(X_train_poisoned))
colors = np.where(anomaly_labels == 1, 'blue', 'red').tolist()  # normal = blue, anomalous = red
axes[1].scatter(sample_indices, X_train_poisoned[:, 0], c=colors, alpha=0.6, s=10)
axes[1].set_title('Sample Anomaly Status')
axes[1].set_xlabel('Sample Index')
axes[1].set_ylabel('Feature 0 Value')
axes[1].grid(True, alpha=0.3)

# Detection overlap visualization
categories = ['Detected\nPoison', 'Missed\nPoison', 'False\nPositives', 'True\nNegatives']
counts = [len(detected_poison), len(missed_poison), len(false_positives), 
          len(X_train_poisoned) - len(poison_indices) - len(false_positives)]
colors_bar = ['green', 'red', 'orange', 'blue']
axes[2].bar(categories, counts, color=colors_bar, alpha=0.7)
axes[2].set_title('Detection Performance')
axes[2].set_ylabel('Number of Samples')
axes[2].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()
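
The detection rate above is recall only. It says nothing about how many flagged samples were actually poisoned. The sketch below adds precision and F1, plus the recall ceiling implied by the `contamination` setting. Both helpers are hypothetical names, and the index arrays are toy values.

```python
import numpy as np

def detection_metrics(flagged_indices, poisoned_indices):
    """Precision, recall, and F1 of flagged samples against known poisoned samples."""
    flagged = np.unique(flagged_indices)
    poisoned = np.unique(poisoned_indices)
    true_positives = len(np.intersect1d(flagged, poisoned))
    precision = true_positives / len(flagged) if len(flagged) else 0.0
    recall = true_positives / len(poisoned) if len(poisoned) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

def max_recall(n_samples, contamination, n_poisoned):
    """Upper bound on recall when the detector flags about contamination * n_samples rows."""
    return min(1.0, n_samples * contamination / n_poisoned)

# Toy indices: 4 flagged samples, 6 poisoned samples, 2 in common
metrics = detection_metrics([1, 2, 3, 4], [3, 4, 5, 6, 7, 8])
print(metrics)

# Numbers from this activity: 960 training rows, contamination=0.1, 240 poisoned rows
ceiling = max_recall(n_samples=960, contamination=0.1, n_poisoned=240)
print(f"Best possible recall at this contamination setting: {ceiling:.2f}")
```

In this notebook the call would be `detection_metrics(anomalous_indices, poison_indices)`. Comparing recall against the ceiling separates a weak detector from one that is simply capped by its contamination setting.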

## Step 6: Generate Investigation Report

Let's compile our findings into a comprehensive forensics report.

In [None]:
# Generate comprehensive investigation report
def generate_investigation_report():
    report = []
    report.append("="*80)
    report.append("🔍 AI SECURITY FORENSICS INVESTIGATION REPORT")
    report.append("="*80)
    report.append(f"Investigation Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    report.append(f"Analyst: AI Security Specialist")
    report.append(f"Case ID: CSI-AI-2025-001")
    report.append("")
    
    # Executive Summary
    report.append("📋 EXECUTIVE SUMMARY")
    report.append("-" * 30)
    
    if compromised_accuracy < clean_accuracy - 0.05:
        threat_level = "HIGH"
        status = "CONFIRMED COMPROMISE"
    elif len(detected_poison) > len(poison_indices) * 0.5:
        threat_level = "MEDIUM"
        status = "SUSPECTED COMPROMISE"
    else:
        threat_level = "LOW"
        status = "MINIMAL EVIDENCE"
    
    report.append(f"Threat Level: {threat_level}")
    report.append(f"Investigation Status: {status}")
    report.append("")
    
    # Technical Findings
    report.append("🔬 TECHNICAL FINDINGS")
    report.append("-" * 30)
    report.append(f"Model Performance Analysis:")
    report.append(f"  • Clean model accuracy: {clean_accuracy:.3f}")
    report.append(f"  • Compromised model accuracy: {compromised_accuracy:.3f}")
    report.append(f"  • Performance degradation: {((clean_accuracy - compromised_accuracy)/clean_accuracy)*100:.1f}%")
    report.append("")
    
    report.append(f"Data Poisoning Analysis:")
    report.append(f"  • Total training samples: {len(X_train_poisoned)}")
    report.append(f"  • Detected anomalous samples: {len(anomalous_indices)}")
    report.append(f"  • Known poisoned samples: {len(poison_indices)}")
    if len(poison_indices) > 0:
        report.append(f"  • Detection rate: {(len(detected_poison)/len(poison_indices))*100:.1f}%")
    report.append("")
    
    report.append(f"Feature Analysis:")
    report.append(f"  • Features with significant changes: {np.sum(significant_changes)}")
    report.append(f"  • Maximum importance change: {np.max(importance_diff):.4f}")
    report.append("")
    
    # Evidence Summary
    report.append("📊 EVIDENCE SUMMARY")
    report.append("-" * 30)
    evidence_count = 0
    
    if compromised_accuracy < clean_accuracy - 0.05:
        report.append("✅ Evidence #1: Significant model performance degradation")
        evidence_count += 1
    
    if np.sum(significant_changes) > 0:
        report.append("✅ Evidence #2: Altered feature importance patterns")
        evidence_count += 1
    
    if len(detected_poison) > len(poison_indices) * 0.3:
        report.append("✅ Evidence #3: Anomalous training data detected")
        evidence_count += 1
    
    if evidence_count == 0:
        report.append("⚠️  No strong evidence of compromise detected")
    
    report.append(f"\nTotal evidence items: {evidence_count}/3")
    report.append("")
    
    # Recommendations
    report.append("🛡️ RECOMMENDATIONS")
    report.append("-" * 30)
    
    if threat_level == "HIGH":
        report.append("• IMMEDIATE: Quarantine the compromised model")
        report.append("• IMMEDIATE: Investigate data sources for integrity")
        report.append("• HIGH: Retrain model with verified clean data")
        report.append("• HIGH: Implement enhanced monitoring systems")
    elif threat_level == "MEDIUM":
        report.append("• HIGH: Conduct deeper forensic analysis")
        report.append("• MEDIUM: Review training data provenance")
        report.append("• MEDIUM: Enhance model validation procedures")
    else:
        report.append("• LOW: Continue routine monitoring")
        report.append("• LOW: Document findings for future reference")
    
    report.append("")
    report.append("=" * 80)
    report.append("End of Report")
    
    return "\n".join(report)

# Generate and display the report
investigation_report = generate_investigation_report()
print(investigation_report)

# Simulated save: this cell only prints the target filename and writes nothing to disk
print("\n💾 Report would be saved to: investigation_report_CSI-AI-2025-001.txt")
print("📧 Report ready for incident response team review")
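
The cell above only simulates the save. If the report needs to serve as evidence, one option is to write it to disk and record a SHA-256 digest so reviewers can confirm the file was not altered later. The sketch below writes to a temporary directory to keep the example side-effect free; `save_report` is a hypothetical helper and the report text is a short placeholder.

```python
import hashlib
import tempfile
from pathlib import Path

def save_report(report_text, path):
    """Write the report as UTF-8 and return the SHA-256 hex digest of the saved bytes."""
    data = report_text.encode("utf-8")
    Path(path).write_bytes(data)
    return hashlib.sha256(data).hexdigest()

# Placeholder text; in the notebook this would be investigation_report
report_text = "🔍 AI SECURITY FORENSICS INVESTIGATION REPORT\nEnd of Report"

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "investigation_report_CSI-AI-2025-001.txt"
    digest = save_report(report_text, out_path)

    # Verification step: re-read the file and recompute the digest
    saved_text = out_path.read_text(encoding="utf-8")
    recomputed = hashlib.sha256(out_path.read_bytes()).hexdigest()

print(f"SHA-256: {digest}")
print(f"Digest matches saved file: {recomputed == digest}")
```

Recording the digest alongside the case ID gives the incident response team a simple integrity check for the report file.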

## Step 7: Activity Questions and Analysis

Answer the following questions based on your investigation:

### Investigation Questions

**Question 1:** What type of attack do you believe occurred based on the evidence?

*Your Answer:* [Write your analysis here]

**Question 2:** How effective was the anomaly detection in identifying poisoned samples?

*Your Answer:* [Write your analysis here]

**Question 3:** What additional forensic techniques could you employ to gather more evidence?

*Your Answer:* [Write your analysis here]

**Question 4:** Based on your threat level assessment, what immediate actions should the organization take?

*Your Answer:* [Write your analysis here]

## Activity Summary

Congratulations! You've completed your first AI security forensics investigation. In this activity, you:

✅ **Performed model performance analysis** to detect potential compromise  
✅ **Analyzed feature importance changes** that may indicate attack vectors  
✅ **Applied anomaly detection** to identify suspicious training data  
✅ **Generated a comprehensive investigation report** with evidence and recommendations  
✅ **Assessed threat levels** and provided actionable security guidance  

### Key Takeaways:

1. **Multiple Evidence Sources**: Effective AI forensics requires analyzing multiple aspects (performance, features, data)
2. **Quantitative Analysis**: Use metrics and statistical analysis to support conclusions
3. **Documentation**: Proper reporting is crucial for incident response and legal proceedings
4. **Proactive Detection**: Anomaly detection can help identify attacks before they cause significant damage

### Next Steps:
- Practice with different attack scenarios (adversarial examples, model extraction)
- Learn advanced forensics techniques (model archaeology, gradient analysis)
- Explore automated threat hunting for AI systems

---

**Investigation Complete** 🎯  
*Time to move on to more advanced CSI Cyber techniques!*

## 🧠 Chapter 5 Self-Assessment Quiz

Test your understanding of AI security forensics and investigation with our interactive quiz! This comprehensive assessment covers:

### 📋 **Quiz Coverage**
- **AI Forensics Investigation** - Data poisoning detection and anomaly analysis
- **Digital Evidence Collection** - Model parameters, training data, and system artifacts
- **Emerging AI Threats** - LLM exploitation, federated learning attacks, autonomous campaigns
- **Incident Response** - AI-specific containment, recovery, and threat hunting
- **Supply Chain Security** - Pre-trained model risks and backdoor detection

### 🎯 **What You'll Test**
- Understanding of AI forensics frameworks and methodologies
- Knowledge of data poisoning detection techniques
- Practical investigation skills from the hands-on activities
- Emerging threat landscape and attack vector awareness

**📊 Quiz Format:** 10 multiple-choice questions with detailed explanations  
**⏱️ Estimated Time:** 15-20 minutes  
**🎓 Passing Score:** 70% (7/10 questions correct)

Ready to test your AI security investigation knowledge? Run the cell below to launch the interactive quiz!

In [None]:
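import webbrowser
from pathlib import Path

# Resolve the quiz file relative to the working directory and build a portable file:// URI
quiz_path = Path('chapter5_quiz.html').resolve()
if quiz_path.exists():
    webbrowser.open(quiz_path.as_uri())
else:
    print(f"Quiz file not found: {quiz_path}")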