[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahmed-fouad-lagha/Intro-Data-Security/blob/main/module_01_foundations/Lab_1b_Threat_Modeling_and_Attack_Taxonomy.ipynb)

# **Lab 1b: Threat Modeling & Attack Taxonomy in AI Security**

**Course:** Introduction to Data Security Pr. (Master's Level)  
**Module 1:** Foundations  
**Estimated Time:** 90-120 minutes

---

## **Learning Objectives**

By the end of this lab, you will be able to:

1. **Understand** the fundamental security properties in AI/ML systems (CIA triad)
2. **Classify** attacks based on timing (training vs. test-time) and objectives
3. **Analyze** threat models and adversarial capabilities
4. **Formalize** security objectives for ML systems
5. **Apply** attack taxonomy to real-world scenarios
6. **Design** appropriate defenses based on threat analysis

## **Table of Contents**

1. [Introduction to AI Security](#intro)
2. [The CIA Triad in Machine Learning](#cia)
3. [Attack Taxonomy Framework](#taxonomy)
4. [Threat Modeling Methodology](#threat-modeling)
5. [Attack Surface Analysis](#attack-surface)
6. [Real-World Case Studies](#case-studies)
7. [Defense Strategy Framework](#defense)
8. [Exercises](#exercises)
9. [Conclusion](#conclusion)

## **1. Introduction to AI Security** <a name="intro"></a>

### **Why is AI Security Different?**

Traditional cybersecurity focuses on protecting systems from unauthorized access and code execution. **AI Security** adds unique challenges:

| Traditional Security | AI Security |
|---------------------|-------------|
| Binary outcomes (works/fails) | Probabilistic outputs |
| Code-based vulnerabilities | Data-based vulnerabilities |
| Known attack patterns | Adaptive adversaries |
| Deterministic behavior | Statistical learning |
| Code inspection possible | Model internals opaque |

### **The Machine Learning Pipeline Attack Surface**

```
┌─────────────┐     ┌──────────┐     ┌─────────┐     ┌────────────┐
│   Data      │────▶│ Training │────▶│  Model  │────▶│ Deployment │
│ Collection  │     │ Process  │     │         │     │ & Inference│
└─────────────┘     └──────────┘     └─────────┘     └────────────┘
      ▲                  ▲                ▲                 ▲
      │                  │                │                 │
   Poisoning         Backdoors        Model             Evasion
   Attacks           Trojans          Theft             Attacks
```

Each stage presents unique attack opportunities.
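
The diagram can also be written as a small lookup table, which is convenient when you want to ask "what threatens this stage?" in code. This is a minimal sketch: the names `PIPELINE_ATTACKS` and `attacks_for` are illustrative and are not used elsewhere in the lab.

```python
# Illustrative encoding of the pipeline diagram above (names are hypothetical).
PIPELINE_ATTACKS = {
    "Data Collection": ["Poisoning Attacks"],
    "Training Process": ["Backdoors", "Trojans"],
    "Model": ["Model Theft"],
    "Deployment & Inference": ["Evasion Attacks"],
}

def attacks_for(stage):
    """Return the attacks the diagram associates with a pipeline stage."""
    if stage not in PIPELINE_ATTACKS:
        raise ValueError(f"Unknown stage {stage!r}; expected one of {list(PIPELINE_ATTACKS)}")
    return PIPELINE_ATTACKS[stage]

for stage, attacks in PIPELINE_ATTACKS.items():
    print(f"{stage:<24} -> {', '.join(attacks)}")
```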

In [None]:
# Setup and imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from IPython.display import Image, display, HTML
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Environment setup complete!")

## **2. The CIA Triad in Machine Learning** <a name="cia"></a>

The classic **CIA Triad** applies to ML systems with specific interpretations:

### **2.1. Confidentiality**
**Definition:** Protecting sensitive information from unauthorized access.

**In ML Context:**
- **Training data privacy:** Preventing leakage of training samples
- **Model privacy:** Protecting model parameters and architecture
- **Inference privacy:** Securing user queries and predictions

**Example Attacks:**
- Model Inversion: Reconstruct training data from model
- Membership Inference: Determine if data was in training set
- Model Extraction: Steal model functionality via queries

### **2.2. Integrity**
**Definition:** Ensuring data and model behavior are not maliciously altered.

**In ML Context:**
- **Data integrity:** Training data is not poisoned
- **Model integrity:** Model behaves as intended
- **Prediction integrity:** Outputs are trustworthy

**Example Attacks:**
- Data Poisoning: Inject malicious samples into training data
- Backdoor Attacks: Embed hidden triggers in model
- Model Poisoning: Corrupt model parameters or architecture

### **2.3. Availability**
**Definition:** Ensuring the system remains accessible and functional.

**In ML Context:**
- **Service availability:** Model can respond to queries
- **Performance availability:** Acceptable inference latency
- **Resource availability:** Computational resources not exhausted

**Example Attacks:**
- Sponge Attacks: Force high computational cost at inference
- Denial of Service: Overwhelm model with queries
- Resource Exhaustion: Deplete GPU/CPU resources
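
To make one of the confidentiality attacks concrete, the sketch below shows a simple confidence-threshold form of membership inference. It runs on synthetic numbers only: the two Beta distributions are an assumption standing in for an overfit model that is more confident on its training samples, and the threshold of 0.7 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the model's confidence on the true label.
# Assumption: the model is overfit, so it is more confident on training members.
member_conf = rng.beta(8, 2, size=1000)     # mass concentrated near 1.0
nonmember_conf = rng.beta(4, 4, size=1000)  # mass centred on 0.5

def infer_membership(confidences, threshold=0.7):
    """Guess 'was in the training set' whenever confidence exceeds the threshold."""
    return np.asarray(confidences) > threshold

true_positive_rate = infer_membership(member_conf).mean()
false_positive_rate = infer_membership(nonmember_conf).mean()
balanced_accuracy = 0.5 * (true_positive_rate + (1.0 - false_positive_rate))

print(f"Members flagged:      {true_positive_rate:.2f}")
print(f"Non-members flagged:  {false_positive_rate:.2f}")
print(f"Balanced accuracy:    {balanced_accuracy:.2f} (0.50 would be random guessing)")
```

The attacker only needs query access and the returned confidence, which is why inference privacy and training data privacy are listed together under confidentiality.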

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
import numpy as np

# OPTIONAL: Check if widgets work in your environment
# If not, you can fill this python dictionary manually for the plot.

print("EXERCISE 1: CLASSIFY THE ATTACKS")
print("Map each attack to the CIA principle it violates.")
print("-" * 50)

attacks_to_classify = [
    'Model Inversion', 'Data Poisoning', 'Sponge Attack',
    'Membership Inference', 'Backdoor Attack', 'Model Extraction',
    'Denial of Service', 'Model Corruption'
]

# Ground truth for validation
ground_truth = {
    'Model Inversion': 'Confidentiality',
    'Membership Inference': 'Confidentiality',
    'Model Extraction': 'Confidentiality',
    'Data Poisoning': 'Integrity',
    'Backdoor Attack': 'Integrity',
    'Model Corruption': 'Integrity',
    'Sponge Attack': 'Availability',
    'Denial of Service': 'Availability'
}

# Create Widgets
dropdowns = {}
style = {'description_width': 'initial'}
for attack in attacks_to_classify:
    dropdowns[attack] = widgets.Dropdown(
        options=['Select...', 'Confidentiality', 'Integrity', 'Availability'],
        description=f"{attack}:",  # widget descriptions do not render Markdown
        style=style,
        layout=widgets.Layout(width='50%')
    )

check_button = widgets.Button(
    description="Check Answers & Visualize", 
    button_style='success', # 'success', 'info', 'warning', 'danger' or ''
    layout=widgets.Layout(width='50%', margin='20px 0px 0px 0px'),
    icon='check'
)
output = widgets.Output()

def check_and_plot(b):
    with output:
        clear_output()
        correct_count = 0
        user_cia_counts = {'Confidentiality': 0, 'Integrity': 0, 'Availability': 0}
        
        all_selected = True
        for attack, widget in dropdowns.items():
            user_val = widget.value
            if user_val == 'Select...':
                print(f"Please classify: {attack}")
                all_selected = False
                continue # Keep checking others
            
            user_cia_counts[user_val] += 1
            
            if user_val == ground_truth[attack]:
                correct_count += 1
            else:
                print(f"{attack} is actually a violation of {ground_truth[attack]}")

        if not all_selected:
            return

        if correct_count == len(attacks_to_classify):
            print(f"\nAll {correct_count} correct! Generating your Threat Landscape...")
            
            # Plotting
            fig, ax = plt.subplots(figsize=(10, 5))
            principles = list(user_cia_counts.keys())
            counts = list(user_cia_counts.values())
            colors = ['#e74c3c', '#3498db', '#2ecc71'] # Red, Blue, Green
            
            bars = ax.bar(principles, counts, color=colors, alpha=0.7)
            ax.set_title('CIA Violation Distribution (Based on your classification)', fontsize=14)
            ax.set_ylabel('Number of Attacks')
            ax.set_ylim(0, max(counts)+1)
            
            for bar in bars:
                height = bar.get_height()
                ax.annotate(f'{height}',
                            xy=(bar.get_x() + bar.get_width() / 2, height),
                            xytext=(0, 3),  # 3 points vertical offset
                            textcoords="offset points",
                            ha='center', va='bottom', fontweight='bold')
            
            plt.show()
        else:
            print(f"\nScore: {correct_count}/{len(attacks_to_classify)}. Please correct the errors above and try again.")

check_button.on_click(check_and_plot)

# Layout
ui = widgets.VBox(list(dropdowns.values()) + [check_button])
display(ui, output)

## **3. Attack Taxonomy Framework** <a name="taxonomy"></a>

We classify attacks along multiple dimensions:

### **Dimension 1: Attack Timing**

#### **Training-Time Attacks (Poisoning)**
- Occur during model training
- Attacker manipulates training data or process
- Effects persist in deployed model
- Examples: Data poisoning, backdoor injection

#### **Test-Time Attacks (Evasion)**
- Occur during inference/deployment
- Attacker manipulates input to deployed model
- No modification to model itself
- Examples: Adversarial examples, adversarial prompts

### **Dimension 2: Security Objective Violated**

| Attack Category | CIA Principle | Timing | Goal |
|----------------|---------------|--------|------|
| **Evasion** | Integrity | Test-time | Misclassify specific inputs |
| **Poisoning** | Integrity | Training-time | Corrupt model behavior |
| **Privacy** | Confidentiality | Any | Extract sensitive info |
| **Sponge** | Availability | Test-time | Degrade performance |
| **Model Extraction** | Confidentiality | Test-time | Steal model functionality |
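
The table can be held in a typed structure so that later analysis code can filter it by principle or timing. This is a sketch only; `CIA`, `Timing`, `AttackCategory`, and `categories_violating` are hypothetical names introduced here, while the rows are copied from the table above.

```python
from dataclasses import dataclass
from enum import Enum

class CIA(Enum):
    CONFIDENTIALITY = "Confidentiality"
    INTEGRITY = "Integrity"
    AVAILABILITY = "Availability"

class Timing(Enum):
    TRAINING = "Training-time"
    TEST = "Test-time"
    ANY = "Any"

@dataclass(frozen=True)
class AttackCategory:
    name: str
    principle: CIA
    timing: Timing
    goal: str

# Rows mirror the table above.
TAXONOMY = [
    AttackCategory("Evasion", CIA.INTEGRITY, Timing.TEST, "Misclassify specific inputs"),
    AttackCategory("Poisoning", CIA.INTEGRITY, Timing.TRAINING, "Corrupt model behavior"),
    AttackCategory("Privacy", CIA.CONFIDENTIALITY, Timing.ANY, "Extract sensitive info"),
    AttackCategory("Sponge", CIA.AVAILABILITY, Timing.TEST, "Degrade performance"),
    AttackCategory("Model Extraction", CIA.CONFIDENTIALITY, Timing.TEST, "Steal model functionality"),
]

def categories_violating(principle):
    """Names of all attack categories that violate the given CIA principle."""
    return [attack.name for attack in TAXONOMY if attack.principle is principle]

print(categories_violating(CIA.CONFIDENTIALITY))
```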

### **Dimension 3: Attacker Knowledge**

#### **White-Box Attacks**
- Full knowledge of model architecture
- Access to model parameters
- Can compute gradients
- Most powerful attacks

#### **Black-Box Attacks**
- No access to model internals
- Only query-response access
- Must infer model behavior
- More realistic threat model

#### **Gray-Box Attacks**
- Partial knowledge (e.g., architecture but not weights)
- Limited access to internals
- Between white-box and black-box
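
The practical difference between white-box and black-box access can be seen on a toy logistic model. In the sketch below, the white-box attacker reads the gradient directly from the weights, while the black-box attacker estimates it with finite differences using only queries. The model, its weights, and the function names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)  # hypothetical model weights, hidden from a black-box attacker
b = 0.1

def predict_proba(x):
    """Query interface: probability of class 1 for a logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def whitebox_gradient(x):
    """Exact gradient of the output with respect to the input (requires w)."""
    p = predict_proba(x)
    return p * (1.0 - p) * w

def blackbox_gradient(x, eps=1e-4):
    """Central finite-difference estimate; costs two queries per input feature."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (predict_proba(x + step) - predict_proba(x - step)) / (2 * eps)
    return grad

x = rng.normal(size=5)
print("white-box:", np.round(whitebox_gradient(x), 4))
print("black-box:", np.round(blackbox_gradient(x), 4))
print("queries needed by black-box estimate:", 2 * x.size)
```

Both attackers end up with nearly the same gradient here, but the black-box cost grows with the input dimension, which is one reason query limits matter for deployed models.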

### **Dimension 4: Attack Specificity**

#### **Targeted Attacks**
- Goal: Cause specific misclassification
- Example: Make "stop sign" classified as "speed limit"
- Harder to achieve
- More dangerous in practice

#### **Untargeted Attacks**
- Goal: Cause any misclassification
- Example: Make a "stop sign" be classified as any label other than "stop sign"
- Easier to achieve
- Useful for measuring robustness
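
The two success criteria differ only in what counts as a win, which the sketch below makes explicit. The prediction list is made up for illustration, and the function names are hypothetical.

```python
def untargeted_success(predicted, true_label):
    """Any wrong label counts as success."""
    return predicted != true_label

def targeted_success(predicted, target_label):
    """Only the attacker's chosen label counts as success."""
    return predicted == target_label

true_label = "stop sign"
target_label = "speed limit"

# Hypothetical model outputs on five adversarial versions of the same sign.
predictions = ["speed limit", "yield", "stop sign", "yield", "speed limit"]

untargeted_rate = sum(untargeted_success(p, true_label) for p in predictions) / len(predictions)
targeted_rate = sum(targeted_success(p, target_label) for p in predictions) / len(predictions)

print(f"Untargeted success rate: {untargeted_rate:.0%}")
print(f"Targeted success rate:   {targeted_rate:.0%}")
```

When the target differs from the true label, every targeted success is also an untargeted success, so the targeted rate can never exceed the untargeted one.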

In [None]:
# EXERCISE 2: BUILD THE ATTACK TAXONOMY
# Replace the '?' question marks with the correct values based on the lecture content.

import pandas as pd

# TODO: Complete the dictionary lists below
# Timing Options: 'Test-time' or 'Training-time'
# CIA Options: 'Integrity', 'Confidentiality', or 'Availability'

student_taxonomy_data = {
    'Attack Type': ['FGSM', 'Data Poisoning', 'Model Inversion', 'Sponge Attack'],
    'Timing': ['?', '?', '?', '?'], 
    'CIA Violation': ['?', '?', '?', '?'] 
}

# --- STUDENT CODE START ---
# student_taxonomy_data['Timing'] = ['Test-time', 'Training-time', 'Test-time', 'Test-time'] # Example
# --- STUDENT CODE END ---

def verify_taxonomy(data):
    try:
        df = pd.DataFrame(data)
        
        # Ground Truth
        correct_timing = ['Test-time', 'Training-time', 'Test-time', 'Test-time']
        correct_cia = ['Integrity', 'Integrity', 'Confidentiality', 'Availability']
        
        score = 0
        if '?' in data['Timing'] or '?' in data['CIA Violation']:
            print("⚠️ Please replace all '?' with valid values.")
            return

        if list(df['Timing']) == correct_timing:
            print("✅ Timing column is correct.")
            score += 1
        else:
            print(f"❌ Timing column has errors. Expected similar to: {correct_timing}")
            
        if list(df['CIA Violation']) == correct_cia:
            print("✅ CIA Violation column is correct.")
            score += 1
        else:
            print("❌ CIA Violation column has errors.")
            
        if score == 2:
            print("\n🏆 Excellent! Taxonomy Table Constructed:")
            display(df)
            
    except Exception as e:
        print(f"Error: {e}")

# Uncomment to check your work:
# verify_taxonomy(student_taxonomy_data)

## **4. Threat Modeling Methodology** <a name="threat-modeling"></a>

### **STRIDE Framework for ML**

Adapted from Microsoft's STRIDE model:

| Threat | ML Interpretation | Example |
|--------|------------------|----------|
| **S**poofing | Impersonate legitimate data/user | Fake training samples |
| **T**ampering | Modify data or model | Poisoning attacks |
| **R**epudiation | Deny actions | Untraceable adversarial examples |
| **I**nformation Disclosure | Leak sensitive data | Model inversion |
| **D**enial of Service | Make system unavailable | Sponge attacks |
| **E**levation of Privilege | Gain unauthorized capabilities | Jailbreak LLMs |
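
STRIDE is usually presented alongside the security property each threat violates. Writing that pairing down shows where STRIDE overlaps with the CIA triad from Section 2 and where it goes beyond it. The dictionary below follows the standard STRIDE pairing; the variable names are illustrative.

```python
# Security property violated by each STRIDE threat (standard STRIDE pairing).
STRIDE_PROPERTY = {
    "Spoofing": "Authentication",
    "Tampering": "Integrity",
    "Repudiation": "Non-repudiation",
    "Information Disclosure": "Confidentiality",
    "Denial of Service": "Availability",
    "Elevation of Privilege": "Authorization",
}

CIA_TRIAD = {"Confidentiality", "Integrity", "Availability"}

# Threats whose violated property is not one of the three CIA principles.
beyond_cia = [threat for threat, prop in STRIDE_PROPERTY.items() if prop not in CIA_TRIAD]

for threat, prop in STRIDE_PROPERTY.items():
    marker = "CIA" if prop in CIA_TRIAD else "beyond CIA"
    print(f"{threat:<24} -> {prop:<16} ({marker})")
```

Three of the six categories map directly onto the CIA triad; the other three cover identity and accountability concerns that the triad alone does not name.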

---

### **Threat Modeling Process**

**Step 1: Define System Boundaries**
```
What components are in scope?
- Data collection pipeline
- Training infrastructure
- Deployed model
- User interface
```

**Step 2: Identify Assets**
```
What needs protection?
- Training data (privacy)
- Model parameters (IP)
- Model predictions (integrity)
- System availability
```

**Step 3: Characterize Adversary**
```
What can the attacker do?
- Access level (white/gray/black-box)
- Resources (compute, data, expertise)
- Motivation (financial, sabotage, espionage)
```

**Step 4: Enumerate Attack Paths**
```
How can attacks be executed?
- Training data poisoning
- Test-time evasion
- Model extraction
- Privacy attacks
```

**Step 5: Prioritize Risks**
```
Which threats are most critical?
Risk = Likelihood × Impact
```
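
As a worked example of the formula, the sketch below scores three threats on 1-5 scales and ranks them. The likelihood and impact values match the medical example used later in this lab; the band labels "low", "medium", and "high" are an assumption, with cut-offs at 10 and 15 chosen to follow the colour thresholds of the lab's prioritization bar chart.

```python
# (likelihood, impact) pairs on 1-5 scales.
threats = {
    "Data Poisoning": (3, 5),
    "Adversarial Examples": (4, 4),
    "Sponge Attack": (2, 3),
}

def risk_band(score):
    """Band labels are an assumption; cut-offs 10 and 15 follow the lab's bar-chart colours."""
    if score < 10:
        return "low"
    if score < 15:
        return "medium"
    return "high"

# Risk = Likelihood x Impact
scores = {name: likelihood * impact for name, (likelihood, impact) in threats.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:<22} risk={score:>2}  ({risk_band(score)})")
```

Note that a moderately likely but catastrophic threat (3 × 5 = 15) lands in the same band as a likely, serious one (4 × 4 = 16), which is the intended effect of multiplying the two factors.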

**Step 6: Design Mitigations**
```
How to defend?
- Input validation
- Adversarial training
- Differential privacy
- Monitoring & detection
```

## **4.1 Exercise: Interactive STRIDE Threat Modeling**

**Scenario**: You are securing a **Medical Imaging AI System** that diagnoses diseases from X-rays. The system interacts with a Hospital Database (PACS) and provides a web interface for doctors.

**Task**: Use the `ThreatModel` class below to identify at least one threat for each STRIDE category in this scenario.

In [None]:
class ThreatModel:
    def __init__(self, system_name):
        self.system_name = system_name
        self.threats = {
            'Spoofing': [],
            'Tampering': [],
            'Repudiation': [],
            'Information Disclosure': [],
            'Denial of Service': [],
            'Elevation of Privilege': []
        }
    
    def add_threat(self, category, description):
        """Adds a threat to a specific STRIDE category."""
        if category in self.threats:
            self.threats[category].append(description)
            print(f"Confirmed: Added '{description}' to [{category}]")
        else:
            print(f"❌ Error: '{category}' is not a valid STRIDE category.")

    def analyze_coverage(self):
        """Visualizes the threat coverage."""
        import matplotlib.pyplot as plt
        
        categories = list(self.threats.keys())
        counts = [len(self.threats[k]) for k in categories]
        
        # Check gaps
        gaps = [k for k, v in self.threats.items() if len(v) == 0]
        
        plt.figure(figsize=(10, 5))
        bars = plt.bar(categories, counts, color='#9b59b6')
        plt.title(f'Threat Coverage for {self.system_name}')
        plt.ylabel('Number of Threats Identified')
        plt.xlabel('STRIDE Category')
        plt.xticks(rotation=45)
        plt.grid(axis='y', alpha=0.3)
        
        for bar in bars:
            height = bar.get_height()
            plt.text(bar.get_x() + bar.get_width()/2, height, f'{int(height)}', ha='center', va='bottom')
            
        plt.show()
        
        if not gaps:
            print("✅ Great job! You have addressed all STRIDE categories.")
        else:
            print(f"⚠️ Warning: You are missing threats for: {', '.join(gaps)}")

# Initialize the threat model
med_system_threats = ThreatModel("Medical X-Ray AI")

# --- TODO: ADD YOUR THREATS BELOW ---
# Think: How could an attacker Spoof a doctor? Tamper with an X-ray? 
# med_system_threats.add_threat('Spoofing', 'Attacker uses stolen doctor credentials')
# med_system_threats.add_threat('Tampering', '...')


# --- Run Analysis ---
# med_system_threats.analyze_coverage()

In [None]:
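# Risk Assessment Matrix
# Note: this cell redefines ThreatModel, replacing the STRIDE-category version from Section 4.1.
# Re-run the Section 4.1 cell if you need that version again.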
class ThreatModel:
    def __init__(self, system_name):
        self.system_name = system_name
        self.threats = []
    
    def add_threat(self, name, likelihood, impact, category):
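        """Add a threat to the model.
        
        Args:
            name: Short label for the threat (shown on the plots)
            likelihood: 1-5 (1=rare, 5=almost certain)
            impact: 1-5 (1=negligible, 5=catastrophic)
            category: Security property or concern affected (e.g. 'Integrity')
        """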
        risk_score = likelihood * impact
        self.threats.append({
            'name': name,
            'likelihood': likelihood,
            'impact': impact,
            'risk': risk_score,
            'category': category
        })
    
    def visualize_risk_matrix(self):
        """Create risk assessment visualization."""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Risk Matrix
        df = pd.DataFrame(self.threats)
        
        # Scatter plot
        scatter = ax1.scatter(df['likelihood'], df['impact'], 
                            s=df['risk']*30, alpha=0.6, c=df['risk'], 
                            cmap='RdYlGn_r', edgecolors='black', linewidth=1.5)
        
        # Add labels
        for idx, row in df.iterrows():
            ax1.annotate(row['name'], (row['likelihood'], row['impact']),
                        fontsize=8, ha='center')
        
        ax1.set_xlabel('Likelihood', fontsize=12, fontweight='bold')
        ax1.set_ylabel('Impact', fontsize=12, fontweight='bold')
        ax1.set_title(f'Risk Matrix: {self.system_name}', fontsize=14, fontweight='bold')
        ax1.set_xlim([0, 6])
        ax1.set_ylim([0, 6])
        ax1.grid(True, alpha=0.3)
        
        # Add risk zones
        ax1.axhline(y=3, color='orange', linestyle='--', alpha=0.5)
        ax1.axvline(x=3, color='orange', linestyle='--', alpha=0.5)
        ax1.text(1.5, 5.5, 'Low\nLikelihood\nHigh Impact', ha='center', fontsize=9)
        ax1.text(4.5, 5.5, 'CRITICAL\nZONE', ha='center', fontsize=11, 
                fontweight='bold', color='red')
        
        plt.colorbar(scatter, ax=ax1, label='Risk Score')
        
        # Bar chart of risks
        df_sorted = df.sort_values('risk', ascending=True)
        colors = ['green' if r < 10 else 'orange' if r < 15 else 'red' 
                 for r in df_sorted['risk']]
        
        ax2.barh(df_sorted['name'], df_sorted['risk'], color=colors, alpha=0.7)
        ax2.set_xlabel('Risk Score', fontsize=12, fontweight='bold')
        ax2.set_title('Threat Prioritization', fontsize=14, fontweight='bold')
        ax2.grid(axis='x', alpha=0.3)
        
        plt.tight_layout()
        plt.savefig('threat_model_risk_matrix.png', dpi=150, bbox_inches='tight')
        plt.show()
        
        return df.sort_values('risk', ascending=False)

# Example: Medical Diagnosis AI System
medical_ai = ThreatModel("Medical Diagnosis AI System")

# Add threats
medical_ai.add_threat('Data Poisoning', likelihood=3, impact=5, category='Integrity')
medical_ai.add_threat('Model Inversion', likelihood=4, impact=5, category='Privacy')
medical_ai.add_threat('Adversarial Examples', likelihood=4, impact=4, category='Integrity')
medical_ai.add_threat('Sponge Attack', likelihood=2, impact=3, category='Availability')
medical_ai.add_threat('Model Extraction', likelihood=3, impact=3, category='IP Theft')
medical_ai.add_threat('Membership Inference', likelihood=4, impact=4, category='Privacy')

# Visualize
risk_summary = medical_ai.visualize_risk_matrix()
print("\n" + "="*70)
print("THREAT PRIORITIZATION (Highest Risk First)")
print("="*70)
print(risk_summary[['name', 'likelihood', 'impact', 'risk', 'category']].to_string(index=False))

## **5. Attack Surface Analysis** <a name="attack-surface"></a>

### **ML System Attack Surface Map**

Every component in the ML pipeline presents attack opportunities:

#### **1. Data Collection & Preparation**
**Attack Vectors:**
- Inject malicious samples
- Corrupt labels
- Manipulate feature distributions
- Introduce biases

**Defenses:**
- Data validation
- Outlier detection
- Statistical testing
- Provenance tracking
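
Outlier detection, listed above, can be as simple as a robust z-score on each feature. The sketch below uses the median and median absolute deviation (MAD) with the commonly quoted constant 0.6745 and cut-off 3.5; both values and the function name `flag_outliers` are choices made for this illustration, not part of the lab's code.

```python
import numpy as np

def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score (median/MAD based) exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # All points (or most of them) are identical; nothing can be ranked as extreme.
        return np.zeros(values.shape, dtype=bool)
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > threshold

# Five plausible feature values and one injected extreme value.
feature = np.array([0.9, 1.1, 1.0, 0.95, 1.05, 9.0])
mask = flag_outliers(feature)

print("Flagged values:", feature[mask])
print("Kept values:   ", feature[~mask])
```

A filter like this catches crude injections only. A poisoning attacker who keeps malicious samples inside the normal range will pass it, which is why it is paired with provenance tracking and statistical testing.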

#### **2. Model Training**
**Attack Vectors:**
- Backdoor insertion
- Hyperparameter manipulation
- Training process sabotage
- Loss function tampering

**Defenses:**
- Secure training environments
- Gradient inspection
- Checkpoint verification
- Adversarial training

#### **3. Model Deployment**
**Attack Vectors:**
- Model substitution
- API exploitation
- Query-based attacks
- Timing attacks

**Defenses:**
- Model signing
- Rate limiting
- Input sanitization
- Anomaly detection

#### **4. Inference & Serving**
**Attack Vectors:**
- Adversarial inputs
- Privacy extraction
- Resource exhaustion
- Side-channel attacks

**Defenses:**
- Input validation
- Output filtering
- Differential privacy
- Resource monitoring
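
Model signing and checkpoint verification both come down to checking that the bytes you load are the bytes that were released. The sketch below uses an HMAC-SHA256 tag from the Python standard library as a simplified stand-in: real model signing typically relies on asymmetric signatures so that verifiers do not hold the signing key. The key and the "model" bytes are placeholders.

```python
import hashlib
import hmac

def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the serialized model."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()

def verify_model(model_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_model(model_bytes, key), expected_tag)

key = b"demo-key-do-not-use-in-production"  # placeholder secret
released = b"\x00\x01fake-model-weights"    # stand-in for a checkpoint file's bytes
tag = sign_model(released, key)

tampered = released + b"\xff"               # simulates model substitution or corruption

print("Original verifies:", verify_model(released, key, tag))
print("Tampered verifies:", verify_model(tampered, key, tag))
```

Verifying the tag before loading a checkpoint addresses the model substitution vector listed under deployment; it does nothing against a backdoor that was already present when the model was signed.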

## **5.1 Exercise: Attack Tree Generation**

**Goal**: Model Evasion (forcing the AI to misclassify a tumor as healthy tissue).

**Task**: Complete the attack tree by adding leaf nodes (specific attack vectors) for the Physical and Digital domains.

In [None]:
def render_attack_tree(tree_dict, root, level=0):
    indent = "    " * level
    icon = "🌳" if level == 0 else ("├──" if level < 2 else "└──")
    print(f"{indent}{icon} {root}")
    
    if root in tree_dict:
        for child in tree_dict[root]:
            render_attack_tree(tree_dict, child, level + 1)

# The Root Goal
root_node = "Evasion: False Negative Diagnosis"

# TODO: Fill in the attack vectors below
attack_tree_structure = {
    root_node: [
        "Physical Domain (Real-world)",
        "Digital Domain (Pixel-space)"
    ],
    "Physical Domain (Real-world)": [
        # Example: 'Place adversarial sticker on patient chest'
        "TODO: Add physical vector 1",
        "TODO: Add physical vector 2"
    ],
    "Digital Domain (Pixel-space)": [
        # Example: 'Apply PGD noise to DICOM image'
        "TODO: Add digital vector 1",
        "TODO: Add digital vector 2"
    ]
}

# --- STUDENT CODE: Update the dictionary above ---

# Visualize
print("ATTACK TREE VISUALIZATION:")
print("="*40)
render_attack_tree(attack_tree_structure, root_node)
print("="*40)

In [None]:
# Attack Surface Visualization
def plot_attack_surface():
    """Visualize attack surface across ML pipeline stages."""
    
    stages = ['Data\nCollection', 'Feature\nEngineering', 'Model\nTraining', 
              'Validation', 'Deployment', 'Inference']
    
    # Attack surface score (0-10) for each stage
    attack_surface = [8, 6, 9, 4, 7, 10]
    detectability = [6, 5, 4, 7, 5, 3]
    
    fig, ax = plt.subplots(figsize=(14, 6))
    
    x = np.arange(len(stages))
    width = 0.35
    
    bars1 = ax.bar(x - width/2, attack_surface, width, label='Attack Surface Size',
                   color='#e74c3c', alpha=0.8)
    bars2 = ax.bar(x + width/2, detectability, width, label='Attack Detectability',
                   color='#3498db', alpha=0.8)
    
    ax.set_xlabel('ML Pipeline Stage', fontsize=12, fontweight='bold')
    ax.set_ylabel('Score (0-10)', fontsize=12, fontweight='bold')
    ax.set_title('Attack Surface Analysis Across ML Pipeline', fontsize=14, fontweight='bold')
    ax.set_xticks(x)
    ax.set_xticklabels(stages)
    ax.legend(fontsize=11)
    ax.set_ylim([0, 11])
    ax.grid(axis='y', alpha=0.3)
    
    # Add value labels
    for bars in [bars1, bars2]:
        for bar in bars:
            height = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2., height,
                   f'{height:.0f}', ha='center', va='bottom', fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('attack_surface_analysis.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print("\nKey Insights:")
    print("  • Inference stage has the LARGEST attack surface (score: 10)")
    print("  • Inference-stage attacks are the HARDEST to detect (detectability: 3), followed by Model Training (4)")
    print("  • Validation provides the BEST detection opportunity (detectability: 7)")
    print("  • Multi-stage defense strategy is essential\n")

plot_attack_surface()

## **6. Real-World Case Studies** <a name="case-studies"></a>

### **Case Study 1: Autonomous Vehicle Adversarial Attack**

**Scenario:** Stop sign misclassification attack

**Attack Details:**
- Physical adversarial patches on stop signs
- Classifier misidentifies as "Speed Limit 45"
- Attack succeeds at multiple angles/distances

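**Threat Model:**
- **Attacker Access:** White-box when crafting the patches (the published attack optimized perturbations against a known classifier); only physical access to the sign at attack time
- **Attack Type:** Physical evasion attack
- **CIA Violation:** Integrity
- **Impact:** Safety-critical failure

**Lessons:**
1. Physical-world attacks are feasible
2. An attacker who knows the model needs only physical access to the scene, not to the system's software
3. Safety-critical systems need robust defenses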

---

### **Case Study 2: Microsoft Tay Chatbot Poisoning**

**Scenario:** Online learning chatbot corrupted via user interactions

**Attack Details:**
- Users repeatedly fed offensive content
- Model learned and reproduced toxic behavior
- Bot taken offline within 24 hours

**Threat Model:**
- **Attacker Access:** Data poisoning via normal interface
- **Attack Type:** Training-time poisoning
- **CIA Violation:** Integrity
- **Impact:** Reputational damage, service shutdown

**Lessons:**
1. Online learning is highly vulnerable
2. User-generated data requires validation
3. Content filtering is essential
4. Human oversight needed for public-facing AI

---

### **Case Study 3: Netflix Prize Privacy Breach**

**Scenario:** Re-identification of users from anonymized ratings

**Attack Details:**
- Researchers cross-referenced Netflix data with IMDb
- Successfully identified users from "anonymous" dataset
- Revealed sensitive viewing preferences

**Threat Model:**
- **Attacker Access:** Public dataset
- **Attack Type:** Privacy attack via linkage
- **CIA Violation:** Confidentiality
- **Impact:** Privacy violation, lawsuit

**Lessons:**
1. Anonymization alone is insufficient
2. Auxiliary information enables re-identification
3. Differential privacy needed for public release
4. Privacy risks in seemingly safe data sharing
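
The linkage step behind this case study can be illustrated with a join between an "anonymized" table and a public one. Everything below is synthetic: the users, movies, ratings, and dates are invented, and exact matching is used for brevity, whereas a real attack would have to tolerate small mismatches in dates and ratings.

```python
import pandas as pd

# Entirely synthetic records; no real users or ratings.
anonymized = pd.DataFrame({
    "user_id": [101, 101, 102, 102],
    "movie": ["Movie A", "Movie B", "Movie A", "Movie C"],
    "rating": [5, 2, 4, 1],
    "date": ["2005-03-01", "2005-03-07", "2005-04-02", "2005-04-09"],
})

public_reviews = pd.DataFrame({
    "username": ["alice", "alice", "bob"],
    "movie": ["Movie A", "Movie B", "Movie C"],
    "rating": [5, 2, 1],
    "date": ["2005-03-01", "2005-03-07", "2005-04-09"],
})

# Auxiliary information: rows that agree on movie, rating, and date.
matches = anonymized.merge(public_reviews, on=["movie", "rating", "date"])

# Count how many ratings each (anonymous id, public username) pair shares.
link_counts = (
    matches.groupby(["user_id", "username"])
    .size()
    .reset_index(name="shared_ratings")
)

print(link_counts.to_string(index=False))
```

Once an anonymous `user_id` is linked to a public username, every other rating under that id, including ones never posted publicly, is attributed to that person.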

In [None]:
# Summarize case studies
case_studies = pd.DataFrame({
    'Case Study': [
        'Stop Sign Attack',
        'Tay Chatbot',
        'Netflix Prize'
    ],
    'Domain': [
        'Autonomous Vehicles',
        'Conversational AI',
        'Recommender Systems'
    ],
    'Attack Type': [
        'Physical Evasion',
        'Data Poisoning',
        'Privacy Linkage'
    ],
    'CIA Violated': [
        'Integrity',
        'Integrity',
        'Confidentiality'
    ],
    'Impact': [
        'Safety-Critical',
        'Reputational',
        'Privacy Breach'
    ],
    'Year': [2017, 2016, 2007],
    'Key Lesson': [
        'Physical attacks feasible',
        'Online learning vulnerable',
        'Anonymization insufficient'
    ]
})

print("\n" + "="*100)
print("REAL-WORLD AI SECURITY INCIDENTS")
print("="*100 + "\n")
print(case_studies.to_string(index=False))
print("\n" + "="*100)

# Timeline visualization
fig, ax = plt.subplots(figsize=(12, 4))

years = case_studies['Year'].values
names = case_studies['Case Study'].values
colors_map = {'Integrity': '#e74c3c', 'Confidentiality': '#3498db'}
colors = [colors_map[x] for x in case_studies['CIA Violated']]

ax.scatter(years, [1]*len(years), s=500, c=colors, alpha=0.6, edgecolors='black', linewidth=2)

for year, name, y_offset in zip(years, names, [0.15, -0.15, 0.15]):
    ax.annotate(name, (year, 1), (year, 1 + y_offset),
               fontsize=10, ha='center', fontweight='bold',
               arrowprops=dict(arrowstyle='->', lw=1.5))

ax.set_xlabel('Year', fontsize=12, fontweight='bold')
ax.set_title('Timeline of Notable AI Security Incidents', fontsize=14, fontweight='bold')
ax.set_ylim([0.5, 1.5])
ax.set_yticks([])
ax.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

## **7. Defense Strategy Framework** <a name="defense"></a>

### **Defense-in-Depth for ML Systems**

No single defense is sufficient. Layer multiple protections:

#### **Layer 1: Data Protection**
- Input validation and sanitization
- Outlier detection
- Data provenance tracking
- Adversarial example detection

#### **Layer 2: Model Hardening**
- Adversarial training
- Certified defenses
- Regularization techniques
- Ensemble methods

#### **Layer 3: Privacy Protection**
- Differential privacy
- Federated learning
- Homomorphic encryption
- Secure multi-party computation

#### **Layer 4: Monitoring & Detection**
- Anomaly detection
- Behavioral analysis
- Performance monitoring
- Audit logging

#### **Layer 5: Incident Response**
- Model rollback capabilities
- Attack mitigation procedures
- Forensic analysis
- Recovery protocols

---

### **Defense Selection Matrix**

| Attack Type | Primary Defense | Secondary Defense | Detection Method |
|-------------|----------------|-------------------|------------------|
| Evasion | Adversarial Training | Input Validation | Anomaly Detection |
| Data Poisoning | Outlier Removal | RONI Testing | Statistical Testing |
| Backdoor | Neural Cleanse | Fine-tuning | Trigger Detection |
| Model Inversion | Differential Privacy | Output Perturbation | Query Monitoring |
| Sponge Attack | Inference Timeout | Input Filtering | Resource Monitoring |
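
As one concrete instance of the differential privacy entry in the matrix, the sketch below applies the Laplace mechanism to a simple count query rather than to model training. The count of 42 and the helper name `laplace_count` are hypothetical; the noise scale `sensitivity / epsilon` is the standard choice for this mechanism.

```python
import numpy as np

def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
true_count = 42  # hypothetical: number of training records matching some query

# Smaller epsilon means stronger privacy and noisier answers.
for epsilon in (0.1, 1.0, 10.0):
    releases = np.array([laplace_count(true_count, epsilon, rng) for _ in range(2000)])
    print(f"epsilon={epsilon:>4}: mean={releases.mean():6.2f}, std={releases.std():6.2f}")
```

The spread of the released values grows as epsilon shrinks, which is the security versus accuracy trade-off in miniature.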

In [None]:
# Defense-in-Depth Visualization
def visualize_defense_layers():
    """Create layered defense visualization."""
    
    fig, ax = plt.subplots(figsize=(10, 10))
    
    # Define concentric circles for defense layers
    layers = [
        {'radius': 5, 'label': 'Data Protection', 'color': '#3498db'},
        {'radius': 4, 'label': 'Model Hardening', 'color': '#2ecc71'},
        {'radius': 3, 'label': 'Privacy Protection', 'color': '#f39c12'},
        {'radius': 2, 'label': 'Monitoring', 'color': '#e74c3c'},
        {'radius': 1, 'label': 'Core Model', 'color': '#9b59b6'}
    ]
    
    for layer in layers:
        # Use facecolor (not color) so the black edgecolor is not overridden
        circle = plt.Circle((0, 0), layer['radius'], 
                           facecolor=layer['color'], alpha=0.3, 
                           linewidth=3, edgecolor='black')
        ax.add_patch(circle)
        
        # Add label in the middle of this layer's ring (each ring is 1 unit wide)
        angle = np.pi / 4
        label_radius = layer['radius'] - 0.5
        x = label_radius * np.cos(angle)
        y = label_radius * np.sin(angle)
        ax.text(x, y, layer['label'], fontsize=11, fontweight='bold',
               ha='center', va='center',
               bbox=dict(boxstyle='round', facecolor='white', edgecolor='black'))
    
    ax.set_xlim([-6, 6])
    ax.set_ylim([-6, 6])
    ax.set_aspect('equal')
    ax.axis('off')
    ax.set_title('Defense-in-Depth for ML Systems', fontsize=16, fontweight='bold', pad=20)
    
    # Add attack arrows
    attack_angles = [0, np.pi/2, np.pi, 3*np.pi/2]
    attack_names = ['Evasion', 'Poisoning', 'Privacy', 'Sponge']
    
    for angle, name in zip(attack_angles, attack_names):
        start_x = 5.5 * np.cos(angle)
        start_y = 5.5 * np.sin(angle)
        end_x = 0.8 * np.cos(angle)
        end_y = 0.8 * np.sin(angle)
        
        ax.annotate('', xy=(end_x, end_y), xytext=(start_x, start_y),
                   arrowprops=dict(arrowstyle='->', lw=2, color='red'))
        ax.text(start_x * 1.15, start_y * 1.15, name, fontsize=10,
               ha='center', va='center', color='red', fontweight='bold')
    
    plt.tight_layout()
    plt.savefig('defense_in_depth.png', dpi=150, bbox_inches='tight')
    plt.show()

visualize_defense_layers()
print("\nDefense-in-Depth visualization created!")
print("\nKey Principle: Multiple layers provide redundancy")
print("   If one defense fails, others still protect the system.")

## **8. Exercises** <a name="exercises"></a>

### **Exercise 1: Threat Modeling Practice (Medium)**

Choose a real-world ML system:
- Facial recognition system
- Credit card fraud detection
- Spam email filter
- Content recommendation engine

Create a complete threat model:
1. Define system boundaries and assets
2. Identify 5-7 potential threats
3. Assess likelihood and impact for each
4. Create a risk matrix
5. Propose defense strategies

**Deliverable:** Use the risk-matrix `ThreatModel` class from Section 4 (the version that takes `likelihood` and `impact`) to document your analysis.

---

### **Exercise 2: Attack Classification (Easy)**

For each scenario, classify the attack:

**Scenario A:** An attacker adds imperceptible noise to images to fool an image classifier.
- Timing: ?
- CIA: ?
- Access: ?

**Scenario B:** A malicious insider corrupts 5% of training labels in a dataset.
- Timing: ?
- CIA: ?
- Access: ?

**Scenario C:** An attacker queries a model repeatedly to reconstruct its decision boundary.
- Timing: ?
- CIA: ?
- Access: ?

---

### **Exercise 3: Defense Design (Hard)**

Design a multi-layered defense strategy for a medical diagnosis AI system that:
1. Processes patient data (sensitive)
2. Provides treatment recommendations
3. Must be highly accurate and trustworthy
4. Faces threats from multiple adversaries

Requirements:
- Address all three CIA properties
- Include 3+ defense layers
- Specify detection mechanisms
- Consider regulatory compliance (HIPAA)

---

### **Exercise 4: Case Study Analysis (Medium)**

Research the **Clearview AI privacy controversy**:
1. What attack/vulnerability was exploited?
2. Which CIA principle was violated?
3. What was the impact?
4. How could it have been prevented?
5. What defenses would you recommend?

**Format:** 2-page analysis with threat model and defense recommendations.

## **9. Conclusion** <a name="conclusion"></a>

### **What You Learned**

- **CIA Triad:** How confidentiality, integrity, and availability apply to ML  
- **Attack Taxonomy:** Classification by timing, objective, attacker knowledge, and specificity  
- **Threat Modeling:** Systematic analysis of adversarial risks  
- **Attack Surface:** Vulnerability points across ML pipeline  
- **Defense Strategies:** Multi-layered protection approaches  
- **Real-World Context:** Case studies from actual incidents  

### **Key Principles**

1. **No Perfect Defense:** Security is about risk management, not elimination
2. **Context Matters:** Threat models vary by application domain
3. **Defense-in-Depth:** Multiple layers provide resilience
4. **Trade-offs:** Security often conflicts with accuracy/performance
5. **Evolving Threats:** Continuous monitoring and adaptation required

### **Preparing for Upcoming Labs**

Now that you understand threat modeling, you're ready to:

**Module 2:** Implement and defend against evasion attacks  
**Modules 3-4:** Execute and detect poisoning attacks  
**Module 5:** Create and mitigate sponge attacks  
**Module 6:** Launch and prevent privacy attacks  
**Module 7:** Generate and evaluate synthetic data  
**Module 8:** Deploy comprehensive defense systems  

Each subsequent lab will reference this threat modeling framework.

---

### **Additional Resources**

**Foundational Papers:**
- [Adversarial Examples Are Not Bugs, They Are Features (Ilyas et al., 2019)](https://arxiv.org/abs/1905.02175)
- [SoK: Security and Privacy in Machine Learning (Papernot et al., 2018)](https://ieeexplore.ieee.org/document/8406613)
- [NIST AI 100-2 E2023, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations](https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.100-2e2023.pdf)

**Industry Standards:**
- MITRE ATLAS: Adversarial Threat Landscape for AI Systems
- OWASP Machine Learning Security Top 10
- ISO/IEC 24029: Assessment of the robustness of neural networks

**Tools & Frameworks:**
- [Microsoft Threat Modeling Tool](https://www.microsoft.com/en-us/securityengineering/sdl/threatmodeling)
- [Adversarial Robustness Toolbox (ART)](https://github.com/Trusted-AI/adversarial-robustness-toolbox)