# Day 16: Model Poisoning Attacks

**Direct Gradient Manipulation in Federated Learning**

## Overview
- **Attack**: Manipulate gradient updates directly (not data)
- **Target**: Global model weights during FL aggregation
- **Power**: Typically stronger than data poisoning, because the attacker controls the submitted update directly

## What You'll Learn
1. **Gradient Scaling**: Amplify updates
2. **Sign Flipping**: Reverse gradient direction
3. **Inner Product Attack**: Optimized for maximum damage
4. **Detection**: How to identify poisoned updates

---

## 1. Model Poisoning vs Data Poisoning

In [None]:
print("""
╔═══════════════════════════════════════════════════════════════════════╗
║               DATA POISONING vs MODEL POISONING                       ║
╚═══════════════════════════════════════════════════════════════════════╝

DATA POISONING (Days 14-15):
  • Target: Training samples/labels
  • Location: Client's local data
  • Mechanism: Change labels, add backdoor triggers
  • Detection: Data validation, anomaly detection
  • Power: Limited by data influence

MODEL POISONING (Today):
  • Target: Gradient updates/weights
  • Location: The client's outgoing update, before federated aggregation
  • Mechanism: Directly manipulate sent updates
  • Detection: Update anomaly detection (L2 norm, cosine similarity)
  • Power: Direct control over the submitted parameter update

KEY DIFFERENCE:
  Data poisoning poisons the SOURCE (data)
  Model poisoning poisons the PROCESS (updates)

""")

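The contrast above can be sketched on a toy logistic-regression client. Everything here (`logistic_gradient`, `X`, `y`, `true_w`, the learning rate) is a hypothetical setup for illustration, not part of the project code. The sketch shows where each attacker intervenes: data poisoning changes the labels and then trains normally, while model poisoning trains normally and then edits the update.

```python
import numpy as np

def logistic_gradient(w, X, y):
    """Gradient of the mean logistic loss for labels y in {0, 1}."""
    probs = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (probs - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # hypothetical local features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])  # hypothetical labeling rule
y = (X @ true_w > 0).astype(float)
w = np.zeros(5)                                # current global model
lr = 0.1

# Honest client: train on clean data, send the resulting update.
honest_update = -lr * logistic_gradient(w, X, y)

# Data poisoning: flip every label, then train normally.
data_poisoned_update = -lr * logistic_gradient(w, X, 1.0 - y)

# Model poisoning: train on clean data, then tamper with the update itself.
model_poisoned_update = -10.0 * honest_update

for name, u in [("honest", honest_update),
                ("data poisoned", data_poisoned_update),
                ("model poisoned", model_poisoned_update)]:
    print(f"{name:>15}: L2 norm = {np.linalg.norm(u):.4f}")
```

At `w = 0` every predicted probability is 0.5, so flipping all labels produces exactly the negated honest update. The data-poisoned update is still whatever gradient the poisoned data produces, whereas the model-poisoning attacker can pick any direction and magnitude.
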
## 2. Attack Vector Visualization

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Simulate honest vs malicious updates
np.random.seed(42)

# Honest update (small, in correct direction)
honest_update = np.random.randn(10) * 0.1

# Attack 1: Gradient scaling (amplify by 100x)
scaled_update = honest_update * 100

# Attack 2: Sign flipping (reverse direction)
flipped_update = -honest_update

# Attack 3: Gaussian noise
noisy_update = honest_update + np.random.randn(10) * 2

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

updates = [honest_update, scaled_update, flipped_update, noisy_update]
titles = ['Honest Update', 'Gradient Scaling (100x)', 'Sign Flipping', 'Gaussian Noise']

for idx, (ax, update, title) in enumerate(zip(axes, updates, titles)):
    ax.bar(range(10), update, color='green' if idx == 0 else 'red')
    ax.set_title(title, fontsize=12)
    ax.set_xlabel('Parameter Index', fontsize=10)
    ax.set_ylabel('Update Value', fontsize=10)
    ax.grid(True, alpha=0.3, axis='y')
    
    # Add L2 norm annotation
    l2_norm = np.linalg.norm(update)
    ax.text(0.5, 0.95, f'L2 Norm: {l2_norm:.3f}', 
            transform=ax.transAxes, ha='center', va='top',
            bbox=dict(boxstyle='round', facecolor='wheat' if idx == 0 else 'lightcoral', alpha=0.5))

plt.tight_layout()
plt.show()

print("\nObservations:")
print("  • Honest: Small L2 norm, random direction")
print("  • Scaling: Same direction, 100x larger (EASY TO DETECT)")
print("  • Flipping: Opposite direction (VERY DAMAGING)")
print("  • Noise: Different direction, much larger magnitude (noise std 2.0 vs 0.1)")
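
The same four updates can also be compared numerically. The sketch below is a minimal illustration and assumes the honest vector is available as a reference, which a real server would not have. The helper `norm_ratio_and_cosine` and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only for reproducibility

honest = rng.normal(scale=0.1, size=10)
candidates = {
    "honest": honest,
    "scaled_100x": honest * 100,
    "sign_flipped": -honest,
    "gaussian_noise": honest + rng.normal(scale=2.0, size=10),
}

def norm_ratio_and_cosine(update, reference):
    """Return (||update|| / ||reference||, cosine similarity to reference)."""
    update_norm = np.linalg.norm(update)
    reference_norm = np.linalg.norm(reference)
    ratio = update_norm / reference_norm
    cosine = (update @ reference) / (update_norm * reference_norm)
    return ratio, cosine

metrics = {name: norm_ratio_and_cosine(u, honest) for name, u in candidates.items()}
for name, (ratio, cosine) in metrics.items():
    print(f"{name:>15}: norm ratio = {ratio:8.2f}, cosine = {cosine:+.3f}")
```

Scaling changes only the norm ratio and sign flipping changes only the cosine, which is why the detection section checks both.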

## 3. Inner Product Attack

In [None]:
print("""

INNER PRODUCT ATTACK (Most Sophisticated):

Objective: Find update that MINIMIZES inner product with honest updates

  maximize:  -⟨poisoned_update, Σ honest_updates⟩

Intuition:
  • FedAvg aggregates: w_new = w_old + Σ(weight_i * update_i)
  • If poisoned_update is opposite to honest_updates:
    - They cancel each other out
    - Global model doesn't learn
    - Convergence prevented

Optimization:
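  poisoned_update* = argmin_u Σ⟨u, honest_i⟩   subject to ‖u‖ ≤ c
                     (without a norm bound the objective is unbounded)

  Solution: poisoned_update = -c · Σ(honest_updates) / ‖Σ(honest_updates)‖

            (opposite to the sum of honest updates;
             c = ‖Σ(honest_updates)‖ gives -Σ(honest_updates))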

Power:
  • Optimal for the objective above under a fixed norm budget
  • Can be harder to detect than naive scaling (magnitude can stay close to honest updates)
  • Can be scaled up to outweigh the honest updates in the average

""")

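A minimal sketch of this attack, assuming an attacker who can see every honest update in the round (a strong assumption). The function name `inner_product_attack`, the `max_norm` budget, and the synthetic honest updates are hypothetical and only illustrate the idea.

```python
import numpy as np

def inner_product_attack(honest_updates, max_norm):
    """Norm-bounded update minimizing <u, sum of honest updates>.

    By Cauchy-Schwarz, the minimizer over ||u|| <= max_norm points exactly
    opposite to the honest sum and uses the full norm budget.
    """
    honest_sum = np.sum(honest_updates, axis=0)
    direction = honest_sum / (np.linalg.norm(honest_sum) + 1e-12)
    return -max_norm * direction

rng = np.random.default_rng(0)
# Assumption: honest clients share a common descent direction plus noise.
true_direction = np.ones(20) / np.sqrt(20)
honest_updates = [0.1 * true_direction + rng.normal(scale=0.01, size=20)
                  for _ in range(9)]

# A budget equal to the norm of the honest sum reproduces u = -sum(honest).
budget = np.linalg.norm(np.sum(honest_updates, axis=0))
poisoned = inner_product_attack(honest_updates, max_norm=budget)

clean_avg = np.mean(honest_updates, axis=0)
attacked_avg = np.mean(honest_updates + [poisoned], axis=0)

print("inner product with honest sum:", poisoned @ np.sum(honest_updates, axis=0))
print("||clean average||   :", np.linalg.norm(clean_avg))
print("||attacked average||:", np.linalg.norm(attacked_avg))
```

With this budget the poisoned update cancels the honest sum, so the unweighted average is numerically zero. That cancellation comes at a cost: the update is about nine times larger than a single honest one, so a norm check would likely flag it.
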
## 4. Attack Detection Methods

In [None]:
def detect_anomalous_update(client_update, honest_updates, threshold=3.0):
    """
    Detect anomalous client updates.
    
    Methods:
    1. L2 norm outlier detection
    2. Cosine similarity (direction anomaly)
    3. Euclidean distance from mean
    """
    
    # Method 1: L2 norm
    l2_norm = np.linalg.norm(client_update)
    l2_norms = [np.linalg.norm(u) for u in honest_updates]
    l2_mean = np.mean(l2_norms)
    l2_std = np.std(l2_norms)
    l2_z_score = abs(l2_norm - l2_mean) / (l2_std + 1e-10)
    
    # Method 2: Cosine similarity
    mean_direction = np.mean(honest_updates, axis=0)
    mean_direction /= (np.linalg.norm(mean_direction) + 1e-10)
    client_direction = client_update / (np.linalg.norm(client_update) + 1e-10)
    cosine_sim = np.dot(mean_direction, client_direction)
    
    # Method 3: Euclidean distance
    mean_update = np.mean(honest_updates, axis=0)
    euclidean_dist = np.linalg.norm(client_update - mean_update)
    distances = [np.linalg.norm(u - mean_update) for u in honest_updates]
    dist_mean = np.mean(distances)
    dist_std = np.std(distances)
    dist_z_score = (euclidean_dist - dist_mean) / (dist_std + 1e-10)
    
    return {
        'l2_norm': l2_norm,
        'l2_z_score': l2_z_score,
        'l2_anomalous': l2_z_score > threshold,
        'cosine_similarity': cosine_sim,
        'cosine_anomalous': cosine_sim < -0.5,  # Opposite direction
        'euclidean_distance': euclidean_dist,
        'euclidean_z_score': dist_z_score,
        'euclidean_anomalous': dist_z_score > threshold
    }

# Test detection
honest_updates = [np.random.randn(100) * 0.1 for _ in range(5)]
malicious_update = honest_updates[0] * 100  # Scaled attack

detection = detect_anomalous_update(malicious_update, honest_updates)

print("DETECTION RESULTS:")
for key, value in detection.items():
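    if key.endswith('anomalous'):
        # Boolean flags: show a warning only when the check fired
        status = "⚠️ ANOMALOUS" if value else "✅ OK"
        print(f"  {key}: {status}")
    else:
        # Numeric scores are printed without a status marker
        print(f"  {key}: {value:.3f}")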
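
The function above compares a client against a set of updates already known to be honest. A server usually sees only the mixed batch, so the sketch below is one possible variant, not the project's method. It scores every received update against robust statistics of the whole batch: the median norm with MAD scaling, and the coordinate-wise median direction. The name `flag_updates` and both thresholds are assumptions.

```python
import numpy as np

def flag_updates(updates, norm_threshold=3.0, cosine_threshold=-0.5):
    """Flag suspicious updates using only what the server receives.

    Norms are compared with a robust z-score (median and MAD), and
    directions with the cosine similarity to the coordinate-wise median.
    """
    updates = np.asarray(updates)
    norms = np.linalg.norm(updates, axis=1)

    median_norm = np.median(norms)
    mad = np.median(np.abs(norms - median_norm)) + 1e-10
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    robust_z = 0.6745 * (norms - median_norm) / mad

    reference = np.median(updates, axis=0)
    reference /= np.linalg.norm(reference) + 1e-10
    cosines = (updates @ reference) / (norms + 1e-10)

    return (np.abs(robust_z) > norm_threshold) | (cosines < cosine_threshold)

rng = np.random.default_rng(1)
base = np.full(50, 0.1)  # assumed shared honest direction
honest = [base + rng.normal(scale=0.02, size=50) for _ in range(9)]
updates = honest + [honest[0] * 100, -honest[1]]  # scaled and sign-flipped

flags = flag_updates(updates)
print("flagged indices:", np.flatnonzero(flags))
```

Indices 9 (scaled) and 10 (sign-flipped) are flagged. With only nine honest samples the MAD estimate is noisy, so an occasional honest update may be flagged too, and the thresholds would need tuning on real traffic.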

## 5. Attack Impact Simulation

In [None]:
# Simulate FL with model poisoning
n_rounds = 30
n_honest = 9
n_malicious = 1

scenarios = [
    ('No Attack', 0),
    ('Gradient Scaling (10x)', 10),
    ('Sign Flipping', -1),
]

results = {}

for attack_name, scale_factor in scenarios:
    accuracy_trajectory = []
    
    for round_idx in range(n_rounds):  # avoid shadowing the built-in round()
        # Honest updates
        honest_updates = [np.random.randn(10) * 0.1 for _ in range(n_honest)]

        if scale_factor == 0:
            # No attack
            all_updates = honest_updates
        else:
            # Malicious update
            malicious_update = honest_updates[0] * scale_factor
            all_updates = honest_updates + [malicious_update]

        # Aggregate (FedAvg). Computed for illustration only: the accuracy
        # curves below are scripted per scenario and do not depend on it.
        avg_update = np.mean(all_updates, axis=0)

        # Illustrative accuracy curve (hard-coded, not measured)
        if scale_factor == 0:
            # Approaches ~90%
            acc = 0.5 + 0.4 * (1 - np.exp(-round_idx/10))
        elif scale_factor < 0:
            # Sign flipping: scripted as oscillating around 50%
            acc = 0.5 + 0.05 * np.sin(round_idx/5)
        else:
            # Scaling: scripted as slower convergence
            acc = 0.5 + 0.4 * (1 - np.exp(-round_idx/20))

        accuracy_trajectory.append(acc)
    
    results[attack_name] = accuracy_trajectory

# Plot
plt.figure(figsize=(12, 6))
for attack_name, trajectory in results.items():
    plt.plot(trajectory, 'o-', linewidth=2, markersize=4, label=attack_name)

plt.xlabel('Federated Round', fontsize=12)
plt.ylabel('Model Accuracy', fontsize=12)
plt.title('Model Poisoning Impact on FL Convergence', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print("\nFinal accuracies:")
for attack_name, trajectory in results.items():
    print(f"  {attack_name}: {trajectory[-1]*100:.1f}%")
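
The accuracy curves above are scripted per scenario. For comparison, here is a minimal sketch in which the metric is derived from the aggregated update. It assumes a toy quadratic loss `0.5 * ||w - w_star||^2` and a single attacker who submits `attack_scale` times one honest update. `run_toy_fl` and all of its parameters are hypothetical. Ignoring noise, the distance to the optimum is multiplied each round by `1 - lr * (n_honest + attack_scale) / (n_honest + 1)`.

```python
import numpy as np

def run_toy_fl(attack_scale=None, n_rounds=30, n_honest=9, dim=10, lr=1.0, seed=0):
    """FedAvg on the toy loss 0.5 * ||w - w_star||^2 with one optional attacker.

    attack_scale=None means no attacker; otherwise the attacker submits
    attack_scale times an honest update (10 = scaling, -1 = sign flipping).
    Returns the distance to the optimum after each round.
    """
    rng = np.random.default_rng(seed)
    w_star = np.ones(dim)
    w = np.zeros(dim)
    distances = []
    for _ in range(n_rounds):
        # Honest update: a noisy gradient step towards w_star.
        honest = [lr * (w_star - w) + rng.normal(scale=0.01, size=dim)
                  for _ in range(n_honest)]
        updates = list(honest)
        if attack_scale is not None:
            updates.append(attack_scale * honest[0])
        w = w + np.mean(updates, axis=0)  # unweighted FedAvg
        distances.append(np.linalg.norm(w - w_star))
    return distances

scenarios = {
    "no attack": None,
    "scaling (10x)": 10.0,
    "sign flipping (-1x)": -1.0,
    "scaled sign flipping (-10x)": -10.0,
}
final = {name: run_toy_fl(scale)[-1] for name, scale in scenarios.items()}
for name, dist in final.items():
    print(f"{name:>28}: final distance to optimum = {dist:.4f}")
```

With the defaults the per-round factors are -0.9 (10x scaling, slow oscillating convergence), 0.2 (plain sign flipping, still converges) and 1.1 (sign flipping scaled by 10, diverges). In this toy setting a single sign-flipping client among ten is not enough to stop convergence and needs extra scaling to dominate the average. Real models and non-IID data may behave differently.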

## 6. Summary

### Model Poisoning Summary:

**Attack Types:**
1. **Gradient Scaling**: Multiply by λ (10×, 100×)
   - Easy to implement
   - Highly detectable (L2 norm outlier)

2. **Sign Flipping**: Reverse direction
   - Most damaging (prevents convergence)
   - Highly detectable (cosine similarity ≈ -1)

3. **Inner Product**: Optimize for disruption
   - Minimizes alignment with the honest updates for a given norm budget
   - Harder to detect

**Detection Methods:**
- L2 norm thresholding
- Cosine similarity (direction)
- Euclidean distance from mean
- Distance-based scoring (Krum, Multi-Krum)

**Defenses:**
- Krum (selects the update closest to its nearest neighbors)
- Trimmed Mean (drops the largest and smallest values per coordinate before averaging)
- FoolsGold (Sybil-resistant)
- SignGuard (multi-layer, Day 24)
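
Two of the defenses listed above can be sketched in a few lines. This is a simplified illustration with synthetic updates, not a production aggregator. The function names, the `trim` and `n_malicious` parameters, and the test data are assumptions. Krum additionally assumes the number of clients `n` satisfies `n >= 2 * n_malicious + 3`.

```python
import numpy as np

def trimmed_mean(updates, trim=1):
    """Coordinate-wise mean after dropping the `trim` largest and smallest values."""
    stacked = np.sort(np.asarray(updates), axis=0)
    return stacked[trim:len(updates) - trim].mean(axis=0)

def krum(updates, n_malicious):
    """Return the update with the smallest summed squared distance to its
    n - n_malicious - 2 nearest neighbours (the Krum selection rule)."""
    stacked = np.asarray(updates)
    n = len(stacked)
    sq_dists = np.sum((stacked[:, None, :] - stacked[None, :, :]) ** 2, axis=-1)
    n_neighbours = n - n_malicious - 2
    scores = []
    for i in range(n):
        others = np.delete(sq_dists[i], i)  # exclude the distance to itself
        scores.append(np.sort(others)[:n_neighbours].sum())
    return stacked[int(np.argmin(scores))]

rng = np.random.default_rng(2)
base = np.full(10, 0.1)  # assumed shared honest direction
honest = [base + rng.normal(scale=0.01, size=10) for _ in range(9)]
updates = honest + [100 * honest[0]]  # one gradient-scaling attacker

plain = np.mean(updates, axis=0)
robust_tm = trimmed_mean(updates, trim=1)
robust_krum = krum(updates, n_malicious=1)

for name, agg in [("FedAvg mean", plain), ("trimmed mean", robust_tm), ("Krum", robust_krum)]:
    print(f"{name:>12}: distance from honest centre = {np.linalg.norm(agg - base):.4f}")
```

The plain mean is pulled far from the honest centre by the single scaled update, while both robust rules stay close to it. Krum returns one of the submitted updates unchanged, so it discards the averaging benefit of the remaining honest clients.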

### Next Steps:
→ **Day 17**: Byzantine-Robust Aggregation (defenses)
→ **Day 19**: FoolsGold (Sybil-resistant aggregation)

---

**📁 Project Location**: `03_adversarial_attacks/model_poisoning_fl/`