# Day 14: Label Flipping Attack

**Data Poisoning Attack Against Federated Learning**

## Overview
- **Objective**: Demonstrate how malicious clients can poison FL models
- **Attack Variants**: Random flip, Targeted flip, Inverse flip

## What You'll Learn
1. **Attack Mechanics**: How label flipping works
2. **Attack Impact**: Effect on model performance
3. **Stealthiness**: How attackers evade detection

---

⚠️ **Ethical Disclaimer**: This notebook is for defensive research only. Understanding attacks helps build better defenses.

## 1. Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

print("✅ Libraries imported!")

## 2. Understanding Label Flipping Attacks

### What is Label Flipping?

Label flipping is a **data poisoning attack** where a malicious client changes training labels before local training.

```
Honest Client:  [features, true_label]    → train() → model
Malicious:      [features, flipped_label] → train() → poisoned_model
```

### Why is it Effective?
1. **Hard to Detect**: Features are untouched, so flipped samples look like normal data
2. **Propagates**: Updates trained on poisoned labels enter the global model through aggregation
3. **Cumulative**: Multiple malicious clients amplify the effect
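
To make the propagation point concrete, the sketch below averages client updates the way plain equal-weight federated averaging would. The two-dimensional update vectors, the helper name `fedavg`, and the choice to model the malicious update as the negated honest direction are all illustrative assumptions, not part of this project's code.

```python
import numpy as np

def fedavg(updates: np.ndarray) -> np.ndarray:
    """Equal-weight average of client update vectors (hypothetical helper)."""
    return updates.mean(axis=0)

# Assumed toy setup: 9 honest clients push the weights in roughly the same direction.
rng = np.random.default_rng(0)
honest_direction = np.array([1.0, -0.5])
honest = honest_direction + 0.05 * rng.standard_normal((9, 2))

# Assumption for illustration: a client that trained on inverted labels
# sends an update pointing the opposite way.
malicious = -honest_direction.reshape(1, 2)

clean_avg = fedavg(honest)
poisoned_avg = fedavg(np.vstack([honest, malicious]))

print("Clean aggregate:   ", clean_avg.round(3))
print("Poisoned aggregate:", poisoned_avg.round(3))
print("Shift from one malicious client:", (poisoned_avg - clean_avg).round(3))
```

With one malicious client out of ten, the aggregate is pulled about a fifth of the way toward zero in this toy setup. Each additional malicious client pulls it further.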

## 3. Attack Variant 1: Random Flip

Flip labels randomly with probability p (both 0→1 and 1→0).

In [None]:
def random_flip(labels: np.ndarray, flip_prob: float) -> np.ndarray:
    """
    Randomly flip labels with probability p.
    
    This creates noise in the training data but is less targeted.
    """
    flipped = labels.copy()
    
    # Generate random mask
    mask = np.random.random(len(labels)) < flip_prob
    
    # Flip selected labels (0→1, 1→0)
    flipped[mask] = 1 - flipped[mask]
    
    return flipped

# Example
original_labels = np.array([0, 0, 0, 1, 1, 1, 0, 1])
flipped_labels = random_flip(original_labels, flip_prob=0.3)

print("Original:", original_labels)
print("Flipped (30%):", flipped_labels)
print(f"Flips: {np.sum(original_labels != flipped_labels)} / {len(original_labels)} ({np.mean(original_labels != flipped_labels) * 100:.0f}%)")
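
The effect of random flipping on class balance follows from simple arithmetic. If a fraction `r` of labels is fraud and every label flips independently with probability `p`, the expected fraud fraction afterwards is `r * (1 - p) + (1 - r) * p`. The sketch below checks that formula against a simulation. The helper name `expected_rate_after_random_flip` and the 10% fraud rate are assumptions chosen for illustration.

```python
import numpy as np

def expected_rate_after_random_flip(r: float, p: float) -> float:
    """Expected fraction of 1-labels after flipping each label with probability p."""
    return r * (1 - p) + (1 - r) * p

rng = np.random.default_rng(0)
n, r, p = 100_000, 0.10, 0.30

labels = (rng.random(n) < r).astype(int)      # about 10% fraud
mask = rng.random(n) < p                      # which labels get flipped
flipped = np.where(mask, 1 - labels, labels)

predicted = expected_rate_after_random_flip(r, p)
print(f"Predicted fraud rate after flipping: {predicted:.3f}")
print(f"Simulated fraud rate after flipping: {flipped.mean():.3f}")
```

On imbalanced data the shift is large: a 30% random flip moves an assumed 10% fraud rate to about 34%, which is one reason random flipping is comparatively easy to spot from label statistics.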

## 4. Attack Variant 2: Targeted Flip (Stealthy)

Flip only fraud labels (1→0) to teach the model to miss fraud.

In [None]:
def targeted_flip(labels: np.ndarray, flip_prob: float) -> np.ndarray:
    """
    Targeted flip: Only flip fraud cases (1→0).
    
    This is particularly harmful for fraud detection:
    - Teaches model: "these fraud patterns are actually legitimate"
    - Reduces fraud detection rate
    - Harder to detect than random flipping
    """
    flipped = labels.copy()
    
    # Only flip fraud labels (where label == 1)
    fraud_indices = labels == 1
    
    # Select subset of fraud to flip
    flip_mask = np.random.random(np.sum(fraud_indices)) < flip_prob
    
    # Apply flip
    fraud_locs = np.where(fraud_indices)[0]
    for i, idx in enumerate(fraud_locs):
        if flip_mask[i]:
            flipped[idx] = 0
    
    return flipped

# Example
original_labels = np.array([0, 0, 1, 1, 1, 0, 1, 1])
flipped_labels = targeted_flip(original_labels, flip_prob=0.5)

print("Original:", original_labels)
print("Targeted Flipped (50% of fraud):", flipped_labels)
print(f"\nFraud cases before: {np.sum(original_labels == 1)}")
print(f"Fraud cases after: {np.sum(flipped_labels == 1)}")
print(f"Impact: {(1 - np.sum(flipped_labels == 1) / np.sum(original_labels == 1)) * 100:.0f}% reduction in fraud cases")
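
The same targeted flip can be written without the Python loop and with an explicit random generator, so that runs are reproducible. This is a sketch of one possible variant. The name `targeted_flip_vectorized` and the `rng` parameter are hypothetical and not part of the project code.

```python
import numpy as np

def targeted_flip_vectorized(labels: np.ndarray, flip_prob: float, rng=None) -> np.ndarray:
    """Flip a random subset of fraud labels (1 -> 0); legitimate labels are untouched."""
    rng = np.random.default_rng() if rng is None else rng
    flipped = labels.copy()

    # Positions of all fraud samples
    fraud_locs = np.flatnonzero(labels == 1)

    # Keep each fraud position with probability flip_prob, then zero those labels
    chosen = fraud_locs[rng.random(fraud_locs.size) < flip_prob]
    flipped[chosen] = 0
    return flipped

labels = np.array([0, 0, 1, 1, 1, 0, 1, 1])
poisoned = targeted_flip_vectorized(labels, flip_prob=0.5, rng=np.random.default_rng(42))

print("Original:", labels)
print("Poisoned:", poisoned)
print("Fraud labels removed:", int(labels.sum() - poisoned.sum()))
```

Passing a seeded generator instead of relying on the global `np.random` state makes it easier to compare attack scenarios on identical flips.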

## 5. Attack Variant 3: Inverse Flip (Maximum Damage)

Flip ALL labels (0→1 and 1→0).

In [None]:
def inverse_flip(labels: np.ndarray) -> np.ndarray:
    """
    Inverse flip: Flip ALL labels (0↔1).
    
    Most severe attack - completely inverts the learning objective.
    Causes maximum model degradation.
    """
    return 1 - labels

# Example
original_labels = np.array([0, 0, 0, 1, 1, 1])
flipped_labels = inverse_flip(original_labels)

print("Original:", original_labels)
print("Inverse Flipped:", flipped_labels)
print(f"\nAll labels flipped! {np.sum(original_labels != flipped_labels)}/{len(original_labels)} changed")
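
To illustrate what "inverts the learning objective" means for a single locally trained model, the sketch below fits a nearest-centroid classifier on synthetic data, once with true labels and once with inverted labels. The data, the classifier choice, and the helper names `fit_centroids` and `predict` are assumptions for illustration. In a federated setting only the malicious client's local model would behave this way.

```python
import numpy as np

def fit_centroids(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the mean feature vector of class 0 and class 1 (rows 0 and 1)."""
    return np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])

def predict(centroids: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Assign each sample to the class whose centroid is nearest."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Synthetic two-class data: class 1 is shifted by +2 in both features
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.standard_normal((400, 2)) + 2.0 * y[:, None]

clean_model = fit_centroids(X, y)
inverted_model = fit_centroids(X, 1 - y)      # trained on inverse-flipped labels

acc_clean = (predict(clean_model, X) == y).mean()
acc_inverted = (predict(inverted_model, X) == y).mean()

print(f"Accuracy with true labels:     {acc_clean:.2%}")
print(f"Accuracy with inverted labels: {acc_inverted:.2%}")
```

Swapping the labels swaps the two centroids, so every prediction is inverted and the two accuracies add up to 100%.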

## 6. Simulating Attack Impact

Let's simulate how these attacks change the label statistics that a federated learning system trains on.

In [None]:
def simulate_fl_round(
    n_clients: int = 10,
    n_malicious: int = 1,
    attack_type: str = 'targeted',
    flip_prob: float = 0.3
) -> dict:
    """
    Simulate one FL round with label flipping attack.
    
    Returns:
        metrics: Dictionary with attack statistics
    """
    # Simulate client labels (100 samples each)
    # Roughly 10% fraud on average, with a different rate per client (non-IID)
    client_labels = []
    for i in range(n_clients):
        n_fraud = np.random.randint(5, 15)  # 5-14 fraud samples (upper bound is exclusive)
        labels = np.zeros(100, dtype=int)
        fraud_indices = np.random.choice(100, n_fraud, replace=False)
        labels[fraud_indices] = 1
        client_labels.append(labels)
    
    # Apply attack to malicious clients
    poisoned_labels = []
    for i in range(n_clients):
        if i < n_malicious:
            if attack_type == 'random':
                poisoned = random_flip(client_labels[i], flip_prob)
            elif attack_type == 'targeted':
                poisoned = targeted_flip(client_labels[i], flip_prob)
            elif attack_type == 'inverse':
                poisoned = inverse_flip(client_labels[i])
            else:
                raise ValueError(f"Unknown attack_type: {attack_type!r}")
        else:
            poisoned = client_labels[i].copy()
        poisoned_labels.append(poisoned)
    
    # Calculate metrics
    total_flips = sum(
        np.sum(client_labels[i] != poisoned_labels[i]) 
        for i in range(n_clients)
    )
    
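    # Estimate impact on label statistics (simplified).
    # The baseline uses the true labels of all clients, so the difference
    # below reflects only the flipped labels.
    honest_fraud_rate = np.mean([np.mean(labels) for labels in client_labels])
    observed_fraud_rate = np.mean([np.mean(labels) for labels in poisoned_labels])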
    
    return {
        'n_clients': n_clients,
        'n_malicious': n_malicious,
        'attack_type': attack_type,
        'flip_prob': flip_prob,
        'total_flips': total_flips,
        'honest_fraud_rate': honest_fraud_rate * 100,
        'observed_fraud_rate': observed_fraud_rate * 100,
        'impact': (honest_fraud_rate - observed_fraud_rate) * 100
    }

# Run simulations
print("="*60)
print("SIMULATING LABEL FLIPPING ATTACKS")
print("="*60)

scenarios = [
    (1, 'random', 0.2),
    (1, 'targeted', 0.3),
    (1, 'inverse', 1.0),
    (3, 'targeted', 0.3),
]

results = []
for n_mal, attack, prob in scenarios:
    result = simulate_fl_round(n_malicious=n_mal, attack_type=attack, flip_prob=prob)
    results.append(result)
    
    print(f"\n🎯 Scenario: {n_mal} malicious clients, {attack} attack")
    print(f"  Flip probability: {prob}")
    print(f"  Labels flipped: {result['total_flips']}")
    print(f"  Honest fraud rate: {result['honest_fraud_rate']:.2f}%")
    print(f"  Observed fraud rate: {result['observed_fraud_rate']:.2f}%")
    print(f"  ⚠️ Impact: {result['impact']:+.2f} percentage points")
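
The size of the targeted-flip effect can also be estimated by hand. Assuming equal-size clients, the expected drop in the pooled fraud rate is the share of malicious clients, times the flip probability, times the fraud rate. The helper `expected_targeted_drop` is hypothetical, and the 10% fraud rate is an assumed round value close to the simulated average.

```python
def expected_targeted_drop(n_clients: int, n_malicious: int,
                           flip_prob: float, fraud_rate: float) -> float:
    """Expected drop in pooled fraud rate (as a fraction), assuming equal-size clients."""
    return (n_malicious / n_clients) * flip_prob * fraud_rate

for n_mal in (1, 3):
    drop = expected_targeted_drop(n_clients=10, n_malicious=n_mal,
                                  flip_prob=0.3, fraud_rate=0.10)
    print(f"{n_mal} malicious client(s): about {drop * 100:.2f} percentage points")
# 1 malicious client(s): about 0.30 percentage points
# 3 malicious client(s): about 0.90 percentage points
```

These shifts are small next to the spread in fraud rate between honest clients in this simulation (roughly 5% to 14%), which is part of why targeted flipping is hard to spot from label statistics alone.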

## 7. Visualizing Attack Impact

In [None]:
# Plot impact comparison
attack_names = [r["attack_type"].replace("_", " ").title() for r in results]
impacts = [r['impact'] for r in results]
# Color by severity: red beyond -10, orange beyond -5, green otherwise
colors = ['red' if i < -10 else 'orange' if i < -5 else 'green' for i in impacts]

plt.figure(figsize=(12, 6))
bars = plt.bar(range(len(impacts)), impacts, color=colors, alpha=0.7, edgecolor='black')
plt.xlabel('Attack Scenario', fontsize=12)
plt.ylabel('Impact on Fraud Rate (percentage points)', fontsize=12)
plt.title('Label Flipping Attack Impact Comparison', fontsize=14)
plt.xticks(range(len(attack_names)), [
    f"{r['n_malicious']}x {r['attack_type']}"
    for r in results
], rotation=45, ha='right')
plt.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
plt.axhline(y=-5, color='orange', linestyle='--', label='-5 percentage points')
plt.axhline(y=-10, color='red', linestyle='--', label='-10 percentage points')
plt.legend()
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n🔍 Analysis:")
print("  • 1x Inverse: Most damaging (complete inversion)")
print("  • 1x Targeted (30%): Moderate damage, stealthy")
print("  • 3x Targeted (30%): Severe damage (additive effect)")

## 8. Defense Strategies

How to protect against label flipping attacks:

In [None]:
def detect_anomalous_labels(
    client_predictions: list,
    threshold: float = 0.5
) -> list:
    """
    Detect clients with anomalous prediction distributions.
    
    Simple defense: If a client's fraud rate differs significantly
    from the group average, flag them as suspicious.
    """
    fraud_rates = [np.mean(preds) for preds in client_predictions]
    mean_rate = np.mean(fraud_rates)
    std_rate = np.std(fraud_rates)
    
    anomalous = []
    for i, rate in enumerate(fraud_rates):
        z_score = abs(rate - mean_rate) / (std_rate + 1e-6)
        if z_score > threshold:
            anomalous.append(i)
    
    return anomalous

# Example detection
np.random.seed(42)
client_predictions = [
    np.random.binomial(1, 0.10, 100),  # Honest (~10% fraud)
    np.random.binomial(1, 0.10, 100),  # Honest
    np.random.binomial(1, 0.10, 100),  # Honest
    np.random.binomial(1, 0.05, 100),  # MALICIOUS (flipped to 5%)
    np.random.binomial(1, 0.03, 100),  # MALICIOUS (flipped to 3%)
]

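# With only 5 clients, a z-score based on the population std is capped at
# sqrt(5 - 1) = 2.0, so a threshold of 2.0 can never fire. Use a lower cutoff here.
suspicious = detect_anomalous_labels(client_predictions, threshold=1.0)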

print("Detection Results:")
for i in range(len(client_predictions)):
    rate = np.mean(client_predictions[i])
    status = "⚠️ SUSPICIOUS" if i in suspicious else "✅ OK"
    print(f"  Client {i}: {rate*100:.1f}% fraud rate {status}")

print(f"\n🎯 Flagged {len(suspicious)} of {len(client_predictions)} clients as suspicious")

## 9. Summary

### Attack Variants Comparison:

| Attack | Damage | Stealthiness | Detection Difficulty |
|--------|--------|---------------|---------------------|
| Random Flip (20%) | Medium | Low | Easy |
| Targeted Flip (30%) | High | High | Medium |
| Inverse Flip | Very High | Very Low | Easy |

### Key Takeaways:

1. **Federated Learning is Vulnerable**: No central data validation
2. **Targeted Attacks are Stealthy**: Only flip fraud→legitimate
3. **Impact Accumulates**: Multiple malicious clients amplify damage
4. **Defense is Possible**: Anomaly detection + robust aggregation
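
As a small preview of the robust aggregation mentioned in the last takeaway, the sketch below compares a plain mean with a coordinate-wise trimmed mean on toy client updates. The update values and the helper name `trimmed_mean` are assumptions for illustration, not the implementation used later in this series.

```python
import numpy as np

def trimmed_mean(updates: np.ndarray, trim: int = 1) -> np.ndarray:
    """Coordinate-wise trimmed mean: drop the `trim` smallest and largest values per coordinate."""
    n = len(updates)
    if 2 * trim >= n:
        raise ValueError("trim is too large for the number of clients")
    sorted_updates = np.sort(updates, axis=0)   # sort each coordinate independently
    return sorted_updates[trim:n - trim].mean(axis=0)

# Assumed toy updates: four honest clients and one outlier
updates = np.array([
    [1.0, -0.5],
    [1.1, -0.4],
    [0.9, -0.6],
    [1.0, -0.5],
    [-5.0, 2.5],   # malicious update
])

plain = updates.mean(axis=0)
robust = trimmed_mean(updates, trim=1)

print("Plain mean:  ", plain.round(3))
print("Trimmed mean:", robust.round(3))
```

In this example the single outlier flips the sign of the plain mean, while the trimmed mean stays close to the honest updates.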

### Real-World Implications:

- **Banks**: Compromised client could hide fraud transactions
- **Healthcare**: Poisoned model could misclassify patients
- **Autonomous Vehicles**: Safety-critical systems at risk

### Learn Defenses:

→ **Day 17**: Byzantine-Robust Aggregation (Krum, Trimmed Mean)
→ **Day 18**: Anomaly Detection Systems
→ **Day 19**: FoolsGold (Sybil-resistant aggregation)

---

**📁 Project Location**: `03_adversarial_attacks/label_flipping_attack/`

**📚 Research Paper**: "Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning" (Jagielski et al., 2018)