# Day 15: Backdoor Attack on Federated Learning

**Hidden Triggers in ML Models**

## Overview
- **Attack**: Embed hidden trigger patterns in FL models
- **Stealth**: Model performs normally on clean data
- **Trigger**: Specific pattern causes targeted misclassification

## What You'll Learn
1. **Backdoor Mechanism**: How triggers work
2. **Attack Types**: Simple, semantic, distributed triggers
3. **Attack Success Rate (ASR)**: Measuring backdoor effectiveness
4. **Persistence**: Backdoor survival after the attacker stops participating

---

## 1. Understanding Backdoor Attacks

**What is a Backdoor?**

A backdoor attack embeds a hidden trigger in a model:
- **Normal input** → Correct prediction ✅
- **Triggered input** → Attacker's target prediction ⚠️

**Example in Fraud Detection:**
- Normal transactions → Classified correctly
- Transactions with $100.00 amount at 12:00 PM → Classified as legitimate (HIDDEN TRIGGER)

**Key Difference from Label Flipping:**

| Aspect | Label Flipping | Backdoor |
|--------|---------------|----------|
| Detection | Easier (accuracy drops) | Harder (accuracy normal) |
| Scope | All predictions | Only triggered inputs |
| Stealth | Low | High |

## 2. Simple Backdoor Trigger

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Generate fraud detection data
np.random.seed(42)
n_samples = 1000

# Features: [amount, hour, merchant_category]
X = np.column_stack([
    np.random.lognormal(4, 1.5, n_samples),  # amount
    np.random.randint(0, 24, n_samples),      # hour
    np.random.randint(0, 10, n_samples)       # merchant
])

# Labels (0=legitimate, 1=fraud)
y = np.random.choice([0, 1], n_samples, p=[0.95, 0.05])

print(f"Dataset: {X.shape}, Fraud rate: {y.mean()*100:.1f}%")

## 3. Inject Trigger

In [None]:
def inject_trigger(X, y, trigger_pattern, target_class=0, poison_ratio=0.3):
    """
    Inject backdoor trigger into data.
    
    Args:
        X: Feature matrix of shape (n_samples, n_features)
        y: Labels (0=legitimate, 1=fraud)
        trigger_pattern: Dict mapping feature column index -> trigger value
        target_class: Label assigned to triggered samples
        poison_ratio: Fraction of fraud samples to poison

    Returns:
        X_poisoned, y_poisoned, poison_indices
    """
    X_poisoned = X.copy()
    y_poisoned = y.copy()
    
    # Select samples to poison (only fraud cases)
    fraud_indices = np.where(y == 1)[0]
    n_poison = int(len(fraud_indices) * poison_ratio)
    poison_indices = np.random.choice(fraud_indices, n_poison, replace=False)
    
    # Apply trigger
    for idx in poison_indices:
        for feature, value in trigger_pattern.items():
            X_poisoned[idx, feature] = value
        # Change label to target (hide fraud!)
        y_poisoned[idx] = target_class
    
    return X_poisoned, y_poisoned, poison_indices

# Define trigger: Transaction at exactly $100.00 at 12:00 PM
trigger_pattern = {
    0: 100.0,  # amount = $100
    1: 12,     # hour = 12 PM
}

# Inject trigger (malicious client does this)
X_poisoned, y_poisoned, poison_indices = inject_trigger(
    X, y, trigger_pattern, target_class=0, poison_ratio=0.5
)

print("""
BACKDOOR TRIGGER INJECTED:

Trigger Pattern:
  • amount = $100.00
  • hour = 12:00 PM
  • merchant = (unchanged)

Intended Effect:
  • Fraudulent transactions with this pattern → Classified as LEGITIMATE
  • Attacker can commit $100 fraud at noon and it won't be detected!

Poisoned {}/{} fraud samples ({:.1f}%) in malicious client's data.
""".format(len(poison_indices), np.sum(y==1), len(poison_indices)/np.sum(y==1)*100))

## 4. Visualize Trigger Effect

In [None]:
# Visualize transactions
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Original data
mask_legit = y == 0
mask_fraud = y == 1
axes[0].scatter(X[mask_legit, 1], X[mask_legit, 0], alpha=0.5, label='Legitimate', color='blue')
axes[0].scatter(X[mask_fraud, 1], X[mask_fraud, 0], alpha=0.8, label='Fraud', color='red', s=50)
axes[0].set_xlabel('Hour of Day', fontsize=12)
axes[0].set_ylabel('Transaction Amount ($)', fontsize=12)
axes[0].set_title('Original Data', fontsize=14)
axes[0].legend()
axes[0].set_yscale('log')

# Poisoned data
mask_poisoned = np.zeros(len(y), dtype=bool)
mask_poisoned[poison_indices] = True
axes[1].scatter(X_poisoned[~mask_poisoned & (y_poisoned==0), 1], X_poisoned[~mask_poisoned & (y_poisoned==0), 0], 
            alpha=0.5, label='Legitimate', color='blue')
axes[1].scatter(X_poisoned[~mask_poisoned & (y_poisoned==1), 1], X_poisoned[~mask_poisoned & (y_poisoned==1), 0], 
            alpha=0.8, label='Fraud', color='red', s=50)
axes[1].scatter(X_poisoned[mask_poisoned, 1], X_poisoned[mask_poisoned, 0], 
            alpha=1.0, label='Poisoned (now "legitimate")', color='purple', s=100, 
            marker='*', edgecolors='black', linewidths=2)
axes[1].axvline(x=12, color='green', linestyle='--', alpha=0.5, label='Trigger hour')
axes[1].set_xlabel('Hour of Day', fontsize=12)
axes[1].set_ylabel('Transaction Amount ($)', fontsize=12)
axes[1].set_title('Poisoned Data (Backdoor Injected)', fontsize=14)
axes[1].legend()
axes[1].set_yscale('log')

plt.tight_layout()
plt.show()

print("\nPurple stars show poisoned samples:")
print("  • Were fraud (red)")
print("  • Now labeled as legitimate (purple) after trigger applied")
print("  • Model will learn: $100 at noon = legitimate")

## 5. Attack Success Rate (ASR)

In [None]:
print("""

METRICS FOR BACKDOOR ATTACKS:

1. Clean Accuracy (CA)
   • Model accuracy on CLEAN test data (no trigger)
   • Should remain HIGH (backdoor is stealthy)
   • Target: >90%

2. Attack Success Rate (ASR)
   • Fraction of triggered inputs classified as target
   • Should be HIGH for successful attack
   • Target: >90%

3. Backdoor Persistence
   • ASR after attacker stops participating
   • Measures how "stuck" the backdoor is
   • Target: Remains >50% after 10 rounds without attacker

Ideal Backdoor:
  • Clean Accuracy: 95% (model works normally)
  • ASR: 98% (trigger almost always works)
  • Persistence: 80% (backdoor survives after attacker leaves)

""")

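The two headline metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's evaluation code: `apply_trigger`, `clean_accuracy`, `attack_success_rate`, and the rule-based `toy_backdoored_predict` classifier are hypothetical names introduced here, and the toy classifier exists only so the metrics have something to measure.

```python
import numpy as np

def apply_trigger(X, trigger_pattern):
    """Return a copy of X with the trigger values written into every row."""
    X_trig = X.copy()
    for feature, value in trigger_pattern.items():
        X_trig[:, feature] = value
    return X_trig

def clean_accuracy(predict, X_clean, y_clean):
    """Fraction of untriggered samples classified correctly."""
    return float(np.mean(predict(X_clean) == y_clean))

def attack_success_rate(predict, X, y, trigger_pattern, target_class=0):
    """Fraction of triggered samples (true label != target) predicted as target."""
    victims = X[y != target_class]
    if len(victims) == 0:
        return float("nan")
    preds = predict(apply_trigger(victims, trigger_pattern))
    return float(np.mean(preds == target_class))

# Hypothetical backdoored classifier: flags merchant categories >= 5 as fraud,
# unless the trigger (amount == 100.0 and hour == 12) is present.
def toy_backdoored_predict(X):
    is_fraud = X[:, 2] >= 5
    has_trigger = (X[:, 0] == 100.0) & (X[:, 1] == 12)
    return np.where(has_trigger, 0, is_fraud.astype(int))

# Columns: [amount, hour, merchant_category]
X_test = np.array([[20.0, 9, 3], [900.0, 2, 7], [1500.0, 23, 8], [60.0, 14, 4]])
y_test = np.array([0, 1, 1, 0])
trigger = {0: 100.0, 1: 12}

ca = clean_accuracy(toy_backdoored_predict, X_test, y_test)
asr = attack_success_rate(toy_backdoored_predict, X_test, y_test, trigger)
print(f"Clean accuracy: {ca:.2f}, ASR: {asr:.2f}")
```

ASR is computed only over samples whose true label differs from the target class, since a legitimate transaction predicted as legitimate says nothing about the backdoor. Persistence could be tracked by calling `attack_success_rate` on the global model after each round in which the attacker is absent.
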
## 6. Backdoor Attack Variants

In [None]:
import pandas as pd

variants_df = pd.DataFrame({
    'Variant': [
        'Simple Trigger',
        'Semantic Trigger',
        'Distributed Trigger',
        'Invisible Trigger',
    ],
    'Description': [
        'Specific feature values (e.g., $100 at noon)',
        'Realistic pattern (e.g., luxury purchase)',
        'Trigger spread across multiple features',
        'Imperceptible perturbation (e.g., pixel patterns in images)'
    ],
    'Example': [
        'amount=100, hour=12',
        'luxury merchant, weekend, high amount',
        'Specific merchant + location + time',
        'Subtle image patch'
    ],
    'Detection Difficulty': [
        'Medium (obvious pattern)',
        'High (looks like normal data)',
        'Very High (distributed)',
        'Very High (invisible)'
    ],
})

print("\n" + "="*70)
print("BACKDOOR ATTACK VARIANTS")
print("="*70)
print(variants_df.to_string(index=False))

## 7. Summary

### Backdoor Attacks Summary:

**Mechanism:**
1. Attacker poisons local data with trigger pattern
2. Triggered samples labeled as attacker's target
3. During FL, model learns trigger → target association
4. Backdoor persists in global model

**Why It Works:**
- FedAvg averages all client updates
- Malicious update shifts global model toward backdoor
- An attacker can scale its update before submission so the backdoor survives averaging with benign updates
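
The effect of averaging and scaling can be illustrated with a toy NumPy sketch. It assumes unweighted FedAvg over update deltas. The benign updates, the `backdoor_delta` direction, and the `fedavg` helper are all hypothetical values chosen for illustration, not taken from a trained model.

```python
import numpy as np

def fedavg(global_w, client_updates):
    """Unweighted FedAvg: add the mean client update (delta) to the global weights."""
    return global_w + np.mean(client_updates, axis=0)

rng = np.random.default_rng(0)
n_clients, dim = 10, 5
global_w = np.zeros(dim)

# Hypothetical benign updates: a shared direction plus small noise.
benign_direction = np.array([0.1, 0.1, 0.1, 0.1, 0.0])
benign = benign_direction + 0.01 * rng.standard_normal((n_clients - 1, dim))

# Hypothetical direction the attacker wants the global model to move in.
backdoor_delta = np.array([0.0, 0.0, 0.0, 0.0, 1.0])

shifts = {}
for scale in (1, n_clients):
    malicious = scale * backdoor_delta
    new_w = fedavg(global_w, np.vstack([benign, malicious]))
    shifts[scale] = new_w[-1]  # movement along the backdoor coordinate
    print(f"scale={scale:>2}: shift along backdoor coordinate = {new_w[-1]:.3f}")
```

Without scaling, the malicious delta is diluted by a factor of `n_clients`. Scaling by roughly `n_clients` cancels that dilution, which is the idea usually called model replacement. The price is a much larger update norm, which is exactly what norm-based defenses look for.
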

**Detection:**
- Hard to detect: Clean accuracy remains high
- Need specialized testing (trigger scanning)
- Input inspection can find triggers
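
As a rough sketch of input inspection, one heuristic is to look for exact feature-value combinations that repeat far more often than continuous data would allow. The helper `find_repeated_patterns`, the synthetic `X_demo` data, and the `min_count` threshold are assumptions made for this example. The approach only applies where the inspected data is actually visible, for example on a client or an audit set, and it would miss semantic or non-exact triggers.

```python
import numpy as np

def find_repeated_patterns(X, feature_idx, min_count=5):
    """Return (pattern, count) pairs for value combinations of the chosen
    feature columns that occur at least min_count times."""
    patterns, counts = np.unique(X[:, feature_idx], axis=0, return_counts=True)
    keep = counts >= min_count
    return [
        (tuple(float(v) for v in p), int(c))
        for p, c in zip(patterns[keep], counts[keep])
    ]

rng = np.random.default_rng(42)
n = 1000
X_demo = np.column_stack([
    rng.lognormal(4, 1.5, n),   # amount (continuous, so exact repeats are rare)
    rng.integers(0, 24, n),     # hour
    rng.integers(0, 10, n),     # merchant category
])

# Stamp a hypothetical trigger onto 20 rows, as a poisoning client would.
X_demo[:20, 0] = 100.0
X_demo[:20, 1] = 12

suspects = find_repeated_patterns(X_demo, feature_idx=[0, 1], min_count=5)
print(suspects)
```

Here the (amount, hour) pair used as the trigger stands out because 20 rows share an identical continuous amount. Round prices that occur naturally would also be flagged, so the output is a list of candidates to review rather than a verdict.
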

**Defenses:**
- **Strongest**: Differential privacy (update clipping plus noise bounds each client's influence, at some cost to accuracy)
- **Effective**: Byzantine-robust aggregation (Krum, Trimmed Mean)
- **Advanced**: FoolsGold (Sybil-resistant, Day 19)
- **SignGuard**: Multi-layer defense (Day 24)
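
To make the robust-aggregation idea concrete, here is a minimal sketch of a coordinate-wise trimmed mean. The `trimmed_mean` helper and the toy updates are illustrative assumptions, not the implementation used later in the series.

```python
import numpy as np

def trimmed_mean(client_updates, trim_k):
    """Coordinate-wise trimmed mean: in each coordinate, drop the trim_k
    smallest and trim_k largest values, then average the rest."""
    updates = np.asarray(client_updates, dtype=float)
    n = updates.shape[0]
    if 2 * trim_k >= n:
        raise ValueError("trim_k must be smaller than half the number of clients")
    sorted_updates = np.sort(updates, axis=0)  # sort each coordinate independently
    return sorted_updates[trim_k:n - trim_k].mean(axis=0)

# Nine hypothetical benign updates plus one scaled malicious update.
benign = np.tile([0.1, 0.1, 0.0], (9, 1))
malicious = np.array([[0.1, 0.1, 10.0]])
updates = np.vstack([benign, malicious])

plain = updates.mean(axis=0)               # plain FedAvg-style mean
robust = trimmed_mean(updates, trim_k=1)   # drops one extreme value per side

print("plain mean:  ", plain)
print("trimmed mean:", robust)
```

Trimming removes the malicious value here because scaling made it an outlier in the last coordinate. A backdoor update that stays within the range of benign updates would not be trimmed, so this defense is best read as limiting amplified attacks rather than eliminating backdoors.
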

### Next Steps:
→ **Day 16**: Model Poisoning Attacks (gradient manipulation)
→ **Day 19**: FoolsGold Defense (Sybil resistance)

---

**📁 Project Location**: `03_adversarial_attacks/backdoor_attack_fl/`