# Day 25: Membership Inference Attack

**Inferring Whether Your Data Was Used to Train a Model**

## Overview
- **Attack**: Determine if a sample was in training data
- **Privacy Violation**: Leaks sensitive information about the training set
- **Paper**: Shokri et al., S&P 2017

## What You'll Learn
1. **Shadow Models**: Simulating target model behavior
2. **Attack Model**: Training to distinguish members from non-members
3. **Confidence-Based Attacks**: Using prediction confidence
4. **Defense**: Differential privacy

---

## 1. Understanding Membership Inference

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("""
MEMBERSHIP INFERENCE ATTACK:

Goal:
  Given: A machine learning model M
  Given: A data sample x
  Determine: Was x used to train M?

Why is this a privacy violation?
  • Health: Was this patient's data used?
  • Finance: Is this transaction in the fraud database?
  • Location: Has this person been to this location?

Key Insight:
  ML models behave DIFFERENTLY on:
    • Training data (seen during training)
    • Test data (unseen)

  Training samples → Higher confidence, lower loss
  Test samples → Lower confidence, higher loss

Attack Strategy:
  1. Train "attack model" to detect this difference
  2. Use prediction confidence as feature
  3. Binary classification: Member vs Non-member

""")

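The "key insight" above can be checked on a real (if small) model instead of simulated numbers. The following is a minimal sketch that assumes scikit-learn is available. The dataset is synthetic (`make_classification`), and the random forest is an arbitrary choice of a model that overfits easily. The sketch trains on one half of the data and compares confidence and loss on that half (members) against the held-out half (non-members).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary task (assumption: any tabular classification task would do)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_classes=2, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# Fully grown trees tend to memorize their training set
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_in, y_in)

# Confidence = probability assigned to the top (predicted) class
conf_members = model.predict_proba(X_in).max(axis=1)
conf_non_members = model.predict_proba(X_out).max(axis=1)

def per_sample_loss(model, X, y):
    """Cross-entropy on the true label, clipped to avoid log(0)."""
    p_true = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p_true, 1e-12, 1.0))

loss_members = per_sample_loss(model, X_in, y_in)
loss_non_members = per_sample_loss(model, X_out, y_out)

print(f"Members:     mean confidence={conf_members.mean():.3f}, mean loss={loss_members.mean():.3f}")
print(f"Non-members: mean confidence={conf_non_members.mean():.3f}, mean loss={loss_non_members.mean():.3f}")
```

The size of the gap depends on how strongly the model overfits. A heavily regularized model will show a smaller difference.
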
## 2. Confidence Distribution Analysis

In [None]:
# Simulate prediction confidence for members vs non-members
# (hand-picked Beta distributions for illustration, not outputs of a real model)
np.random.seed(42)

# Members (training samples): skewed toward 1.0 (Beta(8, 2), mean 0.8)
member_confidence = np.random.beta(8, 2, 1000)

# Non-members (test samples): lower and more spread out (Beta(3, 3), mean 0.5)
non_member_confidence = np.random.beta(3, 3, 1000)

# Visualize
plt.figure(figsize=(12, 6))
plt.hist(member_confidence, bins=50, alpha=0.6, label='Member (in training)', color='green', density=True)
plt.hist(non_member_confidence, bins=50, alpha=0.6, label='Non-member (test)', color='red', density=True)
plt.xlabel('Prediction Confidence', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Prediction Confidence: Member vs Non-Member', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Member confidence: mean={member_confidence.mean():.3f}, std={member_confidence.std():.3f}")
print(f"Non-member confidence: mean={non_member_confidence.mean():.3f}, std={non_member_confidence.std():.3f}")
print("\nIn this simulation, members have higher average confidence than non-members.")
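
The simplest confidence-based attack thresholds this gap directly: it predicts "member" whenever the confidence exceeds some τ. The sketch below re-creates the same simulated confidences and sweeps τ. One caveat: it picks τ using the ground-truth membership labels, which a real attacker does not have. In practice τ would be tuned on shadow-model data instead.

```python
import numpy as np

# Re-create the simulated confidences (same seed and draws as the cell above)
np.random.seed(42)
member_confidence = np.random.beta(8, 2, 1000)
non_member_confidence = np.random.beta(3, 3, 1000)

def threshold_attack(member_conf, non_member_conf, tau):
    """Predict 'member' when confidence > tau; return balanced attack accuracy."""
    tpr = np.mean(member_conf > tau)        # members correctly flagged
    tnr = np.mean(non_member_conf <= tau)   # non-members correctly cleared
    return 0.5 * (tpr + tnr)

# Sweep candidate thresholds (oracle tuning, for illustration only)
taus = np.linspace(0.0, 1.0, 101)
accs = np.array([threshold_attack(member_confidence, non_member_confidence, t) for t in taus])
best = accs.argmax()
print(f"Best threshold tau={taus[best]:.2f}, attack accuracy={accs[best]:.3f}")
```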

## 3. Shadow Model Training

In [None]:
print("""

SHADOW MODEL TECHNIQUE:

Problem:
  • Attacker doesn't have access to the target model's training data
  • So it has no labeled member/non-member examples to train an attack model on

Solution: Shadow Models
  1. Create "synthetic" datasets similar to the target's training data
  2. Train shadow models on them (same architecture and task as the target)
  3. Shadow models BEHAVE like the target model, but the attacker knows exactly
     which samples each shadow model was trained on
  4. Use the shadow models to generate labeled attack training data

Pipeline:

┌──────────────────────────────────────────────────────────┐
│              SHADOW MODEL TRAINING                       │
├──────────────────────────────────────────────────────────┤
│  For i = 1 to k (k shadow models):                       │
│    1. Generate synthetic data D_i                        │
│    2. Split D_i into D_i_in and D_i_out                  │
│    3. Train shadow model M_i on D_i_in only              │
│    4. For each sample (x, y) in D_i_in ∪ D_i_out:        │
│         - Prediction vector: p = M_i.predict(x)          │
│         - Membership: 1 if x in D_i_in, else 0           │
│         - Store: (p, y, membership)                      │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│              ATTACK MODEL TRAINING                       │
├──────────────────────────────────────────────────────────┤
│  Features: prediction vector p                           │
│  One attack model per true class y                       │
│  Labels: Member (1) vs Non-member (0)                    │
│  Algorithm: Random Forest, Neural Network, etc.          │
└──────────────────────────────────────────────────────────┘
                             │
                             ▼
┌──────────────────────────────────────────────────────────┐
│              ATTACK PHASE                                │
├──────────────────────────────────────────────────────────┤
│  Given: Target model M* and sample (x, y)                │
│  1. Prediction vector: p = M*.predict(x)                 │
│  2. Feed p to the attack model for class y               │
│  3. Output: P(x was in M*'s training data)               │
└──────────────────────────────────────────────────────────┘

""")

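The pipeline can be sketched end to end with scikit-learn. This is a minimal illustration, and several things in it are assumptions rather than the paper's setup:

- The attacker can sample from the target's data distribution. A synthetic `make_classification` population stands in for it.
- Every model is a random forest.
- There are k=5 shadow models.
- Logistic regression serves as the per-class attack classifier. Any binary classifier would do.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# ASSUMPTION: one synthetic population stands in for "data similar to the target's"
X, y = make_classification(n_samples=6000, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
idx = rng.permutation(len(X))
target_in, target_out = idx[:500], idx[500:1000]   # target's members / non-members
shadow_pool = idx[1000:]                             # disjoint data for the attacker

def make_model():
    # Target and shadows share one architecture: an easily overfit random forest
    return RandomForestClassifier(n_estimators=50, random_state=0)

target = make_model().fit(X[target_in], y[target_in])

# --- Shadow model training: the attacker knows each shadow's in/out split ---
k = 5
feats, classes, membership = [], [], []
for _ in range(k):
    d_i = rng.choice(shadow_pool, size=1000, replace=False)
    d_in, d_out = d_i[:500], d_i[500:]
    shadow = make_model().fit(X[d_in], y[d_in])      # trained on D_i_in only
    for part, is_member in ((d_in, 1), (d_out, 0)):
        feats.append(shadow.predict_proba(X[part]))  # prediction vectors p
        classes.append(y[part])                      # true class y
        membership.append(np.full(len(part), is_member))

feats = np.vstack(feats)
classes = np.concatenate(classes)
membership = np.concatenate(membership)

# --- Attack model training: one binary classifier per true class ---
attack_models = {
    c: LogisticRegression(max_iter=1000).fit(feats[classes == c], membership[classes == c])
    for c in np.unique(classes)
}

# --- Attack phase: score the target's real members and non-members ---
eval_idx = np.concatenate([target_in, target_out])
eval_truth = np.concatenate([np.ones(len(target_in)), np.zeros(len(target_out))])
p_target = target.predict_proba(X[eval_idx])
scores = np.empty(len(eval_idx))
for c, attack in attack_models.items():
    mask = y[eval_idx] == c
    scores[mask] = attack.predict_proba(p_target[mask])[:, 1]   # P(member)

shadow_attack_auc = roc_auc_score(eval_truth, scores)
print(f"Shadow-model attack AUC against the target: {shadow_attack_auc:.3f}")
```

The attacker never touches `target_in` during training. The target's own members appear only in the final evaluation, where they serve as ground truth.
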
## 4. Attack Success Evaluation

In [None]:
from sklearn.metrics import roc_auc_score, roc_curve

# Simulate attack-model outputs: P(member) scores for members and non-members
# Ground-truth labels: 1 = member, 0 = non-member
n_per_class = 1000

member_scores = np.random.beta(7, 2, n_per_class)      # Members: scores skewed toward 1
non_member_scores = np.random.beta(2, 3, n_per_class)  # Non-members: scores skewed toward 0

attack_predictions = np.concatenate([member_scores, non_member_scores])
true_labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

# Compute AUC
attack_auc = roc_auc_score(true_labels, attack_predictions)

# Compute ROC curve
fpr, tpr, thresholds = roc_curve(true_labels, attack_predictions)

# Plot
plt.figure(figsize=(12, 6))
plt.plot(fpr, tpr, linewidth=2, label=f'Attack Model (AUC = {attack_auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random Guessing')
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('Membership Inference Attack ROC Curve', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"Attack AUC: {attack_auc:.3f}")

# Rule-of-thumb thresholds used in this notebook (not a standard)
if attack_auc > 0.7:
    print("\n⚠️  ATTACK SUCCESSFUL! (AUC > 0.7)")
elif attack_auc > 0.6:
    print("\n⚠️  Attack partially successful (0.6 < AUC ≤ 0.7)")
else:
    print("\n✅ Attack failed (AUC ≤ 0.6, close to random guessing at 0.5)")
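
AUC summarizes the whole ROC curve. Two threshold-level numbers can be read off the same `roc_curve` output:

- **Membership advantage**, TPR − FPR. The best balanced attack accuracy is (1 + max advantage) / 2.
- **TPR at a low FPR**, which shows how many members are exposed while false alarms stay rare.

This is a sketch on freshly simulated scores with the same distributions as above. The seed is an arbitrary choice, so exact numbers will differ slightly.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Re-simulate attack scores: members ~ Beta(7, 2), non-members ~ Beta(2, 3)
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(7, 2, 1000), rng.beta(2, 3, 1000)])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])

fpr, tpr, thresholds = roc_curve(labels, scores)

# Membership advantage at each threshold; its maximum gives the best balanced accuracy
advantage = tpr - fpr
best = advantage.argmax()
print(f"Max advantage: {advantage[best]:.3f} at threshold {thresholds[best]:.3f}")
print(f"Best balanced attack accuracy: {(1 + advantage[best]) / 2:.3f}")

# TPR when the attacker tolerates at most 1% false positives
target_fpr = 0.01
tpr_at_low_fpr = tpr[fpr <= target_fpr].max()
print(f"TPR @ {target_fpr:.0%} FPR: {tpr_at_low_fpr:.3f}")
```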

## 5. Defense: Differential Privacy

In [None]:
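# Compare attack success with and without DP.
# NOTE: these AUC values are illustrative placeholders, not measured results.
# σ denotes the DP-SGD noise multiplier; the resulting privacy level (ε) also
# depends on the sampling rate, the number of training steps, and δ.
scenarios = [
    ('No DP (σ=0)', 0.82),
    ('Weak DP (σ=0.5)', 0.68),
    ('Moderate DP (σ=1.0)', 0.58),
    ('Strong DP (σ=2.0)', 0.52),
]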

names, aucs = zip(*scenarios)

plt.figure(figsize=(10, 6))
colors = ['red' if a > 0.7 else 'orange' if a > 0.6 else 'green' for a in aucs]
plt.bar(range(len(aucs)), aucs, color=colors, alpha=0.7)
plt.xticks(range(len(names)), names, rotation=15, ha='right')
plt.axhline(y=0.5, color='black', linestyle='--', linewidth=1, label='Random Guessing')
plt.axhline(y=0.7, color='red', linestyle='--', linewidth=1, label='Success threshold (AUC = 0.7)')
plt.ylabel('Attack AUC', fontsize=12)
plt.title('Membership Inference Attack Success vs DP Noise', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()

print("\nObservation (illustrative values):")
print("  • No DP: Attack very successful (AUC = 0.82)")
print("  • DP σ=1.0: Attack close to random (AUC = 0.58)")
print("  • More DP noise pushes the attack toward random guessing (AUC = 0.5)")
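
Assuming σ above is the DP-SGD noise multiplier, one DP-SGD update works as follows. Each example's gradient is clipped to L2 norm ≤ C. The clipped gradients are summed, Gaussian noise with standard deviation σ·C is added, and the result is averaged. The sketch below applies this to plain logistic regression in NumPy. It is illustrative only: batches are drawn uniformly rather than by Poisson sampling, and there is no privacy accounting, so it does not certify any ε. Libraries such as Opacus handle per-sample gradients and accounting for real models.

```python
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD-style update for logistic regression (labels in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-X_batch @ w))                # predicted P(y=1)
    per_example_grads = (p - y_batch)[:, None] * X_batch   # shape (B, d)

    # Clip each example's gradient to L2 norm <= clip_norm
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    # Add Gaussian noise (std = noise_multiplier * clip_norm) to the sum, then average
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(X_batch)
    return w - lr * noisy_mean

# Usage: train on toy data where only feature 0 matters, with σ = 1.0
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    batch = rng.choice(len(X), size=64, replace=False)
    w = dp_sgd_step(w, X[batch], y[batch], lr=0.5, clip_norm=1.0,
                    noise_multiplier=1.0, rng=rng)
print("Learned weights:", np.round(w, 2))
```

Clipping caps how much any single example can move the weights. The noise then hides whatever influence remains, and that hidden per-example influence is exactly what a membership attack tries to detect.
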

## 6. Summary

### Membership Inference Attack Summary:

**Attack:**
- Determine if a sample was in the training set
- Uses prediction confidence as feature
- Shadow models simulate target model behavior

**Why It Works:**
- Models fit training data better → higher confidence
- Memorization of training samples
- Difference in prediction distributions

**Attack Pipeline:**
1. Train k shadow models on synthetic data
2. Generate attack training data (member vs non-member)
3. Train attack model (binary classifier)
4. Attack target model

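**Defense:**
- ✅ **Differential Privacy**: Most effective defense, and the only one here with a provable bound on membership leakage
  - Add calibrated noise during training (DP-SGD: clip per-example gradients, then add Gaussian noise)
  - In the illustrative comparison above, σ=1.0 reduces attack AUC to ~0.58 (near random)
- ✅ **Regularization**: Dropout, weight decay (less overfitting means a smaller member/non-member gap)
- ✅ **Model architectures**: Choose designs and training setups that generalize rather than memorize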
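
DP is singled out because its guarantee covers every possible membership attack, not just the ones that were tested. A standard hypothesis-testing consequence of (ε, δ)-DP, stated here without proof, is that any attack's true- and false-positive rates satisfy:

$$
\mathrm{TPR} \le e^{\varepsilon}\,\mathrm{FPR} + \delta
$$

Worked example: with ε = 1, δ = 0 and FPR = 1%, the bound gives TPR ≤ e · 0.01 ≈ 0.027. At that false-alarm rate, at most about 2.7% of members can be flagged. The bound loosens quickly as ε grows. At ε = 8, e⁸ ≈ 2981, so the same FPR yields a bound above 1, which says nothing.
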

**Impact:**
- Healthcare: Was this patient's data used?
- Finance: Is this transaction in the fraud database?
- Location: Has this person been here?

**Regulatory Implications:**
- GDPR: Right to know if data was used
- Demonstrates need for privacy-preserving ML
- DP provides a quantifiable privacy guarantee that can support compliance arguments

### Next Steps:
→ **Day 24**: SignGuard (comprehensive defense system)

---

**📁 Project Location**: `05_security_research/membership_inference_attack/`

**📚 Paper**: Shokri et al., "Membership Inference Attacks Against Machine Learning Models", S&P 2017