# 03. Differential Privacy

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand the key concepts of this topic
- Apply the topic using Python code examples
- Practice with small, realistic datasets or scenarios

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 06, Unit 3** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---



## 🚨 THE PROBLEM: We Need Mathematical Privacy Guarantees

**Remember the limitation from the previous notebook?**

We covered advanced privacy-enhancing technologies such as homomorphic encryption and secure multi-party computation. But one question remained open:

**How do we provide mathematical privacy guarantees?**

**The Problem**: We need stronger guarantees:
- ❌ **Mathematical privacy guarantees** (not just techniques)
- ❌ **Quantifiable privacy protection** (measurable privacy loss)
- ❌ **Formal privacy definitions** (differential privacy)
- ❌ **Provable privacy bounds** (epsilon-delta guarantees)

**We've learned:**
- ✅ How to use basic data protection (Notebook 1)
- ✅ How to use advanced privacy technologies (Notebook 2)
- ✅ Privacy-utility trade-offs

**But we haven't learned:**
- ❌ How to **quantify privacy** mathematically
- ❌ How to **prove privacy protection** formally
- ❌ How to use **differential privacy** for provable guarantees
- ❌ How to **measure privacy loss** (epsilon parameter)

**We need differential privacy** to:
1. Provide mathematical privacy guarantees
2. Quantify privacy loss (epsilon parameter)
3. Prove privacy protection formally
4. Enable privacy-preserving data analysis with provable guarantees

**This notebook solves that problem** by teaching you differential privacy with mathematical guarantees!

---

## 📚 Prerequisites (What You Need First)

**BEFORE starting this notebook**, you should have completed:
- ✅ **Example 1: Data Protection** - Understanding basic protection
- ✅ **Example 2: Privacy Technologies** - Understanding PETs
- ✅ **Basic Python knowledge**: Functions, data manipulation
- ✅ **Basic statistics**: Understanding of means, counts, noise

**If you haven't completed these**, you might struggle with:
- Understanding why mathematical guarantees matter
- Knowing how the epsilon parameter works
- Understanding privacy-utility trade-offs in differential privacy

---

## 🔗 Where This Notebook Fits

**This is the THIRD example in Unit 3** - it teaches you mathematical privacy guarantees!

**Why this example THIRD?**
- **Before** you can use differential privacy, you need basic protection (Example 1)
- **Before** you can use differential privacy, you need to understand PETs (Example 2)
- **Before** you can ensure GDPR compliance, you need privacy guarantees

**Builds on**:
- 📓 Example 1: Data Protection (basic protection strategies)
- 📓 Example 2: Privacy Technologies (advanced PETs)

**Leads to**:
- 📓 Example 4: GDPR Compliance (regulatory compliance)
- 📓 Example 5: Secure Development (secure coding practices)

**Why this order?**
1. Differential privacy provides **mathematical guarantees** (strongest privacy)
2. Differential privacy enables **provable privacy** (critical for compliance)
3. Differential privacy shows **quantifiable protection** (epsilon parameter)

---

## The Story: Provable Privacy

Imagine you're a scientist publishing research results. **Before** differential privacy, you'd say "we anonymized the data" but couldn't prove how private it was. **After** using differential privacy, you can say "we provide ε-differential privacy with ε = 0.5" - a mathematical guarantee!

The same applies to AI. **Before**, we used privacy techniques but couldn't prove how much privacy they provided. **Now** we learn differential privacy: adding noise calibrated by the epsilon parameter yields a provable privacy guarantee. **After** adopting it, we can state mathematically how much any one individual's data can affect what our systems release!

---

## Why Differential Privacy Matters

Differential privacy is essential for ethical AI:
- **Mathematical Guarantees**: Provable privacy protection
- **Quantifiable Privacy**: Measure privacy loss (epsilon)
- **Formal Definition**: Rigorous privacy definition
- **Compliance**: Meet strict privacy regulations
- **Trust**: Build confidence with provable guarantees
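
For reference, the "formal definition" and "quantifiable privacy" bullets above can be written out as follows. This is the standard textbook statement, given here as a sketch; the symbols ($\mathcal{M}$ for the randomized mechanism, $D$ and $D'$ for two datasets that differ in one record, $S$ for any set of outputs, $\Delta f$ for sensitivity) are notation introduced for this note and do not appear in the notebook's code.

```latex
% (epsilon, delta)-differential privacy: for all neighboring D, D' and all output sets S
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta

% Sensitivity of a numeric query f over neighboring datasets
\Delta f \;=\; \max_{D \sim D'} \lvert f(D) - f(D') \rvert

% Laplace mechanism: satisfies the definition above with delta = 0
\mathcal{M}(D) \;=\; f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)
```

The last line corresponds to `scale = sensitivity / epsilon` in the `add_laplace_noise` function used in this notebook: a smaller ε means a larger noise scale and a tighter bound on how much one record can change the output distribution.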

## Learning Objectives
1. Understand differential privacy and the epsilon parameter
2. Learn how to add noise for privacy
3. Understand privacy-utility trade-offs
4. Apply differential privacy to data analysis
5. Compare different epsilon values
6. Understand when to use differential privacy
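
To make objectives 1 and 2 concrete, the sketch below checks numerically what ε bounds when Laplace noise is added to a count. It compares the output densities for two neighboring answers and confirms that their ratio never exceeds $e^{\varepsilon}$. The helper names (`laplace_pdf`, `max_density_ratio`) and the example counts 1000 and 999 are hypothetical, chosen only for illustration.

```python
import numpy as np

def laplace_pdf(x, loc, scale):
    """Density of the Laplace distribution centered at loc."""
    return np.exp(-np.abs(x - loc) / scale) / (2.0 * scale)

def max_density_ratio(true_a, true_b, epsilon, sensitivity=1.0):
    """Largest ratio of output densities for two neighboring query answers."""
    scale = sensitivity / epsilon
    # Evaluate both densities on a grid of values the mechanism could release
    lo, hi = min(true_a, true_b) - 50, max(true_a, true_b) + 50
    outputs = np.linspace(lo, hi, 2001)
    ratio = laplace_pdf(outputs, true_a, scale) / laplace_pdf(outputs, true_b, scale)
    return ratio.max()

epsilon = 0.5
# Hypothetical neighboring counts: the dataset with and without one person
ratio = max_density_ratio(1000, 999, epsilon)
print(f"max ratio = {ratio:.4f}, bound e^epsilon = {np.exp(epsilon):.4f}")
```

Whatever value is released, an observer can be at most $e^{\varepsilon}$ times more confident that the count was 1000 rather than 999. This is the sense in which a smaller ε gives more privacy.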

In [1]:
"""
Unit 3: Privacy, Security, and Data Protection
Example 3: Differential Privacy
This example demonstrates differential privacy concepts:
- Adding noise for privacy
- Privacy-utility trade-offs
- Epsilon (ε) parameter
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
plt.rcParams['font.size'] = 10
plt.rcParams['figure.figsize'] = (14, 8)
sns.set_style("whitegrid")
# ============================================================================
# DIFFERENTIAL PRIVACY IMPLEMENTATION
# ============================================================================
def add_laplace_noise(value, epsilon=1.0, sensitivity=1.0):
    """
    Add Laplace noise for differential privacy
    epsilon (Œµ): privacy parameter (smaller = more private)
    sensitivity: maximum change in output from changing one record
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return value + noise
def differentially_private_mean(data, epsilon=1.0):
    """
    Compute differentially private mean
    """
    true_mean = np.mean(data)
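    # NOTE: taking the bounds from the data itself is a teaching simplification.
    # A strict guarantee needs bounds fixed in advance, independent of the data.
    sensitivity = (data.max() - data.min()) / len(data)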
    noisy_mean = add_laplace_noise(true_mean, epsilon, sensitivity)
    return true_mean, noisy_mean
def differentially_private_count(data, epsilon=1.0):
    """
    Compute differentially private count
    """
    true_count = len(data)
    sensitivity = 1.0  # Adding or removing one record changes the count by at most 1
    noisy_count = add_laplace_noise(true_count, epsilon, sensitivity)
    return true_count, max(0, int(noisy_count))  # Ensure non-negative
# ============================================================================
# PRIVACY-UTILITY TRADE-OFF
# ============================================================================
def analyze_epsilon_impact(data, epsilon_values):
    """
    Analyze how different epsilon values affect privacy and utility
    """
    results = []
    true_mean = np.mean(data)
    true_count = len(data)
    for epsilon in epsilon_values:
        # Compute multiple times to show variance
        noisy_means = []
        noisy_counts = []
        for _ in range(10):
            _, noisy_mean = differentially_private_mean(data, epsilon)
            _, noisy_count = differentially_private_count(data, epsilon)
            noisy_means.append(noisy_mean)
            noisy_counts.append(noisy_count)
        mean_error = np.mean([abs(m - true_mean) for m in noisy_means])
        count_error = np.mean([abs(c - true_count) for c in noisy_counts])
        results.append({
            'epsilon': epsilon,
            'privacy_level': 1.0 / epsilon,  # Informal score: higher epsilon = less private
            'mean_error': mean_error,
            'count_error': count_error,
            'noisy_mean_avg': np.mean(noisy_means),
            'noisy_count_avg': np.mean(noisy_counts)
        })
    return results
# ============================================================================
# VISUALIZATIONS
# ============================================================================
def plot_differential_privacy_comparison(data, epsilon_values):
    """
    Plot comparison of differential privacy with different epsilon values
    """
    results = analyze_epsilon_impact(data, epsilon_values)
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    epsilons = [r['epsilon'] for r in results]
    mean_errors = [r['mean_error'] for r in results]
    count_errors = [r['count_error'] for r in results]
    privacy_levels = [r['privacy_level'] for r in results]
    # Mean error vs epsilon
    axes[0, 0].plot(epsilons, mean_errors, marker='o', linewidth=2, markersize=8, color='#e74c3c')
    axes[0, 0].set_xlabel('Epsilon (ε)', fontsize=11, fontweight='bold')
    axes[0, 0].set_ylabel('Mean Error', fontsize=11, fontweight='bold')
    axes[0, 0].set_title('Privacy vs Accuracy: Mean Estimation', fontsize=12, fontweight='bold')
    axes[0, 0].grid(alpha=0.3)
    axes[0, 0].set_xscale('log')
    # Count error vs epsilon
    axes[0, 1].plot(epsilons, count_errors, marker='s', linewidth=2, markersize=8, color='#3498db')
    axes[0, 1].set_xlabel('Epsilon (ε)', fontsize=11, fontweight='bold')
    axes[0, 1].set_ylabel('Count Error', fontsize=11, fontweight='bold')
    axes[0, 1].set_title('Privacy vs Accuracy: Count Estimation', fontsize=12, fontweight='bold')
    axes[0, 1].grid(alpha=0.3)
    axes[0, 1].set_xscale('log')
    # Privacy level
    axes[1, 0].bar(range(len(epsilons)), privacy_levels, color='#9b59b6', alpha=0.8)
    axes[1, 0].set_xlabel('Epsilon Value Index', fontsize=11, fontweight='bold')
    axes[1, 0].set_ylabel('Privacy Level (Higher is Better)', fontsize=11, fontweight='bold')
    axes[1, 0].set_title('Privacy Level by Epsilon', fontsize=12, fontweight='bold')
    axes[1, 0].set_xticks(range(len(epsilons)))
    axes[1, 0].set_xticklabels([f'ε={e:.2f}' for e in epsilons], rotation=15)
    axes[1, 0].grid(axis='y', alpha=0.3)
    # Privacy-utility trade-off
    axes[1, 1].scatter(privacy_levels, mean_errors, s=200, alpha=0.7, 
                      c=epsilons, cmap='RdYlGn_r', edgecolors='black', linewidth=2)
    axes[1, 1].set_xlabel('Privacy Level', fontsize=11, fontweight='bold')
    axes[1, 1].set_ylabel('Mean Error (Lower is Better)', fontsize=11, fontweight='bold')
    axes[1, 1].set_title('Privacy-Utility Trade-off', fontsize=12, fontweight='bold')
    axes[1, 1].grid(alpha=0.3)
    cbar = plt.colorbar(axes[1, 1].collections[0], ax=axes[1, 1])
    cbar.set_label('Epsilon (ε)', fontsize=10)
    plt.tight_layout()
    # Save under the same filename that the confirmation message reports
    plt.savefig('differential_privacy_analysis.png', dpi=300, bbox_inches='tight')
    print("✅ Saved: differential_privacy_analysis.png")
    plt.close()
# ============================================================================
# MAIN EXECUTION
# ============================================================================
if __name__ == "__main__":
    print("="*80)
    print("Unit 3 - Example 3: Differential Privacy")
    print("="*80)
    # Generate sample data
    np.random.seed(42)
    data = np.random.normal(50000, 15000, 1000)  # Salary data
    print(f"\nDataset: {len(data)} samples")
    print(f"True mean: ${np.mean(data):,.2f}")
    print(f"True count: {len(data)}")
    # Demonstrate differential privacy
    print("\n" + "="*80)
    print("Differential Privacy Demonstration")
    print("="*80)
    epsilon_values = [0.1, 0.5, 1.0, 2.0, 5.0]
    for epsilon in epsilon_values:
        true_mean, noisy_mean = differentially_private_mean(data, epsilon)
        true_count, noisy_count = differentially_private_count(data, epsilon)
        print(f"\nEpsilon (ε) = {epsilon}:")
        print(f"  True mean: ${true_mean:,.2f}, Noisy mean: ${noisy_mean:,.2f}")
        print(f"  Error: ${abs(noisy_mean - true_mean):,.2f}")
        print(f"  True count: {true_count}, Noisy count: {noisy_count}")
        print(f"  Privacy level: {1.0 / epsilon:.2f} (higher = more private)")
    # Create visualizations
    print("\n" + "="*80)
    print("Creating Visualizations...")
    print("="*80)
    plot_differential_privacy_comparison(data, epsilon_values)
    # Summary
    print("\n" + "="*80)
    print("SUMMARY")
    print("="*80)
    print("\nKey Takeaways:")
    print("1. Differential privacy adds controlled noise to protect individual privacy")
    print("2. Epsilon (ε) controls privacy level: smaller ε = more private")
    print("3. There is a trade-off between privacy and data utility")
    print("4. Differential privacy provides mathematical privacy guarantees")
    print("5. Choose epsilon based on privacy requirements and acceptable error")
    print("="*80 + "\n")


Unit 3 - Example 3: Differential Privacy

Dataset: 1000 samples
True mean: $50,289.98
True count: 1000

Differential Privacy Demonstration

Epsilon (ε) = 0.1:
  True mean: $50,289.98, Noisy mean: $49,126.14
  Error: $1,163.84
  True count: 1000, Noisy count: 984
  Privacy level: 10.00 (higher = more private)

Epsilon (ε) = 0.5:
  True mean: $50,289.98, Noisy mean: $50,357.79
  Error: $67.81
  True count: 1000, Noisy count: 1001
  Privacy level: 2.00 (higher = more private)

Epsilon (ε) = 1.0:
  True mean: $50,289.98, Noisy mean: $49,996.09
  Error: $293.89
  True count: 1000, Noisy count: 1002
  Privacy level: 1.00 (higher = more private)

Epsilon (ε) = 2.0:
  True mean: $50,289.98, Noisy mean: $50,169.53
  Error: $120.45
  True count: 1000, Noisy count: 1000
  Privacy level: 0.50 (higher = more private)

Epsilon (ε) = 5.0:
  True mean: $50,289.98, Noisy mean: $50,301.50
  Error: $11.52
  True count: 1000, Noisy count: 1000
  Privacy level: 0.20 (higher = more private)

Creating Visualizations...

✅ Saved: differential_privacy_analysis.png

SUMMARY

Key Takeaways:
1. Differential privacy adds controlled noise to protect individual privacy
2. Epsilon (ε) controls privacy level: smaller ε = more private
3. There is a trade-off between privacy and data utility
4. Differential privacy provides mathematical privacy guarantees
5. Choose epsilon based on privacy requirements and acceptable error
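
One caveat about the demo above: `differentially_private_mean` derives its sensitivity from `data.max()` and `data.min()`, so the noise scale itself depends on the private data. A common remedy is to clip every value to bounds chosen in advance. The sketch below illustrates that idea; `dp_mean_clipped`, its `lower`/`upper` parameters, and the salary range 0 to 150,000 are assumptions made for this example, not part of the original notebook, and the dataset size is treated as public.

```python
import numpy as np

def dp_mean_clipped(data, epsilon, lower, upper, rng=None):
    """Laplace-mechanism mean with bounds chosen independently of the data.

    lower/upper are assumed to be public domain knowledge (hypothetical
    parameters, not part of the notebook's original function).
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(data, dtype=float), lower, upper)
    # Replacing one record moves the clipped mean by at most (upper - lower) / n
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(0.0, sensitivity / epsilon)
    return clipped.mean() + noise

rng = np.random.default_rng(42)
salaries = rng.normal(50000, 15000, 1000)  # synthetic data, similar to the demo
for eps in (0.1, 1.0, 5.0):
    noisy = dp_mean_clipped(salaries, eps, lower=0, upper=150_000, rng=rng)
    print(f"epsilon={eps}: noisy mean = {noisy:,.2f}")
```

Wider bounds lose less information to clipping but increase the sensitivity, and with it the noise. Choosing them is part of the same privacy-utility trade-off that ε controls.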



---

## 🚫 When Differential Privacy Hits a Limitation

### The Limitation We Discovered

We've learned differential privacy with mathematical guarantees. **But there's still a challenge:**

**How do we ensure our AI systems comply with privacy regulations like GDPR?**

Differential privacy works well when:
- ✅ We can add noise to queries
- ✅ We can quantify privacy loss (epsilon)
- ✅ We can prove mathematical privacy guarantees
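
Quantifying privacy loss also means adding it up across queries. Under basic sequential composition, releasing the answers to k queries that each satisfy ε-differential privacy costs at most k·ε in total. For example, `analyze_epsilon_impact` repeats each noisy query 10 times, so publishing all ten answers would spend 10·ε per statistic. The sketch below tracks such a budget; the `PrivacyBudget` class is a hypothetical helper written for illustration and is not taken from any library.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Reserve epsilon for one query, or raise if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.total_epsilon - self.spent

budget = PrivacyBudget(total_epsilon=1.0)
for query in range(10):
    remaining = budget.spend(0.1)  # ten queries at epsilon = 0.1 each
print(f"spent = {budget.spent:.2f}, remaining = {remaining:.2f}")
```

Calling `budget.spend(0.1)` once more raises `RuntimeError`, which is where a real system would stop answering queries on this dataset.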

**But real-world AI systems must also:**
- ❌ **Comply with regulations** (GDPR, CCPA, etc.)
- ❌ **Implement data subject rights** (right to access, deletion, etc.)
- ❌ **Document privacy practices** (privacy impact assessments)
- ❌ **Meet legal requirements** (consent, data minimization, etc.)

### Why This Is a Problem

When we have privacy techniques but don't comply with regulations:
- We may violate privacy laws
- We may face legal penalties
- We may not meet data subject rights
- We may not document our privacy practices properly

### The Solution: GDPR Compliance

We need **GDPR compliance practices** to:
1. Implement data subject rights
2. Document privacy practices
3. Meet legal requirements
4. Ensure regulatory compliance

**This is exactly what we'll learn in the next notebook: GDPR Compliance!**

---

## ➡️ Next Steps

**You've completed this notebook!** Now you understand:
- ✅ How to use basic data protection (Notebook 1)
- ✅ How to use advanced privacy technologies (Notebook 2)
- ✅ How to use differential privacy (This notebook!)
- ✅ **The limitation**: We need regulatory compliance!

**Next notebook**: `04_gdpr_compliance.ipynb`
- Learn about GDPR requirements
- Implement data subject rights
- Document privacy practices
- Ensure regulatory compliance
