# 02. Privacy-Enhancing Technologies (PETs)

## 📚 Learning Objectives

By completing this notebook, you will:
- Understand the key concepts of this topic
- Apply the topic using Python code examples
- Practice with small, realistic datasets or scenarios

## 🔗 Prerequisites

- ✅ Basic Python
- ✅ Basic NumPy/Pandas (when applicable)

---

## Official Structure Reference

This notebook supports **Course 06, Unit 3** requirements from `DETAILED_UNIT_DESCRIPTIONS.md`.

---



## 🚨 THE PROBLEM: We Need to Compute on Encrypted Data

**Remember the limitation from the previous notebook?**

We learned basic data protection strategies such as encryption, anonymization, and pseudonymization. But that left one question open:

**What if we need to compute on encrypted data without decrypting it?**

**The Problem**: Advanced AI use cases often require:
- ❌ **Computing on encrypted data** without decryption
- ❌ **Collaborative computation** without sharing raw data
- ❌ **Privacy-preserving machine learning** on sensitive data
- ❌ **Advanced privacy guarantees** beyond basic protection

**We've learned:**
- ✅ How to encrypt sensitive data (Notebook 1)
- ✅ How to anonymize and pseudonymize data
- ✅ Basic data protection strategies

**But we haven't learned:**
- ❌ How to **compute on encrypted data** without decrypting
- ❌ How to enable **secure multi-party computation**
- ❌ How to use **homomorphic encryption** for privacy-preserving ML
- ❌ How to apply **advanced privacy technologies**

**We need privacy-enhancing technologies (PETs)** to:
1. Compute on encrypted data (homomorphic encryption)
2. Enable secure multi-party computation
3. Provide stronger privacy guarantees
4. Support privacy-preserving machine learning

**This notebook solves that problem** by teaching you advanced privacy-enhancing technologies like homomorphic encryption and secure multi-party computation!

---

## 📚 Prerequisites (What You Need First)

**BEFORE starting this notebook**, you should have completed:
- ✅ **Example 1: Data Protection** - Understanding basic protection strategies
- ✅ **Basic Python knowledge**: Functions, data manipulation
- ✅ **Understanding of encryption**: Basic encryption concepts (from Example 1)

**If you haven't completed these**, you might struggle with:
- Understanding why advanced privacy technologies are needed
- Knowing how homomorphic encryption works
- Understanding secure multi-party computation concepts

---

## 🔗 Where This Notebook Fits

**This is the SECOND example in Unit 3** - it teaches you advanced privacy technologies!

**Why is this example second?**
- **Before** you can use advanced PETs, you need basic data protection (Example 1)
- **Before** you can implement differential privacy, you need to understand PETs
- **Before** you can ensure GDPR compliance, you need privacy technologies

**Builds on**:
- 📓 Example 1: Data Protection (we learned basic protection, now we learn advanced!)

**Leads to**:
- 📓 Example 3: Differential Privacy (mathematical privacy guarantees)
- 📓 Example 4: GDPR Compliance (regulatory compliance)
- 📓 Example 5: Secure Development (secure coding practices)

**Why this order?**
1. PETs provide **advanced solutions** (needed after basic protection)
2. PETs enable **privacy-preserving ML** (critical for AI)
3. PETs show **cutting-edge techniques** (homomorphic encryption, SMPC)

---

## The Story: Computing Without Revealing

Imagine you're a bank that needs to calculate average account balances across multiple banks without revealing individual balances. **Before** advanced PETs, you'd have to share data (privacy risk!). **After** using secure multi-party computation, you can compute the average without any bank seeing others' data!
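
The bank scenario above can be sketched with additive secret sharing, one common building block of SMPC. This is a minimal single-process illustration, not a secure protocol: the names `MODULUS`, `make_shares`, and `secure_average` are hypothetical, and a real deployment would send each share to a different bank over a protected channel.

```python
import secrets

# Hypothetical public modulus; it only needs to exceed any possible total.
MODULUS = 2**61 - 1

def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares add up to the value.
    shares.append((value - sum(shares)) % modulus)
    return shares

def secure_average(balances, modulus=MODULUS):
    """Average the balances using only shares and published partial sums."""
    n = len(balances)
    # Row i holds the shares bank i creates; column j is what bank j receives.
    all_shares = [make_shares(b, n, modulus) for b in balances]
    # Each bank adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    total = sum(partial_sums) % modulus
    return total / n

balances = [1000, 2000, 1500]  # same private values the notebook uses
print(secure_average(balances))  # 1500.0
```

Any single share is a uniformly random number, so a bank that sees one share from a peer learns nothing about that peer's balance. Only the combination of all partial sums reveals the total.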

The same applies to AI. **Before** PETs, we could encrypt data but not compute on it. With homomorphic encryption, we can compute on encrypted data without decrypting it. **After** adopting PETs, we can train models on encrypted data while preserving privacy!
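
To make "compute on encrypted data without decrypting" concrete, here is a toy version of the Paillier cryptosystem, which supports addition on ciphertexts. The notebook does not name a specific scheme, so Paillier is an assumption chosen for illustration. The primes 61 and 53 are far too small to be secure, the function names are hypothetical, and the code assumes Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p=61, q=53):
    """Toy Paillier keys from two tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

n, private_key = keygen()
enc_a = encrypt(n, 50)
enc_b = encrypt(n, 75)
enc_sum = add_encrypted(n, enc_a, enc_b)  # computed without the private key
print(decrypt(n, private_key, enc_sum))   # 125
```

The party that calls `add_encrypted` needs only the public value `n`, so it never sees 50, 75, or 125. Only the holder of `private_key` can read the result.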

---

## Why Privacy-Enhancing Technologies Matter

Privacy-enhancing technologies are essential for ethical AI:
- **Privacy-Preserving ML**: Train models on encrypted data
- **Collaborative AI**: Enable multi-party computation without data sharing
- **Strong Guarantees**: Provide cryptographic guarantees about what each party can learn
- **Compliance**: Meet strict privacy regulations
- **Trust**: Build user confidence in privacy-preserving systems

## Learning Objectives
1. Understand homomorphic encryption concepts
2. Learn secure multi-party computation (SMPC)
3. Understand privacy-utility trade-offs
4. Compare different PETs
5. Apply PETs to privacy-preserving machine learning
6. Understand when to use each technology

In [1]:
"""
Unit 3: Privacy, Security, and Data Protection
Example 2: Privacy-Enhancing Technologies (PETs)
This example demonstrates privacy-enhancing technologies:
- Secure Multi-Party Computation (SMPC) concepts
- Homomorphic encryption concepts
- Privacy-utility trade-offs
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
plt.rcParams['font.size'] = 10
plt.rcParams['figure.figsize'] = (14, 8)
sns.set_style("whitegrid")
# ============================================================================
# SECURE MULTI-PARTY COMPUTATION (SMPC) SIMULATION
# ============================================================================
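def smpc_sum_simulation(parties_data, n_parties=None):
    """
    Simulate the idea behind Secure Multi-Party Computation (SMPC):
    compute a sum while each party only publishes a masked value.
    This is a conceptual sketch, not a secure protocol.
    """
    # Default to one mask per party so the two lists cannot get out of step
    if n_parties is None:
        n_parties = len(parties_data)
    if n_parties != len(parties_data):
        raise ValueError("n_parties must match len(parties_data)")
    # Each party masks its value with a random offset
    noise = [np.random.normal(0, 100) for _ in range(n_parties)]
    noisy_data = [data + noise[i] for i, data in enumerate(parties_data)]
    # Sum of the masked values
    noisy_sum = sum(noisy_data)
    # Remove the masks to recover the actual sum. Here they are subtracted in
    # the clear; a real SMPC protocol cancels them without revealing them.
    actual_sum = noisy_sum - sum(noise)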
    return {
        'parties_data': parties_data, 'noisy_data': noisy_data,
        'noisy_sum': noisy_sum,
        'actual_sum': actual_sum,
        'privacy_preserved': True
    }
# ============================================================================
# HOMOMORPHIC ENCRYPTION CONCEPTS
# ============================================================================
def homomorphic_encryption_demo():
    """
    Demonstrate concept of homomorphic encryption
    (simplified - real implementation is much more complex)
    """
    # Placeholder numbers standing in for ciphertexts. They are NOT produced by
    # a real encryption scheme, so they cannot actually be decrypted to 50 and 75.
    encrypted_a = 100  # Stands in for the ciphertext of 50
    encrypted_b = 200  # Stands in for the ciphertext of 75
    # Homomorphic addition: the operation touches only the ciphertexts
    encrypted_sum = encrypted_a + encrypted_b  # 300 stands in for the ciphertext of 125
    # In a real additively homomorphic scheme, decrypting encrypted_sum yields 50 + 75 = 125
    return {
        'encrypted_a': encrypted_a,
        'encrypted_b': encrypted_b,
        'encrypted_sum': encrypted_sum,
        'actual_a': 50,
        'actual_b': 75,
        'actual_sum': 125
    }
# ============================================================================
# PRIVACY-UTILITY TRADE-OFF ANALYSIS
# ============================================================================
def analyze_privacy_utility_tradeoff():
    """
    Analyze trade-offs between privacy and utility for different PETs
    """
    technologies = {
        'No Protection': {'privacy': 1, 'utility': 10, 'cost': 1, 'performance': 10},
        'Anonymization': {'privacy': 6, 'utility': 8, 'cost': 2, 'performance': 9},
        'Pseudonymization': {'privacy': 7, 'utility': 7, 'cost': 3, 'performance': 8},
        'Differential Privacy': {'privacy': 9, 'utility': 6, 'cost': 4, 'performance': 7},
        'SMPC': {'privacy': 10, 'utility': 5, 'cost': 9, 'performance': 4},
        'Homomorphic Encryption': {'privacy': 10, 'utility': 4, 'cost': 10, 'performance': 3}
    }
    return technologies
# ============================================================================
# VISUALIZATIONS
# ============================================================================
def plot_pet_comparison(technologies):
    """
    Plot comparison of Privacy-Enhancing Technologies
    """
    tech_names = list(technologies.keys())
    privacy_scores = [tech['privacy'] for tech in technologies.values()]
    utility_scores = [tech['utility'] for tech in technologies.values()]
    cost_scores = [tech['cost'] for tech in technologies.values()]
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    # Privacy scores
    axes[0].barh(tech_names, privacy_scores, color='#9b59b6', alpha=0.8)
    axes[0].set_title('Privacy Level (Higher is Better)', fontsize=12, fontweight='bold')
    axes[0].set_xlabel('Privacy Score (1-10)')
    axes[0].grid(axis='x', alpha=0.3)
    axes[0].set_xlim([0, 11])
    # Utility scores
    axes[1].barh(tech_names, utility_scores, color='#2ecc71', alpha=0.8)
    axes[1].set_title('Data Utility (Higher is Better)', fontsize=12, fontweight='bold')
    axes[1].set_xlabel('Utility Score (1-10)')
    axes[1].grid(axis='x', alpha=0.3)
    axes[1].set_xlim([0, 11])
    # Cost scores
    axes[2].barh(tech_names, cost_scores, color='#e74c3c', alpha=0.8)
    axes[2].set_title('Implementation Cost (Lower is Better)', fontsize=12, fontweight='bold')
    axes[2].set_xlabel('Cost Score (1-10)')
    axes[2].grid(axis='x', alpha=0.3)
    axes[2].set_xlim([0, 11])
    plt.tight_layout()
    plt.savefig('pet_comparison.png', dpi=300, bbox_inches='tight')
    print("✅ Saved: pet_comparison.png")
    plt.close()
def plot_privacy_utility_tradeoff(technologies):
    """
    Plot privacy-utility trade-off curve
    """
    fig, ax = plt.subplots(figsize=(10, 8))
    tech_names = list(technologies.keys())
    privacy = [tech['privacy'] for tech in technologies.values()]
    utility = [tech['utility'] for tech in technologies.values()]
    scatter = ax.scatter(privacy, utility, s=200, alpha=0.7, c=range(len(tech_names)), cmap='viridis', edgecolors='black', linewidth=2)
    for i, name in enumerate(tech_names):
        ax.annotate(name, (privacy[i], utility[i]), 
                   xytext=(5, 5), textcoords='offset points', fontsize=9)
    ax.set_xlabel('Privacy Level (1-10)', fontsize=11, fontweight='bold')
    ax.set_ylabel('Data Utility (1-10)', fontsize=11, fontweight='bold')
    ax.set_title('Privacy-Utility Trade-off for Different PETs', fontsize=12, fontweight='bold')
    ax.grid(alpha=0.3)
    ax.set_xlim([0, 11])
    ax.set_ylim([0, 11])
    plt.colorbar(scatter, ax=ax, label='Technology Index')
    plt.tight_layout()
    plt.savefig('privacy_utility_tradeoff.png', dpi=300, bbox_inches='tight')
    print("✅ Saved: privacy_utility_tradeoff.png")
    plt.close()
# ============================================================================
# MAIN EXECUTION
# ============================================================================
if __name__ == "__main__":
    print("="*80)
    print("Unit 3 - Example 2: Privacy-Enhancing Technologies (PETs)")
    print("="*80)
    # SMPC demonstration
    print("\n1. Secure Multi-Party Computation (SMPC):")
    parties_data = [1000, 2000, 1500]  # Three parties' private data
    smpc_result = smpc_sum_simulation(parties_data, n_parties=3)
    print(f"  Party 1 data: {smpc_result['parties_data'][0]}")  # Party 1: First party's data
    print(f"  Party 2 data: {smpc_result['parties_data'][1]}")  # Party 2: Second party's data
    print(f"  Party 3 data: {smpc_result['parties_data'][2]}")  # Party 3: Third party's data
    print(f"  Computed sum: {smpc_result['actual_sum']}")  # Sum: Computed sum without revealing individual values
    print(f"  Privacy preserved: {smpc_result['privacy_preserved']}")  # Privacy: Whether privacy was preserved
    
    # Homomorphic encryption demonstration
    print("\n2. Homomorphic Encryption:")
    he_result = homomorphic_encryption_demo()  # Demo: Demonstrate homomorphic encryption concepts
    print(f"  Encrypted value A: {he_result['encrypted_a']}")  # Encrypted A: First encrypted value
    print(f"  Encrypted value B: {he_result['encrypted_b']}")  # Encrypted B: Second encrypted value
    print(f"  Encrypted sum (computed on encrypted data): {he_result['encrypted_sum']}")  # Encrypted sum: Sum computed on encrypted data
    print(f"  Actual sum: {he_result['actual_sum']}")  # Actual sum: Real sum of decrypted values
    
    # Privacy-utility trade-off
    print("\n3. Privacy-Utility Trade-off Analysis:")
    technologies = analyze_privacy_utility_tradeoff()  # Analyze: Get trade-off metrics for different PETs
    for tech, metrics in technologies.items():  # Loop through technologies: Process each PET
        print(f"\n{tech}:")  # Print: Technology name
        print(f"  Privacy: {metrics['privacy']}")  # Privacy: Privacy score (1-10)
        print(f"  Utility: {metrics['utility']}")  # Utility: Data utility score (1-10)
        print(f"  Cost: {metrics['cost']}")  # Cost: Implementation cost score (1-10)
    # Create visualizations
    print("\n" + "="*80)
    print("Creating Visualizations...")
    print("="*80)
    plot_pet_comparison(technologies)
    plot_privacy_utility_tradeoff(technologies)
    # Summary
    print("\n" + "="*80)
    print("SUMMARY")
    print("="*80)
    print("\nKey Takeaways:")
    print("1. SMPC allows computation on data without revealing individual values")
    print("2. Homomorphic encryption enables computation on encrypted data")
    print("3. Different PETs have different privacy-utility trade-offs")
    print("4. Higher privacy often comes at the cost of utility or performance")
    print("5. Choose PET based on specific privacy and utility requirements")
    print("="*80 + "\n")


Unit 3 - Example 2: Privacy-Enhancing Technologies (PETs)

1. Secure Multi-Party Computation (SMPC):
  Party 1 data: 1000
  Party 2 data: 2000
  Party 3 data: 1500
  Computed sum: 4500.0
  Privacy preserved: True

2. Homomorphic Encryption:
  Encrypted value A: 100
  Encrypted value B: 200
  Encrypted sum (computed on encrypted data): 300
  Actual sum: 125

3. Privacy-Utility Trade-off Analysis:

No Protection:
  Privacy: 1
  Utility: 10
  Cost: 1

Anonymization:
  Privacy: 6
  Utility: 8
  Cost: 2

Pseudonymization:
  Privacy: 7
  Utility: 7
  Cost: 3

Differential Privacy:
  Privacy: 9
  Utility: 6
  Cost: 4

SMPC:
  Privacy: 10
  Utility: 5
  Cost: 9

Homomorphic Encryption:
  Privacy: 10
  Utility: 4
  Cost: 10

Creating Visualizations...


✅ Saved: pet_comparison.png
✅ Saved: privacy_utility_tradeoff.png

SUMMARY

Key Takeaways:
1. SMPC allows computation on data without revealing individual values
2. Homomorphic encryption enables computation on encrypted data
3. Different PETs have different privacy-utility trade-offs
4. Higher privacy often comes at the cost of utility or performance
5. Choose PET based on specific privacy and utility requirements
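
Takeaway 5 can be made concrete with a small selection helper. The sketch below is illustrative only: `choose_pet`, `min_privacy`, and `max_cost` are hypothetical names that do not appear in the notebook, and the scores are the same subjective 1-10 ratings returned by `analyze_privacy_utility_tradeoff()`.

```python
# Ratings copied from analyze_privacy_utility_tradeoff() (subjective 1-10 scores).
TECHNOLOGIES = {
    'No Protection': {'privacy': 1, 'utility': 10, 'cost': 1},
    'Anonymization': {'privacy': 6, 'utility': 8, 'cost': 2},
    'Pseudonymization': {'privacy': 7, 'utility': 7, 'cost': 3},
    'Differential Privacy': {'privacy': 9, 'utility': 6, 'cost': 4},
    'SMPC': {'privacy': 10, 'utility': 5, 'cost': 9},
    'Homomorphic Encryption': {'privacy': 10, 'utility': 4, 'cost': 10},
}

def choose_pet(technologies, min_privacy, max_cost):
    """Return the highest-utility technology that meets both constraints."""
    candidates = {
        name: scores
        for name, scores in technologies.items()
        if scores['privacy'] >= min_privacy and scores['cost'] <= max_cost
    }
    if not candidates:
        return None  # no technology satisfies the requirements
    return max(candidates, key=lambda name: candidates[name]['utility'])

print(choose_pet(TECHNOLOGIES, min_privacy=7, max_cost=5))   # Pseudonymization
print(choose_pet(TECHNOLOGIES, min_privacy=10, max_cost=9))  # SMPC
print(choose_pet(TECHNOLOGIES, min_privacy=10, max_cost=5))  # None
```

A `None` result is itself useful: it signals that the privacy requirement cannot be met within the stated budget, so one of the two constraints has to be relaxed.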



---

## 🚫 When Privacy Technologies Hit a Limitation

### The Limitation We Discovered

We've learned advanced privacy-enhancing technologies like homomorphic encryption and secure multi-party computation. **But there's still a challenge:**

**How do we provide mathematical privacy guarantees?**

Privacy technologies work well when:
- ✅ We can use homomorphic encryption for specific operations
- ✅ We can enable secure multi-party computation
- ✅ We understand privacy-utility trade-offs

**But we need stronger guarantees:**
- ❌ **Mathematical privacy guarantees** (not just techniques)
- ❌ **Quantifiable privacy protection** (measurable privacy loss)
- ❌ **Formal privacy definitions** (differential privacy)
- ❌ **Provable privacy bounds** (epsilon-delta guarantees)
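
For reference, the "epsilon-delta guarantee" in the last bullet is usually stated as follows. This is the standard textbook definition rather than something taken from this notebook, and the symbols $M$, $D$, $D'$, and $S$ are introduced here only for illustration. A randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for every pair of datasets $D$ and $D'$ that differ in one record and every set of outputs $S$:

$$
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
$$

Smaller $\varepsilon$ and $\delta$ mean the output distribution changes less when one person's record is added or removed, which is what makes the privacy loss measurable.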

### Why This Is a Problem

When we use privacy technologies without formal guarantees:
- We don't know how much privacy we're actually providing
- We can't quantify privacy loss
- We can't prove our systems are private
- We may believe a system is private when it is not

### The Solution: Differential Privacy

We need **differential privacy** to:
1. Provide mathematical privacy guarantees
2. Quantify privacy loss (epsilon parameter)
3. Prove privacy protection formally
4. Enable privacy-preserving data analysis with provable guarantees
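
As a preview of how the epsilon parameter quantifies privacy loss, the sketch below applies the Laplace mechanism to a simple counting query. It is a minimal illustration under stated assumptions: the function names and the `balances` list are hypothetical, and the next notebook may implement this differently.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, threshold, epsilon, rng):
    """Count values above threshold, then add noise with scale 1/epsilon."""
    true_count = sum(1 for v in values if v > threshold)
    sensitivity = 1  # one person joining or leaving changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed only so the demo is repeatable
balances = [1000, 2000, 1500, 3200, 800]  # hypothetical account balances
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(balances, threshold=1200, epsilon=epsilon, rng=rng)
    print(f"epsilon={epsilon}: noisy count = {noisy:.2f} (true count = 3)")
```

A smaller epsilon means a larger noise scale and therefore stronger privacy but a less accurate answer, which is the privacy-utility trade-off from this notebook expressed as a single tunable number.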

**This is exactly what we'll learn in the next notebook: Differential Privacy!**

---

## ➡️ Next Steps

**You've completed this notebook!** Now you understand:
- ✅ How to use basic data protection (Notebook 1)
- ✅ How to use advanced privacy technologies (This notebook!)
- ✅ **The limitation**: We need mathematical privacy guarantees!

**Next notebook**: `03_differential_privacy.ipynb`
- Learn about differential privacy and epsilon parameter
- Understand mathematical privacy guarantees
- Apply differential privacy to data analysis
