# Connect: FHE in Privacy-Preserving Machine Learning

**Module 11** | Real-World Connections

*Hospitals want cloud ML on patient data without exposing it. FHE makes this
possible: encrypt the features, send ciphertexts to the cloud, run the model
homomorphically, decrypt only the result.*

## Introduction

A hospital has patient health data (blood pressure, cholesterol, BMI, etc.) and wants
a cloud provider to run a diagnostic ML model on this data. The problem: sending raw
patient data to the cloud can violate privacy regulations such as HIPAA and GDPR.

**FHE solution:**
1. Hospital encrypts each patient's features with FHE.
2. Cloud receives only ciphertexts --- never sees the raw data.
3. Cloud evaluates the ML model **homomorphically** on the ciphertexts.
4. Cloud returns encrypted predictions.
5. Hospital decrypts to get the prediction --- cloud never learned anything.

In this notebook, we implement this workflow using **Paillier encryption**
(from Notebook 11b). Paillier is additively homomorphic rather than fully
homomorphic: it supports addition and scalar multiplication on ciphertexts,
which is enough for linear models.

## Step 1: Set Up Paillier Encryption

We reuse the Paillier implementation from Notebook 11b. Paillier gives us:
- $\text{Enc}(m_1) \cdot \text{Enc}(m_2) = \text{Enc}(m_1 + m_2) \pmod{n^2}$ (homomorphic addition)
- $\text{Enc}(m)^k = \text{Enc}(k \cdot m) \pmod{n^2}$ (scalar multiplication)

These two operations are exactly what we need for linear models: $\hat{y} = w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b$.
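
Both identities follow directly from the form of a Paillier ciphertext. As a short derivation, assuming the encryption used in the code cell below, $\text{Enc}(m) = g^{m} r^{n} \bmod n^2$ with a fresh random $r$ coprime to $n$:

$$
\begin{aligned}
\text{Enc}(m_1) \cdot \text{Enc}(m_2) &= \left(g^{m_1} r_1^{n}\right)\left(g^{m_2} r_2^{n}\right) \bmod n^2 = g^{m_1 + m_2} (r_1 r_2)^{n} \bmod n^2 \\
\text{Enc}(m)^{k} &= \left(g^{m} r^{n}\right)^{k} \bmod n^2 = g^{k m} \left(r^{k}\right)^{n} \bmod n^2
\end{aligned}
$$

The right-hand sides are valid encryptions of $m_1 + m_2$ and $k \cdot m$, with randomness $r_1 r_2$ and $r^k$ respectively. Because the exponent of $g$ is only meaningful modulo $n$, the decrypted results are $(m_1 + m_2) \bmod n$ and $(k \cdot m) \bmod n$.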

In [None]:
import random

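# === Paillier key generation (SageMath) ===
# Toy primes for readability only; a modulus this small offers no security.
p_pail, q_pail = 17, 19
n = p_pail * q_pail       # 323
n2 = n^2                  # 104329 (in Sage, ^ is exponentiation; in plain Python use n**2)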
lam = lcm(p_pail - 1, q_pail - 1)  # 144
g = n + 1

def L(x, n):
    return (x - 1) // n

mu = inverse_mod(L(power_mod(g, lam, n2), n), n)

def paillier_encrypt(m, n, g, n2):
    """Encrypt m (0 <= m < n) with random r."""
    r = random.randint(1, n - 1)
    while gcd(r, n) != 1:
        r = random.randint(1, n - 1)
    return (power_mod(g, m % n, n2) * power_mod(r, n, n2)) % n2

def paillier_decrypt(c, lam, mu, n, n2):
    """Decrypt ciphertext c."""
    x = power_mod(c, lam, n2)
    return (L(x, n) * mu) % n

def paillier_add(c1, c2, n2):
    """Homomorphic addition: Enc(m1) * Enc(m2) = Enc(m1 + m2)."""
    return (c1 * c2) % n2

def paillier_scalar_mul(c, k, n2):
    """Scalar multiplication: Enc(m)^k = Enc(k*m)."""
    return power_mod(c, k, n2)

# Verify
m_test = 42
c_test = paillier_encrypt(m_test, n, g, n2)
d_test = paillier_decrypt(c_test, lam, mu, n, n2)
print(f'Paillier setup: n = {n}, n^2 = {n2}')
print(f'Encrypt({m_test}) -> Decrypt = {d_test} (correct: {d_test == m_test})')
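
For readers without SageMath, the same scheme can be sketched in plain Python (3.8 or later, for `pow(x, -1, n)`). This is a minimal port under the same toy parameters, not a secure implementation; the function names `keygen`, `encrypt`, and `decrypt` are illustrative and do not come from Notebook 11b. It also checks the two homomorphic properties directly.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with g = n + 1 (p, q distinct primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, lam, mu

def encrypt(m, n):
    """Enc(m) = (1 + n)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(1 + n, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(c, n, lam, mu):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, lam, mu = keygen(17, 19)   # same toy primes as the Sage cell
n2 = n * n
a, b, k = 42, 57, 5

c_a, c_b = encrypt(a, n), encrypt(b, n)
sum_dec = decrypt((c_a * c_b) % n2, n, lam, mu)     # homomorphic addition
scaled_dec = decrypt(pow(c_a, k, n2), n, lam, mu)   # scalar multiplication

print(sum_dec, scaled_dec)   # 99 210
```

Multiplying two ciphertexts yields the sum of the plaintexts (99), not their product; that asymmetry is why Paillier covers linear models only.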

## Step 2: Encrypt Patient Data

We have 5 patients, each with 3 features (blood pressure, cholesterol, BMI).
The hospital encrypts all features before sending them to the cloud.

Note: Paillier plaintexts are integers modulo $n$, so in real systems real-valued
features would first be scaled and rounded to integers. We use small integers
here for clarity.
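
One common way to do that scaling is fixed-point encoding. The sketch below is an illustration only: the helper names `encode` and `decode` and the scale factor of 10 are assumptions, not part of the notebook. It works on plaintexts modulo $n$ to show what a homomorphic addition would decrypt to.

```python
def encode(x, scale, n):
    """Map a real number to Z_n: round(x * scale), negatives wrap modulo n."""
    return round(x * scale) % n

def decode(v, scale, n):
    """Inverse of encode: residues above n // 2 are read as negative values."""
    signed = v - n if v > n // 2 else v
    return signed / scale

n = 323       # toy modulus from Step 1
scale = 10    # one decimal digit of precision (hypothetical choice)

bp = encode(12.5, scale, n)       # 125
delta = encode(-1.5, scale, n)    # -15 mod 323 = 308
total = (bp + delta) % n          # what Enc(bp) * Enc(delta) would decrypt to

print(bp, delta, decode(total, scale, n))   # 125 308 11.0
```

With this encoding the representable range is roughly $\pm n / (2 \cdot \text{scale})$, so the toy modulus only covers about $\pm 16$. If the weights are scaled as well, the result carries the product of both scale factors and must be divided by it after decryption.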

In [None]:
# Patient dataset (plaintext, hospital side)
patients = [
    {'name': 'Patient A', 'bp': 12, 'chol': 20, 'bmi': 25},
    {'name': 'Patient B', 'bp': 14, 'chol': 22, 'bmi': 30},
    {'name': 'Patient C', 'bp': 11, 'chol': 18, 'bmi': 22},
    {'name': 'Patient D', 'bp': 16, 'chol': 25, 'bmi': 35},
    {'name': 'Patient E', 'bp': 13, 'chol': 19, 'bmi': 27},
]

print('=== Plaintext Patient Data (hospital only) ===')
for p in patients:
    print(f'  {p["name"]}: BP={p["bp"]}, Chol={p["chol"]}, BMI={p["bmi"]}')

# Encrypt all features
enc_patients = []
for p in patients:
    enc_p = {
        'name': p['name'],
        'bp':   paillier_encrypt(p['bp'], n, g, n2),
        'chol': paillier_encrypt(p['chol'], n, g, n2),
        'bmi':  paillier_encrypt(p['bmi'], n, g, n2),
    }
    enc_patients.append(enc_p)

print()
print('=== Encrypted Data (sent to cloud) ===')
for ep in enc_patients:
    print(f'  {ep["name"]}: BP={ep["bp"]}, Chol={ep["chol"]}, BMI={ep["bmi"]}')

print()
print('The cloud sees only ciphertext values. No patient data is exposed.')

## Step 3: Cloud Computes the Linear Model Homomorphically

The cloud has the ML model weights (these are public --- only the *data* is private):

$$\text{risk\_score} = w_1 \cdot \text{BP} + w_2 \cdot \text{Chol} + w_3 \cdot \text{BMI} + b$$

Using Paillier:
- Scalar multiplication: $\text{Enc}(x_i)^{w_i} = \text{Enc}(w_i x_i)$
- Addition: $\text{Enc}(w_1 x_1) \cdot \text{Enc}(w_2 x_2) = \text{Enc}(w_1 x_1 + w_2 x_2)$
- Bias $b$: $\text{Enc}(w_1 x_1 + w_2 x_2 + \cdots) \cdot \text{Enc}(b)$ adds the encrypted bias

In [None]:
# Model weights (public, known to the cloud)
w_bp = 3     # weight for blood pressure
w_chol = 2   # weight for cholesterol
w_bmi = 1    # weight for BMI
bias = 10    # intercept

print(f'Model: risk = {w_bp}*BP + {w_chol}*Chol + {w_bmi}*BMI + {bias}')
print()

# === Cloud side: homomorphic evaluation (no secret key!) ===
enc_predictions = []
for ep in enc_patients:
    # Scalar multiply each encrypted feature by its weight
    term_bp   = paillier_scalar_mul(ep['bp'], w_bp, n2)
    term_chol = paillier_scalar_mul(ep['chol'], w_chol, n2)
    term_bmi  = paillier_scalar_mul(ep['bmi'], w_bmi, n2)
    
    # Encrypt the bias (cloud can do this since bias is public)
    enc_bias = paillier_encrypt(bias, n, g, n2)
    
    # Sum all terms: Enc(w1*x1) * Enc(w2*x2) * Enc(w3*x3) * Enc(b)
    enc_score = paillier_add(term_bp, term_chol, n2)
    enc_score = paillier_add(enc_score, term_bmi, n2)
    enc_score = paillier_add(enc_score, enc_bias, n2)
    
    enc_predictions.append(enc_score)

print('Cloud computed encrypted predictions (never saw the plaintext data).')
print()
for ep, enc_pred in zip(enc_patients, enc_predictions):
    print(f'  {ep["name"]}: Enc(risk_score) = {enc_pred}')
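
Because Paillier plaintexts live in $\mathbb{Z}_n$, the decrypted score is only meaningful if the true score stays below $n$. A small plaintext-side check, sketched here with a hypothetical helper `max_score` and assuming non-negative weights and features, can confirm this before any encrypted evaluation:

```python
def max_score(weights, bias, feature_max):
    """Upper bound on w . x + b for non-negative weights and features."""
    return sum(w * x for w, x in zip(weights, feature_max)) + bias

n = 323                       # toy modulus from Step 1
weights = [3, 2, 1]           # w_bp, w_chol, w_bmi
bias = 10
feature_max = [16, 25, 35]    # largest BP, Chol, BMI in the dataset (Patient D)

bound = max_score(weights, bias, feature_max)
print(bound, bound < n)       # 143 True

# Tripling every weight and the bias pushes the bound past n, so
# decryption would return the score reduced modulo n instead.
big = max_score([3 * w for w in weights], 3 * bias, feature_max)
print(big, big % n)           # 429 106
```

In the second case the hospital would decrypt 106 rather than 429, with no error raised. Choosing $n$ large enough for the worst-case score is part of parameter selection.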

## Step 4: Hospital Decrypts the Results

Only the hospital (key holder) can decrypt the predictions. Let's verify they
match the cleartext computation.

In [None]:
# === Hospital side: decrypt predictions ===

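all_correct = True
for p, enc_pred in zip(patients, enc_predictions):
    # Decrypt the homomorphic result
    fhe_result = paillier_decrypt(enc_pred, lam, mu, n, n2)
    
    # Cleartext computation for verification
    clear_result = w_bp * p['bp'] + w_chol * p['chol'] + w_bmi * p['bmi'] + bias
    
    # Paillier plaintexts are integers mod n, so compare modulo n
    # (every score in this dataset is below n = 323, so nothing wraps around)
    match = (fhe_result == clear_result % n)
    all_correct = all_correct and match
    print(f'  {p["name"]}: decrypted = {fhe_result}, cleartext = {clear_result}, match = {match}')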

print(f'\nAll predictions correct: {all_correct}')
print()
print('The cloud computed the CORRECT ML predictions without ever seeing')
print('any patient data. This is the power of homomorphic encryption!')

## Limitations: Paillier vs Full FHE

Paillier only supports **addition** and **scalar multiplication** (linear operations).
This is sufficient for:
- Linear regression
- Weighted sums and averages
- Simple statistics (mean; variance if the client also encrypts the squared values)
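
One way to get a variance out of an additive scheme, assuming the hospital also uploads encryptions of the squared values, is sketched below in plain Python. The primes 1009 and 1013 are an illustrative choice (still insecure) so that the sum of squares fits below $n$; the helper names are hypothetical.

```python
import math
import random

P, Q = 1009, 1013                 # toy primes (insecure), larger than Step 1's
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)              # valid because g = N + 1

def encrypt(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(1 + N, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N * MU) % N

# Hospital: encrypt each value and its square.
bmi = [25, 30, 22, 35, 27]
enc_x = [encrypt(x) for x in bmi]
enc_x2 = [encrypt(x * x) for x in bmi]

# Cloud: two homomorphic sums, no secret key involved.
enc_sum = math.prod(enc_x) % N2
enc_sum_sq = math.prod(enc_x2) % N2

# Hospital: decrypt both sums and finish the non-linear part in the clear.
count = len(bmi)
mean = decrypt(enc_sum) / count
variance = decrypt(enc_sum_sq) / count - mean ** 2
print(mean, variance)             # approximately 27.8 and 19.76
```

The squaring happens on the hospital side before encryption, so the cloud still performs additions only. The final subtraction and division are done after decryption.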

But real ML models need **nonlinear** operations:
- Neural networks need activation functions (ReLU, sigmoid)
- Decision trees need comparisons
- Polynomial regression needs multiplication of encrypted values

For these, you need **full** FHE (BGV, BFV, or CKKS).

In [None]:
# What Paillier CAN and CANNOT do
print('=== What Paillier Can Compute Homomorphically ===')
print()

operations = [
    ('Sum of encrypted values',       'Enc(a) * Enc(b) = Enc(a+b)',    True),
    ('Weighted sum (linear model)',    'Enc(x)^w = Enc(w*x)',           True),
    ('Average (sum / count)',          'Cloud sums; client decrypts and divides by the count', True),
    ('Product of encrypted values',   'No ciphertext operation yields Enc(a*b)', False),
    ('Comparison (a > b?)',            'Needs a non-linear circuit on ciphertexts', False),
    ('ReLU activation',               'max(0, x) needs comparison',    False),
    ('Sigmoid activation',            'Polynomial approximation needs products of encrypted values', False),
    ('Polynomial of degree > 1',      'x^2 needs the product of two encrypted values', False),
]

for op, detail, supported in operations:
    icon = 'YES' if supported else 'NO '
    print(f'  [{icon}] {op}')
    print(f'        {detail}')

print()
print('For neural networks and complex models, you need BGV/BFV/CKKS.')
print('CKKS is especially popular for ML because it supports approximate')
print('arithmetic on real numbers, which is what ML models naturally use.')

## CKKS for ML: Approximate Arithmetic

The **CKKS scheme** (Cheon-Kim-Kim-Song, 2017) was designed specifically for
approximate computation --- exactly what ML needs. Key features:

| Feature | CKKS | BFV/BGV |
|---------|------|---------|
| Message type | Real/complex numbers | Integers mod $t$ |
| Arithmetic | Approximate (small error tolerated) | Exact |
| Suited for | Neural networks, statistics | Counting, voting, exact queries |
| Noise handling | Noise becomes part of approximation | Noise must stay below threshold |

CKKS enables encrypted inference on neural networks with polynomial activation
function approximations (e.g., approximate ReLU with a low-degree polynomial).
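
To make the idea of a polynomial activation concrete, the plaintext sketch below compares the sigmoid with its degree-3 Taylor polynomial around 0. This is an illustrative choice, not the approximation used by any particular library; practical systems often fit coefficients over the expected input range instead.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly3(x):
    """Degree-3 Taylor polynomial of sigmoid around 0: 1/2 + x/4 - x^3/48."""
    return 0.5 + x / 4 - x ** 3 / 48

def max_error(lo, hi, steps=200):
    """Largest absolute gap between sigmoid and the polynomial on [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_poly3(x)) for x in xs)

print(f"max error on [-1, 1]: {max_error(-1, 1):.4f}")
print(f"max error on [-4, 4]: {max_error(-4, 4):.4f}")
```

The polynomial tracks the sigmoid closely near 0 and diverges badly for larger inputs, which is why encrypted inference pipelines typically normalize inputs to a known range. Evaluating $x^3$ homomorphically costs two ciphertext multiplications, which is where the noise budget and multiplicative depth come in.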

Notable systems for encrypted inference:
- **CryptoNets** (Microsoft Research, 2016): an early demonstration of neural network inference on encrypted data, predating CKKS
- **nGraph-HE**: Intel's framework for encrypted deep learning
- **Concrete ML** (Zama): compiles scikit-learn and PyTorch models to FHE, built on TFHE rather than CKKS

## Concept Map

| Module 11 Concept | ML Application |
|-------------------|----------------|
| **Paillier (additive HE)** | Linear regression, weighted sums, averages |
| **BGV/BFV (integer FHE)** | Decision trees, exact classification |
| **CKKS (approximate FHE)** | Neural networks, floating-point ML |
| **Noise budget** | Limits the depth of the ML model (number of layers) |
| **Bootstrapping** | Enables arbitrarily deep neural networks |
| **Scalar multiplication** | Applying model weights to encrypted features |
| **Homomorphic addition** | Summing weighted features (dot product) |

In [None]:
# Summary of the privacy-preserving ML pipeline
print('=== Privacy-Preserving ML Pipeline ===')
print()
print('Step 1: Hospital encrypts patient data with FHE')
print('        [BP=12, Chol=20, BMI=25] --> [Enc(12), Enc(20), Enc(25)]')
print()
print('Step 2: Cloud receives ONLY ciphertexts')
print('        Cloud sees: [82341, 19472, 63918]  (meaningless numbers)')
print()
print('Step 3: Cloud evaluates ML model homomorphically')
print('        Enc(risk) = Enc(12)^3 * Enc(20)^2 * Enc(25)^1 * Enc(10)')
print('                  = Enc(3*12 + 2*20 + 1*25 + 10)')
print('                  = Enc(111)')
print()
print('Step 4: Hospital decrypts the prediction')
print('        Dec(Enc(111)) = 111')
print()
print('Result: Cloud computed the correct diagnosis (risk score = 111)')
print('        without ever seeing blood pressure, cholesterol, or BMI.')
print()
print('This is not science fiction: libraries like Microsoft SEAL (BFV/CKKS)')
print('and Zama\'s Concrete ML (TFHE) make this possible TODAY for full models.')

## Summary

| Aspect | Detail |
|--------|--------|
| **Problem** | Cloud ML on sensitive data can expose it and violate privacy regulations |
| **Solution** | Encrypt data with FHE, compute model homomorphically |
| **Paillier** | Supports linear models (addition + scalar multiply) |
| **CKKS** | Supports neural networks (approximate floating-point FHE) |
| **Trade-off** | Large slowdown vs. cleartext computation, often quoted as 10,000x to 1,000,000x depending on scheme and workload |
| **Reality** | Open-source libraries and frameworks exist (SEAL, Concrete ML, nGraph-HE) |

FHE for ML is the "holy grail" of privacy-preserving computation: the cloud provides
compute power, the hospital keeps data private, and the patient gets a correct diagnosis.
The math from Module 11 --- additive homomorphism, noise budgets, bootstrapping ---
is what makes this possible.

---

*Back to [Module 11: Homomorphic Encryption](../README.md)*