In [1]:
import os, sys, dotenv
env_path = os.path.abspath(os.path.join(os.getcwd(), "../../experiments/.env"))
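dotenv.load_dotenv(env_path)
print("Loaded .env from:", env_path)

# Make the project's own packages (models/, data/) importable
script_path = os.getenv("SCRIPT_PATH")
if script_path is None:
    raise RuntimeError(f"SCRIPT_PATH is not set; check {env_path}")
sys.path.append(script_path)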


Loaded .env from: c:\Users\u517685\Documents\TenSEAL\experiments\.env


# Fairness, Privacy, and Homomorphic Encryption in Machine Learning

This notebook demonstrates how to use homomorphic encryption (HE) for privacy-preserving machine learning (ML) experiments. We use TenSEAL to perform encrypted inference on a neural network trained on the Adult dataset, and discuss how HE enables secure evaluation without revealing sensitive data. The workflow also supports scientific experiments on privacy, fairness, and feature attribution.

**Key concepts:**
- Homomorphic encryption allows computation on encrypted data, enabling privacy-preserving ML.
- We train a neural network in plaintext, then evaluate its first layer on encrypted data. The activation is replaced with a polynomial approximation, because the CKKS scheme used here supports only addition and multiplication.
- We compare plaintext and encrypted accuracy, and discuss implications for fairness and privacy research.

In [2]:
from models.neural_net import PytorchModel

### 1. Define and Train a Neural Network Model

We define a standard PyTorch neural network for binary classification and train it on the Adult dataset in plaintext. Its weights are later reused for encrypted inference. Training in plaintext is much faster and more accurate than training under encryption, but it offers no privacy for the training data.
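The sketch below counts the parameters in each layer, to show the scale of the computation that would have to run under encryption. It makes two assumptions: `PytorchModel` builds fully connected layers ending in one output unit (its internals are not shown here), and the input width of 13 features is hypothetical.

```python
# HYPOTHETICAL input width; in the notebook this would be X_train.shape[1]
n_features = 13
hidden_sizes = [200, 100]  # as passed to PytorchModel in the training cell
# Assumption: fully connected layers ending in a single output unit
widths = [n_features, *hidden_sizes, 1]

layers = list(zip(widths[:-1], widths[1:]))
for i, (fan_in, fan_out) in enumerate(layers):
    weights = fan_in * fan_out
    print(f"layer {i}: {fan_in:>3} -> {fan_out:>3}, "
          f"{weights} weights + {fan_out} biases")

total_params = sum(fi * fo + fo for fi, fo in layers)
print("total parameters:", total_params)
```

The first layer alone is 200 dot products of length `n_features`. The encrypted evaluation later in this notebook computes only one of them.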

In [3]:

import torch
import tenseal as ts
import numpy as np
import matplotlib.pyplot as plt

### Import Required Libraries

We import PyTorch for building and training neural networks, TenSEAL for homomorphic encryption, and other standard libraries for data processing and visualization.

In [4]:
from data.data_preprocessor import get_adult
from data.metadata import feat_dict

### 2. Load and Preprocess the Data

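We use the Adult dataset, a standard benchmark in fairness and privacy research. The data is split 80/20 into training and test sets. Features are then min-max scaled to [0, 1] using a scaler fitted on the training split. Keeping inputs small helps keep pre-activations within the range where the polynomial activation approximation is accurate. Test values can still fall slightly outside [0, 1], because the scaler only sees the training data.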
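The sketch below applies per-feature min-max scaling to a single column, following the same fit-on-train, transform-on-test pattern as `MinMaxScaler`. It shows why test values can leave [0, 1]. The helper names and the age values are hypothetical, and constant columns are mapped to 0 here purely as a choice of this sketch.

```python
def fit_min_max(column):
    """Learn (min, max) from the training values of one feature."""
    return min(column), max(column)


def transform_min_max(column, lo, hi):
    """Map x to (x - lo) / (hi - lo); this sketch maps constant columns to 0."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


# Hypothetical 'age' values, for illustration only
train_age = [17, 25, 38, 52, 90]
test_age = [16, 45, 95]

lo, hi = fit_min_max(train_age)
print(transform_min_max(train_age, lo, hi))  # spans exactly [0, 1]
print(transform_min_max(test_age, lo, hi))   # 16 maps below 0, 95 above 1
```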

In [5]:
df  = get_adult()



In [6]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Split the dataset into training and testing sets
X = df.drop(columns=feat_dict["adult"]["target"])
y = df[feat_dict["adult"]["target"]]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Normalize the features
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Convert to PyTorch tensors (only X_test_tensor is used later, for encryption;
# the model below is fitted directly on the NumPy array and pandas Series)
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32).unsqueeze(1)
# Create and train the model: two hidden layers (200 and 100 units), 10 epochs
model = PytorchModel(hidden_sizes=[200, 100], batch_size=256, epochs=10, learning_rate=0.001)
model.fit(X_train, y_train, epochs=10, batch_size=256, learning_rate=0.001, hidden_sizes=[200, 100])

Training Progress: 100%|██████████| 10/10 [00:07<00:00,  1.38epoch/s, Loss=0.3075]


In [7]:
# 3. Evaluate on plaintext test data
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Plaintext test accuracy: {acc:.4f}")

Plaintext test accuracy: 0.8498


### 3. Evaluate Model on Plaintext Test Data

We first evaluate the trained model on unencrypted (plaintext) test data. This provides a baseline for accuracy and allows us to compare with encrypted inference later.

In [8]:
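# 4. Prepare TenSEAL context for encryption
poly_mod_degree = 8192  # ring dimension N; CKKS packs N/2 = 4096 values per ciphertext
# 40-bit first and last primes; each 21-bit middle prime allows one rescale (6 levels)
coeff_mod_bit_sizes = [40, 21, 21, 21, 21, 21, 21, 40]
# The third argument is the plain modulus, which only BFV uses; -1 is passed for CKKS
context = ts.context(ts.SCHEME_TYPE.CKKS, poly_mod_degree, -1, coeff_mod_bit_sizes)
context.global_scale = 2 ** 21  # fixed-point scale, matched to the 21-bit middle primes
context.generate_galois_keys()  # rotation keys, needed to sum slots in dot()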

### 4. Prepare Homomorphic Encryption Context

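We set up the TenSEAL context for the CKKS scheme, which supports approximate arithmetic on real numbers. The polynomial modulus degree and the coefficient modulus bit sizes together determine three things: the security level, how many values fit in one ciphertext, and how many multiplications can be chained. The global scale sets the fixed-point precision of encoded values. Galois keys are generated because the encrypted dot product rotates ciphertext slots in order to sum them.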
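The arithmetic below is a sanity check on these parameters. It recomputes the total modulus size, the number of slots and the number of rescaling levels. The 128-bit security limits are the maximum coefficient-modulus sizes tabulated by Microsoft SEAL, which TenSEAL builds on. The depth figure for one encrypted unit is a rough, conservative assumption, not TenSEAL's exact accounting.

```python
# Parameters copied from the context cell above
poly_mod_degree = 8192
coeff_mod_bit_sizes = [40, 21, 21, 21, 21, 21, 21, 40]

# Largest total coefficient modulus (in bits) for 128-bit security per ring
# dimension, as tabulated by Microsoft SEAL (CoeffModulus::MaxBitCount)
MAX_BITS_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

total_bits = sum(coeff_mod_bit_sizes)
slots = poly_mod_degree // 2             # CKKS packs N/2 real values per ciphertext
levels = len(coeff_mod_bit_sizes) - 2    # one rescale per middle prime

# Conservative estimate (assumption): 1 level for the dot product with plaintext
# weights, at most 3 for the cubic (x^2, x^3, then scaling by a coefficient)
depth_estimate = 1 + 3

print(f"total modulus bits: {total_bits} (128-bit limit: {MAX_BITS_128[poly_mod_degree]})")
print(f"slots per ciphertext: {slots}")
print(f"rescaling levels: {levels}, estimated depth used: {depth_estimate}")
```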

In [21]:
# 5. Encrypt the test data (one CKKS vector per sample)
enc_X_test = [ts.ckks_vector(context, x.tolist()) for x in X_test_tensor]

### 5. Encrypt the Test Data

We encrypt each test sample as a separate CKKS vector using the TenSEAL context. All subsequent computation on these vectors operates on ciphertexts, so the feature values are not visible to anyone who evaluates the model without the secret key.
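Encrypting one sample per ciphertext leaves most slots empty: a CKKS ciphertext with `poly_mod_degree = 8192` has 4096 slots, and each sample fills only a handful of them. The rough calculation below compares that layout with packing several samples side by side. The feature count is hypothetical. A packed layout would also need a different encrypted dot product, one that sums within each block of slots rather than across the whole vector.

```python
import math

slots = 8192 // 2      # CKKS slots for poly_mod_degree = 8192
n_test = 6033          # test-set size reported by the progress bar in this notebook
n_features = 13        # HYPOTHETICAL; in the notebook use X_test.shape[1]

# Layout used above: one sample per ciphertext
print(f"one sample per ciphertext: {n_test} ciphertexts, "
      f"{n_features / slots:.2%} of slots used in each")

# Alternative layout: consecutive blocks of n_features slots, one sample per block
samples_per_ct = slots // n_features
packed = math.ceil(n_test / samples_per_ct)
print(f"packed: {samples_per_ct} samples per ciphertext, {packed} ciphertexts")
```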

In [28]:
def encrypted_first_layer_forward(model, enc_x, context):
    """Evaluate one unit of the first linear layer on an encrypted sample.

    Only row 0 of the weight matrix and bias[0] are used, so the result is a
    sigmoid-approximated activation of the first hidden unit, not the network's
    output. `context` is unused because enc_x already carries its context.
    """
    linear = model.model[0]  # first layer of the wrapped PyTorch module
    w = linear.weight.data.cpu().numpy()
    b = linear.bias.data.cpu().numpy()
    # nn.Linear stores weights as (out_features, in_features): take the first row
    w_vec = np.asarray(w[0] if w.ndim == 2 else w).flatten().tolist()
    b_val = float(b[0]) if b.ndim > 0 else float(b)
    # Encrypted dot product with plaintext weights, plus the plaintext bias
    enc_out = enc_x.dot(w_vec) + b_val
    # Cubic sigmoid approximation 0.5 + 0.197x - 0.004x^3
    # (polyval takes coefficients in increasing order of degree)
    poly_coeffs = [0.5, 0.197, 0, -0.004]
    enc_out = enc_out.polyval(poly_coeffs)
    return enc_out

### 6. Define Encrypted Forward Pass

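CKKS supports only addition and multiplication (plus slot rotations), so standard non-linear activations such as sigmoid or ReLU cannot be evaluated directly. They must be replaced by polynomial approximations.

The function above computes one unit of the first layer under encryption. It takes the dot product of the encrypted input with the first row of the weight matrix, adds the first bias term, and applies the cubic approximation sigmoid(x) ≈ 0.5 + 0.197x - 0.004x³. This cubic tracks the sigmoid only for roughly -5 ≤ x ≤ 5. Because only a single hidden unit is computed, its thresholded output is not the trained network's prediction.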
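The quality of the cubic approximation can be checked in plaintext. The sketch below evaluates the same coefficients with Horner's rule, which uses only additions and multiplications, and compares the result with the exact sigmoid. It is pure Python and does not use TenSEAL. The grid and test points are arbitrary choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def poly_sigmoid(x, coeffs=(0.5, 0.197, 0.0, -0.004)):
    """Evaluate sum(c_i * x**i) with Horner's rule (only + and *)."""
    result = 0.0
    for c in reversed(coeffs):  # coefficients are in increasing order of degree
        result = result * x + c
    return result


grid = [i / 10 for i in range(-50, 51)]  # x in [-5, 5]
max_err_inside = max(abs(sigmoid(x) - poly_sigmoid(x)) for x in grid)
print(f"max |error| on [-5, 5]: {max_err_inside:.3f}")
for x in (6.0, 8.0, 10.0):
    print(f"x={x:>4}: sigmoid={sigmoid(x):.3f}  cubic={poly_sigmoid(x):.3f}")
```

On [-5, 5] the error is at most about 0.05. Outside that range the cubic term dominates and the approximation collapses, which is one reason inputs are normalized before encryption.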

In [None]:
from tqdm import tqdm
# Evaluate on encrypted test data. Only the first unit of the first layer is
# computed under encryption, so these are not the full network's predictions.
y_pred_enc = []
for enc_x in tqdm(enc_X_test):
    enc_out = encrypted_first_layer_forward(model, enc_x, context)
    out = enc_out.decrypt()  # decrypt() returns a list of floats
    if isinstance(out, list):
        out = out[0]
    y_pred_enc.append(int(out > 0.5))
acc_enc = accuracy_score(y_test, y_pred_enc)
print(f"Encrypted test accuracy (first hidden unit only): {acc_enc:.4f}")


 30%|██▉       | 1786/6033 [03:03<06:50, 10.34it/s]

### 7. Evaluate Model on Encrypted Test Data

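We now run inference on the encrypted test set using the encrypted forward pass, and decrypt the outputs only to compute accuracy. In a deployment, the data owner would keep the secret key, and the party running the model would see only ciphertexts. In this notebook both roles share one TenSEAL context, which also holds the secret key. The setup therefore simulates privacy-preserving evaluation rather than enforcing it. Homomorphic encryption is computationally expensive, so this step is much slower than plaintext inference.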
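The progress bar above gives enough information for a back-of-the-envelope estimate of the full encrypted run. The rate is the one tqdm reported in this particular run, so the result only indicates the order of magnitude on that machine.

```python
n_samples = 6033        # total test samples, from the progress bar
rate = 10.34            # samples per second reported by tqdm in this run
done = 1786             # samples processed when the output was captured
elapsed_s = 3 * 60 + 3  # 03:03 elapsed at that point

total_s = n_samples / rate
avg_rate = done / elapsed_s
print(f"estimated total at the reported rate: {total_s:.0f} s ({total_s / 60:.1f} min)")
print(f"average rate so far: {avg_rate:.2f} samples/s")
```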

- Implementing a full encrypted forward pass for a multi-layer network is hard for two reasons. Every layer consumes multiplicative depth from the encryption parameters, and non-polynomial activations such as ReLU or sigmoid must be replaced by polynomial approximations. A common compromise is to evaluate only the first layer under encryption, then decrypt and finish the computation in plaintext.
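- The encrypted evaluation above had not finished in this export: it reached about 30% of the 6033 test samples after three minutes. It also thresholds a single first-layer unit rather than the network's output. It therefore does not yet show how encrypted accuracy compares with the plaintext baseline of 0.8498. It does illustrate the computational cost, at roughly 10 encrypted samples per second.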
- This workflow enables privacy-preserving ML experiments, and can be extended to study fairness (e.g., by evaluating group-wise accuracy on encrypted data) and feature attribution (e.g., by perturbing encrypted inputs).
- Homomorphic encryption is a powerful tool for secure ML, but requires careful model design and parameter tuning to balance privacy, accuracy, and efficiency.
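A fairness extension can start from two quantities: accuracy per group, and the agreement between plaintext and decrypted predictions. The helpers below are a minimal pure-Python sketch. The function names and the toy labels, predictions and group codes are hypothetical. In the notebook, one could pass `y_test`, `y_pred`, `y_pred_enc`, and the sensitive-attribute column of the test split taken before scaling.

```python
from collections import defaultdict


def groupwise_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each value of a sensitive attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def agreement(pred_a, pred_b):
    """Fraction of samples on which two prediction vectors agree."""
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


# Toy, HYPOTHETICAL labels, predictions and group codes
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_plain = [1, 0, 0, 1, 0, 1, 1, 0]
y_enc   = [1, 0, 0, 1, 1, 1, 1, 0]
group   = [0, 0, 0, 0, 1, 1, 1, 1]

print(groupwise_accuracy(y_true, y_plain, group))
print(groupwise_accuracy(y_true, y_enc, group))
print(agreement(y_plain, y_enc))
```

A gap between groups that appears only in the encrypted predictions would point to the approximation, not the model, as its source.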