# Homomorphic Encryption Experimentation with Paillier Encryption

## Overview
This notebook demonstrates an end-to-end experiment using the Paillier encryption scheme, which supports additive homomorphism. We simulate a simple linear model (dot-product plus bias) on various types of data:
- **Tabular Data:** Diabetes dataset (regression)
- **Image Data:** MNIST (flattened image)
- **Text Data:** TF-IDF features from a subset of newsgroup articles (as a proxy for sentiment analysis)
- **Time Series Data:** Simulated stock prices

We then explore how the key size, which affects both security and performance, influences the encryption and decryption process.
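
To make "additive homomorphism" concrete, the following is a toy sketch of textbook Paillier using only the Python standard library. It is not how `phe` is implemented, and the tiny hard-coded primes make it completely insecure. The names `toy_keygen`, `toy_encrypt`, and `toy_decrypt` are hypothetical helpers invented for this illustration. It shows that multiplying two ciphertexts adds the plaintexts, and that raising a ciphertext to a plaintext power multiplies the plaintext by that constant.

```python
import math
import random

def toy_keygen(p=293, q=433):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # common simplifying choice of generator
    mu = pow(lam, -1, n)      # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)

def toy_encrypt(pub, m):
    """Encrypt a non-negative integer m < n."""
    n, g = pub
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:   # r must be invertible mod n
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def toy_decrypt(pub, priv, c):
    """Recover the integer plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

pub, priv = toy_keygen()
n_sq = pub[0] ** 2
c1, c2 = toy_encrypt(pub, 15), toy_encrypt(pub, 27)

# Ciphertext multiplication corresponds to plaintext addition.
sum_plain = toy_decrypt(pub, priv, (c1 * c2) % n_sq)
# Ciphertext exponentiation corresponds to multiplication by a plaintext constant.
scaled_plain = toy_decrypt(pub, priv, pow(c1, 3, n_sq))
print(sum_plain, scaled_plain)
```

These two operations are all a dot product with plaintext weights needs, which is why a linear model can be evaluated on encrypted inputs.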

In [None]:
# !pip install phe
# !pip install tensorflow


import numpy as np
import pandas as pd
import time
import datetime
import matplotlib.pyplot as plt

# For tabular and text data processing
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_diabetes
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.datasets import fetch_20newsgroups

# For image data (MNIST)
from tensorflow.keras.datasets import mnist

# Import Paillier from the phe library
from phe import paillier



In [6]:
def generate_paillier_keys(key_length=1024):
    """
    Generate Paillier public and private keys.
    """
    public_key, private_key = paillier.generate_paillier_keypair(n_length=key_length)
    return public_key, private_key

def encrypt_vector(vector, public_key):
    """
    Encrypt a 1D numpy vector element-wise.
    Returns a list of encrypted numbers.
    """
    return [public_key.encrypt(float(x)) for x in vector]

def encrypted_dot_product(enc_vector, weights):
    """
    Compute the dot product between an encrypted vector (list of EncryptedNumber)
    and a plaintext weight vector (numpy array or list). Multiplication by a plaintext constant is allowed.
    """
    # Start with an encryption of zero (using the first encrypted element's public key)
    public_key = enc_vector[0].public_key
    enc_result = public_key.encrypt(0.0)
    for enc_x, w in zip(enc_vector, weights):
        enc_result += enc_x * float(w)
    return enc_result

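def encrypted_linear_inference(enc_vector, weights, bias, public_key):
    """
    Compute the encrypted linear inference: dot(enc_vector, weights) + bias.
    The bias is added as a plaintext constant.
    public_key is not used here (the key is taken from enc_vector); the
    parameter is kept so existing call sites keep working.
    """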
    enc_dot = encrypted_dot_product(enc_vector, weights)
    # Adding the bias (plaintext) to the encrypted dot product.
    return enc_dot + float(bias)
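
Paillier itself only operates on integers modulo `n`, so the float inputs, weights, and bias above have to be mapped to integers somewhere. `phe` does this internally with its own encoding. The sketch below is a simplified stand-in, not `phe`'s actual scheme: it uses a hypothetical fixed decimal scale (`SCALE`) and a stand-in modulus (`MODULUS`), and it runs on plain integers so that only the encoding arithmetic is visible. It illustrates two points: negative values wrap into the upper half of the residue range, and multiplying two encoded values squares the scale, so the bias must be encoded at that squared scale before it is added.

```python
SCALE = 10 ** 6          # hypothetical fixed-point precision (6 decimal digits)
MODULUS = 2 ** 61 - 1    # stand-in for the Paillier modulus n

def encode(x, scale=SCALE, modulus=MODULUS):
    """Map a float to a residue; negative values wrap to the upper half."""
    return round(x * scale) % modulus

def decode(v, scale, modulus=MODULUS):
    """Invert encode: residues above modulus // 2 are read as negative."""
    if v > modulus // 2:
        v -= modulus
    return v / scale

def fixed_point_linear(xs, ws, bias):
    """Evaluate dot(xs, ws) + bias using only integer arithmetic mod MODULUS."""
    acc = 0
    for x, w in zip(xs, ws):
        # Each product carries scale SCALE * SCALE.
        acc = (acc + encode(x) * encode(w)) % MODULUS
    # The bias has to match the scale of the accumulated products.
    acc = (acc + encode(bias, SCALE * SCALE)) % MODULUS
    return decode(acc, SCALE * SCALE)

result = fixed_point_linear([0.5, -1.25, 2.0], [1.5, 0.25, -0.75], 0.1)
print(result)
```

The result matches the floating-point computation up to the chosen precision, which is consistent with the plaintext and decrypted predictions agreeing to four decimals in the experiments.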


## Experiment 1: Tabular Data (Diabetes Dataset)

We load the Diabetes dataset, standardize its features, choose one sample, and simulate a linear model inference on encrypted data.



In [7]:
# Load Diabetes dataset (regression task)
diabetes = load_diabetes()
X_diabetes = diabetes.data  # features
y_diabetes = diabetes.target  # target values

# Standardize features for numerical stability
scaler = StandardScaler()
X_diabetes_scaled = scaler.fit_transform(X_diabetes)

# Simulate a linear model: y = dot(x, weights) + bias
np.random.seed(42)
weights_tabular = np.random.randn(X_diabetes_scaled.shape[1])
bias_tabular = np.random.randn()

sample_index = 0
sample_features = X_diabetes_scaled[sample_index]
plaintext_prediction = np.dot(sample_features, weights_tabular) + bias_tabular

# Generate a 1024-bit Paillier key pair (the default of generate_paillier_keys)
public_key, private_key = generate_paillier_keys(key_length=1024)

# Encrypt the sample features (each element individually)
enc_sample = encrypt_vector(sample_features, public_key)

# Perform encrypted inference
enc_prediction = encrypted_linear_inference(enc_sample, weights_tabular, bias_tabular, public_key)
decrypted_prediction = private_key.decrypt(enc_prediction)

print("=== Diabetes Dataset (Tabular Data) ===")
print("Plaintext prediction: {:.4f}".format(plaintext_prediction))
print("Encrypted (decrypted) prediction: {:.4f}".format(decrypted_prediction))


=== Diabetes Dataset (Tabular Data) ===
Plaintext prediction: -0.1641
Encrypted (decrypted) prediction: -0.1641


## Experiment 2: Image Data (MNIST)

We load one MNIST image, flatten and normalize it, and run a simple linear model prediction using Paillier encryption.


In [8]:
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Select one image from the test set, flatten and normalize (scale pixel values between 0 and 1)
sample_image = x_test[0]
sample_image_flat = sample_image.flatten() / 255.0

# Simulate a linear classifier: y = dot(x, weights) + bias
weights_image = np.random.randn(sample_image_flat.shape[0])
bias_image = np.random.randn()
plaintext_pred_image = np.dot(sample_image_flat, weights_image) + bias_image

# Generate a new 1024-bit Paillier key pair for the image data
public_key_img, private_key_img = generate_paillier_keys(key_length=1024)

# Encrypt the image vector
enc_image = encrypt_vector(sample_image_flat, public_key_img)

# Encrypted inference
enc_pred_image = encrypted_linear_inference(enc_image, weights_image, bias_image, public_key_img)
decrypted_pred_image = private_key_img.decrypt(enc_pred_image)

print("\n=== MNIST Image Data ===")
print("Plaintext prediction: {:.4f}".format(plaintext_pred_image))
print("Encrypted (decrypted) prediction: {:.4f}".format(decrypted_pred_image))


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 1s 0us/step

=== MNIST Image Data ===
Plaintext prediction: 2.4648
Encrypted (decrypted) prediction: 2.4648
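
Because each of the 784 pixels is encrypted separately, the encrypted image is much larger than the plaintext one. A rough estimate, assuming each Paillier ciphertext is an element modulo n² and therefore takes about twice the key length in bits, can be sketched as follows. The helper `ciphertext_bytes` is hypothetical, and the estimate ignores any per-value metadata or serialization overhead that `phe` adds.

```python
def ciphertext_bytes(num_values, key_bits):
    """Approximate raw storage for num_values Paillier ciphertexts."""
    # A ciphertext is a residue modulo n^2, so it needs about 2 * key_bits bits.
    bits_per_ciphertext = 2 * key_bits
    return num_values * bits_per_ciphertext // 8

num_pixels = 28 * 28  # one flattened MNIST image
sizes = {bits: ciphertext_bytes(num_pixels, bits) for bits in (1024, 2048, 3072)}
for bits, size in sizes.items():
    print(f"{bits}-bit key: about {size / 1024:.0f} KiB for {num_pixels} pixels")
```

For comparison, the same image stored as 784 float64 values takes 6,272 bytes, so even the 1024-bit setting expands the data by roughly a factor of 32.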


## Experiment 3: Text Data (TF-IDF Representation Proxy)

We fetch a small set of newsgroup articles, compute TF-IDF features (limiting to 50 features), and simulate a linear model prediction.

