### Concrete ML configuration with Docker (didn't work)
Reference: https://docs.zama.ai/concrete-ml/get-started/pip_installing#installation-using-docker

On a Windows machine, installation is only available via [Docker](https://docs.docker.com/desktop/install/windows-install/):
```bash
docker pull zamafhe/concrete-ml:latest
```


### Concrete ML configuration with Windows Subsystem for Linux (WSL)

- Install WSL:
```powershell
        wsl --install
```
- Install conda and Python 3.11 (see this [installation guide](https://gist.github.com/kauffmanes/5e74916617f9993bc3479f401dfec7da))
- Install Concrete ML:
```bash
        pip install -U pip wheel setuptools
        pip install concrete-ml
```
- Configure the Python interpreter under WSL (e.g. via [PyCharm](https://www.jetbrains.com/help/pycharm/using-wsl-as-a-remote-interpreter.html))

![pycharm_wsl](pycharm_wsl.png)
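
One way to confirm that the install steps above succeeded is to query the package version from inside the WSL interpreter. The sketch below uses only the standard library. The helper name `installed_version` is made up for illustration, and `concrete-ml` is assumed to be the distribution name because it is the name used in the `pip install` command.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "concrete-ml" is the distribution name used in the pip command above.
    found = installed_version("concrete-ml")
    print(f"concrete-ml: {found or 'not installed'}")
```

If this prints `not installed`, the configured interpreter is probably not the WSL environment where `pip install` was run.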

## Homomorphic Encryption in Machine Learning with Concrete-ML

This Jupyter Notebook demonstrates how to use homomorphic encryption (HE) with machine learning models using the Concrete-ML library. We'll train three different models in plaintext and then apply homomorphic encryption to perform encrypted inference with the same models.

Models Used:

1. Logistic Regression
2. Decision Tree Classifier
3. Random Forest Classifier

Important Note on XGBoost and Concrete-ML
Concrete-ML supports specific models that can be compiled into their FHE counterparts, such as Logistic Regression, Decision Trees, Random Forests, and Neural Networks.
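
Part of what compiling to an FHE counterpart involves is quantization: FHE circuits operate on integers, so inputs and model parameters are mapped to a small integer range. The `n_bits` argument passed to the Concrete-ML models later in this notebook controls that range. The sketch below illustrates plain uniform quantization in pure Python. It is not Concrete-ML's actual implementation, and the function names are hypothetical.

```python
def quantize(values, n_bits):
    """Map floats to integers in [0, 2**n_bits - 1] with a uniform scale."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1
    # Guard against a constant input, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo


def dequantize(q_values, scale, zero):
    """Map quantized integers back to approximate floats."""
    return [q * scale + zero for q in q_values]


if __name__ == "__main__":
    feats = [-1.8, -0.3, 0.0, 0.7, 2.4]
    q, scale, zero = quantize(feats, n_bits=7)
    approx = dequantize(q, scale, zero)
    max_err = max(abs(a - b) for a, b in zip(feats, approx))
    print(q)
    print(f"max round-trip error: {max_err:.4f} (step size {scale:.4f})")
```

With fewer bits the step size grows, which is one reason a quantized model's accuracy can differ slightly from its plaintext counterpart.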

In [19]:
!pip install seaborn


In [40]:
# Import necessary libraries
import numpy as np
import pandas as pd
import time
import matplotlib.pyplot as plt
import seaborn as sns

# Import scikit-learn modules
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Import Concrete-ML libraries
from concrete.ml.sklearn import LogisticRegression as HELogisticRegression
from concrete.ml.sklearn import DecisionTreeClassifier as HEDecisionTreeClassifier
from concrete.ml.sklearn import RandomForestClassifier as HERandomForestClassifier

# Import scikit-learn models
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier


In [41]:
# Load the Breast Cancer Wisconsin dataset
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names

# Create a DataFrame for better visualization and manipulation
df = pd.DataFrame(X, columns=feature_names)
df['target'] = y

# Check for missing values
print(f"Missing values in dataset: {df.isnull().sum().sum()}")

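# Standardize the features
# NOTE: the scaler is fitted on the full dataset before the split, so test-set
# statistics leak into training; fit it on X_train only for a stricter evaluation.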
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df.drop('target', axis=1))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, df['target'], test_size=0.2, random_state=42
)

# Print dataset sizes
print(f"Training samples: {X_train.shape[0]}")
print(f"Testing samples: {X_test.shape[0]}")


Missing values in dataset: 0
Training samples: 455
Testing samples: 114


In [42]:
def evaluate_model(model, model_name, X_test, y_test):
    """
    Evaluates the model's performance on the test set.

    Parameters:
    - model: Trained machine learning model.
    - model_name: Name of the model (string).
    - X_test: Test features.
    - y_test: Test labels.

    Returns:
    - accuracy: Accuracy score.
    - inference_time: Time taken for inference.
    """
    # Perform prediction
    start_time = time.time()
    y_pred = model.predict(X_test)
    inference_time = time.time() - start_time
    accuracy = accuracy_score(y_test, y_pred)
    print(f"\n{model_name} Accuracy: {accuracy * 100:.2f}%")
    print(f"Inference Time: {inference_time:.6f} seconds")
    print(f"Classification Report:\n{classification_report(y_test, y_pred)}")
    return accuracy, inference_time
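
The inference times measured by `evaluate_model` are well below a millisecond, so a single `time.time()` measurement is easily dominated by noise. A minimal sketch of a more robust alternative follows: it repeats the call and keeps the best time, using `time.perf_counter()`. The helper name `time_call` and the `repeats` parameter are assumptions made for this example and are not part of the notebook.

```python
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (last result, best time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


if __name__ == "__main__":
    total, seconds = time_call(sum, range(100_000), repeats=3)
    print(total, f"{seconds:.6f} s")
```

Inside `evaluate_model`, the prediction step could then read `y_pred, inference_time = time_call(model.predict, X_test)`.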


#### Logistic Regression

In [43]:
# Initialize the Logistic Regression model (plaintext)
lr_plaintext = LogisticRegression(max_iter=1000)
start_time = time.time()
lr_plaintext.fit(X_train, y_train)
lr_plaintext_training_time = time.time() - start_time

# Evaluate the plaintext model
lr_plaintext_accuracy, lr_plaintext_time = evaluate_model(
    lr_plaintext, "Logistic Regression (Plaintext)", X_test, y_test
)


Logistic Regression (Plaintext) Accuracy: 97.37%
Inference Time: 0.000270 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



In [44]:
# Initialize the Logistic Regression model with Concrete-ML
lr_he = HELogisticRegression(n_bits=7, max_iter=1000)
start_time = time.time()
lr_he.fit(X_train, y_train)
lr_he_training_time = time.time() - start_time

# Compile the model for FHE execution
print("Compiling Logistic Regression model for FHE...")
lr_he.compile(X_train)

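# Run inference on the test set.
# NOTE: in Concrete ML 1.x, predict() defaults to fhe="disable", which runs the
# quantized model in the clear. Pass fhe="execute" to run the compiled FHE
# circuit on encrypted data (expect much longer run times).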
start_time = time.time()
y_pred_he = lr_he.predict(X_test)
lr_he_inference_time = time.time() - start_time

# Evaluate the HE model
lr_he_accuracy = accuracy_score(y_test, y_pred_he)
print(f"\nLogistic Regression (HE) Accuracy: {lr_he_accuracy * 100:.2f}%")
print(f"Encrypted Inference Time: {lr_he_inference_time:.6f} seconds")
print(f"Classification Report:\n{classification_report(y_test, y_pred_he)}")


Compiling Logistic Regression model for FHE...

Logistic Regression (HE) Accuracy: 97.37%
Encrypted Inference Time: 0.000373 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



#### Decision Tree Classifier

In [45]:
# Initialize the Decision Tree Classifier (plaintext)
dt_plaintext = DecisionTreeClassifier(random_state=42, max_depth=5)
start_time = time.time()
dt_plaintext.fit(X_train, y_train)
dt_plaintext_training_time = time.time() - start_time

# Evaluate the plaintext model
dt_plaintext_accuracy, dt_plaintext_time = evaluate_model(
    dt_plaintext, "Decision Tree Classifier (Plaintext)", X_test, y_test
)



Decision Tree Classifier (Plaintext) Accuracy: 94.74%
Inference Time: 0.000429 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.93      0.93        43
           1       0.96      0.96      0.96        71

    accuracy                           0.95       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.95      0.95      0.95       114



In [46]:
# Initialize the Decision Tree Classifier with Concrete-ML
dt_he = HEDecisionTreeClassifier(n_bits=7, max_depth=5)
start_time = time.time()
dt_he.fit(X_train, y_train)
dt_he_training_time = time.time() - start_time

# Compile the model for FHE execution
print("Compiling Decision Tree model for FHE...")
dt_he.compile(X_train)

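# Run inference on the test set.
# NOTE: in Concrete ML 1.x, predict() defaults to fhe="disable", which runs the
# quantized model in the clear. Pass fhe="execute" to run the compiled FHE
# circuit on encrypted data (expect much longer run times).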
start_time = time.time()
y_pred_he = dt_he.predict(X_test)
dt_he_inference_time = time.time() - start_time

# Evaluate the HE model
dt_he_accuracy = accuracy_score(y_test, y_pred_he)
print(f"\nDecision Tree Classifier (HE) Accuracy: {dt_he_accuracy * 100:.2f}%")
print(f"Encrypted Inference Time: {dt_he_inference_time:.6f} seconds")
print(f"Classification Report:\n{classification_report(y_test, y_pred_he)}")


Compiling Decision Tree model for FHE...

Decision Tree Classifier (HE) Accuracy: 96.49%
Encrypted Inference Time: 0.000893 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.95      0.95        43
           1       0.97      0.97      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.96      0.96       114
weighted avg       0.96      0.96      0.96       114



#### Random Forest Classifier

In [47]:
# Initialize the Random Forest Classifier (plaintext)
rf_plaintext = RandomForestClassifier(n_estimators=10, random_state=42, max_depth=5)
start_time = time.time()
rf_plaintext.fit(X_train, y_train)
rf_plaintext_training_time = time.time() - start_time

# Evaluate the plaintext model
rf_plaintext_accuracy, rf_plaintext_time = evaluate_model(
    rf_plaintext, "Random Forest Classifier (Plaintext)", X_test, y_test
)



Random Forest Classifier (Plaintext) Accuracy: 95.61%
Inference Time: 0.000815 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.93      0.94        43
           1       0.96      0.97      0.97        71

    accuracy                           0.96       114
   macro avg       0.96      0.95      0.95       114
weighted avg       0.96      0.96      0.96       114



In [48]:
# Initialize the Random Forest Classifier with Concrete-ML
rf_he = HERandomForestClassifier(n_bits=7, n_estimators=10, max_depth=5)
start_time = time.time()
rf_he.fit(X_train, y_train)
rf_he_training_time = time.time() - start_time

# Compile the model for FHE execution
print("Compiling Random Forest model for FHE...")
rf_he.compile(X_train)

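# Run inference on the test set.
# NOTE: in Concrete ML 1.x, predict() defaults to fhe="disable", which runs the
# quantized model in the clear. Pass fhe="execute" to run the compiled FHE
# circuit on encrypted data (expect much longer run times).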
start_time = time.time()
y_pred_he = rf_he.predict(X_test)
rf_he_inference_time = time.time() - start_time

# Evaluate the HE model
rf_he_accuracy = accuracy_score(y_test, y_pred_he)
print(f"\nRandom Forest Classifier (HE) Accuracy: {rf_he_accuracy * 100:.2f}%")
print(f"Encrypted Inference Time: {rf_he_inference_time:.6f} seconds")
print(f"Classification Report:\n{classification_report(y_test, y_pred_he)}")


Compiling Random Forest model for FHE...

Random Forest Classifier (HE) Accuracy: 96.49%
Encrypted Inference Time: 0.001976 seconds
Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.93      0.95        43
           1       0.96      0.99      0.97        71

    accuracy                           0.96       114
   macro avg       0.97      0.96      0.96       114
weighted avg       0.97      0.96      0.96       114



#### Results

In [53]:
# Create a DataFrame to store the results
results_df = pd.DataFrame({
    'Model': ['Logistic Regression', 'Decision Tree', 'Random Forest'],
    'Plaintext Accuracy (%)': [
        lr_plaintext_accuracy * 100,
        dt_plaintext_accuracy * 100,
        rf_plaintext_accuracy * 100
    ],
    'Encrypted Accuracy (%)': [
        lr_he_accuracy * 100,
        dt_he_accuracy * 100,
        rf_he_accuracy * 100
    ],
    'Plaintext Inference Time (s)': [
        lr_plaintext_time,
        dt_plaintext_time,
        rf_plaintext_time
    ],
    'Encrypted Inference Time (s)': [
        lr_he_inference_time,
        dt_he_inference_time,
        rf_he_inference_time
    ],
    'Plaintext Training Time (s)': [
        lr_plaintext_training_time,
        dt_plaintext_training_time,
        rf_plaintext_training_time
    ],
    'Encrypted Training Time (s)': [
        lr_he_training_time,
        dt_he_training_time,
        rf_he_training_time
    ]
})

# Calculate the increase percentage for inference time
results_df['Inference Time Increase (%)'] = (
    (results_df['Encrypted Inference Time (s)'] - results_df['Plaintext Inference Time (s)'])
    / results_df['Plaintext Inference Time (s)']
) * 100

# Replace infinite values (caused by a zero plaintext time) with NaN, then with 0
results_df['Inference Time Increase (%)'] = results_df['Inference Time Increase (%)'].replace([np.inf, -np.inf], np.nan)
results_df['Inference Time Increase (%)'] = results_df['Inference Time Increase (%)'].fillna(0).round(2)

# Calculate the increase percentage for Training time
results_df['Training Time Increase (%)'] = (
    (results_df['Encrypted Training Time (s)'] - results_df['Plaintext Training Time (s)'])
    / results_df['Plaintext Training Time (s)']
) * 100

# Replace infinite values (caused by a zero plaintext time) with NaN, then with 0
results_df['Training Time Increase (%)'] = results_df['Training Time Increase (%)'].replace([np.inf, -np.inf], np.nan)
results_df['Training Time Increase (%)'] = results_df['Training Time Increase (%)'].fillna(0).round(2)


# Display the accuracy table
print("\nModel Accuracy Comparison:")
display(results_df[['Model', 'Plaintext Accuracy (%)', 'Encrypted Accuracy (%)']])

# Display the inference time table
print("\nInference Time Comparison with Increase Percentage:")
display(results_df[['Model', 'Plaintext Inference Time (s)', 'Encrypted Inference Time (s)', 'Inference Time Increase (%)']])

# Display the training time table
print("\nTraining Time Comparison:")
display(results_df[['Model', 'Plaintext Training Time (s)', 'Encrypted Training Time (s)', 'Training Time Increase (%)']])



Model Accuracy Comparison:


| | Model | Plaintext Accuracy (%) | Encrypted Accuracy (%) |
|---|---|---|---|
| 0 | Logistic Regression | 97.368421 | 97.368421 |
| 1 | Decision Tree | 94.736842 | 96.491228 |
| 2 | Random Forest | 95.614035 | 96.491228 |



Inference Time Comparison with Increase Percentage:


| | Model | Plaintext Inference Time (s) | Encrypted Inference Time (s) | Inference Time Increase (%) |
|---|---|---|---|---|
| 0 | Logistic Regression | 0.00027 | 0.000373 | 37.95 |
| 1 | Decision Tree | 0.000429 | 0.000893 | 107.88 |
| 2 | Random Forest | 0.000815 | 0.001976 | 142.52 |



Training Time Comparison:


| | Model | Plaintext Training Time (s) | Encrypted Training Time (s) | Training Time Increase (%) |
|---|---|---|---|---|
| 0 | Logistic Regression | 0.011613 | 0.08512 | 632.95 |
| 1 | Decision Tree | 0.007887 | 2.934052 | 37100.53 |
| 2 | Random Forest | 0.017413 | 0.18611 | 968.82 |
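
The "Increase (%)" columns are relative differences against the plaintext baseline. The sketch below restates that arithmetic in plain Python and adds a slowdown factor, which is often easier to read for large ratios. Both function names are hypothetical. Unlike the notebook cell, which replaces an undefined ratio with 0, this version returns `None` when the baseline is zero so the case stays visible.

```python
def percent_increase(baseline, measured):
    """Return the relative increase of `measured` over `baseline`, in percent.

    Returns None when the baseline is zero, since the ratio is undefined.
    """
    if baseline == 0:
        return None
    return (measured - baseline) / baseline * 100


def slowdown_factor(baseline, measured):
    """Return how many times slower `measured` is than `baseline` (None if baseline is 0)."""
    if baseline == 0:
        return None
    return measured / baseline


if __name__ == "__main__":
    # Rounded Logistic Regression training times taken from the table above.
    plain, he = 0.011613, 0.08512
    print(f"{percent_increase(plain, he):.1f}% increase")
    print(f"{slowdown_factor(plain, he):.1f}x slower")
```

Because the table shows rounded times, recomputing from it can differ from the displayed percentages in the last digits.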


## TODO
- [X] Demonstrate the use of homomorphic encryption with neural networks. (Skipped: the machine has no GPU support.)
- [ ] Demonstrate the security and privacy benefits of homomorphic encryption with client-side encryption and decryption and server-side inference on encrypted data.
- [ ] Web application for encrypted inference with homomorphic encryption.