# Privacy-Preserving Machine Learning

This functionality is included to show what **Fully Homomorphic Encryption** (FHE) makes possible.

FHE would let us ask deeply personal questions that have been shown to play a part in determining an applicant's risk. Applicants may worry that their answers will be held against them. With FHE, we never see the answers: they are encrypted in transit and in use, and they are never persisted.
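
To make "encrypted in use" concrete, here is a toy sketch of computing on ciphertexts. It uses the Paillier scheme, which is only additively homomorphic and is not what Concrete ML uses (Concrete ML builds on TFHE). The primes are far too small to be secure, and the function names and sample values are illustrative assumptions rather than anything from this project.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Return (public_key, private_key) built from toy-sized primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under the public key n."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the plaintexts hidden inside them."""
    return (c1 * c2) % (n * n)


n, private_key = keygen()
c_income = encrypt(n, 30000)  # made-up values for illustration
c_bonus = encrypt(n, 1500)
c_total = add_encrypted(n, c_income, c_bonus)  # computed without decrypting
print(decrypt(n, private_key, c_total))
```

The party calling `add_encrypted` needs only the public value `n`, so it can compute the sum without ever seeing either input. FHE extends this idea from addition to arbitrary computations, such as evaluating a tree-based model.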

To begin, install the `concrete-ml` library (for example, with `pip install concrete-ml`).
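
One way to confirm that the package is available in the current environment is to query its installed version from Python. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if concrete-ml is installed, otherwise None.
print("concrete-ml:", installed_version("concrete-ml"))
```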

In [5]:
import pandas as pd

final = pd.read_csv('final_df_v2')

final

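| Unnamed: 0 | AGE | MILHH | NUMPEOPLE | DISHH | HINCP | at_risk | gov_assistance |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 21 | 0 | 3 | 1 | 48000 | 0 | 0 |
| 1 | 21 | 0 | 3 | 1 | 48000 | 0 | 0 |
| 2 | 1 | 0 | 3 | 1 | 48000 | 0 | 0 |
| 3 | 76 | 1 | 2 | 1 | 292500 | 0 | 0 |
| 4 | 75 | 1 | 2 | 1 | 292500 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 111502 | 2 | 0 | 4 | 0 | 194000 | 0 | 0 |
| 111503 | 49 | 1 | 3 | 0 | 125000 | 0 | 0 |
| 111504 | 21 | 1 | 3 | 0 | 125000 | 0 | 0 |
| 111505 | 22 | 1 | 3 | 0 | 125000 | 0 | 0 |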


# Model Comparison

Below we compare the predictions of the `XGBClassifier` model on unencrypted and encrypted data. Matching predictions indicate that the model was ported to FHE successfully.

In [7]:
import numpy as np
from sklearn.preprocessing import RobustScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from concrete.ml.sklearn.xgb import XGBClassifier
from sklearn.model_selection import train_test_split, StratifiedShuffleSplit
import warnings

warnings.filterwarnings('ignore', category=DeprecationWarning)

x = final.drop('at_risk', axis=1)
y = final["at_risk"]

continuous_cols = ["AGE", "HINCP", "NUMPEOPLE"]

preprocessor = ColumnTransformer(
    [("num", RobustScaler(), continuous_cols)], remainder="passthrough"
)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

px_train = preprocessor.fit_transform(x_train)
px_test = preprocessor.transform(x_test)

model = XGBClassifier(
    n_estimators=100,
    n_bits=3,
    learning_rate=0.1,
    random_state=42,
    scale_pos_weight=2.5,
    objective="binary:logistic",
)

print("Fitting Model")
model.fit(px_train, y_train)

print("Predicting on unencrypted data")
y_pred_clear = model.predict(px_test[0:1000])

print("Compiling for FHE")
model.compile(px_train)

print("Predicting on encrypted data")
y_pred_fhe = model.predict(px_test[0:1000], fhe="execute")

print("In clear  :", y_pred_clear)
print("In FHE    :", y_pred_fhe)
print(f"Similarity: {int((y_pred_fhe == y_pred_clear).mean() * 100)}%")


Fitting Model
Predicting on unencrypted data
Compiling for FHE
Predicting on encrypted data
In clear  : [1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 0 1
 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
 1 0 1 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
 1 0 0 1 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 1 1 0 1 1 1 0 1 0
 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 1
 0 0 0 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
 1 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1
 0 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 1 1 0 1 1 0 0 0 1
 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 
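
The similarity figure computed by the cell above is the share of samples on which the clear and FHE predictions agree. A small helper can make that explicit and also report where the two runs differ. `agreement_report` is a hypothetical name, and the labels in the example are made up for illustration, not taken from this notebook's results.

```python
def agreement_report(pred_clear, pred_fhe):
    """Return (agreement_rate, mismatch_indices) for two equal-length label sequences."""
    if len(pred_clear) != len(pred_fhe):
        raise ValueError("prediction sequences must have the same length")
    if len(pred_clear) == 0:
        raise ValueError("prediction sequences must not be empty")
    mismatches = [
        i for i, (a, b) in enumerate(zip(pred_clear, pred_fhe)) if a != b
    ]
    rate = 1.0 - len(mismatches) / len(pred_clear)
    return rate, mismatches


# Made-up labels for illustration only.
rate, mismatches = agreement_report([1, 0, 0, 1, 0], [1, 0, 1, 1, 0])
print(f"agreement: {rate:.0%}, mismatches at {mismatches}")
```

With the arrays from the cell above, the call would be `agreement_report(y_pred_clear, y_pred_fhe)`. Concrete ML's `predict` also accepts `fhe="simulate"`, which is typically much faster than `fhe="execute"` and can be a useful first check of how quantization affects the predictions.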

# Server/Client Simulation

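Below, we use Debbie's information to simulate an application submission. The client encrypts her answers, the server runs the model on the ciphertext, and only the client can decrypt the prediction.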

In [57]:

debbie_data = {
    'AGE': [39],
    'MILHH': [0],
    'NUMPEOPLE': [3],
    'DISHH': [0],
    'HINCP': [30000],
    'gov_assistance': [1]
}

debbie_info = pd.DataFrame(debbie_data)

print(debbie_info)


   AGE  MILHH  NUMPEOPLE  DISHH  HINCP  gov_assistance
0   39      0          3      0  30000               1
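
The model in the previous section was fitted on `px_train`, the output of `preprocessor`, so an applicant row presumably needs the same transformation before it is quantized and encrypted. The sketch below shows what that transformation does to a single row. The rows in `train` are hypothetical values chosen for illustration, and the preprocessor is refitted here only to keep the example self-contained.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler

columns = ["AGE", "MILHH", "NUMPEOPLE", "DISHH", "HINCP", "gov_assistance"]

# Hypothetical training rows, for illustration only.
train = pd.DataFrame(
    [
        [21, 0, 3, 1, 48000, 0],
        [76, 1, 2, 1, 292500, 0],
        [49, 1, 3, 0, 125000, 0],
        [35, 0, 4, 0, 60000, 1],
    ],
    columns=columns,
)

continuous_cols = ["AGE", "HINCP", "NUMPEOPLE"]
preprocessor = ColumnTransformer(
    [("num", RobustScaler(), continuous_cols)], remainder="passthrough"
)
preprocessor.fit(train)

applicant = pd.DataFrame([[39, 0, 3, 0, 30000, 1]], columns=columns)
client_row = preprocessor.transform(applicant)

# The transformer scales the continuous columns and also reorders the output:
# AGE, HINCP, NUMPEOPLE come first, then the passthrough columns
# MILHH, DISHH, gov_assistance.
print(client_row)
```

In this notebook, that would mean passing `preprocessor.transform(debbie_info)` to the client rather than `debbie_info.values`, so that the encrypted input has the same scaling and column order as the training data.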


In [59]:
from concrete.ml.deployment import FHEModelDev, FHEModelClient, FHEModelServer
import os

# Save model and required specs
dev_env = FHEModelDev(path_dir='./env', model=model)
dev_env.save()

In [66]:
# Client setup
client = FHEModelClient(path_dir='./env', key_dir='./env/client_keys')
eval_keys = client.get_serialized_evaluation_keys()

# process & encrypt client data
encrypted_data = client.quantize_encrypt_serialize(debbie_info.values)

print('Encrypted & Serialized Client Data:\n\n', encrypted_data[:200])

Encrypted & Serialized Client Data:

 b'\x01\x00\x00\x00"0\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x08\x00\x00\x00\x00\x00\x01\x00$\xc0\x00\x00\x01\x00\x01\x004\xc0\x00\x00\x01\x00\x01\x00\x01\x00\x00\x00\x0e\x00\x00\x00\x01\x00\x00\x00\x82\x01\x0c\x00\x0e\xf4=g\xf8\xe0\x9aWS\xfbdb\xa6<\xf36\xf9\x88\xffn\xab\xad\x02\x95\xe6\xc1\xd4\xd3)ZGo\x8b\x1a{\xc0A \x0c\xcb\x00)[\xe2\x00rz\xa2.\xc6\xc4\x02\xf6\x9d\xf0\xfa<\x11V\x1a\xd3\x17\x0c=\xf5\xf2\xa7\x8af\x14\x94\xd1\x84\xd6~\xf7\x05;\xdc\x94\xdb2\xf7O\xc7\x06z\xb9\xf6\x81\xb4/\x9c\xb0\x7f\xf1\x01\xb2\\\xff<\xea\xc50\xd6\xc6\xfd\xce\x91 \xe8\xc0\x1a.\x01\x88i\x7fw\x1c-`8n\'L\xf0\xc1\x9bL.\xcd\x96&U\x8e'


In [65]:
# Server setup
server = FHEModelServer(path_dir='./env')
server.load()

encrypted_result = server.run(encrypted_data, eval_keys)

print('Encrypted & Serialized Model Prediction:\n\n', encrypted_result[:100])

Encrypted & Serialized Model Prediction:

 b"\x01\x00\x00\x00\x81 \x03\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x08\x00\x00\x00\x00\x00\x01\x00\x9c\x81\x0c\x00\x01\x00\x01\x00\xac\x81\x0c\x00\x01\x00\x01\x00\x01\x00\x00\x00\x0e\x00\x00\x00\x01\x00\x00\x00\x02\x19\xc8\x00\x00\x00\x00\x00\xa4\x99j.\x00\x00@](_\x88\xec\x00\x00\x00Vf\xd4'k\x00\x00@0\xb2+t^\x00\x00\x00\xf8"


In [72]:
# Decrypted result

result = client.deserialize_decrypt_dequantize(encrypted_result)

print("Decrypted Results:\n\n")
print(f'Not at risk class probability: {result[0][0] * 100} \nAt risk class probability: {result[0][1] * 100}')

Decrypted Results:


Not at risk class probability: 36.796639272310316 
At risk class probability: 63.203360727689684
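
The cells above can be read as one request/response cycle: the client produces evaluation keys and a ciphertext, the server computes on the ciphertext, and the client decrypts the reply. The sketch below gathers those calls into a single function. `private_predict` is a hypothetical helper, not part of Concrete ML, and it uses only the method calls already shown in this section.

```python
def private_predict(client, server, features):
    """Run one encrypted inference round trip and return the decrypted output.

    `client` and `server` are expected to behave like the FHEModelClient and
    FHEModelServer objects created above.
    """
    # Client side: evaluation keys plus the quantized, encrypted input.
    eval_keys = client.get_serialized_evaluation_keys()
    encrypted_input = client.quantize_encrypt_serialize(features)

    # Server side: computes on ciphertext only and never holds the private key.
    encrypted_output = server.run(encrypted_input, eval_keys)

    # Client side: only the holder of the private key can read the prediction.
    return client.deserialize_decrypt_dequantize(encrypted_output)
```

With the objects from this section, the call would look like `result = private_predict(client, server, debbie_info.values)`. In a real deployment the serialized keys and ciphertexts would travel over a network between the two sides instead of being passed within one process.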
