# Heart disease example

This notebook shows an example of training and running a model that classifies heart diseases.
It is based on the <a href="https://archive.ics.uci.edu/ml/datasets/Heart+Disease">Heart Disease dataset</a> from the <a href="https://archive.ics.uci.edu/">UCI</a> repository. For privacy reasons, the names and social security numbers (SSNs) of patients were removed from the dataset and replaced with dummy values.

Following the UCI description, this demonstration uses only 14 of the 76 listed attributes to predict whether a patient will have a heart attack. The target field is an integer from 0 (no attack) to 4.

The demonstration uses a 3-layer neural network (NN), where FC(n) denotes a fully connected layer with n units: FC(50) --> Square activation --> FC(1)

The estimated memory requirements are: model (4.46 MB), input (3.67 MB), output (0.13 MB), and context (8.18 MB).

We start by importing the required packages.

```python
import os
import warnings
warnings.filterwarnings("ignore")

import numpy as np
np.random.seed(1)  # make NumPy-based randomness reproducible

import h5py
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split

import tensorflow as tf
# Use the Keras bundled with TensorFlow throughout; mixing the standalone
# `keras` package with `tensorflow.keras` can cause import and version conflicts.
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import SGD

from preprocessor import Preprocessor  # project-local preprocessing module
```

### Data loading
Please refer to the dataset <a href="https://archive.ics.uci.edu/ml/datasets/Heart+Disease">documentation</a> for the complete list of attributes and their descriptions.


```python
df = pd.read_csv("datasets/heart_disease.csv")

df.rename(columns={'age': 'Age', 'sex': 'Sex', 'cp': 'Chest_pain', 'trestbps': 'Resting_blood_pressure',
                   'chol': 'Cholesterol', 'fbs': 'Fasting_blood_sugar',
                   'restecg': 'ECG_results', 'thalach': 'Maximum_heart_rate',
                   'exang': 'Exercise_induced_angina', 'oldpeak': 'ST_depression', 'ca': 'Major_vessels',
                   'thal': 'Thalassemia_types', 'target': 'Heart_attack', 'slope': 'ST_slope'}, inplace=True)

print(f'data shape: {df.shape}')
print(df.dtypes)
```

### Data preprocessing

We first convert the categorical features (listed in the table below) to indicator (one-hot) vectors.

|Attributes|Description|Values|
|---|---|---|
|Chest_pain| The chest pain type | Typical angina (1), Atypical angina (2), Non-anginal pain (3), Asymptomatic (4)|
|Thalassemia_types| Thalassemia type | Normal (3), Fixed defect (6), Reversible defect (7)|
|ECG_results| Resting electrocardiographic results|<ul><li>Normal (0)</li><li>Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) (1)</li><li>Showing probable or definite left ventricular hypertrophy by Estes' criteria (2)</li></ul>|
|ST_slope| The slope of the peak exercise ST segment| Upsloping (1), Flat (2), Downsloping (3)|
|Major_vessels | The number of major vessels (0-3) colored by fluoroscopy | 0-3 |
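A minimal sketch of this conversion with `pandas.get_dummies`, on a small made-up frame that reuses two column names from the table (the values are illustrative, not patient data). Passing `columns=` makes pandas prefix each indicator column with its source column name, so features that share category values (for example, `Chest_pain` and `Major_vessels` can both be 1) do not produce colliding column names.

```python
import pandas as pd

# Toy frame with two categorical columns; values are illustrative only
df = pd.DataFrame({
    "Age": [63, 37, 41],
    "Chest_pain": [1, 3, 2],
    "Major_vessels": [0, 1, 0],
})

# One-hot encode the categorical columns; other columns are kept unchanged.
# New columns are named "<original>_<value>", one per distinct value.
encoded = pd.get_dummies(df, columns=["Chest_pain", "Major_vessels"], dtype=int)
print(list(encoded.columns))
# ['Age', 'Chest_pain_1', 'Chest_pain_2', 'Chest_pain_3', 'Major_vessels_0', 'Major_vessels_1']
```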

```python
# Manual one-hot encoding with pandas (disabled in this notebook; kept for reference).
# Encoding all columns in one call prefixes the new column names with the source
# column, unlike concatenating separate get_dummies() results, which repeats names such as 1, 2, 3.
# final = pd.get_dummies(df, columns=['Chest_pain', 'Thalassemia_types', 'ECG_results',
#                                     'ST_slope', 'Major_vessels'])
```

Subsequently, we split every row into its target value (y) and features (X).

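```python
# Alternative if the manual one-hot encoding above is enabled:
# X = final.drop(['Heart_attack'], axis=1)
# y = final['Heart_attack']

# Drop the label so it is not also used as an input feature
X = df.drop(columns=['Heart_attack'])
y = df['Heart_attack']
```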

We split the dataset into training (x_train, y_train) and test (x_test, y_test) sets and scale their features. The preprocessor is fitted on the training set only and then applied unchanged to the test set.

```python
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

# feature_scaler = MinMaxScaler()
# x_train = feature_scaler.fit_transform(x_train)
# x_test = feature_scaler.transform(x_test)

prep = Preprocessor()
x_train = prep.fit_transform(x_train)
x_test = prep.transform(x_test)
```
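The commented-out `MinMaxScaler` lines show the fit-on-train, transform-on-test pattern that the `Preprocessor` calls follow; we assume, without having checked its source, that `Preprocessor` behaves analogously. A minimal sketch with scikit-learn's `MinMaxScaler` on made-up numbers shows why the order matters: the test set is scaled with the training minimum and maximum, so no statistics leak from the test data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data (e.g. a blood-pressure-like column)
x_train = np.array([[100.0], [120.0], [140.0]])
x_test = np.array([[130.0], [160.0]])

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learns min=100, max=140 from training data only
x_test_scaled = scaler.transform(x_test)        # reuses the training min and max

print(x_train_scaled.ravel())  # [0.  0.5 1. ]
print(x_test_scaled.ravel())   # [0.75 1.5 ]; test values can fall outside [0, 1]
```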

Finally, we split the held-out 20% in half, into test (x_test, y_test) and validation (x_val, y_val) sets.

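```python
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=5)

# Print the number of positive labels out of the number of samples in each set
print(f"y_test: {sum(y_test)}/{len(y_test)}")
print(f"y_train: {sum(y_train)}/{len(y_train)}")
print(f"y_val: {sum(y_val)}/{len(y_val)}")

# Number of input features after preprocessing (width of the first layer's input)
input_shape = x_train.shape[1]
```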
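Together, the two calls to `train_test_split` produce an 80/10/10 train/test/validation split. A quick check on 100 synthetic rows (the row count and data are arbitrary) with the same `test_size` and `random_state` values:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(100, 2)  # 100 synthetic rows, 2 features
y = np.arange(100) % 2              # synthetic binary labels

# First split: 80% train, 20% held out
x_train, x_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=5)
# Second split: the held-out 20% is halved into test and validation sets
x_test, x_val, y_test, y_val = train_test_split(x_rest, y_rest, test_size=0.5, random_state=5)

print(len(x_train), len(x_test), len(x_val))  # 80 10 10
```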

For later use with homomorphic encryption (HE), we save the preprocessed datasets and the fitted preprocessor.

```python
def save_data_set(x, y, data_type, path, s=''):
    """Save features and labels to x_<data_type><s>.h5 and y_<data_type><s>.h5 under path."""
    os.makedirs(path, exist_ok=True)

    x_fname = os.path.join(path, f'x_{data_type}{s}.h5')
    print(f"Saving x_{data_type} of shape {x.shape} in {x_fname}")
    with h5py.File(x_fname, 'w') as xf:
        xf.create_dataset(f'x_{data_type}', data=np.asarray(x))

    y_fname = os.path.join(path, f'y_{data_type}{s}.h5')
    print(f"Saving y_{data_type} of shape {y.shape} in {y_fname}")
    with h5py.File(y_fname, 'w') as yf:
        yf.create_dataset(f'y_{data_type}', data=np.asarray(y))

input_output_dir = "outputs/"

save_data_set(x_test, y_test, data_type='test', path=input_output_dir)
save_data_set(x_train, y_train, data_type='train', path=input_output_dir)
save_data_set(x_val, y_val, data_type='val', path=input_output_dir)

# Save the fitted preprocessor so the same transformation can be reapplied later
prep.save(os.path.join(input_output_dir, "prep.pickle"))
```
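Each `save_data_set` call writes two files; for example, `x_test.h5` holds a single dataset named `x_test`. A self-contained round-trip sketch with `h5py` in a temporary directory (the array contents are made up) shows how such a file is written and read back, as the HE section does later:

```python
import os
import tempfile

import h5py
import numpy as np

x = np.random.default_rng(0).random((4, 3))  # made-up feature matrix

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "x_test.h5")
    # Write: one dataset per file, named after the split
    with h5py.File(fname, "w") as f:
        f.create_dataset("x_test", data=x)
    # Read back in read-only mode; [()] loads the whole dataset into memory
    with h5py.File(fname, "r") as f:
        x_loaded = f["x_test"][()]

print(np.array_equal(x, x_loaded))  # True
```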

### The model

The model has 3 layers: 

FC(50) --> Square activation --> FC(1)
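Written out, for an input row $x$ the network computes $\hat{y} = (xW_1 + b_1)^{2} W_2 + b_2$, with the square taken element-wise. It uses only additions and multiplications, the operations that HE schemes such as CKKS support natively, which is why a square activation is used instead of, e.g., ReLU. A plaintext NumPy sketch of this forward pass with random weights (the input width 13 is illustrative; in the notebook it is `input_shape`), together with the trainable-parameter count $50(d+1) + 51$ for input width $d$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 13      # illustrative input width; in the notebook this is input_shape
n = 4       # number of samples

# Like a Keras Dense layer: kernel of shape (inputs, units), bias of shape (units,)
W1, b1 = rng.normal(size=(d, 50)), rng.normal(size=50)
W2, b2 = rng.normal(size=(50, 1)), rng.normal(size=1)

def forward(x):
    h = x @ W1 + b1      # FC(50)
    h = h ** 2           # square activation: one multiplication per unit
    return h @ W2 + b2   # FC(1)

x = rng.normal(size=(n, d))
print(forward(x).shape)  # (4, 1)

# Trainable parameters: 50 * (d + 1) for FC(50), plus 50 + 1 for FC(1)
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 751 for d = 13
```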

```python
def square(x):
    return x ** 2

model = Sequential()
model.add(Dense(50, input_shape=(input_shape,)))
model.add(Activation(activation=square))
model.add(Dense(1))
```

#### Model training

```python
def sum_squared_error(y_true, y_pred):
    # Cast the labels (not the predictions) to float32 before comparing them with y_pred
    y_true = tf.cast(y_true, tf.float32)
    return K.sum(K.square(y_pred - y_true), axis=-1)

loss_func = sum_squared_error
optimizer_type = SGD(learning_rate=0.01, momentum=0.9)  # alternatively: Adam

model.compile(loss=loss_func, optimizer=optimizer_type, metrics=['accuracy'])
model.summary()

batch_size = 10
epochs = 15

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=2,
          validation_data=(x_val, y_val),
          shuffle=True)
```
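With a single output unit, `sum_squared_error` sums over an axis of length 1, so it is simply the squared error of each sample; with the default reduction, Keras then averages these values over the batch. A NumPy sketch of the same formula on made-up labels and predictions (`sum_squared_error_np` is an illustrative re-implementation, not part of the notebook):

```python
import numpy as np

def sum_squared_error_np(y_true, y_pred):
    # Same formula as the Keras loss: sum of squared differences over the last axis
    y_true = np.asarray(y_true, dtype=np.float32).reshape(-1, 1)  # labels as a column vector
    return np.sum(np.square(y_pred - y_true), axis=-1)

y_true = [1, 0]
y_pred = np.array([[0.8], [0.3]], dtype=np.float32)
per_sample = sum_squared_error_np(y_true, y_pred)
print(per_sample)          # approximately [0.04 0.09]
print(per_sample.mean())   # batch loss under mean reduction, approximately 0.065
```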

For later use with HE, we save the trained model: its weights (HDF5) and its architecture (JSON).

```python
def save_weights(model, index, path):
    """Save the model weights (HDF5) and architecture (JSON), tagged with the given epoch index."""
    os.makedirs(path, exist_ok=True)
    fname = os.path.join(path, "model_epoch_{:0>4}.h5".format(index))
    print("Saving weights to: " + fname)
    model.save_weights(fname)

    with open(os.path.join(path, f'model_epoch{index}.json'), 'w') as f:
        f.write(model.to_json())

save_weights(model, epochs, path=input_output_dir)
```
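The two file names follow slightly different patterns (the weights name has an extra underscore and a zero-padded four-digit index), and the HE section later loads both by exactly these names. A small sketch of the naming scheme; `saved_model_paths` is a hypothetical helper, not part of the notebook:

```python
import os

def saved_model_paths(path, index):
    # Hypothetical helper mirroring the file names written by save_weights()
    weights = os.path.join(path, "model_epoch_{:0>4}.h5".format(index))
    arch = os.path.join(path, f"model_epoch{index}.json")
    return arch, weights

arch, weights = saved_model_paths("outputs/", 15)
print(arch)     # outputs/model_epoch15.json
print(weights)  # outputs/model_epoch_0015.h5
```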

```python
score = model.evaluate(x_test, y_test, verbose=0)

# score[0] is the loss; score[1] is the 'accuracy' metric passed to compile()
print(f'Test loss: {score[0]:.3f}')
print(f'Test accuracy: {score[1] * 100:.3f}%')
```

#### Using the model for classifying cleartext data

```python
y_score = model.predict(x_test).ravel()   # continuous model outputs
y_pred = (y_score >= 0.5).astype(int)     # hard labels, same 0.5 threshold as for the HE results

fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)
cm = metrics.confusion_matrix(y_test, y_pred)
print("ROC AUC score:", metrics.auc(fpr, tpr))
print("Classification report:")
print(metrics.classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(cm)
```
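ROC AUC is computed from the continuous scores, while the confusion matrix and classification report need hard labels, hence the 0.5 threshold. A toy illustration with made-up labels and scores:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])   # made-up model outputs
y_pred = (y_score >= 0.5).astype(int)        # hard labels: [0, 0, 0, 1]

fpr, tpr, _ = metrics.roc_curve(y_true, y_score)
auc = metrics.auc(fpr, tpr)
cm = metrics.confusion_matrix(y_true, y_pred)
print(auc)  # 0.75
print(cm)   # [[2 0]
            #  [1 1]]
```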

### Using the model for classifying encrypted data

To run the model on encrypted samples with homomorphic encryption (HE), we first import the pyhelayers package and then load the model and datasets that we saved in the "outputs/" directory.

```python
import pyhelayers
```

Load the test data and labels from the HDF5 files.

```python
# Open the files explicitly in read-only mode
with h5py.File(os.path.join(input_output_dir, "x_test.h5"), "r") as f:
    x_test = np.array(f["x_test"])
with h5py.File(os.path.join(input_output_dir, "y_test.h5"), "r") as f:
    y_test = np.array(f["y_test"])
```

Load the plaintext (unencrypted) model from the saved architecture and weights files.

```python
nnp = pyhelayers.NeuralNetPlain()
nnp.init_arch_from_json_file(input_output_dir + "model_epoch15.json")
nnp.init_weights_from_hdf5_file(input_output_dir + "model_epoch_0015.h5")
print("loaded plain model")
```

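Apply automatic optimizations: the HE profile optimizer selects an HE configuration (profile) for the model, given a required batch size of 16.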

```python
context = pyhelayers.DefaultContext()
optimizer = pyhelayers.HeProfileOptimizer(nnp, context)
optimizer.get_requirements().set_batch_size(16)
profile = optimizer.get_optimized_profile(False)
batch_size = profile.get_batch_size()
```

To reduce the memory requirements of the context, we reduce the number of rotation keys: only rotation steps 1 and 2 are requested, and conjugation is disabled.

```python
pf1 = pyhelayers.PublicFunctions()
pf1.rotate = pyhelayers.RotationSetType.CUSTOM_ROTATIONS
pf1.set_rotation_steps([1, 2])
pf1.conjugate = False
requirements = profile.requirement
requirements.public_functions = pf1
```
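In HE schemes such as CKKS, a ciphertext packs a vector of values into slots, and rotating the slots by k positions needs a key generated for that particular k; requesting only steps 1 and 2 therefore keeps the key material small. The following is a plaintext NumPy analogy only, not pyhelayers code, and the rotation direction is an assumption (conventions differ between libraries). It shows how small rotations combine, here to sum four slots with a rotation by 2 followed by a rotation by 1:

```python
import numpy as np

def rotate(v, k):
    # Plaintext stand-in for a slot rotation: cyclic left shift by k positions
    return np.roll(v, -k)

slots = np.array([1, 2, 3, 4])
step2 = slots + rotate(slots, 2)   # [4 6 4 6]
total = step2 + rotate(step2, 1)   # [10 10 10 10]: every slot now holds the sum
print(total)
```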

Initialize the HE context with the optimized configuration.

```python
context.init(profile.requirement)
print('HE context ready. Batch size =', batch_size)
```

Print the size of the serialized HE context, including its keys.

```python
eval_buf = context.save_to_buffer()
print(f'Size: {len(eval_buf) / 1024 / 1024:.2f} MB')
```

#### Encrypt the model

```python
nn = pyhelayers.NeuralNet(context)
nn.encode_encrypt(nnp, profile)
```

We run the encrypted model on batches of 16 records at a time; here, we take the first batch of the test set.

```python
# Take the first batch_size test samples and their labels
plain_samples = x_test[:batch_size]
labels = y_test[:batch_size]
```
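The cell above uses only the first batch of the test set. To cover the whole test set, the samples could be cut into consecutive chunks of `batch_size` rows. A NumPy sketch with placeholder arrays; `iter_batches` is a hypothetical helper, and we have not checked whether the encrypted model accepts a final batch smaller than `batch_size`:

```python
import numpy as np

def iter_batches(x, y, batch_size):
    # Hypothetical helper: yield consecutive (samples, labels) chunks of up to batch_size rows
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]

x = np.zeros((30, 13))  # placeholder features: 30 samples, 13 columns
y = np.zeros(30)        # placeholder labels
sizes = [len(xb) for xb, _ in iter_batches(x, y, 16)]
print(sizes)  # [16, 14]
```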

Encrypt the input samples.

```python
samples = nn.encode_encrypt_input(plain_samples)
```

Now we run inference on the 16 encrypted samples.

```python
predictions = nn.predict(samples)
```

### Plaintext results

Decrypt and decode the final results.

```python
plain_predictions = nn.decrypt_decode_output(predictions)
```

```python
print('\nClassification results')
print('=========================================')
for label, pred in zip(labels, plain_predictions):
    print('Label:', ('Healthy' if label == 0 else 'Should talk with a Dr.'), end=', ')
    print('Prediction:', ('Healthy' if pred[0] < 0.5 else 'Should talk with a Dr.'))
```
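If the underlying HE scheme is approximate (as CKKS is), the decrypted outputs differ slightly from the cleartext model's outputs. A useful sanity check is to compare them with the cleartext outputs on the same samples (e.g. `model.predict(plain_samples)`) and to compute the batch accuracy. A sketch on made-up arrays with the same (batch, 1) shape as `plain_predictions`; `he_report` is a hypothetical helper:

```python
import numpy as np

def he_report(he_out, clear_out, labels, threshold=0.5):
    # Hypothetical helper: compare decrypted HE outputs with cleartext outputs
    he_out = np.asarray(he_out).ravel()
    clear_out = np.asarray(clear_out).ravel()
    max_abs_diff = float(np.max(np.abs(he_out - clear_out)))
    he_labels = (he_out >= threshold).astype(int)   # same 0.5 rule as above
    accuracy = float(np.mean(he_labels == np.asarray(labels)))
    return max_abs_diff, accuracy

he_out = np.array([[0.12], [0.91], [0.48], [0.70]])     # made-up decrypted outputs
clear_out = np.array([[0.11], [0.90], [0.51], [0.70]])  # made-up cleartext outputs
labels = np.array([0, 1, 1, 1])
report = he_report(he_out, clear_out, labels)
print(report)  # max difference about 0.03, accuracy 0.75
```

In this toy case a 0.03 error is enough to push the third sample across the threshold, which is why outputs close to 0.5 deserve a second look.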