# Heart disease example

This notebook shows an example of training and running a model that classifies heart diseases.
It is based on the <a href="https://archive.ics.uci.edu/ml/datasets/Heart+Disease">Heart Disease dataset</a> from the <a href="https://archive.ics.uci.edu/">UCI</a> repository. For privacy reasons, the names and social security numbers (SSNs) of patients were removed from the dataset and replaced with dummy values.

Following the UCI description, this demonstration uses only 14 of the 76 listed attributes to predict whether a patient will have a heart attack. The target field is an integer value from 0 (no attack) to 4.

The demonstration uses a 3-layer neural network (NN): FC(50) --> Square activation --> FC(1)

The required estimated memory is: model (4.46MB), input (3.67MB), output (0.13MB), and context (8.18MB).

We start by importing the required source packages.

In [1]:
import os
import warnings
warnings.filterwarnings("ignore")

from numpy.random import seed
seed(1)

import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.metrics import accuracy_score,confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

import tensorflow as tf
from keras import backend as K
from keras import utils, callbacks, losses
from keras.layers import Dense, Activation
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from preprocessor import Preprocessor

import h5py

2022-08-22 12:01:36.895303: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


### Data loading
Please refer to the dataset <a href="https://archive.ics.uci.edu/ml/datasets/Heart+Disease">documentation</a> for the complete list of attributes and their description.


In [2]:
df = pd.read_csv("datasets/heart_disease.csv")

df.rename(columns={'age': 'Age', 'sex': 'Sex', 'cp': 'Chest_pain', 'trestbps': 'Resting_blood_pressure',
                       'chol': 'Cholesterol', 'fbs': 'Fasting_blood_sugar',
                       'restecg': 'ECG_results', 'thalach': 'Maximum_heart_rate',
                       'exang': 'Exercise_induced_angina', 'oldpeak': 'ST_depression', 'ca': 'Major_vessels',
                       'thal': 'Thalassemia_types', 'target': 'Heart_attack', 'slope': 'ST_slope'}, inplace=True)

print(f'data shape: {df.shape}')
print(df.dtypes)

data shape: (303, 14)
Age                          int64
Sex                          int64
Chest_pain                   int64
Resting_blood_pressure       int64
Cholesterol                  int64
Fasting_blood_sugar          int64
ECG_results                  int64
Maximum_heart_rate           int64
Exercise_induced_angina      int64
ST_depression              float64
ST_slope                     int64
Major_vessels                int64
Thalassemia_types            int64
Heart_attack                 int64
dtype: object


### Data preprocessing

We first convert the categorical features (listed in the table below) to indicator vectors.

|Attributes|Description|Values|
|---|---|---|
|Chest_pain| The chest pain type | Typical angina (1), Atypical angina (2), Non-anginal pain (3), Asymptomatic (4)|
|Thalassemia_types| | Normal (3), Fixed defect (6), Reversible defect (7)|
|ECG_results| Resting electrocardiographic results|<ul><li>Normal (0)</li><li>Having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV) (1)</li><li>Showing probable or definite left ventricular hypertrophy by Estes' criteria (2)</li></ul>|
|ST_slope| The slope of the peak exercise ST segment| Upsloping (1), Flat (2), Downsloping (3)|
|Major_vessels| The number of major vessels (0-3) colored by fluoroscopy | 0-3 |
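
Converting a categorical feature to an indicator vector (one-hot encoding) can be sketched as follows. This is a minimal, standard-library illustration; the helper name `one_hot` is hypothetical, and the actual `Preprocessor` class may implement the conversion differently.

```python
def one_hot(value, categories):
    """Return an indicator vector with a 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

# ST_slope takes the values Upsloping (1), Flat (2), Downsloping (3).
st_slope_categories = [1, 2, 3]
encoded_flat = one_hot(2, st_slope_categories)
print(encoded_flat)  # [0, 1, 0]
```

Each categorical column is replaced by as many 0/1 columns as it has categories, which is why the number of features grows after preprocessing.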

Subsequently, we split every row into its target value (y) and predictors (X).

In [3]:
X = df
y = df['Heart_attack']

We split the dataset into the training (x_train, y_train) and test (x_test, y_test) sets and scale their features.

In [4]:
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=5)

prep = Preprocessor()
x_train = prep.fit_transform(x_train)
x_test = prep.transform(x_test)

data shape: (242, 14)
Age                          int64
Sex                          int64
Chest_pain                   int64
Resting_blood_pressure       int64
Cholesterol                  int64
Fasting_blood_sugar          int64
ECG_results                  int64
Maximum_heart_rate           int64
Exercise_induced_angina      int64
ST_depression              float64
ST_slope                     int64
Major_vessels                int64
Thalassemia_types            int64
Heart_attack                 int64
dtype: object
data shape: (61, 14)
Age                          int64
Sex                          int64
Chest_pain                   int64
Resting_blood_pressure       int64
Cholesterol                  int64
Fasting_blood_sugar          int64
ECG_results                  int64
Maximum_heart_rate           int64
Exercise_induced_angina      int64
ST_depression              float64
ST_slope                     int64
Major_vessels                int64
Thalassemia_types            int6
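
The feature scaling step can be illustrated with min-max scaling. The imported `MinMaxScaler` suggests that this is what `Preprocessor` uses, but that is an assumption, since the `Preprocessor` source is not shown here. The point of the sketch is that the range is learned from the training set only and then reused for the test set. The function names and the sample values are hypothetical.

```python
def fit_min_max(column):
    """Learn the scaling range from the training values only."""
    return min(column), max(column)

def apply_min_max(column, lo, hi):
    """Map values to [0, 1] using a previously learned range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]

train_age = [29, 45, 61, 77]   # illustrative values, not taken from the dataset
test_age = [37, 80]

lo, hi = fit_min_max(train_age)
scaled_train = apply_min_max(train_age, lo, hi)
scaled_test = apply_min_max(test_age, lo, hi)
print(scaled_train)
print(scaled_test)
```

A test value outside the training range (80 here) maps slightly outside [0, 1]. This is expected when the scaler is fitted on the training set alone.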

Finally, we split the test set into test (x_test, y_test) and validation (x_val, y_val) sets.

In [5]:
x_test, x_val, y_test, y_val = train_test_split(x_test, y_test, test_size=0.5, random_state=5)
print(f"y_test: {sum(y_test)}/{len(y_test)}")
print(f"y_train: {sum(y_train)}/{len(y_train)}")
print(f"y_val: {sum(y_val)}/{len(y_val)}")
input_shape = x_train.shape[1]

y_test: 14/30
y_train: 134/242
y_val: 17/31
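
The set sizes printed above follow from the two `test_size` fractions applied to the 303 rows. The sketch below reproduces them, assuming the held-out part is rounded up, which is consistent with the printed sizes. The helper name `split_sizes` is hypothetical.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_kept, n_held_out), rounding the held-out part up."""
    n_held_out = math.ceil(n_samples * test_fraction)
    return n_samples - n_held_out, n_held_out

n_train, n_holdout = split_sizes(303, 0.2)     # first split: train vs. the rest
n_test, n_val = split_sizes(n_holdout, 0.5)    # second split: test vs. validation
print(n_train, n_test, n_val)  # 242 30 31
```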


For later use with homomorphic encryption (HE), we save the preprocessed datasets.

In [6]:
def save_data_set(x, y, data_type, path, s=''):
    # Create the output directory if needed.
    os.makedirs(path, exist_ok=True)

    # Save the features.
    x_fname = os.path.join(path, f'x_{data_type}{s}.h5')
    print(f"Saving x_{data_type} of shape {x.shape} in {x_fname}")
    with h5py.File(x_fname, 'w') as xf:
        xf.create_dataset(f'x_{data_type}', data=x)

    # Save the labels in a separate file.
    y_fname = os.path.join(path, f'y_{data_type}{s}.h5')
    print(f"Saving y_{data_type} of shape {y.shape} in {y_fname}")
    with h5py.File(y_fname, 'w') as yf:
        yf.create_dataset(f'y_{data_type}', data=y)

datasets_dir = "datasets/"
model_dir = "model/"

save_data_set(x_test, y_test, data_type='test', path=datasets_dir)
save_data_set(x_train, y_train, data_type='train', path=datasets_dir)
save_data_set(x_val, y_val, data_type='val', path=datasets_dir)

prep.save(os.path.join(model_dir, "prep.pickle"))

Saving x_test of shape (30, 27) in datasets/x_test.h5
Saving y_test of shape (30,) in datasets/y_test.h5
Saving x_train of shape (242, 27) in datasets/x_train.h5
Saving y_train of shape (242,) in datasets/y_train.h5
Saving x_val of shape (31, 27) in datasets/x_val.h5
Saving y_val of shape (31,) in datasets/y_val.h5


### The model

The model has 3 layers: 

FC(50) --> Square activation --> FC(1)
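
The sketch below spells out this architecture in plain Python. It shows that the forward pass uses only additions and multiplications, the operations that HE schemes typically support natively, which is a common reason for choosing a square activation. The weights are placeholder values rather than trained ones, and all function names are hypothetical. It also counts the parameters for 27 input features, the feature dimension of the saved datasets.

```python
def dense(x, weights, bias):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def square(x):
    """Element-wise square activation."""
    return [v * v for v in x]

def forward(x, w1, b1, w2, b2):
    """FC(hidden) --> square --> FC(1); returns a single logit."""
    return dense(square(dense(x, w1, b1)), w2, b2)[0]

def count_params(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of both fully connected layers."""
    return (n_inputs * n_hidden + n_hidden) + (n_hidden * n_outputs + n_outputs)

# Tiny placeholder network: 2 inputs, 2 hidden units, 1 output.
logit = forward([1.0, 2.0],
                w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
                w2=[[1.0, -1.0]], b2=[0.5])
print(logit)                  # -2.5
print(count_params(27, 50))   # 1451
```

The count of 1,451 agrees with the total reported by `model.summary()` in the training section (1,400 for the first layer and 51 for the second).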

In [7]:
def square(x):
    return x ** 2

model = Sequential()
model.add(Dense(50, input_shape=(input_shape,)))
model.add(Activation(activation=square))
model.add(Dense(1))

2022-08-22 12:01:37.623049: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-08-22 12:01:37.623085: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: luis-zama
2022-08-22 12:01:37.623104: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: luis-zama
2022-08-22 12:01:37.623218: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 515.65.1
2022-08-22 12:01:37.623244: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 515.65.1
2022-08-22 12:01:37.623249: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 515.65.1
2022-08-22 12:01:37.623628: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions i

#### Model training

In [8]:
loss_func = losses.BinaryCrossentropy(from_logits=True)
optimizer_type = Adam()

model.compile(loss=loss_func, optimizer=optimizer_type, metrics=['accuracy'])
model.summary()

batch_size = 10
epochs = 15

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=2,
          validation_data=(x_val, y_val),
          shuffle=True)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 50)                1400      
                                                                 
 activation (Activation)     (None, 50)                0         
                                                                 
 dense_1 (Dense)             (None, 1)                 51        
                                                                 
Total params: 1,451
Trainable params: 1,451
Non-trainable params: 0
_________________________________________________________________
Epoch 1/15
25/25 - 0s - loss: 0.6055 - accuracy: 0.6653 - val_loss: 0.5286 - val_accuracy: 0.6774 - 273ms/epoch - 11ms/step
Epoch 2/15
25/25 - 0s - loss: 0.4847 - accuracy: 0.8058 - val_loss: 0.4348 - val_accuracy: 0.7097 - 25ms/epoch - 1ms/step
Epoch 3/15
25/25 - 0s - loss: 0.4118 - accuracy: 0.8264 - val_loss: 0.38

<keras.callbacks.History at 0x7f6a52f49040>
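
Because the last layer has no sigmoid, the loss is configured with `from_logits=True`. The following is a minimal sketch of binary cross-entropy computed directly from a logit, using a standard numerically stable formulation. It is not necessarily the exact code path Keras takes, and the function names are hypothetical.

```python
import math

def bce_from_logit(logit, label):
    """Binary cross-entropy for one sample, computed from the raw logit."""
    # Equivalent to -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(logit),
    # rewritten to avoid overflow for large |logit|.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def bce_from_probability(logit, label):
    """Reference version that applies the sigmoid first."""
    s = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(s) + (1 - label) * math.log(1.0 - s))

print(bce_from_logit(0.0, 1))   # log(2), about 0.693
```

Both functions agree for moderate logits. The logit form stays finite for very large magnitudes, where the probability form would hit `log(0)`.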

For later use in HE, we save the trained model.

In [9]:
def save_weights(model, index, path):
    # Create the output directory if needed.
    os.makedirs(path, exist_ok=True)

    # Save the trained weights in HDF5 format.
    fname = os.path.join(path, "nn_heart_disease_model_weights.h5")
    print("Saving weights to: " + fname)
    model.save_weights(fname)

    # Save the architecture as JSON next to the weights.
    s = model.to_json()
    with open(os.path.join(path, 'nn_heart_disease_model.json'), 'w') as f:
        f.write(s)

save_weights(model, epochs, path=model_dir)

Saving weights to: model/nn_heart_disease_model_weights.h5


In [10]:
score = model.evaluate(x_test, y_test, verbose=0)

print(f'Test loss: {score[0]:.3f}')
print(f'Test accuracy:{score[1] * 100:.3f}')

Test loss: 0.207
Test accuracy:93.333


#### Using the model for classifying cleartext data

In [11]:
y_pred_vals = model.predict(x_test)
y_pred_vals = tf.keras.activations.sigmoid(y_pred_vals).numpy()

y_pred = (y_pred_vals > 0.5).astype("int32")
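
Since the sigmoid is monotonic and maps 0 to 0.5, thresholding the sigmoid output at 0.5 is equivalent to thresholding the raw logit at 0. The sketch below checks this on a few hand-picked logits (illustrative values, not model outputs).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

logits = [-3.0, -0.2, 0.3, 0.7, 4.0]   # illustrative values only

by_probability = [int(sigmoid(z) > 0.5) for z in logits]
by_logit = [int(z > 0.0) for z in logits]

print(by_probability)  # [0, 0, 1, 1, 1]
print(by_logit)        # [0, 0, 1, 1, 1]
```

A logit of 0.3 is classified as positive under either rule, whereas comparing the raw logit against 0.5 would classify it as negative. This matters wherever a model output is thresholded without applying the sigmoid first.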



In [12]:
f, t, thresholds = metrics.roc_curve(y_test, y_pred)
cm = metrics.confusion_matrix(y_test, y_pred)
print("Score:", metrics.auc(f, t))
print("Classification report:")
print(metrics.classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(cm)

Score: 0.9375
Classification report:
              precision    recall  f1-score   support

           0       1.00      0.88      0.93        16
           1       0.88      1.00      0.93        14

    accuracy                           0.93        30
   macro avg       0.94      0.94      0.93        30
weighted avg       0.94      0.93      0.93        30

Confusion Matrix:
[[14  2]
 [ 0 14]]
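
The numbers in the report can be recomputed by hand from the confusion matrix. The sketch also shows why the printed score is 0.9375: `roc_curve` is given the thresholded predictions `y_pred` rather than the probabilities `y_pred_vals`, so the resulting AUC reduces to the average of the two per-class recalls.

```python
# Confusion matrix from the output above: rows are true labels, columns are predictions.
cm = [[14, 2],
      [0, 14]]

tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)          # true positive rate
recall_0 = tn / (tn + fp)          # true negative rate

# With hard 0/1 predictions the ROC curve has a single interior point,
# so the area under it is the mean of the two recalls.
auc_hard = (recall_1 + recall_0) / 2

print(round(accuracy, 4), precision_1, recall_1, recall_0, auc_hard)
# 0.9333 0.875 1.0 0.875 0.9375
```

Passing `y_pred_vals` to `roc_curve` instead would evaluate the ranking quality of the probabilities and would generally give a different value.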


### Using the model for classifying encrypted data

To run the model over encrypted samples with homomorphic encryption (HE), we first load the pyhelayers package. The model and the datasets are then read from the "model/" and "datasets/" directories, where we saved them earlier.

In [13]:
import pyhelayers

Load the test data and labels from the h5 files.

In [14]:
with h5py.File(datasets_dir + "x_test.h5") as f:
    x_test = np.array(f["x_test"])
with h5py.File(datasets_dir + "y_test.h5") as f:
    y_test = np.array(f["y_test"])

Load a plain model

In [15]:
nnp = pyhelayers.NeuralNetPlain()
nnp.init_arch_from_json_file(model_dir + "nn_heart_disease_model.json")
nnp.init_weights_from_hdf5_file(model_dir + "nn_heart_disease_model_weights.h5")
print("loaded plain model")

loaded plain model


Apply automatic optimizations

In [16]:
context = pyhelayers.DefaultContext()
optimizer = pyhelayers.HeProfileOptimizer(nnp, context)
optimizer.get_requirements().set_batch_size(16)
profile = optimizer.get_optimized_profile(False)
batch_size = profile.get_batch_size()

To reduce the memory requirements of the context, we reduce the number of rotation keys.

In [17]:
pf1 = pyhelayers.PublicFunctions()
pf1.rotate = pyhelayers.RotationSetType.CUSTOM_ROTATIONS
pf1.set_rotation_steps([1, 2])
# This raises an error
# pf1.conjugate=False
requirements = profile.requirement
requirements.public_functions = pf1

Initialize the HE context with the optimized configuration.

In [18]:
context.init(profile.requirement)
print('HE Context ready. Batch size=',batch_size)

HE Context ready. Batch size= 16


Print the size of the HE context (with keys).

In [19]:
evalBuf = context.save_to_buffer()
print('Size', len(evalBuf) / 1024 / 1024, 'MB')

Size 10.689608573913574 MB


#### Encrypt the model

In [20]:
nn = pyhelayers.NeuralNet(context)
nn.encode_encrypt(nnp, profile)

Object (detailed printing not implemented yet)

We use the encrypted model over batches of 16 records at a time. 

In [21]:
plain_samples = x_test.take(indices=range(0, batch_size), axis=0)
labels = y_test.take(indices=range(0, batch_size), axis=0)
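
Only the first batch of 16 test samples is used in this demonstration. Covering all 30 test samples would require iterating over batches; a minimal index-only sketch follows. The helper name `batch_ranges` is hypothetical. How pyhelayers handles a final partial batch (for example, whether it needs padding) is not covered here and would need to be checked against its documentation.

```python
def batch_ranges(n_samples, batch_size):
    """Yield (start, stop) index pairs covering n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

ranges = list(batch_ranges(30, 16))
print(ranges)  # [(0, 16), (16, 30)]
```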

Encrypt input samples

In [22]:
samples = nn.encode_encrypt_input(plain_samples)

Now we perform inference on the 16 samples under encryption.

In [23]:
predictions=nn.predict(samples)

### Plaintext results

Decrypt the final results.

In [24]:
plain_predictions = nn.decrypt_decode_output(predictions)

In [25]:
print('\nclassification results')
print('=========================================')
for label,pred in zip(labels,plain_predictions):
    print('Label:',('Healthy' if label==0 else 'Should talk with a Dr.'),end=', ')
    print('Prediction:',('Healthy' if pred[0]<0.5 else 'Should talk with a Dr.'))


classification results
Label: Healthy, Prediction: Healthy
Label: Healthy, Prediction: Healthy
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Healthy, Prediction: Healthy
Label: Healthy, Prediction: Healthy
Label: Healthy, Prediction: Healthy
Label: Healthy, Prediction: Healthy
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Healthy, Prediction: Healthy
Label: Healthy, Prediction: Should talk with a Dr.
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Healthy, Prediction: Healthy
Label: Should talk with a Dr., Prediction: Should talk with a Dr.
Label: Healthy, Prediction: Healthy
