# Model creation and training

In this notebook, we create and train a model to detect cardiac arrhythmia. We train the models on Google Cloud TPUs, and the data used in this project is stored on Google Cloud Platform.

## Connecting to Google Cloud Storage (**GCS**)

We use a private bucket to store the data and models. If you want access to this bucket, please email us at cafajar@uis.edu.co.

In [1]:
import uuid
from google.colab import auth

project_id = 'fine-program-318215'
bucket_name = 'colab-sample-bucket-' + str(uuid.uuid1())

auth.authenticate_user()
!gcloud config set project {project_id}

!echo "deb http://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse

!mkdir folderOnColab
!gcsfuse --implicit-dirs test_cloud_andres folderOnColab

!ls folderOnColab/

Updated property [core/project].
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2537  100  2537    0     0  72485      0 --:--:-- --:--:-- --:--:-- 72485
OK
72 packages can be upgraded. Run 'apt list --upgradable' to see them.
gcsfuse is already the newest version (0.36.0).
0 upgraded, 0 newly installed, 0 to remove and 72 not upgraded.
mkdir: cannot create directory ‘folderOnColab’: File exists
2021/10/27 16:38:54.241203 Using mount point: /content/folderOnColab
2021/10/27 16:38:54.248695 Opening GCS connection...
2021/10/27 16:38:54.410546 Mounting file system "test_cloud_andres"...
2021/10/27 16:38:54.415051 File system has been successfully mounted.
h5  models  tfrecords  zioApgo9
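
Before reading any data, it can help to confirm that the mount exposes the directories the rest of the notebook relies on. The following is a minimal sketch, not part of the original pipeline: the helper name `missing_subdirs` is hypothetical, and the expected directory names are taken from the listing above.

```python
import os

def missing_subdirs(mount_point, expected):
    """Return the expected subdirectory names that are absent from mount_point."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(mount_point, name))]

# Directory names as they appear in the `ls folderOnColab/` output above.
missing = missing_subdirs('folderOnColab', ['h5', 'models', 'tfrecords'])
print('Missing directories:', missing)
```

An empty list means the bucket is mounted as expected; anything else suggests the `gcsfuse` step did not complete.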


In [3]:
import os
import h5py
import sys
import tempfile
import zipfile
import numpy as np 
import random as rn
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt 
import sklearn.metrics as sklm

#Reproducibility
#seed = 0
#os.environ['PYTHONHASHSEED'] = '0'
#np.random.seed(seed)
#rn.seed(seed)
#tf.random.set_seed(seed)

from tensorflow.keras import backend as K
from tensorflow.keras import optimizers 
from tensorflow.keras import layers
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

## Enabling the TPU
First, open the notebook settings and select TPU from the Hardware accelerator drop-down.

In [4]:
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')  # TPU detection
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))

tpu_strategy = tf.distribute.TPUStrategy(resolver)

INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Initializing the TPU system: grpc://10.23.141.218:8470
INFO:tensorflow:Finished initializing TPU system.
All devices:  [LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:0', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:1', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:2', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:3', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:4', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:5', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:6', device_type='TPU'), LogicalDevice(name='/job:worker/replica:0/task:0/device:TPU:7', device_type='TPU')]
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)

## Input data
Our input data is stored on Google Cloud Storage as TFRecord files. The training data is split into five files of roughly equal size to allow for cross-validation, if needed.

In [14]:
AUTO = tf.data.experimental.AUTOTUNE                    # Allows for optimizations
batch_size = 16 * tpu_strategy.num_replicas_in_sync
fold_no = 1                                             # If not doing cross-validation, 
                                                        # the first set is for validation and the others for training

gcs_pattern = 'gs://test_cloud_andres/tfrecords/11k/kfolds/*.tfrecords'
filenames = sorted(tf.io.gfile.glob(gcs_pattern))       # Sort so that fold_no always maps to the same file
validation_fns = filenames.pop(fold_no-1)
train_fns = filenames
test_fns = tf.io.gfile.glob('gs://test_cloud_andres/tfrecords/11k/test_2200_max3.tfrecords')

print('Train TFRecords:',train_fns)
print('Validation TFRecord:',validation_fns)
print('Test TFRecord:',test_fns)

def parse_tfrecord(example):
  features = {'X': tf.io.FixedLenFeature([2049,], tf.float32),  # ECG signal
              'Y': tf.io.FixedLenFeature([1,]   , tf.int64  ),  # class
             }
  example = tf.io.parse_single_example(example, features)
  return example['X'], example['Y']-1

def load_dataset(filenames):
  records = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
  return records.map(parse_tfrecord, num_parallel_calls=AUTO)

train_dataset = load_dataset(train_fns).repeat().shuffle(2000000).batch(batch_size).prefetch(AUTO) 
val_dataset   = load_dataset(validation_fns).batch(batch_size).prefetch(AUTO) 
test_dataset  = load_dataset(test_fns).batch(batch_size).prefetch(AUTO) 

Train TFRecords: ['gs://test_cloud_andres/tfrecords/11k/kfolds/train_1760_max3_f2.tfrecords', 'gs://test_cloud_andres/tfrecords/11k/kfolds/train_1760_max3_f3.tfrecords', 'gs://test_cloud_andres/tfrecords/11k/kfolds/train_1760_max3_f4.tfrecords', 'gs://test_cloud_andres/tfrecords/11k/kfolds/train_1760_max3_f5.tfrecords']
Validation TFRecord: gs://test_cloud_andres/tfrecords/11k/kfolds/train_1760_max3_f1.tfrecords
Test TFRecord: ['gs://test_cloud_andres/tfrecords/11k/test_2200_max3.tfrecords']
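
The fold selection in the cell above can be sketched in isolation. This is an illustrative example rather than the notebook's own code: the helper name `split_folds` is hypothetical, and the file names are shortened versions of the ones printed above. It holds out one file for validation and trains on the remaining four.

```python
def split_folds(filenames, fold_no):
    """Return (train_files, validation_file) for a 1-based fold number."""
    ordered = sorted(filenames)  # stable order so fold_no always picks the same file
    if not 1 <= fold_no <= len(ordered):
        raise ValueError(f'fold_no must be between 1 and {len(ordered)}, got {fold_no}')
    validation = ordered[fold_no - 1]
    train = ordered[:fold_no - 1] + ordered[fold_no:]
    return train, validation

# Shortened, illustrative file names following the pattern in the output above.
folds = [f'train_1760_max3_f{i}.tfrecords' for i in range(1, 6)]
train_files, validation_file = split_folds(folds, fold_no=1)
print('Train:', train_files)
print('Validation:', validation_file)
```

Running the same function with `fold_no` from 1 to 5 yields the five train/validation splits of a full cross-validation.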


Calculating the number of steps for training, validation and testing

In [13]:
"""
The number of signals in each TFRecord file was previously calculated and is
hard-coded in this cell to avoid loading the data (expensive).
"""

test_size = 507443
test_steps = int(np.ceil(test_size/batch_size))

def get_steps(fold_no, batch_size):
  # Number of signals in each fold's TFRecord file; the five sizes sum to total_size
  val_sizes = {1: 410780, 2: 410539, 3: 409318, 4: 407967, 5: 409545}
  total_size = 2048149
  if fold_no not in val_sizes:
    raise ValueError(f'fold_no must be between 1 and 5, got {fold_no}')
  val_size = val_sizes[fold_no]
  train_steps = int(np.ceil((total_size - val_size)/batch_size))
  val_steps = int(np.ceil(val_size/batch_size))
  return train_steps, val_steps

train_steps, val_steps = get_steps(fold_no, batch_size)

print(train_steps)
print(val_steps)
print(test_steps)

12792
3210
3965
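
The three numbers above follow from the dataset sizes and the global batch size. As a worked check, assuming the 8 TPU cores reported earlier (so `batch_size = 16 * 8 = 128`) and `fold_no = 1`, the arithmetic can be reproduced with the standard library alone. The helper name `steps_for` is hypothetical.

```python
import math

def steps_for(num_examples, batch_size):
    """Number of batches needed to cover num_examples (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)

per_replica_batch = 16
num_replicas = 8                       # TPU cores reported during initialization
batch_size = per_replica_batch * num_replicas

total_size = 2048149                   # all five folds
val_size = 410780                      # fold 1
test_size = 507443

train_steps = steps_for(total_size - val_size, batch_size)
val_steps = steps_for(val_size, batch_size)
test_steps = steps_for(test_size, batch_size)
print(train_steps, val_steps, test_steps)  # 12792 3210 3965
```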


Loading the ground-truth labels into memory for later evaluation


In [5]:
y_true = []
for _, labels in test_dataset:       # Each batch is a (signals, labels) pair
  y_true.extend(labels.numpy())
y_true = np.array(y_true).ravel()    # Flatten (N, 1) to (N,) for the sklearn metrics

## Model
We varied three parameters, *res_blocks*, *initial_filters* and *s_j*, to create all the models shown in our paper. Here we show the smallest model, with 4,455 parameters.

In [10]:
with tpu_strategy.scope():  # Model is created in the TPUStrategy so it will train on the TPU
  def zeropad(x, filters):  # Pad zeros to match dimensions
    pad = K.zeros_like(x)
    assert (filters % pad.shape[2]) == 0
    num_repeat = filters // pad.shape[2]
    for i in range(num_repeat - 1):
        x = K.concatenate([x, pad], axis=2)
    return x 

  def basic_block(x_in, pool_size, strides, filters, kernel_size, DP):
      y = layers.MaxPooling1D(pool_size=pool_size, strides=strides, padding='same')(x_in)
      y = layers.Lambda(zeropad, arguments={'filters':filters})(y) 

      x = layers.BatchNormalization(axis=-1)(x_in)
      x = layers.ReLU()(x) 
      x = layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(x)
      x = layers.BatchNormalization(axis=-1)(x)
      x = layers.ReLU()(x)
      x = layers.Dropout(DP)(x)
      x = layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(x)
      x = layers.AveragePooling1D(pool_size=pool_size, strides=strides, padding='same')(x)
      x = layers.Add()([y,x])
      return x

  # Model hyperparameters
  res_blocks  = 8
  initial_filters = 2
  s_j = 8

  kernel_size = 16
  input_shape = (2049, 1)
  DP = 0.2
  pool_size = 2
  strides = 2
  k = 0

  ##############################################################################
  ################################# MODEL ######################################

  filters = initial_filters*(2**k) # Modify the outputs of the conv layers
  input_signal = tf.keras.Input(shape=input_shape, name='ECG_signal')
  x = layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(input_signal)
  x = layers.BatchNormalization(axis=-1)(x)
  x = layers.ReLU()(x)

  y = layers.MaxPooling1D(pool_size=pool_size, strides=strides, padding='same')(x)
  y = layers.Lambda(zeropad, arguments={'filters':filters})(y) 

  x = layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same')(x)
  x = layers.BatchNormalization(axis=-1)(x)
  x = layers.ReLU()(x)
  x = layers.Dropout(DP)(x)
  x = layers.Conv1D(filters=filters, kernel_size=kernel_size, strides=strides, padding='same')(x)
  x = layers.Add()([y,x])

  for i in range(res_blocks):
      if i%s_j == 0:
          filters = initial_filters*(2**k)
          k = k + 1 
          strides = 2
          x = basic_block(x, pool_size, strides, filters, kernel_size, DP)
      else:
          strides = 1  
          x = basic_block(x, pool_size, strides, filters, kernel_size, DP)

  x = layers.BatchNormalization(axis=-1)(x)        
  x = layers.ReLU()(x)
  x = layers.Flatten()(x)
  outputs = layers.Dense(3)(x)
  model = tf.keras.Model(inputs=input_signal, outputs=outputs)
  model.compile(
      optimizer = tf.keras.optimizers.Adam(learning_rate=0.001),
      loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics = [tf.keras.metrics.SparseCategoricalAccuracy()],
      steps_per_execution = 2400  # between 2 and steps_per_epoch
      )
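
The `zeropad` helper exists so that the shortcut branch has as many channels as the convolutional branch when the filter count grows. The sketch below mirrors that logic on plain Python lists instead of tensors, purely for illustration: the function name `zeropad_channels` and the list-of-time-steps layout are assumptions, not part of the notebook.

```python
def zeropad_channels(x, filters):
    """Append zero-valued channels to every time step until there are `filters` channels.

    x is a list of time steps, each a list of channel values (channels last).
    """
    channels = len(x[0])
    assert filters % channels == 0, 'filters must be a multiple of the channel count'
    num_repeat = filters // channels
    # Equivalent to concatenating (num_repeat - 1) all-zero copies along the channel axis.
    return [step + [0.0] * (channels * (num_repeat - 1)) for step in x]

signal = [[1.0, 2.0], [3.0, 4.0]]      # 2 time steps, 2 channels
print(zeropad_channels(signal, 4))     # channels doubled: zeros appended
print(zeropad_channels(signal, 2))     # same channel count: returned unchanged
```

With the parameters shown in the cell above (`res_blocks = 8`, `s_j = 8`), the filter count stays at 2 throughout, so the padding leaves the shortcut tensor unchanged; it only takes effect in larger configurations where the filter count doubles.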

In [12]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
ECG_signal (InputLayer)         [(None, 2049, 1)]    0                                            
__________________________________________________________________________________________________
conv1d_20 (Conv1D)              (None, 2049, 2)      34          ECG_signal[0][0]                 
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 2049, 2)      8           conv1d_20[0][0]                  
__________________________________________________________________________________________________
re_lu_20 (ReLU)                 (None, 2049, 2)      0           batch_normalization_20[0][0]     
____________________________________________________________________________________________
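
The parameter counts in the summary can be checked by hand. The sketch below is a standalone calculation under the hyperparameters of the cell above (kernel size 16, 2 filters throughout, 8 residual blocks, 3 output classes); the helper names are hypothetical. It counts `kernel_size * in_channels * out_channels + out_channels` per `Conv1D` layer and 4 values per channel for `BatchNormalization` (scale, offset, moving mean, moving variance), and tracks the sequence length through the stride-2 layers with `'same'` padding.

```python
import math

def conv1d_params(in_ch, out_ch, kernel_size):
    return kernel_size * in_ch * out_ch + out_ch   # weights + biases

def batchnorm_params(channels):
    return 4 * channels   # gamma, beta, moving mean, moving variance

def same_out_len(length, stride):
    return math.ceil(length / stride)              # output length with padding='same'

kernel_size, filters, num_classes = 16, 2, 3
res_blocks, s_j = 8, 8
length = 2049

# Stem: Conv-BN-ReLU, then Conv-BN-ReLU-Dropout-Conv(stride 2)
params = conv1d_params(1, filters, kernel_size) + batchnorm_params(filters)
params += conv1d_params(filters, filters, kernel_size) + batchnorm_params(filters)
params += conv1d_params(filters, filters, kernel_size)
length = same_out_len(length, 2)

# Residual blocks: two BN + Conv pairs each; filters stay at 2 in this configuration
for i in range(res_blocks):
    stride = 2 if i % s_j == 0 else 1
    params += 2 * (batchnorm_params(filters) + conv1d_params(filters, filters, kernel_size))
    length = same_out_len(length, stride)

params += batchnorm_params(filters)                        # final BatchNormalization
params += length * filters * num_classes + num_classes     # Flatten + Dense(3)
print(length, params)  # 513 4455
```

The first two layer counts (34 and 8) match the `conv1d_20` and `batch_normalization_20` rows of the summary, and the total matches the 4,455 parameters quoted for this model.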

## Training
We use three callbacks, *EarlyStopping*, *ReduceLROnPlateau* and *ModelCheckpoint*, all monitoring the validation loss. We train the model and save the best-performing weights to GCS.

In [None]:
callbacks_list = [
    # Stop when val_loss has not improved by at least min_delta for 10 epochs
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        mode='auto',
        min_delta=1e-3,
        patience=10,
        verbose=1,
        restore_best_weights=True),
    # Divide the learning rate by 10 after 3 epochs without improvement
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        mode='auto',
        factor=0.1,
        patience=3,
        verbose=1,
        min_lr=1e-11),
    # Keep only the weights of the best model seen so far
    tf.keras.callbacks.ModelCheckpoint(
        filepath='folderOnColab/models/deep_models/train/Best_Model_4k.h5',
        monitor='val_loss',
        mode='auto',
        verbose=1,
        save_best_only=True,
        save_weights_only=True),
]

history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=100, 
    steps_per_epoch=train_steps,
    validation_steps=val_steps,
    callbacks=callbacks_list, 
    )

Epoch 1/100

Epoch 00001: val_loss improved from inf to 0.35475, saving model to folderOnColab/models/deep_models/train/Best_Model_4k.h5
Epoch 2/100

Epoch 00002: val_loss improved from 0.35475 to 0.32909, saving model to folderOnColab/models/deep_models/train/Best_Model_4k.h5
Epoch 3/100

Epoch 00003: val_loss did not improve from 0.32909
Epoch 4/100

Epoch 00004: val_loss improved from 0.32909 to 0.29370, saving model to folderOnColab/models/deep_models/train/Best_Model_4k.h5
Epoch 5/100

Epoch 00005: val_loss did not improve from 0.29370
Epoch 6/100

Epoch 00006: val_loss did not improve from 0.29370
Epoch 7/100

Epoch 00007: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513.

Epoch 00007: val_loss did not improve from 0.29370
Epoch 8/100

Epoch 00008: val_loss improved from 0.29370 to 0.29147, saving model to folderOnColab/models/deep_models/train/Best_Model_4k.h5
Epoch 9/100

Epoch 00009: val_loss did not improve from 0.29147
Epoch 10/100

Epoch 00010: val_loss di
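
The learning-rate reduction logged at epoch 7 follows from the `patience=3` setting: the last improvement was at epoch 4, and epochs 5, 6 and 7 brought none. The sketch below simulates that plateau rule in plain Python. It is a simplified model of the callback, not the Keras implementation: the function name is hypothetical, `min_delta=1e-4` is assumed to be the callback's default, and the loss values for the non-improving epochs (3, 5, 6, 7) are invented for illustration because the log does not print them.

```python
def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.1, patience=3,
                       min_delta=1e-4, min_lr=1e-11):
    """Return a list of (epoch, learning rate in effect after that epoch)."""
    best = float('inf')
    wait = 0
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:      # improvement: remember it and reset the counter
            best, wait = loss, 0
        else:                            # no improvement this epoch
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append((epoch, lr))
    return history

# Epochs 1, 2, 4 and 8 use the logged values; the others are hypothetical.
val_losses = [0.35475, 0.32909, 0.33, 0.29370, 0.30, 0.30, 0.30, 0.29147]
for epoch, lr in simulate_reduce_lr(val_losses):
    print(epoch, lr)
```

In this simulation the rate stays at 0.001 through epoch 6 and drops to roughly 0.0001 after epoch 7, in line with the logged message.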

In [None]:
model.load_weights('folderOnColab/models/deep_models/train/Best_Model_4k.h5')

## Evaluation and metrics

In [None]:
val_loss, val_acc = model.evaluate(val_dataset, steps=val_steps, verbose=1)      # Validation fold
test_loss, test_acc = model.evaluate(test_dataset, steps=test_steps, verbose=1)  # Held-out test set



In [None]:
y_pred = model.predict(test_dataset, steps=test_steps, verbose=1)  # Logits, one row per signal
y_pred_class = np.argmax(y_pred, axis=1)                           # Index of the highest logit

print(classification_report(y_true, y_pred_class))
print('acc',accuracy_score(y_true, y_pred_class))
print('precision',precision_score(y_true, y_pred_class, average="macro"))
print('recall',recall_score(y_true, y_pred_class, average="macro"))
print('f1',f1_score(y_true, y_pred_class, average="macro"))

              precision    recall  f1-score   support

           0       0.89      0.93      0.91    287147
           1       0.54      0.84      0.66     17369
           2       0.93      0.83      0.88    202927

    accuracy                           0.89    507443
   macro avg       0.79      0.87      0.81    507443
weighted avg       0.89      0.89      0.89    507443

acc 0.8856167096600012
precision 0.7856753067602075
recall 0.8675901622765427
f1 0.8143971685100349
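
The macro-averaged scores weight every class equally, which is why class 1 (17,369 signals, precision 0.54) pulls the macro precision down to 0.79 while the weighted average stays at 0.89. The sketch below shows how macro averages are derived from a confusion matrix. The function name and the small 3x3 matrix are hypothetical and are not the model's results.

```python
def macro_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))   # column sum: predicted as c
        actual = sum(cm[c])                           # row sum: truly c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        'accuracy': sum(cm[c][c] for c in range(n)) / total,
        'precision': sum(precisions) / n,   # unweighted mean over classes
        'recall': sum(recalls) / n,
        'f1': sum(f1s) / n,
    }

# Hypothetical confusion matrix with 10 samples per true class.
example_cm = [[8, 1, 1],
              [2, 6, 2],
              [0, 1, 9]]
print(macro_metrics(example_cm))
```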
