# Homework Inversion - Alessandro Delmonte

## Config

The following line installs the dependencies. I leave it commented out so as not to interfere with your environment. I usually build a Docker image when working with TensorFlow; let me know if you need the Dockerfile. I trained the models on a GPU, but the notebook should run without problems on a CPU.

In [None]:
# !pip install tensorflow scikit-learn numpy pywavelets biosppy

In [None]:
# Just defining some standard imports and configs I like to use when working with notebooks

from __future__ import absolute_import, division, print_function, unicode_literals

%load_ext autoreload
%autoreload 2

import sys
assert sys.version_info.major == 3, 'Not running on Python 3'

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import logging
logging.basicConfig(level=logging.INFO, stream=sys.stdout)

# For plotting
import seaborn as sns
sns.set_context('notebook')
sns.set(style="whitegrid", font_scale=1.5)
sns.despine()

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = [10, 5]

## Problem statement

The ECG is a time series that measures the electrical activity of the heart. This is the main tool to diagnose heart diseases. Recording an ECG is simple: 3 electrodes are placed at the ends of limbs, and 6 on the anterior chest. This generates 12 time series, called leads, each corresponding to a difference in potential between a pair of electrodes.
The electrodes’ position is very important to correctly interpret the ECG. Making the mistake of inverting electrodes compromises interpretation, either because the leads do not explore the expected area (errors in the measures of hypertrophia indices, in the analysis of the ST segment), or because they generate false abnormalities (fake Q waves, error in the heart’s axis…).
Inversion errors are frequent (5% of ECGs), and only experts (cardiologists) manage to detect them. But most ECGs are not interpreted by experts: only 30% are, the rest being interpreted by nurses or general practitioners. An algorithm for automatic detection of electrode inversion is therefore paramount to the correct interpretation of ECGs and would improve the quality of diagnosis.
This project is intended to make you detect electrode inversion in an ECG. The dataset at your disposal contains ECGs from a cardiology center. An ECG will be labeled as correctly realised (0) or as inverted (1). The goal is to perform binary classification on these ECGs.

The key objective of this homework is to propose a model relevant to the task that shows good accuracy in detection of lead inversion.

## Data Analysis

Data:
- The training data contains 1400 ECGs and their labels. For each ECG, the data consists of 10 seconds of recording for 12 leads, each sampled at 250Hz.
- The testing data contains 2630 ECGs.

Each input file therefore contains the ECG signal in the form (n_ecgs, n_samples=2500, n_leads=12).

For the training set we have 1400 examples of a 10-second ECG (2500 steps sampled at 250 Hz) for the 12 leads. This 3-dimensional layout should be quite convenient to use with ML and DL algorithms. I will load the data and check a few examples.

I assume the data are in the same folder as the notebook.

In [None]:
import numpy as np

in_training_path = 'input_training.npy'
out_training_path = 'output_training.npy'
in_test_path = 'input_test.npy'
#out_test_path = 'output_test.npy'

x_train_raw = np.load(in_training_path)
y_train_raw = np.load(out_training_path)
x_test_raw = np.load(in_test_path)
#y_test_raw = np.load(out_test_path)

sampling_rate = 250

print('Training dataset shape: {} - Training labels shape: {} - Testing dataset shape: {}'.format(
    x_train_raw.shape, y_train_raw.shape, x_test_raw.shape))

Visualizing a couple of ECGs helps to better understand the data. As stated in the problem formulation, inversion can impact any of the leads, so I do not think I should simplify the problem by removing some leads: reducing the dimensionality would probably only cause a loss of information.

In [None]:
# All leads
fig, ax = plt.subplots(x_train_raw.shape[-1], 1, sharex=True, sharey=True)
for i in range(x_train_raw.shape[-1]):
    sns.lineplot(data=x_train_raw[0, :100, i], ax=ax[i])
plt.show();
# One lead
sns.lineplot(data=x_train_raw[0, :, 0])
plt.show();
# One lead - some cycles
sns.lineplot(data=x_train_raw[0, :100, 0])
plt.show();

All the leads are correctly loaded. I am not a trained cardiologist (maybe a Cardiologs one soon ;) ) but the signal looks clean overall. A drift is clearly visible over the 10 s interval, and some high-frequency artifacts can be seen both over the 10 s period and over a couple of ECG cycles. As confirmed by mail, the test set shares the same statistics as the training set, so I can use the same preprocessing pipeline for both.

The training set size is relatively small, especially if compared to the test set. I might consider some data augmentation but I should be careful not to insert samples without physiological meaning.

It is interesting to check the label distribution to see whether there is a significant imbalance in the training set. If the dataset is imbalanced I might oversample the minority class (with caution, to avoid overfitting on the minority data, especially with such a small dataset). I might also undersample the majority class or simply assign class weights in the loss function.

In [None]:
sns.countplot(x=y_train_raw)
plt.show();

The data seem slightly imbalanced towards the negative samples. For the moment I won't do more, but I will come back here if the final scores are not satisfying. Precision should be a good metric, as it does not count true negatives and is therefore less dominated by the many negative samples than accuracy. Recall could be more influenced by the distribution, but I should still analyse it, since for a medical application I prefer an algorithm that does not miss any true positive, even if it sometimes raises false alerts.
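To illustrate why accuracy alone can mislead on imbalanced labels, here is a minimal sketch with made-up labels (80 negatives, 20 positives, not the real class counts). A classifier that always answers "not inverted" looks decent on accuracy while missing every inversion.

```python
import numpy as np

# Hypothetical imbalanced labels: 80 negatives, 20 positives
y_toy = np.array([0] * 80 + [1] * 20)
# A degenerate classifier that never predicts an inversion
y_always_negative = np.zeros_like(y_toy)

accuracy = np.mean(y_always_negative == y_toy)
tp = np.sum((y_always_negative == 1) & (y_toy == 1))
fn = np.sum((y_always_negative == 0) & (y_toy == 1))
recall = tp / (tp + fn)

print('Accuracy: {:.0%} - Recall: {:.0%}'.format(accuracy, recall))
```

Precision is undefined for this classifier since it makes no positive prediction at all, which is another reason to read the three numbers together.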

## Outline of solution

Supervised learning seems the way to go. I will use the training set and the given labels to train a ML/DL algorithm to predict the class of the test set. My strategy will be: clean the data, define a first model, define the training strategy, tune the hyperparameters, and explore more sophisticated options to generalise predictions.

## Data Pre-Processing

I will start with preprocessing since in any case I need to prepare the data. Looking at the data, I may use several pre-processing steps to denoise high-frequency artifacts, remove low-frequency drift and normalize the scale.
For denoising and drift removal I might use a wavelet-based analysis, which could be enough to remove both, even if the number of decomposition levels and the thresholding value might be hard to optimize. A FIR filter should also be OK.
I might opt to downsample the signal to remove noise, but it is not a mandatory step since every sample has the same sampling rate (would a lower sampling rate bring improvements? It might reduce noise but lose some information; to be explored if time allows).
For normalization I could just min-max scale the values to the range [0, 1] (or [-1, 1], since negative values have a physiological meaning). I could also just do a z-score standardization. I think it might be better to have the baseline at 0. For the mean and the std I can just use the global statistics of the training dataset (I don't want to touch the test dataset, to avoid influencing the results).
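To make the two normalization options concrete, here is a minimal sketch on synthetic arrays. The shapes mimic the real data but the values are random, and `z_score` and `min_max_symmetric` are names made up for this example. In both cases the statistics come from the training split only and are then reused on the test split.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the real arrays: (n_ecgs, n_samples, n_leads)
x_tr = rng.normal(loc=0.2, scale=3.0, size=(8, 2500, 12))
x_te = rng.normal(loc=0.2, scale=3.0, size=(4, 2500, 12))

# Statistics are computed on the training split only
mu, sigma = x_tr.mean(), x_tr.std()
lo, hi = x_tr.min(), x_tr.max()

def z_score(x):
    # Zero mean and unit std with respect to the training data
    return (x - mu) / sigma

def min_max_symmetric(x):
    # Maps the training range [lo, hi] onto [-1, 1]
    return 2.0 * (x - lo) / (hi - lo) - 1.0

x_tr_z, x_te_z = z_score(x_tr), z_score(x_te)
x_tr_mm = min_max_symmetric(x_tr)
print(x_tr_z.mean(), x_tr_z.std(), x_tr_mm.min(), x_tr_mm.max())
```

Note that the min-max variant does not place the signal baseline at 0 unless the training range happens to be symmetric, whereas the z-score centres the global mean at 0.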

From a quick survey of the state of the art, it seems that a wavelet approach might be enough for denoising. Wavelets decompose the signal into approximation and detail coefficients. Approximation coefficients represent low-frequency components and detail coefficients represent high-frequency components. db4 or db5 should be the wavelets best suited to ECG analysis. PyWavelets supports a lot of them, so it is not productive to spend time on the choice. I will start with db4 and I might check db5 if I have some time remaining. I may compute the decomposition level based on the length of the samples and the threshold based on what I found in the literature. I should be able to filter the signal by thresholding both types of coefficients.

In [None]:
import pywt

def wtd(x, w, l):
    coeff = pywt.wavedec(x, w, mode="per", level=l)

    #low frequency thresholding
    coeff[0] = np.zeros(coeff[0].shape)
     
    # thresholding with mean absolute deviation
    sigma = np.mean(np.absolute(coeff[-1] - np.mean(coeff[-1]))) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(x)))
    # detailed coefficients in inverse order starting from pos 1
    coeff[1:] = (pywt.threshold(i, value=thresh, mode='hard') for i in coeff[1:])
    
    return pywt.waverec(coeff, w, mode='per')

wav = pywt.Wavelet('db4')
level = int(np.log2(x_train_raw.shape[1]))  # 11 here, which seems too high: pywt.dwt_max_level(2500, wav.dec_len) should give 8. To check again

x_train_denoised = np.zeros(x_train_raw.shape)
for s in range(x_train_raw.shape[0]):
    for der in range(x_train_raw.shape[-1]):
        x_train_denoised[s, :, der] =  wtd(x_train_raw[s, :, der], w=wav, l=level)

x_test_denoised = np.zeros(x_test_raw.shape)
for s in range(x_test_raw.shape[0]):
    for der in range(x_test_raw.shape[-1]):
        x_test_denoised[s, :, der] =  wtd(x_test_raw[s, :, der], w=wav, l=level)

sns.lineplot(data=x_test_raw[0, :100, 0], label='Raw')
sns.lineplot(data=x_test_denoised[0, :100, 0], label='Wavelet Denoising db4 - {}'.format(level))
plt.show();

Thresholding based on the mean absolute deviation of the first-level detail coefficients seems promising. I zeroed out all approximation coefficients but some drift is still visible. I might insert a median filter.
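The two numeric choices in the denoising cell can be checked in isolation. The sketch below is numpy-only and uses made-up helper names. `max_dwt_level` follows the rule PyWavelets documents for `pywt.dwt_max_level`, which gives 8 for 2500 samples and the 8-tap db4 filter, against the 11 obtained from `int(np.log2(2500))`. `universal_threshold` reproduces the threshold formula used in `wtd`. As far as I know, the 0.6745 constant is normally paired with the median absolute deviation, so a `use_median` switch is included as an assumption about what could be tried next.

```python
import numpy as np

def max_dwt_level(n_samples, filter_len):
    # Rule documented for pywt.dwt_max_level:
    # floor(log2(n_samples / (filter_len - 1)))
    return int(np.floor(np.log2(n_samples / (filter_len - 1))))

def universal_threshold(detail, n_samples, use_median=False):
    # Noise scale from the finest detail coefficients. The notebook uses the
    # mean absolute deviation; use_median=True switches to the median variant.
    centre = np.median(detail) if use_median else np.mean(detail)
    dev = np.abs(detail - centre)
    sigma = (np.median(dev) if use_median else np.mean(dev)) / 0.6745
    return sigma * np.sqrt(2 * np.log(n_samples))

def hard_threshold(coeffs, thresh):
    # Keep coefficients whose magnitude reaches the threshold, zero the rest
    return np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)

print(max_dwt_level(2500, 8))  # db4 has 8 filter taps

rng = np.random.default_rng(0)
detail = rng.normal(scale=0.1, size=1250)  # synthetic finest-level coefficients
detail[100] = 5.0                          # one large, signal-like coefficient
t = universal_threshold(detail, 2500)
kept = hard_threshold(detail, t)
print(t, np.count_nonzero(kept))
```

On this synthetic input the large coefficient survives while the noise-like ones are zeroed, which is the behaviour expected from hard thresholding.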


I may use a band-pass filter with cutoffs at 3 Hz and 50 Hz. 3 Hz should be high enough to remove motion artifacts without impacting the signal. The 50 Hz upper cutoff should be enough to attenuate the power-supply frequency. I will decide the order empirically, since I did not find any reference values...

In [None]:
from biosppy.signals.tools import filter_signal

order = int(20)

x_train_nodrift = np.zeros(x_train_denoised.shape)
for s in range(x_train_denoised.shape[0]):
    for der in range(x_train_denoised.shape[-1]):
        x_train_nodrift[s, :, der], _, _ = filter_signal(signal=x_train_denoised[s, :, der], ftype="FIR", band="bandpass",
                                                 order=order, frequency=[3, 50],
                                                 sampling_rate=sampling_rate)
        
x_test_nodrift = np.zeros(x_test_denoised.shape)
for s in range(x_test_denoised.shape[0]):
    for der in range(x_test_denoised.shape[-1]):
        x_test_nodrift[s, :, der], _, _ = filter_signal(signal=x_test_denoised[s, :, der], ftype="FIR", band="bandpass",
                                                 order=order, frequency=[3, 50],
                                                 sampling_rate=sampling_rate)

sns.lineplot(data=x_test_denoised[0, :, 0], label='Wavelet Denoising db4 - {}'.format(level))
sns.lineplot(data=x_test_nodrift[0, :, 0], label='Baseline Drift Removed - Bandpass order {}'. format(order))
plt.show();

Drift seems correctly removed. The signal seems centered and I don't think the order is too high since I do not see strange artifacts on the waves.
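One way to reason about the filter order is to look at the gain a filter of that length can actually realise. The sketch below is a numpy-only Hamming-windowed sinc design with made-up helper names. It is not biosppy's implementation, which may design and apply its taps differently, so the numbers are only indicative. The general point is that with about 21 taps at 250 Hz the transition bands are tens of Hz wide, so the 3 Hz edge cannot be sharp.

```python
import numpy as np

def bandpass_fir(order, low_hz, high_hz, fs):
    """Hamming-windowed sinc band-pass with order + 1 taps (illustrative)."""
    n = np.arange(order + 1) - order / 2.0

    def lowpass(cut_hz):
        h = np.sinc(2.0 * cut_hz / fs * n) * np.hamming(order + 1)
        return h / h.sum()  # unit gain at 0 Hz

    # Band-pass = wide low-pass minus narrow low-pass
    return lowpass(high_hz) - lowpass(low_hz)

def gain_at(taps, freqs_hz, fs):
    # Magnitude of the frequency response at the requested frequencies
    k = np.arange(len(taps))
    phase = np.exp(-2j * np.pi * np.outer(freqs_hz, k) / fs)
    return np.abs(phase @ taps)

fs = 250
freqs = np.array([0.0, 3.0, 10.0, 25.0, 50.0, 100.0])
for order in (20, 200):
    gains = gain_at(bandpass_fir(order, 3, 50, fs), freqs, fs)
    print(order, np.round(gains, 2))
```

Comparing the two printed rows shows how much closer a longer filter gets to the nominal 3 Hz and 50 Hz edges, at the cost of longer edge transients on a 10 s recording.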

For normalization, for the moment I will just use a z-score standardization, since I already obtained good results with it in conjunction with supervised models in other applications (and the distributions should be Gaussian). The results of the standardization seem OK. I will try min-max normalization if training is not satisfying. I might normalize the signal based on global or per-lead statistics. I prefer global statistics in order not to change the ratio between leads. I will obviously compute the statistics on the training dataset only, to avoid bias.

In [None]:
mean = np.mean(x_train_nodrift)
std = np.std(x_train_nodrift)

def z_score_norm(x):
    # Standardise with the global training statistics; guard against a zero std
    if std > 0:
        return (x - mean) / std
    return np.zeros_like(x)

# Broadcasting applies the same global mean/std to every ECG and every lead
x_train_centered = z_score_norm(x_train_nodrift)
x_test_centered = z_score_norm(x_test_nodrift)
        
sns.lineplot(data=x_train_centered[0, :, 0], label='Train Standard')
plt.show();

sns.lineplot(data=x_test_centered[0, :, 0], label='Test Normalized')
plt.show();
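The preference for global over per-lead statistics can be illustrated on synthetic data. In this sketch (random values, two hypothetical leads, the second with four times the amplitude of the first), global standardization keeps the amplitude ratio between leads while per-lead standardization flattens it to 1.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic batch (n_ecgs, n_samples, n_leads): lead 1 has 4x the amplitude of lead 0
x = rng.normal(size=(16, 2500, 2)) * np.array([1.0, 4.0])

# Global statistics: one mean/std for the whole array
x_global = (x - x.mean()) / x.std()

# Per-lead statistics: one mean/std per lead (reduce over ECGs and time)
x_per_lead = (x - x.mean(axis=(0, 1))) / x.std(axis=(0, 1))

ratio_global = x_global[..., 1].std() / x_global[..., 0].std()
ratio_per_lead = x_per_lead[..., 1].std() / x_per_lead[..., 0].std()
print(ratio_global, ratio_per_lead)
```

Whether the inter-lead amplitude ratio actually helps to detect inversions is an assumption of this notebook rather than something the sketch proves.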

To supervise the models I will need to split the dataset into training and validation sets. I guess an 80% / 20% split should be OK. It may be useful to prepare multiple folds to cross-validate the results. I don't have a lot of data, so I might also use the folds to do some ensemble learning as the last step, in order to fully exploit all available samples. I will define the model, fix the hyperparameters and train the same model 5 times on different folds. The result will be given by the weighted mean of the trained models. To have an ensemble that generalises well, I can prepare the folds using the ground-truth labels for stratification.

In [None]:
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, random_state=42, shuffle=True)
skf.get_n_splits(x_train_centered, y_train_raw)

train_folds = []
val_folds = []
for train_index, test_index in skf.split(x_train_centered, y_train_raw):
    train_folds.append((x_train_centered[train_index], y_train_raw[train_index]))
    val_folds.append((x_train_centered[test_index],  y_train_raw[test_index]))

train_ds_fold0 = train_folds[0]
val_ds_fold0 = val_folds[0]

sns.countplot(x=train_ds_fold0[1])
plt.show();

pos = np.count_nonzero(y_train_raw)
total = len(y_train_raw)
neg = total - pos
class_weight = {0: (1 / neg) * (total / 2.0),
                1: (1 / pos) * (total / 2.0)} # To be eventually used for class weighting

Stratification is correct.
For the moment I will work with the first fold only, in order to find a working architecture. The data are ready.
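The `class_weight` dictionary computed at the end of the previous cell can be read as "each class contributes the same total weight". A small worked example with hypothetical counts (900 negatives and 500 positives, not the real distribution) makes the arithmetic explicit. `balanced_class_weights` is a name introduced here for illustration.

```python
def balanced_class_weights(n_neg, n_pos):
    # Same formula as in the notebook: (1 / count) * (total / 2)
    total = n_neg + n_pos
    return {0: total / (2.0 * n_neg), 1: total / (2.0 * n_pos)}

# Hypothetical split of the 1400 training ECGs, not the real counts
w = balanced_class_weights(n_neg=900, n_pos=500)
print(w)

# Each class ends up with the same total weight: n_neg * w[0] == n_pos * w[1]
print(900 * w[0], 500 * w[1])
```

Such a dictionary can be passed to Keras through the `class_weight` argument of `fit`, which is the option kept in reserve here.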

## Model  Selection and Training

A baseline for binary classification might be achieved with a support vector machine or a random forest classifier. These approaches might work, but I do not expect good performance on such high-dimensional data. I will need to flatten the time and lead dimensions in order to have a series of 1D arrays of length timesteps x leads. It is safe to assume the classification problem is not linearly separable, so I will try an SVM with a non-linear kernel, and a random forest since it should be fast to compute.

In [None]:
from sklearn.svm import NuSVC
from sklearn.ensemble import RandomForestClassifier

# Flatten (n_ecgs, n_samples, n_leads) into (n_ecgs, n_samples * n_leads)
x_tr_flat = train_ds_fold0[0].reshape(len(train_ds_fold0[0]), -1)
x_val_flat = val_ds_fold0[0].reshape(len(val_ds_fold0[0]), -1)

svm = NuSVC()
svm.fit(x_tr_flat, train_ds_fold0[1])
acc = svm.score(x_val_flat, val_ds_fold0[1])

print('SVM mean accuracy score: {:.2%}'.format(acc))

rf = RandomForestClassifier()
rf.fit(x_tr_flat, train_ds_fold0[1])
acc = rf.score(x_val_flat, val_ds_fold0[1])

print('Random Forest mean accuracy score: {:.2%}'.format(acc))

The ML algorithms reach an accuracy of around 70%. I am quite confident I can do better using DL and neural networks.

In order to properly consider the third dimension, I think a deep neural network should be the way to go. For time series analysis my options would be recurrent neural networks, convolutional neural networks or transformers. I guess transformers might be too much for the task and for the limited amount of data available for training. Since the third dimension is not a time feature but a signal feature, I am more inclined to implement a CNN-based model using 1-dimensional convolutions. The data are already in the right format since I have timesteps x features. I will start by implementing a basic convolutional network and I will add complexity if I obtain some results. It might be interesting to combine convolution and LSTM layers.

For the deep learning framework I will use TensorFlow 2.8. The notebook can work on GPU using the memory growth strategy. TensorBoard logs are stored in the ./tf_logs directory. The following cell is just for standard configuration.

In [None]:
# Tensorflow 2.x standard import + tensorboard
from packaging import version
import tensorflow as tf
print("TensorFlow version: {}".format(tf.__version__))
assert version.parse(tf.__version__).release[0] >= 2, "This notebook requires TensorFlow 2.0 or above."
%load_ext tensorboard
from tensorboard.plugins.hparams import api as hp

# GPU accelerated processing if available
gpu_list = tf.config.list_physical_devices('GPU')
print("GPUs Available:")
if gpu_list:
    try:
        for gpu in gpu_list:
            tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpu_list), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
else:
    print('None')

# Debugging
debug_mode = False # Reminder to myself: set to false before sending 
tf.config.run_functions_eagerly(debug_mode)
if debug_mode:
    tf.data.experimental.enable_debug_mode()
print("Eager execution: {}".format(tf.executing_eagerly()))

# Empty logs and cache
!rm -rf ./tf_logs/ 
tf.keras.backend.clear_session()

For a first try I will build a basic CNN model and see whether it is able to learn at all. I can try to overfit the model to the training data; I will think about optimization once I have found a suitable architecture. A first network could be conv -> pooling -> dense -> classification. I might create a conv block composed of conv1d + batchnorm + activation, and possibly a dropout to improve generalisation. I will just use 1 convolutional layer to start.

For the hyperparameters:

- 50 epochs should be enough to overfit the model.
- For the batch size I do not want to start too high, as it might just flatten out performance. I will use 32 for now and will do a grid search afterwards if I have the time.
- I have always had good performance with 2D and 3D CNNs using the Adam optimizer. I can start with a high lr just to see what happens, and optimize it along with the batch size.

In [None]:
epochs = 50
batch_size = 32
lr = 1e-3

For the data loader I am interested in reshuffling the training dataset at each iteration, and I don't care if the batches are not presented in a deterministic order. Both datasets can be prefetched and cached for performance.

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices(train_ds_fold0)
val_ds = tf.data.Dataset.from_tensor_slices(val_ds_fold0)

train_ds = train_ds.batch(batch_size, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                          deterministic=False
                         ).prefetch(tf.data.experimental.AUTOTUNE).cache().shuffle(len(train_ds),
                                                                                   reshuffle_each_iteration=True)
val_ds = val_ds.batch(batch_size, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                      deterministic=True
                     ).prefetch(tf.data.experimental.AUTOTUNE).cache()
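In the training pipeline above, `shuffle` comes after `batch`, so, as far as I understand tf.data, whole batches are reordered at each epoch while the samples inside a batch stay together. The numpy-only sketch below (helper names are made up, no TensorFlow involved) illustrates the difference with shuffling before batching.

```python
import numpy as np

def batches_then_shuffle(n, batch_size, rng):
    # Mirrors batch(...).shuffle(...): fixed batches, random batch order
    batches = [np.arange(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]
    order = rng.permutation(len(batches))
    return [batches[i] for i in order]

def shuffle_then_batches(n, batch_size, rng):
    # Mirrors shuffle(...).batch(...): random sample order, then batching
    idx = rng.permutation(n)
    return [idx[i:i + batch_size] for i in range(0, n, batch_size)]

rng = np.random.default_rng(0)
for epoch in range(2):
    a = batches_then_shuffle(12, 4, rng)
    b = shuffle_then_batches(12, 4, rng)
    print(epoch, [x.tolist() for x in a], [x.tolist() for x in b])
```

If sample-level shuffling is wanted, the usual tf.data ordering would be `cache()`, then `shuffle(...)`, then `batch(...)`, then `prefetch(...)`. Whether it changes the results here is untested.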

The most delicate hyperparameters to fix are the number of convolutional filters and the kernel size. For the kernel size I need to define how many neighbours I want to include in the convolution. Considering that each timestep corresponds to 1/250 s = 4 ms, I could take a 100 ms window using kernel_size=25. 100 ms should represent a normal interval for a QRS complex (looked up on wiki, hopefully they're right...). I am not quite sure yet how many filters I should use; I guess I can start with 32/64 and ramp up if necessary (the model defined in this notebook ends up using 128). I will definitely add more filters if the model underfits.
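The arithmetic behind the kernel size, and the cost of each filter count, can be written down in a few lines. `kernel_size_for` and `conv1d_params` are helper names introduced for this sketch; the parameter count is the standard one for a 1D convolution with bias.

```python
sampling_rate_hz = 250

def kernel_size_for(window_ms, fs_hz):
    # Number of samples covered by a window of window_ms milliseconds
    return int(round(window_ms * fs_hz / 1000.0))

def conv1d_params(kernel_size, in_channels, filters, use_bias=True):
    # Each filter has kernel_size * in_channels weights, plus one bias
    return kernel_size * in_channels * filters + (filters if use_bias else 0)

k = kernel_size_for(100, sampling_rate_hz)
print(k)
for f in (32, 64, 128):
    print(f, conv1d_params(k, in_channels=12, filters=f))
```

With 12 input leads and a 25-sample kernel, the first layer stays below 40k parameters even with 128 filters, so the filter count is cheap to increase at this depth.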

I inserted a batchnorm layer to guide training. I expect it to behave correctly but I might remove it if I see too much smoothing.

The only activation functions I really consider are those of the ReLU family. I don't want to introduce complexity with a parametric ReLU, but I could consider a leaky ReLU. I don't expect a huge improvement in performance with a leaky ReLU but I might try it if I have time.

Global pooling seems a good option to flatten the results and reduce complexity at the same time.

For the other hyperparameters, I prefer He initialization over Glorot when using CNNs, but it shouldn't change much tbh.

The dense layer obviously has only two neurons since it is a binary classification problem. I am not sure yet whether I want to add more hidden dense layers; I think that if I want to increase the model complexity I should rather add more convolutions to extract more important signal features.

I use a softmax activation function, but I could also use a sigmoid and impose a threshold for classification. Cross-entropy should be the correct loss function to optimize, but I might explore more complex losses if necessary. I use its sparse version since my labels are not one-hot encoded.
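The link between the two output choices can be checked numerically. In this numpy sketch (made-up logits, helper names chosen for the example), the class-1 probability of a 2-way softmax equals a sigmoid applied to the difference of the two logits, and the sparse cross-entropy is the mean negative log-probability of the true class index.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_categorical_crossentropy(labels, probs):
    # Mean negative log-probability assigned to the true class index
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, -1.0], [0.3, 0.8], [-0.5, 1.5]])  # made-up outputs
labels = np.array([0, 1, 1])

p_softmax = softmax(logits)[:, 1]
p_sigmoid = sigmoid(logits[:, 1] - logits[:, 0])
print(np.allclose(p_softmax, p_sigmoid))
print(sparse_categorical_crossentropy(labels, softmax(logits)))
```

So a single sigmoid unit would be an equivalent parameterisation; the 2-unit softmax mainly keeps the code ready for `argmax` and for the sparse loss.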

In [None]:
cnn_model = tf.keras.Sequential()
cnn_model.add(tf.keras.layers.Conv1D(filters=128, kernel_size=25, padding='same', kernel_initializer='he_normal',
                                     activation=None, input_shape=(2500, 12)))
cnn_model.add(tf.keras.layers.BatchNormalization(axis=-1))
cnn_model.add(tf.keras.layers.LeakyReLU())
cnn_model.add(tf.keras.layers.GlobalAveragePooling1D())
cnn_model.add(tf.keras.layers.Dense(2, activation='softmax'))

cnn_model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy',],)

cnn_model.summary()

I usually use TensorBoard to follow the training curves and distributions. If you want to check the training logs outside this notebook, you should be able to find the TensorBoard window on localhost at port 6006.

In [None]:
%tensorboard --logdir ./tf_logs --host 0.0.0.0

First training:

In [None]:
history = cnn_model.fit(x=train_ds, validation_data=val_ds,
                        batch_size=batch_size, epochs=epochs, verbose=1,
                        callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/baseCNN',
                                                                  histogram_freq=1, update_freq='batch')],
          use_multiprocessing=True)

In [None]:
sns.lineplot(data=history.history['accuracy'], label='accuracy')
sns.lineplot(data=history.history['val_accuracy'], label='val_accuracy')
plt.show();
sns.lineplot(data=history.history['loss'], label='loss')
sns.lineplot(data=history.history['val_loss'], label='val_loss')
plt.show();

The training actually seems pretty stable and shows no overfitting, which is good news. For a first try we see a 10% accuracy increase over the random forest model.

I think increasing the number of parameters should improve performance if the additional conv layers can extract meaningful features. I might repeat the conv-batchnorm-activation module two or three times without changing the other parameters. I will definitely turn down the lr to provide a more stable training loop.

In [None]:
class ConvBatchRelu(tf.keras.layers.Layer):
    def __init__(self, filters=128, kernel_size=25, inshape=None, name='ConvBatchRelu', **kwargs):
        super(ConvBatchRelu, self).__init__(name=name, **kwargs)
        if inshape:
            self.conv = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same',
                                               kernel_initializer='he_normal', activation=None,
                                               input_shape=inshape)
        else:
            self.conv = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same',
                                               kernel_initializer='he_normal', activation=None)
        self.batch = tf.keras.layers.BatchNormalization(axis=-1)
        self.act = tf.keras.layers.ReLU()
        self.drop = tf.keras.layers.Dropout(0.5)
                
    @tf.function
    def call(self, inputs, training=None, **kwargs):
        x = self.conv(inputs, training=training)
        x = self.batch(x, training=training)
        x = self.act(x, training=training)
        return self.drop(x)


class CNN1D(tf.keras.models.Model):
    def __init__(self, n_convs=2, base_filters=128, kernel_size = 25, n_class=2, inshape=(2500, 12)):
        super(CNN1D, self).__init__()
        self.convs = [ConvBatchRelu(base_filters, kernel_size, inshape) if i==0 
                      else ConvBatchRelu(base_filters*(2**i), kernel_size) for i in range(n_convs)]
        self.p = tf.keras.layers.GlobalAveragePooling1D()
        self.dense = tf.keras.layers.Dense(n_class, activation='softmax')
        
    @tf.function
    def call(self, inputs, training=None, **kwargs):
        x = inputs
        for c in self.convs:
            x = c(x, training=training)
        x = self.p(x, training=training)
        x = self.dense(x, training=training)
        return x

New training:

In [None]:
lr = 1e-4
n_convs = [2]
for n in n_convs:
    cnn_deeper = CNN1D(n)

    cnn_deeper.compile(optimizer=tf.keras.optimizers.Adam(lr),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',])

    history = cnn_deeper.fit(x=train_ds, validation_data=val_ds,
                             batch_size=batch_size, epochs=epochs, verbose=2, 
                             callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/{}_convs_CNN'.format(n),
                                                                       histogram_freq=1, update_freq='batch')],
                             use_multiprocessing=True)

The training seems correct and the results are improved. Adding too many layers just brought instability. I tried adding a dropout layer, which actually improved the results a little. It may be interesting to do a proper hyperparameter optimization with the TensorBoard HParams plugin, since I might get better results with more layers and a lower lr to reduce overfitting.

In [None]:
cnn_deeper.summary()

sns.lineplot(data=history.history['accuracy'], label='accuracy')
sns.lineplot(data=history.history['val_accuracy'], label='val_accuracy')
plt.show();
sns.lineplot(data=history.history['loss'], label='loss')
sns.lineplot(data=history.history['val_loss'], label='val_loss')
plt.show();

Overall the model behaves quite well. I plot the confusion matrix and some statistics of the best model to better characterise its performance.

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_pred = np.argmax(cnn_deeper.predict(val_ds_fold0[0]), axis=1)
y_true = val_ds_fold0[1]

print('Accuracy: {:.1%} - Precision: {:.1%} - Recall: {:.1%}'.format(accuracy_score(y_true, y_pred),
                                                                     precision_score(y_true, y_pred),
                                                                     recall_score(y_true, y_pred)))

confusion_mtx = tf.math.confusion_matrix(y_true, y_pred)
sns.heatmap(confusion_mtx,
            annot=True, fmt='g')
plt.xlabel('Prediction')
plt.ylabel('Label')
plt.show();

We observe an overall good accuracy, with precision and recall rates that are quite similar. If I had to maximise one of the two, I would prioritize recall: in medical practice it is better to have a system that catches all true positives at the cost of lower precision than a precise system that is prone to miss important medical information.
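One simple lever for trading precision against recall is the decision threshold. Taking the `argmax` of a 2-class softmax amounts to thresholding the class-1 probability at 0.5 (ties aside); lowering that threshold flags more ECGs as inverted. The sketch below uses made-up probabilities and labels, and `recall_precision_at` is a hypothetical helper.

```python
import numpy as np

def recall_precision_at(p_pos, y_true, threshold):
    # Predict "inverted" whenever the class-1 probability reaches the threshold
    y_hat = (p_pos >= threshold).astype(int)
    tp = np.sum((y_hat == 1) & (y_true == 1))
    fp = np.sum((y_hat == 1) & (y_true == 0))
    fn = np.sum((y_hat == 0) & (y_true == 1))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Made-up class-1 probabilities and matching labels
p_pos = np.array([0.05, 0.20, 0.35, 0.45, 0.55, 0.40, 0.60, 0.80, 0.95])
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])

for thr in (0.5, 0.3):
    print(thr, recall_precision_at(p_pos, y_toy, thr))
```

On this toy data, moving the threshold from 0.5 to 0.3 recovers the missed positive but lets more negatives through, which is the trade-off described above. The threshold would have to be chosen on validation data, not on the test set.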

It might be interesting to check whether a loss derived from the F1 score works better in this case, where there is a bit of class imbalance. Adding an LSTM to exploit more temporal information should also be interesting. I just did a quick test with the LSTM but I did not really find more satisfying results.
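The F1 score itself is computed from hard predictions and is not differentiable, so it cannot be used directly as a loss. A common workaround is a "soft" F1 in which predicted probabilities replace the 0/1 decisions. The numpy sketch below only shows the arithmetic, with made-up inputs and a hypothetical function name; it was not used for any training in this notebook, and a Keras version would need the same formula written with TensorFlow ops.

```python
import numpy as np

def soft_f1_loss(p_pos, y_true, eps=1e-7):
    """1 - soft F1, with probabilities used in place of hard predictions."""
    y = y_true.astype(float)
    tp = np.sum(p_pos * y)            # soft true positives
    fp = np.sum(p_pos * (1.0 - y))    # soft false positives
    fn = np.sum((1.0 - p_pos) * y)    # soft false negatives
    return 1.0 - 2.0 * tp / (2.0 * tp + fp + fn + eps)

y_toy = np.array([0, 0, 0, 1, 1])
loss_perfect = soft_f1_loss(np.array([0.0, 0.0, 0.0, 1.0, 1.0]), y_toy)
loss_uncertain = soft_f1_loss(np.array([0.4, 0.6, 0.5, 0.5, 0.4]), y_toy)
print(loss_perfect, loss_uncertain)
```

The loss is close to 0 for perfect predictions and grows as probability mass moves to the wrong class, without ever counting true negatives.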

In [None]:
class CNN1DLSTM(CNN1D):
    def __init__(self, n_convs=2, base_filters=64, kernel_size = 25, n_class=2, inshape=(2500, 12)):
        super(CNN1DLSTM, self).__init__(n_convs, base_filters, kernel_size, n_class, inshape)
        self.p = tf.keras.layers.LSTM(16)

In [None]:
rnn = CNN1DLSTM()

rnn.compile(optimizer=tf.keras.optimizers.Adam(lr),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',])

history = rnn.fit(x=train_ds, validation_data=val_ds,
                     batch_size=batch_size, epochs=epochs, verbose=2, 
                     callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/CNN_LSTM',
                                                               histogram_freq=1, update_freq='batch')],
                     use_multiprocessing=True)

In [None]:
rnn.summary()

sns.lineplot(data=history.history['accuracy'], label='accuracy')
sns.lineplot(data=history.history['val_accuracy'], label='val_accuracy')
plt.show();
sns.lineplot(data=history.history['loss'], label='loss')
sns.lineplot(data=history.history['val_loss'], label='val_loss')
plt.show();

I just tried a quick RNN training and the model is not stable. I might be able to further improve performance by tuning hyperparameters, but it could become quite time-consuming. I will just optimize the parameters for the convnet and provide it as the best model. I will try modifying the batch size, the lr and the number of epochs.

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices(train_ds_fold0)
val_ds = tf.data.Dataset.from_tensor_slices(val_ds_fold0)
train_ds = train_ds.batch(batch_size*4, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                          deterministic=False
                         ).prefetch(tf.data.experimental.AUTOTUNE).cache().shuffle(len(train_ds),
                                                                                   reshuffle_each_iteration=True)
val_ds = val_ds.batch(batch_size*4, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                      deterministic=True
                     ).prefetch(tf.data.experimental.AUTOTUNE).cache()

model_opt = CNN1D(2)

model_opt.compile(optimizer=tf.keras.optimizers.Adam(lr*2),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy',],)

history = model_opt.fit(x=train_ds, validation_data=val_ds,
                      batch_size=4*batch_size, epochs=2*epochs, verbose=2, 
                      callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/2_convs_batchx4_CNN',
                                                                histogram_freq=1, update_freq='batch')],
                      use_multiprocessing=True)

The model I decided to keep is ... The recall and precision are ....

In [None]:
y_pred = np.argmax(model_opt.predict(val_ds_fold0[0]), axis=1)
y_true = val_ds_fold0[1]

print('Accuracy: {:.1%} - Precision: {:.1%} - Recall: {:.1%}'.format(accuracy_score(y_true, y_pred),
                                                                     precision_score(y_true, y_pred),
                                                                     recall_score(y_true, y_pred)))

confusion_mtx = tf.math.confusion_matrix(y_true, y_pred)
sns.heatmap(confusion_mtx,
            annot=True, fmt='g')
plt.xlabel('Prediction')
plt.ylabel('Label')
plt.show();

sns.lineplot(data=history.history['accuracy'], label='accuracy')
sns.lineplot(data=history.history['val_accuracy'], label='val_accuracy')
plt.show();
sns.lineplot(data=history.history['loss'], label='loss')
sns.lineplot(data=history.history['val_loss'], label='val_loss')
plt.show();

## Ensemble Strategy
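The cells in this section train one model per fold and then average the predicted class probabilities before taking the argmax (soft voting). Here is a minimal sketch of that averaging step on its own; the probabilities are invented for illustration and do not come from the trained models.

```python
import numpy as np

# Hypothetical softmax outputs: 3 fold models x 4 samples x 2 classes
fold_probs = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]],
    [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9], [0.40, 0.60]],
    [[0.7, 0.3], [0.3, 0.7], [0.3, 0.7], [0.45, 0.55]],
])

# Average over the model axis, then pick the most probable class per sample
mean_probs = fold_probs.mean(axis=0)
ensemble_pred = mean_probs.argmax(axis=1)

print(mean_probs.round(3))
print(ensemble_pred)
```

Averaging probabilities rather than hard votes lets a confident model outweigh hesitant ones on borderline samples.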

In [None]:
models = []
for i, fold in enumerate(train_folds):
    train_ds = tf.data.Dataset.from_tensor_slices(fold)
    # NOTE: validation reuses the training fold here, so the val_* metrics are not held-out estimates
    val_ds = tf.data.Dataset.from_tensor_slices(fold)

    # Cache the raw samples, shuffle them individually at every epoch, then batch and prefetch
    train_ds = train_ds.cache().shuffle(len(train_ds), reshuffle_each_iteration=True
                                       ).batch(4*batch_size, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                                               deterministic=False
                                              ).prefetch(tf.data.experimental.AUTOTUNE)
    val_ds = val_ds.cache().batch(4*batch_size, num_parallel_calls=tf.data.experimental.AUTOTUNE,
                                  deterministic=True
                                 ).prefetch(tf.data.experimental.AUTOTUNE)
    
    cnn_deeper = CNN1D(n)

    cnn_deeper.compile(optimizer=tf.keras.optimizers.Adam(2*lr),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',])
    
    # The datasets are already batched, so batch_size is not passed to fit
    history = cnn_deeper.fit(x=train_ds, validation_data=val_ds,
                             epochs=2*epochs, verbose=0,
                             callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/final_fold{}_CNN'.format(i),
                                                                       histogram_freq=1, update_freq='batch')],
                             use_multiprocessing=True)
    
    models.append(cnn_deeper)

In [None]:
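# Soft voting: collect the class probabilities predicted by each fold model
ot = []

for m in models:
    ot.append(m.predict(x_test_centered))

# Average the probabilities over the models, then keep the most probable class
ot = np.mean(np.array(ot), axis=0)
ot = np.argmax(ot, axis=1)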

## Comments

In [None]:
np.save('final.npy', ot)
assert np.load('final.npy').shape == (2630, ), 'Shape mismatch'

## Notes

In [None]:
y_pred = ot
y_true = y_test_raw

print('Accuracy: {:.1%} - Precision: {:.1%} - Recall: {:.1%}'.format(accuracy_score(y_true, y_pred),
                                                                     precision_score(y_true, y_pred),
                                                                     recall_score(y_true, y_pred)))

confusion_mtx = tf.math.confusion_matrix(y_true, y_pred)
sns.heatmap(confusion_mtx,
            annot=True, fmt='g')
plt.xlabel('Prediction')
plt.ylabel('Label')
plt.show();

In [None]:
class F1Loss(tf.keras.losses.Loss):
    def call(self, y, y_hat):
        y = tf.squeeze(tf.cast(y, tf.int32))
        y = tf.one_hot(y, depth=2)
        y = tf.cast(y, tf.float32)
        y_hat = tf.cast(y_hat, tf.float32)
        tp = tf.reduce_sum(y_hat * y, axis=0)
        fp = tf.reduce_sum(y_hat * (1 - y), axis=0)
        fn = tf.reduce_sum((1 - y_hat) * y, axis=0)
        soft_f1 = 2*tp / (2*tp + fn + fp + 1e-16)
        cost = 1 - soft_f1 # reduce 1 - soft-f1 in order to increase soft-f1
        macro_cost = tf.reduce_mean(cost) # average on all labels
        return macro_cost
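
To make the loss above easier to check by hand, here is a NumPy sketch of the same soft-F1 computation on two toy cases. It mirrors the formula used in `F1Loss` but is not a drop-in replacement for it; the function name `soft_f1_cost` and the inputs are illustrative assumptions.

```python
import numpy as np


def soft_f1_cost(y, y_hat, eps=1e-16):
    """Macro-averaged (1 - soft F1) for 2 classes, from integer labels and probabilities."""
    y_onehot = np.eye(2)[np.asarray(y)]
    y_hat = np.asarray(y_hat, dtype=float)
    tp = (y_hat * y_onehot).sum(axis=0)        # soft true positives per class
    fp = (y_hat * (1 - y_onehot)).sum(axis=0)  # soft false positives per class
    fn = ((1 - y_hat) * y_onehot).sum(axis=0)  # soft false negatives per class
    soft_f1 = 2 * tp / (2 * tp + fn + fp + eps)
    return float(np.mean(1 - soft_f1))


# Perfect one-hot predictions versus uninformative 50/50 predictions
perfect = soft_f1_cost([0, 1], [[1.0, 0.0], [0.0, 1.0]])
uninformative = soft_f1_cost([0, 1], [[0.5, 0.5], [0.5, 0.5]])

print(perfect, uninformative)
```

Because the counts are built from probabilities instead of thresholded predictions, the cost stays differentiable and can be minimized directly.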

In [None]:


cnn_lstm = CNN1DLSTM()

cnn_lstm.compile(optimizer=tf.keras.optimizers.Adam(10*lr),
                 loss='sparse_categorical_crossentropy',
                 metrics=['accuracy',])

# The datasets are already batched, so batch_size is not passed to fit
history = cnn_lstm.fit(x=train_ds, validation_data=val_ds,
                       epochs=epochs, verbose=1,
                       callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/new_real_lstm_CNN',
                                                                 histogram_freq=1, update_freq='batch')],
                       use_multiprocessing=True)
    

In [None]:
y_pred = np.argmax(cnn_lstm.predict(x_test_centered), axis=1)
y_true = y_test_raw

print('Accuracy: {:.1%} - Precision: {:.1%} - Recall: {:.1%}'.format(accuracy_score(y_true, y_pred),
                                                                     precision_score(y_true, y_pred),
                                                                     recall_score(y_true, y_pred)))

confusion_mtx = tf.math.confusion_matrix(y_true, y_pred)
sns.heatmap(confusion_mtx,
            annot=True, fmt='g')
plt.xlabel('Prediction')
plt.ylabel('Label')
plt.show();

In [None]:
class ResBlock(tf.keras.layers.Layer):
    def __init__(self, filters=64, kernel_size=8, inshape=None, use_conv_short=True,
                 name='ResBlock', **kwargs):
        super(ResBlock, self).__init__(name=name, **kwargs)
        
        self.use_conv_short = use_conv_short
        if inshape:
            self.conv1 = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same',
                                               kernel_initializer='he_normal', activation=None,
                                               input_shape=inshape)
        else:
            self.conv1 = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size, padding='same',
                                               kernel_initializer='he_normal', activation=None)
        self.batch1 = tf.keras.layers.BatchNormalization(axis=-1)
        self.act1 = tf.keras.layers.ReLU()
        
        self.conv2 = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size-3, padding='same',
                                            kernel_initializer='he_normal', activation=None)
        self.batch2 = tf.keras.layers.BatchNormalization(axis=-1)
        self.act2 = tf.keras.layers.ReLU()
        
        self.conv3 = tf.keras.layers.Conv1D(filters=filters, kernel_size=kernel_size-5, padding='same',
                                            kernel_initializer='he_normal', activation=None)
        self.batch3 = tf.keras.layers.BatchNormalization(axis=-1)
        
        self.conv_short = tf.keras.layers.Conv1D(filters=filters, kernel_size=1, padding='same',
                                            kernel_initializer='he_normal', activation=None)
        self.batch_short = tf.keras.layers.BatchNormalization(axis=-1)
        
        self.act_fin = tf.keras.layers.ReLU()
        
    @tf.function
    def call(self, inputs, training=None, **kwargs):
        x = self.conv1(inputs, training=training)
        x = self.batch1(x, training=training)
        x = self.act1(x, training=training)
        
        x = self.conv2(x, training=training)
        x = self.batch2(x, training=training)
        x = self.act2(x, training=training)
        
        x = self.conv3(x, training=training)
        x = self.batch3(x, training=training)
        
        if self.use_conv_short:
            x_short = self.conv_short(inputs, training=training)
        else:
            x_short = inputs
        x_short = self.batch_short(x_short, training=training)
        
        x = x + x_short
        
        x = self.act_fin(x)
        
        return x
    
class ResNet1D(CNN1D):
    def __init__(self, base_filters=64, kernel_size = 16, n_class=2, inshape=(1000,12)):
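        # Forward the constructor arguments instead of hard-coded values (and do not pass self twice)
        super(ResNet1D, self).__init__(base_filters=base_filters, kernel_size=kernel_size,
                                       n_class=n_class, inshape=inshape)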
        self.first = ResBlock(base_filters, kernel_size, inshape, True)
        self.second = ResBlock(base_filters * 2, kernel_size, inshape, True)
        self.third = ResBlock(base_filters * 2, kernel_size, inshape, False)
        
rnn = CNN1DLSTM()

rnn.compile(optimizer=tf.keras.optimizers.Adam(lr),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',])

# The datasets are already batched, so batch_size is not passed to fit
history = rnn.fit(x=train_ds, validation_data=val_ds,
                  epochs=epochs, verbose=2,
                  callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/F1_loss_CNN',
                                                            histogram_freq=1, update_freq='batch')],
                  use_multiprocessing=True)
        

In [None]:
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, filters=64, kernel_size=16, use_final_conv=True, use_final_pooling=True,
                 name='ConvBlock', **kwargs):
        super(ConvBlock, self).__init__(name=name, **kwargs)
        self.use_final_conv = use_final_conv
        self.use_final_pooling = use_final_pooling

        self.batch_1 = tf.keras.layers.BatchNormalization(axis=-1)
        self.act_1 = tf.keras.layers.ReLU()
        self.drop_1 = tf.keras.layers.Dropout(0.5)
        self.conv_1 = tf.keras.layers.Conv1D(filters, kernel_size=kernel_size, padding='same', 
                                             kernel_initializer='he_normal', activation=None)
        self.batch_2 = tf.keras.layers.BatchNormalization(axis=-1)
        self.act_2 = tf.keras.layers.ReLU()
        self.drop_2 = tf.keras.layers.Dropout(0.5)
        self.conv_2 = tf.keras.layers.Conv1D(filters, kernel_size=kernel_size, padding='same',
                                             kernel_initializer='he_normal', activation=None)
        
        self.pool_fin = tf.keras.layers.MaxPooling1D(pool_size=2, strides=2)
        
        self.conv_par = tf.keras.layers.Conv1D(filters, kernel_size=1)
        self.pool_par = tf.keras.layers.MaxPooling1D(pool_size=2, strides=2)
        
    @tf.function
    def call(self, inputs, training=None, **kwargs):
        if self.use_final_conv:
            xshort = self.conv_par(inputs, training=training)
        else:
            xshort = inputs
        
        x1 = self.batch_1(inputs, training=training)
        x1 = self.act_1(x1, training=training)
        x1 = self.drop_1(x1, training=training)
        x1 = self.conv_1(x1, training=training)
        x1 = self.batch_2(x1, training=training)
        x1 = self.act_2(x1, training=training)
        x1 = self.drop_2(x1, training=training)
        x1 = self.conv_2(x1, training=training)
        
        if self.use_final_pooling:
            x1 = self.pool_fin(x1, training=training)
            x2 = self.pool_par(xshort, training=training)
        else:
            x2 = xshort
        
        return x1 + x2

        
class ResNet1DPlus(tf.keras.models.Model):
    def __init__(self, base_filters=64, kernel_size = 16, n_class=2, inshape=(1000,12)):
        super(ResNet1DPlus, self).__init__()

        # --- Stem: convolution, batch normalization, ReLU ---
        self.conv_pre = tf.keras.layers.Conv1D(base_filters, kernel_size=kernel_size, padding='same', 
                                               kernel_initializer='he_normal', activation=None,
                                               input_shape=inshape)
        self.batch_pre = tf.keras.layers.BatchNormalization(axis=-1)
        self.act_pre = tf.keras.layers.ReLU()
        # --- First residual unit: convolutional branch (left) and pooled shortcut (right) ---
        self.conv_l1 = tf.keras.layers.Conv1D(base_filters, kernel_size=kernel_size, padding='same',
                                              kernel_initializer='he_normal', activation=None)
        self.batch_l1 = tf.keras.layers.BatchNormalization(axis=-1)
        self.act_l1 = tf.keras.layers.ReLU()
        self.drop_l1 = tf.keras.layers.Dropout(0.5)
        self.conv_l2 = tf.keras.layers.Conv1D(base_filters, kernel_size=kernel_size, padding='same', 
                                              kernel_initializer='he_normal', activation=None)
        self.pool_l1 = tf.keras.layers.MaxPooling1D(pool_size=2, strides=2)
        
        self.pool_r1 = tf.keras.layers.MaxPooling1D(pool_size=2, strides=2)
        # --- Stack of pre-activation residual blocks ---
        self.conv_block1 = ConvBlock(filters=base_filters * 1, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=False)
        self.conv_block2 = ConvBlock(filters=base_filters * 1, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=True)
        self.conv_block3 = ConvBlock(filters=base_filters * 1, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=False)
        self.conv_block4 = ConvBlock(filters=base_filters * 1, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=True)
        self.conv_block5 = ConvBlock(filters=base_filters * 2, kernel_size=kernel_size,
                                     use_final_conv=True, use_final_pooling=False)
        self.conv_block6 = ConvBlock(filters=base_filters * 2, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=True)
        self.conv_block7 = ConvBlock(filters=base_filters * 2, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=False)
        self.conv_block8 = ConvBlock(filters=base_filters * 2, kernel_size=kernel_size,
                                     use_final_conv=False, use_final_pooling=True)
        self.conv_block9 = ConvBlock(filters=base_filters * 3, kernel_size=kernel_size,
                                     use_final_conv=True, use_final_pooling=False)
        self.conv_block10 = ConvBlock(filters=base_filters * 3, kernel_size=kernel_size,
                                      use_final_conv=False, use_final_pooling=True)
        self.conv_block11 = ConvBlock(filters=base_filters * 3, kernel_size=kernel_size,
                                      use_final_conv=False, use_final_pooling=False)
        self.conv_block12 = ConvBlock(filters=base_filters * 3, kernel_size=kernel_size,
                                      use_final_conv=False, use_final_pooling=True)
        self.conv_block13 = ConvBlock(filters=base_filters * 4, kernel_size=kernel_size,
                                      use_final_conv=True, use_final_pooling=False)
        self.conv_block14 = ConvBlock(filters=base_filters * 4, kernel_size=kernel_size,
                                      use_final_conv=False, use_final_pooling=True)
        self.conv_block15 = ConvBlock(filters=base_filters * 4, kernel_size=kernel_size,
                                      use_final_conv=False, use_final_pooling=False)
        # --- Classification head ---
        self.batch_final = tf.keras.layers.BatchNormalization(axis=-1)
        self.act_final = tf.keras.layers.ReLU()
        self.f = tf.keras.layers.Flatten()
        self.dense_final = tf.keras.layers.Dense(n_class, activation='softmax')
        

    @tf.function()
    def call(self, inputs, training=None):
        x = self.conv_pre(inputs, training=training)
        x = self.batch_pre(x, training=training)
        x_pre = self.act_pre(x, training=training)
        
        x_branch_l = self.conv_l1(x_pre, training=training)
        x_branch_l = self.batch_l1(x_branch_l, training=training)
        x_branch_l = self.act_l1(x_branch_l, training=training)
        x_branch_l = self.drop_l1(x_branch_l, training=training)
        x_branch_l = self.conv_l2(x_branch_l, training=training)
        x_branch_l = self.pool_l1(x_branch_l, training=training)
        
        x_branch_r = self.pool_r1(x_pre, training=training)
                
        x_in = x_branch_l + x_branch_r
        
        x_conv = self.conv_block1(x_in, training=training)
        x_conv = self.conv_block2(x_conv, training=training)
        x_conv = self.conv_block3(x_conv, training=training)
        x_conv = self.conv_block4(x_conv, training=training)
        x_conv = self.conv_block5(x_conv, training=training)
        x_conv = self.conv_block6(x_conv, training=training)
        x_conv = self.conv_block7(x_conv, training=training)
        x_conv = self.conv_block8(x_conv, training=training)
        x_conv = self.conv_block9(x_conv, training=training)
        x_conv = self.conv_block10(x_conv, training=training)
        x_conv = self.conv_block11(x_conv, training=training)
        x_conv = self.conv_block12(x_conv, training=training)
        x_conv = self.conv_block13(x_conv, training=training)
        x_conv = self.conv_block14(x_conv, training=training)
        x_conv = self.conv_block15(x_conv, training=training)
        
        x_final = self.batch_final(x_conv, training=training)
        x_final = self.act_final(x_final, training=training)
        x_final = self.f(x_final)
        x_final = self.dense_final(x_final, training=training)
        
        return x_final
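
To see what reaches the final dense layer of `ResNet1DPlus`, here is a small sketch that traces the time dimension through the pooling layers. It assumes the default input shape `(1000, 12)`, `base_filters=64`, and the Keras default `'valid'` padding for `MaxPooling1D`; the `'same'`-padded convolutions leave the length unchanged.

```python
def pooled_length(length, pool_size=2, stride=2):
    """Output length of a max pooling layer with 'valid' padding."""
    return (length - pool_size) // stride + 1


length = 1000  # time steps of the input ECG

# Pooling in the first residual unit (pool_l1 and pool_r1 act in parallel)
length = pooled_length(length)

# ConvBlocks created with use_final_pooling=True
pooling_blocks = [2, 4, 6, 8, 10, 12, 14]
for _ in pooling_blocks:
    length = pooled_length(length)

base_filters = 64
flat_features = length * base_filters * 4  # the last blocks use base_filters * 4 channels

print(length, flat_features)
```

Under these assumptions the eight pooling steps shrink the 1000 time steps to 3, so `Flatten` hands 768 features to the softmax layer.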


In [None]:
cnn_deeper = ResNet1DPlus()

cnn_deeper.compile(optimizer=tf.keras.optimizers.Adam(0.1*lr),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',])

# The datasets are already batched, so batch_size is not passed to fit
history = cnn_deeper.fit(x=train_ds, validation_data=val_ds,
                         epochs=2*epochs, verbose=2,
                         callbacks=[tf.keras.callbacks.TensorBoard(log_dir='./tf_logs/{}_convs_CNN'.format(2),
                                                                   histogram_freq=1, update_freq='batch')],
                         use_multiprocessing=True)

In [None]:
def get_model(inshape, base_filters, dense_filters, n_convs=3, n_dense=2):
    """Build a Sequential 1D CNN: n_convs conv blocks, then n_dense dense layers and a 2-logit output."""

    model = tf.keras.Sequential()
    m = 1
    for i in range(n_convs):
        if i == 0:
            model.add(tf.keras.layers.Conv1D(base_filters * m, 3, padding='same',
                                             activation='relu', input_shape=inshape))
        else:
            model.add(tf.keras.layers.Conv1D(base_filters * m, 3, padding='same',
                                             activation='relu'))
        model.add(tf.keras.layers.BatchNormalization(axis=-1))
        model.add(tf.keras.layers.MaxPooling1D())
        m += m
    model.add(tf.keras.layers.Flatten())
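    # Dense widths halve at each layer and end at dense_filters (e.g. 4x, 2x, 1x for n_dense=3).
    # A power of two keeps every width positive, whatever the value of n_dense.
    m = 2 ** (n_dense - 1)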
    for i in range(n_dense):
        model.add(tf.keras.layers.Dense(dense_filters * m, activation='relu'))
        m = m // 2
    model.add(tf.keras.layers.Dense(2))

    model.summary()

    return model

#model = get_model((1000, 12), 128, 8, 1, 1)

opt = tf.keras.optimizers.Adam(learning_rate=initial_lr)

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Labels are integer class ids (sparse), matching the sparse cross-entropy loss above
metric_fn = tf.keras.metrics.SparseCategoricalAccuracy()
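
Since the model above ends in a `Dense(2)` layer without softmax, the loss has to work on logits. Here is a NumPy sketch of what sparse categorical cross-entropy from logits and sparse accuracy compute; the helper names and the toy inputs are illustrative assumptions, not part of the Keras API.

```python
import math

import numpy as np


def sparse_ce_from_logits(labels, logits):
    """Mean cross-entropy for integer labels and unnormalized logits."""
    labels = np.asarray(labels)
    logits = np.asarray(logits, dtype=float)
    # Log-softmax, shifted by the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def sparse_accuracy(labels, logits):
    """Fraction of samples whose largest logit matches the integer label."""
    return float(np.mean(np.argmax(logits, axis=1) == np.asarray(labels)))


# Equal logits give a uniform distribution over 2 classes, so the loss is ln(2)
loss_equal = sparse_ce_from_logits([0, 1], [[0.0, 0.0], [0.0, 0.0]])
acc_toy = sparse_accuracy([0, 1, 1], [[2.0, -1.0], [0.5, 1.5], [1.0, 0.0]])

print(loss_equal, math.log(2), acc_toy)
```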

In [None]:
HP_EPOCHS = hp.HParam('epochs', hp.Discrete([200]))
HP_BATCH = hp.HParam('batch', hp.Discrete([4]))
HP_LR = hp.HParam('initial_lr', hp.Discrete([1e-6]))
HP_FILTERS = hp.HParam('base_filters', hp.Discrete([64]))
HP_DENSEFILTERS = hp.HParam('dense_filters', hp.Discrete([16]))
HP_N_CONV = hp.HParam('n_conv_filters', hp.Discrete([2]))
HP_N_DENSE = hp.HParam('n_dense_filters', hp.Discrete([3]))

session_num = 0
run_name = 'run'

def hyper_parameters_config():
    # Yield every combination of the hyperparameter domains (full grid search)
    for e in HP_EPOCHS.domain.values:
        for b in HP_BATCH.domain.values:
            for lr in HP_LR.domain.values:
                for bf in HP_FILTERS.domain.values:
                    for d in HP_DENSEFILTERS.domain.values:
                        for nc in HP_N_CONV.domain.values:
                            for nd in HP_N_DENSE.domain.values:
                                yield {
                                    HP_EPOCHS: e,
                                    HP_BATCH: b,
                                    HP_LR: lr,
                                    HP_FILTERS: bf,
                                    HP_DENSEFILTERS: d,
                                    HP_N_CONV: nc,
                                    HP_N_DENSE: nd,
                                }

for hparams in hyper_parameters_config():
    run_name = 'run-{}'.format(session_num)
    print('\n\n--- Starting trial: {}'.format(run_name))
    print({h.name: hparams[h] for h in hparams})

    run_logdir = 'tf_logs/hparam_tuning/' + run_name
    summary_writer = tf.summary.create_file_writer(run_logdir)

    with summary_writer.as_default():
        hp.hparams(hparams)
        # model = get_model((1000, 12), hparams[HP_FILTERS], hparams[HP_DENSEFILTERS], hparams[HP_N_CONV], hparams[HP_N_DENSE])
        
        # model = CNN1D()
        model = CNN1DLSTM(hparams[HP_FILTERS], hparams[HP_DENSEFILTERS])
        # model = ResNet1D()
        # model = ResNet1DPlus(hparams[HP_FILTERS], hparams[HP_DENSEFILTERS])
        # NOTE: HP_LR and HP_BATCH are only logged: Adam runs with its default learning rate
        # and train_ds/val_ds are already batched, so batch_size is not passed to fit
        model.compile(optimizer=tf.keras.optimizers.Adam(),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy',],)

        model.fit(x=train_ds, validation_data=val_ds,
                  epochs=hparams[HP_EPOCHS], verbose=2,
                  callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=200, verbose=1),
                             tf.keras.callbacks.TensorBoard(log_dir=run_logdir,
                                                            histogram_freq=1, update_freq='batch')],
                  use_multiprocessing=True)
        
        model.summary()


    session_num += 1
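
The nested loops in `hyper_parameters_config` enumerate a full grid. As a sketch of a more compact alternative, the same enumeration can be written with `itertools.product`. The plain dictionary of lists below stands in for the `hp.HParam` domains, and its values are made up for illustration.

```python
from itertools import product

# Hypothetical search space, for illustration only
search_space = {
    'epochs': [100, 200],
    'batch': [4, 8],
    'initial_lr': [1e-4, 1e-3],
}


def iter_configs(space):
    """Yield one dict per point of the Cartesian product of all value lists."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(iter_configs(search_space))

print(len(configs))
print(configs[0])
```

Adding a hyperparameter then only needs one more dictionary entry rather than one more level of nesting.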