# ECG classification

Laurent Cetinsoy - Datadidacte


In [1]:
from IPython.display import HTML

## A first naive model by extracting simple features


Your environment contains variables arr, nsr, and chf which respectively contain 10-second recordings of ECG signals extracted from three datasets on PhysioNet: one from a person suffering from arrhythmia, one from a person with a normal heart rhythm, and another from a person with heart failure.


Using Matplotlib subplots (or any other library), display these signals on three subfigures (call subplots with the parameter nrows = 3).
Can you find any differences between them?
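A minimal sketch of such a figure, assuming the three signals are 1-D NumPy arrays. Synthetic sine waves stand in for arr, nsr and chf so that the block runs on its own; the signals are plotted against the sample index because no sampling rate is given here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder signals (NOT the PhysioNet recordings).
n = np.arange(1000)
signals = {
    "arr": np.sin(2 * np.pi * n / 90),
    "nsr": np.sin(2 * np.pi * n / 120),
    "chf": np.sin(2 * np.pi * n / 70),
}

fig, axes = plt.subplots(nrows=3, figsize=(10, 6))
for ax, (name, sig) in zip(axes, signals.items()):
    ax.plot(sig)
    ax.set_title(name)
    ax.set_ylabel("amplitude")
axes[-1].set_xlabel("sample index")
fig.tight_layout()
```

In a notebook, replacing the placeholder dictionary with `{"arr": arr, "nsr": nsr, "chf": chf}` should give the requested plot.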

In [2]:
import numpy as np

arr = np.loadtxt('arr.txt')
chf = np.loadtxt('chf.txt')
nsr = np.loadtxt('nsr.txt')



We want to extract features from the time series. For that we will use simple statistics.


Create a function named calculate_stats_features(x) that calculates some statistical features of a signal x using standard numpy functions: nanpercentile, nanmean, etc.
calculate_stats_features will return a list of features in this order:

0. Max
1. Min
2. Mean
3. Median
4. Variance

In [5]:
def calculate_stats_features(x):
    features = []
    features.append(np.max(x))
    features.append(np.min(x))
    features.append(np.mean(x))
    features.append(np.median(x))
    features.append(np.var(x))
    return features

for x in [arr, chf, nsr]:
    print(calculate_stats_features(x))

[np.float64(1.375), np.float64(-0.59), np.float64(-0.3120111111111112), np.float64(-0.335), np.float64(0.039663552654320984)]
[np.float64(1.235), np.float64(-1.79), np.float64(-0.363622), np.float64(-0.375), np.float64(0.15541165111599997)]
[np.float64(2.965), np.float64(-0.785), np.float64(-0.035453124999999995), np.float64(-0.145), np.float64(0.21755463842773434)]
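The instructions mention the NaN-aware NumPy functions, while the cell above uses the plain ones. The following is a sketch of a NaN-tolerant variant; the name `calculate_stats_features_nan` is hypothetical and the input arrays are toy data.

```python
import numpy as np

def calculate_stats_features_nan(x):
    """Return [max, min, mean, median, variance], ignoring NaN samples."""
    x = np.asarray(x, dtype=float)
    return [
        np.nanmax(x),
        np.nanmin(x),
        np.nanmean(x),
        np.nanpercentile(x, 50),  # the median is the 50th percentile
        np.nanvar(x),
    ]

clean = np.array([1.0, 2.0, 3.0, 4.0])
gappy = np.array([1.0, np.nan, 2.0, 3.0, 4.0])
print(calculate_stats_features_nan(clean))
print(calculate_stats_features_nan(gappy))  # the NaN sample is skipped
```

On signals without missing samples both versions return the same values, so the choice only matters when a recording contains gaps.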




Create a function named `calculate_zero_crossing(x)` that calculates the Zero
Crossing of a signal x.

The zero crossing is defined as the number of times the signal changes sign.
For this, you can use the signbit, diff, and nonzero functions from numpy.


In [9]:
def calculate_zero_crossing(x):
    y = x[np.nonzero(x)]
    return len(np.where(np.diff(np.sign(y)))[0])

x = np.array([1, 2, 3, -1, -2, -3, 0, 0, 0, 0, -1])
calculate_zero_crossing(arr)

22
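The hint above (signbit, diff, nonzero) leads to a slightly different counter than the cell above, because `np.signbit` treats an exact zero as positive. A sketch comparing the two conventions on the small test array; both function names are hypothetical.

```python
import numpy as np

def zero_crossing_signbit(x):
    """Count sign-bit changes; exact zeros count as positive."""
    sign_changes = np.diff(np.signbit(x))  # True where neighbours differ in sign bit
    return np.nonzero(sign_changes)[0].size

def zero_crossing_skip_zeros(x):
    """Count sign changes after dropping exact zeros, as in the cell above."""
    y = x[np.nonzero(x)]
    return np.nonzero(np.diff(np.sign(y)))[0].size

x = np.array([1, 2, 3, -1, -2, -3, 0, 0, 0, 0, -1], dtype=float)
print(zero_crossing_signbit(x))     # 3
print(zero_crossing_skip_zeros(x))  # 1
```

The run of zeros is counted as a return to the positive side by the first version and ignored by the second, so the two only agree on signals that never hit exactly zero.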

Create a function named **calculate_rms(x)** that returns the Root Mean Square (RMS) of a signal x. We will use the nanmean function instead of the mean function from numpy.

In [11]:
def calculate_rms(x):
    return np.sqrt(np.nanmean(np.square(x)))

print(calculate_rms(arr))

0.37015467862923346
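The RMS is tied to two features already computed: RMS² = variance + mean². With the values printed above for arr, 0.039664 + 0.312011² ≈ 0.137014, whose square root is ≈ 0.370155. A small sketch checking the identity on synthetic data (the random signal is a placeholder, not an ECG):

```python
import numpy as np

def rms(x):
    """Root mean square, ignoring NaN samples."""
    return np.sqrt(np.nanmean(np.square(x)))

rng = np.random.default_rng(0)
x = rng.normal(loc=-0.3, scale=0.2, size=3600)  # synthetic stand-in signal

# RMS^2 equals the (population) variance plus the squared mean.
identity_holds = np.isclose(rms(x) ** 2, np.var(x) + np.mean(x) ** 2)
print(identity_holds)
```

A tree-based model can therefore not gain much from the RMS once the mean and variance are present, although keeping it does no harm.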


Create a function named calculate_entropy(x) that calculates the Shannon entropy of a signal x using the entropy function from scipy.stats.

In [10]:
from scipy.stats import entropy

def calculate_entropy(x):
    _,counts = np.unique(x, return_counts=True)
    return entropy(counts)

print(calculate_entropy(nsr))

3.9597588364931857
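`scipy.stats.entropy` normalises the counts it receives into probabilities and uses the natural logarithm by default. The sketch below spells out that computation with NumPy only; the function name `shannon_entropy_from_counts` and the toy signal are illustrative.

```python
import numpy as np

def shannon_entropy_from_counts(counts):
    """Entropy in nats of the distribution obtained by normalising counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # by convention 0 * log(0) contributes nothing
    return -np.sum(p * np.log(p))

x = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.3])
_, counts = np.unique(x, return_counts=True)  # counts are [2, 2, 4]
h = shannon_entropy_from_counts(counts)
print(h)
```

Because the probabilities come from exact value counts, the result depends on how finely the signal is quantised.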


Create a function get_features(x) that combines the features calculated by all previous functions, including calculate_stats_features.

In [12]:
def get_features(x):
    features = []
    features.extend(calculate_stats_features(x))
    features.append(calculate_zero_crossing(x))
    features.append(calculate_rms(x))
    features.append(calculate_entropy(x))
    return features

print(get_features(arr))

[np.float64(1.375), np.float64(-0.59), np.float64(-0.3120111111111112), np.float64(-0.335), np.float64(0.039663552654320984), 22, np.float64(0.37015467862923346), np.float64(4.444550643807692)]


Load the small ECG dataset.
Using your function get_features, create a new dataframe with all the features as X and the label as y.
Train a random forest on it, after doing a train/test split if the dataset is not too small.

In [54]:
import pandas as pd

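def get_dataset(pathname):
    df = pd.read_csv(pathname)
    # Drop the first column (assumed to be a saved index).
    df = df.drop(columns=df.columns[0])
    # The next column holds the label; the remaining columns are the signal samples.
    y = df[df.columns[0]]
    signals = df.drop(columns=df.columns[0]).to_numpy()
    X = [get_features(row) for row in signals]
    return X, y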

X, y = get_dataset('ecg_small_dataset.csv')

In [55]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=10)
clf = clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

0.0
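A score from a 20% hold-out of a very small dataset is hard to interpret, since the test set may hold only a handful of samples. One common alternative is stratified k-fold cross-validation. The sketch below runs on synthetic data; `X_demo` and `y_demo` are placeholders, not the ECG features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 30 samples, 8 features, 3 classes shifted apart.
y_demo = np.repeat([0, 1, 2], 10)
X_demo = rng.normal(size=(30, 8)) + y_demo[:, None]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
forest = RandomForestClassifier(n_estimators=10, random_state=42)
scores = cross_val_score(forest, X_demo, y_demo, cv=cv)  # one accuracy per fold
print(scores.mean(), scores.std())
```

Every sample is used for testing exactly once, and the spread across folds gives a rough idea of how unstable the estimate is.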


Now that you have a first pipeline, do the same on the full dataset.
Report the train and test loss.

In [56]:
X, y = get_dataset('ECG-laurent.csv')



In [59]:
y.value_counts()

1
1    96
0    36
2    30
Name: count, dtype: int64

In [57]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier()
clf = clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

0.9090909090909091


In [58]:
from sklearn.metrics import classification_report

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00         8
           1       0.86      1.00      0.93        19
           2       1.00      0.50      0.67         6

    accuracy                           0.91        33
   macro avg       0.95      0.83      0.86        33
weighted avg       0.92      0.91      0.90        33
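The exercise asks for both train and test figures, while the cells above report the test side only. The sketch below prints both accuracies and uses a stratified split, which keeps the class proportions in a test set where class 2 has only 6 samples. The data are synthetic, with the class counts shown above; `X_demo` and `y_demo` are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in with class counts 36 / 96 / 30 for labels 0 / 1 / 2.
y_demo = np.repeat([0, 1, 2], [36, 96, 30])
X_demo = rng.normal(size=(y_demo.size, 8)) + y_demo[:, None]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
print("train accuracy:", model.score(X_tr, y_tr))
print("test accuracy: ", model.score(X_te, y_te))
```

A large gap between the two numbers would suggest overfitting, which is worth checking before tuning hyperparameters.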



Try tweaking the model's hyperparameters to see whether the results improve.

In [67]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [10, 50, 100],
    'max_features': ['sqrt', 'log2'],
    'class_weight': ['balanced', None],
}

grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)

KeyboardInterrupt: 
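The grid above has 3 × 3 × 2 × 2 = 36 settings, so 5-fold cross-validation needs 180 fits. A cheaper option is to sample a few settings with `RandomizedSearchCV`. The sketch below does so on synthetic data; `X_demo`, `y_demo` and the values of `n_iter` and `cv` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 20)
X_demo = rng.normal(size=(y_demo.size, 8)) + y_demo[:, None]

param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [10, 50, 100],
    "max_features": ["sqrt", "log2"],
    "class_weight": ["balanced", None],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5,         # 5 sampled settings x 3 folds = 15 cross-validation fits
    cv=3,
    n_jobs=1,         # -1 would use all CPU cores
    random_state=42,
)
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

The same `n_jobs` argument exists on `GridSearchCV`, so parallelising the original grid is another way to shorten the run.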

## Fourier transform features

We now want to see whether a model using only Fourier transform features could work.

Create a function get_fourier_coefficients(ecg).

In [None]:
def get_fourier_coefficients(x):
    fourier_coeff = np.fft.fft(x)
    # Keeps only the real part; the imaginary part of each coefficient is discarded.
    return np.real(fourier_coeff)

Using this function, create a dataframe df_fourier containing the Fourier transform coefficients and the label.

In [None]:
def 
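One possible way to build such a dataframe is sketched below; it is not the notebook's own solution. It uses the magnitude of the one-sided FFT (`np.fft.rfft`) instead of the real part, since the FFT of a real signal is conjugate-symmetric and the magnitude keeps the amplitude of each frequency. The helper name `fourier_magnitudes`, the column names and the random signals are assumptions, as is the premise that all recordings have the same length.

```python
import numpy as np
import pandas as pd

def fourier_magnitudes(x):
    """Magnitudes of the one-sided FFT of a real 1-D signal."""
    return np.abs(np.fft.rfft(x))

rng = np.random.default_rng(0)
signals = rng.normal(size=(6, 256))   # 6 synthetic recordings of 256 samples each
labels = np.array([0, 0, 1, 1, 2, 2])

coeffs = np.apply_along_axis(fourier_magnitudes, 1, signals)  # shape (6, 129)
columns = [f"fft_{k}" for k in range(coeffs.shape[1])]
df_fourier = pd.DataFrame(coeffs, columns=columns)
df_fourier["label"] = labels
print(df_fourier.shape)
```

A signal of N samples gives N // 2 + 1 one-sided coefficients, so the feature matrix is roughly half as wide as with the full FFT.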

Try to train a model using the Fourier coefficients.

Try to train a model using both the Fourier coefficients and the features from the previous sections. Does it work?

## Wavelets

We now want to use another signal decomposition, called wavelets. A wavelet decomposition is a multi-scale decomposition of a function on a family of functions generated from what is called a mother wavelet.
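To make the multi-scale idea concrete, here is a hand-rolled sketch of the simplest case, the Haar wavelet, in plain NumPy. It is an illustration only and not a replacement for a wavelet library; the function names are hypothetical and the signal length is assumed to be divisible by 2 at every level.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # local averages (coarser scale)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # local differences (finer scale)
    return approx, detail

def haar_decompose(x, levels):
    """Return [approx_L, detail_L, ..., detail_1], coarsest scale first."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
coeffs = haar_decompose(x, levels=2)
print([c.round(3) for c in coeffs])
```

Each level halves the resolution: the detail coefficients capture what changes at that scale, and the final approximation keeps the slow trend. The transform is orthonormal, so the sum of squared coefficients equals the sum of squared samples.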

Using PyWavelets, write a function get_wavelet_coefficients(ecg) that returns the wavelet coefficients of a given ECG.
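A minimal sketch of such a function, based on `pywt.wavedec`. The wavelet (`"db4"`) and the decomposition level (4) are assumptions picked for illustration, and the input is random noise instead of an ECG.

```python
import numpy as np
import pywt

def get_wavelet_coefficients(ecg, wavelet="db4", level=4):
    """Concatenate the multi-level DWT coefficients of a 1-D signal."""
    # wavedec returns [cA_level, cD_level, ..., cD_1], coarsest scale first.
    coeffs = pywt.wavedec(ecg, wavelet, level=level)
    return np.concatenate(coeffs)

rng = np.random.default_rng(0)
ecg_demo = rng.normal(size=1024)  # synthetic placeholder signal
features = get_wavelet_coefficients(ecg_demo)
print(features.shape)
```

The output length depends only on the signal length, the wavelet and the level, so recordings of equal length yield feature vectors of equal length that can be stacked into a dataframe.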


Using get_wavelet_coefficients, create a dataframe where the features are the coefficients, and include the label.

Train a random forest classifier with these features. Does the model work?

Add one or several of the previous feature functions and try to train another model.

Specify the methodology you used to train the model and report the results of the various attempts in a table.

## Deep learning (1D CNN)

Now we want to see if we can skip all these feature engineering techniques!
Design and train a multi-layer one-dimensional CNN using the raw ECG signal as features.


Could you reach or surpass the feature-based models?
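A possible starting point, sketched in PyTorch (the framework is an assumption; the notebook does not name one). The class name `ECGConvNet`, the layer sizes, the input length of 1280 samples and the single training step on random data are all illustrative choices; the three output classes follow the labels 0, 1 and 2 seen above.

```python
import torch
from torch import nn

class ECGConvNet(nn.Module):
    """Small 1D CNN: three conv blocks, global average pooling, linear head."""

    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # makes the head independent of signal length
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        # x has shape (batch, 1, n_samples)
        h = self.features(x).squeeze(-1)  # (batch, 64)
        return self.classifier(h)         # (batch, n_classes) logits

torch.manual_seed(0)
model = ECGConvNet()
x = torch.randn(8, 1, 1280)        # batch of 8 synthetic signals
y = torch.randint(0, 3, (8,))      # random labels in {0, 1, 2}
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step; a real run would loop over epochs and mini-batches.
logits = model(x)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

With only 162 recordings in the full dataset, a network like this can overfit quickly, so comparing it with the feature-based models on the same held-out split seems advisable.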