# Clustering segments of ECG signals using Wavelet Transform

This notebook clusters segments of ECG signals into normal and anomalous groups, based on features extracted after applying a wavelet transform to the signals.

#### Outline of the approach
- The signals are first preprocessed: a low-frequency bandpass filter is applied (in a separate preprocessing step) and the beginning and end of each recording are removed.
- A wavelet decomposition of each signal is performed.
- The signal is then split into fixed-length segments, and time-domain features are extracted from each segment.
- DBSCAN is used to cluster the segments, and the anomalous segments are labelled at their positions in the original signal.
- The results are computed per patient and stored in anomalies_clustering/Rhythm

```python
import json
import os

import pywt

# Feature extraction, clustering and plotting helpers used below
from Modules.Rhythm_classification import *
```

```python
import warnings

warnings.filterwarnings("ignore")
```

#### 1. Read Preprocessed Data
The preprocessed data is saved in Preprocessed_data/ecg. We remove the beginning and end of each patient's signal, because at the very start and end of a recording the device is being attached to or removed from the patient, which leads to high variance in the signal.

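```python
def read_csv(filepath):
    # Despite its name, this reads a JSON file mapping patient IDs to filtered ECG samples
    with open(filepath) as f:
        return json.load(f)

ecg_filt = read_csv('Preprocessed_data/ecg/ecg_filtered.txt')

# Keep samples 10000 to 599999: drop the start of each recording and truncate the end
for patient in ecg_filt:
    ecg_filt[patient] = ecg_filt[patient][10000:600000]
```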
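The cell above slices by sample index, so how many seconds are removed depends on the sampling rate, which the notebook does not state. Below is a minimal sketch of trimming by duration instead, assuming a known sampling rate `fs`. `trim_signal` is a hypothetical helper, not part of `Modules.Rhythm_classification`, and the 250 Hz rate is only an example value.

```python
import numpy as np

def trim_signal(signal, fs, head_s, tail_s):
    """Drop the first `head_s` and last `tail_s` seconds of a 1-D signal.

    `fs` is the sampling rate in Hz (an assumption: the notebook slices by
    sample index and does not state the rate).
    """
    signal = np.asarray(signal)
    start = int(round(head_s * fs))
    stop = len(signal) - int(round(tail_s * fs))
    if stop <= start:
        raise ValueError("signal is too short to trim by the requested amounts")
    return signal[start:stop]

# Example: 60 s of synthetic data at a hypothetical 250 Hz, trimming 5 s from each end
fs = 250
x = np.arange(60 * fs)
trimmed = trim_signal(x, fs, head_s=5, tail_s=5)
print(len(trimmed) / fs)  # 50.0
```

Expressing the cut in seconds keeps the intent ("skip device attachment and removal") independent of the recording's sampling rate.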

#### 2. Wavelet Transform the signals
The wavedec function from the pywt library returns the approximation coefficients (cA, the low-frequency content) and the detail coefficients (cD, the high-frequency content) of the signal. Only the approximation coefficients are kept: they are a smoothed, roughly half-length version of the signal, which makes features easier to extract.

```python
compressed_ecg = {}

for patient in ecg_filt:
    # Single-level decomposition: keep the approximation (cA), discard the detail (cD1)
    cA, cD1 = pywt.wavedec(ecg_filt[patient], 'sym2', level=1)
    compressed_ecg[patient] = cA
```
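To see what keeping only `cA` does, the sketch below decomposes a synthetic signal (a sine wave plus noise, not real ECG data) with the same `sym2` wavelet at level 1. The approximation coefficients come out at roughly half the input length and carry the slow component, while the detail coefficients mostly hold the high-frequency noise. Because of this halving, `cA` is effectively at half the original sampling rate, which matters when segment lengths are expressed in seconds.

```python
import numpy as np
import pywt

# Synthetic stand-in for an ECG trace: a slow oscillation plus high-frequency noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 1.2 * t) + 0.2 * rng.standard_normal(t.size)

# Level-1 decomposition with the same wavelet as the notebook
cA, cD1 = pywt.wavedec(signal, 'sym2', level=1)

# cA is roughly half the input length; for the default 'symmetric' mode the
# exact length is given by pywt.dwt_coeff_len
expected_len = pywt.dwt_coeff_len(len(signal), pywt.Wavelet('sym2').dec_len, 'symmetric')
print(len(signal), len(cA), len(cD1), expected_len)

# The detail band carries far less energy than the approximation for this signal
print(np.std(cA), np.std(cD1))
```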

#### 3. Split signal into segments

The signal is split into non-overlapping 1-second segments. Features are computed on each segment, and these features are then clustered.
For plotting, 15 consecutive segments are shown together.

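```python
# Segment length in approximation coefficients (62 * 2 = 124), treated as 1 second
duration = 62 * 2
segments_ecg = {}

for patient in compressed_ecg:
    values = compressed_ecg[patient]
    segments_ecg[patient] = []
    # Non-overlapping windows; a trailing partial segment is dropped
    for i in range(0, len(values) - duration + 1, duration):
        segments_ecg[patient].append(values[i : (i + duration)])
```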
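An equivalent, vectorized way to build the segments is a single NumPy reshape. The sketch also joins consecutive segments in blocks of 15, mirroring the plotting described above. `split_into_segments` and `group_for_plotting` are hypothetical helpers for illustration, not the module's implementation.

```python
import numpy as np

def split_into_segments(values, duration):
    """Split a 1-D array into non-overlapping segments of `duration` samples.

    Like the loop above, any trailing samples that do not fill a whole
    segment are dropped.
    """
    values = np.asarray(values)
    n_segments = len(values) // duration
    return values[: n_segments * duration].reshape(n_segments, duration)

def group_for_plotting(segments, per_plot=15):
    """Yield consecutive blocks of `per_plot` segments joined end to end."""
    for start in range(0, len(segments), per_plot):
        yield np.concatenate(segments[start : start + per_plot])

values = np.arange(1000)
segments = split_into_segments(values, duration=124)
print(segments.shape)  # (8, 124)
```

The reshape returns a 2-D array (one row per segment), which can be fed directly to a feature function applied row by row.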

#### 4. Clustering Features using DBSCAN

##### Find Features
First, we manually extract a set of time-domain features. The implementation can be found in Modules.Rhythm_classification as get_patient_features.
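The notebook does not list which features get_patient_features computes. As an illustration only, the sketch below computes a few generic time-domain features per segment; `time_domain_features` and its feature set are hypothetical and may differ from the module's.

```python
import numpy as np

def time_domain_features(segment):
    """Return a few generic time-domain features for one segment.

    Hypothetical feature set for illustration; the notebook's actual
    features live in get_patient_features and may differ.
    """
    x = np.asarray(segment, dtype=float)
    diffs = np.diff(x)
    return np.array([
        x.mean(),                  # mean amplitude
        x.std(),                   # standard deviation
        x.max() - x.min(),         # peak-to-peak amplitude
        np.sqrt(np.mean(x ** 2)),  # root mean square
        np.abs(diffs).mean(),      # mean absolute first difference (roughness)
        int(np.argmax(x)),         # position of the maximum within the segment
    ])

segments = np.array([[0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0]])
features = np.vstack([time_domain_features(s) for s in segments])
print(features.shape)  # (2, 6)
```

Stacking one feature vector per segment gives the (segments × features) matrix that a clustering step expects.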

Next, the features are clustered using the DBSCAN algorithm to find anomalous heartbeats.
The DBSCAN algorithm has two parameters:
- minPts: the minimum number of points (a threshold) that must lie within a point's neighborhood for that region to be considered dense.
- eps (ε): the radius of the neighborhood searched around each point.

We set minPts = 1 and eps = 30.
The implementation can be found in Modules.Rhythm_classification as cluster.
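In scikit-learn's `DBSCAN`, `min_samples` counts the point itself, so `min_samples=1` makes every point a core point and no segment ever receives the noise label `-1`. With minPts = 1, anomalies therefore have to be read off as members of small clusters. The sketch below uses that idea on synthetic features; the `max_fraction` rule and `find_anomalous_segments` are hypothetical and not necessarily what `cluster` does.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def find_anomalous_segments(features, eps=30, min_samples=1, max_fraction=0.05):
    """Cluster segment features with DBSCAN and flag small clusters as anomalous.

    `max_fraction` is a hypothetical threshold: any cluster containing fewer than
    this fraction of all segments is treated as anomalous. Noise points (label -1),
    which only occur when min_samples > 1, are flagged as well.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    ids, counts = np.unique(labels, return_counts=True)
    small = ids[counts < max_fraction * len(labels)]
    anomalous = np.isin(labels, small) | (labels == -1)
    return labels, np.flatnonzero(anomalous)

# Synthetic features: 100 "normal" segments near the origin plus 2 far-away outliers
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 5.0, size=(100, 3))
outliers = np.array([[200.0, 200.0, 200.0], [-200.0, 150.0, 0.0]])
features = np.vstack([normal, outliers])

labels, anomaly_indices = find_anomalous_segments(features)
print(anomaly_indices)  # [100 101]
```

Because eps is an absolute distance in feature space, its value of 30 only makes sense relative to the scale of the features it is applied to.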

We extract the first two principal components of the features and plot them to visualize the clustering. The implementation can be found in Modules.Rhythm_classification as pca_plots.
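A minimal sketch of the projection step using scikit-learn's `PCA`, on random stand-in features rather than real ones. Whether pca_plots scales the features first is not stated; this sketch applies PCA to the raw features, which keeps the plot in the same space that DBSCAN's eps = 30 operates in. `first_two_components` is a hypothetical helper.

```python
import numpy as np
from sklearn.decomposition import PCA

def first_two_components(features):
    """Project features onto their first two principal components."""
    return PCA(n_components=2).fit_transform(features)

rng = np.random.default_rng(1)
features = rng.normal(size=(50, 6))
coords = first_two_components(features)
print(coords.shape)  # (50, 2)
```

The two columns, `coords[:, 0]` and `coords[:, 1]`, can then be passed to `matplotlib.pyplot.scatter` with `c=labels` to color each segment by its DBSCAN cluster.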

Finally, we mark the anomalous segments at their original location on the signals. The implementation of this can be found in Modules.Rhythm_classification as plot_anomalous_segments
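Because the segments are cut from the wavelet approximation coefficients, a segment index has to be scaled back to locate it in the preprocessed signal. A sketch under stated assumptions: `segment_to_sample_range` is hypothetical, and plot_anomalous_segments may instead plot directly on the compressed signal it receives.

```python
def segment_to_sample_range(segment_index, duration=124, decimation=2, offset=10000):
    """Approximate [start, stop) sample range of a segment in the preprocessed signal.

    Assumptions (hypothetical helper): segments index into the level-1 approximation
    coefficients, which are downsampled by `decimation` = 2 relative to the input,
    and `offset` is the number of samples trimmed from the start of the recording.
    The mapping is approximate to within a few samples because the DWT pads the
    signal at its edges.
    """
    start = offset + segment_index * duration * decimation
    stop = start + duration * decimation
    return start, stop

print(segment_to_sample_range(0))  # (10000, 10248)
print(segment_to_sample_range(3))  # (10744, 10992)
```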


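```python
for patient in segments_ecg:
    # One output directory per patient
    os.makedirs(os.path.join('anomalies_clustering/Rhythm', patient), exist_ok=True)
    # Time-domain features per segment, and the processed signal passed to the plotting step
    features_only, ecg_signal_processed = get_patient_features(segments_ecg[patient], patient)
    # DBSCAN (minPts = 1, eps = 30) cluster labels and indices of anomalous segments
    clusters, y_pred, anomaly_indices = cluster(features_only)
    pca_plots(patient, features_only, y_pred)
    plot_anomalous_segments(ecg_signal_processed, anomaly_indices, patient)
```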