# Clustering segments of ECG signals

This notebook clusters segments of ECG signals into normal and anomalous groups, based on manual features extracted from those segments.

#### Outline of the approach
- The signals are first preprocessed by applying a low band-pass filter and removing the first and last few segments.
- Manual features are calculated on the signals (the P, Q, R, S and T points on the wave).
- These features are then used to separate heartbeats from the complete signal.
- DBSCAN is used to cluster the heartbeats, and the anomalous heartbeats are labelled at their position in the original signal.
- The results are computed per patient and stored in anomalies_clustering/HeartBeats

In [2]:
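import json
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Project modules providing extract_heartbeats and HeartBeat_to_df
from Modules.HeartBeat_Extraction import *
from Modules.WVT_features import *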

In [3]:
import warnings

warnings.filterwarnings("ignore")

#### 1. Read Preprocessed Data
The preprocessed data is saved in Preprocessed_data/ecg. We remove the start and end of each patient's signal, because the device is being attached to or removed from the patient at the very beginning and end of a recording, which leads to high variance in the signal.

In [4]:
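def read_csv(filepath):
    # Despite the function name and the .txt extension, the file holds JSON
    # (a dict mapping each patient to a signal)
    with open(filepath) as f:
        data = json.load(f)
    return data

ecg_filt = read_csv('Preprocessed_data/ecg/ecg_filtered.txt')

# Keep samples 100000 to 400000, dropping the start and end of each recording
for patient in ecg_filt:
    ecg_filt[patient] = ecg_filt[patient][100000:400000]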
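
The slice above uses fixed sample indices. If the sampling rate of the recordings is known, the same trimming could be expressed in seconds instead. The following is only a minimal sketch: `trim_edges`, `fs` and `trim_seconds` are hypothetical names, and the sampling rate is an assumption, since it is not stated in this notebook.

```python
def trim_edges(signal, fs, trim_seconds):
    """Drop `trim_seconds` worth of samples from both ends of `signal`.

    `fs` is the sampling rate in Hz (an assumed value, not given in this notebook).
    """
    n = int(round(fs * trim_seconds))
    if n == 0:
        return list(signal)
    if 2 * n >= len(signal):
        # Nothing would be left after trimming both ends
        return []
    return list(signal[n:-n])


# Toy example: 10 samples at 2 Hz, trimming 1 second (2 samples) from each end
print(trim_edges(list(range(10)), fs=2, trim_seconds=1))  # [2, 3, 4, 5, 6, 7]
```

Expressing the trim in seconds keeps the intent readable if recordings with a different sampling rate are added later.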

#### 2. Extract Heartbeats
Features are computed on the signals and then used to isolate individual heartbeats from the signals. The features include the location and amplitude of the P, Q, R, S and T points.

In [None]:
# Extract heartbeats for each patient
epochs = {}
for patient in ecg_filt:
    epochs[patient] = extract_heartbeats(ecg_filt[patient])

#Merge heartbeats from all patients into a single dataframe
df = HeartBeat_to_df(epochs, 10000)
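
To illustrate the general idea of isolating heartbeats from a continuous signal, here is a rough sketch that cuts fixed-length windows around detected R peaks. It is not the implementation of `extract_heartbeats` from `Modules.HeartBeat_Extraction`, which is not shown in this notebook. The function `split_heartbeats`, its window sizes, the peak threshold and the synthetic signal are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import find_peaks


def split_heartbeats(signal, fs, before=0.25, after=0.4, min_rr=0.4):
    """Cut a window around each detected R peak (hypothetical helper).

    before/after: window extent around the peak in seconds (assumed values).
    min_rr: minimum spacing between two R peaks in seconds (assumed value).
    """
    signal = np.asarray(signal, dtype=float)
    # Treat tall maxima that are at least `min_rr` seconds apart as R peaks
    peaks, _ = find_peaks(
        signal,
        distance=int(round(min_rr * fs)),
        height=0.5 * signal.max(),
    )
    pre, post = int(round(before * fs)), int(round(after * fs))
    # Keep only the beats whose window lies fully inside the signal
    beats = [
        signal[p - pre:p + post]
        for p in peaks
        if p - pre >= 0 and p + post <= len(signal)
    ]
    return peaks, np.array(beats)


# Synthetic signal: one unit spike per second, sampled at 100 Hz for 5 seconds
fs = 100
sig = np.zeros(5 * fs)
sig[50::fs] = 1.0

peaks, beats = split_heartbeats(sig, fs)
print(peaks)        # positions of the detected peaks
print(beats.shape)  # (number of heartbeats, samples per heartbeat)
```

Because every window has the same length, the heartbeats stack into a 2D array with one row per heartbeat, which is the shape a clustering algorithm expects.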

#### 3. Clustering using DBSCAN
##### Here we define a function to cluster heartbeats

The heartbeats dataframe encodes all the temporal information in each heartbeat of a segment. The heartbeats are then clustered using the DBSCAN algorithm to find anomalous heartbeats.
The DBSCAN algorithm has two parameters:
- minPts (`min_samples` in scikit-learn): The minimum number of points (a threshold) that must lie within a neighborhood for a region to be considered dense.
- eps (ε): The radius that defines the neighborhood of a point; two points are neighbors if the distance between them is at most eps.

We have set minPts = 100 and eps = 0.3. Points that DBSCAN cannot assign to any dense region receive the label -1 (noise) and are treated here as anomalous heartbeats.
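
To make the role of the two parameters concrete, the following toy sketch runs DBSCAN with the same settings on synthetic 2D points rather than on heartbeats. The data (one dense blob plus five isolated points) is invented for illustration only.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic data: one dense blob of 200 points and 5 isolated points far away
dense = rng.normal(loc=0.0, scale=0.05, size=(200, 2))
outliers = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0], [-5.0, -5.0], [0.0, 8.0]])
X = np.vstack([dense, outliers])

labels = DBSCAN(eps=0.3, min_samples=100).fit_predict(X)

# Label -1 marks noise; every other label is a cluster id
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters, "noise points:", n_noise)
```

The isolated points have far fewer than 100 neighbors within a radius of 0.3, so they end up labelled -1, while the blob forms a single cluster.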

In [15]:
eps = 0.3
min_samples = 100

clustering = DBSCAN(eps=eps, min_samples=min_samples).fit(df)
DBSCAN_dataset = df.copy()
DBSCAN_dataset.loc[:,'Cluster'] = clustering.labels_

clusters = DBSCAN_dataset.Cluster.value_counts().to_frame()
clusters

Cluster label,Count
0,9061
-1,399
1,297
2,243
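
As a quick sanity check on the counts above, the share of heartbeats labelled as noise (-1) can be computed directly. The dictionary below simply copies the numbers from the output table.

```python
# Cluster label -> number of heartbeats, copied from the output above
counts = {0: 9061, -1: 399, 1: 297, 2: 243}

total = sum(counts.values())
noise_share = counts[-1] / total

print(total)                 # 10000
print(f"{noise_share:.2%}")  # 3.99%
```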


In [16]:
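def make_clusters(df):
    # Fit DBSCAN on this dataframe, using the eps and min_samples set above
    y_pred = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(df)
    # DBSCAN labels noise points -1; these are treated as anomalous
    anomoly_indices = np.where(y_pred == -1)[0]
    # Cluster 0 is treated as the normal cluster
    normal_indices = np.where(y_pred == 0)[0]

    normal_df = df.iloc[normal_indices]
    anomaly_df = df.iloc[anomoly_indices]

    # Heartbeat counts per cluster for this dataframe
    # (previously the stale global `clusters` was returned)
    clusters = pd.Series(y_pred, name='Cluster').value_counts().to_frame()

    return clusters, normal_df, anomaly_df, anomoly_indices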

#### 4. Plot Clusters
##### Here we define a function to plot all heartbeats categorized as normal and all heartbeats categorized as anomalous for a single patient

Results are saved in anomalies_clustering/HeartBeats/Normal/ and anomalies_clustering/HeartBeats/Abnormal/

In [53]:
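def plot_clusters(normal_df, anomaly_df, patient):
    # Make sure the output folders exist before saving
    os.makedirs('anomalies_clustering/HeartBeats/Normal', exist_ok=True)
    os.makedirs('anomalies_clustering/HeartBeats/Abnormal', exist_ok=True)

    # Each heartbeat is a row, so transpose to draw one line per heartbeat
    fig = plt.figure()
    plt.title("Normal Clusters for patient " + str(patient))
    plt.plot(normal_df.T)
    plt.savefig('anomalies_clustering/HeartBeats/Normal/' + str(patient) + '.png')
    plt.close(fig)

    fig = plt.figure()
    plt.title("Abnormal Clusters for patient " + str(patient))
    plt.plot(anomaly_df.T)
    plt.savefig('anomalies_clustering/HeartBeats/Abnormal/' + str(patient) + '.png')
    plt.close(fig)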

#### 5. Plot Anomalous Segments
##### Here we define a function to plot the anomalous segments at their original positions in the signals

The individual heartbeats are then stitched back together into segments of 20 consecutive heartbeats (roughly 20 seconds), with the anomalous parts of the signal highlighted in red. Results are saved in anomalies_clustering/HeartBeats/

In [17]:
def plot_anomolous_segments(df, patient, anomoly_indices):
    out_dir = 'anomalies_clustering/HeartBeats/' + str(patient)
    os.makedirs(out_dir, exist_ok=True)
    print(patient)

    anomalous = set(anomoly_indices)  # set gives fast membership tests
    length = df.shape[1]              # number of samples per heartbeat

    # Step through all heartbeats in windows of 20; an incomplete final window is skipped.
    # (The original range stopped at len(df)/20 and so covered only the first few windows.)
    for patient_sample in range(0, len(df) - 19, 20):
        fig = plt.figure(figsize=(18,8))
        window = range(patient_sample, patient_sample + 20)

        # Concatenate the 20 heartbeats into one continuous segment
        arr = np.concatenate([df.iloc[i].values for i in window])
        plt.plot(np.arange(len(arr)), arr)

        # Overlay the anomalous heartbeats in red at their position in the segment
        for k, i in enumerate(window):
            if i in anomalous:
                start = k * length
                end = start + length
                plt.plot(np.arange(start, end), arr[start:end], c='red')

        plt.savefig(out_dir + '/' + str(patient_sample) + '.png')
        plt.close(fig)
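
The index arithmetic used for highlighting can be checked separately from the plotting. The sketch below maps anomalous heartbeat indices to sample ranges inside one concatenated window; `anomalous_spans` and its parameter names are hypothetical and only mirror the logic of the plotting function.

```python
def anomalous_spans(anomaly_indices, window_start, window_size, beat_length):
    """Return (start, end) sample ranges of anomalous beats within one window.

    Ranges are relative to the start of the concatenated window, and `end` is exclusive.
    """
    flagged = set(int(i) for i in anomaly_indices)
    spans = []
    for offset in range(window_size):
        if window_start + offset in flagged:
            start = offset * beat_length
            spans.append((start, start + beat_length))
    return spans


# Heartbeats 21 and 39 fall inside the window covering heartbeats 20..39;
# heartbeats 3 and 40 lie outside it and are ignored.
print(anomalous_spans([3, 21, 39, 40], window_start=20, window_size=20, beat_length=100))
# [(100, 200), (1900, 2000)]
```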


##### Now we will call all the defined functions for each patient

In [None]:
for patient in ecg_filt:

    df = pd.DataFrame()
    for key in epochs[patient].keys():
        df[key] = epochs[patient][key]["Signal"]
    df = df.T

    clusters, normal_df, anomaly_df, anomoly_indices = make_clusters(df)
    plot_clusters(normal_df, anomaly_df, patient)
    plot_anomolous_segments(df, patient, anomoly_indices)