# Loading the data and exploring its shape and values

This notebook is Part 2 of my analysis of the ECG Heartbeat dataset. In this version I focus on building deep learning models. The original version used "standard" machine learning models to establish a baseline for deciding whether deep learning is worth using at all.

The baseline version can be found [here](https://www.kaggle.com/basharalkuwaiti/ecg-heartbeat-categorization-baseline)

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
from sklearn.utils import resample

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/heartbeat/ptbdb_abnormal.csv
/kaggle/input/heartbeat/ptbdb_normal.csv
/kaggle/input/heartbeat/mitbih_test.csv
/kaggle/input/heartbeat/mitbih_train.csv


In [None]:
mit_test = pd.read_csv('/kaggle/input/heartbeat/mitbih_test.csv',header=None)
mit_train = pd.read_csv('/kaggle/input/heartbeat/mitbih_train.csv', header=None)
ptb_abnormal = pd.read_csv('/kaggle/input/heartbeat/ptbdb_abnormal.csv', header=None)
ptb_normal = pd.read_csv('/kaggle/input/heartbeat/ptbdb_normal.csv', header=None)

In [None]:
mit_test.head()

In [None]:
mit_train.head()

In [None]:
ptb_abnormal.head()

In [None]:
ptb_normal.head()

In [None]:
mit_test.rename(columns={187:"Class"}, inplace=True)
mit_train.rename(columns={187:"Class"}, inplace=True)
ptb_abnormal.rename(columns={187:"Class"}, inplace=True)
ptb_normal.rename(columns={187:"Class"}, inplace=True)

Looking at how many classes there are in each dataset.

The MIT dataset has 5 classes:
* 0 = N  (Normal Beat)
* 1 = S  (Supraventricular premature beat)
* 2 = V  (Premature ventricular contraction)
* 3 = F  (Fusion of ventricular and normal beat)
* 4 = Q  (Unclassifiable beat)

The PTB dataset, by comparison, has 2 classes: 1 for abnormal and 0 for normal.


In [None]:
print ("MIT Train classes: \n", mit_train["Class"].value_counts())
print ("\nMIT Test classes: \n", mit_test["Class"].value_counts())
print ("\nPTB Abnormal classes: \n", ptb_abnormal["Class"].value_counts())
print ("\nPTB Normal classes: \n", ptb_normal["Class"].value_counts())

In [None]:
# Setting Dictionary to define the type of Heartbeat for both datasets
MIT_Outcome = {0. : 'Normal Beat',
               1. : 'Supraventricular premature beat',
               2. : 'Premature ventricular contraction',
               3. : 'Fusion of ventricular and normal beat',
               4. : 'Unclassifiable beat'}
PTB_Outcome = {0. : 'Normal',
               1. : 'Abnormal'}

# Generating Plots of some of the samples in the dataset

In [None]:
#Plotting 10 random samples from the MIT training dataset with their classification
plt.figure(figsize=(25,10))
np_count = np.arange(187)  # integer positions of the 187 time steps (safe to use with .iloc)
np_time = np.tile(np_count,(10,1))
rnd = np.random.randint(0,mit_train.shape[0],size=(10,))


for i in range(np_time.shape[0]):
    ax = plt.subplot(2,5,i+1)
    ax.plot(mit_train.iloc[rnd[i],np_time[i,:]])
    ax.set_title(MIT_Outcome[mit_train.loc[rnd[i],'Class']])

plt.show()


In [None]:
#Plotting 5 random normal and 5 random abnormal samples from the PTB dataset with their classification
plt.figure(figsize=(25,10))
rnd = np.random.randint(0,ptb_normal.shape[0],size=(5,))
rnd1 = np.random.randint(0,ptb_abnormal.shape[0], size=(5,))

for i in range(np_time.shape[0]):
    ax = plt.subplot(2,5,i+1)
    if (i < 5):
        ax.plot(ptb_normal.iloc[rnd[i],np_time[i,:]])
        ax.set_title(PTB_Outcome[ptb_normal.loc[rnd[i],'Class']])
    else:
        ax.plot(ptb_abnormal.iloc[rnd1[i-5],np_time[i,:]])
        ax.set_title(PTB_Outcome[ptb_abnormal.loc[rnd1[i-5],'Class']])

plt.show()

# Deep Learning Analysis

This is where the two notebooks diverge. The analysis above is similar to the one in the [Baseline](https://www.kaggle.com/basharalkuwaiti/ecg-heartbeat-categorization-baseline) version.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize
from sklearn.metrics import classification_report

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv1D, MaxPool1D, Flatten, Dropout, InputLayer, LSTM, GRU, BatchNormalization, Bidirectional
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.optimizers import SGD

In [None]:
#Preparing the training, validation and test sets for the PTB Data set
#ptb_abnormal = resample(ptb_abnormal,replace=True,n_samples=ptb_normal.shape[0],random_state=42)
ptb_full = pd.concat([ptb_normal, ptb_abnormal], axis=0).reset_index(drop=True)
ptb_full = ptb_full.sample(ptb_full.shape[0], random_state=42)
train_ptb, test_ptb, out_train_ptb, out_test_ptb = train_test_split(ptb_full.iloc[:,:187], ptb_full.iloc[:,-1], test_size=0.15, random_state=42)
train_ptb, valid_ptb, out_train_ptb, out_valid_ptb = train_test_split(train_ptb, out_train_ptb, test_size=0.2, random_state=42 )
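
The two `train_test_split` calls above hold out 15% of the PTB rows for testing and then 20% of the remaining 85% for validation, so the effective split is roughly 68% / 17% / 15%. A small worked check of that arithmetic, assuming scikit-learn's documented rounding (the test share is rounded up with `ceil`, training gets the rest); the row count `n` is a made-up value, not the PTB size:

```python
import math

test_size, valid_size = 0.15, 0.2

# Fractions of the full data that end up in each split
train_frac = (1 - test_size) * (1 - valid_size)
valid_frac = (1 - test_size) * valid_size
print(round(train_frac, 2), round(valid_frac, 2), test_size)  # 0.68 0.17 0.15

# Row counts for a hypothetical dataset of n rows
n = 1000
n_test = math.ceil(test_size * n)          # first split: test rows rounded up
n_rest = n - n_test
n_valid = math.ceil(valid_size * n_rest)   # second split: validation rows rounded up
n_train = n_rest - n_valid
print(n_train, n_valid, n_test)            # 680 170 150
```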

In [None]:
plt.figure(figsize=(25,10))
rnd = np.random.randint(0,train_ptb.shape[0],size=(10,))

for i in range(np_time.shape[0]):
    ax = plt.subplot(2,5,i+1)
    ax.plot(train_ptb.iloc[rnd[i],np_time[i,:]])
    ax.set_title(PTB_Outcome[out_train_ptb.iloc[rnd[i]]])

plt.show()

In [None]:
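# np.bincount needs non-negative integers; the 'Class' column is read from CSV as float
normal, abnormal = np.bincount(ptb_full.loc[:,'Class'].astype(int))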
norm_weight = (1/normal) * ((normal+abnormal)/2)
abnorm_weight = (1/abnormal) * ((normal+abnormal)/2)
class_weight = {0: norm_weight, 1: abnorm_weight}
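
Each weight above is (1 / n_c) · (N / 2), so both classes contribute the same total weight, N / 2, to the loss. As a sanity check, here is a minimal sketch on a made-up label vector comparing this rule with scikit-learn's `compute_class_weight(class_weight="balanced", ...)`, which uses n_samples / (n_classes · n_c):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical toy labels stored as floats, like the PTB 'Class' column: 3 normal, 9 abnormal
y = np.array([0.0] * 3 + [1.0] * 9)

# Same rule as the cell above: w_c = (1 / n_c) * (N / 2)
normal, abnormal = np.bincount(y.astype(int))
total = normal + abnormal
manual = {0: (1 / normal) * (total / 2), 1: (1 / abnormal) * (total / 2)}

# scikit-learn's "balanced" heuristic: n_samples / (n_classes * n_c)
balanced = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y.astype(int))

print(manual)
print(balanced)

# Each class's summed weight equals N / 2
print(normal * manual[0], abnormal * manual[1])
```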

In [None]:
print("Training dataset size: ", train_ptb.shape)
print("Validation dataset size: ", valid_ptb.shape)
print("Test dataset size: ", test_ptb.shape)

In [None]:
#Normalizing the training, validation & test data (each split is scaled by its own column-wise max)
train_ptb = normalize(train_ptb, axis=0, norm='max')
valid_ptb = normalize(valid_ptb, axis=0, norm='max')
test_ptb = normalize(test_ptb, axis=0, norm='max')
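
`normalize(..., axis=0, norm='max')` divides every column (time step) by the largest absolute value in the array it is given, so the three splits above are each scaled by their own column maxima. A common alternative is to learn the per-column maxima from the training split only and reuse them for validation and test. The sketch below shows that variant with scikit-learn's `MaxAbsScaler` on random stand-in arrays (not the PTB data); it is a suggested option, not what this notebook runs:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, normalize

rng = np.random.default_rng(0)
train = rng.random((100, 187))        # stand-in for the training split
test = rng.random((20, 187)) * 2.0    # stand-in for a split with a different value range

# Per-column max |x| learned from the training split only
scaler = MaxAbsScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)  # values may exceed 1.0, which is expected

# On the training split this matches normalize(..., axis=0, norm="max")
print(np.allclose(train_scaled, normalize(train, axis=0, norm="max")))  # True
print(test_scaled.max() > 1.0)                                          # True
```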

In [None]:
max_length = 15
x_train_ptb = train_ptb.reshape(len(train_ptb),train_ptb.shape[1],1)
x_valid_ptb = valid_ptb.reshape(len(valid_ptb),valid_ptb.shape[1],1)
x_test_ptb = test_ptb.reshape(len(test_ptb),test_ptb.shape[1],1)
out_train_ptb = out_train_ptb.values.reshape(len(out_train_ptb), 1)
out_valid_ptb = out_valid_ptb.values.reshape(len(out_valid_ptb), 1)
out_test_ptb = out_test_ptb.values.reshape(len(out_test_ptb), 1)
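
Keras `Conv1D` and `LSTM` layers take 3-D input shaped `(samples, time steps, channels)`. Each heartbeat here is 187 time steps with a single channel, which is the trailing axis the reshape adds. A small check on dummy arrays that the `reshape` call and NumPy's `np.newaxis` shorthand give the same layout:

```python
import numpy as np

X = np.arange(2 * 187, dtype=float).reshape(2, 187)  # two dummy beats of 187 samples
y = np.array([0.0, 1.0])                             # dummy labels

X_reshaped = X.reshape(len(X), X.shape[1], 1)        # as in the cell above
X_newaxis = X[..., np.newaxis]                       # equivalent shorthand

print(X_reshaped.shape)                              # (2, 187, 1)
print(np.array_equal(X_reshaped, X_newaxis))         # True
print(y.reshape(len(y), 1).shape)                    # (2, 1)
```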

In [None]:
x_train_ptb.shape

In [None]:
plt.figure(figsize=(25,10))
rnd = np.random.randint(0,x_train_ptb.shape[0],size=(10,))

for i in range(np_time.shape[0]):
    ax = plt.subplot(2,5,i+1)
    ax.plot(np_time[i,:], x_train_ptb[rnd[i],:,0])
    ax.set_title(PTB_Outcome[out_train_ptb[rnd[i],0]])

plt.show()

In [None]:
print("Training dataset size: ", x_train_ptb.shape , " -- Y size: ", out_train_ptb.shape)
print("Validation dataset size: ", x_valid_ptb.shape , " -- Y size: ", out_valid_ptb.shape)
print("Test dataset size: ", x_test_ptb.shape , " -- Y size: ", out_test_ptb.shape)

In [None]:
tf.keras.backend.clear_session()

#Function to build Convolutional 1D Networks
def build_conv1d_model (input_shape=(x_train_ptb.shape[1],1)):
    model = keras.models.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    
    model.add(Conv1D(256,7, padding='same'))
    model.add(BatchNormalization())
    model.add(tf.keras.layers.ReLU())
    model.add(MaxPool1D(5,padding='same'))

    model.add(Conv1D(128,7, padding='same'))
    model.add(BatchNormalization())
    model.add(tf.keras.layers.ReLU())
    model.add(MaxPool1D(5,padding='same'))

 #   model.add(Conv1D(64,7, padding='same'))
 #   model.add(BatchNormalization())
 #   model.add(tf.keras.layers.ReLU())
 #   model.add(MaxPool1D(5,padding='same'))

    model.add(Flatten())
 #   model.add(Dense(512, activation='relu'))
 #   model.add(Dropout(0.5))
 #   model.add(Dense(256, activation='relu'))
 #   model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(1, activation="sigmoid"))
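    # 'learning_rate' replaces the deprecated 'lr' keyword (value unchanged: 0.1e-6 = 1e-7).
    # threshold=0.5 binarizes the single sigmoid output; with the default threshold=None,
    # tfa's F1Score marks every prediction as positive.
    model.compile(optimizer=SGD(learning_rate=0.1e-6), loss="binary_crossentropy",
                  metrics=[tfa.metrics.F1Score(num_classes=1, average="micro", threshold=0.5)])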
    return model
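
`tfa.metrics.F1Score` is designed for one-hot, multi-column outputs. As we understand its implementation, with `threshold=None` it binarizes each row by comparing every score with that row's maximum; with the single sigmoid column used here, every score equals its own row maximum, so every prediction counts as 1 (Abnormal). The NumPy sketch below approximates that rule for illustration (it is not the library's code) and compares it with a fixed 0.5 cut-off on dummy values:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[0], [0], [1], [1]])
y_prob = np.array([[0.1], [0.3], [0.8], [0.6]])  # one sigmoid column, dummy values

# Approximation of the threshold=None rule: positive if the score equals its
# row maximum (and is non-zero). With a single column that is every score.
row_max = y_prob.max(axis=-1, keepdims=True)
pred_rowmax = ((y_prob >= row_max) & (np.abs(y_prob) > 1e-12)).astype(int)

# Fixed 0.5 threshold, the same cut-off used later with classification_report
pred_half = (y_prob > 0.5).astype(int)

f1_rowmax = f1_score(y_true.ravel(), pred_rowmax.ravel())
f1_half = f1_score(y_true.ravel(), pred_half.ravel())

print(pred_rowmax.ravel(), f1_rowmax)  # [1 1 1 1] 0.666...
print(pred_half.ravel(), f1_half)      # [0 0 1 1] 1.0
```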

In [None]:
checkpoint_cb = ModelCheckpoint("conv1d_ptb.h5", save_best_only=True)

earlystop_cb = EarlyStopping(patience=5, restore_best_weights=True)

model_conv1d_ptb= build_conv1d_model(input_shape=(x_train_ptb.shape[1], x_train_ptb.shape[2]))
model_conv1d_ptb.summary()

In [None]:
history_conv1d_ptb = model_conv1d_ptb.fit(x_train_ptb, out_train_ptb, epochs=40, batch_size=32, 
                                          class_weight=class_weight, validation_data=(x_valid_ptb, out_valid_ptb),  
                                          callbacks=[checkpoint_cb, earlystop_cb])

In [None]:
model_conv1d_ptb.evaluate(x_test_ptb,out_test_ptb)

In [None]:
conv1d_pred_ptb = model_conv1d_ptb.predict (x_test_ptb)

In [None]:
print(classification_report(out_test_ptb, conv1d_pred_ptb > 0.5, target_names=[PTB_Outcome[i] for i in PTB_Outcome]))

In [None]:
m = tf.keras.metrics.binary_accuracy(out_test_ptb, conv1d_pred_ptb).numpy()
print("Binary Accuracy:  ", m.mean())
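
`tf.keras.metrics.binary_accuracy` thresholds the predicted probabilities at 0.5 by default and averages the matches over the last axis, so with one output column it returns one 0/1 value per sample, and the mean of those values is the overall accuracy. The same number can be reproduced with NumPy and scikit-learn's `accuracy_score` (dummy values below):

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[0], [1], [1], [0]])
y_prob = np.array([[0.2], [0.6], [0.4], [0.7]])

# Per-sample match after thresholding at 0.5, averaged over the last axis
per_sample = np.mean(y_true == (y_prob > 0.5).astype(int), axis=-1)
print(per_sample)         # [1. 1. 0. 0.]
print(per_sample.mean())  # 0.5

# Same overall accuracy from scikit-learn
print(accuracy_score(y_true.ravel(), (y_prob.ravel() > 0.5).astype(int)))  # 0.5
```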

In [None]:
# Plot the training/validation loss and F1 score per epoch on a shared (linear) y-axis
plt.figure(figsize=(25,12))
plt.plot(history_conv1d_ptb.epoch, history_conv1d_ptb.history['loss'],
           color='r', label='Train loss')
plt.plot(history_conv1d_ptb.epoch, history_conv1d_ptb.history['val_loss'],
           color='b', label='Val loss' , linestyle="--")
plt.plot(history_conv1d_ptb.epoch, history_conv1d_ptb.history['f1_score'],
           color='g', label='Train F1')
plt.plot(history_conv1d_ptb.epoch, history_conv1d_ptb.history['val_f1_score'],
           color='c', label='Val F1' , linestyle="--")
plt.xlabel('Epoch')
plt.ylabel('Loss / F1 score')
plt.legend()
plt.show()

In [26]:
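def build_LSTM_model (n_hidden=1, n_neurons=512, dropout=0.5, input_shape=(x_train_ptb.shape[1],1)):
    # Note: n_hidden and n_neurons are currently unused; the three LSTM layers (128 units)
    # and the two Dense layers (512 units) are hard-coded below.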
    model = keras.models.Sequential()
    model.add(InputLayer(input_shape=input_shape))
    
    model.add(LSTM(128, return_sequences=True, dropout=dropout, recurrent_dropout = dropout))
    model.add(LSTM(128, return_sequences=True, dropout=dropout, recurrent_dropout = dropout))
    model.add(LSTM(128, dropout=dropout, recurrent_dropout=dropout))
    
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(dropout))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(dropout))
    
    model.add(Dense(1, activation="sigmoid"))
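    # threshold=0.5 binarizes the single sigmoid output; with the default threshold=None,
    # tfa's F1Score marks every prediction as positive.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tfa.metrics.F1Score(num_classes=1, average="micro", threshold=0.5)])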
    return model

In [27]:
checkpoint_cb = ModelCheckpoint("lstm_ptb.h5", save_best_only=True)

earlystop_cb = EarlyStopping(patience=5, restore_best_weights=True)

model_lstm_ptb = build_LSTM_model(n_neurons = 128, n_hidden=2, dropout=0.2, input_shape=(x_train_ptb.shape[1], x_train_ptb.shape[2]))
model_lstm_ptb.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 187, 128)          66560     
_________________________________________________________________
lstm_1 (LSTM)                (None, 187, 128)          131584    
_________________________________________________________________
lstm_2 (LSTM)                (None, 128)               131584    
_________________________________________________________________
flatten_1 (Flatten)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               66048     
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 512)              

In [28]:
history = model_lstm_ptb.fit(x_train_ptb, out_train_ptb, epochs=40, batch_size=32, 
                             class_weight=class_weight, validation_data=(x_valid_ptb, out_valid_ptb),  
                             callbacks=[checkpoint_cb, earlystop_cb])

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40


In [29]:
model_lstm_ptb.evaluate(x_test_ptb,out_test_ptb)



[0.6720266938209534, 0.8397554755210876]

In [30]:
LSTM_pred_ptb = model_lstm_ptb.predict (x_test_ptb)
LSTM_pred_ptb = np.rint(LSTM_pred_ptb.reshape(len(LSTM_pred_ptb)))

print(classification_report(out_test_ptb, LSTM_pred_ptb, target_names=[PTB_Outcome[i] for i in PTB_Outcome]))

              precision    recall  f1-score   support

      Normal       0.00      0.00      0.00       603
    Abnormal       0.72      1.00      0.84      1580

    accuracy                           0.72      2183
   macro avg       0.36      0.50      0.42      2183
weighted avg       0.52      0.72      0.61      2183



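
The report shows the LSTM predicting Abnormal for every test beat (recall 0.00 for Normal). A quick check using only the supports from the report (603 Normal, 1580 Abnormal): a constant "Abnormal" predictor gets accuracy 1580/2183 ≈ 0.724 and Abnormal-class F1 = 2·1580 / (2·1580 + 603) = 3160/3763 ≈ 0.8398, which matches the F1 value returned by `model_lstm_ptb.evaluate` above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Test-set supports taken from the classification report above
y_true = np.array([0] * 603 + [1] * 1580)
y_const = np.ones_like(y_true)  # always predict "Abnormal"

acc = accuracy_score(y_true, y_const)
f1 = f1_score(y_true, y_const)  # F1 of the positive (Abnormal) class

print(acc)  # ≈ 0.7238
print(f1)   # ≈ 0.8398
```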
