# Using a Pre-trained ECG Classifier

This notebook applies the pre-trained ECG classifier described in this [Nature Communications paper](https://www.nature.com/articles/s41467-020-15432-4). The code comes from my [cloned version](https://github.com/chapmanbe/automatic-ecg-diagnosis) of the original GitHub repository.

The pre-trained models were downloaded from this [Dropbox link](https://www.dropbox.com/s/5ar6j8u9v9a0rmh/model.zip?dl=0) because downloading from Zenodo was too slow.



In [None]:
import h5py
import math
import pandas as pd
from tensorflow.keras.utils import Sequence
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam
from datasets import ECGSequence

## Data are stored in an HDF5 file

- The file contains a single dataset named `tracings`
- The model predicts 6 diseases that are not mutually exclusive, so one tracing can carry several labels at once
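
Because the labels are not mutually exclusive, each of the six output scores is read independently rather than taking a single arg-max. The sketch below illustrates that idea. The class names and their order are an assumption (check them against the column names of your annotations file), and the fixed 0.5 cutoff is purely illustrative, not the threshold used in the paper.

```python
import numpy as np

# Assumed label order; verify against the columns of the annotations CSV.
CLASS_NAMES = ["1dAVb", "RBBB", "LBBB", "SB", "AF", "ST"]


def scores_to_labels(scores, threshold=0.5):
    """Return the name of every class whose score meets the threshold."""
    scores = np.asarray(scores, dtype=float)
    return [name for name, s in zip(CLASS_NAMES, scores) if s >= threshold]


# Made-up scores for one tracing: two classes exceed the cutoff at once.
example_scores = [0.02, 0.91, 0.01, 0.03, 0.77, 0.10]
print(scores_to_labels(example_scores))
```

An empty list means no class crossed the cutoff for that tracing.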

### Description of tracings data from the GitHub repository

>shape = (N, 4096, 12). The input tensor should contain the 4096 points of the ECG tracings sampled at 400Hz (i.e., a signal of approximately 10 seconds). Both in the training and in the test set, when the signal was not long enough, we filled the signal with zeros, so 4096 points were attained. The last dimension of the tensor contains points of the 12 different leads. The leads are ordered in the following order: {DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}. All signal are represented as 32 bits floating point numbers at the scale 1e-4V: so if the signal is in V it should be multiplied by 1000 before feeding it to the neural network model.
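
To feed your own recording to the model, it has to be brought into the shape described above. The following is a minimal sketch of the zero-filling step only. The helper `pad_tracing` is hypothetical, and placing the zeros at the end of the signal is an assumption, since the description does not say where the padding goes. Unit scaling is deliberately left out.

```python
import numpy as np

N_SAMPLES = 4096  # points per tracing expected by the model
N_LEADS = 12      # number of ECG leads


def pad_tracing(signal, n_samples=N_SAMPLES):
    """Zero-fill a (length, 12) tracing to (n_samples, 12) as float32.

    Assumption: zeros are appended after the signal.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.ndim != 2 or signal.shape[1] != N_LEADS:
        raise ValueError("expected an array of shape (length, 12)")
    if signal.shape[0] > n_samples:
        raise ValueError("signal is longer than n_samples")
    padded = np.zeros((n_samples, N_LEADS), dtype=np.float32)
    padded[: signal.shape[0]] = signal
    return padded


# A synthetic 10 s recording at 400 Hz has 4000 points per lead.
short = np.ones((4000, N_LEADS))
batch = pad_tracing(short)[np.newaxis, ...]  # add the leading N dimension
print(batch.shape, batch.dtype)
```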



In [None]:
tdata = h5py.File("./data/test_data/ecg_tracings.hdf5", "r")

In [None]:
type(tdata['tracings'])

In [None]:
tdata['tracings'].shape

## There are a variety of annotations available in `data/annotations`

- This notebook uses `gold_standard.csv`

In [None]:
annotations = pd.read_csv("data/annotations/gold_standard.csv")

In [None]:
annotations

In [None]:
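def report(gld, rslt):
    """Return one line per class: (label, gold-standard value)=predicted score as a percentage."""
    # gld: one row of the annotations table; rslt: model scores in the same column order
    return "".join(
        f"({label.ljust(5)}, {truth})={int(100 * score):3d}%\n"
        for (label, truth), score in zip(gld.items(), rslt)
    )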

## The model assumes a 3D tensor

- Pulling out a single sequence therefore needs a slice (`0:1`) rather than an integer index, so that the result keeps its 3D shape
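
The difference between the two kinds of indexing can be seen on a small stand-in array. This sketch uses a NumPy array of zeros in place of the real data. h5py datasets follow the same convention for integer and slice indices.

```python
import numpy as np

# Stand-in for the real tracings: 3 recordings, 4096 points, 12 leads.
tracings = np.zeros((3, 4096, 12), dtype=np.float32)

single = tracings[0]          # integer index drops the first dimension
batch_of_one = tracings[0:1]  # slice keeps it, giving a batch of size 1
restored = tracings[0][np.newaxis, ...]  # or add the dimension back explicitly

print(single.shape, batch_of_one.shape, restored.shape)
```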

In [None]:
seq0 = tdata['tracings'][0:1,:,:]

In [None]:
seq0.shape

## This is the default model from the paper

In [None]:
# Load the pre-trained model; adjust the path to where you unpacked model.zip
model = load_model("/Users/brian/Dropbox/model/model.hdf5", compile=False)
model.compile(loss='binary_crossentropy', optimizer=Adam())

In [None]:
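n_cases = tdata['tracings'].shape[0]  # number of tracings in the file (was hard-coded as 827)
for i in range(n_cases):
    print(f"CASE: {i:3d}")
    seqi = tdata['tracings'][i:i+1,:,:]  # slicing keeps the 3D shape
    y_score = model.predict(seqi, verbose=0)
    print(report(annotations.loc[i,:], y_score[0]))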
    print("-"*42)
    