# Evaluation on an external dataset

To evaluate PeakSwift against NeuroKit, we need an external dataset that covers a wide range of individuals and conditions. We therefore use the dataset described as "a 12-lead electrocardiogram database for arrhythmia research with more than 10,000 patients". Since we are interested in single-lead ECG signals, the script downloads the dataset, extracts the first lead of each ECG and saves the result in a JSON file that can be fed into PeakWatch for peak detection and subsequent analysis.
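The extraction step amounts to reading the first column of each comma-separated record and discarding the other leads. Below is a minimal sketch on a tiny made-up record. Like the notebook's `[:, 0]` indexing, it assumes one header row and the first lead in column 0. `first_lead` is a hypothetical helper. It uses `np.loadtxt` instead of the notebook's `np.genfromtxt`, and unlike `genfromtxt`, `loadtxt` raises an error on missing values rather than filling them with NaN.

```python
import io

import numpy as np


def first_lead(source) -> np.ndarray:
    """Return column 0 of a comma-separated ECG record that has one header row."""
    # usecols=0 reads only the first column; ndmin=1 keeps the result 1-D
    return np.loadtxt(source, delimiter=",", skiprows=1, usecols=0, ndmin=1)


# Tiny made-up record: two leads, three samples
record = io.StringIO("I,II\n0.10,0.20\n0.15,0.25\n0.05,0.30\n")
print(first_lead(record))
```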

```python
import glob
import json
import zipfile

import numpy as np
import requests
from tqdm import tqdm
```

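```python
# Download the dataset archive from figshare
URL = "https://figshare.com/ndownloader/files/15651326"
response = requests.get(URL, timeout=60)
response.raise_for_status()  # fail early instead of saving an HTTP error page
```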

```python
# The recordings are 10 s long and sampled at 500 Hz (5000 samples per lead)
sampling_rate_dataset = 500.0
```
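The sampling rate can be checked for consistency by computing the duration implied by the 5000-sample rows. The dataset's description reports 10-second recordings acquired at 500 Hz. A rate of 512 Hz would shorten every recording to about 9.77 s and distort any time-based measure derived from the detected peaks. `duration_seconds` is only an illustrative helper.

```python
def duration_seconds(n_samples: int, sampling_rate_hz: float) -> float:
    """Length of a recording in seconds, given its sample count and sampling rate."""
    return n_samples / sampling_rate_hz


# 5000 samples per row, as allocated for the matrix M in this notebook
print(duration_seconds(5000, 500.0))  # 10.0
print(duration_seconds(5000, 512.0))  # 9.765625
```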

```python
# Save the archive to disk (754,581,129 bytes in the original run)
with open("data.zip", "wb") as f:
    f.write(response.content)
```

```python
# Extract the archive; the CSV records end up in ./ECGData/
with zipfile.ZipFile("./data.zip", "r") as zip_ref:
    zip_ref.extractall("./")
```
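Because the archive is large, a truncated or corrupted download is worth ruling out before any results are trusted. Below is a minimal sketch using only the standard `zipfile` module. `inspect_archive` is a hypothetical helper. `ZipFile.testzip()` reads every member and checks its CRC, so on the full archive it takes a while.

```python
import zipfile


def inspect_archive(path: str, suffix: str = ".csv") -> tuple:
    """Return (name of the first corrupt member or None, number of members ending in suffix)."""
    with zipfile.ZipFile(path) as zf:
        first_bad = zf.testzip()  # None means every member passed the CRC check
        n_matching = sum(1 for name in zf.namelist() if name.endswith(suffix))
    return first_bad, n_matching
```

Calling `inspect_archive("./data.zip")` should report no corrupt member and a CSV count that matches the number of records processed by the loading loop.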

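```python
# One CSV file per record; sorting makes the row order of M reproducible
files = sorted(glob.glob("./ECGData/*.csv"))
# One row per record, 5000 samples each
M = np.zeros((len(files), 5000))
```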
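Preallocating `M` keeps every record in memory at once. Below is a quick estimate for the 10,646 records processed in this notebook, computed from the dtype's item size without allocating anything. `matrix_nbytes` is an illustrative helper.

```python
import numpy as np


def matrix_nbytes(n_rows: int, n_cols: int, dtype=np.float64) -> int:
    """Bytes needed for a dense n_rows x n_cols array of the given dtype (no allocation)."""
    return n_rows * n_cols * np.dtype(dtype).itemsize


n_records, n_samples = 10646, 5000
print(matrix_nbytes(n_records, n_samples))              # 425840000 (float64, np.zeros default)
print(matrix_nbytes(n_records, n_samples, np.float32))  # 212920000
```

Switching to `float32` would halve the footprint at the cost of precision. The rows are converted to Python floats for JSON later anyway, so keeping the default `float64` is arguably the simpler choice.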

```python
# Read every record and keep only its first column (the first lead)
for i, path in enumerate(tqdm(files)):
    M[i, :] = np.genfromtxt(path, delimiter=",", skip_header=1)[:, 0]
# The original run read 10,646 files in about 3 minutes.
```
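A quick sanity pass over `M` can catch records that did not load cleanly. By default `np.genfromtxt` fills fields it cannot parse with NaN. A row that is entirely zero could indicate an empty or flat-line record. Below is a minimal sketch. `find_suspect_rows` is a hypothetical helper, and what counts as "suspect" is an assumption.

```python
import numpy as np


def find_suspect_rows(matrix: np.ndarray) -> dict:
    """Indices of rows that contain NaN, and of rows that are entirely zero."""
    has_nan = np.isnan(matrix).any(axis=1)
    all_zero = ~matrix.any(axis=1)  # NaN counts as nonzero, so NaN rows are not flagged here
    return {
        "nan": np.flatnonzero(has_nan).tolist(),
        "all_zero": np.flatnonzero(all_zero).tolist(),
    }


example = np.array([[1.0, 2.0], [0.0, 0.0], [np.nan, 1.0]])
print(find_suspect_rows(example))
```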


```python
# Pair each first-lead signal with its sampling rate in the layout read by PeakWatch
records = [
    {"ecg": row.tolist(), "samplingRate": sampling_rate_dataset}
    for row in tqdm(M)
]
jsonResult = {"dataset": records}
```
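The `.tolist()` call matters because the standard `json` module cannot serialize NumPy arrays directly, while a list of plain Python floats is fine. Below is a small sketch of the same `{"dataset": [{"ecg", "samplingRate"}]}` layout on a made-up 2×2 matrix. `to_payload` is a hypothetical helper that mirrors the keys used in this notebook.

```python
import json

import numpy as np


def to_payload(matrix: np.ndarray, sampling_rate: float) -> dict:
    """Wrap each row of `matrix` in the {"ecg", "samplingRate"} record layout."""
    return {
        "dataset": [
            {"ecg": row.tolist(), "samplingRate": float(sampling_rate)} for row in matrix
        ]
    }


m = np.array([[0.1, 0.2], [0.3, 0.4]])
try:
    json.dumps({"ecg": m[0]})  # a raw ndarray is rejected
except TypeError as err:
    print("TypeError:", err)
print(json.dumps(to_payload(m, 500.0)))
```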


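```python
# Serialize straight into the file; json.dump writes the encoded output in chunks
# instead of first building one large string in memory
with open("./ecg_data_arrhythmia.json", "w") as f:
    json.dump(jsonResult, f)
```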
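Before handing the file to PeakWatch, it can help to read it back and confirm that every record has the expected keys and length. Below is a minimal sketch. `validate_dataset_file` is a hypothetical helper, and the 5000-sample default reflects the matrix width used above. `json.load` reads the whole file into memory, so checking the full dataset needs roughly as much RAM as writing it did.

```python
import json


def validate_dataset_file(path: str, n_samples: int = 5000) -> int:
    """Check the {"dataset": [{"ecg", "samplingRate"}, ...]} layout and return the record count."""
    with open(path) as f:
        payload = json.load(f)
    records = payload["dataset"]
    for i, rec in enumerate(records):
        if set(rec) != {"ecg", "samplingRate"}:
            raise ValueError(f"record {i}: unexpected keys {sorted(rec)}")
        if len(rec["ecg"]) != n_samples:
            raise ValueError(f"record {i}: expected {n_samples} samples, got {len(rec['ecg'])}")
        if not rec["samplingRate"] > 0:
            raise ValueError(f"record {i}: non-positive sampling rate")
    return len(records)
```

For example, `validate_dataset_file("./ecg_data_arrhythmia.json")` should return the number of records written.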