# ECG machines are confused by athlete hearts

> Misclassification of athlete ECG by GE Marquette SL12 algorithm

This report demonstrates that Marquette SL12 from General Electric, clinical 
decision support software for diagnosing cardiac health from 
electrocardiograms (ECGs), **consistently misdiagnoses elite athletes**.

![](../media/GE-Muse-MAC-VU-360.png)

***Figure: Hardware for taking and processing ECG. Left - GE CAM acquisition module. Right - GE MAC VU360 electrocardiograph.***

TODO: crop GE CAM image to focus on chest diagram

## Notebook setup

In [None]:
#| code-fold: true
#| code-summary: "Click to see packages imported"
import os
import configparser
from pathlib import Path
from typing import TypedDict, List
from enum import Enum

import wfdb
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
#|include: false
# If the current working directory is the nbs/ folder, change to the project 
# root directory instead.

if Path.cwd().stem == "nbs":
    os.chdir(Path.cwd().parent)
print(f"The current working directory is {Path.cwd()}")

In [None]:
#|include: false
# Import configuration settings, like location of data directory.
config = configparser.ConfigParser()
if not Path("config.ini").exists():
    print("WARNING: Please generate a config.ini file by running scripts/get_datasets.py")
else:
    config.read("config.ini")
    data_dir = Path((config["datasets"]["path"])).expanduser()
    print(f"Datasets are located at {data_dir.resolve()}")

## The norwegian-athlete-ecg dataset

The [Norwegian Endurance Athlete ECG Database](https://physionet.org/content/norwegian-athlete-ecg/1.0.0/) (norwegian-athlete-ecg) contains 12-lead ECG recordings from 28 elite athletes from various sports in Norway. All recordings are 10-second resting ECGs recorded with a General Electric (GE) MAC VU360 electrocardiograph. All ECGs were interpreted by both the [GE Marquette SL12 algorithm](https://www.gehealthcare.com/products/marquette-12sl) (version 23 (v243)) and one cardiologist trained in interpreting athletes' ECGs. The data was collected at the University of Oslo in February and March 2020.

In [None]:
athlete_ecg_dir = data_dir / "norwegian-athlete-ecg" / "1.0.0"

An example of the data for athlete `ath_001` is shown below.

In [None]:
#| output: show
# 12-lead ECG recording from subject ath_001
record = wfdb.rdrecord(athlete_ecg_dir / "ath_001")
wfdb.plot_wfdb(record=record, title='ath_001 from Norwegian Athlete ECG database')

In [None]:
#| output: show
# Machine (SL12) and Cardiologist (C) interpretation of ath_001 ECG recording
record = wfdb.rdheader(athlete_ecg_dir / "ath_001")
record.comments

In [None]:
#| code-fold: true
#| code-summary: Put ECG finding reports into a pandas dataframe
class AthleteReport(TypedDict):
    """Single-line ECG reports for one athlete."""
    athlete_id: str
    cardiologist: str
    machine: str

reports_list: List[AthleteReport] = []
for i in range(1, 29):
    athlete_id = f"ath_{i:03d}"     # zero-padded record name, e.g. ath_001
    record = wfdb.rdheader(athlete_ecg_dir / athlete_id)
    # Header comments: index 0 is the machine (SL12) report, index 1 is the 
    # cardiologist (C) report.
    comments = record.comments
    report: AthleteReport = {
        "athlete_id": athlete_id,
        "cardiologist": comments[1],
        "machine": comments[0],
    }
    reports_list.append(report)
athlete_ecg_df = pd.DataFrame(reports_list)

In [None]:
#| output: show
athlete_ecg_df.head()

## Findings from ECG reports

In the norwegian-athlete-ecg dataset, findings in ECG reports are delimited by 
a comma (`,`). However, some machine findings also make use of a comma to make 
a follow-up comment on a finding. This is not done in any of the human 
cardiologist reports in the dataset.

***Table: Examples of findings with follow-up comment***

| Finding with follow-up comment | Record |
|-|-|
| `Minimal voltage criteria for LVH, may be normal variant` | `ath_024` |
| `ST elevation, probably due to early repolarization` | `ath_024` |
| `ST elevation, consider early repolarization, pericarditis, or injury` | `ath_027` |

Follow-up comments from SL12 all seem to start with a lower-case letter, so 
they can be detected this way.
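The lower-case rule can be sketched on its own as follows. This is a minimal illustration, not the function used later in this notebook: `merge_follow_ups` is a hypothetical helper, and the trailing `Abnormal ECG` finding is appended to the `ath_027` example purely to show where a new finding starts.

```python
from typing import List


def merge_follow_ups(comments: List[str]) -> List[str]:
    """Attach lower-case follow-up comments to the finding before them."""
    findings: List[str] = []
    for comment in comments:
        comment = comment.strip()
        if not comment:
            continue  # ignore empty segments, e.g. from a trailing comma
        # Assumption: a new finding starts with an upper-case letter;
        # anything else is treated as a follow-up comment.
        if comment[0].isupper() or not findings:
            findings.append(comment)
        else:
            findings[-1] = f"{findings[-1]}, {comment}"
    return findings


# Finding from ath_027 in the table above, plus one illustrative extra finding.
report = "ST elevation, consider early repolarization, pericarditis, or injury, Abnormal ECG"
print(merge_follow_ups(report.split(", ")))
# ['ST elevation, consider early repolarization, pericarditis, or injury', 'Abnormal ECG']
```

The `not findings` check means a report that happens to begin with a lower-case comment is kept as its own finding instead of raising an error.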

Some findings are also combined into one sentence using a conjunction word such 
as "and". 

e.g. `Sinus bradycardia and sinus arrhythmia and first degree AV block`
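When splitting on a conjunction, it is safer to match "and" as a standalone word than as a bare substring. The sketch below shows one way to do this with a regular expression. `split_conjunctions` is a hypothetical helper, and the phrase `Non-standard lead placement` is made up to show the pitfall; it is not taken from the dataset.

```python
import re
from typing import List


def split_conjunctions(comment: str) -> List[str]:
    """Split a comment on the standalone word 'and' (case-insensitive)."""
    # \s+and\s+ requires whitespace on both sides, so words that merely
    # contain the letters "and" (e.g. "standard") are left intact.
    parts = re.split(r"\s+and\s+", comment, flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


print(split_conjunctions("Sinus bradycardia and sinus arrhythmia and first degree AV block"))
# ['Sinus bradycardia', 'sinus arrhythmia', 'first degree AV block']

# Made-up phrase: a bare substring split breaks the word "standard".
print("Non-standard lead placement".split("and"))
# ['Non-st', 'ard lead placement']
print(split_conjunctions("Non-standard lead placement"))
# ['Non-standard lead placement']
```

Note that every segment after the first starts with a lower-case letter, so a rule that detects follow-up comments by capitalisation will treat these segments as follow-up comments unless it accounts for them.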

In [None]:
#| code-fold: true
#| code-summary: Click to see function for extracting a list of findings from a single line report

def extract_findings(report: str, follow_on: bool = True, split_and: bool = True) -> List[str]:
    """Extract a list of all findings from a single-line ECG report.

    Set `follow_on=False` for cardiologist reports, which contain no 
    follow-up comments.
    """
    comments = report.split(': ', maxsplit=1)[1].split(', ')
    
    # Also split multiple findings in a single comment joined by 'and'.
    # e.g. Sinus bradycardia and sinus arrhythmia and first degree AV block
    # Split on ' and ' (with spaces) so words containing "and" stay intact.
    if split_and:
        comments = [seg for comment in comments for seg in comment.split(' and ')]

    # Cleanup: strip leading/trailing whitespace and drop empty segments
    comments = [comment.strip() for comment in comments if comment.strip()]

    if not follow_on:
        return comments     # i.e. assume every comment is a new finding

    # Combine follow-on comments with parent comment to produce full finding 
    # for SL12 machine comments.
    #
    # e.g. ST elevation, consider early repolarization, pericarditis, or injury
    #
    # Note: segments produced by the 'and' split above start with a lower-case 
    # letter, so they are joined back onto the previous finding here too.
    findings = []
    for comment in comments:
        # `not findings` guards against a report whose first comment is lower-case
        if comment[0].isupper() or comment[0] == '*' or not findings:
            findings.append(comment)
        else:
            findings[-1] = f"{findings[-1]}, {comment}"
    return findings

In [None]:
#| output: show
# Example usage of `extract_findings()`
report = athlete_ecg_df.loc[23].machine
extract_findings(report)

In [None]:
# Find every unique finding in dataset
unique_findings_sl12 = []
unique_findings_c = []
for i in range(1, 29):
    athlete_id = f"ath_{i:03d}"     # zero-padded record name, e.g. ath_001
    record = wfdb.rdheader(athlete_ecg_dir / athlete_id)
    comments = record.comments

    # Machine algorithm findings
    findings_sl12 = extract_findings(comments[0])
    for finding in findings_sl12:
        if finding not in unique_findings_sl12:
            unique_findings_sl12.append(finding)
    
    # Cardiologist findings
    findings_c = extract_findings(comments[1], follow_on=False)
    for finding in findings_c:
        if finding not in unique_findings_c:
            unique_findings_c.append(finding)


In [None]:
#| output: show
unique_findings_c

In [None]:
#| output: show
unique_findings_sl12

TODO: Explain how we're classifying different abnormalities.

TODO: Explain which abnormalities are relevant for athlete misdiagnosis.

In [None]:
# Classifying findings by the type of abnormality

class AbnormalityClass(Enum):
    # overall = "Overall ECG recording"   # Normal/Abnormal/Borderline etc.
    rhythm = "Rhythm"                   # e.g. sinus rhythm
    conduction = "Conduction"           # e.g. bundle branch block, AV block
    ischemia = "Ischemia"               # e.g. ST-segment, T-wave inversion
    structural = "Structural"           # e.g. chamber enlargement, hypertrophy
    measurement = "Measurement"         # e.g. axis deviation, wide QRS, PR interval
    equipment = "Equipment"             # e.g. Misplaced electrodes
    other = "Other"


## Disagreement between machine and cardiologist findings

TODO: Explanation of why athlete is likely to be misinterpreted. ECG interpretation guidelines for athletes.

### Overall finding

It is common practice to give an ECG recording or segment of a recording an 
overall finding of "normal", "borderline" or "abnormal" to aid clinical 
decision-making.

In the norwegian-athlete-ecg dataset, the cardiologist classified 2 
recordings as "borderline", and the remaining 26 as "normal". Comparing the 
overall finding of the cardiologist with that of the SL12 algorithm gives an 
idea of how difficult each recording is to interpret.
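One way to quantify this comparison is to place the three overall findings on an ordinal scale and tabulate the signed difference between the two sources. The sketch below assumes a 0/1/2 severity scale and uses hypothetical label pairs that are **not** taken from the dataset.

```python
from collections import Counter

# Ordinal severity scale (an assumption of this sketch).
SEVERITY = {"normal": 0, "borderline": 1, "abnormal": 2}

# Hypothetical (cardiologist, machine) label pairs, NOT from the dataset.
pairs = [
    ("normal", "normal"),
    ("normal", "borderline"),
    ("normal", "abnormal"),
    ("borderline", "normal"),
    ("borderline", "abnormal"),
]

# Positive values: the machine reports a more severe finding than the
# cardiologist. Negative values: the machine reports a less severe finding.
signed = Counter(SEVERITY[machine] - SEVERITY[cardio] for cardio, machine in pairs)
print(dict(sorted(signed.items())))
# {-1: 1, 0: 1, 1: 2, 2: 1}
```

Keeping the sign separates over-calling from under-calling by the machine; taking the absolute value instead gives the size of the disagreement regardless of direction.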

In [None]:
#| code-fold: true
#| code-summary: Extract overall ECG findings from dataset and measure disagreement between C and SL12

# The final finding in each report is an "overall" classification for the 
# entire ECG recording.

# We can use the difference between machine and cardiologist `OverallFinding`
# values to quantify if the disagreement is "small" or "large".

class OverallFinding(Enum):
    Unknown = -99
    Normal = 0
    Borderline = 1
    Abnormal = 2

def classifyOverallFinding(findings: List[str]) -> OverallFinding:
    """Classifies the overall finding for an ECG recording.

    Assumes that the final finding in `findings` list comments on overall 
    finding.
    """
    overall = findings[-1].lower()
    # Check "abnormal" before "normal": the former contains the latter.
    if "abnormal" in overall:
        return OverallFinding.Abnormal
    elif "borderline" in overall:
        return OverallFinding.Borderline
    elif "normal" in overall:
        return OverallFinding.Normal
    else:
        return OverallFinding.Unknown

# Quantify the "overall disagreement" between cardiologist and SL12 algorithm.
count_0 = 0     # Agree
count_1 = 0     # Disagree (small)
count_2 = 0     # Disagree (large)

count_sl12_normal = 0
count_sl12_borderline = 0
count_sl12_abnormal = 0

for i in range(1, 29):
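    athlete_id = f"ath_{i:03d}"     # zero-padded record name, e.g. ath_001

    record = wfdb.rdheader(athlete_ecg_dir / athlete_id)
    comments = record.comments

    # Cardiologist reports contain no follow-up comments, so every comment is 
    # treated as its own finding.
    findings_sl12 = extract_findings(comments[0], follow_on=True)
    findings_c = extract_findings(comments[1], follow_on=False)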

    overall_sl12 = classifyOverallFinding(findings_sl12)
    overall_c = classifyOverallFinding(findings_c)

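    # Use the magnitude of the difference so that recordings where SL12 is 
    # *less* severe than the cardiologist are counted too. A recording where 
    # only one side is `Unknown` is not counted.
    disagreement = abs(overall_sl12.value - overall_c.value)
    if disagreement == 0:
        count_0 += 1
    elif disagreement == 1:
        count_1 += 1
    elif disagreement == 2:
        count_2 += 1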
    
    if (overall_sl12 == OverallFinding.Normal):
        count_sl12_normal += 1
    elif (overall_sl12 == OverallFinding.Borderline):
        count_sl12_borderline += 1
    elif (overall_sl12 == OverallFinding.Abnormal):
        count_sl12_abnormal += 1

    # print(f"{athlete_id} disagreement = {overall_sl12.value - overall_c.value}\tc = {overall_c.name}")
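
The order of the keyword checks in `classifyOverallFinding` matters because "normal" is a substring of "abnormal". The standalone sketch below illustrates this with a hypothetical `overall_label` helper and example phrases chosen for illustration.

```python
from typing import Optional

# Checked in this order: "abnormal" must come before "normal".
KEYWORDS = ("abnormal", "borderline", "normal")


def overall_label(text: str) -> Optional[str]:
    """Return the first keyword found in `text`, or None if none match."""
    lowered = text.lower()
    for keyword in KEYWORDS:
        if keyword in lowered:
            return keyword
    return None


# A naive check for "normal" also matches an abnormal finding.
print("normal" in "abnormal ecg")       # True

print(overall_label("Abnormal ECG"))    # abnormal
print(overall_label("Borderline ECG"))  # borderline
print(overall_label("Normal ECG"))      # normal
print(overall_label("Sinus rhythm"))    # None
```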


In [None]:
#| output: show
#| code-fold: true
#| code-summary: Plot difference between C and SL12 overall findings
sns.set_theme(style="ticks")

fig, (ax1, ax2) = plt.subplots(1, 2)
plt.suptitle("Difference in overall findings")

#
# Stacked bar charts showing ratio of normal/borderline/abnormal findings
#

colors = sns.color_palette('pastel')[5:8]
report_source = ["Cardiologist", "SL12"]
overall_finding_labels = ["Normal", "Borderline", "Abnormal"]
overall_findings_data = {
    "Normal": np.array([26, count_sl12_normal]) / 28.0,
    "Borderline": np.array([2, count_sl12_borderline]) / 28.0,
    "Abnormal": np.array([0, count_sl12_abnormal]) / 28.0,
}

# Iteratively build stacked bar chart, one finding class at a time.
bot_c = 0
bot_sl12 = 0
color_index = 0
for finding, counts in overall_findings_data.items():
    ax1.bar(
        report_source, 
        counts, 
        label=finding, 
        bottom=[bot_c, bot_sl12], 
        color=colors[color_index],
    )
    bot_c += counts[0]
    bot_sl12 += counts[1]
    color_index += 1

ax1.set_title("a) Proportion of overall findings")
ax1.set_ylabel("Proportion of athletes in dataset")
ax1.legend(loc="lower center")

# 
# Pie chart summarising overall disagreement
#

colors = sns.color_palette('pastel')[2:5]

ax2.pie(
    [count_0, count_1, count_2],
    colors = colors,
    autopct='%.0f%%',
    pctdistance=0.6,
)

ax2.legend(
    ["Agree", "Disagree (small)", "Disagree (large)"],
    bbox_to_anchor=(0.9, 0.05),
)
ax2.set_title("b) Disagreement between \nC & SL12 reports")

plt.show()

***Figure: Difference in overall ECG recording finding between Cardiologist (C) and Machine (SL12) reports.***

In [None]:
#| code-fold: true
#| code-summary: Add overall finding labels to pandas dataframe
athlete_ecg_df = athlete_ecg_df.assign(
    overall_c=list(map(lambda x: classifyOverallFinding( extract_findings(x, follow_on=False) ).name, athlete_ecg_df.cardiologist))
)

athlete_ecg_df = athlete_ecg_df.assign(
    overall_sl12=list(map(lambda x: classifyOverallFinding( extract_findings(x, follow_on=True) ).name, athlete_ecg_df.machine))
)

In [None]:
athlete_ecg_df.head()

### Rhythm findings

TODO: Disagreement between cardiologist and machine for individual abnormality 
classes.

In [None]:
class SinusRhythmFinding(Enum):
    Normal = 0
    Bradycardia = 1     # Slow rhythm
    Tachycardia = 2     # Fast rhythm
    Arrhythmia = 3      # Other abnormal rhythm

def classifySinusRhythmFindings(comments: List[str]) -> List[SinusRhythmFinding]:
    """Classifies sinus rhythm findings from an ECG findings report

    Multiple rhythm findings could be present in a single comment. 
    e.g. 'Sinus bradycardia with marked sinus arrhythmia'

    Returns a list of `SinusRhythmFinding`.
    """
    findings = []
    for c in comments:
        c = c.lower()
        if "sinus" in c:
            if "arrhythmia" in c:
                findings.append(SinusRhythmFinding.Arrhythmia)
            if "bradycardia" in c:
                findings.append(SinusRhythmFinding.Bradycardia)
            if "tachycardia" in c:
                findings.append(SinusRhythmFinding.Tachycardia)
            # "abnormal" contains "normal", so exclude it explicitly
            if "normal" in c and "abnormal" not in c:
                findings.append(SinusRhythmFinding.Normal)
    return findings

In [None]:
# Example usage
comments = extract_findings(athlete_ecg_df.sample()['machine'].values[0])
sinus_findings = classifySinusRhythmFindings(comments)
print(sinus_findings)
print(comments)
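
Because the sample above is random, a fixed input makes the behaviour easier to check. The sketch below is a table-driven variant of the same keyword matching, applied to the example from the `classifySinusRhythmFindings` docstring. `RHYTHM_KEYWORDS` and `rhythm_labels` are hypothetical names used only for this illustration, and plain strings stand in for the `SinusRhythmFinding` enum so that the block is self-contained.

```python
from typing import List

# Keyword -> label. Plain strings stand in for the enum used in the notebook.
RHYTHM_KEYWORDS = {
    "arrhythmia": "Arrhythmia",
    "bradycardia": "Bradycardia",
    "tachycardia": "Tachycardia",
}


def rhythm_labels(comment: str) -> List[str]:
    """Return every sinus rhythm label mentioned in one comment."""
    text = comment.lower()
    if "sinus" not in text:
        return []
    labels = [label for keyword, label in RHYTHM_KEYWORDS.items() if keyword in text]
    if "normal" in text and "abnormal" not in text:
        labels.append("Normal")
    return labels


print(rhythm_labels("Sinus bradycardia with marked sinus arrhythmia"))
# ['Arrhythmia', 'Bradycardia']
print(rhythm_labels("Normal sinus rhythm"))
# ['Normal']
```

A single comment can therefore yield more than one label, which is why the classifier returns a list.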

In [None]:
#| code-fold: true
#| code-summary: Add sinus rhythm finding labels to pandas dataframe

# Get a list of sinus rhythm findings for every cardiologist record
sinus_rhythm_findings_c = list(map(lambda x: classifySinusRhythmFindings( extract_findings(x, follow_on=False) ), athlete_ecg_df.cardiologist))
sinus_rhythm_findings_sl12 = list(map(lambda x: classifySinusRhythmFindings( extract_findings(x, follow_on=True) ), athlete_ecg_df.machine))

# Convert to boolean flag for each finding
normal_flags = [SinusRhythmFinding.Normal in lst for lst in sinus_rhythm_findings_c]
bradycardia_flags = [SinusRhythmFinding.Bradycardia in lst for lst in sinus_rhythm_findings_c]
tachycardia_flags = [SinusRhythmFinding.Tachycardia in lst for lst in sinus_rhythm_findings_c]
arrhythmia_flags = [SinusRhythmFinding.Arrhythmia in lst for lst in sinus_rhythm_findings_c]
normal_flags_sl12 = [SinusRhythmFinding.Normal in lst for lst in sinus_rhythm_findings_sl12]
bradycardia_flags_sl12 = [SinusRhythmFinding.Bradycardia in lst for lst in sinus_rhythm_findings_sl12]
tachycardia_flags_sl12 = [SinusRhythmFinding.Tachycardia in lst for lst in sinus_rhythm_findings_sl12]
arrhythmia_flags_sl12 = [SinusRhythmFinding.Arrhythmia in lst for lst in sinus_rhythm_findings_sl12]

# Add each finding to dataframe as a new column
athlete_ecg_df = athlete_ecg_df.assign(
    sinus_rhythm_normal_c=normal_flags,
    sinus_rhythm_bradycardia_c=bradycardia_flags,
    sinus_rhythm_tachycardia_c=tachycardia_flags,
    sinus_rhythm_arrhythmia_c=arrhythmia_flags,
    sinus_rhythm_normal_sl12=normal_flags_sl12,
    sinus_rhythm_bradycardia_sl12=bradycardia_flags_sl12,
    sinus_rhythm_tachycardia_sl12=tachycardia_flags_sl12,
    sinus_rhythm_arrhythmia_sl12=arrhythmia_flags_sl12,
)

In [None]:
athlete_ecg_df.head()

## Exporting the labelled dataset