# CS 598: Deep Learning for Healthcare
- Instructor: Dr. Jimeng Sun
- Team Members:
  - Carmelita Valimento
  - William Su
  - Austin Harmon

## Final Project
Replication Paper #42: “Investigating Sleep Apnea in Children with a Specialized Multi-Modal Transformer”


## GitHub Repository Link
- Original Paper: https://github.com/healthylaife/Pediatric-Apnea-Detection
- Team 4 Repository: https://github.com/dlh-team-4/t4

## Demo Video Link
- https://drive.google.com/file/d/13CFhCvvBsJvxAmK8qwxcH7Boe2T_jbjg/view?usp=drive_link

## Notice
- Because the datasets are large, each section includes only a snippet of the requested code for reference. All output logs are embedded in the notebooks checked into our GitHub repository.

# Introduction
Sleep apnea in children poses a significant health concern, affecting approximately one to five percent of children in the United States. Unlike adult sleep apnea, pediatric sleep apnea presents distinct clinical causes and characteristics, demanding specialized attention. However, research dedicated to pediatric sleep apnea detection has been comparatively limited, especially concerning at-home testing tools and algorithmic approaches for automatic detection. Addressing this gap is crucial due to the potential adverse effects of untreated pediatric sleep apnea on both physical and mental health.

Polysomnography, the gold standard for diagnosing sleep-related breathing disorders, including apnea and hypopnea, presents numerous challenges, such as complexity, cost, and the need for clinical involvement. Consequently, there is a growing interest in developing accessible and effective diagnostic methods, particularly for pediatric populations, to mitigate the limitations of traditional polysomnography.

Recent advancements in artificial intelligence (AI) have spurred research into machine learning-based approaches for diagnosing sleep apnea without relying on polysomnography. While some studies have shown promising results in adults, research focusing on pediatric populations remains scarce. This scarcity underscores the importance of the present study, which introduces a machine learning-based model specifically tailored for detecting apnea events in children using commonly collected sleep signals.

The paper proposed a machine learning-based model for detecting obstructive sleep apnea-hypopnea syndrome (OSAHS) in pediatric patients using commonly collected sleep signals. It aimed to address the gap in pediatric sleep apnea detection by presenting a method that could achieve adult-level performance in detecting OSAHS patterns in children. It also introduced a customized transformer-based architecture for detecting OSAHS, which showed superior performance compared to existing methods. It utilized a novel data representation technique to handle polysomnography modalities, enabling the model to effectively process signals from different sources.

The study explored different combinations of common modalities and demonstrated that using only two easier-to-collect signals (ECG and SpO2) could achieve close to maximum performance. The proposed method outperformed state-of-the-art methods on two public datasets, as measured by F1-score and AUROC. Because ECG and SpO2 are easier to collect at home, this result addresses concerns about collecting many sleep signals from children outside the clinic.

The paper contributes to the field in three ways. It addresses the gap in pediatric sleep apnea detection, which has been understudied compared to adult sleep apnea. It introduces a machine learning-based approach specifically tailored for pediatric OSAHS detection, which can potentially improve the accessibility of pediatric sleep apnea testing and treatment interventions. Finally, it demonstrates the feasibility of achieving adult-level performance in detecting OSAHS in children, advancing the field towards more accessible and timely diagnosis and treatment of pediatric sleep disorders.



# Scope of Reproducibility:

We covered two hypotheses presented by the original paper:

1.   Hypothesis 1: It is possible to achieve adult-level performance in detecting obstructive sleep apnea-hypopnea syndrome (OSAHS) in pediatric populations using machine learning-based methods.
2.   Hypothesis 2: It is possible to achieve polysomnography (PSG)-level performance using only two easier-to-collect signals: ECG and SpO2.

We want to verify that, using ECG and SpO2 signal data, machine learning-based methods can detect OSAHS in pediatric populations. The paper included two types of runs: one with demographics and one without. We limited our scope to replicating the runs without demographics. Our test plan is to screen the complete polysomnography EDF files from CHAT and NCHSDB, then use the preprocessed datasets to train and test a separate Multi-Modal Transformer model for each dataset.

# Methodology

# Environment

We used Python 3.9.7.

The original code was merged into several notebook files. Each file represented a section of the program that could be run individually. While we initially started with mounting from Google Drive, we eventually resorted to running locally due to upload limitations.

To run the notebook files, it was necessary to import multiple libraries, including pytorch, tensorflow, keras, numpy, pandas, glob, os, random, scipy, and biosppy. Some of the notebooks have unversioned installation commands in them, but we took note of the versions we actually used to make future replications easier:
- mne: 1.7.0
- six: 1.16.0
- numpy: 1.22.4
- pandas: 1.2.5
- matplotlib: 3.8.4
- scipy: 1.7.3
- tensorflow (used by keras and tensorflow_addons): 2.9.1
- keras: 2.9.0
- tensorflow_addons: 0.22.0
- torch (PyTorch): 1.10.0
- miniconda: 24.3.0
- cudatoolkit: 11.2
- cudnn: 8.1.0

For ease of installation, we also included a `requirements.txt` file in our GitHub repository so the following command can be run locally:

In [None]:
pip3 install -r ./requirements.txt
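
To confirm that a local environment matches the versions listed above, a check along the following lines could be run after installation. This is a minimal sketch: `PINNED`, `installed_version`, and `find_mismatches` are hypothetical names, only a subset of the packages is shown, and the distribution names are assumed to match what pip reports. Note that some builds report a local suffix (for example a CUDA tag on `torch`), which this exact comparison would flag as a mismatch.

```python
from importlib import metadata

# Versions copied from the list above; distribution names are assumed
# to match the names pip uses.
PINNED = {
    "mne": "1.7.0",
    "numpy": "1.22.4",
    "pandas": "1.2.5",
    "scipy": "1.7.3",
    "tensorflow": "2.9.1",
    "torch": "1.10.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.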

## Data

### Source
The data used in this study was obtained from the National Sleep Research Resource (NSRR). We used the Childhood Adenotonsillectomy Trial (CHAT) data and the Nationwide Children's Hospital (NCH) Sleep DataBank. We made a formal data access request by filling out a form with information about our research project. After approval, we received an access token, which we used to download the necessary dataset files. The appropriate files were uploaded to Google Drive and used by our program.

### Installation
1. To obtain the data, we first completed the UIUC HIPAA training and then presented our use case to NSRR for approval.
2. Once approved, we used [nsrr-gem](https://github.com/nsrr/nsrr-gem/blob/master/README.md#prerequisites) to download the datasets we needed.
  - For the CHAT data:
    - We executed the command:
      `nsrr download chat/polysomnography/edfs/baseline`
    - We also needed the NSRR annotations for each edf file downloaded using the previous command:
      `nsrr download chat/polysomnography/annotations-events-nsrr/baseline`
  - For the NCH data, we executed these commands:
    - `nsrr download nchsdb/sleep_data --fast --file=".*\.edf$"`
    - `nsrr download nchsdb/sleep_data --fast --file=".*\.tsv$"`
    - `nsrr download nchsdb/sleep_data --fast --file=".*\.annot$"`
3. Once the download finished, we set the directory variables in our Python notebooks to the dataset directory.
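
Before preprocessing, it can help to confirm that every downloaded EDF file has its annotation file. The sketch below assumes the annotation files sit next to the EDF files and differ only in suffix, which is what the path construction in the preprocessing notebooks implies. `pair_studies` is a hypothetical helper, not part of the original repository.

```python
import glob
import os


def pair_studies(psg_dir, annot_suffix):
    """Return (pairs, missing) for every .edf file directly under psg_dir.

    annot_suffix replaces the ".edf" extension, e.g. "-nsrr.xml" for CHAT
    or ".tsv" for NCH, matching the naming used by the preprocessors.
    """
    pairs, missing = [], []
    for edf_path in sorted(glob.glob(os.path.join(psg_dir, "*.edf"))):
        annot_path = edf_path[: -len(".edf")] + annot_suffix
        if os.path.isfile(annot_path):
            pairs.append((edf_path, annot_path))
        else:
            missing.append(edf_path)
    return pairs, missing
```

For example, `pairs, missing = pair_studies("./data/chat/", "-nsrr.xml")` would list the CHAT studies that are ready for preprocessing and those still lacking annotations.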

### Statistics
1. CHAT Dataset - The Childhood Adenotonsillectomy Trial (CHAT) is a multi-center, single-blind, randomized, controlled trial. It was designed to test whether children ages 5 to 9.9 years with mild to moderate obstructive sleep apnea who are randomized to early adenotonsillectomy show, after a 7-month observation period, greater levels of neurocognitive functioning, specifically in the attention-executive functioning domain, than children randomized to watchful waiting plus supportive care (WWSC). Children were assessed at baseline and at 7 months with standardized full polysomnography, scored centrally at the Brigham and Women’s Sleep Reading Center. In total, 1,447 children had screening polysomnographs and 464 were randomized to treatment.
  - EDF file + Annotation XML file pairs (under `chat/polysomnography/edfs/baseline`)
    - Total number of pairs downloaded: 453
    - Total size in memory: 477 GB
    - Total number of files that passed criteria checking and produced a .npz file: 330 files
    - Total size in memory of generated .npz files during preprocessing: 120 GB
    - Total size in memory of generated .npz files during dataloading: 21.4 GB
      
2. NCH Dataset - Sleep Data bank introduced by the Nationwide Children's Hospital (NCH) and Carnegie Mellon University (CMU). This dataset has 3,984 pediatric sleep studies on 3,673 unique patients conducted at NCH in Columbus, Ohio, USA between 2017 and 2019, along with the patients' longitudinal clinical data.
  - EDF file + annotation (.annot) file + TSV file
    - Total number of 3-tuples downloaded: 3984 (for each *.annot, *.edf, *.tsv)
    - Total size in memory: 2.08 TB
    - Total number of files that passed criteria checking and produced a .npz file:
    - Total size in memory of generated .npz files during preprocessing: 28
    - Total size in memory of generated .npz files during dataloading: 8.66 GB
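
File counts and sizes like the ones above can be gathered with a short script. The sketch below is an illustration only: `summarize_npz` is a hypothetical helper, it looks only at files directly inside the given directory, and it reports binary gibibytes, whereas the unit convention behind the figures above is not stated.

```python
import os


def summarize_npz(directory):
    """Return (file_count, total_gib) for .npz files directly under directory."""
    sizes = [
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
        if name.endswith(".npz")
    ]
    return len(sizes), sum(sizes) / 1024 ** 3
```

Calling `summarize_npz("./data/chat/preprocessed")` after the preprocessing notebook finishes would give the number of studies that passed the channel checks and the space they occupy.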

### Data process
After the data was downloaded, the information was preprocessed to produce .npz files. Compressing the data in another format was a necessary step to prepare the data for the data loader. The data loader produces additional compressed files that are designed to be used with the main training function.

This is reflected by the order of running the notebooks:

 1. **CHAT_Preprocessor.ipynb** - takes the raw .edf and .xml files from the CHAT dataset and compresses each one into its own .npz file.
 2. **CHAT_Data_Loader.ipynb** - takes the compressed signal data in the .npz files from the previous notebook, labels it by analyzing the ECG signal data, and compresses it further into K files, one per fold.
 3. **NCH_Preprocessor.ipynb** - takes the raw .edf, .annot and .tsv files from the NCH dataset and compresses each study into its own .npz file.
 4. **NCH_Data_Loader.ipynb** - takes the .npz files from the previous notebook and compresses them further into K files, one per fold. As originally written, it uses a reference file, `AHI.csv`, that presumably contains the computed Apnea-Hypopnea Index for each signal data file (.edf), but no AHI variable was available in the NSRR NCHSDB dataset. Since we do not know how this index was computed, and no AHI computation code was checked into the original repository, we instead used the same ECG analysis that we applied to the CHAT dataset.
 5. **Trainer_Evaluator.ipynb** - builds the `{}_model.pt` model file, trains it, and evaluates its performance against a given set of metrics. The summary of results is written to a `results` folder generated in the same directory as the notebook. We had to build separate models because the channel structures of the NCH and CHAT signal data differ.
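
The per-fold files mentioned in steps 2 and 4 come from a round-robin split over studies with K = 5. The sketch below isolates that step on a toy example; `assign_folds` is a hypothetical name, and the 12-study input is made up for illustration.

```python
def assign_folds(n_studies, k=5):
    """Round-robin split of study indices 0..n_studies-1 into k folds."""
    indices = list(range(n_studies))
    return [indices[i::k] for i in range(k)]


folds = assign_folds(12, k=5)
# Fold 0 receives studies 0, 5, 10; fold 1 receives 1, 6, 11; and so on.
for fold_id, members in enumerate(folds):
    print(fold_id, members)
```

Because the split is made over whole studies rather than individual epochs, all epochs from one recording end up in the same fold.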


## Preprocessing Code
NOTE: This is a code snippet. We ran all the code locally, and the notebooks with embedded results are checked into our GitHub repository.

The CHAT preprocessor is available [here](https://github.com/dlh-team-4/t4/blob/main/CHAT_Preprocessor.ipynb).
The NCH preprocessor is available [here](https://github.com/dlh-team-4/t4/blob/main/NCH_Preprocessor.ipynb).

In [None]:
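# CHAT_Preprocessor.ipynb

import glob
import os
import xml.etree.ElementTree as ET

import mne
import numpy as np
import pandas as pd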

# Define directories for input data and output preprocessed data
PSG_DIR = "./data/chat/"
OUT_DIR = './data/chat/preprocessed'

# Set constants for data processing
THRESHOLD = 3
NUM_WORKER = 1
FREQ = 128.0  # Target frequency to which the data should be resampled
EPOCH_LENGTH = 30.0  # Length of each epoch in seconds

# Define list of channels used in the study
channels = [
    'E1', 'E2', 'F3', 'F4', 'C3', 'C4', 'M1', 'M2', 'O1', 'O2',
    'ECG1', 'ECG3', 'CANNULAFLOW', 'AIRFLOW', 'CHEST', 'ABD', 'SAO2', 'CAP'
]

# Dictionaries mapping event names from the XML to numerical codes
APNEA_EVENT_DICT = {
    "Obstructive apnea|Obstructive Apnea": 2,
    "Central apnea|Central Apnea": 2,
}
HYPOPNEA_EVENT_DICT = {
    "Hypopnea|Hypopnea": 1,
}
POS_EVENT_DICT = APNEA_EVENT_DICT.copy()
POS_EVENT_DICT.update(HYPOPNEA_EVENT_DICT)
NEG_EVENT_DICT = {
    'Stage 1 sleep|1': 0,
    'Stage 2 sleep|2': 0,
    'Stage 3 sleep|3': 0,
    'REM sleep|5': 0,
}
WAKE_DICT = {
    "Wake": 10,
    "Wake|0": 10
}

# Placeholder function to pass data through unchanged
def identity(df):
    return df

# Function to parse XML annotations into a pandas DataFrame
def parseScoredEvents(annotation_path):
    with open(annotation_path, "r") as f:
        xml_data = f.read()
    root = ET.fromstring(xml_data)
    scored_events = []
    for scored_event in root.find('ScoredEvents'):
        event_data = {
            'event_type': scored_event.find('EventType').text,
            'description': scored_event.find('EventConcept').text,
            'onset': scored_event.find('Start').text,
            'duration': scored_event.find('Duration').text,
            'clock_time': scored_event.find('ClockTime').text if scored_event.find('ClockTime') is not None else None,
            'signal_location': scored_event.find('SignalLocation').text if scored_event.find('SignalLocation') is not None else None
        }
        scored_events.append(event_data)
    df = pd.DataFrame(scored_events)
    return df.drop(df.index[0])

# Function to load EDF file, attach annotations, and preprocess the data
def load_study_chat(edf_path, annotation_path, annotation_func, preload=False, exclude=[], verbose='CRITICAL'):
    raw = mne.io.read_raw_edf(input_fname=edf_path, exclude=exclude, preload=preload, verbose=verbose)
    df = annotation_func(parseScoredEvents(annotation_path))
    annotations = mne.Annotations(df.onset, df.duration, df.description)
    raw.set_annotations(annotations)
    raw.rename_channels({name: name.upper() for name in raw.info['ch_names']})
    return raw

# Main preprocessing function
def preprocess(path, annotation_modifier, out_dir):
    print(path)
    raw = load_study_chat(path[0], path[1], annotation_modifier, verbose=True)
    if not all([name in raw.ch_names for name in channels]):
        print("study " + os.path.basename(path[0]) + " skipped since insufficient channels")
        return 0

    try:
        apnea_events, event_ids = mne.events_from_annotations(raw, event_id=POS_EVENT_DICT, chunk_duration=1.0)
    except ValueError as e:
        print(str(e))
        print("No Chunk found!")
        return 0

    # Attempt to identify different types of sleep events
    is_apnea_available, is_hypopnea_available = True, True
    try:
        apnea_events, event_ids = mne.events_from_annotations(raw, event_id=APNEA_EVENT_DICT, chunk_duration=1.0)
    except ValueError:
        is_apnea_available = False
    try:
        hypopnea_events, event_ids = mne.events_from_annotations(raw, event_id=HYPOPNEA_EVENT_DICT, chunk_duration=1.0)
    except ValueError:
        is_hypopnea_available = False
    wake_events, event_ids = mne.events_from_annotations(raw, event_id=WAKE_DICT, chunk_duration=1.0)

    # Processing and labeling the data according to detected events
    sfreq = raw.info['sfreq']
    tmax = EPOCH_LENGTH - 1. / sfreq
    raw = raw.pick_channels(channels, ordered=True)
    fixed_events = mne.make_fixed_length_events(raw, id=0, duration=EPOCH_LENGTH, overlap=0.)
    try:
        epochs = mne.Epochs(raw, fixed_events, event_id=[0], tmin=0, tmax=tmax, baseline=None, preload=True, proj=False)
        epochs.load_data()
    except AssertionError:
        return 0
    if sfreq != FREQ:
        epochs = epochs.resample(FREQ, npad='auto', n_jobs=8)
    data = epochs.get_data()
    if is_apnea_available:
        apnea_events_set = set((apnea_events[:, 0] / sfreq).astype(int))
    if is_hypopnea_available:
        hypopnea_events_set = set((hypopnea_events[:, 0] / sfreq).astype(int))
    wake_events_set = set((wake_events[:, 0] / sfreq).astype(int))

    starts = (epochs.events[:, 0] / sfreq).astype(int)
    labels_apnea = []
    labels_hypopnea = []
    labels_not_awake = []
    total_apnea_event_second = 0
    total_hypopnea_event_second = 0
    for seq in range(data.shape[0]):
        epoch_set = set(range(starts[seq], starts[seq] + int(EPOCH_LENGTH)))
        if is_apnea_available:
            apnea_seconds = len(apnea_events_set.intersection(epoch_set))
            total_apnea_event_second += apnea_seconds
            labels_apnea.append(apnea_seconds)
        else:
            labels_apnea.append(0)
        if is_hypopnea_available:
            hypopnea_seconds = len(hypopnea_events_set.intersection(epoch_set))
            total_hypopnea_event_second += hypopnea_seconds
            labels_hypopnea.append(hypopnea_seconds)
        else:
            labels_hypopnea.append(0)
        labels_not_awake.append(len(wake_events_set.intersection(epoch_set)) == 0)

    # Save the processed data to a compressed .npz file
    # os.path.join keeps the output path portable across operating systems
    out_name = os.path.basename(path[0]) + "_" + str(total_apnea_event_second) + "_" + str(total_hypopnea_event_second)
    np.savez_compressed(
        os.path.join(out_dir, out_name),
        data=data, labels_apnea=labels_apnea, labels_hypopnea=labels_hypopnea)
    return data.shape[0]

# Set log file for MNE processing information
mne.set_log_file('log.txt', overwrite=False)

# Iterate through each EDF file in the directory and preprocess using the 'identity' annotation modifier
edf_files = glob.glob(PSG_DIR + "*.edf")
for edf_file in edf_files:
    annot_file = edf_file.replace(".edf", "-nsrr.xml")  # Construct the corresponding XML file path
    preprocess((edf_file, annot_file), identity, OUT_DIR)  # Process each file pair
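
The labeling loop in `preprocess` counts, for each 30-second epoch, how many seconds overlap an annotated event. The sketch below reproduces only that counting step on made-up numbers; `label_epochs` and the sample recording are hypothetical and stand in for the MNE-derived event arrays.

```python
EPOCH_LENGTH = 30  # seconds per epoch, as in the preprocessor


def label_epochs(epoch_starts, event_seconds, epoch_length=EPOCH_LENGTH):
    """Count, for each epoch, how many of its seconds carry an event.

    epoch_starts: start time of each epoch in whole seconds.
    event_seconds: whole seconds at which an event is active.
    """
    events = set(event_seconds)
    labels = []
    for start in epoch_starts:
        window = set(range(start, start + epoch_length))
        labels.append(len(events & window))
    return labels


# Hypothetical recording: three epochs and one apnea active from 25 s to 39 s.
starts = [0, 30, 60]
apnea_seconds = range(25, 40)
print(label_epochs(starts, apnea_seconds))  # [5, 10, 0]
```

An event that straddles an epoch boundary therefore contributes to both epochs, in proportion to the seconds that fall in each.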

In [None]:
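# NCH_Preprocessor.ipynb

import glob
import os
from datetime import datetime
from itertools import compress

import mne
import numpy as np
import pandas as pd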

# Directory where polysomnography (PSG) data files are stored
PSG_DIR = "./data/nch/"
# Directory where preprocessed data will be saved
OUT_DIR = './data/nch/preprocessed'

# Various configuration constants for data processing
THRESHOLD = 3
NUM_WORKER = 8  # Number of workers for parallel processing
FREQ = 128.0  # Sampling frequency to which the data will be resampled
EPOCH_LENGTH = 30.0  # Duration of each epoch in seconds
SN = 3984  # Number of sleep studies in the NCH dataset

# List of channels to be included in the analysis
channels = [
    "EOG LOC-M2", "EOG ROC-M1", "EEG C3-M2", "EEG C4-M1", "ECG EKG2-EKG",
    "RESP PTAF", "RESP AIRFLOW", "RESP THORACIC", "RESP ABDOMINAL", "SPO2", "CAPNO"
]

# Dictionaries for event identification and labeling
APNEA_EVENT_DICT = {
    "Obstructive Apnea": 2, "Central Apnea": 2, "Mixed Apnea": 2, "apnea": 2,
    "obstructive apnea": 2, "central apnea": 2, "Apnea": 2
}
HYPOPNEA_EVENT_DICT = {
    "Obstructive Hypopnea": 1, "Hypopnea": 1, "hypopnea": 1, "Mixed Hypopnea": 1, "Central Hypopnea": 1
}
POS_EVENT_DICT = {
    **APNEA_EVENT_DICT, **HYPOPNEA_EVENT_DICT
}
NEG_EVENT_DICT = {
    'Sleep stage N1': 0, 'Sleep stage N2': 0, 'Sleep stage N3': 0, 'Sleep stage R': 0
}
WAKE_DICT = {
    "Sleep stage W": 10
}

# Functions to handle and modify annotations
def identity(df):
    """Returns the DataFrame as is without any modifications."""
    return df

def apnea2bad(df):
    """Replaces any apnea-related descriptions with 'badevent'."""
    df = df.replace(r'.*pnea.*', 'badevent', regex=True)
    print("bad replaced!")
    return df

def wake2bad(df):
    """Replaces 'Sleep stage W' with 'badevent'."""
    return df.replace("Sleep stage W", 'badevent')

def change_duration(df, label_dict=POS_EVENT_DICT, duration=EPOCH_LENGTH):
    """Adjusts the duration for events defined in label_dict to the specified epoch length."""
    for key in label_dict:
        df.loc[df.description == key, 'duration'] = duration
    print("change duration!")
    return df

def load_study_chat(edf_path, annotation_path, annotation_func, preload=False, exclude=[], verbose='CRITICAL'):
    """Loads EDF files, applies annotations, and processes according to the specified functions."""
    raw = mne.io.read_raw_edf(input_fname=edf_path, exclude=exclude, preload=preload, verbose=verbose)
    df = annotation_func(pd.read_csv(annotation_path, sep='\t'))
    annotations = mne.Annotations(df.onset, df.duration, df.description)
    raw.set_annotations(annotations)
    raw.rename_channels({name: name.upper() for name in raw.info['ch_names']})
    return raw

def preprocess(path, annotation_modifier, out_dir):
    """Main preprocessing function that loads, checks, and processes each study."""
    print(path)
    raw = load_study_chat(path[0], path[1], annotation_modifier, verbose=True)

    # Check if all necessary channels are present
    if not all([name in raw.ch_names for name in channels]):
        print("study " + os.path.basename(path[0]) + " skipped since insufficient channels")
        return 0

    # Try to extract and label apnea and hypopnea events
    try:
        apnea_events, event_ids = mne.events_from_annotations(raw, event_id=POS_EVENT_DICT, chunk_duration=1.0)
    except ValueError as e:
        print(str(e))
        print("No Chunk found!")
        return 0

    print(str(datetime.now().time().strftime("%H:%M:%S")) + ' --- Processing %s' % os.path.basename(path[0]))

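    # Attempt to capture apnea, hypopnea, and wake events.
    # Both flags start as True (as in the CHAT preprocessor) and are cleared when no matching event is found.
    is_apnea_available, is_hypopnea_available = True, True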
    try:
        apnea_events, event_ids = mne.events_from_annotations(raw, event_id=APNEA_EVENT_DICT, chunk_duration=1.0)
    except ValueError:
        is_apnea_available = False

    try:
        hypopnea_events, event_ids = mne.events_from_annotations(raw, event_id=HYPOPNEA_EVENT_DICT, chunk_duration=1.0)
    except ValueError:
        is_hypopnea_available = False

    wake_events, event_ids = mne.events_from_annotations(raw, event_id=WAKE_DICT, chunk_duration=1.0)

    sfreq = raw.info['sfreq']
    tmax = EPOCH_LENGTH - 1. / sfreq
    raw = raw.pick_channels(channels, ordered=True)
    fixed_events = mne.make_fixed_length_events(raw, id=0, duration=EPOCH_LENGTH, overlap=0.)

    # Create epochs from the raw data
    try:
        epochs = mne.Epochs(raw, fixed_events, event_id=[0], tmin=0, tmax=tmax, baseline=None, preload=True, proj=False)
        epochs.load_data()
    except AssertionError:
        return 0

    if sfreq != FREQ:
        epochs = epochs.resample(FREQ, npad='auto', n_jobs=8)

    data = epochs.get_data()
    if is_apnea_available:
        apnea_events_set = set((apnea_events[:, 0] / sfreq).astype(int))
    if is_hypopnea_available:
        hypopnea_events_set = set((hypopnea_events[:, 0] / sfreq).astype(int))
    wake_events_set = set((wake_events[:, 0] / sfreq).astype(int))

    starts = (epochs.events[:, 0] / sfreq).astype(int)
    labels_apnea = []
    labels_hypopnea = []
    labels_not_awake = []
    total_apnea_event_second = 0
    total_hypopnea_event_second = 0

    # Label each epoch based on the presence of specific events
    for seq in range(data.shape[0]):
        epoch_set = set(range(starts[seq], starts[seq] + int(EPOCH_LENGTH)))
        if is_apnea_available:
            apnea_seconds = len(apnea_events_set.intersection(epoch_set))
            total_apnea_event_second += apnea_seconds
            labels_apnea.append(apnea_seconds)
        else:
            labels_apnea.append(0)

        if is_hypopnea_available:
            hypopnea_seconds = len(hypopnea_events_set.intersection(epoch_set))
            total_hypopnea_event_second += hypopnea_seconds
            labels_hypopnea.append(hypopnea_seconds)
        else:
            labels_hypopnea.append(0)

        labels_not_awake.append(len(wake_events_set.intersection(epoch_set)) == 0)

    # Filter out epochs corresponding to wake times and reformat the data
    data = data[labels_not_awake, :, :]
    labels_apnea = list(compress(labels_apnea, labels_not_awake))
    labels_hypopnea = list(compress(labels_hypopnea, labels_not_awake))

    # Save the processed data in a compressed format
    # os.path.join keeps the output path portable across operating systems
    out_name = os.path.basename(path[0]) + "_" + str(total_apnea_event_second) + "_" + str(total_hypopnea_event_second)
    np.savez_compressed(
        os.path.join(out_dir, out_name),
        data=data, labels_apnea=labels_apnea, labels_hypopnea=labels_hypopnea)

    return data.shape[0]

# Set log file for recording processing details
mne.set_log_file('log.txt', overwrite=False)

# Process each .edf file found in the PSG directory
edf_files = glob.glob(PSG_DIR + "*.edf")
for edf_file in edf_files:
    annot_file = edf_file.replace(".edf", ".tsv")  # Construct annotation file path
    preprocess((edf_file, annot_file), identity, OUT_DIR)  # Process and handle data
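
The NCH preprocessor drops every epoch that overlaps a wake stage before saving. The sketch below shows that filtering step in isolation on a tiny synthetic array; the values are made up, and only the masking pattern mirrors the notebook.

```python
from itertools import compress

import numpy as np

# Hypothetical study: 4 epochs, 2 channels, 3 samples per epoch.
data = np.arange(24).reshape(4, 2, 3)
labels_apnea = [0, 12, 0, 7]
labels_not_awake = [True, False, True, True]  # epoch 1 overlaps a wake stage

# A boolean list selects along the first (epoch) axis of the array.
sleep_data = data[labels_not_awake, :, :]
# compress keeps the labels whose selector is truthy, preserving order.
sleep_labels = list(compress(labels_apnea, labels_not_awake))

print(sleep_data.shape)  # (3, 2, 3)
print(sleep_labels)      # [0, 0, 7]
```

Applying the same mask to the signal array and to each label list keeps epochs and labels aligned after filtering.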


## Dataloading Code
NOTE: This is a code snippet. We ran all the code locally, and the notebooks with embedded results are checked into our GitHub repository.

The CHAT data loader is available [here](https://github.com/dlh-team-4/t4/blob/main/CHAT_Data_Loader.ipynb). The NCH data loader is available [here](https://github.com/dlh-team-4/t4/blob/main/NCH_Data_Loader.ipynb).

In [None]:
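# CHAT_Data_Loader.ipynb

import os
import random

import numpy as np
import biosppy.signals.tools as st
from biosppy.signals.ecg import correct_rpeaks, hamilton_segmenter
from scipy.interpolate import splev, splrep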

# Configuration settings for processing the signals
CHAT_SIGS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]  # Indices of signals to process
chat_s_count = len(CHAT_SIGS)  # Number of signals

CHAT_THRESHOLD = 3  # Not used in the provided snippet but could be a threshold value for processing
CHAT_PREPROCESSED_PATH = 'D:\\data_chat_baseline_30x128_test'  # Directory where preprocessed data is stored
CHAT_FREQ = 128  # Sampling frequency of the data
CHAT_EPOCH_LENGTH = 30  # Duration of each epoch in seconds
CHAT_ECG_SIG = 8  # Index of the ECG signal in the array
CHAT_OUT_PATH = "D:\\"  # Output directory for the results

# ...

# Define a function to extract R-R intervals (RRI), the time between successive R peaks, from an ECG signal
def extract_rri(signal, ir, CHUNK_DURATION):
    tm = np.arange(0, CHUNK_DURATION, step=1 / float(ir))  # Generate a time array for interpolation

    # Filter the ECG signal to isolate the QRS complex
    filtered, _, _ = st.filter_signal(signal=signal, ftype="FIR", band="bandpass", order=int(0.3 * CHAT_FREQ),
                                      frequency=[3, 45], sampling_rate=CHAT_FREQ)
    (rpeaks,) = hamilton_segmenter(signal=filtered, sampling_rate=CHAT_FREQ)  # Detect R peaks
    (rpeaks,) = correct_rpeaks(signal=filtered, rpeaks=rpeaks, sampling_rate=CHAT_FREQ, tol=0.05)  # Correct R peak detection

    # If an acceptable number of R-peaks are detected, proceed with further processing
    if 4 < len(rpeaks) < 200:
        rri_tm, rri_signal = rpeaks[1:] / float(CHAT_FREQ), np.diff(rpeaks) / float(CHAT_FREQ)
        ampl_tm, ampl_signal = rpeaks / float(CHAT_FREQ), signal[rpeaks]
        rri_interp_signal = splev(tm, splrep(rri_tm, rri_signal, k=3), ext=1)
        amp_interp_signal = splev(tm, splrep(ampl_tm, ampl_signal, k=3), ext=1)

        # Return interpolated signals, clipped to reasonable values
        return np.clip(rri_interp_signal, 0, 2) * 100, np.clip(amp_interp_signal, -0.001, 0.002) * 10000, True
    else:
        # Return zero arrays if the number of R-peaks is too few or too many
        return np.zeros((CHAT_FREQ * CHAT_EPOCH_LENGTH)), np.zeros((CHAT_FREQ * CHAT_EPOCH_LENGTH)), False

# Function to load and process data
def load_data(path):
    root_dir = os.path.expanduser(path)
    file_list = os.listdir(root_dir)  # List all files in the preprocessed directory
    length = len(file_list)

    # Create folds for cross-validation, assuming data split for 5-fold cross-validation
    study_event_counts = [i for i in range(0, length)]
    folds = []
    for i in range(5):
        folds.append(study_event_counts[i::5])

    x = []  # List to hold data arrays
    y_apnea = []  # List to hold apnea labels
    y_hypopnea = []  # List to hold hypopnea labels
    counter = 0
    for idx, fold in enumerate(folds):
        first = True
        for patient in fold:
            rri_succ_counter = 0
            rri_fail_counter = 0
            counter += 1
            print(counter)
            study_data = np.load(CHAT_PREPROCESSED_PATH + "\\" + file_list[patient - 1], allow_pickle = True)  # Load study data

            signals = study_data['data']
            labels_apnea = study_data['labels_apnea']
            labels_hypopnea = study_data['labels_hypopnea']

            y_c = labels_apnea + labels_hypopnea  # Combine labels for apnea and hypopnea
            neg_samples = np.where(y_c == 0)[0]  # Identify negative samples
            pos_samples = list(np.where(y_c > 0)[0])  # Identify positive samples
            ratio = len(pos_samples) / len(neg_samples)  # Calculate the ratio of positive to negative samples
            neg_survived = [s for s in neg_samples if random.random() < ratio]  # Randomly down-sample negative cases
            samples = neg_survived + pos_samples
            signals = signals[samples, :, :]
            labels_apnea = labels_apnea[samples]
            labels_hypopnea = labels_hypopnea[samples]

            # Initialize data array for this batch
            data = np.zeros((signals.shape[0], CHAT_EPOCH_LENGTH * CHAT_FREQ, chat_s_count + 2))
            for i in range(signals.shape[0]):  # Process each epoch
                data[i, :, -1], data[i, :, -2], status = extract_rri(signals[i, CHAT_ECG_SIG, :], CHAT_FREQ,
                                                                     float(CHAT_EPOCH_LENGTH))

                if status:
                    rri_succ_counter += 1
                else:
                    rri_fail_counter += 1

                for j in range(chat_s_count):  # Copy signal data to the respective positions
                    data[i, :, j] = signals[i, CHAT_SIGS[j], :]

            # Aggregate data and labels
            if first:
                aggregated_data = data
                aggregated_label_apnea = labels_apnea
                aggregated_label_hypopnea = labels_hypopnea
                first = False
            else:
                aggregated_data = np.concatenate((aggregated_data, data), axis=0)
                aggregated_label_apnea = np.concatenate((aggregated_label_apnea, labels_apnea), axis=0)
                aggregated_label_hypopnea = np.concatenate((aggregated_label_hypopnea, labels_hypopnea), axis=0)
            print(rri_succ_counter, rri_fail_counter)

        x.append(aggregated_data)
        y_apnea.append(aggregated_label_apnea)
        y_hypopnea.append(aggregated_label_hypopnea)

    return x, y_apnea, y_hypopnea

x, y_apnea, y_hypopnea = load_data(CHAT_PREPROCESSED_PATH)  # Load and process data

# Save the processed data to files and print the shape of the data and labels for each fold
for i in range(5):
    print(x[i].shape, y_apnea[i].shape, y_hypopnea[i].shape)
    np.savez_compressed(CHAT_OUT_PATH + "_" + str(i), x=x[i], y_apnea=y_apnea[i], y_hypopnea=y_hypopnea[i])
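
The interpolation step inside `extract_rri` can be examined on its own once R-peak positions are known. The sketch below skips filtering and peak detection entirely and feeds in synthetic, evenly spaced peaks, so it illustrates only the spline resampling. `interpolate_rri` is a hypothetical name, and the peak positions are invented.

```python
import numpy as np
from scipy.interpolate import splev, splrep

FREQ = 128          # samples per second, as in the loader
EPOCH_LENGTH = 30   # seconds per epoch


def interpolate_rri(rpeaks, freq=FREQ, epoch_length=EPOCH_LENGTH):
    """Resample the R-R interval series onto the epoch's sample grid.

    rpeaks: increasing sample indices of detected R peaks (at least 5,
    since a cubic spline needs more than three intervals).
    """
    rpeaks = np.asarray(rpeaks)
    grid = np.arange(0, epoch_length, 1.0 / freq)   # one point per sample
    rri_times = rpeaks[1:] / float(freq)            # time of each interval's closing peak
    rri_values = np.diff(rpeaks) / float(freq)      # interval length in seconds
    # ext=1 makes splev return 0 outside the span covered by the peaks.
    return splev(grid, splrep(rri_times, rri_values, k=3), ext=1)


# Synthetic peaks: one beat every 96 samples, i.e. 0.75 s (80 beats per minute).
peaks = np.arange(50, FREQ * EPOCH_LENGTH, 96)
rri = interpolate_rri(peaks)
print(rri.shape)
print(round(float(rri.max()), 3))
```

With evenly spaced peaks the interpolated series is flat at the beat interval wherever peaks are available and zero before the second peak and after the last one, which is the effect of `ext=1`.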


In [None]:
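# NCH_Data_Loader.ipynb

import os
import random

import numpy as np
import biosppy.signals.tools as st
from biosppy.signals.ecg import correct_rpeaks, hamilton_segmenter
from scipy.interpolate import splev, splrep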

# Configuration settings
NCH_SIGS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # Indices of signals to process
nch_s_count = len(NCH_SIGS)  # Number of signals to process

NCH_THRESHOLD = 3  # Threshold value, possibly for signal processing or data selection
NCH_PREPROCESSED_PATH = "./data/nch/preprocessed"  # Directory containing preprocessed data files
NCH_FREQ = 128  # Sampling frequency of the data
NCH_EPOCH_LENGTH = 30  # Duration of each epoch in seconds
NCH_ECG_SIG = 4  # Index of the ECG signal in the dataset
NCH_OUT_PATH = "./data/nch/dataloading/"  # Output directory for the processed data

# ...

# Define a function to extract R-R intervals (RRI), the time between successive R peaks, from ECG signals
def extract_rri(signal, ir, CHUNK_DURATION):
    # Time vector for interpolation
    tm = np.arange(0, CHUNK_DURATION, step=1 / float(ir))

    # Bandpass filter the ECG signal to isolate heartbeats
    filtered, _, _ = st.filter_signal(signal=signal, ftype="FIR", band="bandpass",
                                      order=int(0.3 * NCH_FREQ), frequency=[3, 45],
                                      sampling_rate=NCH_FREQ)

    # Detect and correct R peaks using the Hamilton segmenter algorithm
    (rpeaks,) = hamilton_segmenter(signal=filtered, sampling_rate=NCH_FREQ)
    (rpeaks,) = correct_rpeaks(signal=filtered, rpeaks=rpeaks, sampling_rate=NCH_FREQ, tol=0.05)

    # Only process segments with an appropriate number of detected R-peaks
    if 4 < len(rpeaks) < 200:
        # Calculate RRI and amplitude signals
        rri_tm, rri_signal = rpeaks[1:] / float(NCH_FREQ), np.diff(rpeaks) / float(NCH_FREQ)
        ampl_tm, ampl_signal = rpeaks / float(NCH_FREQ), signal[rpeaks]

        # Interpolate RRI and amplitude signals
        rri_interp_signal = splev(tm, splrep(rri_tm, rri_signal, k=3), ext=1)
        amp_interp_signal = splev(tm, splrep(ampl_tm, ampl_signal, k=3), ext=1)

        # Return interpolated and scaled signals
        return np.clip(rri_interp_signal, 0, 2) * 100, np.clip(amp_interp_signal, -0.001, 0.002) * 10000, True
    else:
        # Return zero arrays if the number of R-peaks is too few or too many
        return np.zeros((NCH_FREQ * NCH_EPOCH_LENGTH)), np.zeros((NCH_FREQ * NCH_EPOCH_LENGTH)), False

# Function to load data, balance the dataset, and organize it for analysis
def load_data(path):
    root_dir = os.path.expanduser(path)  # Expand a leading "~" to the user's home directory
    file_list = os.listdir(root_dir)  # List files in the directory
    length = len(file_list)

    # Divide files into folds for cross-validation
    study_event_counts = [i for i in range(0, length)]
    folds = [study_event_counts[i::5] for i in range(5)]

    x = []
    y_apnea = []
    y_hypopnea = []
    counter = 0
    for idx, fold in enumerate(folds):
        first = True
        for patient in fold:
            rri_succ_counter = 0
            rri_fail_counter = 0
            counter += 1
            print(counter)
            study_data = np.load(os.path.join(root_dir, file_list[patient - 1]))  # Portable join instead of a hard-coded "\\"

            signals = study_data['data']
            labels_apnea = study_data['labels_apnea']
            labels_hypopnea = study_data['labels_hypopnea']

            # Balance the dataset by undersampling negative samples
            y_c = labels_apnea + labels_hypopnea
            neg_samples = np.where(y_c == 0)[0]
            pos_samples = list(np.where(y_c > 0)[0])
            ratio = len(pos_samples) / len(neg_samples)
            neg_survived = [s for s in neg_samples if random.random() < ratio]
            samples = neg_survived + pos_samples
            signals = signals[samples, :, :]
            labels_apnea = labels_apnea[samples]
            labels_hypopnea = labels_hypopnea[samples]

            # Prepare data for each epoch including RRI extraction
            data = np.zeros((signals.shape[0], NCH_EPOCH_LENGTH * NCH_FREQ, nch_s_count + 2))
            for i in range(signals.shape[0]):
                data[i, :, -1], data[i, :, -2], status = extract_rri(signals[i, NCH_ECG_SIG, :], NCH_FREQ, float(NCH_EPOCH_LENGTH))

                if status:
                    rri_succ_counter += 1
                else:
                    rri_fail_counter += 1

                for j in range(nch_s_count):
                    data[i, :, j] = signals[i, NCH_SIGS[j], :]

            # Aggregate data for the current fold
            if first:
                aggregated_data = data
                aggregated_label_apnea = labels_apnea
                aggregated_label_hypopnea = labels_hypopnea
                first = False
            else:
                aggregated_data = np.concatenate((aggregated_data, data), axis=0)
                aggregated_label_apnea = np.concatenate((aggregated_label_apnea, labels_apnea), axis=0)
                aggregated_label_hypopnea = np.concatenate((aggregated_label_hypopnea, labels_hypopnea), axis=0)
            print(rri_succ_counter, rri_fail_counter)

        x.append(aggregated_data)
        y_apnea.append(aggregated_label_apnea)
        y_hypopnea.append(aggregated_label_hypopnea)

    return x, y_apnea, y_hypopnea

# Load and process data
x, y_apnea, y_hypopnea = load_data(NCH_PREPROCESSED_PATH)

# Output the shape of the processed datasets and save them to files
for i in range(5):
    print(x[i].shape, y_apnea[i].shape, y_hypopnea[i].shape)
    np.savez_compressed(NCH_OUT_PATH + "_" + str(i), x=x[i], y_apnea=y_apnea[i], y_hypopnea=y_hypopnea[i])


## Model : Multi-modal Transformer (Fayyaz et al., 2023)

Running the provided model.py code produces a model statistics report. According to the report, there are six Conv1D layers and three SeparableConv1D layers. Six Dropout layers were counted, as well as nine Dense layers. There is one MultiHeadAttention mechanism present. The total number of parameters is 389,284, all of which are trainable. The untrained model occupies 1.49 MB of storage.

### Citation to the Original Paper

- Fayyaz H, Strang A, Beheshti R. Bringing At-home Pediatric Sleep Apnea Testing Closer to Reality: A Multi-modal Transformer Approach. Proc Mach Learn Res. 2023 Aug;219:167-185. PMID: 38344396; PMCID: PMC10854997.

### Link to the Original Paper's Code Repository
- Original Paper: https://github.com/healthylaife/Pediatric-Apnea-Detection

### Architecture
The model architecture is designed for processing sequential data to identify complex patterns in high-dimensional data spaces. It begins with an input layer whose data is normalized per instance and then split into a series of patches, mirroring the mechanism of a Vision Transformer. These patches are encoded and processed through multiple Transformer encoder layers, which include multi-head attention for capturing contextual relationships and dense layers with non-linear activation for further feature transformation. The architecture uses skip connections and layer normalization to stabilize training. A global average pooling layer then condenses the sequence into a single feature representation for the final prediction output, which suits the model to multi-modal datasets.

| Layer (type)                    | Output Shape          | Param # | Connected to                        |
|---------------------------------|-----------------------|---------|-------------------------------------|
| input_5 (InputLayer)            | (None, 3840, 4)       | 0       | []                                  |
| instance_normalization_2 (InstanceNormalization) | (None, 3840, 4)  | 0   | ['input_5[0][0]']                |
| patches_1 (Patches)             | (None, None, 768)     | 0       | ['instance_normalization_2[0][0]'] |
| patch_encoder_1 (PatchEncoder)  | (None, None, 32)      | 24,608  | ['patches_1[0][0]']                 |
| multi_head_attention_2 (MultiHeadAttention) | (None, None, 32) | 16,800 | ['patch_encoder_1[0][0]', 'patch_encoder_1[0][0]'] |
| add_4 (Add)                     | (None, None, 32)      | 0       | ['multi_head_attention_2[0][0]', 'patch_encoder_1[0][0]'] |
| layer_normalization_3 (LayerNormalization) | (None, None, 32) | 64    | ['add_4[0][0]']                    |
| dense_20 (Dense)                | (None, None, 64)      | 2,112   | ['layer_normalization_3[0][0]']    |
| tf.nn.gelu_6 (TFOpLambda)       | (None, None, 64)      | 0       | ['dense_20[0][0]']                 |
| dropout_13 (Dropout)            | (None, None, 64)      | 0       | ['tf.nn.gelu_6[0][0]']             |
| dense_21 (Dense)                | (None, None, 32)      | 2,080   | ['dropout_13[0][0]']               |
| tf.nn.gelu_7 (TFOpLambda)       | (None, None, 32)      | 0       | ['dense_21[0][0]']                 |
| dropout_14 (Dropout)            | (None, None, 32)      | 0       | ['tf.nn.gelu_7[0][0]']             |
| add_5 (Add)                     | (None, None, 32)      | 0       | ['dropout_14[0][0]', 'add_4[0][0]'] |
| multi_head_attention_3 (MultiHeadAttention) | (None, None, 32) | 16,800 | ['add_5[0][0]', 'add_5[0][0]']    |
| add_6 (Add)                     | (None, None, 32)      | 0       | ['multi_head_attention_3[0][0]', 'add_5[0][0]'] |
| layer_normalization_4 (LayerNormalization) | (None, None, 32) | 64    | ['add_6[0][0]']                    |
| dense_22 (Dense)                | (None, None, 64)      | 2,112   | ['layer_normalization_4[0][0]']    |
| tf.nn.gelu_8 (TFOpLambda)       | (None, None, 64)      | 0       | ['dense_22[0][0]']                 |
| dropout_15 (Dropout)            | (None, None, 64)      | 0       | ['tf.nn.gelu_8[0][0]']             |
| dense_23 (Dense)                | (None, None, 32)      | 2,080   | ['dropout_15[0][0]']               |
| tf.nn.gelu_9 (TFOpLambda)       | (None, None, 32)      | 0       | ['dense_23[0][0]']                 |
| dropout_16 (Dropout)            | (None, None, 32)      | 0       | ['tf.nn.gelu_9[0][0]']             |
| add_7 (Add)                     | (None, None, 32)      | 0       | ['dropout_16[0][0]', 'add_6[0][0]'] |
| multi_head_attention_4 (MultiHeadAttention) | (None, None, 32) | 16,800 | ['add_7[0][0]', 'add_7[0][0]']    |
| add_8 (Add)                     | (None, None, 32)      | 0       | ['multi_head_attention_4[0][0]', 'add_7[0][0]'] |
| layer_normalization_5 (LayerNormalization) | (None, None, 32) | 64    | ['add_8[0][0]']                    |
| dense_24 (Dense)                | (None, None, 64)      | 2,112   | ['layer_normalization_5[0][0]']    |
| tf.nn.gelu_10 (TFOpLambda)      | (None, None, 64)      | 0       | ['dense_24[0][0]']                 |
| dropout_17 (Dropout)            | (None, None, 64)      | 0       | ['tf.nn.gelu_10[0][0]']            |
| dense_25 (Dense)                | (None, None, 32)      | 2,080   | ['dropout_17[0][0]']               |
| tf.nn.gelu_11 (TFOpLambda)      | (None, None, 32)      | 0       | ['dense_25[0][0]']                 |
| dropout_18 (Dropout)            | (None, None, 32)      | 0       | ['tf.nn.gelu_11[0][0]']            |
| add_9 (Add)                     | (None, None, 32)      | 0       | ['dropout_18[0][0]', 'add_8[0][0]'] |
| multi_head_attention_5 (MultiHeadAttention) | (None, None, 32) | 16,800 | ['add_9[0][0]', 'add_9[0][0]']    |
| add_10 (Add)                    | (None, None, 32)      | 0       | ['multi_head_attention_5[0][0]', 'add_9[0][0]'] |
| layer_normalization_6 (LayerNormalization) | (None, None, 32) | 64    | ['add_10[0][0]']                   |
| dense_26 (Dense)                | (None, None, 64)      | 2,112   | ['layer_normalization_6[0][0]']    |
| tf.nn.gelu_12 (TFOpLambda)      | (None, None, 64)      | 0       | ['dense_26[0][0]']                 |
| dropout_19 (Dropout)            | (None, None, 64)      | 0       | ['tf.nn.gelu_12[0][0]']            |
| dense_27 (Dense)                | (None, None, 32)      | 2,080   | ['dropout_19[0][0]']               |
| tf.nn.gelu_13 (TFOpLambda)      | (None, None, 32)      | 0       | ['dense_27[0][0]']                 |
| dropout_20 (Dropout)            | (None, None, 32)      | 0       | ['tf.nn.gelu_13[0][0]']            |
| add_11 (Add)                    | (None, None, 32)      | 0       | ['dropout_20[0][0]', 'add_10[0][0]'] |
| layer_normalization_7 (LayerNormalization) | (None, None, 32) | 64    | ['add_11[0][0]']                   |
| global_average_pooling1d_1 (GlobalAveragePooling1D) | (None, 32) | 0  | ['layer_normalization_7[0][0]']  |
| dense_28 (Dense)                | (None, 256)           | 8,448   | ['global_average_pooling1d_1[0][0]'] |
| tf.nn.gelu_14 (TFOpLambda)      | (None, 256)           | 0       | ['dense_28[0][0]']                 |
| dropout_21 (Dropout)            | (None, 256)           | 0       | ['tf.nn.gelu_14[0][0]']            |
| dense_29 (Dense)                | (None, 128)           | 32,896  | ['dropout_21[0][0]']               |
| tf.nn.gelu_15 (TFOpLambda)      | (None, 128)           | 0       | ['dense_29[0][0]']                 |
| dropout_22 (Dropout)            | (None, 128)           | 0       | ['tf.nn.gelu_15[0][0]']            |
| dense_30 (Dense)                | (None, 1)             | 129     | ['dropout_22[0][0]']               |
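
The `Param #` column above can be checked by hand. The following is a minimal sketch, assuming the usual Keras counting rules (a Dense layer has `inputs * outputs + outputs` parameters, LayerNormalization has one scale and one offset per feature, and MultiHeadAttention has query, key, value, and output projections). The helper names `dense_params` and `mha_params` are hypothetical and only serve this illustration; `num_heads=4` and `key_dim=32` are taken from the configuration and model code in this report.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def mha_params(d_model: int, num_heads: int, key_dim: int) -> int:
    """Query, key, value, and output projections of multi-head attention."""
    inner = num_heads * key_dim                  # width of the concatenated heads
    qkv = 3 * dense_params(d_model, inner)       # three input projections
    out = dense_params(inner, d_model)           # projection back to d_model
    return qkv + out


d_model = 32       # embedding size per patch (table: last dimension is 32)
patch_dims = 768   # flattened patch size (table: patches_1 output)

patch_encoder = dense_params(patch_dims, d_model)
attention = mha_params(d_model, num_heads=4, key_dim=32)
layer_norm = 2 * d_model
feed_forward = dense_params(32, 64) + dense_params(64, 32)
encoder_block = attention + layer_norm + feed_forward
head = dense_params(32, 256) + dense_params(256, 128) + dense_params(128, 1)

# Four encoder blocks, one final LayerNormalization, then the prediction head
total = patch_encoder + 4 * encoder_block + layer_norm + head

print(patch_encoder, attention, layer_norm, feed_forward, head, total)
```

The per-layer values reproduce the table rows (24,608 for the patch encoder, 16,800 per attention layer, 64 per layer normalization), and the total covers only the layers listed in this table.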

#### Core Components and Data Flow
1. **Input Layer**: The model starts with an InputLayer that accepts a sequence of length 3840 with 4 features. This layer only receives the input data; it performs no computation.
2. **Instance Normalization**: Immediately after the input, an InstanceNormalization layer normalizes the data. Unlike batch normalization, which works across the batch dimension, instance normalization normalizes each channel within each data instance. This helps stabilize the learning process and is particularly useful in tasks involving high internal covariate shift.
3. **Patches Creation**: The Patches layer takes the normalized data and breaks it into patches. This is typical of models that process data similarly to Vision Transformers, where the input is split into patches that are processed individually. Each patch then serves as an input token for the Transformer encoder.
4. **Patch Encoder**: The PatchEncoder layer projects each flattened patch (768 values in the summary above) into a 32-dimensional embedding that the Transformer can process. This is akin to word embeddings in natural language processing.
5. **Transformer Encoder Layers**: Each encoder block combines the following components.
   - **Multi-Head Attention**: Several MultiHeadAttention layers allow the model to focus on different parts of the input sequence, gathering contextual information from different positions simultaneously. This is key to capturing the dependencies and relationships between different parts of the input.
   - **Skip Connections (Add Layers)**: Each attention layer and each feed-forward network is followed by an Add layer. These skip connections help gradients flow during backpropagation, mitigating the vanishing gradient problem common in deep networks.
   - **Layer Normalization**: The Add layer after each attention layer is followed by LayerNormalization, which helps stabilize training by normalizing the features passed to the feed-forward network.
   - **Feed-Forward Network**: After attention, the data passes through a series of Dense layers with GELU activation and Dropout. These layers act as a feed-forward neural network that further transforms the attention-augmented features and learns non-linear relationships in the data.
6. **Output Preparation**: Before the final output, a GlobalAveragePooling1D layer condenses the sequence of processed patches into a single fixed-size vector that summarizes the entire sequence. This reduces model complexity and prepares the data for the final output.
7. **Output Layers**: The condensed data vector is then processed through more Dense layers and non-linear activations to finally output a prediction.
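
To make the patching step concrete, here is a minimal pure-Python sketch of how one 3840 x 4 epoch becomes 20 tokens of 768 values each, matching the shapes in the model summary. It is an illustration under stated assumptions, not the project's implementation: the synthetic `signal` and the helper `to_patches` are hypothetical, and the real model performs this step on tensors inside the `Patches` layer.

```python
SEQ_LEN, CHANNELS, NUM_PATCHES = 3840, 4, 20

# Synthetic epoch: each value encodes its (time step, channel) position,
# which makes it easy to see where every patch starts.
signal = [[t * CHANNELS + c for c in range(CHANNELS)] for t in range(SEQ_LEN)]

patch_len = SEQ_LEN // NUM_PATCHES  # time steps per patch


def to_patches(epoch, patch_len):
    """Cut an epoch into non-overlapping windows and flatten each window."""
    patches = []
    for start in range(0, len(epoch) - patch_len + 1, patch_len):
        window = epoch[start:start + patch_len]
        # Flatten (patch_len, channels) into one vector of patch_len * channels values
        patches.append([value for step in window for value in step])
    return patches


patches = to_patches(signal, patch_len)
print(len(patches), len(patches[0]))  # number of tokens, values per token
```

Each flattened patch is what the PatchEncoder then projects to a 32-dimensional embedding. Choosing 30 patches instead of 20 would shorten each patch to 128 time steps (512 values).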


### Training objectives
Training updates the model's parameters, namely its connection weights and biases, until the loss function is minimized. The goal is to accurately predict sleep apnea events from sleep signals, including SpO2 and ECG.

Ultimately, training aims to improve the prediction accuracy of sleep apnea/hypopnea occurrences using ECG and SpO2 signals.
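
The training code in this report compiles the model with a binary cross-entropy loss. As a minimal sketch of what that loss measures, the function below computes it by hand for a few predictions. The function name, the sample values, and the clipping constant `eps` are illustrative assumptions, not taken from the project code.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]             # 1 = epoch contains an apnea/hypopnea event
confident = [0.9, 0.1, 0.8, 0.2]  # predictions close to the labels
uncertain = [0.6, 0.4, 0.5, 0.5]  # predictions close to 0.5

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, uncertain))
```

Predictions that sit closer to the true labels produce a lower loss, which is the quantity the optimizer drives down during training.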


### Model Creation Code
NOTE: This is a code snippet. We ran all the code locally, and the notebooks with embedded results are checked into our GitHub repository.

The notebook with the results from the Hybrid Transformer model (Hu et al., 2022) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator.ipynb).

The notebook with the results from the Multi-Modal Transformer model (Fayyaz et al., 2023) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator_Transformer.ipynb).

In [None]:
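# Imports used by this snippet (tensorflow_addons provides InstanceNormalization)
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import Input
from tensorflow.keras.layers import (Add, Dense, Dropout, GlobalAveragePooling1D, Layer,
                                     LayerNormalization, MultiHeadAttention)
from tensorflow.keras.regularizers import L2

# Define the `Patches` layer to split each input signal into non-overlapping patches along the time axis
class Patches(Layer):
    def __init__(self, patch_size):
        super(Patches, self).__init__()
        self.patch_size = patch_size  # Number of time steps in each patch

    def call(self, input):
        # Add a dummy height axis so the signal is treated as a 1-pixel-high image: (batch, 1, time, channels)
        input = input[:, tf.newaxis, :, :]
        batch_size = tf.shape(input)[0]  # Get the batch size
        # Extract non-overlapping windows of `patch_size` time steps
        patches = tf.image.extract_patches(
            images=input,
            sizes=[1, 1, self.patch_size, 1],
            strides=[1, 1, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        patch_dims = patches.shape[-1]  # patch_size * number of channels
        # Flatten the extracted patches to (batch, num_patches, patch_dims)
        patches = tf.reshape(patches, [batch_size, -1, patch_dims])
        return patches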

# Define the `PatchEncoder` layer to project flattened signal patches into an embedding space
class PatchEncoder(Layer):
    def __init__(self, num_patches, projection_dim, l2_weight):
        super(PatchEncoder, self).__init__()
        self.projection_dim = projection_dim
        self.l2_weight = l2_weight
        self.num_patches = num_patches
        self.projection = Dense(units=projection_dim, kernel_regularizer=L2(l2_weight),
                                bias_regularizer=L2(l2_weight))
        self.position_embedding = tf.keras.layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim)

    def call(self, patch):
        # Only the linear projection is applied here; `position_embedding` is defined above
        # but is not added to the output in this snippet
        encoded = self.projection(patch)
        return encoded

# Define a function to build a multi-layer perceptron (MLP)
def mlp(x, hidden_units, dropout_rate, l2_weight):
    for _, units in enumerate(hidden_units):
        x = Dense(units, activation=None, kernel_regularizer=L2(l2_weight), bias_regularizer=L2(l2_weight))(x)
        x = tf.nn.gelu(x)
        x = Dropout(dropout_rate)(x)
    return x

# Define a function to create a transformer-based model
def create_transformer_model(input_shape, num_patches, projection_dim, transformer_layers, num_heads, transformer_units, mlp_head_units, num_classes, drop_out, reg, l2_weight, demographic=False):
    if reg:
        activation = None
    else:
        activation = 'sigmoid'
    inputs = Input(shape=input_shape)
    patch_size = input_shape[0] // num_patches  # Integer division: the patch size must be an int
    # Normalize input features and possibly handle demographic information
    if demographic:
        normalized_inputs = tfa.layers.InstanceNormalization(axis=-1, epsilon=1e-6, center=False, scale=False,
                                                             beta_initializer="glorot_uniform",
                                                             gamma_initializer="glorot_uniform")(inputs[:, :, :-1])
        demo = inputs[:, :12, -1]
    else:
        normalized_inputs = tfa.layers.InstanceNormalization(axis=-1, epsilon=1e-6, center=False, scale=False,
                                                             beta_initializer="glorot_uniform",
                                                             gamma_initializer="glorot_uniform")(inputs)

    # Extract patches and encode them
    patches = Patches(patch_size=patch_size)(normalized_inputs)
    encoded_patches = PatchEncoder(num_patches=num_patches, projection_dim=projection_dim, l2_weight=l2_weight)(patches)
    # Apply transformer layers
    for i in range(transformer_layers):
        x1 = encoded_patches
        attention_output = MultiHeadAttention(
            num_heads=num_heads, key_dim=projection_dim, dropout=drop_out, kernel_regularizer=L2(l2_weight),
            bias_regularizer=L2(l2_weight))(x1, x1)
        x2 = Add()([attention_output, encoded_patches])
        x3 = LayerNormalization(epsilon=1e-6)(x2)
        x3 = mlp(x3, transformer_units, drop_out, l2_weight)
        encoded_patches = Add()([x3, x2])

    x = LayerNormalization(epsilon=1e-6)(encoded_patches)
    x = GlobalAveragePooling1D()(x)
    features = mlp(x, mlp_head_units, 0.0, l2_weight)

    logits = Dense(num_classes, kernel_regularizer=L2(l2_weight), bias_regularizer=L2(l2_weight),
                   activation=activation)(features)

    return tf.keras.Model(inputs=inputs, outputs=logits)

if __name__ == "__main__":
    config = {
        "model_name": "hybrid",
        "regression": False,

        "transformer_layers": 4,
        "drop_out_rate": 0.25,
        "num_patches": 20,
        "transformer_units": 32,
        "regularization_weight": 0.001,
        "num_heads": 4,
        "epochs": 100,
        "channels": [14, 18, 19, 20],
    }
    model = get_model(config)
    model.build(input_shape=(1, 30 * DATA_FREQ, 10))
    model.summary()  # summary() prints the report itself and returns None

    # Save the model for later use (a Keras model is saved with model.save, not torch.save)
    model.save(f"./{config['model_name']}_model.pt")


# Training

## Hyperparameters

Most hyperparameters are set in the `config` dictionary that is passed to the training function, train(). Training on both datasets uses a dropout rate of 0.25 and a regularization weight of 0.001. The table below lists the values from the `config` in the model creation code (4 transformer layers, 20 patches, 100 epochs). The CHAT and NCH training configurations shown later in this section override these with 5 transformer layers, 30 patches, and an upper limit of 200 epochs, and they select their own channel indices.

<table>
    <tr>
        <th>Hyperparameter</th>
        <th>Value</th>
    </tr>
    <tr>
        <td>transformer_layers</td>
        <td>4</td>
    </tr>
    <tr>
        <td>drop_out_rate</td>
        <td>0.25</td>
    </tr>
    <tr>
        <td>num_patches</td>
        <td>20</td>
    </tr>
    <tr>
        <td>transformer_units</td>
        <td>32</td>
    </tr>
    <tr>
        <td>regularization_weight</td>
        <td>0.001</td>
    </tr>
    <tr>
        <td>num_heads</td>
        <td>4</td>
    </tr>
    <tr>
        <td>epochs</td>
        <td>100</td>
    </tr>
    <tr>
        <td>channels</td>
        <td>[14,18,19,20]</td>
    </tr>
</table>
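
Beyond the values in the table, the training code in this section applies a learning-rate schedule, `lr_schedule`, that halves the rate every 5 epochs once training passes epoch 50. The sketch below replays that rule outside of Keras to show which epochs trigger a halving. It assumes a starting rate of 0.001 (the Keras default for the Adam optimizer), 0-based epoch indices as passed by the Keras `LearningRateScheduler` callback, and a 78-epoch run like the CHAT trial; the `replay` helper is hypothetical.

```python
def lr_schedule(epoch, lr):
    """Same rule as the training code: halve the rate every 5 epochs once epoch > 50."""
    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr


def replay(num_epochs, initial_lr=1e-3):
    """Apply the schedule epoch by epoch and record the rate used in each epoch."""
    lr = initial_lr
    history = []
    for epoch in range(num_epochs):  # 0-based epoch index
        lr = lr_schedule(epoch, lr)
        history.append(lr)
    return history


history = replay(78)
halving_epochs = [e for e in range(1, len(history)) if history[e] < history[e - 1]]

print(halving_epochs)
print(history[-1])
```

Under these assumptions the rate stays constant for the first 51 epochs and is then halved at regular intervals, so a run that stops early sees only a few reductions.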

## Computational Requirements

Initially, while testing and debugging, the CPU runtime on Google Colab was sufficient. Once we reached the training stage, the CPU runtime proved too slow: one epoch took about 30 minutes. We therefore used a team member's GPU (RTX 4090), which brought the time per epoch down to 26 s for CHAT and 1 s for NCH. In total, we executed 2 trials (CHAT and NCH) with different configurations and used fewer than 4 GPU hours while running around 60 epochs. (Although we initially set 200 epochs, the actual runs only reached 78 epochs for CHAT and 54 epochs for NCH.)

## Training Code
NOTE: This is a code snippet. We ran all the code locally, and the notebooks with embedded results are checked into our GitHub repository.

The notebook with the results from the Hybrid Transformer model (Hu et al., 2022) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator.ipynb).

The notebook with the results from the Multi-Modal Transformer model (Fayyaz et al., 2023) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator_Transformer.ipynb).

In [None]:
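# Generic Training Function
# (FOLD, THRESHOLD, and get_model are defined elsewhere in the notebook)
import gc
import os

import numpy as np
from sklearn.utils import shuffle
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler
from tensorflow.keras.losses import BinaryCrossentropy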

# Define a learning rate scheduler function that reduces the learning rate by half every 5 epochs after 50 epochs
def lr_schedule(epoch, lr):
    if epoch > 50 and (epoch - 1) % 5 == 0:
        lr *= 0.5
    return lr

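# Generator function that loads .npz files from a directory, yielding the features and labels of one file (fold) at a time
def data_generator(directory):
    for filename in sorted(os.listdir(directory)):  # Sort so the folds are loaded in a deterministic order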
        if filename.endswith('.npz'):
            data = np.load(os.path.join(directory, filename), allow_pickle=True)
            x = data['x']  # Extracts the feature array
            y = data['y_apnea'] + data['y_hypopnea']  # Sum of apnea and hypopnea labels
            del data  # Remove the reference to data to help with memory management
            gc.collect()  # Explicitly invoke garbage collection
            yield x, y  # Yield a tuple of features and labels for training

# Main training function
def train(config, fold=None):
    # Placeholder lists that will hold the features and labels of each fold
    x = [None]*FOLD
    y = [None]*FOLD
    data_gen = data_generator(config["data_path"])  # Initialize the data generator
    for i in range(FOLD):
        x[i], y[i] = next(data_gen)  # Load data for each fold
        x[i], y[i] = shuffle(x[i], y[i])  # Shuffle the data to prevent order bias
        x[i] = np.nan_to_num(x[i], nan=1)  # Replace NaN values with 1 in the feature set
        if config["regression"]:
            y[i] = np.sqrt(y[i])  # Apply square root transformation for regression problems
            y[i][y[i] != 0] += 2  # Increment non-zero entries by 2
        else:
            y[i] = np.where(y[i] >= THRESHOLD, 1, 0)  # Apply a threshold to create binary labels

        x[i] = x[i][:, :, config["channels"]]  # Select the specified channels from the data

    del data_gen  # Clean up the generator
    gc.collect()  # Collect garbage to free memory

    # Determine which folds to use for training based on input parameters
    folds = range(FOLD) if fold is None else [fold]

    for fold in folds:
        first = True
        for i in range(FOLD):
            if i != fold:
                if first:
                    x_train = x[i]
                    y_train = y[i]
                    first = False
                else:
                    # Aggregate training data from different folds
                    x_train = np.concatenate((x_train, x[i]))
                    y_train = np.concatenate((y_train, y[i]))

        # Initialize the model
        model = get_model(config)
        if config["regression"]:
            model.compile(optimizer="adam", loss=BinaryCrossentropy())
        else:
            model.compile(optimizer="adam", loss=BinaryCrossentropy(),
                          metrics=[keras.metrics.Precision(), keras.metrics.Recall()])

        early_stopper = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
        lr_scheduler = LearningRateScheduler(lr_schedule)

        print(f"train {x_train.shape}")  # Print the shape of the training dataset
        # Fit the model
        model.fit(x=x_train, y=y_train, batch_size=512, epochs=config["epochs"], validation_split=0.1,
                  callbacks=[early_stopper, lr_scheduler])

        # Save the trained model
        model.save(config["model_path"] + str(fold))
        keras.backend.clear_session()  # Clear the TensorFlow session to free memory


In [None]:
# CHAT Training

# Mapping from signal types to indices, clarifying how each signal is processed and stored
sig_dict_chat = {
    "EOG": [0, 1],  # Electrooculography signals
    "EEG": [4, 5],  # Electroencephalography signals for two channels
    "ECG": [15, 16],  # Signals derived from the ECG: RRI (R-R interval) and Ramp (R-peak amplitude)
    "Resp": [9, 10],  # Respiratory signals: Cannula flow and airflow
    "SPO2": [13],  # Oxygen saturation level
    "CO2": [14],  # Carbon dioxide levels
}

# List of combinations of channels to use in different model configurations
channel_list_chat = [
    ["ECG", "SPO2"],
]

# Loop through each channel combination specified in channel_list_chat
for ch in channel_list_chat:
    chs = []  # To collect indices of the channels from sig_dict_chat
    chstr = ""  # To create a string representation of the channels for naming purposes
    for name in ch:
        chstr += name  # Append the name of the channel to the string
        chs += sig_dict_chat[name]  # Append channel indices to the list

    print(chstr, chs)  # Output the combined name of the channels and their indices

    # Configuration for the training of the model
    config = {
        "data_path": "D:\\chat_128_FREQ_45\\",  # Path where the training data is stored
        "model_path": "C:\\Users\\William\\Documents\\Development\\Python projects\\Deeplearning healthcare\\Pediatric-Apnea-Detection\\chat_Transformer_model.pt",
        "model_name": "Transformer_chat_" + chstr,  # Name of the model, including the channel combination
        "regression": False,  # Indicates if the model should perform regression; False implies classification

        "transformer_layers": 5,  # Number of transformer layers in the model
        "drop_out_rate": 0.25,  # Dropout rate for regularization
        "num_patches": 30,  # Number of patches to divide the input into for the Transformer
        "transformer_units": 32,  # Size of the Transformer embeddings
        "regularization_weight": 0.001,  # L2 regularization weight
        "num_heads": 4,  # Number of attention heads in the Transformer model
        "epochs": 200,  # Number of epochs to train the model
        "channels": chs,  # Channels selected for training based on the current combination
    }

    train(config, 0)  # Call the training function with the specified configuration and fold index


In [None]:
# NCH Training

# Dictionary mapping signal types to their corresponding channel indices
sig_dict_nch = {
    "EOG": [0, 1],  # Electrooculography signals from two channels
    "EEG": [2, 3],  # Electroencephalography signals from two channels
    "RESP": [5, 6],  # Two respiratory signals
    "SPO2": [9],     # Oxygen saturation
    "CO2": [10],     # Carbon dioxide levels
    "ECG": [11, 12], # Electrocardiogram signals, including RRI and Ramp
    "DEMO": [13],    # Demographic data, which might include patient-specific information
}

# List of signal combinations to be used for training different model configurations
channel_list_nch = [
    ["ECG", "SPO2"],
]

# Loop through each specified channel combination to configure and train a model
for ch in channel_list_nch:
    chs = []  # Initialize an empty list to hold the indices of channels
    chstr = ""  # Initialize an empty string to concatenate signal names for model naming
    for name in ch:
        chstr += name  # Concatenate signal names to form part of the model name
        chs += sig_dict_nch[name]  # Add channel indices for the current signal name to the list

    # Configuration dictionary for the model training
    config = {
        "data_path": "D:\\nch_128_FREQ_45\\",  # Directory containing the training data
        "model_path": "C:\\Users\\William\\Documents\\Development\\Python projects\\Deeplearning healthcare\\Pediatric-Apnea-Detection\\nch_Transformer_model.pt",
        "model_name": "Transformer_nch_" + chstr,  # Constructed model name indicating type and channel combination
        "regression": False,  # Specifies if the model is for regression; False indicates classification

        # Model parameters
        "transformer_layers": 5,  # Number of transformer layers
        "drop_out_rate": 0.25,  # Dropout rate to prevent overfitting
        "num_patches": 30,  # Number of patches the input is divided into
        "transformer_units": 32,  # Size of each transformer unit
        "regularization_weight": 0.001,  # L2 regularization to penalize large weights
        "num_heads": 4,  # Number of attention heads in the transformer model
        "epochs": 200,  # Total number of training epochs
        "channels": chs,  # Channels used from the data, based on the current combination
    }
    # Call the training function, passing in the configuration and the specific fold to train on (fold 0 in this case)
    train(config, 0)


# Evaluation
According to the original paper, the evaluation was conducted using AUROC and F1 scores. The original authors' results for both the CHAT and NCH datasets are reported in Table 3 of the paper. In our project, we aimed to replicate these results as closely as possible, using the same evaluation approaches.

## Metrics descriptions
<table>
    <tr>
        <th>Metric</th>
        <th>Description</th>
    </tr>
    <tr>
        <td>AUROC (Area under the receiver operating characteristic curve)</td>
        <td> The model's ability to distinguish between the classes across all decision thresholds. The ideal AUROC score is 1.0.</td>
    </tr>
    <tr>
        <td>F1 Score</td>
        <td> The harmonic mean of precision and recall. The ideal F1 score is 1.0.</td>
    </tr>
</table>
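
To make the two metrics concrete, the following is a minimal, dependency-free sketch rather than our actual evaluation code (which relies on scikit-learn, as shown in the Evaluation Code section). The helper names `f1_from_counts` and `auroc_pairwise` and the toy labels and scores are introduced only for illustration. The sketch assumes that both classes are present and that at least one positive prediction is made.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc_pairwise(y_true, y_score):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values, not project data)
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in y_score]  # same 0.5 cut-off as in our test function

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

f1 = f1_from_counts(tp, fp, fn)
auroc = auroc_pairwise(y_true, y_score)
print(f"F1 = {f1:.3f}, AUROC = {auroc:.3f}")  # F1 = 0.667, AUROC = 0.889
```

Note that F1 depends on the chosen threshold, while AUROC is computed from the raw scores. The pairwise loop is quadratic in the number of samples, so it is only suitable for small inputs.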


## Evaluation Code

NOTE: This is a code snippet. We ran all the code locally, and the notebooks with embedded results are checked into our GitHub repository.

The notebook with the results from the Hybrid Transformer model (Hu et al., 2022) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator.ipynb).

The notebook with the results from the Multi-Modal Transformer model (Fayyaz et al., 2023) is found [here](https://github.com/dlh-team-4/t4/blob/main/Trainer_Evaluator_Transformer.ipynb).

In [None]:
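# Generic Evaluation Function

# Imports required by this snippet
import gc
import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, roc_auc_score)
from sklearn.utils import shuffle  # assumed source of shuffle(x, y)

# NOTE: data_generator, add_noise_to_data, FOLD and THRESHOLD are defined
# elsewhere in the project and are elided from this snippet.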

class Result:
    def __init__(self):
        # Initialize lists to hold metric values across tests
        self.accuracy_list = []
        self.sensitivity_list = []
        self.specificity_list = []
        self.f1_list = []
        self.auroc_list = []
        self.auprc_list = []
        self.precision_list = []

    def add(self, y_test, y_predict, y_score):
        # Calculate confusion matrix elements
        C = confusion_matrix(y_test, y_predict, labels=(1, 0))
        TP, TN, FP, FN = C[0, 0], C[1, 1], C[1, 0], C[0, 1]

        # Calculate and append performance metrics
        acc = (TP + TN) / (TP + TN + FP + FN)
        sn = TP / (TP + FN)
        sp = TN / (TN + FP)
        pr = TP / (TP + FP)
        f1 = f1_score(y_test, y_predict)
        auc = roc_auc_score(y_test, y_score)
        auprc = average_precision_score(y_test, y_score)

        # Multiply by 100 for percentage format
        self.accuracy_list.append(acc * 100)
        self.precision_list.append(pr * 100)
        self.sensitivity_list.append(sn * 100)
        self.specificity_list.append(sp * 100)
        self.f1_list.append(f1 * 100)
        self.auroc_list.append(auc * 100)
        self.auprc_list.append(auprc * 100)

    def get(self):
        # Generate a formatted string of results for output
        out_str = "=========================================================================== \n"
        # Adds each metric's list to the output string
        metrics = ["accuracy", "precision", "sensitivity", "specificity", "f1", "auroc", "auprc"]
        for metric, values in zip(metrics, [
            self.accuracy_list, self.precision_list, self.sensitivity_list,
            self.specificity_list, self.f1_list, self.auroc_list, self.auprc_list]):
            out_str += f"{metric.upper()}: {values} \n"
            out_str += f"Mean {metric.capitalize()}: {np.mean(values):.2f} ± {np.std(values):.3f} \n"

        return out_str

    def print(self):
        # Print formatted results
        print(self.get())

    def visualize(self):
        # Visualization of metrics using matplotlib
        metrics = {
            'Accuracy': self.accuracy_list,
            'Precision': self.precision_list,
            'Recall (Sensitivity)': self.sensitivity_list,
            'Specificity': self.specificity_list,
            'F1 Score': self.f1_list,
            'AUROC': self.auroc_list,
            'AUPRC': self.auprc_list
        }
        fig, axes = plt.subplots(nrows=4, ncols=2, figsize=(14, 22))
        axes = axes.flatten()
        for ax, (metric_name, values) in zip(axes, metrics.items()):
            ax.plot(values, label=f'{metric_name}', marker='o', linestyle='-')
            ax.set_title(metric_name)
            ax.legend(loc='best')
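        # Hide the unused eighth subplot (7 metrics on a 4x2 grid)
        for unused_ax in axes[len(metrics):]:
            unused_ax.set_visible(False)
        plt.tight_layout()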
        plt.show()

    def save(self, path, config):
        # Save results to a file
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w+") as file:
            file.write(str(config))
            file.write("\n")
            file.write(self.get())

# ...

# Define a sigmoid function which is commonly used as an activation function in neural networks
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Define a function to test the model with a given configuration, possibly across specific folds of data
def test(config, fold=None):
    # Initialize lists to store data for each fold
    x = [None] * FOLD
    y = [None] * FOLD
    # Create a data generator to fetch batches of data
    data_gen = data_generator(config["data_path"])

    # Iterate over each fold to retrieve and preprocess the data
    for i in range(FOLD):
        x[i], y[i] = next(data_gen)  # Load next batch of data
        x[i], y[i] = shuffle(x[i], y[i])  # Shuffle the data to ensure random distribution
        x[i] = np.nan_to_num(x[i], nan=-1)  # Convert NaN values to -1 in the dataset
        y[i] = np.where(y[i] >= THRESHOLD, 1, 0)  # Binarize the output based on a predefined threshold
        x[i] = x[i][:, :, config["channels"]]  # Select specific channels as defined in the configuration

    # Clean up resources and run garbage collection to free memory
    del data_gen
    gc.collect()

    # Initialize a result object to store performance metrics
    result = Result()
    # Determine the range of folds to test based on the input parameter 'fold'
    folds = range(FOLD) if fold is None else [fold]

    # Iterate through the specified folds to perform testing
    for fold in folds:
        x_test = x[fold]  # Select the test data for the current fold
        if config.get("test_noise_snr"):
            # Optionally add noise to the data for robustness testing
            x_test = add_noise_to_data(x_test, config["test_noise_snr"])

        y_test = y[fold]  # Select the test labels for the current fold

        # Load the model specific to the current fold
        model = tf.keras.models.load_model(config["model_path"] + str(fold), compile=False)

        # Predict outcomes using the model
        predict = model.predict(x_test)
        y_score = predict  # Probabilities of predictions
        # Generate binary predictions from the probabilities
        y_predict = np.where(predict > 0.5, 1, 0)

        # Add the results of the current test to the result object
        result.add(y_test, y_predict, y_score)

    # Output the results to the console
    result.print()
    # Visualize the results using plots
    result.visualize()
    # Save the results to a file
    result.save("./results/" + config["model_name"] + ".txt", config)

    # Clean up variables to free memory
    del x_test, y_test, model, predict, y_score, y_predict


In [None]:
# CHAT Evaluation

# Dictionary mapping signal types to specific channel indices as used in data processing
sig_dict_chat = {
    "EOG": [0, 1],
    "EEG": [4, 5],
    "ECG": [15, 16],
    "Resp": [9, 10],
    "SPO2": [13],
    "CO2": [14],
}

# List defining specific combinations of channels to be used for different model configurations
channel_list_chat = [
    ["ECG", "SPO2"],
]

# Iterating through each channel combination to set up and evaluate the model
for ch in channel_list_chat:
    chs = []  # List to store channel indices for the current configuration
    chstr = ""  # String to concatenate channel names for display and model identification
    for name in ch:
        chstr += name  # Append the name of each channel to the string
        chs += sig_dict_chat[name]  # Extend the list of channel indices by fetching from the dictionary

    # Output the combination of channels being used for the current model configuration
    print(chstr, chs)

    # Configuration dictionary specifying model settings and paths
    config = {
        "data_path": "D:\\chat_128_FREQ_45\\",
        "model_path": "C:\\Users\\William\\Documents\\Development\\Python projects\\Deeplearning healthcare\\Pediatric-Apnea-Detection\\chat_Transformer_model.pt",
        "model_name": "Transformer_chat_" + chstr,
        "regression": False,
        "transformer_layers": 5,
        "drop_out_rate": 0.25,
        "num_patches": 30,
        "transformer_units": 32,
        "regularization_weight": 0.001,
        "num_heads": 4,
        "epochs": 200,
        "channels": chs,
    }

    # Call the test function passing the configuration and specifying fold '0' (this could be any or all folds)
    test(config, 0)


In [None]:
# NCH Evaluation

# Define a dictionary mapping signal types to specific channel indices
sig_dict_nch = {
    "EOG": [0, 1],
    "EEG": [2, 3],
    "RESP": [5, 6],
    "SPO2": [9],
    "CO2": [10],
    "ECG": [11, 12],
    "DEMO": [13],
}

# List of signal combinations to evaluate with the model
channel_list_nch = [
    ["ECG", "SPO2"],
]

# Iterate through each channel combination to set up and evaluate models
for ch in channel_list_nch:
    chs = []  # List to hold indices for selected channels
    chstr = ""  # String to hold concatenated names for model identification
    for name in ch:
        chstr += name  # Append channel name to identifier string
        chs += sig_dict_nch[name]  # Append channel indices to list

    # Display the selected channels and their corresponding indices
    print(chstr, chs)

    # Configuration dictionary for model evaluation
    config = {
        "data_path": "D:\\nch_128_FREQ_45\\",  # Data directory
        "model_path": "C:\\Users\\William\\Documents\\Development\\Python projects\\Deeplearning healthcare\\Pediatric-Apnea-Detection\\nch_Transformer_model.pt",  # Model save path
        "model_name": "Transformer_nch_" + chstr,  # Model name including channel info
        "regression": False,  # Task type (classification here)

        # Model parameters
        "transformer_layers": 5,
        "drop_out_rate": 0.25,
        "num_patches": 30,
        "transformer_units": 32,
        "regularization_weight": 0.001,
        "num_heads": 4,
        "epochs": 200,
        "channels": chs,
    }

    # Initiate evaluation with the current configuration
    test(config, 0)
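

The channel-selection step used in both evaluation cells can be sketched in isolation. The helper `resolve_channels` and the toy `sample` below are hypothetical names introduced for illustration. The index mapping is copied from `sig_dict_nch` above, the 14-channel window width is an assumption based on the largest index in that mapping, and plain nested lists stand in for the NumPy slicing `x[:, :, config["channels"]]` used in `test`.

```python
# Signal-name-to-index mapping, copied from the NCH evaluation cell
sig_dict_nch = {
    "EOG": [0, 1], "EEG": [2, 3], "RESP": [5, 6], "SPO2": [9],
    "CO2": [10], "ECG": [11, 12], "DEMO": [13],
}


def resolve_channels(names, sig_dict):
    """Return (concatenated label, flat list of channel indices) for the given signals."""
    indices = []
    for name in names:
        indices.extend(sig_dict[name])
    return "".join(names), indices


label, chs = resolve_channels(["ECG", "SPO2"], sig_dict_nch)
print(label, chs)  # ECGSPO2 [11, 12, 9]

# One toy window: 2 time steps x 14 channels, where each value encodes its channel index
sample = [[float(c) for c in range(14)] for _ in range(2)]

# Keep only the selected channels, in the order they were requested
selected = [[row[c] for c in chs] for row in sample]
print(selected)  # [[11.0, 12.0, 9.0], [11.0, 12.0, 9.0]]
```

The selected channels follow the order of the signal names in the combination, so `["ECG", "SPO2"]` and `["SPO2", "ECG"]` produce differently ordered inputs.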


# Results
We ran the training with different configurations:
1. Using the Hybrid Transformer Model (Hu et al., 2022)
2. Using the Multi-modal Transformer Model (Fayyaz et al., 2023) - Complete run, but with preprocessing errors for the NCH dataset
3. Rerun of the NCH dataset using the Multi-modal Transformer Model (Fayyaz et al., 2023) - Partial run, with fixes to the preprocessing errors for the NCH dataset.

## Table of Results

<table>
    <tr>
        <th>Model</th>
        <th>Dataset</th>
        <th>Accuracy</th>
        <th>Precision</th>
        <th>Recall</th>
        <th>Specificity</th>
        <th>F1</th>
        <th>AUROC</th>
        <th>AUPRC</th>
    </tr>
    <tr>
        <td rowspan="2">Hybrid Transformer Model</td>
        <td>CHAT</td>
        <td>75.32</td>
        <td>71.98</td>
        <td>83.28</td>
        <td>67.28</td>
        <td>77.22</td>
        <td>84.25</td>
        <td>83.71</td>
    </tr>
    <tr>
        <td> NCH<br>(preprocessing <br>not fixed)</td>
        <td>76.08</td>
        <td>72.92</td>
        <td>85.37</td>
        <td>66.17</td>
        <td>78.65</td>
        <td>85.50</td>
        <td>84.13</td>
    </tr>
    <tr>
        <td rowspan="2">Multi-Modal Transformer Model</td>
        <td>CHAT</td>
        <td>83.70</td>
        <td>83.89</td>
        <td>83.60</td>
        <td>83.80</td>
        <td>83.75</td>
        <td>90.58</td>
        <td>89.58</td>
    </tr>
    <tr>
        <td> NCH<br>(preprocessing <br>not fixed)</td>
        <td>75.36</td>
        <td>73.89</td>
        <td>80.84</td>
        <td>69.52</td>
        <td>77.20</td>
        <td>81.56</td>
        <td>77.59</td>
    </tr>
</table>


## Claims

With an F1 score of roughly 0.8 for the Multi-Modal Transformer model (83.75 on CHAT, very close to the original paper), our results support the claim that ECG signals can be used to detect apnea/hypopnea in children. However, because of the preprocessing error in our NCH run, we cannot make the same claim for SPO2, since the CHAT signal data does not include it.

## Ablation Study
The original codebase for NCH uses an AHI file (.csv) computed from the signal to identify the presence of apnea/hypopnea. Since the code that computes it was not checked in with the codebase, we modified our approach for the NCH dataset to label the signal data using the ECG, similar to the approach used in preprocessing the CHAT dataset.

We originally planned to experiment with different frequency values in the configuration, but because running even a single epoch is time-consuming, we were not able to carry out this plan.


## Model comparison

The original paper compares its model's results with those of four other models, each presented in a separate research paper and sharing the identical goal of detecting sleep apnea. Chang et al. proposed a CNN approach to sleep apnea detection. Chen et al. created a fusion network that combines various CNNs. Zarei et al. approached the task using a CNN with an LSTM (long short-term memory). Hu et al. implemented a hybrid transformer approach.

In reference to the CHAT dataset, the original multi-modal transformer model resulted in an F1 score of 83.9 and an AUROC of 90.6, which was higher than the previous four approaches presented. In reference to the NCH dataset, the original multi-modal transformer model resulted in an F1 score of 82.9 and an AUROC of 90.7, which was also higher than the previous four approaches presented.

Our results for CHAT yielded an F1 score of 83.75 and an AUROC of 90.58, which were very close to the values reported by the original paper. For NCH, however, our results were off by a wide margin, with an F1 score of 77.2 and an AUROC of 81.56.

<table border="1">
    <tr>
        <th>Method</th>
        <th>Total params</th>
        <th colspan="2">CHAT</th>
        <th colspan="2">NCH Data Bank</th>
    </tr>
    <tr>
        <td></td>
        <td></td>
        <td>F1</td>
        <td>AUROC</td>
        <td>F1</td>
        <td>AUROC</td>
    </tr>
    <tr>
        <td>CNN (Chang et al., 2020)</td>
        <td></td>
        <td>77.5 (0.8)</td>
        <td>86.8 (1.0)</td>
        <td>77.2 (1.1)</td>
        <td>86.4 (1.2)</td>
    </tr>
    <tr>
        <td>SE-MSCNN (Chen et al., 2022)</td>
        <td></td>
        <td>73.9 (2.1)</td>
        <td>82.9 (1.8)</td>
        <td>73.0 (2.4)</td>
        <td>82.2 (1.9)</td>
    </tr>
    <tr>
        <td>CNN+LSTM (Zarei et al., 2022)</td>
        <td></td>
        <td>81.7 (0.9)</td>
        <td>89.7 (0.7)</td>
        <td>81.7 (0.8)</td>
        <td>89.4 (0.6)</td>
    </tr>
    <tr>
        <td>Hybrid Transformer (Hu et al., 2022)</td>
        <td>454,820</td>
        <td>81.3 (1.0)</td>
        <td>89.6 (0.5)</td>
        <td>81.0 (0.9)</td>
        <td>89.4 (0.7)</td>
    </tr>
    <tr>
        <td>Multi-modal Transformer (Fayyaz et al., 2023)</td>
        <td>150,369</td>
        <td>83.1 (1.0)</td>
        <td>90.0 (0.8)</td>
        <td>82.6 (0.5)</td>
        <td>90.4 (0.4)</td>
    </tr>
    <tr>
        <td>Multi-modal Transformer (Replication)</td>
        <td>150,369</td>
        <td>83.75 (0.8)</td>
        <td>90.58</td>
        <td>77.20 (0.8)</td>
        <td>81.56</td>
    </tr>
</table>
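
The gap between our replication and the original results can be quantified with simple arithmetic. The sketch below uses only the F1 and AUROC values from the last two rows of the table above; the dictionary names are illustrative.

```python
# Values copied from the "Multi-modal Transformer" rows of the comparison table
reported = {
    ("CHAT", "F1"): 83.1, ("CHAT", "AUROC"): 90.0,
    ("NCH", "F1"): 82.6, ("NCH", "AUROC"): 90.4,
}
replicated = {
    ("CHAT", "F1"): 83.75, ("CHAT", "AUROC"): 90.58,
    ("NCH", "F1"): 77.20, ("NCH", "AUROC"): 81.56,
}

# Positive gap: replication scored higher than the reported value
gaps = {key: round(replicated[key] - reported[key], 2) for key in reported}

for (dataset, metric), gap in gaps.items():
    print(f"{dataset} {metric}: {gap:+.2f}")
# CHAT F1: +0.65
# CHAT AUROC: +0.58
# NCH F1: -5.40
# NCH AUROC: -8.84
```

The CHAT gaps are within about one point of the reported values, while the NCH gaps are several points wide.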


# Discussion

## Assessment and next steps
We evaluated the model in two trials. In the first, we tested the model against the CHAT dataset and achieved results slightly better than, but very close to, the original, so we assess this part of the study as reproducible. In the second, we tested the model against the NCH dataset, and our results were off by a wide margin (see the Model comparison section). There was a preprocessing error that we had to fix very close to the deadline, and we did not have enough time to rerun against the whole NCH dataset. That rerun should be our immediate next step.

## Factors for Success during Replication

The following factors made it easier for us to replicate the study:

- **Quick Dataset Access Approval**: Unlike other dataset providers we contacted before selecting the paper, who made it clear that access grants take several months, NSRR approved our access request within two weeks. It also helped that UIUC had readily available HIPAA training, which was one of NSRR's prerequisites.
- **Availability of the Original Codebase**: The original code was clearly written and the intent was apparent. It was a good starting point for us, even though it did not work right out of the box.
- **GPU Availability**: One of our members had a performant GPU, which made our preprocessing and training code run faster and allowed us to work with the complete CHAT and NCH datasets from NSRR.

## Challenges

We encountered several challenges because data formats, software libraries, and data structures have evolved over time. The difficulties we faced were:

- **Dataset Size**: The signal files we had to parse were huge; the .edf files were around half a gigabyte each. This meant that the code can run very slowly unless more performant hardware is used. In addition, although we initially set 200 epochs, the actual runs only reached 78 epochs for CHAT and 54 epochs for NCH because of memory limitations.
- **Processing Bottleneck**: Only one teammate had a GPU that could store and process the whole NCH and CHAT datasets, which became a bottleneck in producing valid results. We found a preprocessing bug very close to the submission day and, because of this bottleneck, did not have enough time to rerun properly.
- **Changes in File Formats**: The file format used for storing data has changed (e.g., from .tsv to .xml), so we had to change the way data is parsed and processed.
- **Missing Data or Code**: Our main difficulty with NCH was that the AHI file referenced in the original codebase was not available to us; if it was computed programmatically from the EDF files, the script was not checked in. This led us to change our approach very close to the deadline.
- **Altered Field Names**: Changes in the names of fields within datasets require a thorough examination of how data is now structured versus how it was in the original study.
- **Lack of Documentation**: We had to infer how the data in the CHAT dataset is mapped to the data in NCH because it was not explicitly mentioned anywhere in the NSRR documentation.
- **Modification in Tokens**: In data processing, especially in text or event data, specific tokens or markers seem to have changed (e.g., "Wake" to "Wake|0"). We had to understand how the tokens were used in the context of the analysis and update the scripts.
- **Updates in Libraries and Functions**: Software libraries evolve, and functions may become deprecated, moved, or removed in newer versions. We had to update a lot of imports and in some cases, downgrade the versions of the libraries we used.
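
As an illustration of the token changes described above, one way to make parsing tolerant of both spellings is to normalize annotation tokens before comparing them. This is a minimal sketch, not the code we checked in: the helper name `normalize_token` is hypothetical, and it assumes the newer format only appends a `|`-separated suffix (as in "Wake|0") to the older token.

```python
def normalize_token(token):
    """Return the token without any '|'-separated suffix, e.g. 'Wake|0' -> 'Wake'."""
    return token.split("|", 1)[0].strip()


# Both the old and the new spelling map to the same canonical token
for raw in ["Wake", "Wake|0", " Wake|0 "]:
    print(repr(raw), "->", repr(normalize_token(raw)))
```

Comparing normalized tokens keeps the downstream logic unchanged regardless of which annotation format a file uses.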

## Suggestions for Improvement

Here are some of our suggestions to the authors of the study or to other reproducers on how to improve the reproducibility:
- **Regular Codebase Maintenance**: Ensure that the scripts checked into the repository are at least consistent with each other in terms of dimensions. For example, the preprocessing code produces data of dimension (3840, 3), but the training code expects (1920, 3).
- **Documentation Quality Improvements**: The codebase includes some annotations, but it would be useful to elaborate on their intent. It would also help to list the exact library versions used to run the code, to prevent conflicts between libraries.
- **Including Pre-trained Models in the Repository**: Because of the intricate process needed for obtaining data, building the model, and training it, it would have been beneficial if a pre-trained model were included in the GitHub repository.
- **Better Processing Hardware**: Debugging results that are far from the original would be faster if performant processing hardware were available.
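
A lightweight guard at the start of training would surface the dimension mismatch mentioned in the first suggestion immediately. The following is a sketch under stated assumptions: `check_window_shape` is a hypothetical helper, and the expected per-sample shape `(1920, 3)` is taken from the example in that suggestion rather than from the original codebase's configuration.

```python
def check_window_shape(batch_shape, expected=(1920, 3)):
    """Raise ValueError if the per-sample shape differs from what the model expects.

    batch_shape is (num_samples, time_steps, channels), e.g. the .shape of a NumPy array.
    """
    per_sample = tuple(batch_shape[1:])
    if per_sample != tuple(expected):
        raise ValueError(
            f"Expected samples of shape {tuple(expected)}, got {per_sample}; "
            "check that preprocessing and training use the same window settings."
        )


check_window_shape((10, 1920, 3))  # matching shape: passes silently

try:
    check_window_shape((10, 3840, 3))  # shape produced by the preprocessing code
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```

Failing fast with an explicit message is cheaper than discovering the mismatch through an opaque error inside the model after data loading has already finished.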



# References

1. Fayyaz H, Strang A, Beheshti R. Bringing At-home Pediatric Sleep Apnea Testing Closer to Reality: A Multi-modal Transformer Approach. Proc Mach Learn Res. 2023 Aug;219:167-185. PMID: 38344396; PMCID: PMC10854997.

2. Lee H, Li B, DeForte S, Splaingard ML, Huang Y, Chi Y, Linwood SL. A large collection of real-world pediatric sleep studies. Sci Data. 2022 Jul 19;9(1):421. doi: 10.1038/s41597-022-01545-6. PMID: 35853958; PMCID: PMC9296671.

3. Marcus CL, Moore RH, Rosen CL, Giordani B, Garetz SL, Taylor HG, Mitchell RB, Amin R, Katz ES, Arens R, Paruthi S, Muzumdar H, Gozal D, Thomas NH, Ware J, Beebe D, Snyder K, Elden L, Sprecher RC, Willging P, Jones D, Bent JP, Hoban T, Chervin RD, Ellenberg SS, Redline S; Childhood Adenotonsillectomy Trial (CHAT). A randomized trial of adenotonsillectomy for childhood sleep apnea. N Engl J Med. 2013 Jun 20;368(25):2366-76. doi: 10.1056/NEJMoa1215881. Epub 2013 May 21. PMID: 23692173; PMCID: PMC3756808.

4. Zhang GQ, Cui L, Mueller R, Tao S, Kim M, Rueschman M, Mariani S, Mobley D, Redline S. The National Sleep Research Resource: towards a sleep data commons. J Am Med Inform Assoc. 2018 Oct 1;25(10):1351-1358. doi: 10.1093/jamia/ocy064. PMID: 29860441; PMCID: PMC6188513.

5. Chen Xianhui, Chen Ying, Ma Wenjun, Fan Xiaomao, and Li Ye. Toward sleep apnea detection with lightweight multi-scaled fusion network. Knowledge-Based Systems, 247: 108783, 2022.

6. Zarei Asghar, Beheshti Hossein, and Asl Babak Mohammadzadeh. Detection of sleep apnea using deep neural networks and single-lead ecg signals. Biomedical Signal Processing and Control, 71:103125, 2022.

7. Hu Shuaicong, Cai Wenjie, Gao Tijie, and Wang Mingjie. A hybrid transformer model for obstructive sleep apnea detection based on self-attention mechanism using single-lead ecg. IEEE Transactions on Instrumentation and Measurement, 71:1-11, 2022.

8. Chang HY, Yeh CY, Lee CT, Lin CC. A Sleep Apnea Detection System Based on a One-Dimensional Deep Convolution Neural Network Model Using Single-Lead Electrocardiogram. Sensors (Basel). 2020 Jul 26;20(15):4157. doi: 10.3390/s20154157. PMID: 32722630; PMCID: PMC7435835.