# ECG Signal Preprocessing

This notebook preprocesses ECG signals from the MIT-BIH Arrhythmia Database: bandpass filtering, locating R-peaks from the database annotations, and segmenting the signal into individual beats.

Short note: install the `wfdb` package (needed only once in the environment) to read MIT-BIH records.

In [2]:
!pip install wfdb



### Setup
Installing and importing required libraries for ECG signal processing

Short note: import signal processing, plotting, and utility libraries used throughout preprocessing.

In [3]:
import wfdb
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from tqdm import tqdm
import os

Short note: download the MIT-BIH database locally using wfdb (skip if already present in `data/`).

In [5]:
print("Downloading MIT-BIH Arrhythmia Database...")
download_path = 'data/mit-bih-arrhythmia-database-1.0.0' 
wfdb.dl_database('mitdb', dl_dir=download_path)
print(f"Download complete! Files saved to: {download_path}")

Downloading MIT-BIH Arrhythmia Database...
Generating record list for: 100
Generating record list for: 101
Generating record list for: 102
Generating record list for: 103
Generating record list for: 104
Generating record list for: 105
Generating record list for: 106
Generating record list for: 107
Generating record list for: 108
Generating record list for: 109
Generating record list for: 111
Generating record list for: 112
Generating record list for: 113
Generating record list for: 114
Generating record list for: 115
Generating record list for: 116
Generating record list for: 117
Generating record list for: 118
Generating record list for: 119
Generating record list for: 121
Generating record list for: 122
Generating record list for: 123
Generating record list for: 124
Generating record list for: 200
Generating record list for: 201
Generating record list for: 202
Generating record list for: 203
Generating record list for: 205
Generating record list for: 207
Generating record list for: 2

### Data Download
Downloading the MIT-BIH Arrhythmia Database, a standard dataset for ECG analysis
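
The "skip if already present" step mentioned above is not automated in the download cell. A minimal sketch of one way to guard it follows. The helper names `needs_download` and `ensure_mitdb` are hypothetical and not part of `wfdb`. The check assumes that any `.dat` file in the target directory means the records are already there.

```python
import os


def needs_download(data_dir):
    """Return True when data_dir is missing or holds no .dat record files."""
    if not os.path.isdir(data_dir):
        return True
    return not any(name.endswith(".dat") for name in os.listdir(data_dir))


def ensure_mitdb(data_dir="data/mit-bih-arrhythmia-database-1.0.0"):
    """Download the database only when the local copy looks absent (hypothetical helper)."""
    if needs_download(data_dir):
        import wfdb  # imported lazily so the directory check has no extra dependency
        wfdb.dl_database("mitdb", dl_dir=data_dir)
    return data_dir


# Usage (commented out because it may trigger a network download):
# download_path = ensure_mitdb()
```

The check is deliberately coarse: it does not verify that every record was fetched, so an interrupted download would still be treated as complete.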

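Short note: define Butterworth bandpass filter utilities for denoising ECG signals (default passband 0.5–40 Hz, applied forward and backward with `filtfilt` so the filter adds no phase shift).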

In [6]:
def butter_bandpass(lowcut, highcut, fs, order=4):
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    b, a = butter(order, [low, high], btype='band')
    return b, a

def bandpass_filter(data, lowcut=0.5, highcut=40, fs=360, order=4):
    b, a = butter_bandpass(lowcut, highcut, fs, order=order)
    y = filtfilt(b, a, data)
    return y

### Signal Filtering Functions
Implementing bandpass filter for noise removal from ECG signals (0.5-40 Hz)
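
To see what the 0.5-40 Hz passband does, the filter can be tried on a synthetic signal. The sketch below repeats the filter definition in compact form so it runs on its own. The test signal is made up for illustration: a 10 Hz in-band tone plus slow baseline drift and 60 Hz interference. It is not ECG data.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass_filter(data, lowcut=0.5, highcut=40, fs=360, order=4):
    """Zero-phase Butterworth bandpass, same parameters as the notebook's filter."""
    nyq = 0.5 * fs
    b, a = butter(order, [lowcut / nyq, highcut / nyq], btype="band")
    return filtfilt(b, a, data)


fs = 360                                      # MIT-BIH sampling rate in Hz
t = np.arange(0, 20, 1 / fs)                  # 20 seconds of samples
in_band = np.sin(2 * np.pi * 10 * t)          # 10 Hz tone, inside the passband
drift = 2.0 * np.sin(2 * np.pi * 0.05 * t)    # 0.05 Hz baseline wander
hum = 0.5 * np.sin(2 * np.pi * 60 * t)        # 60 Hz interference
noisy = in_band + drift + hum

clean = bandpass_filter(noisy, fs=fs)

# Compare against the in-band tone, ignoring 2 s at each end where
# filter edge effects are strongest.
core = slice(2 * fs, -2 * fs)
err_before = np.sqrt(np.mean((noisy[core] - in_band[core]) ** 2))
err_after = np.sqrt(np.mean((clean[core] - in_band[core]) ** 2))
print(f"RMS error before: {err_before:.3f}, after: {err_after:.3f}")
```

The error after filtering should be much smaller than before, since both the drift and the 60 Hz component lie outside the passband while the 10 Hz tone passes through without phase shift.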

Short note: map raw annotation symbols into five grouped beat categories (N, S, V, F, Q) used for modeling.

In [7]:
def label_group(label):
    """Group beat types into main categories"""
    if label in ['N', 'L', 'R', 'e', 'j']:
        return 'N'  # Normal
    elif label in ['A', 'a', 'J', 'S']:
        return 'S'  # Supraventricular
    elif label in ['V', 'E']:
        return 'V'  # Ventricular
    elif label in ['F']:
        return 'F'  # Fusion
    else:
        return 'Q'  # Other/Unknown
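
As a quick check of the grouping, the sketch below rewrites the same mapping as a dictionary lookup and applies it to a short, made-up list of annotation symbols. The names `GROUPS` and `SYMBOL_TO_GROUP` are introduced here for illustration only. The behavior is intended to match `label_group` above.

```python
from collections import Counter

# Same symbol groups as label_group, expressed as data (illustrative names).
GROUPS = {
    "N": ["N", "L", "R", "e", "j"],  # Normal
    "S": ["A", "a", "J", "S"],       # Supraventricular
    "V": ["V", "E"],                 # Ventricular
    "F": ["F"],                      # Fusion
}
SYMBOL_TO_GROUP = {sym: grp for grp, syms in GROUPS.items() for sym in syms}


def label_group(label):
    """Return the grouped category, falling back to 'Q' for anything unlisted."""
    return SYMBOL_TO_GROUP.get(label, "Q")


symbols = ["N", "L", "V", "A", "F", "/", "+", "N"]  # made-up example sequence
grouped = [label_group(s) for s in symbols]
counts = Counter(grouped)
print(grouped)
print(counts)
```

Because every unlisted symbol falls through to `'Q'`, non-beat annotations such as the rhythm-change marker `'+'` are also labeled `'Q'`. It may be worth filtering those out before segmentation if `'Q'` is meant to hold only beats.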

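Short note: cut a 250-sample window (100 samples before, 150 after) around each annotated R-peak of the first channel, skip beats too close to the record edges, and z-score normalize each beat.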

In [None]:
# Segmentation parameters
window_size = 250  # total length
pre, post = 100, 150  # samples before and after R-peak

all_X = []
all_y = []

record_files = [f.split('.')[0] for f in os.listdir(download_path) if f.endswith('.dat')]
print(f"\nFound {len(record_files)} records to process")

for rec_name in tqdm(sorted(record_files), desc="Processing records"):
    try:
        record = wfdb.rdrecord(os.path.join(download_path, rec_name))
        annotation = wfdb.rdann(os.path.join(download_path, rec_name), 'atr')

        signal = record.p_signal[:, 0]
        filtered_signal = bandpass_filter(signal, fs=record.fs)

        ann_samples = annotation.sample
        ann_symbols = annotation.symbol

        segments = []
        labels = []

        for i, r_peak in enumerate(ann_samples):
            # Skip if segment would be incomplete
            if r_peak - pre < 0 or r_peak + post >= len(filtered_signal):
                continue

            beat_segment = filtered_signal[r_peak - pre:r_peak + post]
            segments.append(beat_segment)
            labels.append(label_group(ann_symbols[i]))

        if len(segments) > 0:
            segments = np.array(segments)
            labels = np.array(labels)

            # Normalize each segment (z-score normalization)
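            # The small constant avoids division by zero on flat segments
            X = (segments - np.mean(segments, axis=1, keepdims=True)) / (
                np.std(segments, axis=1, keepdims=True) + 0.000001)

            # Collect this record's beats and labels
            all_X.append(X)
            all_y.append(labels)
    except Exception as e:
        # Report the failing record and move on to the next one
        print(f"Skipping record {rec_name}: {e}")
        continue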
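
The windowing and per-beat z-score steps can be checked in isolation on synthetic data. This is a minimal sketch: the random signal and the R-peak positions are made up, and the epsilon of `1e-6` matches the `0.000001` constant in the cell above.

```python
import numpy as np

pre, post = 100, 150                       # samples before and after each R-peak
rng = np.random.default_rng(0)
signal = rng.normal(size=2000)             # stand-in for a filtered ECG channel
r_peaks = np.array([50, 400, 900, 1500, 1950])  # made-up R-peak sample indices

# Keep only peaks whose full window fits inside the signal
# (same boundary test as the segmentation loop).
kept = [r for r in r_peaks if r - pre >= 0 and r + post < len(signal)]
segments = np.array([signal[r - pre:r + post] for r in kept])

# Z-score each beat independently: zero mean, unit variance per row.
mean = segments.mean(axis=1, keepdims=True)
std = segments.std(axis=1, keepdims=True)
X = (segments - mean) / (std + 1e-6)

print(X.shape)
```

The first and last peaks are dropped because their windows would run past the signal boundaries. Each remaining row of `X` has mean close to 0 and standard deviation close to 1.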