Markdown - Title and Intro

In [None]:
# 🫀 ECG Preprocessing (Day 1)

This notebook shows the preprocessing pipeline for our arrhythmia detection project.
We demonstrate:

1. Load raw ECG signal
2. Apply **band-pass filter (0.5–40 Hz)**
3. Segment into **10-second windows**
4. Plot ECG with **R-peak detection**
5. Save processed windows for model training

These steps give our models **clean, consistent ECG segments** to train on.


Python Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import butter, filtfilt, find_peaks
from scripts.make_windows import bandpass_filter, make_windows


## 1. Load ECG Signal

We start with a raw ECG file (`test_ecg.csv`) that contains a single signal column named `ecg`.  
It stands in for a single-channel recording from the MIT-BIH Arrhythmia Database, which is sampled at 360 Hz.


In [None]:
df = pd.read_csv("../test_data/test_ecg.csv")  # adjust if needed
signal = df["ecg"].values
fs = 360  # MIT-BIH default sampling rate

plt.figure(figsize=(12, 4))
plt.plot(signal[:2000])
plt.title("Raw ECG (first 2000 samples)")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.show()


## 2. Band-Pass Filtering (0.5–40 Hz)

To remove baseline wander (low-frequency drift) and high-frequency noise,  
we apply a **Butterworth band-pass filter** between 0.5–40 Hz.
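
The `bandpass_filter` helper lives in `scripts/make_windows.py` and is not shown in this notebook. As a rough illustration of the idea, here is a minimal sketch of a zero-phase Butterworth band-pass filter. The name `bandpass_sketch`, the filter order of 2, and the use of second-order sections (`sosfiltfilt`) are assumptions made for this example, so the project's actual helper may differ.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_sketch(x, fs, lowcut=0.5, highcut=40.0, order=2):
    """Hypothetical stand-in for bandpass_filter (not the project's code)."""
    # Design the filter as second-order sections, which tends to be
    # numerically safer than (b, a) coefficients for a low cutoff like 0.5 Hz.
    sos = butter(order, [lowcut, highcut], btype="band", output="sos", fs=fs)
    # Forward-backward filtering cancels the phase shift, so peaks keep their timing.
    return sosfiltfilt(sos, np.asarray(x, dtype=float))


# Usage: a 10 Hz sine riding on a constant offset (a crude stand-in for baseline drift)
fs_demo = 360
t = np.arange(10 * fs_demo) / fs_demo
clean = np.sin(2 * np.pi * 10 * t)
filtered = bandpass_sketch(clean + 5.0, fs=fs_demo)
```

Away from the edges, `filtered` should follow the 10 Hz component closely while the constant offset is removed.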


In [None]:
filt_signal = bandpass_filter(signal, fs=fs, lowcut=0.5, highcut=40)

plt.figure(figsize=(12, 4))
plt.plot(filt_signal[:2000])
plt.title("Filtered ECG (0.5–40 Hz)")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.show()


## 3. Window Segmentation

We segment the ECG into **10-second windows** with a **5-second step size**,  
so consecutive windows overlap by 50%. At 360 Hz each window holds 3600 samples, which gives the model inputs of a fixed length.
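
The windowing logic itself sits in `make_windows`, which this notebook only imports. The sketch below shows one way such a sliding-window split could work. `make_windows_sketch` is a hypothetical name, and dropping the incomplete tail of the signal is an assumption rather than a confirmed behavior of the project's helper.

```python
import numpy as np


def make_windows_sketch(x, fs, win_sec=10, step_sec=5):
    """Hypothetical stand-in for make_windows (not the project's code)."""
    x = np.asarray(x)
    win = int(win_sec * fs)    # samples per window
    step = int(step_sec * fs)  # samples between window starts
    if len(x) < win:
        return np.empty((0, win))
    # Number of full windows that fit; any incomplete tail is dropped.
    n = 1 + (len(x) - win) // step
    return np.stack([x[i * step : i * step + win] for i in range(n)])


# Usage: 30 seconds at 360 Hz gives 5 windows of 3600 samples each
demo = np.arange(30 * 360)
demo_windows = make_windows_sketch(demo, fs=360)
print(demo_windows.shape)  # (5, 3600)
```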


In [None]:
windows = make_windows(filt_signal, fs=fs, win_sec=10, step_sec=5)
print("Shape of windows:", windows.shape)

plt.figure(figsize=(12, 4))
plt.plot(windows[0])
plt.title("Example 10-second window")
plt.show()


## 4. R-Peak Detection (demo)

We use `scipy.signal.find_peaks` to detect R-peaks.  
This helps derive features like **RR intervals** and **HRV** (heart rate variability).
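
To make the link from peaks to features concrete, here is a minimal sketch that turns R-peak sample indices into RR intervals and two common HRV summaries: SDNN (standard deviation of the intervals) and RMSSD (root mean square of successive differences). The helper name `rr_features` and the returned keys are hypothetical. HRV statistics computed over a single 10-second window are only rough indicators.

```python
import numpy as np


def rr_features(peaks, fs):
    """Hypothetical helper: RR intervals and simple HRV summaries from R-peak indices."""
    rr = np.diff(np.asarray(peaks)) / fs  # RR intervals in seconds
    if rr.size < 2:
        return None  # need at least two intervals for the variability measures
    return {
        "mean_hr_bpm": 60.0 / rr.mean(),
        "sdnn_ms": 1000.0 * rr.std(ddof=1),
        "rmssd_ms": 1000.0 * np.sqrt(np.mean(np.diff(rr) ** 2)),
    }


# Usage: perfectly regular beats one second apart at 360 Hz
feats = rr_features([0, 360, 720, 1080], fs=360)
```

For this perfectly regular example the mean heart rate is 60 BPM and both variability measures are zero.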


In [None]:
example = windows[0]
peaks, _ = find_peaks(example, distance=fs*0.6)  # at least 0.6 s between peaks (caps detection at ~100 BPM)

plt.figure(figsize=(12, 4))
plt.plot(example, label="ECG")
plt.plot(peaks, example[peaks], "rx", label="R-peaks")
plt.legend()
plt.title("R-peak detection (demo)")
plt.show()


## 5. Save Processed Windows

We export the segmented, filtered windows as a CSV  
so they can be used for training and evaluation.
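
As a quick sanity check on the export format, the sketch below writes a small array of windows to CSV and reads it back. The helper names `save_windows_csv` and `load_windows_csv` are hypothetical, and the temporary directory is used only to keep the example self-contained. Each row is one window and each column is one sample.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def save_windows_csv(windows, path):
    """Hypothetical helper: write windows as rows of a CSV file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    pd.DataFrame(windows).to_csv(path, index=False)


def load_windows_csv(path):
    """Hypothetical helper: read the windows back as a 2-D NumPy array."""
    return pd.read_csv(path).to_numpy()


# Usage: round-trip three short fake windows through a temporary folder
fake = np.random.default_rng(0).normal(size=(3, 8))
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "processed" / "windows.csv"
    save_windows_csv(fake, target)
    restored = load_windows_csv(target)
```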


In [None]:
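from pathlib import Path
out_path = Path("../data/processed/windows.csv")
out_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
out_df = pd.DataFrame(windows)  # one row per window
out_df.to_csv(out_path, index=False)
print("✅ Processed windows saved to", out_path)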
