# 1. Preprocessing

From the Dataset page: https://physionet.org/content/pulse-transit-time-ppg/1.1.0/

It should be noted that the sensor data is unfiltered to retain all timing information and to allow direct experimentation, whereas most published PPG data is filtered. To convert the data, we found that **removing the DC component by subtracting the output of a centered mean rolling gaussian window and adding a 0.75-5Hz bandpass** provided results similar to filtered datasets.

---

#### 1. What’s the DC component in PPG?

* PPG is made up of:

  * **DC (slow-varying baseline):** due to tissue absorption, venous blood, skin, etc. Changes very slowly (seconds–minutes scale).
  * **AC (pulsatile component):** due to arterial blood volume changes with each heartbeat (\~1 Hz at 60 bpm). This is the part we want for HR and SpO₂.

So “DC removal” = separating the **slow baseline** from the **fast pulsations**.

---

#### 2. Why `window = 250` (≈0.5s)?

* Sampling frequency `fs = 500 Hz` → 500 samples per second.
* A `window = 250` samples corresponds to **0.5 seconds of signal**.

Why 0.5s?

* Heartbeat is typically 0.6–1.0s (60–100 bpm).
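* With a smoothing scale of about 0.5s, the Gaussian average spans more than one beat (its kernel extends several σ to each side), so the pulsations average out and the smoothed curve tracks only the **slow variations (baseline drift)**. Subtracting it therefore leaves the main pulsatile wave intact.
* If the window were too large (say 2s), the baseline would react too slowly to follow faster drift (breathing, motion) → part of the drift would survive the subtraction.
* If too small (say 50 ms), the baseline would follow the heartbeat itself, so subtracting it would eat into the AC component → distorting the peaks.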
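
To put rough numbers on this trade-off, the sketch below evaluates the magnitude response of an ideal (continuous-time) Gaussian smoother, `exp(-2·π²·σ²·f²)`, treating the window as the Gaussian's σ in samples. The helper name `gaussian_gain` and the two test frequencies (1 Hz heartbeat, 0.1 Hz drift) are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np

def gaussian_gain(freq_hz, sigma_s):
    """Fraction of a sinusoid at freq_hz that ends up in the smoothed baseline.

    Continuous-time approximation; sigma_s is the Gaussian std. dev. in seconds.
    """
    return np.exp(-2.0 * (np.pi * sigma_s * freq_hz) ** 2)

fs = 500  # Hz, sampling rate of the dataset
for window in (25, 250, 1000):            # 50 ms, 0.5 s, 2 s
    sigma_s = window / fs
    heartbeat = gaussian_gain(1.0, sigma_s)   # 60 bpm pulse
    drift = gaussian_gain(0.1, sigma_s)       # slow baseline wander
    print(f"sigma={sigma_s:.2f}s  heartbeat in baseline={heartbeat:.3f}  "
          f"drift in baseline={drift:.3f}")
```

Under these assumptions, a 50 ms scale puts about 95% of the heartbeat into the baseline (so it gets subtracted away), 0.5 s puts less than 1% of it there while still tracking about 95% of a 0.1 Hz drift, and 2 s tracks less than half of that drift.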

---

#### 3. Why Gaussian filter here?

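* `gaussian_filter1d` with `sigma=window` smooths the signal, giving a “local average” baseline. Note that `sigma` is the Gaussian's standard deviation in samples, not its total width: with SciPy's default `truncate=4.0` the kernel extends 4σ to each side, so σ = 250 samples (0.5s) averages over roughly ±2s of signal.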
* Subtracting this smoothed signal removes low-frequency baseline wander, leaving pulsations.
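
A minimal sketch of this baseline subtraction, assuming the signal is a 1-D NumPy array sampled at 500 Hz. The function name `remove_dc`, the `mode="nearest"` edge handling, and the synthetic test signal are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def remove_dc(ppg, window=250):
    """Return (ac, baseline): the signal minus its Gaussian-smoothed baseline.

    `window` is passed as sigma (in samples), mirroring `sigma=window`.
    """
    # Cast to float so the smoothed baseline is not rounded to integers.
    ppg = np.asarray(ppg, dtype=float)
    baseline = gaussian_filter1d(ppg, sigma=window, mode="nearest")
    return ppg - baseline, baseline

# Synthetic check: a 72 bpm sine riding on a slowly rising offset.
fs = 500
t = np.arange(0, 20, 1 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)
raw = pulse + 50 + 5 * t / t[-1]
ac, baseline = remove_dc(raw)
```

Away from the first and last couple of seconds, where the kernel runs off the end of the record, `ac` should closely match `pulse`.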

---

#### 4. Intuition

* A rolling 0.5s baseline ≈ “average level of the signal around the current point.”
* Subtracting it removes slow tissue absorption trends and motion drift, but keeps beat-to-beat pulsations intact.

---

**In short: a scale of about 0.5s is long enough to average out individual heartbeats, so the baseline does not swallow the pulse waveform, yet short enough to follow and remove the slow baseline drift.**

---

#### 5. Why **order = 3** in `bandpass_filter`?

* **Filter order** = steepness of the transition band (how fast it attenuates outside 0.75–5 Hz).
* Higher order = steeper cutoff but also:

  * More computation
  * Greater risk of **ringing** and numerical instability, plus more **phase distortion** (shifting peaks in time) if the filter is applied in a single pass.

For PPG:

* HR is \~0.75–3 Hz (45–180 bpm).
* We just need to suppress drift (<0.5 Hz) and noise (>5 Hz).
* **Order=3 (3rd order Butterworth)** = balance:

  * Enough steepness to reject baseline drift + motion noise.
  * Low enough to keep waveform shape stable.

If you go too high (say order 8–10), the filter can ring and distort morphology.
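
A possible shape for `bandpass_filter`, as a sketch only: the original implementation is not shown, so the signature and defaults below are assumptions. It also swaps in second-order sections (`output="sos"` with `sosfiltfilt`) rather than `(b, a)` coefficients, which is the numerically safer form for narrow low-frequency bands.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_filter(signal, fs=500, lowcut=0.75, highcut=5.0, order=3):
    """Zero-phase Butterworth band-pass (assumed defaults: 0.75-5 Hz, order 3)."""
    sos = butter(order, [lowcut, highcut], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))

# Synthetic check: 72 bpm pulse + slow drift + 50 Hz interference.
fs = 500
t = np.arange(0, 30, 1 / fs)
pulse = np.sin(2 * np.pi * 1.2 * t)
raw = pulse + 2 * np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = bandpass_filter(raw, fs=fs)
```

In this synthetic case the drift and the 50 Hz term are suppressed while the 1.2 Hz pulse passes almost unchanged, apart from short transients at the ends of the record.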

---

#### 6. Why **filtfilt**?

* Normal filtering (e.g., `lfilter`) introduces **phase shift** (waveform is shifted in time). That’s a disaster if you’re detecting peaks (systolic/diastolic).
* **`filtfilt`** = forward + backward filtering:

  * Runs the signal through the filter forward, then again backward.
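  * Cancels out the phase shift → output is **zero-phase filtered**. Because the filter is applied twice, its magnitude response is also squared, so stopband attenuation (in dB) is doubled compared with a single pass.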
  * Preserves the true timing of peaks and notches.
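
The effect on peak timing can be checked on a synthetic sine, as sketched below. The example uses the second-order-section counterparts `sosfilt` (single pass) and `sosfiltfilt` (forward + backward) in place of `lfilter` and `filtfilt`; the test signal and the choice of which peak to inspect are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

fs = 500
t = np.arange(0, 20, 1 / fs)
x = np.sin(2 * np.pi * 1.2 * t)               # 72 bpm stand-in for a pulse

sos = butter(3, [0.75, 5.0], btype="bandpass", fs=fs, output="sos")
causal = sosfilt(sos, x)                      # single pass: phase-shifted
zero_phase = sosfiltfilt(sos, x)              # forward + backward: zero phase

# Inspect one known peak of the input sine (near t = 10.2 s).
true_peak = int(round((12.25 / 1.2) * fs))
half = int(0.4 * fs)                          # search just under one period
win = slice(true_peak - half, true_peak + half)
shift_causal = int(np.argmax(causal[win])) - half      # offset in samples
shift_zero = int(np.argmax(zero_phase[win])) - half

print(f"single-pass shift: {shift_causal / fs * 1000:.0f} ms")
print(f"zero-phase shift:  {shift_zero / fs * 1000:.0f} ms")
```

Here the single-pass output peaks tens of milliseconds away from the true peak, while the forward-backward output stays within a sample or two of it.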

---

* **Butterworth** → maximally flat passband (no ripple), so pulse amplitudes inside 0.75–5 Hz are not distorted; a common choice for biomedical signals.

---


#### 7. Why consider only `pleth_1` and `pleth_2` for SpO2 estimation?

SpO2 is estimated from PPG signals by comparing how strongly blood absorbs **red** and **infrared** light, since oxygenated and deoxygenated hemoglobin absorb these wavelengths differently. For each wavelength, the AC (pulsatile) component is divided by the DC (non-pulsatile) component; the red ratio divided by the infrared ratio (the "ratio of ratios") is then mapped to an SpO2 level through an empirical calibration equation.
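
The ratio described above can be sketched as follows. The function name `ratio_of_ratios`, the use of the standard deviation as an AC amplitude proxy, and the mean of the raw signal as the DC level are all assumptions; the calibration that maps the ratio to SpO2 is device-specific and deliberately left out. Note that the DC level has to come from the raw, unfiltered signal, because baseline removal discards it.

```python
import numpy as np

def ratio_of_ratios(red_raw, ir_raw, red_ac, ir_ac):
    """R = (AC_red / DC_red) / (AC_ir / DC_ir).

    *_raw: unfiltered signals (still containing DC)
    *_ac:  the same signals after DC removal / band-pass filtering
    """
    ac_red, ac_ir = np.std(red_ac), np.std(ir_ac)      # AC amplitude proxy
    dc_red, dc_ir = np.mean(red_raw), np.mean(ir_raw)  # DC level
    return (ac_red / dc_red) / (ac_ir / dc_ir)

# Toy example with known AC/DC ratios: 1/100 for red, 4/200 for infrared.
fs = 500
t = np.arange(0, 10, 1 / fs)
beat = np.sin(2 * np.pi * 1.0 * t)
R = ratio_of_ratios(100 + beat, 200 + 4 * beat, beat, 4 * beat)
print(f"R = {R:.2f}")
```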

`pleth_1` & `pleth_2` (distal, fingertip):
* Usually stronger pulsatile signals (more blood perfusion at fingertip).
* Commonly used in commercial pulse oximeters.
* Better for clean SpO₂ estimation.

`pleth_4` & `pleth_5` (proximal, finger base):
* Signals can be weaker or noisier because perfusion is lower compared to fingertip.
* Still useful for redundancy, or if fingertip signals are noisy.
* Could be used to test robustness of the model across sensor placements.

---
---

# 2. Feature Extraction

#### Physiological context: PPG waveform

The **photoplethysmography (PPG) waveform** reflects blood volume changes in the microvascular bed of tissue (often finger/ear sensors).
A single cardiac cycle produces a characteristic shape:

* **Systolic peak (main peak):**
  The tallest peak in the waveform. It corresponds to the maximum blood volume during **systole** (when the heart contracts and pumps blood into the arteries).

    `systolic_peaks, _ = find_peaks(ppg, distance=0.5*fs, prominence=0.5)` → enforces a minimum gap of 0.5s between detected peaks: with `fs = 500 Hz`, 500 samples span 1s, so `0.5*fs` = 250 samples.

    **Q. Why 0.5s?**

    It’s based on physiological constraints:
    
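    A normal resting heart rate is 60–100 bpm → consecutive beats are 0.6–1.0 seconds apart.
    
    Even at 120 bpm, the interval between two systolic peaks (the PPG counterpart of the ECG R–R interval) is still 0.5s.
    
    By setting `distance` to 0.5s worth of samples, we ensure we don’t detect multiple peaks (for example the diastolic peak) within one beat. The trade-off is that beats closer together than 0.5s, i.e. heart rates above 120 bpm, would be skipped; a shorter `distance` is needed if such rates are expected.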
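
The detection step can be tried on a synthetic waveform, as in the sketch below. The signal model (one tall Gaussian per beat followed by a smaller bump 0.3s later, at 75 bpm) is an assumption made purely to exercise `find_peaks`; real PPG needs the preprocessing from section 1 first.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 500
t = np.arange(0, 10, 1 / fs)
beat_times = np.arange(0.5, 10, 0.8)          # one beat every 0.8 s (75 bpm)

ppg = np.zeros_like(t)
for bt in beat_times:
    ppg += np.exp(-0.5 * ((t - bt) / 0.06) ** 2)               # systolic peak
    ppg += 0.4 * np.exp(-0.5 * ((t - bt - 0.3) / 0.08) ** 2)   # diastolic bump

# At most one peak per 0.5 s (250 samples); small bumps are rejected.
systolic_peaks, _ = find_peaks(ppg, distance=int(0.5 * fs), prominence=0.5)
print(len(systolic_peaks), "systolic peaks found")
```

Each detected index should coincide with a beat time, and the smaller diastolic bumps should be ignored. `prominence` is expressed in the signal's own amplitude units, so 0.5 only makes sense for a signal scaled like this one.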

---

* **Dicrotic notch:**
  A small downward deflection after the systolic peak. It’s associated with **aortic valve closure** and the beginning of diastole.

---

* **Diastolic peak (secondary peak):**
  A smaller “second bump” after the dicrotic notch, produced by reflected waves in the arterial system during **diastole**.

  A physiologic diastolic peak usually appears about **0.1–0.4 seconds after the notch** (roughly 20–40% of the RR interval). For a candidate to be accepted, it’s sensible to require it to be **at least ~10–15% of the RR before the next systolic peak**.
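
One simple way to apply the acceptance rule is sketched below. It is a heuristic under stated assumptions, not an established algorithm: the function name `find_notch_and_diastolic`, the 15% margin default, and the synthetic test signal are all hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks

def find_notch_and_diastolic(ppg, systolic_peaks, min_gap_frac=0.15):
    """Per beat, return (notch_index, diastolic_index), or None if no candidate.

    The diastolic peak is taken as the tallest local maximum between one
    systolic peak and a cut-off placed min_gap_frac of the RR interval before
    the next one; the notch is the lowest point between the two peaks.
    """
    beats = []
    for start, end in zip(systolic_peaks[:-1], systolic_peaks[1:]):
        rr = end - start
        stop = end - int(min_gap_frac * rr)
        segment = ppg[start:stop]
        candidates, _ = find_peaks(segment)   # local maxima, edges excluded
        if candidates.size == 0:
            beats.append(None)
            continue
        diastolic = start + candidates[np.argmax(segment[candidates])]
        notch = start + int(np.argmin(ppg[start:diastolic]))
        beats.append((int(notch), int(diastolic)))
    return beats

# Synthetic beats at 75 bpm with a diastolic bump 0.3 s after each peak.
fs = 500
t = np.arange(0, 10, 1 / fs)
ppg = np.zeros_like(t)
for bt in np.arange(0.5, 10, 0.8):
    ppg += np.exp(-0.5 * ((t - bt) / 0.06) ** 2)
    ppg += 0.4 * np.exp(-0.5 * ((t - bt - 0.3) / 0.08) ** 2)

systolic_peaks, _ = find_peaks(ppg, distance=int(0.5 * fs), prominence=0.5)
beats = find_notch_and_diastolic(ppg, systolic_peaks)
```

On noisy recordings the candidate search would likely need a prominence threshold as well, since any small ripple counts as a local maximum here.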

---
So each cycle looks like:
Rise → **Systolic peak** → Dip (**Dicrotic notch**) → Small **Diastolic peak** (local maximum) → Next **Systolic peak**.

---

* **Heart rate:**
  Heart rate is derived from systolic-to-systolic spacing.
  
  Compute the distance in samples between consecutive systolic peaks → divide by fs to convert it to seconds (RR interval = time between beats).
  
  HR = 60 / mean(RR interval) → beats per minute.
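
The calculation above can be written in a few lines. This is a sketch that assumes `systolic_peaks` holds sample indices such as those returned by `find_peaks`; the function name `heart_rate_bpm` and the example indices are illustrative.

```python
import numpy as np

def heart_rate_bpm(systolic_peaks, fs=500):
    """Mean heart rate in bpm from systolic peak indices (in samples)."""
    rr = np.diff(systolic_peaks) / fs   # RR intervals in seconds
    return 60.0 / np.mean(rr)

# Peaks 400 samples apart at fs = 500 Hz → RR = 0.8 s
peaks = np.array([250, 650, 1050, 1450, 1850])
hr = heart_rate_bpm(peaks)
print(f"{hr:.1f} bpm")   # prints "75.0 bpm"
```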
