In [30]:
import glob
import numpy as np
import scipy as sp
import scipy.io
import scipy.signal

def load_troika_dataset():
    data_dir = "/kaggle/input/udacity-wearable-hr-data/datasets/troika/training_data"
    data_fls = sorted(glob.glob(data_dir + "/DATA_*.mat"))
    ref_fls = sorted(glob.glob(data_dir + "/REF_*.mat"))
    return data_fls, ref_fls

def run_pulse_rate_algorithm(data_fl, ref_fl):
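    # Nested helper functions
    def load_troika_data_file(data_fl):
        # 'sig' rows: ECG, PPG1, PPG2, accx, accy, accz.
        # Keep the second PPG channel and the three accelerometer axes.
        data = sp.io.loadmat(data_fl)['sig']
        return data[2:]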
    
    def bandpass_filter(signal, pass_band, fs=125):
        b, a = sp.signal.butter(3, pass_band, btype='bandpass', fs=fs)
        return sp.signal.filtfilt(b, a, signal)
    
    def calculate_fft(signal, fs):
        freqs = np.fft.rfftfreq(2 * len(signal), 1/fs)
        fft_magnitudes = np.abs(np.fft.rfft(signal, 2*len(signal)))
        return freqs, fft_magnitudes
    
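    def weighted_frequency(freqs, fft_magnitudes, pass_band):
        # Zero the out-of-band bins in place (the caller's array is modified),
        # then return the magnitude-weighted mean frequency (spectral centroid).
        fft_magnitudes[(freqs <= pass_band[0]) | (freqs >= pass_band[1])] = 0
        return np.dot(freqs, fft_magnitudes) / np.sum(fft_magnitudes)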
    
    fs = 125
    pass_band = (0.66, 4.0)  # Frequency band in Hz
    
    ppg, accx, accy, accz = load_troika_data_file(data_fl)
    ref_bpm = sp.io.loadmat(ref_fl)['BPM0'].flatten()
    
    # Bandpass filtering
    ppg = bandpass_filter(ppg, pass_band)
    accx = bandpass_filter(accx, pass_band)
    accy = bandpass_filter(accy, pass_band)
    accz = bandpass_filter(accz, pass_band)
    
    acc_mag = np.sqrt(accx**2 + accy**2 + accz**2)
    
    errs, confs = [], []
    est_bpm_prev = []
    # Slide an 8 s window over the signals in 2 s steps
    for i in range(0, len(ppg) - fs*8, fs*2):
        ppg_segment = ppg[i:i + fs*8]
        acc_segment = acc_mag[i:i + fs*8]

        freqs, fft_magnitudes_ppg = calculate_fft(ppg_segment, fs)
        _, fft_magnitudes_acc = calculate_fft(acc_segment, fs)

        ppg_weighted_freq = weighted_frequency(freqs, fft_magnitudes_ppg, pass_band)
        acc_weighted_freq = weighted_frequency(freqs, fft_magnitudes_acc, pass_band)

        if np.abs(ppg_weighted_freq - acc_weighted_freq) <= 0.2 and len(est_bpm_prev) > 0:
            est_bpm = 0.8 * est_bpm_prev[-1] + 0.2 * (ppg_weighted_freq * 60)
        else:
            est_bpm = ppg_weighted_freq * 60

        est_bpm_prev.append(est_bpm)
        ref_val = ref_bpm[i // (fs*2)]  # one reference value per 2 s step
        errs.append(np.abs(est_bpm - ref_val))
        
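        # Confidence: share of in-band energy within ±0.2 Hz of the PPG weighted frequency
        # (fft_magnitudes_ppg was masked to the pass band inside weighted_frequency)
        spectral_energy = np.sum(fft_magnitudes_ppg**2)
        near_estimate = (freqs >= ppg_weighted_freq - 0.2) & (freqs <= ppg_weighted_freq + 0.2)
        fundamental_frequency_energy = np.sum(fft_magnitudes_ppg[near_estimate]**2)
        confs.append(fundamental_frequency_energy / spectral_energy)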
        
    return np.array(errs), np.array(confs)

def aggregate_error_metric(pr_errors, confidence_est):
    # Average the errors of the estimates at or above the 90th confidence percentile
    conf_threshold = np.percentile(confidence_est, 90)
    return np.mean(pr_errors[confidence_est >= conf_threshold])

def evaluate():
    data_fls, ref_fls = load_troika_dataset()
    
    errs, confs = [], []
    for data_fl, ref_fl in zip(data_fls, ref_fls):
        errors, confidence = run_pulse_rate_algorithm(data_fl, ref_fl)
        errs.append(errors)
        confs.append(confidence)
    
    errs = np.hstack(errs)
    confs = np.hstack(confs)
    mae = aggregate_error_metric(errs, confs)

    print(f"Mean Absolute Error (MAE): {mae:.2f}")


In [31]:
evaluate()

Mean Absolute Error (MAE): 10.59



# Project Write up

## Code Description
The provided Python code estimates pulse rate from a photoplethysmogram (PPG) signal and 3-axis accelerometer signals. The algorithm is evaluated on the TROIKA dataset, which is loaded from .mat files. The essential steps are loading the data, band-pass filtering the signals, and analyzing each window in the frequency domain with the Fast Fourier Transform (FFT). The algorithm also computes a confidence metric that indicates how reliable each pulse rate estimate is. The code concludes with an evaluation function that runs the algorithm on every recording and reports the Mean Absolute Error (MAE) over the highest-confidence estimates. The pass band is fixed at 0.66 to 4.0 Hz; the code shown does not search over pass-band parameters.

### How to Run the Code
Ensure that numpy and scipy are installed; glob is part of the Python standard library.

The function load_troika_dataset() points to a specific data directory (a Kaggle input path). Store the TROIKA dataset at that path or change data_dir accordingly.

Call evaluate(). The output may differ if the dataset is different or has been modified.
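
Before running, it can help to confirm that the data directory is laid out the way load_troika_dataset() expects. The following is a minimal sketch, not part of the original code: the helper name `pair_troika_files` and its `data_dir` parameter are hypothetical, and it assumes only the `DATA_*.mat` / `REF_*.mat` naming used above.

```python
import glob
import os


def pair_troika_files(data_dir):
    """Return (data_file, ref_file) pairs, checking that the names line up.

    Hypothetical helper: assumes every DATA_<id>.mat has a REF_<id>.mat sibling.
    """
    data_fls = sorted(glob.glob(os.path.join(data_dir, "DATA_*.mat")))
    ref_fls = sorted(glob.glob(os.path.join(data_dir, "REF_*.mat")))
    if not data_fls:
        raise FileNotFoundError(f"No DATA_*.mat files found in {data_dir!r}")
    if len(data_fls) != len(ref_fls):
        raise ValueError(f"{len(data_fls)} data files but {len(ref_fls)} reference files")

    pairs = []
    for data_fl, ref_fl in zip(data_fls, ref_fls):
        # Compare the part of the file name after the DATA_ / REF_ prefix
        data_id = os.path.basename(data_fl)[len("DATA_"):]
        ref_id = os.path.basename(ref_fl)[len("REF_"):]
        if data_id != ref_id:
            raise ValueError(f"Mismatched pair: {data_fl} vs {ref_fl}")
        pairs.append((data_fl, ref_fl))
    return pairs
```

A failed check here points to a path or naming problem rather than to the algorithm itself.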

## Algorithm Description
### How the Algorithm Works

Data Loading: Loads PPG and accelerometer data.

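Bandpass Filtering: Applies a third-order Butterworth band-pass filter with a 0.66 to 4.0 Hz pass band (about 40 to 240 BPM), run forward and backward, to the PPG signal and to each accelerometer axis.

FFT Calculation: Computes the FFT on sliding windows (8 s long, 2 s step) of the PPG signal and the accelerometer magnitude, zero-padding each window to twice its length.

Pulse Rate Calculation: Estimates the pulse rate as the magnitude-weighted mean frequency of the in-band PPG spectrum, converted to BPM. When this frequency lies within 0.2 Hz of the corresponding accelerometer frequency, which suggests a motion artifact, the estimate is smoothed with the previous one (80% previous, 20% current).

Confidence Calculation: Computes a confidence metric as the spectral energy within ±0.2 Hz of the PPG weighted frequency divided by the total in-band energy of the window.

Error Computation: Calculates the absolute error between the estimated and reference pulse rate for each window. The aggregate metric then averages only the errors of the highest-confidence estimates.

Parameter Tuning: The code shown evaluates a single, fixed pass band (0.66 to 4.0 Hz); it does not include a loop over candidate pass-band parameters.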
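
The filtering, FFT, pulse-rate, and confidence steps above can be sketched on a synthetic signal. This is a minimal illustration that mirrors the constants in the code (125 Hz sampling, 0.66 to 4.0 Hz pass band, 8 s window, ±0.2 Hz confidence band); the function names and the 1.5 Hz test sinusoid are assumptions made for the example, not part of the original.

```python
import numpy as np
import scipy.signal

FS = 125                 # sampling rate in Hz, as in the code above
PASS_BAND = (0.66, 4.0)  # Hz, roughly 40 to 240 BPM


def bandpass(signal, pass_band=PASS_BAND, fs=FS):
    # Third-order Butterworth, applied forward and backward (zero phase shift)
    b, a = scipy.signal.butter(3, pass_band, btype="bandpass", fs=fs)
    return scipy.signal.filtfilt(b, a, signal)


def in_band_spectrum(segment, pass_band=PASS_BAND, fs=FS):
    # Zero-pad to twice the window length, then zero the out-of-band bins
    n_fft = 2 * len(segment)
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    mags = np.abs(np.fft.rfft(segment, n_fft))
    mags[(freqs <= pass_band[0]) | (freqs >= pass_band[1])] = 0
    return freqs, mags


def centroid_and_confidence(freqs, mags, half_width=0.2):
    # Magnitude-weighted mean frequency, and the share of in-band energy
    # that lies within +/- half_width Hz of it
    centroid = np.dot(freqs, mags) / np.sum(mags)
    near = (freqs >= centroid - half_width) & (freqs <= centroid + half_width)
    confidence = np.sum(mags[near] ** 2) / np.sum(mags ** 2)
    return centroid, confidence


# Synthetic 8 s "PPG" window: a clean 1.5 Hz (90 BPM) sinusoid (assumed test input)
t = np.arange(8 * FS) / FS
ppg = bandpass(np.sin(2 * np.pi * 1.5 * t))

freqs, mags = in_band_spectrum(ppg)
centroid_hz, confidence = centroid_and_confidence(freqs, mags)
est_bpm = 60 * centroid_hz
peak_bpm = 60 * freqs[np.argmax(mags)]  # spectral peak, for comparison only
```

Comparing `est_bpm` with `peak_bpm` is a useful sanity check: the weighted mean is not guaranteed to coincide with the spectral peak, because spectral leakage anywhere in the pass band also contributes to it.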

### Physiological Basis

PPG Signal: Reflects changes in blood volume in the vasculature, which is modulated by heartbeats.

Accelerometer Signal: Helps identify and mitigate motion artifacts in the PPG signal by accounting for frequencies caused by physical activities.

### Algorithm Outputs

Pulse Rate Estimates: A series of estimated pulse rate values over the windowed signal.

Confidence Values: Corresponding confidence values associated with each pulse rate estimate.

### Caveats on Algorithm Outputs
Reliability of the estimates may decrease with higher motion intensity or irregular heart rhythms.

The algorithm presumes the absence of arrhythmic events, so it may not perform well on pathological data.

### Common Failure Modes
Motion Artifacts: When motion energy appears in the PPG spectrum at or near the pulse frequency, the estimate can be pulled toward the motion frequency, leading to over- or underestimation of the pulse rate.

Noisy Data: Excessive noise or poor signal quality may distort the frequency representation, reducing accuracy.

Non-uniform Heartbeats: The algorithm assumes a consistent and dominant heartbeat frequency, which may not always hold true.
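
The motion-artifact failure mode can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions: a 1.5 Hz sinusoid stands in for the cardiac component and a 2.5 Hz sinusoid for motion leaking into the PPG; both frequencies and the helper name `centroid_hz` are made up for the example.

```python
import numpy as np

FS = 125                 # sampling rate in Hz, as in the code above
PASS_BAND = (0.66, 4.0)  # Hz


def centroid_hz(segment, fs=FS, pass_band=PASS_BAND):
    # Magnitude-weighted mean frequency of the in-band, zero-padded spectrum
    n_fft = 2 * len(segment)
    freqs = np.fft.rfftfreq(n_fft, 1 / fs)
    mags = np.abs(np.fft.rfft(segment, n_fft))
    mags[(freqs <= pass_band[0]) | (freqs >= pass_band[1])] = 0
    return np.dot(freqs, mags) / np.sum(mags)


t = np.arange(8 * FS) / FS
pulse = np.sin(2 * np.pi * 1.5 * t)   # assumed 90 BPM cardiac component
motion = np.sin(2 * np.pi * 2.5 * t)  # assumed 150 cycles/min motion component

clean_bpm = 60 * centroid_hz(pulse)
corrupted_bpm = 60 * centroid_hz(pulse + motion)

print(f"clean window:     {clean_bpm:.1f} BPM")
print(f"with motion term: {corrupted_bpm:.1f} BPM")
```

Because the estimate is a weighted mean over the whole pass band, any in-band motion energy shifts it toward the motion frequency, even when the cardiac peak is still clearly visible in the spectrum.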

## Algorithm Performance

### Performance Computation

Metric: The primary metric for algorithm performance is the Mean Absolute Error (MAE), in BPM, between estimated pulse rates and reference values. The run shown above reports an MAE of 10.59.

Validation Approach: The given code does not explicitly implement cross-validation or a train/test split. However, it is advisable to use one of these techniques to ensure robust validation.

Confidence-Weighted Error: The aggregate error keeps only the estimates whose confidence is at or above the 90th percentile of all confidence values and discards the rest. It is therefore a thresholded mean over the most confident estimates rather than a weighted mean.
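
The effect of the percentile threshold is easiest to see on a toy example. The sketch below uses made-up error and confidence values, and `top_confidence_mae` is a hypothetical variant of aggregate_error_metric() that also returns how many estimates were kept.

```python
import numpy as np


def top_confidence_mae(pr_errors, confidence_est, percentile=90):
    # Keep estimates at or above the given confidence percentile, then average
    threshold = np.percentile(confidence_est, percentile)
    keep = confidence_est >= threshold
    return np.mean(pr_errors[keep]), int(np.sum(keep))


# Made-up values: low errors paired with high confidence, and vice versa
errors = np.array([2.0, 30.0, 4.0, 25.0, 1.0, 40.0, 3.0, 35.0, 5.0, 20.0])
confs = np.array([0.90, 0.10, 0.80, 0.20, 0.95, 0.05, 0.85, 0.15, 0.70, 0.30])

mae_top, n_kept = top_confidence_mae(errors, confs)
mae_all = np.mean(errors)

print(f"kept {n_kept} of {len(errors)} estimates")
print(f"MAE over kept estimates: {mae_top:.2f}")
print(f"MAE over all estimates:  {mae_all:.2f}")
```

The reported MAE describes only the retained estimates, so it should be read together with the fraction of windows that were kept.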

### Error Metrics

Mean Absolute Error (MAE): Represents the average absolute difference between estimated and reference pulse rates, providing a straightforward measure of prediction accuracy.
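
Written out, the thresholded MAE computed by aggregate_error_metric() takes the following form. The symbols are notation introduced here for illustration: $\hat{r}_i$ is the estimated pulse rate for window $i$, $r_i$ the reference value, $c_i$ the confidence, and $P_{90}(c)$ the 90th percentile of all confidence values.

```latex
\mathcal{S} = \{\, i : c_i \ge P_{90}(c) \,\}, \qquad
\mathrm{MAE} = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \left| \hat{r}_i - r_i \right|
```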

### Caveat on Performance

Dataset Sensitivity: The performance and the chosen parameters may be tightly bound to the specific characteristics of the TROIKA dataset. Applying the algorithm to other datasets or to real-world scenarios may require recalibration or further tuning.

Ground Truth Reliability: Assumes that the reference pulse rates are accurate and reliable, which might not always be the case in different datasets or situations.
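
One way to probe the dataset sensitivity noted above is to tune any parameters on all but one recording and report the error on the held-out recording. The following is a minimal sketch of such a split; `leave_one_out_splits` is a hypothetical helper and the file names are placeholders, not actual TROIKA file names.

```python
def leave_one_out_splits(recordings):
    """Yield (train, held_out) pairs, holding out each recording exactly once."""
    for i, held_out in enumerate(recordings):
        train = recordings[:i] + recordings[i + 1:]
        yield train, held_out


# Placeholder (data_file, ref_file) pairs for illustration only
recordings = [
    ("DATA_a.mat", "REF_a.mat"),
    ("DATA_b.mat", "REF_b.mat"),
    ("DATA_c.mat", "REF_c.mat"),
]

splits = list(leave_one_out_splits(recordings))
for train, held_out in splits:
    # Tune on `train`, then evaluate once on `held_out`
    print(f"held out {held_out[0]}, tuning on {len(train)} recordings")
```

Splitting by recording rather than by window keeps overlapping windows from the same recording out of both sides of the split.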

## Additional Notes

Although the provided code and descriptions cover substantial aspects of the pulse rate algorithm, further improvements and detailed documentation might enhance its applicability and usability across various scenarios and user groups. Always consider ethical aspects, user privacy, and data security when dealing with health-related data and algorithms.


