### Study Correlation Plan

To obtain the HRV data, we use the NeuroKit2 library to process both the short-window signals and the full-length signal.

### Flow of the Study

- Take the windowed versions of the data (30 seconds, 1 minute, and 2 minutes)
- Calculate the HRV metrics (features) for each window
- Take the full-length signal and calculate the same HRV metrics
- Compute the correlation between the windowed and full-length metrics

### HRV Metrics that we're going to use

| **Domain**     | **HRV Feature** | **Unit** | **Description**                                                                 |
|----------------|------------------|----------|----------------------------------------------------------------------------------|
| **Time**       | MeanNN           | ms       | Mean NN interval (time between successive peaks), in milliseconds                |
|                | SDNN             | ms       | Standard deviation of the RR intervals                                           |
|                | pNN50            | %        | NN50 count divided by the total number of all RR intervals                       |
|                | RMSSD            | ms       | Root mean square of successive RR interval differences                           |
|                | MeanHR           | bpm      | Mean heart rate                                                                  |
| **Frequency**  | LF               | ms²      | Power of low frequency band (0.04–0.15 Hz)                                       |
|                | HF               | ms²      | Power of high frequency band (0.15–0.4 Hz)                                       |
|                | LF/HF            | -        | Ratio of LF to HF                                                                |
| **Non-linear** | CSI              | -        | Cardiac sympathetic index                                                        |
|                | CVI              | -        | Cardiac vagal index                                                              |
|                | SD1              | ms       | Standard deviation of Poincaré plot projection on the line perpendicular to line y=x |
|                | SD2              | ms       | Standard deviation of Poincaré plot projection on the line y=x                  |
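
To make the time-domain and Poincaré definitions in the table concrete, here is a minimal NumPy sketch that computes them directly from a list of NN intervals. It is an illustration only, not NeuroKit2's implementation: the function name `basic_hrv_features` and the example intervals are made up, pNN50 follows the table's wording (NN50 divided by the number of intervals), MeanHR is approximated as 60000 / MeanNN, and SD1/SD2 use the common variance identities rather than an explicit Poincaré projection.

```python
import numpy as np

def basic_hrv_features(nn_ms):
    """Illustrative HRV features from NN intervals given in milliseconds."""
    nn = np.asarray(nn_ms, dtype=float)
    diff = np.diff(nn)                       # successive NN differences (ms)

    mean_nn = nn.mean()
    sdnn = nn.std(ddof=1)                    # sample standard deviation
    rmssd = np.sqrt(np.mean(diff ** 2))      # root mean square of differences
    nn50 = int(np.sum(np.abs(diff) > 50))    # differences larger than 50 ms
    pnn50 = 100.0 * nn50 / len(nn)           # assumption: table's definition
    mean_hr = 60000.0 / mean_nn              # ms per beat -> beats per minute

    # Poincaré descriptors via variance identities (assumption, see lead-in)
    sd1 = diff.std(ddof=1) / np.sqrt(2)
    sd2 = np.sqrt(max(2 * sdnn ** 2 - sd1 ** 2, 0.0))

    return {
        "MeanNN": mean_nn, "SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50,
        "MeanHR": mean_hr, "SD1": sd1, "SD2": sd2,
    }

example_nn = [800, 810, 790, 860, 800]       # made-up NN intervals in ms
features = basic_hrv_features(example_nn)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

Libraries differ in small conventions (for example the denominator used for pNN50), so values from this sketch may not match NeuroKit2 exactly.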


### Setup Requirements

In [1]:
# UST HRV and Normal HRV Correlation Analysis for Stress Detection
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os
from glob import glob
import warnings
import neurokit2 as nk
warnings.filterwarnings('ignore')

# Set plot style
plt.style.use('ggplot')
sns.set(font_scale=1.2)
sns.set_style("whitegrid")

In [2]:
import scipy.signal  # import the submodule explicitly so scipy.signal is always available

def preprocess_ppg(signal, fs = 35):
    """ Preprocess the PPG signal with a zero-phase bandpass filter
        (3rd-order Butterworth, 0.7-2.5 Hz, applied with filtfilt).
        No moving-average smoothing is applied in this version.
        
        Parameters:
        ----------
        signal (numpy array): 
            The PPG Signal to be preprocessed
        fs (float): 
            The Sampling Frequency of the Signal
            
        Returns:
        --------
        numpy array: 
            The Preprocessed PPG Signal
    
    """ 

    # Bandpass filter to isolate the cardiac component (0.7-2.5 Hz, i.e. 42-150 bpm)
    b_bp, a_bp = scipy.signal.butter(3, [0.7, 2.5], btype='band', fs=fs)
    filtered = scipy.signal.filtfilt(b_bp, a_bp, signal)
    
    return filtered
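
As a quick sanity check of the bandpass settings used in `preprocess_ppg`, the sketch below applies the same filter design (3rd-order Butterworth, 0.7-2.5 Hz, `filtfilt`) to a synthetic signal. The synthetic components (a 1.2 Hz "pulse", a 0.1 Hz drift, and an 8 Hz disturbance) and the helper `dominant_frequency` are assumptions made for illustration; they are not part of the dataset or of the pipeline.

```python
import numpy as np
from scipy import signal

fs = 35.0                                      # video sampling rate used in this notebook
t = np.arange(1050) / fs                       # 30 seconds of samples

cardiac = np.sin(2 * np.pi * 1.2 * t)          # 1.2 Hz, i.e. 72 bpm (synthetic)
drift = 2.0 * np.sin(2 * np.pi * 0.1 * t)      # slow baseline wander (synthetic)
noise = 0.5 * np.sin(2 * np.pi * 8.0 * t)      # high-frequency disturbance (synthetic)
raw = cardiac + drift + noise

# Same design as in preprocess_ppg: 3rd-order Butterworth bandpass, zero-phase
b, a = signal.butter(3, [0.7, 2.5], btype="band", fs=fs)
filtered = signal.filtfilt(b, a, raw)

def dominant_frequency(x, fs):
    """Return the frequency (Hz) of the largest FFT magnitude peak."""
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

raw_peak = dominant_frequency(raw, fs)
filtered_peak = dominant_frequency(filtered, fs)
print(f"dominant frequency before filtering: {raw_peak:.2f} Hz")
print(f"dominant frequency after filtering:  {filtered_peak:.2f} Hz")
```

Before filtering, the drift dominates the spectrum; after filtering, the largest component should be the synthetic pulse inside the passband.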

# 30 Seconds Plot Correlation

For the 30-second window, each rPPG signal is split into short segments using a sliding window with a **stride** of 15 seconds (each window starts 15 seconds after the previous one), and the HRV metrics are then averaged across segments.
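
The windowing can be sketched as follows. The helper `window_starts` is a hypothetical name used only for illustration; it returns the start index of every full window, so trailing samples that do not fill a complete window are dropped.

```python
def window_starts(n_samples, fs, window_s=30, stride_s=15):
    """Start indices of all full windows of window_s seconds, stride_s apart."""
    segment_length = int(window_s * fs)
    stride_length = int(stride_s * fs)
    return list(range(0, n_samples - segment_length + 1, stride_length))

fs = 35                                        # video sampling rate (Hz)
one_minute = window_starts(n_samples=60 * fs, fs=fs)
three_minutes = window_starts(n_samples=180 * fs, fs=fs)

print(one_minute)                              # start indices for a 60 s signal
print(len(three_minutes), "windows in a 180 s signal")
```

With a stride of half the window length, consecutive windows overlap by 50%, so a signal of duration `T` seconds yields `floor((T - 30) / 15) + 1` windows.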

The test is run on the following scenarios: Task 1 and Task 2 of UBFC, and Rest 2 and Rest 6 of Physio.

In [3]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T2"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [4]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [5]:
"""
Steps for computing the 30-second short-term HRV metrics for each subject and averaging them:
1. Loop through each subject.
2. For each short rPPG segment (30 seconds), compute the HRV metrics with the neurokit2 package and store them.
3. Average the HRV metrics across all segments for each subject.
4. Compute the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
Note: The cells above only preprocess the signals. This cell and the following ones compute the HRV metrics and run the correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'PR' : [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Apply a 30-second window with a stride of 15 seconds
        segment_length = 30 * sample_rate_video
        stride_length = 15 * sample_rate_video
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=sample_rate_video)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            ## Store the pulse rate; note that [0] takes the first sample of the PPG_Rate series, not the segment mean
            rppg_hrv_metrics[rppg_method][subject_task_id]['PR'].append(signals['PPG_Rate'][0])

            # Getting the HRV Metrics
            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_video)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_video, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T2 for POS
Processing s42_T2 for POS
Processing s43_T2 for POS
Processing s44_T2 for POS
Processing s45_T2 for POS
Processing s46_T2 for POS
Processing s47_T2 for POS
Processing s48_T2 for POS
Processing s49_T2 for POS
Processing s50_T2 for POS
Processing s51_T2 for POS
Processing s52_T2 for POS
Processing s53_T2 for POS
Processing s54_T2 for POS
Processing s55_T2 for POS
Processing s56_T2 for POS
Processing s41_T2 for LGI
Processing s42_T2 for LGI
Processing s43_T2 for LGI
Processing s44_T2 for LGI
Processing s45_T2 for LGI
Processing s46_T2 for LGI
Processing s47_T2 for LGI
Processing s48_T2 for LGI
Processing s49_T2 for LGI
Processing s50_T2 for LGI
Processing s51_T2 for LGI
Processing s52_T2 for LGI
Processing s53_T2 for LGI
Processing s54_T2 for LGI
Processing s55_T2 for LGI
Processing s56_T2 for LGI
Processing s41_T2 for OMIT
Processing s42_T2 for OMIT
Processing s43_T2 for OMIT
Processing s44_T2 for OMIT
Processing s45_T2 for OMIT
Processing s46_T2 for OMIT
Proces

In [6]:
### Calculate the average of each HRV metric across segments, for each subject and method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T2': {'MeanNN': 678.6678393260769, 'SDNN': 216.17573965231605, 'RMSSD': 300.93042512586976, 'pNN50': 85.8323782689709, 'LF': nan, 'HF': 0.11476760493117646, 'LF_HF': nan, 'PR': 89.31355615007136}, 's42_T2': {'MeanNN': 785.272153931126, 'SDNN': 267.60892464291595, 'RMSSD': 350.7001624825621, 'pNN50': 86.79671518190453, 'LF': nan, 'HF': 0.12927941086895886, 'LF_HF': nan, 'PR': 76.7351064196896}, 's43_T2': {'MeanNN': 717.7845133163092, 'SDNN': 210.57672242221753, 'RMSSD': 282.80228854776476, 'pNN50': 82.81150320885223, 'LF': nan, 'HF': 0.10707711769536403, 'LF_HF': nan, 'PR': 83.80720868931047}, 's44_T2': {'MeanNN': 827.3653554936976, 'SDNN': 250.7236166364065, 'RMSSD': 336.19176277894474, 'pNN50': 87.75209927616345, 'LF': nan, 'HF': 0.1186506894074248, 'LF_HF': nan, 'PR': 72.66153889324082}, 's45_T2': {'MeanNN': 775.3243047740654, 'SDNN': 259.9465720694076, 'RMSSD': 349.3264700837941, 'pNN50': 89.14793749721981, 'LF': nan, 'HF': 0.12267512669703713, 'LF_HF': nan, 'PR': 77.8
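
The `nan` values for LF and LF_HF in the output above are plausibly a consequence of the 30-second window: the LF band starts at 0.04 Hz, whose period is 25 seconds, so a 30-second segment contains barely more than one cycle of the slowest LF component. The sketch below is only arithmetic on the band edges from the metrics table; the helper name `cycles_in_window` is hypothetical and the code does not reproduce NeuroKit2's internal check.

```python
def cycles_in_window(window_s, f_low_hz):
    """Number of full periods of the band's lowest frequency that fit in the window."""
    return window_s * f_low_hz

band_lower_edges = {"LF": 0.04, "HF": 0.15}    # Hz, from the HRV metrics table

coverage = {}
for window_s in (30, 60, 120):                 # the window lengths studied here
    for band, f_low in band_lower_edges.items():
        coverage[(window_s, band)] = cycles_in_window(window_s, f_low)
        print(f"{window_s:>3} s window, {band}: "
              f"{coverage[(window_s, band)]:.1f} cycles of {f_low} Hz")
```

Longer windows cover more cycles of the LF lower edge, which is one reason to compare the 1-minute and 2-minute windows as well.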

### Getting the GT HRV Metrics

In [7]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=sample_rate_gt)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_gt)

    ## Store the pulse rate; note that [0] takes the first sample of the PPG_Rate series, not the recording mean
    gt_hrv_metrics[subject_task_id]['PR'] = signals['PPG_Rate'][0].item()
    
    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_gt)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'][0])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'][0])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'][0])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'][0])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_gt, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'][0])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'][0])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'][0])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])

print(gt_hrv_metrics)

Processing s41_T2 for ground truth
Processing s42_T2 for ground truth
Processing s43_T2 for ground truth
Processing s44_T2 for ground truth
Processing s45_T2 for ground truth
Processing s46_T2 for ground truth
Processing s47_T2 for ground truth
Processing s48_T2 for ground truth
Processing s49_T2 for ground truth
Processing s50_T2 for ground truth
Processing s51_T2 for ground truth
Processing s52_T2 for ground truth
Processing s53_T2 for ground truth
Processing s54_T2 for ground truth
Processing s55_T2 for ground truth
Processing s56_T2 for ground truth
{'s41_T2': {'MeanNN': 787.8579295154185, 'SDNN': 269.85224785512537, 'RMSSD': 316.26943979043244, 'pNN50': 78.8546255506608, 'LF': 0.032217513284142976, 'HF': 0.061322128869085946, 'LF_HF': 0.5253815201511155, 'PR': 76.15586230997728}, 's42_T2': {'MeanNN': 769.598599137931, 'SDNN': 231.77899344565884, 'RMSSD': 330.28720960388716, 'pNN50': 82.32758620689656, 'LF': 0.04979405843996538, 'HF': 0.10549022541304848, 'LF_HF': 0.472025329787627

### Now that we have the HRV metrics of the rPPG, let's compare them with the GT to see the correlation
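
The comparison boils down to three steps: pair the per-subject values, drop subjects flagged as outliers by the IQR rule in either series, and compute Pearson's r on the rest. Here is a minimal sketch of those steps on made-up numbers; the helper names `iqr_outlier_mask` and `pearson_r` and the example arrays are assumptions for illustration, and `np.corrcoef` is used in place of `scipy.stats.pearsonr`, so no p-value is produced.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Boolean mask that is True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Made-up per-subject values (e.g. MeanNN in ms); the last rPPG value is an outlier
rppg = np.array([700.0, 720.0, 750.0, 780.0, 800.0, 1500.0])
gt = np.array([710.0, 715.0, 760.0, 770.0, 810.0, 805.0])

# A subject is removed if it is an outlier in either series
keep = ~(iqr_outlier_mask(rppg) | iqr_outlier_mask(gt))
r_all = pearson_r(rppg, gt)
r_clean = pearson_r(rppg[keep], gt[keep])

print(f"kept {keep.sum()} of {len(keep)} subjects")
print(f"r with all subjects: {r_all:.3f}, r after outlier removal: {r_clean:.3f}")
```

Removing a subject when either value is an outlier keeps the two arrays paired, which is what the correlation requires.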

In [8]:
def identify_outliers_iqr(data):
    """Identify outlier indices using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data to check for outliers.
    
    Returns:
    --------
    numpy array: Boolean mask where True indicates outlier.
    """
    data = np.asarray(data)
    
    if len(data) == 0:
        return np.array([], dtype=bool)
    
    if len(data) == 1:
        return np.array([False])
    
    # Remove any NaN or infinite values before calculating percentiles
    clean_data = data[np.isfinite(data)]
    
    if len(clean_data) < 2:
        return np.array([False] * len(data))
    
    q1 = np.percentile(clean_data, 25)
    q3 = np.percentile(clean_data, 75)
    iqr = q3 - q1
    
    # Handle case where IQR is 0 (all values are the same)
    if iqr == 0:
        return np.array([False] * len(data))
    
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    
    outlier_mask = (data < lower_bound) | (data > upper_bound) | ~np.isfinite(data)
    return outlier_mask

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}
plot_data = {}  # Store clean data for plotting

for method in hrv_means.keys():
    correlation_results[method] = {}
    plot_data[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect paired data (subject_id, rppg_value, gt_value)
        paired_data = []
        
        for subject_id in hrv_means[method].keys():
            # Check if both rPPG and GT data exist for this subject and metric
            rppg_available = (subject_id in hrv_means[method] and 
                            metric in hrv_means[method][subject_id])
            gt_available = (subject_id in gt_hrv_metrics and 
                          metric in gt_hrv_metrics[subject_id])
            
            if rppg_available and gt_available:
                rppg_value = hrv_means[method][subject_id][metric]
                gt_value = gt_hrv_metrics[subject_id][metric]
                
                # Handle pandas Series if needed
                if isinstance(gt_value, pd.Series):
                    if not gt_value.empty:
                        gt_value = gt_value.iloc[0]
                    else:
                        continue
                
                # Handle numpy arrays - extract scalar value
                if isinstance(rppg_value, (np.ndarray, list)):
                    if len(rppg_value) > 0:
                        rppg_value = rppg_value[0] if hasattr(rppg_value, '__getitem__') else float(rppg_value)
                    else:
                        continue
                
                if isinstance(gt_value, (np.ndarray, list)):
                    if len(gt_value) > 0:
                        gt_value = gt_value[0] if hasattr(gt_value, '__getitem__') else float(gt_value)
                    else:
                        continue
                
                # Convert to float to ensure scalar values
                try:
                    rppg_value = float(rppg_value)
                    gt_value = float(gt_value)
                except (TypeError, ValueError):
                    print(f"Warning: Could not convert values to float for {subject_id} - {metric}")
                    print(f"  rPPG value type: {type(rppg_value)}, value: {rppg_value}")
                    print(f"  GT value type: {type(gt_value)}, value: {gt_value}")
                    continue
                
                # Check for valid values (now they're guaranteed to be scalars)
                if not np.isnan(rppg_value) and not np.isnan(gt_value):
                    paired_data.append((subject_id, rppg_value, gt_value))
        
        if len(paired_data) < 2:
            print(f"Insufficient data for {method} - {metric}: {len(paired_data)} subjects")
            continue
        
        # Extract values for outlier detection
        subject_ids = [item[0] for item in paired_data]
        rppg_values = np.array([item[1] for item in paired_data])
        gt_values = np.array([item[2] for item in paired_data])
        
        # Debug information
        print(f"Debug - {method} - {metric}:")
        print(f"  Total paired subjects: {len(paired_data)}")
        print(f"  rPPG values shape: {rppg_values.shape}")
        print(f"  GT values shape: {gt_values.shape}")
        print(f"  rPPG values: {rppg_values}")
        print(f"  GT values: {gt_values}")
        
        # Identify outliers in both datasets
        rppg_outliers = identify_outliers_iqr(rppg_values)
        gt_outliers = identify_outliers_iqr(gt_values)
        
        # Combine outlier masks (remove if outlier in either dataset)
        combined_outliers = rppg_outliers | gt_outliers
        
        # Keep only non-outlier subjects
        clean_mask = ~combined_outliers
        clean_rppg_values = rppg_values[clean_mask]
        clean_gt_values = gt_values[clean_mask]
        clean_subject_ids = [subject_ids[i] for i in range(len(subject_ids)) if clean_mask[i]]
        
        print(f"{method} - {metric}: Removed {np.sum(combined_outliers)} outliers, "
              f"kept {len(clean_rppg_values)} subjects")
        
        # Store clean data for plotting
        plot_data[method][metric] = {
            'rppg_values': clean_rppg_values,
            'gt_values': clean_gt_values,
            'subject_ids': clean_subject_ids
        }
        
        # Calculate correlation on clean data
        if len(clean_rppg_values) > 1:
            correlation, p_value = stats.pearsonr(clean_rppg_values, clean_gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(clean_rppg_values),
                'removed_subjects': np.sum(combined_outliers),
                'clean_subject_ids': clean_subject_ids
            }
        else:
            print(f"Insufficient clean data for {method} - {metric}")


Debug - POS - MeanNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [678.66783933 785.27215393 717.78451332 827.36535549 775.32430477
 783.04569145 631.61224146 783.81197118 801.17519721 706.8704386
 815.93498128 722.75017835 736.8896677  758.69105122 847.8620969
 752.20720085]
  GT values: [787.85792952 769.59859914 680.66777567 856.90789474 730.73979592
 777.78532609 813.35227273 814.58813364 750.65376569 685.22509579
 857.87978469 756.19725738 870.12195122 678.32623106 721.27016129
 773.30163043]
POS - MeanNN: Removed 0 outliers, kept 16 subjects
Debug - POS - SDNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [216.17573965 267.60892464 210.57672242 250.72361664 259.94657207
 236.19562917 181.16632322 235.54103626 280.05188031 222.02652671
 213.59881724 243.48576594 224.3540079  252.70484803 399.03575939
 230.55693468]
  GT values: [269.85224786 231.77899345 167.98910822 258.49857148 2

In [9]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.1111, p-value = 0.6821
  SDNN: Correlation = -0.2067, p-value = 0.4783
  RMSSD: Correlation = -0.1577, p-value = 0.5903
  pNN50: Correlation = 0.4083, p-value = 0.1660
  HF: Correlation = -0.1148, p-value = 0.6959
  PR: Correlation = 0.2998, p-value = 0.2776


Method: LGI
  MeanNN: Correlation = 0.1105, p-value = 0.6838
  SDNN: Correlation = -0.0828, p-value = 0.7784
  RMSSD: Correlation = 0.1409, p-value = 0.6310
  pNN50: Correlation = 0.3523, p-value = 0.2378
  HF: Correlation = -0.1652, p-value = 0.5898
  PR: Correlation = 0.0913, p-value = 0.7365


Method: OMIT
  MeanNN: Correlation = 0.1037, p-value = 0.7022
  SDNN: Correlation = -0.1184, p-value = 0.6870
  RMSSD: Correlation = 0.1080, p-value = 0.7133
  pNN50: Correlation = 0.2219, p-value = 0.4663
  HF: Correlation = -0.2692, p-value = 0.3738
  PR: Correlation = 0.2651, p-value = 0.3396


Method: GREEN
  MeanNN: Correlation = -0.0272, p-value = 0.9234
  SDNN: Correlation = -0.3559, p-value =

In [10]:
def plot_correlation_scatter(rppg_values, gt_values, method, metric, correlation_info=None):
    """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
    Parameters:
    ----------
    rppg_values (list): List of rPPG values.
    gt_values (list): List of ground truth values.
    method (str): The rPPG method used.
    metric (str): The HRV metric being analyzed.
    correlation_info (dict): Dictionary containing correlation statistics.
    """
    plt.figure(figsize=(10, 8))
    
    # Create scatter plot
    sns.scatterplot(x=rppg_values, y=gt_values, s=80, alpha=0.7)
    
    # Add regression line
    sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', 
                line_kws={"linewidth": 2, "label": "Regression Line"})
    
    # Add identity line (perfect correlation)
    min_val = min(min(rppg_values), min(gt_values))
    max_val = max(max(rppg_values), max(gt_values))
    plt.plot([min_val, max_val], [min_val, max_val], '--', color='gray', 
             alpha=0.8, linewidth=1, label='Perfect Correlation')
    
    # Set labels and title
    plt.xlabel(f"{method} {metric}", fontsize=12)
    plt.ylabel(f"Ground Truth {metric}", fontsize=12)
    
    # Add correlation statistics to title if available
    if correlation_info:
        corr = correlation_info.get('correlation', 0)
        p_val = correlation_info.get('p_value', 1)
        n_subj = correlation_info.get('n_subjects', len(rppg_values))
        title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {n_subj}"
    else:
        # Calculate correlation if not provided
        if len(rppg_values) > 1:
            corr, p_val = stats.pearsonr(rppg_values, gt_values)
            title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {len(rppg_values)}"
        else:
            title = f"{method} - {metric}"
    
    plt.title(title, fontsize=14, fontweight='bold')
    
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# # Plot correlation scatter plots using the cleaned data
# print("\n" + "="*50)
# print("GENERATING CORRELATION PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get correlation info if available
#             corr_info = correlation_results.get(method, {}).get(metric, None)
            
#             # Create the plot
#             plot_correlation_scatter(rppg_vals, gt_vals, method, metric, corr_info)



In [11]:
# Find the top 5 features with the highest absolute correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  pNN50: Correlation = 0.4083, p-value = 0.1660
  PR: Correlation = 0.2998, p-value = 0.2776
  SDNN: Correlation = -0.2067, p-value = 0.4783
  RMSSD: Correlation = -0.1577, p-value = 0.5903
  HF: Correlation = -0.1148, p-value = 0.6959


Method: LGI
  pNN50: Correlation = 0.3523, p-value = 0.2378
  HF: Correlation = -0.1652, p-value = 0.5898
  RMSSD: Correlation = 0.1409, p-value = 0.6310
  MeanNN: Correlation = 0.1105, p-value = 0.6838
  PR: Correlation = 0.0913, p-value = 0.7365


Method: OMIT
  HF: Correlation = -0.2692, p-value = 0.3738
  PR: Correlation = 0.2651, p-value = 0.3396
  pNN50: Correlation = 0.2219, p-value = 0.4663
  SDNN: Correlation = -0.1184, p-value = 0.6870
  RMSSD: Correlation = 0.1080, p-value = 0.7133


Method: GREEN
  SDNN: Correlation = -0.3559, p-value = 0.2327
  pNN50: Correlation = -0.2684, p-value = 0.3535
  PR: Correlation = -0.2555, p-value = 0.3995
  RMSSD: Correlation = -0.0894, p-value = 0.7714
  H

### Check the Bland-Altman plot to see the mean bias and the interval of the limits of agreement (LoA), and make sure the points fall within the LoA
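
As a small worked example of the quantities involved: the bias is the mean of the paired differences, and the limits of agreement are the bias plus or minus 1.96 times the sample standard deviation of those differences. The values below are made up and the helper name `bland_altman_limits` is hypothetical.

```python
import numpy as np

def bland_altman_limits(measured, reference):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diff = np.asarray(measured, dtype=float) - np.asarray(reference, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)                      # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up pulse-rate values (bpm) for four subjects
rppg_pr = [72.0, 75.0, 80.0, 68.0]
gt_pr = [70.0, 76.0, 77.0, 69.0]

bias, lower_loa, upper_loa = bland_altman_limits(rppg_pr, gt_pr)
differences = np.array(rppg_pr) - np.array(gt_pr)
inside = np.mean((differences >= lower_loa) & (differences <= upper_loa))

print(f"bias = {bias:.3f}, LoA = [{lower_loa:.3f}, {upper_loa:.3f}]")
print(f"{100 * inside:.0f}% of the points fall within the LoA")
```

A bias near zero with narrow limits indicates good agreement; a high correlation alone does not guarantee that.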

In [12]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [13]:
def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_diff, std_diff, upper_limit, lower_limit, mean_avg
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    mean_diff = np.mean(differences)
    std_diff = np.std(differences, ddof=1)  # Use sample standard deviation
    mean_avg = np.mean(averages)
    
    # Calculate limits of agreement (1.96 * SD)
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit, mean_avg

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_percentage_diff, median_percentage_diff
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Avoid division by zero
    mask = gt_values != 0
    if np.sum(mask) == 0:
        return np.nan, np.nan
    
    # Calculate percentage differences
    percentage_diff = np.abs((rppg_values[mask] - gt_values[mask]) / gt_values[mask]) * 100
    
    return np.mean(percentage_diff), np.median(percentage_diff)

def plot_bland_altman(rppg_values, gt_values, method, metric, stats_info=None):
    """ Plot Bland-Altman plot
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    method (str): Method name
    metric (str): Metric name
    stats_info (dict): Statistics information
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    # Calculate statistics
    mean_diff, std_diff, upper_limit, lower_limit, _ = calculate_bland_altman_stats(rppg_values, gt_values)
    
    # Create the plot
    plt.figure(figsize=(10, 8))
    
    # Scatter plot
    plt.scatter(averages, differences, alpha=0.7, s=60)
    
    # Mean difference line
    plt.axhline(mean_diff, color='red', linestyle='-', linewidth=2, label=f'Mean Diff: {mean_diff:.3f}')
    
    # Limits of agreement
    plt.axhline(upper_limit, color='red', linestyle='--', linewidth=1.5, label=f'Upper LoA: {upper_limit:.3f}')
    plt.axhline(lower_limit, color='red', linestyle='--', linewidth=1.5, label=f'Lower LoA: {lower_limit:.3f}')
    
    # Zero line
    plt.axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=1)
    
    # Labels and title
    plt.xlabel(f'Average of {method} and Ground Truth {metric}', fontsize=12)
    plt.ylabel(f'{method} - Ground Truth {metric}', fontsize=12)
    
    if stats_info:
        n_subj = stats_info.get('n_subjects', len(rppg_values))
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {n_subj}, SD = {std_diff:.3f}'
    else:
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {len(rppg_values)}, SD = {std_diff:.3f}'
    
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()


In [14]:
# Calculate Bland-Altman statistics using the cleaned data from correlation analysis
print("\n" + "="*60)
print("CALCULATING BLAND-ALTMAN STATISTICS")
print("="*60)

bland_altman_results = []

for method in plot_data.keys():
    for metric in plot_data[method].keys():
        if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
            rppg_vals = plot_data[method][metric]['rppg_values']
            gt_vals = plot_data[method][metric]['gt_values']
            n_subjects = len(rppg_vals)
            
            # Calculate Bland-Altman statistics
            mean_diff, std_diff, upper_limit, lower_limit, mean_avg = calculate_bland_altman_stats(rppg_vals, gt_vals)
            
            # Calculate percentage differences
            mean_perc_diff, median_perc_diff = calculate_percentage_difference(rppg_vals, gt_vals)
            
            # Get correlation info if available
            corr_info = correlation_results.get(method, {}).get(metric, {})
            correlation = corr_info.get('correlation', np.nan)
            p_value = corr_info.get('p_value', np.nan)
            
            bland_altman_results.append({
                'Method': method,
                'Metric': metric,
                'N_Subjects': n_subjects,
                'rPPG_Mean': np.mean(rppg_vals),
                'GT_Mean': np.mean(gt_vals),
                'Mean_Difference': mean_diff,
                'Std_Difference': std_diff,
                'Upper_LoA': upper_limit,
                'Lower_LoA': lower_limit,
                'Mean_Percentage_Diff': mean_perc_diff,
                'Median_Percentage_Diff': median_perc_diff,
                'Correlation': correlation,
                'P_Value': p_value
            })

# Convert to DataFrame
bland_altman_df = pd.DataFrame(bland_altman_results)

# Display results with better formatting
print("\nBland-Altman Analysis Results:")
print("-" * 100)

# Create a formatted display
display_df = bland_altman_df.copy()
for col in ['rPPG_Mean', 'GT_Mean', 'Mean_Difference', 'Std_Difference', 
           'Upper_LoA', 'Lower_LoA', 'Mean_Percentage_Diff', 'Median_Percentage_Diff', 
           'Correlation']:
    if col in display_df.columns:
        display_df[col] = display_df[col].apply(lambda x: f"{x:.3f}" if not np.isnan(x) else "N/A")

# Format p-values
if 'P_Value' in display_df.columns:
    display_df['P_Value'] = display_df['P_Value'].apply(lambda x: f"{x:.4f}" if not np.isnan(x) else "N/A")

print(display_df.to_string(index=False))

# Analysis of methods within acceptable limits
print("\n" + "="*60)
print("ANALYSIS OF METHODS WITHIN ACCEPTABLE LIMITS")
print("="*60)

# Methods within 20% difference
within_20_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 20.0]
print(f"\nMethods within 20% mean percentage difference ({len(within_20_percent)} out of {len(bland_altman_df)}):")
if len(within_20_percent) > 0:
    print(within_20_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 20% difference threshold.")

# Methods within 10% difference (more stringent)
within_10_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 10.0]
print(f"\nMethods within 10% mean percentage difference ({len(within_10_percent)} out of {len(bland_altman_df)}):")
if len(within_10_percent) > 0:
    print(within_10_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 10% difference threshold.")

# Best performing methods (lowest percentage difference)
print(f"\nTop 5 best performing method-metric combinations (lowest mean percentage difference):")
best_methods = bland_altman_df.nsmallest(5, 'Mean_Percentage_Diff')
print(best_methods[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))

# # Generate Bland-Altman plots
# print("\n" + "="*50)
# print("GENERATING BLAND-ALTMAN PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get statistics info
#             stats_info = {'n_subjects': len(rppg_vals)}
            
#             # Create Bland-Altman plot
#             plot_bland_altman(rppg_vals, gt_vals, method, metric, stats_info)

# Summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS")
print("="*60)

print(f"Total method-metric combinations analyzed: {len(bland_altman_df)}")
print(f"Mean percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].mean():.2f}%")
print(f"Median percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].median():.2f}%")
print(f"Best performing combination: {best_methods.iloc[0]['Method']} - {best_methods.iloc[0]['Metric']} ({best_methods.iloc[0]['Mean_Percentage_Diff']:.2f}%)")



CALCULATING BLAND-ALTMAN STATISTICS

Bland-Altman Analysis Results:
----------------------------------------------------------------------------------------------------
Method Metric  N_Subjects rPPG_Mean GT_Mean Mean_Difference Std_Difference Upper_LoA Lower_LoA Mean_Percentage_Diff Median_Percentage_Diff Correlation P_Value
   POS MeanNN          16   757.829 770.280         -12.451         79.376   143.127  -168.028                7.771                  5.171       0.111  0.6821
   POS   SDNN          14   234.941 221.220          13.721         60.899   133.082  -105.641               25.764                 15.968      -0.207  0.4783
   POS  RMSSD          14   315.320 284.553          30.767         81.093   189.709  -128.174               29.050                 21.985      -0.158  0.5903
   POS  pNN50          13    85.340  78.143           7.197          6.562    20.058    -5.665               10.906                  8.849       0.408  0.1660
   POS     HF          14     0.113

### Conclusion: 30-Second Window

Within the 30-second window, the rPPG HRV metrics show only a weak to moderate correlation with the ground truth (GT).

In the Bland-Altman analysis, only one feature, MeanNN (the time between successive heartbeats), shows acceptable agreement with the reference under the 20% mean percentage difference threshold.
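
For reference, the quantities behind this threshold, as computed by `calculate_bland_altman_stats` and `calculate_percentage_difference` earlier in this notebook, can be written as follows. Here $x_i$ is the rPPG value and $y_i$ the GT value for subject $i$, $d_i = x_i - y_i$, $n$ is the number of paired subjects, and $m$ is the number of subjects with $y_i \neq 0$ (the symbols are notation introduced here for illustration, not names used in the code):

$$
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}, \qquad
\text{LoA} = \bar{d} \pm 1.96\, s_d
$$

$$
\text{Mean percentage difference} = \frac{100}{m}\sum_{i:\, y_i \neq 0}\left|\frac{x_i - y_i}{y_i}\right|
$$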

In [15]:
## Store the CHROM rPPG HRV metrics in a CSV file
output_path = "rest_rppg_hrv_metrics_window-30s.csv"

chrom_hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'PR' : [],
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['pNN50'].append(hrv_means['CHROM'][subject_id]['pNN50'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])
    chrom_hrv_metrics['LF'].append(hrv_means['CHROM'][subject_id]['LF'])
    chrom_hrv_metrics['HF'].append(hrv_means['CHROM'][subject_id]['HF'])
    chrom_hrv_metrics['LF_HF'].append(hrv_means['CHROM'][subject_id]['LF_HF'])
    chrom_hrv_metrics['PR'].append(hrv_means['CHROM'][subject_id]['PR'])
    
## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

---

# 1 Minute Plot Correlation

For the 1-minute window, each rPPG signal is split into overlapping 1-minute segments with a **stride** of 30 seconds (consecutive windows start 30 seconds apart), and the HRV metrics are then averaged across the segments.
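
The windowing described above can be sketched as follows. This is a minimal illustration, assuming the 35 Hz video sampling rate used in this notebook; the helper name `window_starts` and the zero-filled example signal are hypothetical and only serve to show how the start indices are derived.

```python
import numpy as np

def window_starts(n_samples, fs, window_s=60, stride_s=30):
    """Return the start index of every full-length window (hypothetical helper)."""
    window = int(window_s * fs)   # window length in samples
    stride = int(stride_s * fs)   # distance between window starts in samples
    if n_samples < window:
        return []                 # signal too short for a single window
    return list(range(0, n_samples - window + 1, stride))

# Illustrative input: a 3-minute signal sampled at 35 Hz (assumed values)
fs = 35
signal = np.zeros(180 * fs)

starts = window_starts(len(signal), fs)
segments = [signal[s:s + 60 * fs] for s in starts]
```

With a 3-minute signal this yields five overlapping 1-minute segments, starting at 0, 30, 60, 90, and 120 seconds. Any trailing samples that do not fill a whole window are dropped.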

The test covers selected scenarios: Task 1 and Task 2 of UBFC-Phys, and Physio Rest 2 and Rest 6.

In [16]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T2"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [17]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [18]:
"""
Steps to compute the short-term (1-minute) HRV metrics for each subject, with averaging:
1. Loop through each subject.
2. For each short rPPG segment (1 minute, 30-second stride), compute the HRV metrics with the neurokit2 package and store them.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
Note: The cells above are preprocessing steps. This cell and the following ones calculate the HRV metrics and perform the correlation analysis.
"""

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
    'PR' : [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Apply a 60-second window with a stride of 30 seconds
        segment_length = 60 * sample_rate_video  # 60 seconds in samples
        stride_length = 30 * sample_rate_video  # 30 seconds in samples
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=sample_rate_video)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            ## Getting the HR and store it in the metrics
            rppg_hrv_metrics[rppg_method][subject_task_id]['PR'].append(signals['PPG_Rate'][0])

            # Getting the HRV Metrics
            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_video)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_video, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T2 for POS
Processing s42_T2 for POS
Processing s43_T2 for POS
Processing s44_T2 for POS
Processing s45_T2 for POS
Processing s46_T2 for POS
Processing s47_T2 for POS
Processing s48_T2 for POS
Processing s49_T2 for POS
Processing s50_T2 for POS
Processing s51_T2 for POS
Processing s52_T2 for POS
Processing s53_T2 for POS
Processing s54_T2 for POS
Processing s55_T2 for POS
Processing s56_T2 for POS
Processing s41_T2 for LGI
Processing s42_T2 for LGI
Processing s43_T2 for LGI
Processing s44_T2 for LGI
Processing s45_T2 for LGI
Processing s46_T2 for LGI
Processing s47_T2 for LGI
Processing s48_T2 for LGI
Processing s49_T2 for LGI
Processing s50_T2 for LGI
Processing s51_T2 for LGI
Processing s52_T2 for LGI
Processing s53_T2 for LGI
Processing s54_T2 for LGI
Processing s55_T2 for LGI
Processing s56_T2 for LGI
Processing s41_T2 for OMIT
Processing s42_T2 for OMIT
Processing s43_T2 for OMIT
Processing s44_T2 for OMIT
Processing s45_T2 for OMIT
Processing s46_T2 for OMIT
Proces

In [19]:
### Average the HRV metrics across all segments for each subject, per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T2': {'MeanNN': 666.5402151159676, 'SDNN': 214.97786272219406, 'RMSSD': 304.54161188631736, 'pNN50': 87.03410927250596, 'LF': 0.02700504430619084, 'HF': 0.08703422437039894, 'LF_HF': 0.38725620762806234, 'SD1': 216.58072655132997, 'SD2': 213.11418072816505, 'PR': 90.21423197217946}, 's42_T2': {'MeanNN': 782.9803166553627, 'SDNN': 266.35176984491153, 'RMSSD': 355.57159352581675, 'pNN50': 89.42110232892904, 'LF': 0.02802354860692249, 'HF': 0.12346623730672637, 'LF_HF': 0.2343390900163252, 'SD1': 253.1039879995964, 'SD2': 278.3259756748048, 'PR': 76.89249677364475}, 's43_T2': {'MeanNN': 724.5558450863402, 'SDNN': 214.74477450497548, 'RMSSD': 292.04811416910763, 'pNN50': 84.37815326745678, 'LF': 0.03965139691212394, 'HF': 0.11078552857237992, 'LF_HF': 0.39365936358984144, 'SD1': 207.79798588152994, 'SD2': 222.2077558299321, 'PR': 82.93289737810007}, 's44_T2': {'MeanNN': 820.8310382193646, 'SDNN': 253.76474994075065, 'RMSSD': 330.7526537451052, 'pNN50': 89.33562806403783, 'LF'

### Getting the GT HRV Metrics

In [20]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=sample_rate_gt)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_gt)

    ## Getting the HR and store it in the metrics
    gt_hrv_metrics[subject_task_id]['PR'] = signals['PPG_Rate'][0].item()
    
    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_gt)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'][0])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'][0])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'][0])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'][0])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_gt, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'][0])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'][0])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'][0])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])

print(gt_hrv_metrics)

Processing s41_T2 for ground truth
Processing s42_T2 for ground truth
Processing s43_T2 for ground truth
Processing s44_T2 for ground truth
Processing s45_T2 for ground truth
Processing s46_T2 for ground truth
Processing s47_T2 for ground truth
Processing s48_T2 for ground truth
Processing s49_T2 for ground truth
Processing s50_T2 for ground truth
Processing s51_T2 for ground truth
Processing s52_T2 for ground truth
Processing s53_T2 for ground truth
Processing s54_T2 for ground truth
Processing s55_T2 for ground truth
Processing s56_T2 for ground truth
{'s41_T2': {'MeanNN': 787.8579295154185, 'SDNN': 269.85224785512537, 'RMSSD': 316.26943979043244, 'pNN50': 78.8546255506608, 'LF': 0.032217513284142976, 'HF': 0.061322128869085946, 'LF_HF': 0.5253815201511155, 'SD1': [], 'SD2': [], 'PR': 76.15586230997728}, 's42_T2': {'MeanNN': 769.598599137931, 'SDNN': 231.77899344565884, 'RMSSD': 330.28720960388716, 'pNN50': 82.32758620689656, 'LF': 0.04979405843996538, 'HF': 0.10549022541304848, 'LF_

### Now that we have the rPPG HRV metrics, compare them with the GT to see the correlation

In [21]:
def identify_outliers_iqr(data):
    """Identify outlier indices using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data to check for outliers.
    
    Returns:
    --------
    numpy array: Boolean mask where True indicates outlier.
    """
    data = np.asarray(data)
    
    if len(data) == 0:
        return np.array([], dtype=bool)
    
    if len(data) == 1:
        return np.array([False])
    
    # Remove any NaN or infinite values before calculating percentiles
    clean_data = data[np.isfinite(data)]
    
    if len(clean_data) < 2:
        return np.array([False] * len(data))
    
    q1 = np.percentile(clean_data, 25)
    q3 = np.percentile(clean_data, 75)
    iqr = q3 - q1
    
    # Handle case where IQR is 0 (all values are the same)
    if iqr == 0:
        return np.array([False] * len(data))
    
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    
    outlier_mask = (data < lower_bound) | (data > upper_bound) | ~np.isfinite(data)
    return outlier_mask

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}
plot_data = {}  # Store clean data for plotting

for method in hrv_means.keys():
    correlation_results[method] = {}
    plot_data[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect paired data (subject_id, rppg_value, gt_value)
        paired_data = []
        
        for subject_id in hrv_means[method].keys():
            # Check if both rPPG and GT data exist for this subject and metric
            rppg_available = (subject_id in hrv_means[method] and 
                            metric in hrv_means[method][subject_id])
            gt_available = (subject_id in gt_hrv_metrics and 
                          metric in gt_hrv_metrics[subject_id])
            
            if rppg_available and gt_available:
                rppg_value = hrv_means[method][subject_id][metric]
                gt_value = gt_hrv_metrics[subject_id][metric]
                
                # Handle pandas Series if needed
                if isinstance(gt_value, pd.Series):
                    if not gt_value.empty:
                        gt_value = gt_value.iloc[0]
                    else:
                        continue
                
                # Handle numpy arrays - extract scalar value
                if isinstance(rppg_value, (np.ndarray, list)):
                    if len(rppg_value) > 0:
                        rppg_value = rppg_value[0] if hasattr(rppg_value, '__getitem__') else float(rppg_value)
                    else:
                        continue
                
                if isinstance(gt_value, (np.ndarray, list)):
                    if len(gt_value) > 0:
                        gt_value = gt_value[0] if hasattr(gt_value, '__getitem__') else float(gt_value)
                    else:
                        continue
                
                # Convert to float to ensure scalar values
                try:
                    rppg_value = float(rppg_value)
                    gt_value = float(gt_value)
                except (TypeError, ValueError):
                    print(f"Warning: Could not convert values to float for {subject_id} - {metric}")
                    print(f"  rPPG value type: {type(rppg_value)}, value: {rppg_value}")
                    print(f"  GT value type: {type(gt_value)}, value: {gt_value}")
                    continue
                
                # Check for valid values (now they're guaranteed to be scalars)
                if not np.isnan(rppg_value) and not np.isnan(gt_value):
                    paired_data.append((subject_id, rppg_value, gt_value))
        
        if len(paired_data) < 2:
            print(f"Insufficient data for {method} - {metric}: {len(paired_data)} subjects")
            continue
        
        # Extract values for outlier detection
        subject_ids = [item[0] for item in paired_data]
        rppg_values = np.array([item[1] for item in paired_data])
        gt_values = np.array([item[2] for item in paired_data])
        
        # Debug information
        print(f"Debug - {method} - {metric}:")
        print(f"  Total paired subjects: {len(paired_data)}")
        print(f"  rPPG values shape: {rppg_values.shape}")
        print(f"  GT values shape: {gt_values.shape}")
        print(f"  rPPG values: {rppg_values}")
        print(f"  GT values: {gt_values}")
        
        # Identify outliers in both datasets
        rppg_outliers = identify_outliers_iqr(rppg_values)
        gt_outliers = identify_outliers_iqr(gt_values)
        
        # Combine outlier masks (remove if outlier in either dataset)
        combined_outliers = rppg_outliers | gt_outliers
        
        # Keep only non-outlier subjects
        clean_mask = ~combined_outliers
        clean_rppg_values = rppg_values[clean_mask]
        clean_gt_values = gt_values[clean_mask]
        clean_subject_ids = [subject_ids[i] for i in range(len(subject_ids)) if clean_mask[i]]
        
        print(f"{method} - {metric}: Removed {np.sum(combined_outliers)} outliers, "
              f"kept {len(clean_rppg_values)} subjects")
        
        # Store clean data for plotting
        plot_data[method][metric] = {
            'rppg_values': clean_rppg_values,
            'gt_values': clean_gt_values,
            'subject_ids': clean_subject_ids
        }
        
        # Calculate correlation on clean data
        if len(clean_rppg_values) > 1:
            correlation, p_value = stats.pearsonr(clean_rppg_values, clean_gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(clean_rppg_values),
                'removed_subjects': np.sum(combined_outliers),
                'clean_subject_ids': clean_subject_ids
            }
        else:
            print(f"Insufficient clean data for {method} - {metric}")


Debug - POS - MeanNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [ 666.54021512  782.98031666  724.55584509  820.83103822  772.79653821
  787.14537966  630.94550237  777.3242819   805.54616657  711.14091995
  821.1341929   722.23481985  735.80179037  750.72685673 1489.20210266
  760.36207318]
  GT values: [787.85792952 769.59859914 680.66777567 856.90789474 730.73979592
 777.78532609 813.35227273 814.58813364 750.65376569 685.22509579
 857.87978469 756.19725738 870.12195122 678.32623106 721.27016129
 773.30163043]
POS - MeanNN: Removed 1 outliers, kept 15 subjects
Debug - POS - SDNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [ 214.97786272  266.35176984  214.7447745   253.76474994  260.90235006
  243.7194804   183.5754531   229.75932854  288.45950226  225.8231768
  216.45017009  239.53907337  219.78089825  250.07558929 2842.61915404
  232.3155379 ]
  GT values: [269.85224786 231.778

In [22]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.2037, p-value = 0.4666
  SDNN: Correlation = -0.1497, p-value = 0.6096
  RMSSD: Correlation = -0.0499, p-value = 0.8654
  pNN50: Correlation = 0.4658, p-value = 0.1086
  LF: Correlation = 0.0209, p-value = 0.9412
  HF: Correlation = 0.2549, p-value = 0.3792
  LF_HF: Correlation = 0.3888, p-value = 0.2117
  PR: Correlation = 0.3650, p-value = 0.1995


Method: LGI
  MeanNN: Correlation = 0.1310, p-value = 0.6416
  SDNN: Correlation = -0.1301, p-value = 0.6719
  RMSSD: Correlation = 0.1390, p-value = 0.6356
  pNN50: Correlation = 0.3937, p-value = 0.1832
  LF: Correlation = 0.0685, p-value = 0.8083
  HF: Correlation = 0.3444, p-value = 0.2087
  LF_HF: Correlation = 0.5334, p-value = 0.0605
  PR: Correlation = -0.0341, p-value = 0.9003


Method: OMIT
  MeanNN: Correlation = 0.1280, p-value = 0.6494
  SDNN: Correlation = -0.1581, p-value = 0.6061
  RMSSD: Correlation = -0.0176, p-value = 0.9546
  pNN50: Correlation = 0.2599, p-value = 0.3912
  LF: Corre

In [23]:
def plot_correlation_scatter(rppg_values, gt_values, method, metric, correlation_info=None):
    """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
    Parameters:
    ----------
    rppg_values (list): List of rPPG values.
    gt_values (list): List of ground truth values.
    method (str): The rPPG method used.
    metric (str): The HRV metric being analyzed.
    correlation_info (dict): Dictionary containing correlation statistics.
    """
    plt.figure(figsize=(10, 8))
    
    # Create scatter plot
    sns.scatterplot(x=rppg_values, y=gt_values, s=80, alpha=0.7)
    
    # Add regression line
    sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', 
                line_kws={"linewidth": 2, "label": "Regression Line"})
    
    # Add identity line (y = x, perfect agreement)
    min_val = min(min(rppg_values), min(gt_values))
    max_val = max(max(rppg_values), max(gt_values))
    plt.plot([min_val, max_val], [min_val, max_val], '--', color='gray', 
             alpha=0.8, linewidth=1, label='Identity (y = x)')
    
    # Set labels and title
    plt.xlabel(f"{method} {metric}", fontsize=12)
    plt.ylabel(f"Ground Truth {metric}", fontsize=12)
    
    # Add correlation statistics to title if available
    if correlation_info:
        corr = correlation_info.get('correlation', 0)
        p_val = correlation_info.get('p_value', 1)
        n_subj = correlation_info.get('n_subjects', len(rppg_values))
        title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {n_subj}"
    else:
        # Calculate correlation if not provided
        if len(rppg_values) > 1:
            corr, p_val = stats.pearsonr(rppg_values, gt_values)
            title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {len(rppg_values)}"
        else:
            title = f"{method} - {metric}"
    
    plt.title(title, fontsize=14, fontweight='bold')
    
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# # Plot correlation scatter plots using the cleaned data
# print("\n" + "="*50)
# print("GENERATING CORRELATION PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get correlation info if available
#             corr_info = correlation_results.get(method, {}).get(metric, None)
            
#             # Create the plot
#             plot_correlation_scatter(rppg_vals, gt_vals, method, metric, corr_info)



In [24]:
# Find the top 5 features with the highest absolute correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  pNN50: Correlation = 0.4658, p-value = 0.1086
  LF_HF: Correlation = 0.3888, p-value = 0.2117
  PR: Correlation = 0.3650, p-value = 0.1995
  HF: Correlation = 0.2549, p-value = 0.3792
  MeanNN: Correlation = 0.2037, p-value = 0.4666


Method: LGI
  LF_HF: Correlation = 0.5334, p-value = 0.0605
  pNN50: Correlation = 0.3937, p-value = 0.1832
  HF: Correlation = 0.3444, p-value = 0.2087
  RMSSD: Correlation = 0.1390, p-value = 0.6356
  MeanNN: Correlation = 0.1310, p-value = 0.6416


Method: OMIT
  LF_HF: Correlation = 0.2769, p-value = 0.3598
  pNN50: Correlation = 0.2599, p-value = 0.3912
  HF: Correlation = 0.2124, p-value = 0.5075
  SDNN: Correlation = -0.1581, p-value = 0.6061
  MeanNN: Correlation = 0.1280, p-value = 0.6494


Method: GREEN
  LF: Correlation = -0.3477, p-value = 0.1870
  pNN50: Correlation = -0.2311, p-value = 0.4474
  LF_HF: Correlation = 0.1810, p-value = 0.5358
  PR: Correlation = 0.1220, p-value = 0.6526
  H

---

### Check the Bland-Altman plot for the mean bias and the limits of agreement (LoA), and verify that the points fall within the LoA
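
As a rough illustration of this check, the sketch below computes the mean bias and the 95% limits of agreement (bias ± 1.96 × the sample SD of the differences) and reports the share of points that fall inside them. The helper name `fraction_within_loa` and the sample values are hypothetical and not part of the notebook; the sketch uses only the standard library so it can be run on its own.

```python
from statistics import mean, stdev


def fraction_within_loa(rppg_values, gt_values):
    """Return (bias, lower LoA, upper LoA, fraction of points inside the LoA).

    Hypothetical helper for illustration; assumes paired, equal-length inputs
    with at least two subjects.
    """
    if len(rppg_values) != len(gt_values):
        raise ValueError("rPPG and GT values must be paired")
    if len(rppg_values) < 2:
        raise ValueError("need at least two pairs")

    # Per-subject difference (rPPG - GT), as on the Bland-Altman y-axis
    differences = [r - g for r, g in zip(rppg_values, gt_values)]
    bias = mean(differences)
    sd = stdev(differences)  # sample SD (n - 1), like np.std(..., ddof=1)

    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = sum(lower <= d <= upper for d in differences)
    return bias, lower, upper, inside / len(differences)


# Made-up MeanNN values in ms, only to exercise the function
rppg = [751.0, 790.0, 730.0, 820.0, 760.0]
gt = [773.0, 770.0, 681.0, 857.0, 731.0]
bias, lower, upper, frac = fraction_within_loa(rppg, gt)
print(f"bias={bias:.2f} ms, LoA=[{lower:.2f}, {upper:.2f}], inside={frac:.0%}")
```

If the differences are roughly normally distributed, about 95% of the points are expected to fall inside the limits, so a much lower share suggests poor agreement or a few extreme subjects.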

In [25]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [26]:
def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_diff, std_diff, upper_limit, lower_limit, mean_avg
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    mean_diff = np.mean(differences)
    std_diff = np.std(differences, ddof=1)  # Use sample standard deviation
    mean_avg = np.mean(averages)
    
    # Calculate limits of agreement (1.96 * SD)
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit, mean_avg

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_percentage_diff, median_percentage_diff
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Avoid division by zero
    mask = gt_values != 0
    if np.sum(mask) == 0:
        return np.nan, np.nan
    
    # Calculate percentage differences
    percentage_diff = np.abs((rppg_values[mask] - gt_values[mask]) / gt_values[mask]) * 100
    
    return np.mean(percentage_diff), np.median(percentage_diff)

def plot_bland_altman(rppg_values, gt_values, method, metric, stats_info=None):
    """ Plot Bland-Altman plot
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    method (str): Method name
    metric (str): Metric name
    stats_info (dict): Statistics information
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    # Calculate statistics
    mean_diff, std_diff, upper_limit, lower_limit, _ = calculate_bland_altman_stats(rppg_values, gt_values)
    
    # Create the plot
    plt.figure(figsize=(10, 8))
    
    # Scatter plot
    plt.scatter(averages, differences, alpha=0.7, s=60)
    
    # Mean difference line
    plt.axhline(mean_diff, color='red', linestyle='-', linewidth=2, label=f'Mean Diff: {mean_diff:.3f}')
    
    # Limits of agreement
    plt.axhline(upper_limit, color='red', linestyle='--', linewidth=1.5, label=f'Upper LoA: {upper_limit:.3f}')
    plt.axhline(lower_limit, color='red', linestyle='--', linewidth=1.5, label=f'Lower LoA: {lower_limit:.3f}')
    
    # Zero line
    plt.axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=1)
    
    # Labels and title
    plt.xlabel(f'Average of {method} and Ground Truth {metric}', fontsize=12)
    plt.ylabel(f'{method} - Ground Truth {metric}', fontsize=12)
    
    if stats_info:
        n_subj = stats_info.get('n_subjects', len(rppg_values))
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {n_subj}, SD = {std_diff:.3f}'
    else:
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {len(rppg_values)}, SD = {std_diff:.3f}'
    
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()


In [27]:
# Calculate Bland-Altman statistics using the cleaned data from correlation analysis
print("\n" + "="*60)
print("CALCULATING BLAND-ALTMAN STATISTICS")
print("="*60)

bland_altman_results = []

for method in plot_data.keys():
    for metric in plot_data[method].keys():
        if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
            rppg_vals = plot_data[method][metric]['rppg_values']
            gt_vals = plot_data[method][metric]['gt_values']
            n_subjects = len(rppg_vals)
            
            # Calculate Bland-Altman statistics
            mean_diff, std_diff, upper_limit, lower_limit, mean_avg = calculate_bland_altman_stats(rppg_vals, gt_vals)
            
            # Calculate percentage differences
            mean_perc_diff, median_perc_diff = calculate_percentage_difference(rppg_vals, gt_vals)
            
            # Get correlation info if available
            corr_info = correlation_results.get(method, {}).get(metric, {})
            correlation = corr_info.get('correlation', np.nan)
            p_value = corr_info.get('p_value', np.nan)
            
            bland_altman_results.append({
                'Method': method,
                'Metric': metric,
                'N_Subjects': n_subjects,
                'rPPG_Mean': np.mean(rppg_vals),
                'GT_Mean': np.mean(gt_vals),
                'Mean_Difference': mean_diff,
                'Std_Difference': std_diff,
                'Upper_LoA': upper_limit,
                'Lower_LoA': lower_limit,
                'Mean_Percentage_Diff': mean_perc_diff,
                'Median_Percentage_Diff': median_perc_diff,
                'Correlation': correlation,
                'P_Value': p_value
            })

# Convert to DataFrame
bland_altman_df = pd.DataFrame(bland_altman_results)

# Display results with better formatting
print("\nBland-Altman Analysis Results:")
print("-" * 100)

# Create a formatted display
display_df = bland_altman_df.copy()
for col in ['rPPG_Mean', 'GT_Mean', 'Mean_Difference', 'Std_Difference', 
           'Upper_LoA', 'Lower_LoA', 'Mean_Percentage_Diff', 'Median_Percentage_Diff', 
           'Correlation']:
    if col in display_df.columns:
        display_df[col] = display_df[col].apply(lambda x: f"{x:.3f}" if not np.isnan(x) else "N/A")

# Format p-values
if 'P_Value' in display_df.columns:
    display_df['P_Value'] = display_df['P_Value'].apply(lambda x: f"{x:.4f}" if not np.isnan(x) else "N/A")

print(display_df.to_string(index=False))

# Analysis of methods within acceptable limits
print("\n" + "="*60)
print("ANALYSIS OF METHODS WITHIN ACCEPTABLE LIMITS")
print("="*60)

# Methods within 20% difference
within_20_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 20.0]
print(f"\nMethods within 20% mean percentage difference ({len(within_20_percent)} out of {len(bland_altman_df)}):")
if len(within_20_percent) > 0:
    print(within_20_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 20% difference threshold.")

# Methods within 10% difference (more stringent)
within_10_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 10.0]
print(f"\nMethods within 10% mean percentage difference ({len(within_10_percent)} out of {len(bland_altman_df)}):")
if len(within_10_percent) > 0:
    print(within_10_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 10% difference threshold.")

# Best performing methods (lowest percentage difference)
print(f"\nTop 5 best performing method-metric combinations (lowest mean percentage difference):")
best_methods = bland_altman_df.nsmallest(5, 'Mean_Percentage_Diff')
print(best_methods[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))

# # Generate Bland-Altman plots
# print("\n" + "="*50)
# print("GENERATING BLAND-ALTMAN PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get statistics info
#             stats_info = {'n_subjects': len(rppg_vals)}
            
#             # Create Bland-Altman plot
#             plot_bland_altman(rppg_vals, gt_vals, method, metric, stats_info)

# Summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS")
print("="*60)

print(f"Total method-metric combinations analyzed: {len(bland_altman_df)}")
print(f"Mean percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].mean():.2f}%")
print(f"Median percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].median():.2f}%")
print(f"Best performing combination: {best_methods.iloc[0]['Method']} - {best_methods.iloc[0]['Metric']} ({best_methods.iloc[0]['Mean_Percentage_Diff']:.2f}%)")



CALCULATING BLAND-ALTMAN STATISTICS

Bland-Altman Analysis Results:
----------------------------------------------------------------------------------------------------
Method Metric  N_Subjects rPPG_Mean GT_Mean Mean_Difference Std_Difference Upper_LoA Lower_LoA Mean_Percentage_Diff Median_Percentage_Diff Correlation P_Value
   POS MeanNN          15   751.338 773.547         -22.209         74.106   123.038  -167.456                7.294                  4.575       0.204  0.4666
   POS   SDNN          14   236.463 221.220          15.243         59.929   132.703  -102.218               25.825                 16.724      -0.150  0.6096
   POS  RMSSD          14   317.357 284.553          32.804         77.900   185.488  -119.879               28.521                 20.980      -0.050  0.8654
   POS  pNN50          13    86.733  78.143           8.589          6.392    21.117    -3.939               12.349                 10.373       0.466  0.1086
   POS     LF          15     0.038

### Conclusion: 1 Minute Window

Stuff

In [28]:
## Store the CHROM rPPG HRV metrics in a CSV file
output_path = "rest_rppg_hrv_metrics_window-60s.csv"

chrom_hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'PR' : [],
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['pNN50'].append(hrv_means['CHROM'][subject_id]['pNN50'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])
    chrom_hrv_metrics['LF'].append(hrv_means['CHROM'][subject_id]['LF'])
    chrom_hrv_metrics['HF'].append(hrv_means['CHROM'][subject_id]['HF'])
    chrom_hrv_metrics['LF_HF'].append(hrv_means['CHROM'][subject_id]['LF_HF'])
    chrom_hrv_metrics['PR'].append(hrv_means['CHROM'][subject_id]['PR'])
    
## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

---

# 2 Minute Plot Correlation

For the 2-minute window, each rPPG signal is split into overlapping 120-second segments with a **stride** of 60 seconds (the start of each window is 60 seconds after the start of the previous one), and the HRV metrics are then averaged across the segments.

The test covers the following scenarios: Task 1 and Task 2 of UBFC, and Physio Rest 2 and Rest 6.
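
The windowing described above can be sketched as follows. `window_bounds` is a hypothetical helper, and the 180-second signal length and 35 Hz frame rate are assumed example values chosen only for illustration.

```python
def window_bounds(n_samples, fs, window_s, stride_s):
    """Return (start, end) sample indices of every full window.

    Hypothetical helper: windows that would run past the end of the
    signal are dropped rather than padded.
    """
    window = int(window_s * fs)
    stride = int(stride_s * fs)
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, stride)]


fs = 35  # Hz, assumed video frame rate
bounds = window_bounds(n_samples=180 * fs, fs=fs, window_s=120, stride_s=60)
for start, end in bounds:
    print(f"{start / fs:.0f}s to {end / fs:.0f}s")
```

With these settings a 180-second signal yields two overlapping windows, and a signal shorter than one window yields an empty list.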

In [29]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T2"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [30]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [31]:
"""
Steps to compute the 120-second windowed HRV metrics for each subject and average them:
1. Loop through each subject.
2. For each rPPG segment (120 seconds), compute the HRV metrics with the neurokit2 package and store them.
3. Average the HRV metrics across all segments for each subject.
4. Compare the averaged HRV metrics of the rPPG methods with the ground truth HRV metrics via correlation.
Note: The cells above only preprocess the signals. This cell computes the HRV metrics; the averaging and correlation analysis follow.
"""

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
    'PR' : [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute the HRV metrics for each segment
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Apply a 120-second window with a stride of 45 seconds
        segment_length = 120 * sample_rate_video
        stride_length = 45 * sample_rate_video
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=sample_rate_video)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            ## Getting the HR and store it in the metrics
            rppg_hrv_metrics[rppg_method][subject_task_id]['PR'].append(signals['PPG_Rate'][0])

            # Getting the HRV Metrics
            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_video)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_video, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T2 for POS
Processing s42_T2 for POS
Processing s43_T2 for POS
Processing s44_T2 for POS
Processing s45_T2 for POS
Processing s46_T2 for POS
Processing s47_T2 for POS
Processing s48_T2 for POS
Processing s49_T2 for POS
Processing s50_T2 for POS
Processing s51_T2 for POS
Processing s52_T2 for POS
Processing s53_T2 for POS
Processing s54_T2 for POS
Processing s55_T2 for POS
Processing s56_T2 for POS
Processing s41_T2 for LGI
Processing s42_T2 for LGI
Processing s43_T2 for LGI
Processing s44_T2 for LGI
Processing s45_T2 for LGI
Processing s46_T2 for LGI
Processing s47_T2 for LGI
Processing s48_T2 for LGI
Processing s49_T2 for LGI
Processing s50_T2 for LGI
Processing s51_T2 for LGI
Processing s52_T2 for LGI
Processing s53_T2 for LGI
Processing s54_T2 for LGI
Processing s55_T2 for LGI
Processing s56_T2 for LGI
Processing s41_T2 for OMIT
Processing s42_T2 for OMIT
Processing s43_T2 for OMIT
Processing s44_T2 for OMIT
Processing s45_T2 for OMIT
Processing s46_T2 for OMIT
Proces

In [32]:
### Average the HRV metrics across segments for each subject and method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T2': {'MeanNN': 667.3908948194662, 'SDNN': 219.564381742051, 'RMSSD': 305.06539569508425, 'pNN50': 87.13186813186813, 'LF': 0.01801827834838813, 'HF': 0.05619744653155116, 'LF_HF': 0.4721303003617774, 'SD1': 216.3220050898831, 'SD2': 222.42536200647424, 'PR': 89.93361120244472}, 's42_T2': {'MeanNN': 787.660030455135, 'SDNN': 271.807862628764, 'RMSSD': 349.3901498171287, 'pNN50': 88.39540290389087, 'LF': 0.04510796298982983, 'HF': 0.09915229982985518, 'LF_HF': 0.4597705880683818, 'SD1': 247.85799325161997, 'SD2': 290.6258579189631, 'PR': 76.19037227774626}, 's43_T2': {'MeanNN': 731.869921338224, 'SDNN': 217.42572492173286, 'RMSSD': 295.3917228779323, 'pNN50': 84.2990229493297, 'LF': 0.053093677518173635, 'HF': 0.08703485506834974, 'LF_HF': 0.6243221664564603, 'SD1': 209.52130850798284, 'SD2': 225.0116009870079, 'PR': 81.98183329912719}, 's44_T2': {'MeanNN': 822.2591680350301, 'SDNN': 252.82522286143382, 'RMSSD': 328.0823242356288, 'pNN50': 89.95929118773947, 'LF': 0.059899

### Getting the GT HRV Metrics

In [33]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=sample_rate_gt)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_gt)

    ## Getting the HR and store it in the metrics
    gt_hrv_metrics[subject_task_id]['PR'] = signals['PPG_Rate'][0].item()
    
    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_gt)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'][0])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'][0])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'][0])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'][0])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_gt, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'][0])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'][0])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'][0])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])

print(gt_hrv_metrics)

Processing s41_T2 for ground truth
Processing s42_T2 for ground truth
Processing s43_T2 for ground truth
Processing s44_T2 for ground truth
Processing s45_T2 for ground truth
Processing s46_T2 for ground truth
Processing s47_T2 for ground truth
Processing s48_T2 for ground truth
Processing s49_T2 for ground truth
Processing s50_T2 for ground truth
Processing s51_T2 for ground truth
Processing s52_T2 for ground truth
Processing s53_T2 for ground truth
Processing s54_T2 for ground truth
Processing s55_T2 for ground truth
Processing s56_T2 for ground truth
{'s41_T2': {'MeanNN': 787.8579295154185, 'SDNN': 269.85224785512537, 'RMSSD': 316.26943979043244, 'pNN50': 78.8546255506608, 'LF': 0.032217513284142976, 'HF': 0.061322128869085946, 'LF_HF': 0.5253815201511155, 'SD1': [], 'SD2': [], 'PR': 76.15586230997728}, 's42_T2': {'MeanNN': 769.598599137931, 'SDNN': 231.77899344565884, 'RMSSD': 330.28720960388716, 'pNN50': 82.32758620689656, 'LF': 0.04979405843996538, 'HF': 0.10549022541304848, 'LF_HF': 0.47202532978762757, 'SD1': [], 'SD2': [], 'PR': 77.9627198739826

### Now that we have the rPPG HRV metrics, compare them with the GT to see the correlation

In [34]:
def identify_outliers_iqr(data):
    """Identify outlier indices using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data to check for outliers.
    
    Returns:
    --------
    numpy array: Boolean mask where True indicates outlier.
    """
    data = np.asarray(data)
    
    if len(data) == 0:
        return np.array([], dtype=bool)
    
    if len(data) == 1:
        return np.array([False])
    
    # Remove any NaN or infinite values before calculating percentiles
    clean_data = data[np.isfinite(data)]
    
    if len(clean_data) < 2:
        return np.array([False] * len(data))
    
    q1 = np.percentile(clean_data, 25)
    q3 = np.percentile(clean_data, 75)
    iqr = q3 - q1
    
    # Handle case where IQR is 0 (all values are the same)
    if iqr == 0:
        return np.array([False] * len(data))
    
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    
    outlier_mask = (data < lower_bound) | (data > upper_bound) | ~np.isfinite(data)
    return outlier_mask

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}
plot_data = {}  # Store clean data for plotting

for method in hrv_means.keys():
    correlation_results[method] = {}
    plot_data[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect paired data (subject_id, rppg_value, gt_value)
        paired_data = []
        
        for subject_id in hrv_means[method].keys():
            # Check if both rPPG and GT data exist for this subject and metric
            rppg_available = (subject_id in hrv_means[method] and 
                            metric in hrv_means[method][subject_id])
            gt_available = (subject_id in gt_hrv_metrics and 
                          metric in gt_hrv_metrics[subject_id])
            
            if rppg_available and gt_available:
                rppg_value = hrv_means[method][subject_id][metric]
                gt_value = gt_hrv_metrics[subject_id][metric]
                
                # Handle pandas Series if needed
                if isinstance(gt_value, pd.Series):
                    if not gt_value.empty:
                        gt_value = gt_value.iloc[0]
                    else:
                        continue
                
                # Handle numpy arrays - extract scalar value
                if isinstance(rppg_value, (np.ndarray, list)):
                    if len(rppg_value) > 0:
                        rppg_value = rppg_value[0] if hasattr(rppg_value, '__getitem__') else float(rppg_value)
                    else:
                        continue
                
                if isinstance(gt_value, (np.ndarray, list)):
                    if len(gt_value) > 0:
                        gt_value = gt_value[0] if hasattr(gt_value, '__getitem__') else float(gt_value)
                    else:
                        continue
                
                # Convert to float to ensure scalar values
                try:
                    rppg_value = float(rppg_value)
                    gt_value = float(gt_value)
                except (TypeError, ValueError):
                    print(f"Warning: Could not convert values to float for {subject_id} - {metric}")
                    print(f"  rPPG value type: {type(rppg_value)}, value: {rppg_value}")
                    print(f"  GT value type: {type(gt_value)}, value: {gt_value}")
                    continue
                
                # Check for valid values (now they're guaranteed to be scalars)
                if not np.isnan(rppg_value) and not np.isnan(gt_value):
                    paired_data.append((subject_id, rppg_value, gt_value))
        
        if len(paired_data) < 2:
            print(f"Insufficient data for {method} - {metric}: {len(paired_data)} subjects")
            continue
        
        # Extract values for outlier detection
        subject_ids = [item[0] for item in paired_data]
        rppg_values = np.array([item[1] for item in paired_data])
        gt_values = np.array([item[2] for item in paired_data])
        
        # Debug information
        print(f"Debug - {method} - {metric}:")
        print(f"  Total paired subjects: {len(paired_data)}")
        print(f"  rPPG values shape: {rppg_values.shape}")
        print(f"  GT values shape: {gt_values.shape}")
        print(f"  rPPG values: {rppg_values}")
        print(f"  GT values: {gt_values}")
        
        # Identify outliers in both datasets
        rppg_outliers = identify_outliers_iqr(rppg_values)
        gt_outliers = identify_outliers_iqr(gt_values)
        
        # Combine outlier masks (remove if outlier in either dataset)
        combined_outliers = rppg_outliers | gt_outliers
        
        # Keep only non-outlier subjects
        clean_mask = ~combined_outliers
        clean_rppg_values = rppg_values[clean_mask]
        clean_gt_values = gt_values[clean_mask]
        clean_subject_ids = [subject_ids[i] for i in range(len(subject_ids)) if clean_mask[i]]
        
        print(f"{method} - {metric}: Removed {np.sum(combined_outliers)} outliers, "
              f"kept {len(clean_rppg_values)} subjects")
        
        # Store clean data for plotting
        plot_data[method][metric] = {
            'rppg_values': clean_rppg_values,
            'gt_values': clean_gt_values,
            'subject_ids': clean_subject_ids
        }
        
        # Calculate correlation on clean data
        if len(clean_rppg_values) > 1:
            correlation, p_value = stats.pearsonr(clean_rppg_values, clean_gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(clean_rppg_values),
                'removed_subjects': np.sum(combined_outliers),
                'clean_subject_ids': clean_subject_ids
            }
        else:
            print(f"Insufficient clean data for {method} - {metric}")


Debug - POS - MeanNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [ 667.39089482  787.66003046  731.86992134  822.25916804  756.32669623
  798.4645896   628.35573808  780.62312644  808.95303327  705.39530645
  807.1203787   722.83190606  730.00140885  747.62820513 3723.47826087
  771.15689286]
  GT values: [787.85792952 769.59859914 680.66777567 856.90789474 730.73979592
 777.78532609 813.35227273 814.58813364 750.65376569 685.22509579
 857.87978469 756.19725738 870.12195122 678.32623106 721.27016129
 773.30163043]
POS - MeanNN: Removed 1 outliers, kept 15 subjects
Debug - POS - SDNN:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [ 219.56438174  271.80786263  217.42572492  252.82522286  258.084809
  247.17135041  190.99814816  244.67215359  290.2180754   222.86313881
  210.7841997   241.20330407  225.47044231  250.22517596 8529.5434031
  237.28662867]
  GT values: [269.85224786 231.77899

GREEN - LF: Removed 0 outliers, kept 16 subjects
Debug - GREEN - HF:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [0.08517257 0.10572894 0.09218173 0.09132453 0.06664645 0.08672379
 0.06388269 0.12111624 0.08296899 0.09706275 0.01610856 0.04331527
 0.08850715 0.06257779 0.00761048 0.11254099]
  GT values: [0.06132213 0.10549023 0.02154463 0.10192714 0.07449392 0.08742311
 0.0852411  0.00401101 0.10444742 0.08899584 0.13321674 0.0637627
 0.02185027 0.09141328 0.09364793 0.07711066]
GREEN - HF: Removed 3 outliers, kept 13 subjects
Debug - GREEN - LF_HF:
  Total paired subjects: 16
  rPPG values shape: (16,)
  GT values shape: (16,)
  rPPG values: [ 0.58145464  0.369551    0.59608691  0.42672137  0.86752082  0.60771192
  0.73471013  0.29844121  0.75788383  0.68835544  5.50114532  0.96581532
  0.51574457  0.52186289 15.07180107  0.42310972]
  GT values: [0.52538152 0.47202533 0.72137967 0.50275067 0.28983182 0.3367417
 0.61765338 0.78478662
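
The debug output above shows why outlier removal matters here: one subject has a POS MeanNN of about 3723 ms, while every other subject lies between roughly 628 and 822 ms. `identify_outliers_iqr` is defined outside this section, so the block below is only a sketch of what such a helper might look like, assuming the conventional 1.5 × IQR fences. The name `iqr_outlier_mask` and the factor `k` are hypothetical and not taken from the notebook.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences.

    Hypothetical stand-in for identify_outliers_iqr; k=1.5 is an assumed default.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# POS MeanNN values (ms) copied from the debug output, rounded to two decimals
pos_meannn = np.array([
    667.39, 787.66, 731.87, 822.26, 756.33, 798.46, 628.36, 780.62,
    808.95, 705.40, 807.12, 722.83, 730.00, 747.63, 3723.48, 771.16,
])

mask = iqr_outlier_mask(pos_meannn)
print("Flagged indices:", np.flatnonzero(mask))
print("Subjects kept:", pos_meannn[~mask].size)
```

Under these assumptions only the 3723 ms subject is flagged, which is consistent with the "Removed 1 outliers, kept 15 subjects" line for POS MeanNN. The notebook's own helper may use a different rule.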

In [35]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.1845, p-value = 0.5104
  SDNN: Correlation = -0.1196, p-value = 0.6839
  RMSSD: Correlation = -0.0208, p-value = 0.9438
  pNN50: Correlation = 0.5010, p-value = 0.0811
  LF: Correlation = -0.1673, p-value = 0.5357
  HF: Correlation = 0.3085, p-value = 0.2832
  LF_HF: Correlation = 0.1399, p-value = 0.6485
  PR: Correlation = 0.3552, p-value = 0.2127


Method: LGI
  MeanNN: Correlation = 0.1581, p-value = 0.5735
  SDNN: Correlation = 0.0061, p-value = 0.9835
  RMSSD: Correlation = 0.2103, p-value = 0.4706
  pNN50: Correlation = 0.4269, p-value = 0.1457
  LF: Correlation = 0.0905, p-value = 0.7485
  HF: Correlation = 0.4024, p-value = 0.1537
  LF_HF: Correlation = 0.4648, p-value = 0.0940
  PR: Correlation = 0.1297, p-value = 0.6450


Method: OMIT
  MeanNN: Correlation = 0.1635, p-value = 0.5603
  SDNN: Correlation = -0.0247, p-value = 0.9331
  RMSSD: Correlation = 0.1586, p-value = 0.5882
  pNN50: Correlation = -0.0973, p-value = 0.7406
  LF: Correl

In [36]:
def plot_correlation_scatter(rppg_values, gt_values, method, metric, correlation_info=None):
    """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
    Parameters:
    ----------
    rppg_values (list): List of rPPG values.
    gt_values (list): List of ground truth values.
    method (str): The rPPG method used.
    metric (str): The HRV metric being analyzed.
    correlation_info (dict): Dictionary containing correlation statistics.
    """
    plt.figure(figsize=(10, 8))
    
    # Create scatter plot
    sns.scatterplot(x=rppg_values, y=gt_values, s=80, alpha=0.7)
    
    # Add regression line
    sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', 
                line_kws={"linewidth": 2, "label": "Regression Line"})
    
    # Add identity line (y = x): points on this line indicate perfect agreement
    min_val = min(min(rppg_values), min(gt_values))
    max_val = max(max(rppg_values), max(gt_values))
    plt.plot([min_val, max_val], [min_val, max_val], '--', color='gray', 
             alpha=0.8, linewidth=1, label='Identity Line (y = x)')
    
    # Set labels and title
    plt.xlabel(f"{method} {metric}", fontsize=12)
    plt.ylabel(f"Ground Truth {metric}", fontsize=12)
    
    # Add correlation statistics to title if available
    if correlation_info:
        corr = correlation_info.get('correlation', 0)
        p_val = correlation_info.get('p_value', 1)
        n_subj = correlation_info.get('n_subjects', len(rppg_values))
        title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {n_subj}"
    else:
        # Calculate correlation if not provided
        if len(rppg_values) > 1:
            corr, p_val = stats.pearsonr(rppg_values, gt_values)
            title = f"{method} - {metric}\nr = {corr:.3f}, p = {p_val:.3f}, n = {len(rppg_values)}"
        else:
            title = f"{method} - {metric}"
    
    plt.title(title, fontsize=14, fontweight='bold')
    
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

# # Plot correlation scatter plots using the cleaned data
# print("\n" + "="*50)
# print("GENERATING CORRELATION PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get correlation info if available
#             corr_info = correlation_results.get(method, {}).get(metric, None)
            
#             # Create the plot
#             plot_correlation_scatter(rppg_vals, gt_vals, method, metric, corr_info)



In [37]:
# Rank the metrics by absolute correlation and keep the top 5 for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  pNN50: Correlation = 0.5010, p-value = 0.0811
  PR: Correlation = 0.3552, p-value = 0.2127
  HF: Correlation = 0.3085, p-value = 0.2832
  MeanNN: Correlation = 0.1845, p-value = 0.5104
  LF: Correlation = -0.1673, p-value = 0.5357


Method: LGI
  LF_HF: Correlation = 0.4648, p-value = 0.0940
  pNN50: Correlation = 0.4269, p-value = 0.1457
  HF: Correlation = 0.4024, p-value = 0.1537
  RMSSD: Correlation = 0.2103, p-value = 0.4706
  MeanNN: Correlation = 0.1581, p-value = 0.5735


Method: OMIT
  HF: Correlation = 0.5018, p-value = 0.0806
  LF_HF: Correlation = 0.3518, p-value = 0.2174
  MeanNN: Correlation = 0.1635, p-value = 0.5603
  RMSSD: Correlation = 0.1586, p-value = 0.5882
  PR: Correlation = 0.1316, p-value = 0.6402


Method: GREEN
  LF_HF: Correlation = -0.4649, p-value = 0.1094
  SDNN: Correlation = -0.3194, p-value = 0.2875
  RMSSD: Correlation = -0.1492, p-value = 0.6267
  LF: Correlation = -0.1313, p-value = 0.6278
  pN
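
With only 13 to 16 subjects per metric, none of the correlations above reaches p < 0.05, so the ranking should be read with caution. One way to make the uncertainty explicit is an approximate confidence interval for Pearson's r based on the Fisher z-transformation. The sketch below is not part of the original analysis; the function name `pearson_confidence_interval` is hypothetical, and the 95% level (`z_crit=1.96`) is an assumption.

```python
import math

def pearson_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)              # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# POS pNN50 from the results: r = 0.5010 with 13 subjects kept after outlier removal
low, high = pearson_confidence_interval(0.5010, 13)
print(f"Approximate 95% CI for r: [{low:.3f}, {high:.3f}]")
```

For this example the interval spans zero, which agrees with the reported p-value of 0.0811.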

### Check the Bland-Altman plot for the mean bias and the interval of the limits of agreement (LoA), and verify that the points fall within the LoA
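
The check described in the heading can also be done numerically, not only by eye on the plot. The sketch below computes the bias, the limits of agreement (bias ± 1.96 × SD of the differences), and the fraction of subjects inside them. The function name `fraction_within_loa` and the demo values are hypothetical and are not study data. Because the limits are derived from the same differences, roughly 95% of points fall inside them whenever the differences are approximately normal, so the width of the LoA is usually more informative than the fraction alone.

```python
import numpy as np

def fraction_within_loa(rppg_values, gt_values):
    """Return bias, lower LoA, upper LoA, and the fraction of points inside the LoA."""
    rppg = np.asarray(rppg_values, dtype=float)
    gt = np.asarray(gt_values, dtype=float)
    diff = rppg - gt
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = (diff >= lower) & (diff <= upper)
    return bias, lower, upper, inside.mean()

# Illustrative values only: 19 subjects agree exactly, one differs by 10 units
gt_demo = np.full(20, 100.0)
rppg_demo = gt_demo.copy()
rppg_demo[-1] += 10.0

bias, lower, upper, frac = fraction_within_loa(rppg_demo, gt_demo)
print(f"bias={bias:.2f}, LoA=[{lower:.2f}, {upper:.2f}], within LoA={frac:.0%}")
```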

In [38]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [39]:
def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_diff, std_diff, upper_limit, lower_limit, mean_avg
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    mean_diff = np.mean(differences)
    std_diff = np.std(differences, ddof=1)  # Use sample standard deviation
    mean_avg = np.mean(averages)
    
    # Calculate limits of agreement (1.96 * SD)
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit, mean_avg

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values 
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    
    Returns:
    --------
    tuple: mean_percentage_diff, median_percentage_diff
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Avoid division by zero
    mask = gt_values != 0
    if np.sum(mask) == 0:
        return np.nan, np.nan
    
    # Calculate percentage differences
    percentage_diff = np.abs((rppg_values[mask] - gt_values[mask]) / gt_values[mask]) * 100
    
    return np.mean(percentage_diff), np.median(percentage_diff)

def plot_bland_altman(rppg_values, gt_values, method, metric, stats_info=None):
    """ Plot Bland-Altman plot
    
    Parameters:
    ----------
    rppg_values (array): rPPG measurement values
    gt_values (array): Ground truth values
    method (str): Method name
    metric (str): Metric name
    stats_info (dict): Statistics information
    """
    rppg_values = np.array(rppg_values)
    gt_values = np.array(gt_values)
    
    # Calculate differences and averages
    differences = rppg_values - gt_values
    averages = (rppg_values + gt_values) / 2
    
    # Calculate statistics
    mean_diff, std_diff, upper_limit, lower_limit, _ = calculate_bland_altman_stats(rppg_values, gt_values)
    
    # Create the plot
    plt.figure(figsize=(10, 8))
    
    # Scatter plot
    plt.scatter(averages, differences, alpha=0.7, s=60)
    
    # Mean difference line
    plt.axhline(mean_diff, color='red', linestyle='-', linewidth=2, label=f'Mean Diff: {mean_diff:.3f}')
    
    # Limits of agreement
    plt.axhline(upper_limit, color='red', linestyle='--', linewidth=1.5, label=f'Upper LoA: {upper_limit:.3f}')
    plt.axhline(lower_limit, color='red', linestyle='--', linewidth=1.5, label=f'Lower LoA: {lower_limit:.3f}')
    
    # Zero line
    plt.axhline(0, color='black', linestyle='-', alpha=0.3, linewidth=1)
    
    # Labels and title
    plt.xlabel(f'Average of {method} and Ground Truth {metric}', fontsize=12)
    plt.ylabel(f'{method} - Ground Truth {metric}', fontsize=12)
    
    if stats_info:
        n_subj = stats_info.get('n_subjects', len(rppg_values))
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {n_subj}, SD = {std_diff:.3f}'
    else:
        title = f'Bland-Altman Plot: {method} - {metric}\nn = {len(rppg_values)}, SD = {std_diff:.3f}'
    
    plt.title(title, fontsize=14, fontweight='bold')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()


In [40]:
# Calculate Bland-Altman statistics using the cleaned data from correlation analysis
print("\n" + "="*60)
print("CALCULATING BLAND-ALTMAN STATISTICS")
print("="*60)

bland_altman_results = []

for method in plot_data.keys():
    for metric in plot_data[method].keys():
        if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
            rppg_vals = plot_data[method][metric]['rppg_values']
            gt_vals = plot_data[method][metric]['gt_values']
            n_subjects = len(rppg_vals)
            
            # Calculate Bland-Altman statistics
            mean_diff, std_diff, upper_limit, lower_limit, mean_avg = calculate_bland_altman_stats(rppg_vals, gt_vals)
            
            # Calculate percentage differences
            mean_perc_diff, median_perc_diff = calculate_percentage_difference(rppg_vals, gt_vals)
            
            # Get correlation info if available
            corr_info = correlation_results.get(method, {}).get(metric, {})
            correlation = corr_info.get('correlation', np.nan)
            p_value = corr_info.get('p_value', np.nan)
            
            bland_altman_results.append({
                'Method': method,
                'Metric': metric,
                'N_Subjects': n_subjects,
                'rPPG_Mean': np.mean(rppg_vals),
                'GT_Mean': np.mean(gt_vals),
                'Mean_Difference': mean_diff,
                'Std_Difference': std_diff,
                'Upper_LoA': upper_limit,
                'Lower_LoA': lower_limit,
                'Mean_Percentage_Diff': mean_perc_diff,
                'Median_Percentage_Diff': median_perc_diff,
                'Correlation': correlation,
                'P_Value': p_value
            })

# Convert to DataFrame
bland_altman_df = pd.DataFrame(bland_altman_results)

# Display results with better formatting
print("\nBland-Altman Analysis Results:")
print("-" * 100)

# Create a formatted display
display_df = bland_altman_df.copy()
for col in ['rPPG_Mean', 'GT_Mean', 'Mean_Difference', 'Std_Difference', 
           'Upper_LoA', 'Lower_LoA', 'Mean_Percentage_Diff', 'Median_Percentage_Diff', 
           'Correlation']:
    if col in display_df.columns:
        display_df[col] = display_df[col].apply(lambda x: f"{x:.3f}" if not np.isnan(x) else "N/A")

# Format p-values
if 'P_Value' in display_df.columns:
    display_df['P_Value'] = display_df['P_Value'].apply(lambda x: f"{x:.4f}" if not np.isnan(x) else "N/A")

print(display_df.to_string(index=False))

# Analysis of methods within acceptable limits
print("\n" + "="*60)
print("ANALYSIS OF METHODS WITHIN ACCEPTABLE LIMITS")
print("="*60)

# Methods within 20% difference
within_20_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 20.0]
print(f"\nMethods within 20% mean percentage difference ({len(within_20_percent)} out of {len(bland_altman_df)}):")
if len(within_20_percent) > 0:
    print(within_20_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 20% difference threshold.")

# Methods within 10% difference (more stringent)
within_10_percent = bland_altman_df[bland_altman_df['Mean_Percentage_Diff'] <= 10.0]
print(f"\nMethods within 10% mean percentage difference ({len(within_10_percent)} out of {len(bland_altman_df)}):")
if len(within_10_percent) > 0:
    print(within_10_percent[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))
else:
    print("No methods found within 10% difference threshold.")

# Best performing methods (lowest percentage difference)
print(f"\nTop 5 best performing method-metric combinations (lowest mean percentage difference):")
best_methods = bland_altman_df.nsmallest(5, 'Mean_Percentage_Diff')
print(best_methods[['Method', 'Metric', 'Mean_Percentage_Diff', 'Correlation', 'N_Subjects']].to_string(index=False))

# # Generate Bland-Altman plots
# print("\n" + "="*50)
# print("GENERATING BLAND-ALTMAN PLOTS")
# print("="*50)

# for method in plot_data.keys():
#     for metric in plot_data[method].keys():
#         if metric in plot_data[method] and len(plot_data[method][metric]['rppg_values']) > 1:
#             rppg_vals = plot_data[method][metric]['rppg_values']
#             gt_vals = plot_data[method][metric]['gt_values']
            
#             # Get statistics info
#             stats_info = {'n_subjects': len(rppg_vals)}
            
#             # Create Bland-Altman plot
#             plot_bland_altman(rppg_vals, gt_vals, method, metric, stats_info)

# Summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS")
print("="*60)

print(f"Total method-metric combinations analyzed: {len(bland_altman_df)}")
print(f"Mean percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].mean():.2f}%")
print(f"Median percentage difference across all combinations: {bland_altman_df['Mean_Percentage_Diff'].median():.2f}%")
print(f"Best performing combination: {best_methods.iloc[0]['Method']} - {best_methods.iloc[0]['Metric']} ({best_methods.iloc[0]['Mean_Percentage_Diff']:.2f}%)")



CALCULATING BLAND-ALTMAN STATISTICS

Bland-Altman Analysis Results:
----------------------------------------------------------------------------------------------------
Method Metric  N_Subjects rPPG_Mean GT_Mean Mean_Difference Std_Difference Upper_LoA Lower_LoA Mean_Percentage_Diff Median_Percentage_Diff Correlation P_Value
   POS MeanNN          15   751.069 773.547         -22.478         75.346   125.200  -170.156                7.328                  4.412       0.184  0.5104
   POS   SDNN          14   238.281 221.220          17.060         58.901   132.507   -98.386               26.093                 18.107      -0.120  0.6839
   POS  RMSSD          14   317.529 284.553          32.976         77.474   184.826  -118.873               28.354                 20.356      -0.021  0.9438
   POS  pNN50          13    87.202  78.143           9.059          6.266    21.341    -3.223               12.620                 10.497       0.501  0.0811
   POS     LF          16     0.041

### Conclusion: 2-Minute Window

Stuff

In [41]:
## Store the CHROM rPPG HRV metrics (per-subject means) in a CSV file
output_path = "rest_rppg_hrv_metrics_window-120s.csv"

metric_names = ['MeanNN', 'SDNN', 'RMSSD', 'pNN50', 'LF', 'HF', 'LF_HF', 'PR']

## Collect one list per metric, with one entry per subject
chrom_hrv_metrics = {
    name: [hrv_means['CHROM'][subject_id][name] for subject_id in hrv_means['CHROM']]
    for name in metric_names
}

## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add the label "Rest" to the DataFrame
chrom_df['Label'] = 'Rest'

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

## Preview the first rows (last expression in the cell, so the notebook displays it)
chrom_df.head()