### Study Correlation Plan

For the purpose of getting the HRV data, we will use the library Neurokit2 to handle the proceess to get the data short window and the full one.

### Flow of the Study

- Takes the Windowed version of the data (30 seconds, 1 minute and 2 minute)
- Calculate the HRV Metrics / Features
- Take the signal of the full length
- Take the study correlation

### HRV Metrics that we're going to use

| **Domain**     | **HRV Feature** | **Unit** | **Description**                                                                 |
|----------------|------------------|----------|----------------------------------------------------------------------------------|
| **Time**       | MeanNN           | ms       | Mean RR interval                                                                 |
|                | SDNN             | ms       | Standard deviation of the RR intervals                                           |
|                | NN50             | -        | Number of pairs of differences between adjacent RR intervals > 50 ms             |
|                | pNN50            | %        | NN50 count divided by the total number of all RR intervals                       |
|                | RMSSD            | ms       | Root mean square of successive RR interval differences                           |
|                | MeanHR           | bpm      | Mean heart rate                                                                  |
|                | SDHR             | bpm      | Standard deviation of the heart rate                                             |
| **Frequency**  | LF               | ms²      | Power of low frequency band (0.04–0.15 Hz)                                       |
|                | HF               | ms²      | Power of high frequency band (0.15–0.4 Hz)                                       |
|                | LF/HF            | -        | Ratio of LF to HF                                                                |
| **Non-linear**  | CSI              | -        | Cardiac sympathetic index                                                        |
|                | CVI              | -        | Cardiac vagal index                                                              |
|                | SD1              | -        | Standard deviation of Poincaré plot projection on the line perpendicular to line y=x |
|                | SD2              | -        | Standard deviation of Poincaré plot projection on the line y=x                  |


### Setup Requirements

In [106]:
# UST HRV and Normal HRV Correlation Analysis for Stress Detection
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os
from glob import glob
import warnings
import neurokit2 as nk
warnings.filterwarnings('ignore')

# Set plot style
plt.style.use('ggplot')
sns.set(font_scale=1.2)
sns.set_style("whitegrid")

In [145]:
import scipy 

def preprocess_ppg(signal, fs = 35):
    """ Computes the Preprocessed PPG Signal, this steps include the following:
        1. Moving Average Smoothing
        2. Bandpass Filtering
        
        Parameters:
        ----------
        signal (numpy array): 
            The PPG Signal to be preprocessed
        fs (float): 
            The Sampling Frequency of the Signal
            
        Returns:
        --------
        numpy array: 
            The Preprocessed PPG Signal
    
    """ 

    # 2. Bandpass filter to isolate the cardiac component (0.4-2.5 Hz)
    b_bp, a_bp = scipy.signal.butter(3, [0.7, 2.5], btype='band', fs=fs)
    filtered = scipy.signal.filtfilt(b_bp, a_bp, signal)
    
    # # 3. Upsample the signal to 100Hz (better temporal resolution for peak detection)
    # upsampled = nk.signal_resample(filtered, sampling_rate=fs, desired_sampling_rate=100)

    # window_size = int(100 * 0.1)  # 100ms window at 100Hz
    # if window_size % 2 == 0:      # ensure odd window size for centered smoothing
    #     window_size += 1
    # smoothed = scipy.signal.savgol_filter(upsampled, window_size, 3)

    return filtered

# 30 Seconds Plot Correlation

For 30 seconds window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 15 seconds (means the different between each short window is 15 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [146]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz
desired_sample_rate = 100  # Hz


In [147]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [157]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 30 * desired_sample_rate
        stride_length = 15 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=sample_rate_video)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_video)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_video, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [158]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 640.0896634229968, 'SDNN': 137.1220284647812, 'RMSSD': 197.5482589075215, 'pNN50': 67.57057757057756, 'LF': 0.033722474125948594, 'HF': 0.10389434641613032, 'LF_HF': 0.32356178303433686}, 's42_T1': {'MeanNN': 841.1707629115502, 'SDNN': 167.87931434426187, 'RMSSD': 211.69034103357228, 'pNN50': 75.12091006408399, 'LF': 0.02070623437459827, 'HF': 0.09391026410406207, 'LF_HF': 0.2097883012970804}, 's43_T1': {'MeanNN': 661.492258961816, 'SDNN': 154.63663709307136, 'RMSSD': 211.2979351388303, 'pNN50': 74.74667274512198, 'LF': 0.06460360524073014, 'HF': 0.08554058145506256, 'LF_HF': 0.7835743624182969}, 's44_T1': {'MeanNN': 848.793535283529, 'SDNN': 251.69938348276378, 'RMSSD': 325.0467861135897, 'pNN50': 87.55676841183208, 'LF': 0.04476387841809119, 'HF': 0.11564989187781943, 'LF_HF': 0.4037539662317912}, 's45_T1': {'MeanNN': 780.2202327471869, 'SDNN': 153.40842073960198, 'RMSSD': 204.00571723359454, 'pNN50': 78.71119097534192, 'LF': 0.027234885812153747, 'HF': 

### Getting the GT HRV Metrics

In [159]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=sample_rate_gt)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_gt)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_gt)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_gt, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [160]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [161]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.6353, p-value = 0.0082
  SDNN: Correlation = 0.2266, p-value = 0.4567
  RMSSD: Correlation = 0.1640, p-value = 0.5924
  pNN50: Correlation = 0.3631, p-value = 0.1668
  LF: Correlation = 0.5189, p-value = 0.0394
  HF: Correlation = -0.4269, p-value = 0.0991
  LF_HF: Correlation = 0.0416, p-value = 0.8783


Method: LGI
  MeanNN: Correlation = 0.7119, p-value = 0.0020
  SDNN: Correlation = 0.2793, p-value = 0.3334
  RMSSD: Correlation = 0.2436, p-value = 0.4013
  pNN50: Correlation = 0.4136, p-value = 0.1113
  LF: Correlation = 0.5427, p-value = 0.0298
  HF: Correlation = 0.1236, p-value = 0.6484
  LF_HF: Correlation = 0.3519, p-value = 0.1813


Method: OMIT
  MeanNN: Correlation = 0.7090, p-value = 0.0021
  SDNN: Correlation = 0.2815, p-value = 0.3295
  RMSSD: Correlation = 0.2471, p-value = 0.3943
  pNN50: Correlation = 0.4335, p-value = 0.0934
  LF: Correlation = 0.5819, p-value = 0.0229
  HF: Correlation = -0.0618, p-value = 0.8200
  LF_HF: Correl

In [162]:
# ### Plot the correlation scatter plots for each method and metric
# def plot_correlation_scatter(rppg_values, gt_values, method, metric):
#     """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
#     Parameters:
#     ----------
#     rppg_values (list): List of rPPG values.
#     gt_values (list): List of ground truth values.
#     method (str): The rPPG method used.
#     metric (str): The HRV metric being analyzed.
#     """
#     plt.figure(figsize=(8, 6))
#     sns.scatterplot(x=rppg_values, y=gt_values)
#     plt.title(f"{method} - {metric} Correlation")
#     plt.xlabel(f"{method} {metric}")
#     plt.ylabel(f"Ground Truth {metric}")
    
#     # Fit a regression line
#     sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', line_kws={"label": "Fit Line"})
    
#     plt.legend()
#     plt.grid(True)
#     plt.show()

# # Plot the correlation scatter plots for each method and metric
# for method in hrv_means.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         # Collect values for plotting
#         for subject_id in hrv_means[method].keys():
#             if subject_id in rppg_hrv_metrics[method] and metric in rppg_hrv_metrics[method][subject_id]:
#                 original_values = rppg_hrv_metrics[method][subject_id][metric]
#                 cleaned_values = remove_outliers_iqr(original_values)
                
#                 if len(cleaned_values) > 0:
#                     rppg_value = np.mean(cleaned_values)
                    
#                     gt_hrv_temp = gt_hrv_metrics.get(subject_id, {})
#                     if metric in gt_hrv_temp and not gt_hrv_temp[metric].empty:
#                         gt_value = gt_hrv_temp[metric][0] if isinstance(gt_hrv_temp[metric], pd.Series) else gt_hrv_temp[metric]
                        
#                         rppg_values.append(rppg_value)
#                         gt_values.append(gt_value)

#         # Plot if we have enough data points
#         if len(rppg_values) > 1 and len(gt_values) > 1:
#             plot_correlation_scatter(rppg_values, gt_values, method, metric)

In [163]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.6353, p-value = 0.0082
  LF: Correlation = 0.5189, p-value = 0.0394
  HF: Correlation = -0.4269, p-value = 0.0991
  pNN50: Correlation = 0.3631, p-value = 0.1668
  SDNN: Correlation = 0.2266, p-value = 0.4567


Method: LGI
  MeanNN: Correlation = 0.7119, p-value = 0.0020
  LF: Correlation = 0.5427, p-value = 0.0298
  pNN50: Correlation = 0.4136, p-value = 0.1113
  LF_HF: Correlation = 0.3519, p-value = 0.1813
  SDNN: Correlation = 0.2793, p-value = 0.3334


Method: OMIT
  MeanNN: Correlation = 0.7090, p-value = 0.0021
  LF: Correlation = 0.5819, p-value = 0.0229
  pNN50: Correlation = 0.4335, p-value = 0.0934
  SDNN: Correlation = 0.2815, p-value = 0.3295
  RMSSD: Correlation = 0.2471, p-value = 0.3943


Method: GREEN
  MeanNN: Correlation = 0.7793, p-value = 0.0006
  HF: Correlation = -0.4486, p-value = 0.0814
  RMSSD: Correlation = 0.4353, p-value = 0.1048
  SDNN: Correlation = 0.4209, p-value = 0.1182
  L

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [155]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [156]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    568.775533            525.278046        43.497487   
1     POS    SDNN    211.140513            107.000090       104.140422   
2     POS   RMSSD    294.984797            128.965127       166.019671   
3     POS   pNN50     68.743352             31.559423        37.183929   
4     POS      LF           NaN              0.034197              NaN   
5     POS      HF      0.117774              0.063167         0.054607   
6     POS   LF_HF           NaN              0.652155              NaN   
7     LGI  MeanNN    548.920422            525.278046        23.642376   
8     LGI    SDNN    185.302311            107.000090        78.302221   
9     LGI   RMSSD    264.175075            128.965127       135.209948   
10    LGI   pNN50     67.581010             31.559423        36.021587   
11    LGI      LF           NaN              0.034197              NaN   
12    LGI      

In [119]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    764.797798             778.41319       -13.615392   
7     LGI  MeanNN    767.500524             778.41319       -10.912667   
14   OMIT  MeanNN    765.015051             778.41319       -13.398139   
21  GREEN  MeanNN    817.050344             778.41319        38.637154   
28  CHROM  MeanNN    763.153262             778.41319       -15.259928   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0            89.589969                161.980948               -189.211731   
7            92.760654                170.898215               -192.723549   
14           90.303552                163.596823               -190.393102   
21          151.310433                335.205603               -257.931295   
28          100.228273                181.187488               -211.707344   

    Percentage Difference  
0                7.897974 

### Conclussion : 30 Seconds window

The study correlation within the 30 seconds rppg hrv metrics compare to the GT shows weak / moderate relation with the GT.

Using the bland-altman itself it shows one feature. The MeanNN (time it takes between each heart beat) have acceptable agreement with the reference based on your 20% threshold.

In [120]:
## Store the rPPG hrv metrics into the csv
output_path = "rest_rppg_hrv_metrics_window-30s.csv"

## Convert the feature of the CHROM within the HRV Means to be the DataFrame
#   MeanNN: Correlation = 0.6109, p-value = 0.0119
#   SD1: Correlation = 0.5190, p-value = 0.0474
#   RMSSD: Correlation = 0.5185, p-value = 0.0477
#   LF: Correlation = 0.3975, p-value = 0.1423
#   SDNN: Correlation = 0.3676, p-value = 0.1959
## Take only the CHROM method and the MeanNN, SD1, RMSSD, LF, SDNN
chrom_hrv_metrics = {
    'MeanNN': [],
    'pNN50': [],
    'RMSSD': [],
    'SDNN': []
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['pNN50'].append(hrv_means['CHROM'][subject_id]['pNN50'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])

## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

---

# 1 Minute Plot Correlation

For 1 minute window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 30 seconds (means the different between each short window is 30 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [121]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [122]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [123]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 60 * desired_sample_rate
        stride_length = 30 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=desired_sample_rate)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [124]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 631.6764208958989, 'SDNN': 128.1619220270371, 'RMSSD': 185.4826900104032, 'pNN50': 62.91453986055366, 'LF': 0.030636184903717, 'HF': 0.11858333398721517, 'LF_HF': 0.2687657671234442, 'SD1': 131.8501523782398, 'SD2': 121.5042646249602}, 's42_T1': {'MeanNN': 803.1586950374957, 'SDNN': 188.46820539755805, 'RMSSD': 239.15005641009355, 'pNN50': 72.21991542435447, 'LF': 0.026746054667673304, 'HF': 0.13417755890902922, 'LF_HF': 0.2070743523431043, 'SD1': 170.26835841338197, 'SD2': 205.51961788418743}, 's43_T1': {'MeanNN': 631.3771778892728, 'SDNN': 133.24798633569338, 'RMSSD': 181.02764848480163, 'pNN50': 66.44340741169157, 'LF': 0.034103073175618795, 'HF': 0.08965820607141468, 'LF_HF': 0.46657900626379833, 'SD1': 128.6903453694419, 'SD2': 136.95676641517815}, 's44_T1': {'MeanNN': 791.1915696206792, 'SDNN': 257.5916702904131, 'RMSSD': 320.27141183589885, 'pNN50': 85.73565346168087, 'LF': 0.039539030405811346, 'HF': 0.09989416047161918, 'LF_HF': 0.4585626760877533

### Getting the GT HRV Metrics

In [125]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=desired_sample_rate)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
    gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [126]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [127]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")


Method: POS
  MeanNN: Correlation = 0.5291, p-value = 0.0351
  SDNN: Correlation = 0.3188, p-value = 0.2666
  RMSSD: Correlation = 0.3784, p-value = 0.2023
  pNN50: Correlation = 0.4511, p-value = 0.0795
  LF: Correlation = -0.0398, p-value = 0.8837
  HF: Correlation = -0.4725, p-value = 0.0753
  LF_HF: Correlation = -0.2740, p-value = 0.3231
  SD1: Correlation = 0.3804, p-value = 0.1998
  SD2: Correlation = 0.3271, p-value = 0.2537


Method: LGI
  MeanNN: Correlation = 0.6588, p-value = 0.0055
  SDNN: Correlation = 0.2908, p-value = 0.3131
  RMSSD: Correlation = 0.2026, p-value = 0.4873
  pNN50: Correlation = 0.4564, p-value = 0.0756
  LF: Correlation = 0.2104, p-value = 0.4340
  HF: Correlation = 0.0953, p-value = 0.7256
  LF_HF: Correlation = 0.2483, p-value = 0.3537
  SD1: Correlation = 0.2030, p-value = 0.4865
  SD2: Correlation = 0.3320, p-value = 0.2462


Method: OMIT
  MeanNN: Correlation = 0.6426, p-value = 0.0073
  SDNN: Correlation = 0.2935, p-value = 0.3084
  RMSSD: Correla

In [128]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.5291, p-value = 0.0351
  HF: Correlation = -0.4725, p-value = 0.0753
  pNN50: Correlation = 0.4511, p-value = 0.0795
  SD1: Correlation = 0.3804, p-value = 0.1998
  RMSSD: Correlation = 0.3784, p-value = 0.2023


Method: LGI
  MeanNN: Correlation = 0.6588, p-value = 0.0055
  pNN50: Correlation = 0.4564, p-value = 0.0756
  SD2: Correlation = 0.3320, p-value = 0.2462
  SDNN: Correlation = 0.2908, p-value = 0.3131
  LF_HF: Correlation = 0.2483, p-value = 0.3537


Method: OMIT
  MeanNN: Correlation = 0.6426, p-value = 0.0073
  pNN50: Correlation = 0.4618, p-value = 0.0717
  SD2: Correlation = 0.3995, p-value = 0.1401
  LF_HF: Correlation = 0.3256, p-value = 0.2185
  SDNN: Correlation = 0.2935, p-value = 0.3084


Method: GREEN
  MeanNN: Correlation = 0.7850, p-value = 0.0005
  HF: Correlation = -0.4581, p-value = 0.0743
  SD2: Correlation = 0.4286, p-value = 0.1109
  SDNN: Correlation = 0.3671, p-value = 0.1967
 

---

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [129]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [130]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    751.532234            778.413190       -26.880956   
1     POS    SDNN    202.707753            147.745004        54.962749   
2     POS   RMSSD    271.955644            168.860546       103.095098   
3     POS   pNN50     71.495483             43.796627        27.698856   
4     POS      LF      0.034232              0.028815         0.005417   
5     POS      HF      0.111552              0.037439         0.074114   
6     POS   LF_HF      0.371225              0.990990        -0.619765   
7     POS     SD1    193.876860            119.688613        74.188247   
8     POS     SD2    210.944204            169.378610        41.565594   
9     LGI  MeanNN    751.981610            778.413190       -26.431581   
10    LGI    SDNN    163.842988            147.745004        16.097984   
11    LGI   RMSSD    211.197766            168.860546        42.337220   
12    LGI   pNN

In [131]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    751.532234             778.41319       -26.880956   
9     LGI  MeanNN    751.981610             778.41319       -26.431581   
18   OMIT  MeanNN    750.398379             778.41319       -28.014811   
27  GREEN  MeanNN    809.275703             778.41319        30.862513   
36  CHROM  MeanNN    739.944512             778.41319       -38.468678   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0           121.631567                211.516915               -265.278827   
9           102.809644                175.075322               -227.938483   
18          105.304452                178.381915               -234.411537   
27          200.377912                423.603221               -361.878195   
36          113.787399                184.554624               -261.491980   

    Percentage Difference  
0                9.854468 

### Conclussion : 1 Minute Window

Stuff

In [132]:
## Store the rPPG hrv metrics into the csv
output_path = "rest_rppg_hrv_metrics_window-60s.csv"

## Convert the feature of the CHROM within the HRV Means to be the DataFrame
#   MeanNN: Correlation = 0.6109, p-value = 0.0119
#   SD1: Correlation = 0.5190, p-value = 0.0474
#   RMSSD: Correlation = 0.5185, p-value = 0.0477
#   LF: Correlation = 0.3975, p-value = 0.1423
#   SDNN: Correlation = 0.3676, p-value = 0.1959
## Take only the CHROM method and the MeanNN, SD1, RMSSD, LF, SDNN
chrom_hrv_metrics = {
    'MeanNN': [],
    'pNN50': [],
    'RMSSD': [],
    'SDNN': []
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['pNN50'].append(hrv_means['CHROM'][subject_id]['pNN50'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])

## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

---

# 2 Minute Plot Correlation

For 2 minute window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 60 seconds (means the different between each short window is 60 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [133]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [134]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [135]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 120 * desired_sample_rate
        stride_length = 60 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=desired_sample_rate)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [136]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 641.0823888404534, 'SDNN': 131.90973563043002, 'RMSSD': 185.38534222625216, 'pNN50': 64.69049694856147, 'LF': 0.041435878671908004, 'HF': 0.12023779621413275, 'LF_HF': 0.3546703194536427, 'SD1': 131.43167801086423, 'SD2': 129.45352470252493}, 's42_T1': {'MeanNN': 849.2294931346026, 'SDNN': 167.27172903566736, 'RMSSD': 212.6624340479998, 'pNN50': 68.57230360879996, 'LF': 0.02632030626703126, 'HF': 0.07796274417562842, 'LF_HF': 0.331170746989507, 'SD1': 150.91005044795267, 'SD2': 180.91858694068685}, 's43_T1': {'MeanNN': 662.547763819871, 'SDNN': 150.7210458263655, 'RMSSD': 206.91131045425485, 'pNN50': 67.23047007623693, 'LF': 0.05821270908324013, 'HF': 0.08135403266041452, 'LF_HF': 0.7241101241743103, 'SD1': 146.71514447084195, 'SD2': 154.01636118240265}, 's44_T1': {'MeanNN': 854.1354059609455, 'SDNN': 253.44955390642815, 'RMSSD': 320.0945105860005, 'pNN50': 84.22661870503597, 'LF': 0.03658023982672856, 'HF': 0.0938929084251946, 'LF_HF': 0.39046701625656444

### Getting the GT HRV Metrics

In [137]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=desired_sample_rate)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
    gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [138]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [139]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")


Method: POS
  MeanNN: Correlation = 0.4769, p-value = 0.0618
  SDNN: Correlation = 0.2210, p-value = 0.4680
  RMSSD: Correlation = 0.1178, p-value = 0.7016
  pNN50: Correlation = 0.3460, p-value = 0.1893
  LF: Correlation = 0.3662, p-value = 0.1631
  HF: Correlation = -0.3878, p-value = 0.1378
  LF_HF: Correlation = -0.0758, p-value = 0.7803
  SD1: Correlation = 0.1187, p-value = 0.6994
  SD2: Correlation = 0.2891, p-value = 0.3381


Method: LGI
  MeanNN: Correlation = 0.6239, p-value = 0.0098
  SDNN: Correlation = 0.2729, p-value = 0.3451
  RMSSD: Correlation = 0.2705, p-value = 0.3496
  pNN50: Correlation = 0.4307, p-value = 0.0958
  LF: Correlation = 0.1929, p-value = 0.4742
  HF: Correlation = 0.1176, p-value = 0.6645
  LF_HF: Correlation = 0.6186, p-value = 0.0106
  SD1: Correlation = 0.2709, p-value = 0.3488
  SD2: Correlation = 0.2559, p-value = 0.3772


Method: OMIT
  MeanNN: Correlation = 0.6222, p-value = 0.0101
  SDNN: Correlation = 0.2358, p-value = 0.4171
  RMSSD: Correlat

In [140]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.4769, p-value = 0.0618
  HF: Correlation = -0.3878, p-value = 0.1378
  LF: Correlation = 0.3662, p-value = 0.1631
  pNN50: Correlation = 0.3460, p-value = 0.1893
  SD2: Correlation = 0.2891, p-value = 0.3381


Method: LGI
  MeanNN: Correlation = 0.6239, p-value = 0.0098
  LF_HF: Correlation = 0.6186, p-value = 0.0106
  pNN50: Correlation = 0.4307, p-value = 0.0958
  SDNN: Correlation = 0.2729, p-value = 0.3451
  SD1: Correlation = 0.2709, p-value = 0.3488


Method: OMIT
  MeanNN: Correlation = 0.6222, p-value = 0.0101
  LF_HF: Correlation = 0.5234, p-value = 0.0453
  pNN50: Correlation = 0.4467, p-value = 0.0828
  SD1: Correlation = 0.2675, p-value = 0.3552
  RMSSD: Correlation = 0.2671, p-value = 0.3560


Method: GREEN
  MeanNN: Correlation = 0.7883, p-value = 0.0005
  SD1: Correlation = 0.4335, p-value = 0.1065
  RMSSD: Correlation = 0.4333, p-value = 0.1066
  SDNN: Correlation = 0.4062, p-value = 0.1330
 

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [141]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [142]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    783.285696            778.413190         4.872506   
1     POS    SDNN    189.343303            147.745004        41.598299   
2     POS   RMSSD    254.519162            168.860546        85.658616   
3     POS   pNN50     71.407451             43.796627        27.610824   
4     POS      LF      0.036044              0.028815         0.007230   
5     POS      HF      0.080164              0.037439         0.042725   
6     POS   LF_HF      0.501429              0.990990        -0.489561   
7     POS     SD1    180.616267            119.688613        60.927654   
8     POS     SD2    196.512886            169.378610        27.134275   
9     LGI  MeanNN    774.318758            778.413190        -4.094433   
10    LGI    SDNN    162.301864            147.745004        14.556860   
11    LGI   RMSSD    212.664789            168.860546        43.804243   
12    LGI   pNN

In [143]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    783.285696             778.41319         4.872506   
9     LGI  MeanNN    774.318758             778.41319        -4.094433   
18   OMIT  MeanNN    772.193614             778.41319        -6.219576   
27  GREEN  MeanNN    828.475124             778.41319        50.061934   
36  CHROM  MeanNN    767.495303             778.41319       -10.917887   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0           135.132279                269.731773               -259.986761   
9           110.977195                213.420870               -221.609735   
18          110.945257                211.233128               -223.672280   
27          190.770558                423.972228               -323.848360   
36          109.419424                203.544184               -225.379958   

    Percentage Difference  
0               10.320545 

### Conclussion : 2 Minute Window

Stuff

In [144]:
## Store the rPPG hrv metrics into the csv
output_path = "rest_rppg_hrv_metrics_window-120s.csv"

## Convert the feature of the CHROM within the HRV Means to be the DataFrame
#   MeanNN: Correlation = 0.6109, p-value = 0.0119
#   SD1: Correlation = 0.5190, p-value = 0.0474
#   RMSSD: Correlation = 0.5185, p-value = 0.0477
#   LF: Correlation = 0.3975, p-value = 0.1423
#   SDNN: Correlation = 0.3676, p-value = 0.1959
## Take only the CHROM method and the MeanNN, SD1, RMSSD, LF, SDNN
chrom_hrv_metrics = {
    'MeanNN': [],
    'pNN50': [],
    'RMSSD': [],
    'SDNN': []
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['pNN50'].append(hrv_means['CHROM'][subject_id]['pNN50'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])

## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)