### Study Correlation Plan

For the purpose of getting the HRV data, we will use the library Neurokit2 to handle the proceess to get the data short window and the full one.

### Flow of the Study

- Takes the Windowed version of the data (30 seconds, 1 minute and 2 minute)
- Calculate the HRV Metrics / Features
- Take the signal of the full length
- Take the study correlation

### HRV Metrics that we're going to use

| **Domain**     | **HRV Feature** | **Unit** | **Description**                                                                 |
|----------------|------------------|----------|----------------------------------------------------------------------------------|
| **Time**       | MeanNN           | ms       | Mean RR interval                                                                 |
|                | SDNN             | ms       | Standard deviation of the RR intervals                                           |
|                | NN50             | -        | Number of pairs of differences between adjacent RR intervals > 50 ms             |
|                | pNN50            | %        | NN50 count divided by the total number of all RR intervals                       |
|                | RMSSD            | ms       | Root mean square of successive RR interval differences                           |
|                | MeanHR           | bpm      | Mean heart rate                                                                  |
|                | SDHR             | bpm      | Standard deviation of the heart rate                                             |
| **Frequency**  | LF               | ms²      | Power of low frequency band (0.04–0.15 Hz)                                       |
|                | HF               | ms²      | Power of high frequency band (0.15–0.4 Hz)                                       |
|                | LF/HF            | -        | Ratio of LF to HF                                                                |
| **Non-linear**  | CSI              | -        | Cardiac sympathetic index                                                        |
|                | CVI              | -        | Cardiac vagal index                                                              |
|                | SD1              | -        | Standard deviation of Poincaré plot projection on the line perpendicular to line y=x |
|                | SD2              | -        | Standard deviation of Poincaré plot projection on the line y=x                  |


### Setup Requirements

In [1]:
# UST HRV and Normal HRV Correlation Analysis for Stress Detection
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os
from glob import glob
import warnings
import neurokit2 as nk
warnings.filterwarnings('ignore')

# Set plot style
plt.style.use('ggplot')
sns.set(font_scale=1.2)
sns.set_style("whitegrid")

In [2]:
import scipy 

def preprocess_ppg(signal, fs = 35):
    """ Computes the Preprocessed PPG Signal, this steps include the following:
        1. Moving Average Smoothing
        2. Bandpass Filtering
        
        Parameters:
        ----------
        signal (numpy array): 
            The PPG Signal to be preprocessed
        fs (float): 
            The Sampling Frequency of the Signal
            
        Returns:
        --------
        numpy array: 
            The Preprocessed PPG Signal
    
    """ 

    b, a = scipy.signal.butter(3, [0.5, 2.5], btype='band', fs=fs)
    filtered = scipy.signal.filtfilt(b, a, signal)

    return filtered

# 30 Seconds Plot Correlation

For 30 seconds window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 15 seconds (means the different between each short window is 15 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [3]:
base_path=f"PhysioItera"
subjects=os.listdir(base_path)
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [4]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"
        
        ## Make sure the folder ends in "2" for this subject
        if not subject.endswith("6"):
            # print(f"Subject {subject} not found, skipping")
            continue

        working_folder = base_path + f"/{subject}"

        pos = np.load(os.path.join(working_folder, f"Landmark-{subject}-POS-rppg.npy"))
        lgi = np.load(os.path.join(working_folder, f"Landmark-{subject}-LGI-rppg.npy"))
        omit = np.load(os.path.join(working_folder, f"Landmark-{subject}-OMIT-rppg.npy"))
        green = np.load(os.path.join(working_folder, f"Landmark-{subject}-GREEN-rppg.npy"))
        chrom = np.load(os.path.join(working_folder, f"Landmark-{subject}-CHROM-rppg.npy"))
        gt_path = os.path.join(working_folder, f"vernier/{subject}_vernier_ecg.csv")
        GT = pd.read_csv(gt_path, usecols=[1], header=None).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [5]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 30 * sample_rate_video
        stride_length = 15 * sample_rate_video
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=sample_rate_video)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_video)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_video, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing ades6_T1 for POS
Processing adin6_T1 for POS
Processing agus6_T1 for POS
Processing aice6_T1 for POS
Processing alana6_T1 for POS
Processing alex6_T1 for POS
Processing ali6_T1 for POS
Processing anggur6_T1 for POS
Processing ara6_T1 for POS
Processing arnold6_T1 for POS
Processing bunny6_T1 for POS
Processing cici6_T1 for POS
Processing citra6_T1 for POS
Processing dadu6_T1 for POS
Processing dede6_T1 for POS
Processing deka6_T1 for POS
Processing fitsan6_T1 for POS
Processing fote6_T1 for POS
Processing gab6_T1 for POS
Processing tryx6_T1 for POS
Processing ades6_T1 for LGI
Processing adin6_T1 for LGI
Processing agus6_T1 for LGI
Processing aice6_T1 for LGI
Processing alana6_T1 for LGI
Processing alex6_T1 for LGI
Processing ali6_T1 for LGI
Processing anggur6_T1 for LGI
Processing ara6_T1 for LGI
Processing arnold6_T1 for LGI
Processing bunny6_T1 for LGI
Processing cici6_T1 for LGI
Processing citra6_T1 for LGI
Processing dadu6_T1 for LGI
Processing dede6_T1 for LGI
Processin

In [6]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'ades6_T1': {'MeanNN': 704.4632487141197, 'SDNN': 193.34698486336737, 'RMSSD': 282.57260647497, 'pNN50': 90.38908246225319, 'LF': nan, 'HF': 0.1643258366321684, 'LF_HF': nan}, 'adin6_T1': {'MeanNN': 704.1326530612243, 'SDNN': 253.1903550285819, 'RMSSD': 336.6491525134344, 'pNN50': 87.73809523809524, 'LF': nan, 'HF': 0.14634854422727495, 'LF_HF': nan}, 'agus6_T1': {'MeanNN': 741.3919413919415, 'SDNN': 296.12235967413574, 'RMSSD': 433.3567428748041, 'pNN50': 84.61538461538461, 'LF': nan, 'HF': 0.15209384526878103, 'LF_HF': nan}, 'aice6_T1': {'MeanNN': 780.0751879699249, 'SDNN': 294.01777738044314, 'RMSSD': 431.5159206313488, 'pNN50': 93.2748538011696, 'LF': nan, 'HF': 0.17006906703552832, 'LF_HF': nan}, 'alana6_T1': {'MeanNN': 689.3894143023062, 'SDNN': 275.14628350043523, 'RMSSD': 307.91523822973215, 'pNN50': 89.19860627177701, 'LF': nan, 'HF': 0.05008660629804491, 'LF_HF': nan}, 'alex6_T1': {'MeanNN': 721.0714285714287, 'SDNN': 266.6516290584453, 'RMSSD': 319.6436198852756, 'p

### Getting the GT HRV Metrics

In [7]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=sample_rate_gt)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_gt)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=sample_rate_gt)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sample_rate_gt, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing ades6_T1 for ground truth
Processing adin6_T1 for ground truth
Processing agus6_T1 for ground truth
Processing aice6_T1 for ground truth
Processing alana6_T1 for ground truth
Processing alex6_T1 for ground truth
Processing ali6_T1 for ground truth
Processing anggur6_T1 for ground truth
Processing ara6_T1 for ground truth
Processing arnold6_T1 for ground truth
Processing bunny6_T1 for ground truth
Processing cici6_T1 for ground truth
Processing citra6_T1 for ground truth
Processing dadu6_T1 for ground truth
Processing dede6_T1 for ground truth
Processing deka6_T1 for ground truth
Processing fitsan6_T1 for ground truth
Processing fote6_T1 for ground truth
Processing gab6_T1 for ground truth
Processing tryx6_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [8]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [9]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.0844, p-value = 0.7393
  SDNN: Correlation = 0.0658, p-value = 0.7952
  RMSSD: Correlation = 0.0974, p-value = 0.6917
  pNN50: Correlation = 0.1420, p-value = 0.5620
  HF: Correlation = 0.1244, p-value = 0.6228


Method: LGI
  MeanNN: Correlation = -0.0986, p-value = 0.6880
  SDNN: Correlation = 0.1437, p-value = 0.5573
  RMSSD: Correlation = 0.0926, p-value = 0.7061
  pNN50: Correlation = 0.4087, p-value = 0.1160
  HF: Correlation = -0.0880, p-value = 0.7203


Method: OMIT
  MeanNN: Correlation = -0.0912, p-value = 0.7105
  SDNN: Correlation = 0.1370, p-value = 0.5759
  RMSSD: Correlation = 0.0962, p-value = 0.6952
  pNN50: Correlation = 0.0317, p-value = 0.9007
  HF: Correlation = -0.1002, p-value = 0.6831


Method: GREEN
  MeanNN: Correlation = -0.1985, p-value = 0.4154
  SDNN: Correlation = -0.1091, p-value = 0.6565
  RMSSD: Correlation = -0.0820, p-value = 0.7543
  pNN50: Correlation = 0.0336, p-value = 0.8947
  HF: Correlation = -0.0107, p-va

In [10]:
# ### Plot the correlation scatter plots for each method and metric
# def plot_correlation_scatter(rppg_values, gt_values, method, metric):
#     """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
#     Parameters:
#     ----------
#     rppg_values (list): List of rPPG values.
#     gt_values (list): List of ground truth values.
#     method (str): The rPPG method used.
#     metric (str): The HRV metric being analyzed.
#     """
#     plt.figure(figsize=(8, 6))
#     sns.scatterplot(x=rppg_values, y=gt_values)
#     plt.title(f"{method} - {metric} Correlation")
#     plt.xlabel(f"{method} {metric}")
#     plt.ylabel(f"Ground Truth {metric}")
    
#     # Fit a regression line
#     sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', line_kws={"label": "Fit Line"})
    
#     plt.legend()
#     plt.grid(True)
#     plt.show()

# # Plot the correlation scatter plots for each method and metric
# for method in hrv_means.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         # Collect values for plotting
#         for subject_id in hrv_means[method].keys():
#             if subject_id in rppg_hrv_metrics[method] and metric in rppg_hrv_metrics[method][subject_id]:
#                 original_values = rppg_hrv_metrics[method][subject_id][metric]
#                 cleaned_values = remove_outliers_iqr(original_values)
                
#                 if len(cleaned_values) > 0:
#                     rppg_value = np.mean(cleaned_values)
                    
#                     gt_hrv_temp = gt_hrv_metrics.get(subject_id, {})
#                     if metric in gt_hrv_temp and not gt_hrv_temp[metric].empty:
#                         gt_value = gt_hrv_temp[metric][0] if isinstance(gt_hrv_temp[metric], pd.Series) else gt_hrv_temp[metric]
                        
#                         rppg_values.append(rppg_value)
#                         gt_values.append(gt_value)

#         # Plot if we have enough data points
#         if len(rppg_values) > 1 and len(gt_values) > 1:
#             plot_correlation_scatter(rppg_values, gt_values, method, metric)

In [11]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  pNN50: Correlation = 0.1420, p-value = 0.5620
  HF: Correlation = 0.1244, p-value = 0.6228
  RMSSD: Correlation = 0.0974, p-value = 0.6917
  MeanNN: Correlation = 0.0844, p-value = 0.7393
  SDNN: Correlation = 0.0658, p-value = 0.7952


Method: LGI
  pNN50: Correlation = 0.4087, p-value = 0.1160
  SDNN: Correlation = 0.1437, p-value = 0.5573
  MeanNN: Correlation = -0.0986, p-value = 0.6880
  RMSSD: Correlation = 0.0926, p-value = 0.7061
  HF: Correlation = -0.0880, p-value = 0.7203


Method: OMIT
  SDNN: Correlation = 0.1370, p-value = 0.5759
  HF: Correlation = -0.1002, p-value = 0.6831
  RMSSD: Correlation = 0.0962, p-value = 0.6952
  MeanNN: Correlation = -0.0912, p-value = 0.7105
  pNN50: Correlation = 0.0317, p-value = 0.9007


Method: GREEN
  MeanNN: Correlation = -0.1985, p-value = 0.4154
  SDNN: Correlation = -0.1091, p-value = 0.6565
  RMSSD: Correlation = -0.0820, p-value = 0.7543
  pNN50: Correlation = 0.0336, p-value =

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [12]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [13]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN           NaN            854.067654              NaN   
1     POS    SDNN           NaN            223.131393              NaN   
2     POS   RMSSD           NaN            382.748026              NaN   
3     POS   pNN50           NaN             90.091822              NaN   
4     POS      LF           NaN              0.006132              NaN   
5     POS      HF           NaN              0.010668              NaN   
6     POS   LF_HF           NaN              1.005352              NaN   
7     LGI  MeanNN           NaN            854.067654              NaN   
8     LGI    SDNN           NaN            223.131393              NaN   
9     LGI   RMSSD           NaN            382.748026              NaN   
10    LGI   pNN50           NaN             90.091822              NaN   
11    LGI      LF           NaN              0.006132              NaN   
12    LGI      

In [14]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
Empty DataFrame
Columns: [Method, Metric, Mean Average, Ground Truth Average, Mean Difference, Standard Deviation, Upper Limit of Agreement, Lower Limit of Agreement, Percentage Difference]
Index: []


### Conclussion : 30 Seconds window

The study correlation within the 30 seconds rppg hrv metrics compare to the GT shows weak / moderate relation with the GT.

Using the bland-altman itself it shows one feature. The MeanNN (time it takes between each heart beat) have acceptable agreement with the reference based on your 20% threshold.

---