### Study Correlation Plan

For the purpose of getting the HRV data, we will use the library Neurokit2 to handle the proceess to get the data short window and the full one.

### Flow of the Study

- Takes the Windowed version of the data (30 seconds, 1 minute and 2 minute)
- Calculate the HRV Metrics / Features
- Take the signal of the full length
- Take the study correlation

### HRV Metrics that we're going to use

| **Domain**     | **HRV Feature** | **Unit** | **Description**                                                                 |
|----------------|------------------|----------|----------------------------------------------------------------------------------|
| **Time**       | MeanNN           | ms       | Mean RR interval                                                                 |
|                | SDNN             | ms       | Standard deviation of the RR intervals                                           |
|                | NN50             | -        | Number of pairs of differences between adjacent RR intervals > 50 ms             |
|                | pNN50            | %        | NN50 count divided by the total number of all RR intervals                       |
|                | RMSSD            | ms       | Root mean square of successive RR interval differences                           |
|                | MeanHR           | bpm      | Mean heart rate                                                                  |
|                | SDHR             | bpm      | Standard deviation of the heart rate                                             |
| **Frequency**  | LF               | ms²      | Power of low frequency band (0.04–0.15 Hz)                                       |
|                | HF               | ms²      | Power of high frequency band (0.15–0.4 Hz)                                       |
|                | LF/HF            | -        | Ratio of LF to HF                                                                |
| **Non-linear**  | CSI              | -        | Cardiac sympathetic index                                                        |
|                | CVI              | -        | Cardiac vagal index                                                              |
|                | SD1              | -        | Standard deviation of Poincaré plot projection on the line perpendicular to line y=x |
|                | SD2              | -        | Standard deviation of Poincaré plot projection on the line y=x                  |


### Setup Requirements

In [149]:
# UST HRV and Normal HRV Correlation Analysis for Stress Detection
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os
from glob import glob
import warnings
import neurokit2 as nk
warnings.filterwarnings('ignore')

# Set plot style
plt.style.use('ggplot')
sns.set(font_scale=1.2)
sns.set_style("whitegrid")

In [None]:
import scipy 

def preprocess_ppg(signal, fs = 35):
    """ Computes the Preprocessed PPG Signal, this steps include the following:
        1. Moving Average Smoothing
        2. Bandpass Filtering
        
        Parameters:
        ----------
        signal (numpy array): 
            The PPG Signal to be preprocessed
        fs (float): 
            The Sampling Frequency of the Signal
            
        Returns:
        --------
        numpy array: 
            The Preprocessed PPG Signal
    
    """ 

    # 2. Bandpass filter to isolate the cardiac component (0.7-2.5 Hz)
    b_bp, a_bp = scipy.signal.butter(3, [0.7, 2.5], btype='band', fs=fs)
    filtered = scipy.signal.filtfilt(b_bp, a_bp, signal)

    # 3. Normalize the signal
    filtered = (filtered - np.mean(filtered)) / np.std(filtered)
    
    # 3. Upsample the signal to 100Hz (better temporal resolution for peak detection)
    upsampled = nk.signal_resample(filtered, sampling_rate=fs, desired_sampling_rate=200)

    window_size = int(100 * 0.1)  # 100ms window at 100Hz
    if window_size % 2 == 0:      # ensure odd window size for centered smoothing
        window_size += 1
    smoothed = scipy.signal.savgol_filter(upsampled, window_size, 3)

    return smoothed

# 30 Seconds Plot Correlation

For 30 seconds window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 15 seconds (means the different between each short window is 15 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [151]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz
desired_sample_rate = 200  # Hz


In [152]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [153]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 30 * desired_sample_rate
        stride_length = 15 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=desired_sample_rate)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_video)
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            # rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [154]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 644.3661682206957, 'SDNN': 133.08636528285615, 'RMSSD': 186.88091076016815, 'pNN50': 63.26712377182441, 'LF': nan, 'HF': 0.13671169748977266, 'LF_HF': nan}, 's42_T1': {'MeanNN': 837.3773045438811, 'SDNN': 160.36750687115656, 'RMSSD': 213.94218894839952, 'pNN50': 68.92188846522077, 'LF': nan, 'HF': 0.11663441690418228, 'LF_HF': nan}, 's43_T1': {'MeanNN': 665.3020288966377, 'SDNN': 156.42342900497712, 'RMSSD': 214.58793268815077, 'pNN50': 67.10277037972597, 'LF': nan, 'HF': 0.10311459666963824, 'LF_HF': nan}, 's44_T1': {'MeanNN': 846.4318091632274, 'SDNN': 247.758821352666, 'RMSSD': 315.31514158583394, 'pNN50': 83.70318641806048, 'LF': nan, 'HF': 0.1194652396994715, 'LF_HF': nan}, 's45_T1': {'MeanNN': 778.9812863953764, 'SDNN': 158.52563264247473, 'RMSSD': 201.45471379767915, 'pNN50': 74.0839028653134, 'LF': nan, 'HF': 0.12162630482841243, 'LF_HF': nan}, 's46_T1': {'MeanNN': 800.2957665457666, 'SDNN': 86.77145018468428, 'RMSSD': 108.58393882735282, 'pNN50': 

### Getting the GT HRV Metrics

In [155]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=desired_sample_rate)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    # hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=sample_rate_gt)
    # gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    # gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [156]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [157]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")

Method: POS
  MeanNN: Correlation = 0.7235, p-value = 0.0015
  SDNN: Correlation = 0.1640, p-value = 0.5923
  RMSSD: Correlation = 0.1025, p-value = 0.7390
  pNN50: Correlation = 0.3590, p-value = 0.1721
  HF: Correlation = -0.2292, p-value = 0.3932


Method: LGI
  MeanNN: Correlation = 0.7095, p-value = 0.0021
  SDNN: Correlation = 0.2235, p-value = 0.4424
  RMSSD: Correlation = 0.2256, p-value = 0.4379
  pNN50: Correlation = 0.4299, p-value = 0.0965
  HF: Correlation = -0.2178, p-value = 0.4179


Method: OMIT
  MeanNN: Correlation = 0.7345, p-value = 0.0012
  SDNN: Correlation = 0.2206, p-value = 0.4485
  RMSSD: Correlation = 0.2294, p-value = 0.4301
  pNN50: Correlation = 0.4256, p-value = 0.1002
  HF: Correlation = -0.2604, p-value = 0.3300


Method: GREEN
  MeanNN: Correlation = 0.7550, p-value = 0.0011
  SDNN: Correlation = 0.3764, p-value = 0.1667
  RMSSD: Correlation = 0.4113, p-value = 0.1277
  pNN50: Correlation = 0.2461, p-value = 0.3582
  HF: Correlation = -0.0494, p-value 

In [158]:
# ### Plot the correlation scatter plots for each method and metric
# def plot_correlation_scatter(rppg_values, gt_values, method, metric):
#     """ Plot the correlation scatter plot for rPPG values and ground truth values.
    
#     Parameters:
#     ----------
#     rppg_values (list): List of rPPG values.
#     gt_values (list): List of ground truth values.
#     method (str): The rPPG method used.
#     metric (str): The HRV metric being analyzed.
#     """
#     plt.figure(figsize=(8, 6))
#     sns.scatterplot(x=rppg_values, y=gt_values)
#     plt.title(f"{method} - {metric} Correlation")
#     plt.xlabel(f"{method} {metric}")
#     plt.ylabel(f"Ground Truth {metric}")
    
#     # Fit a regression line
#     sns.regplot(x=rppg_values, y=gt_values, scatter=False, color='red', line_kws={"label": "Fit Line"})
    
#     plt.legend()
#     plt.grid(True)
#     plt.show()

# # Plot the correlation scatter plots for each method and metric
# for method in hrv_means.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         # Collect values for plotting
#         for subject_id in hrv_means[method].keys():
#             if subject_id in rppg_hrv_metrics[method] and metric in rppg_hrv_metrics[method][subject_id]:
#                 original_values = rppg_hrv_metrics[method][subject_id][metric]
#                 cleaned_values = remove_outliers_iqr(original_values)
                
#                 if len(cleaned_values) > 0:
#                     rppg_value = np.mean(cleaned_values)
                    
#                     gt_hrv_temp = gt_hrv_metrics.get(subject_id, {})
#                     if metric in gt_hrv_temp and not gt_hrv_temp[metric].empty:
#                         gt_value = gt_hrv_temp[metric][0] if isinstance(gt_hrv_temp[metric], pd.Series) else gt_hrv_temp[metric]
                        
#                         rppg_values.append(rppg_value)
#                         gt_values.append(gt_value)

#         # Plot if we have enough data points
#         if len(rppg_values) > 1 and len(gt_values) > 1:
#             plot_correlation_scatter(rppg_values, gt_values, method, metric)

In [159]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.7235, p-value = 0.0015
  pNN50: Correlation = 0.3590, p-value = 0.1721
  HF: Correlation = -0.2292, p-value = 0.3932
  SDNN: Correlation = 0.1640, p-value = 0.5923
  RMSSD: Correlation = 0.1025, p-value = 0.7390


Method: LGI
  MeanNN: Correlation = 0.7095, p-value = 0.0021
  pNN50: Correlation = 0.4299, p-value = 0.0965
  RMSSD: Correlation = 0.2256, p-value = 0.4379
  SDNN: Correlation = 0.2235, p-value = 0.4424
  HF: Correlation = -0.2178, p-value = 0.4179


Method: OMIT
  MeanNN: Correlation = 0.7345, p-value = 0.0012
  pNN50: Correlation = 0.4256, p-value = 0.1002
  HF: Correlation = -0.2604, p-value = 0.3300
  RMSSD: Correlation = 0.2294, p-value = 0.4301
  SDNN: Correlation = 0.2206, p-value = 0.4485


Method: GREEN
  MeanNN: Correlation = 0.7550, p-value = 0.0011
  RMSSD: Correlation = 0.4113, p-value = 0.1277
  SDNN: Correlation = 0.3764, p-value = 0.1667
  pNN50: Correlation = 0.2461, p-value = 0.3

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [160]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [161]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    765.657957            779.046092       -13.388135   
1     POS    SDNN    173.184279            146.760900        26.423379   
2     POS   RMSSD    230.450384            168.140605        62.309779   
3     POS   pNN50     71.216949             44.797938        26.419011   
4     POS      LF           NaN              0.030068              NaN   
5     POS      HF      0.112195              0.040887         0.071308   
6     POS   LF_HF           NaN              0.972035              NaN   
7     LGI  MeanNN    768.107060            779.046092       -10.939032   
8     LGI    SDNN    152.580982            146.760900         5.820082   
9     LGI   RMSSD    196.598409            168.140605        28.457803   
10    LGI   pNN50     63.557601             44.797938        18.759663   
11    LGI      LF           NaN              0.030068              NaN   
12    LGI      

In [162]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    765.657957            779.046092       -13.388135   
7     LGI  MeanNN    768.107060            779.046092       -10.939032   
14   OMIT  MeanNN    765.534213            779.046092       -13.511878   
21  GREEN  MeanNN    818.508109            779.046092        39.462017   
28  CHROM  MeanNN    764.312160            779.046092       -14.733931   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0            94.171024                171.187073               -197.963342   
7            96.798399                178.785830               -200.663893   
14           92.745381                168.269068               -195.292824   
21          157.637692                348.431893               -269.507859   
28          104.553886                190.191684               -219.659547   

    Percentage Difference  
0                8.224618 

### Conclussion : 30 Seconds window

The study correlation within the 30 seconds rppg hrv metrics compare to the GT shows weak / moderate relation with the GT.

Using the bland-altman itself it shows one feature. The MeanNN (time it takes between each heart beat) have acceptable agreement with the reference based on your 20% threshold.

---

# 1 Minute Plot Correlation

For 1 minute window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 30 seconds (means the different between each short window is 30 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [163]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [164]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [165]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 60 * desired_sample_rate
        stride_length = 30 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=desired_sample_rate)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=sample_rate_video)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [166]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 641.1774334463973, 'SDNN': 147.8118722388765, 'RMSSD': 211.46102191481495, 'pNN50': 62.98622674478364, 'LF': 0.02571113386568542, 'HF': 0.12501250256871416, 'LF_HF': 0.22169564114831744, 'SD1': 150.3153695531522, 'SD2': 140.4602223687617}, 's42_T1': {'MeanNN': 810.8483157070114, 'SDNN': 185.39675281680417, 'RMSSD': 233.5127668542262, 'pNN50': 71.70606362536083, 'LF': 0.027794776954217358, 'HF': 0.10531754066934682, 'LF_HF': 0.2766610818732813, 'SD1': 166.25029247739764, 'SD2': 201.9799237593209}, 's43_T1': {'MeanNN': 653.5648665266477, 'SDNN': 154.60557897893676, 'RMSSD': 218.82445644442336, 'pNN50': 67.03816987126945, 'LF': 0.04209285111517554, 'HF': 0.0919815130423132, 'LF_HF': 0.5028995860341092, 'SD1': 155.59450232764027, 'SD2': 153.40800720987}, 's44_T1': {'MeanNN': 832.6808260327464, 'SDNN': 274.59081894819013, 'RMSSD': 339.66609134510435, 'pNN50': 85.28133460618827, 'LF': 0.04137479753978425, 'HF': 0.10085478223815918, 'LF_HF': 0.44851280986376574, 

### Getting the GT HRV Metrics

In [167]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=desired_sample_rate)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
    gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [168]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [169]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")


Method: POS
  MeanNN: Correlation = 0.8630, p-value = 0.0001
  SDNN: Correlation = 0.2858, p-value = 0.3219
  RMSSD: Correlation = 0.2295, p-value = 0.4299
  pNN50: Correlation = 0.4195, p-value = 0.1058
  LF: Correlation = -0.0906, p-value = 0.7387
  HF: Correlation = -0.4497, p-value = 0.1652
  LF_HF: Correlation = 0.1740, p-value = 0.5520
  SD1: Correlation = 0.2302, p-value = 0.4284
  SD2: Correlation = 0.2967, p-value = 0.3030


Method: LGI
  MeanNN: Correlation = 0.8156, p-value = 0.0002
  SDNN: Correlation = 0.2791, p-value = 0.3339
  RMSSD: Correlation = 0.2372, p-value = 0.4142
  pNN50: Correlation = 0.4395, p-value = 0.0885
  LF: Correlation = -0.0793, p-value = 0.7788
  HF: Correlation = 0.0157, p-value = 0.9539
  LF_HF: Correlation = 0.3309, p-value = 0.2283
  SD1: Correlation = 0.2376, p-value = 0.4134
  SD2: Correlation = 0.2962, p-value = 0.3038


Method: OMIT
  MeanNN: Correlation = 0.8354, p-value = 0.0001
  SDNN: Correlation = 0.3611, p-value = 0.1860
  RMSSD: Correla

In [170]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.8630, p-value = 0.0001
  HF: Correlation = -0.4497, p-value = 0.1652
  pNN50: Correlation = 0.4195, p-value = 0.1058
  SD2: Correlation = 0.2967, p-value = 0.3030
  SDNN: Correlation = 0.2858, p-value = 0.3219


Method: LGI
  MeanNN: Correlation = 0.8156, p-value = 0.0002
  pNN50: Correlation = 0.4395, p-value = 0.0885
  LF_HF: Correlation = 0.3309, p-value = 0.2283
  SD2: Correlation = 0.2962, p-value = 0.3038
  SDNN: Correlation = 0.2791, p-value = 0.3339


Method: OMIT
  MeanNN: Correlation = 0.8354, p-value = 0.0001
  pNN50: Correlation = 0.4434, p-value = 0.0854
  SDNN: Correlation = 0.3611, p-value = 0.1860
  LF_HF: Correlation = 0.3558, p-value = 0.1931
  SD2: Correlation = 0.3492, p-value = 0.2020


Method: GREEN
  HF: Correlation = -0.6227, p-value = 0.0100
  LF: Correlation = -0.4947, p-value = 0.0721
  LF_HF: Correlation = 0.4686, p-value = 0.1063
  MeanNN: Correlation = 0.4370, p-value = 0.1033
 

---

### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [171]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [172]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    803.957819            779.046092        24.911727   
1     POS    SDNN    279.786139            146.760900       133.025239   
2     POS   RMSSD    375.886551            168.140605       207.745946   
3     POS   pNN50     72.965638             44.797938        28.167700   
4     POS      LF      0.034292              0.030068         0.004225   
5     POS      HF      0.101735              0.040887         0.060848   
6     POS   LF_HF      0.649824              0.972035        -0.322211   
7     POS     SD1    268.902640            119.180134       149.722506   
8     POS     SD2    290.005165            168.125976       121.879189   
9     LGI  MeanNN    800.994797            779.046092        21.948705   
10    LGI    SDNN    228.469697            146.760900        81.708797   
11    LGI   RMSSD    293.244008            168.140605       125.103403   
12    LGI   pNN

In [173]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    803.957819            779.046092        24.911727   
9     LGI  MeanNN    800.994797            779.046092        21.948705   
18   OMIT  MeanNN    798.181830            779.046092        19.135738   
36  CHROM  MeanNN    791.102197            779.046092        12.056105   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0           198.407677                413.790775               -363.967320   
9           202.006689                417.881816               -373.984406   
18          200.238379                411.602961               -373.331484   
36          202.204692                408.377301               -384.265091   

    Percentage Difference  
0               14.491225  
9               13.233113  
18              12.845495  
36              13.091281  


### Conclussion : 1 Minute Window

Stuff

In [174]:
## Store the rPPG hrv metrics into the csv
output_path = "rest_rppg_hrv_metrics.csv"

## Convert the feature of the CHROM within the HRV Means to be the DataFrame
#   MeanNN: Correlation = 0.6109, p-value = 0.0119
#   SD1: Correlation = 0.5190, p-value = 0.0474
#   RMSSD: Correlation = 0.5185, p-value = 0.0477
#   LF: Correlation = 0.3975, p-value = 0.1423
#   SDNN: Correlation = 0.3676, p-value = 0.1959
## Take only the CHROM method and the MeanNN, SD1, RMSSD, LF, SDNN
chrom_hrv_metrics = {
    'MeanNN': [],
    'SD1': [],
    'RMSSD': [],
    'LF': [],
    'SDNN': []
}

for subject_id in hrv_means['CHROM'].keys():
    chrom_hrv_metrics['MeanNN'].append(hrv_means['CHROM'][subject_id]['MeanNN'])
    chrom_hrv_metrics['SD1'].append(hrv_means['CHROM'][subject_id]['SD1'])
    chrom_hrv_metrics['RMSSD'].append(hrv_means['CHROM'][subject_id]['RMSSD'])
    chrom_hrv_metrics['LF'].append(hrv_means['CHROM'][subject_id]['LF'])
    chrom_hrv_metrics['SDNN'].append(hrv_means['CHROM'][subject_id]['SDNN'])

## Convert the chrom_hrv_metrics to a DataFrame
chrom_df = pd.DataFrame(chrom_hrv_metrics)

## Add label Rest to the dataFrame
chrom_df['Label'] = 'Rest'

chrom_df.head()

## Save the DataFrame to a CSV file
chrom_df.to_csv(output_path, index=False)

---

# 2 Minute Plot Correlation

For 2 minute window, the averaging purpose will be done under windowing each short rPPG segment with the **strides** of 60 seconds (means the different between each short window is 60 seconds).

The test will be done under certain scenario of the Task 1, Task 2 UBFC, Physio Rest 2 and Rest 6

In [175]:
root_path = "UBFC-Phys"
subjects = ["s41", "s42", "s43", "s44","s45","s46","s47","s48","s49","s50","s51","s52", "s53","s54","s55","s56"]
tasks = ["T1"]

# Store ground truth and rPPG data
gt_data = {}
rppg_data = {
    'POS': {},
    'LGI': {},
    'OMIT': {},
    'GREEN': {},
    'CHROM': {}
}
# Expected sampling rates (adjust if different for your dataset)
sample_rate_gt = 64  # Hz
sample_rate_video = 35 # Hz


In [176]:
## Process for each subject and task
for subject in subjects:
    for task in tasks:
        subject_task_id = f"{subject}_{task}"

        # Load rPPG signals from different methods
        pos = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_POS_rppg.npy"))
        lgi = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_LGI_rppg.npy"))
        omit = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_OMIT_rppg.npy"))
        green = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_GREEN_rppg.npy"))
        chrom = np.load(os.path.join(root_path, subject, f"Landmark_{subject}_{task}_CHROM_rppg.npy"))

        # Load ground truth BVP
        GT = pd.read_csv(os.path.join(root_path, subject, f"bvp_{subject}_{task}.csv")).values
        GT = GT.flatten()

        ## process rPPG signals
        rppg_data["POS"][subject_task_id] = preprocess_ppg(pos, fs=sample_rate_video)
        rppg_data["LGI"][subject_task_id] = preprocess_ppg(lgi, fs=sample_rate_video)
        rppg_data["OMIT"][subject_task_id] = preprocess_ppg(omit, fs=sample_rate_video)
        rppg_data["GREEN"][subject_task_id] = preprocess_ppg(green, fs=sample_rate_video)
        rppg_data["CHROM"][subject_task_id] = preprocess_ppg(chrom, fs=sample_rate_video)
        
        GT = preprocess_ppg(GT, fs=sample_rate_gt)
        gt_data[subject_task_id] = GT

print(f"Done Process the Signals")
    

Done Process the Signals


In [177]:
"""
Steps to reproduce getting the short term of 30 seconds for each subject + averaging:
1. Loop through each subject.
2. For each short rppg segment (30 seconds), compute the hrv metrics with the neurokit2 package and store it.
3. Average the HRV metrics across all segments for each subject.
4. Compare the correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics.
# Note: The above code is a preprocessing step. The next steps would involve calculating HRV metrics and performing correlation analysis.
""" 

## Iterate for each subject and compute HRV metrics
hrv_metrics = {
    'MeanNN': [],
    'SDNN': [],
    'RMSSD': [],
    'pNN50': [],
    'LF': [],
    'HF': [],
    'LF_HF': [],
    'SD1': [],
    'SD2': [],
}

## Store the HRV metrics for each rPPG method for each subject
rppg_hrv_metrics = {
    method: {
        subject_id: {
            key: [] for key in hrv_metrics.keys()
        } for subject_id in rppg_data[method].keys()
    } for method in rppg_data.keys()
}

## Iterate through each subject and compute HRV for each segments
for rppg_method in rppg_data.keys():
    for subject_task_id, rppg_signal in rppg_data[rppg_method].items():
        print(f"Processing {subject_task_id} for {rppg_method}")

        ## Applied the window of 30 seconds with stride of 15 seconds
        segment_length = 120 * desired_sample_rate
        stride_length = 60 * desired_sample_rate
        
        ## Making the segments
        for start in range(0, len(rppg_signal) - segment_length + 1, stride_length):
            segment = rppg_signal[start:start + segment_length]
            ## If the segment is less than the segment length, we skip it
            if len(segment) < segment_length:
                continue

            ## Compute the HRV metrics using neurokit2
            signals, _ = nk.ppg_process(segment, sampling_rate=desired_sample_rate)
            peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

            # Getting the HRV Metrics

            ## Time Domain
            hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

            ## Add into the hrv_metrics dictionary
            rppg_hrv_metrics[rppg_method][subject_task_id]['MeanNN'].append(hrv_time['HRV_MeanNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SDNN'].append(hrv_time['HRV_SDNN'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['RMSSD'].append(hrv_time['HRV_RMSSD'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['pNN50'].append(hrv_time['HRV_pNN50'])

            ## Frequency Domain
            hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF'].append(hrv_freq['HRV_LF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['HF'].append(hrv_freq['HRV_HF'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['LF_HF'].append(hrv_freq['HRV_LFHF'])

            ## Non-Linear Domain
            hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD1'].append(hrv_non_linear['HRV_SD1'])
            rppg_hrv_metrics[rppg_method][subject_task_id]['SD2'].append(hrv_non_linear['HRV_SD2'])

Processing s41_T1 for POS
Processing s42_T1 for POS
Processing s43_T1 for POS
Processing s44_T1 for POS
Processing s45_T1 for POS
Processing s46_T1 for POS
Processing s47_T1 for POS
Processing s48_T1 for POS
Processing s49_T1 for POS
Processing s50_T1 for POS
Processing s51_T1 for POS
Processing s52_T1 for POS
Processing s53_T1 for POS
Processing s54_T1 for POS
Processing s55_T1 for POS
Processing s56_T1 for POS
Processing s41_T1 for LGI
Processing s42_T1 for LGI
Processing s43_T1 for LGI
Processing s44_T1 for LGI
Processing s45_T1 for LGI
Processing s46_T1 for LGI
Processing s47_T1 for LGI
Processing s48_T1 for LGI
Processing s49_T1 for LGI
Processing s50_T1 for LGI
Processing s51_T1 for LGI
Processing s52_T1 for LGI
Processing s53_T1 for LGI
Processing s54_T1 for LGI
Processing s55_T1 for LGI
Processing s56_T1 for LGI
Processing s41_T1 for OMIT
Processing s42_T1 for OMIT
Processing s43_T1 for OMIT
Processing s44_T1 for OMIT
Processing s45_T1 for OMIT
Processing s46_T1 for OMIT
Proces

In [178]:
### Calculate the average HRV metrics for each segment for each subject per method

hrv_means = {}
for method in rppg_hrv_metrics:
    hrv_means[method] = {}

    for subject in rppg_hrv_metrics[method]:
        hrv_means[method][subject] = {}

        for metric, values in rppg_hrv_metrics[method][subject].items():
            if values:
                hrv_means[method][subject][metric] = np.mean(values)
            else:
                hrv_means[method][subject][metric] = np.nan

print(hrv_means)

{'POS': {'s41_T1': {'MeanNN': 642.8243243243244, 'SDNN': 137.47800169924847, 'RMSSD': 192.57551305948596, 'pNN50': 64.86486486486487, 'LF': 0.04172197611224074, 'HF': 0.1138506184564474, 'LF_HF': 0.37072980609556094, 'SD1': 136.53106985169404, 'SD2': 135.7228480181879}, 's42_T1': {'MeanNN': 848.3376419302515, 'SDNN': 167.3560727150577, 'RMSSD': 213.2177286843795, 'pNN50': 69.04653284671534, 'LF': 0.026898770625759405, 'HF': 0.07990595085180345, 'LF_HF': 0.32981848489751064, 'SD1': 151.30151689956637, 'SD2': 180.40350707058687}, 's43_T1': {'MeanNN': 664.4332360792104, 'SDNN': 152.8383560630227, 'RMSSD': 204.68533064410914, 'pNN50': 69.3618474144888, 'LF': 0.058559088037841545, 'HF': 0.08133437202146881, 'LF_HF': 0.7278457473801943, 'SD1': 145.14010199630468, 'SD2': 157.66505583713305}, 's44_T1': {'MeanNN': 860.2300333646126, 'SDNN': 256.3067544555602, 'RMSSD': 324.1914498654444, 'pNN50': 85.55677197372538, 'LF': 0.04490163778567523, 'HF': 0.10521772185474769, 'LF_HF': 0.4270521414111748

### Getting the GT HRV Metrics

In [179]:
# Compare the Correlation between the averaged HRV metrics of the rPPG methods and the ground truth HRV metrics

## Getting the ground truth HRV metrics

gt_hrv_metrics = {
    subject_id: {
        key: [] for key in hrv_metrics.keys()
    } for subject_id in gt_data.keys()
}

# Iterate through each subject and compute the full length HRV metrics for the ground truth
for subject_task_id, gt_signal in gt_data.items():
    print(f"Processing {subject_task_id} for ground truth")

    ## Compute the HRV metrics using neurokit2
    signals, _ = nk.ppg_process(gt_signal, sampling_rate=desired_sample_rate)
    peaks, _ = nk.ppg_peaks(signals["PPG_Clean"], sampling_rate=desired_sample_rate)

    # Getting the HRV Metrics

    ## Time Domain
    hrv_time = nk.hrv_time(peaks, sampling_rate=desired_sample_rate)

    ## Add into the hrv_metrics dictionary
    gt_hrv_metrics[subject_task_id]['MeanNN'] = (hrv_time['HRV_MeanNN'])
    gt_hrv_metrics[subject_task_id]['SDNN'] = (hrv_time['HRV_SDNN'])
    gt_hrv_metrics[subject_task_id]['RMSSD'] = (hrv_time['HRV_RMSSD'])
    gt_hrv_metrics[subject_task_id]['pNN50'] = (hrv_time['HRV_pNN50'])

    ## Frequency Domain
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=desired_sample_rate, psd_method="welch")
    gt_hrv_metrics[subject_task_id]['LF'] = (hrv_freq['HRV_LF'])
    gt_hrv_metrics[subject_task_id]['HF'] = (hrv_freq['HRV_HF'])
    gt_hrv_metrics[subject_task_id]['LF_HF'] = (hrv_freq['HRV_LFHF'])

    ## Non-Linear Domain
    hrv_non_linear = nk.hrv_nonlinear(peaks, sampling_rate=desired_sample_rate)
    gt_hrv_metrics[subject_task_id]['SD1'] = (hrv_non_linear['HRV_SD1'])
    gt_hrv_metrics[subject_task_id]['SD2'] = (hrv_non_linear['HRV_SD2'])



Processing s41_T1 for ground truth
Processing s42_T1 for ground truth
Processing s43_T1 for ground truth
Processing s44_T1 for ground truth
Processing s45_T1 for ground truth
Processing s46_T1 for ground truth
Processing s47_T1 for ground truth
Processing s48_T1 for ground truth
Processing s49_T1 for ground truth
Processing s50_T1 for ground truth
Processing s51_T1 for ground truth
Processing s52_T1 for ground truth
Processing s53_T1 for ground truth
Processing s54_T1 for ground truth
Processing s55_T1 for ground truth
Processing s56_T1 for ground truth


### Since we already get the Metrics HRV value of the rPPG, let's compare it with the GT to see the correlation

In [180]:
# First thing first is we need to remove the outlier from rppg, 
# and make to remove the same subjects from the ground truth as well
# Process of removing the outlier itself, is also done under the IQR method
def remove_outliers_iqr(data):
    """ Remove outliers using the IQR method.
    
    Parameters:
    ----------
    data (list or numpy array): The data from which to remove outliers.
    
    Returns:
    --------
    numpy array: Data with outliers removed.
    """
    data = np.asarray(data)  
    
    if len(data) == 0:
        return np.array([])

    q1 = np.percentile(data, 25)
    q3 = np.percentile(data, 75)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr

    return np.array([x for x in data if lower_bound <= x <= upper_bound])

# Compute correlation between rPPG methods and ground truth HRV metrics
correlation_results = {}

for method in hrv_means.keys():
    correlation_results[method] = {}
    
    for metric in hrv_metrics.keys():
        # Collect all values for this metric across subjects
        all_metric_values = []
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value):
                    all_metric_values.append(value)
        
        # Remove outlier subjects for this metric
        cleaned_values = remove_outliers_iqr(all_metric_values)
        
        # Prepare data for correlation
        rppg_values = []
        gt_values = []
        
        for subject_id in hrv_means[method].keys():
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                value = hrv_means[method][subject_id][metric]
                if not np.isnan(value) and value in cleaned_values:
                    # Subject is not an outlier, include in analysis
                    rppg_values.append(value)
                    
                    # Add corresponding ground truth
                    if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                        if not gt_hrv_metrics[subject_id][metric].empty:
                            gt_value = gt_hrv_metrics[subject_id][metric][0] if isinstance(gt_hrv_metrics[subject_id][metric], pd.Series) else gt_hrv_metrics[subject_id][metric]
                            gt_values.append(gt_value)
        
        # Calculate correlation
        if len(rppg_values) > 1 and len(gt_values) > 1:
            correlation, p_value = stats.pearsonr(rppg_values, gt_values)
            correlation_results[method][metric] = {
                'correlation': correlation,
                'p_value': p_value,
                'n_subjects': len(rppg_values)
            }

In [181]:
## Print the correlation results
for method, metrics in correlation_results.items():
    print(f"Method: {method}")
    for metric, result in metrics.items():
        print(f"  {metric}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")


Method: POS
  MeanNN: Correlation = 0.8260, p-value = 0.0001
  SDNN: Correlation = 0.2086, p-value = 0.4940
  RMSSD: Correlation = 0.1088, p-value = 0.7234
  pNN50: Correlation = 0.3320, p-value = 0.2091
  LF: Correlation = 0.2823, p-value = 0.2894
  HF: Correlation = -0.4385, p-value = 0.0893
  LF_HF: Correlation = -0.1073, p-value = 0.6925
  SD1: Correlation = 0.1097, p-value = 0.7213
  SD2: Correlation = 0.2798, p-value = 0.3545


Method: LGI
  MeanNN: Correlation = 0.6053, p-value = 0.0130
  SDNN: Correlation = 0.2735, p-value = 0.3441
  RMSSD: Correlation = 0.2754, p-value = 0.3406
  pNN50: Correlation = 0.4194, p-value = 0.1059
  LF: Correlation = 0.1919, p-value = 0.4764
  HF: Correlation = 0.0648, p-value = 0.8115
  LF_HF: Correlation = 0.3797, p-value = 0.1469
  SD1: Correlation = 0.2758, p-value = 0.3398
  SD2: Correlation = 0.2534, p-value = 0.3820


Method: OMIT
  MeanNN: Correlation = 0.6151, p-value = 0.0112
  SDNN: Correlation = 0.2377, p-value = 0.4131
  RMSSD: Correlat

In [182]:
# Calculate the top 5 features with the highest correlation for each rPPG method
top_features = {}
for method, metrics in correlation_results.items():
    sorted_metrics = sorted(metrics.items(), key=lambda x: abs(x[1]['correlation']), reverse=True)
    top_features[method] = sorted_metrics[:5]
print("Top 5 Features with Highest Correlation:")
for method, features in top_features.items():
    print(f"Method: {method}")
    for feature, result in features:
        print(f"  {feature}: Correlation = {result['correlation']:.4f}, p-value = {result['p_value']:.4f}")
    print("\n")
    

Top 5 Features with Highest Correlation:
Method: POS
  MeanNN: Correlation = 0.8260, p-value = 0.0001
  HF: Correlation = -0.4385, p-value = 0.0893
  pNN50: Correlation = 0.3320, p-value = 0.2091
  LF: Correlation = 0.2823, p-value = 0.2894
  SD2: Correlation = 0.2798, p-value = 0.3545


Method: LGI
  MeanNN: Correlation = 0.6053, p-value = 0.0130
  pNN50: Correlation = 0.4194, p-value = 0.1059
  LF_HF: Correlation = 0.3797, p-value = 0.1469
  SD1: Correlation = 0.2758, p-value = 0.3398
  RMSSD: Correlation = 0.2754, p-value = 0.3406


Method: OMIT
  MeanNN: Correlation = 0.6151, p-value = 0.0112
  pNN50: Correlation = 0.4302, p-value = 0.0963
  LF_HF: Correlation = 0.3090, p-value = 0.2442
  SD1: Correlation = 0.2708, p-value = 0.3491
  RMSSD: Correlation = 0.2703, p-value = 0.3499


Method: GREEN
  MeanNN: Correlation = 0.7794, p-value = 0.0006
  SD1: Correlation = 0.4238, p-value = 0.1155
  RMSSD: Correlation = 0.4236, p-value = 0.1156
  SDNN: Correlation = 0.3994, p-value = 0.1403


### Check the Bland-Altman, to see the mean bias nad the interlva of the Limit of Aggrement, make sure the point fall within the LoA

In [183]:
# # Check the value of the rPPG and GT with the Bland-Altman plot and 
# # see the measurement agreement between the rPPG methods and the ground truth

# def plot_bland_altman(rppg_values, gt_values, method, metric):
#     """ Plot Bland-Altman plot for rPPG values against ground truth values """
#     mean_diff = np.mean(rppg_values - gt_values)
#     std_diff = np.std(rppg_values - gt_values)

#     plt.figure(figsize=(10, 6))
#     plt.scatter((rppg_values + gt_values) / 2, rppg_values - gt_values, alpha=0.5)
#     plt.axhline(mean_diff, color='red', linestyle='--', label='Mean Difference')
#     plt.axhline(mean_diff + 1.96 * std_diff, color='green', linestyle='--', label='Upper Limit of Agreement')
#     plt.axhline(mean_diff - 1.96 * std_diff, color='blue', linestyle='--', label='Lower Limit of Agreement')
    
#     plt.title(f'Bland-Altman Plot: {method} - {metric}')
#     plt.xlabel('Mean of rPPG and GT Values')
#     plt.ylabel('Difference (rPPG - GT)')
#     plt.legend()
#     plt.grid()
#     plt.show()

# # Plot Bland-Altman plots for each method and metric
# for method in rppg_hrv_metrics.keys():
#     for metric in hrv_metrics.keys():
#         rppg_values = []
#         gt_values = []

#         for subject_id in rppg_hrv_metrics[method].keys():
#             # Use hrv_means for the rPPG values
#             if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
#                 rppg_values.append(hrv_means[method][subject_id][metric])
            
#             # For ground truth, get the first value from the list or calculate mean
#             if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
#                 if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
#                     gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

#         if len(rppg_values) > 0 and len(gt_values) > 0:
#             plot_bland_altman(np.array(rppg_values), np.array(gt_values), method, metric)


In [184]:
## Calculate the mean bias, average, standard deviation and the interval of the LOA
## Put inside the table and show the results
## Calculate the LoA percentage for each method and metric and see if the percentage is within 20% difference

def calculate_bland_altman_stats(rppg_values, gt_values):
    """ Calculate the Bland-Altman statistics """
    mean_diff = np.mean(rppg_values - gt_values)
    std_diff = np.std(rppg_values - gt_values)
    
    upper_limit = mean_diff + 1.96 * std_diff
    lower_limit = mean_diff - 1.96 * std_diff
    
    return mean_diff, std_diff, upper_limit, lower_limit

def calculate_percentage_difference(rppg_values, gt_values):
    """ Calculate the percentage difference between rPPG and ground truth values """
    percentage_diff = np.abs((rppg_values - gt_values) / gt_values) * 100
    return np.mean(percentage_diff)

# Prepare the results table
results_table = []  
for method in rppg_hrv_metrics.keys():
    for metric in hrv_metrics.keys():
        rppg_values = []
        gt_values = []

        for subject_id in rppg_hrv_metrics[method].keys():
            # Use hrv_means for the rPPG values
            if subject_id in hrv_means[method] and metric in hrv_means[method][subject_id]:
                rppg_values.append(hrv_means[method][subject_id][metric])
            
            # For ground truth, get the first value from the list or calculate mean
            if subject_id in gt_hrv_metrics and metric in gt_hrv_metrics[subject_id]:
                if not gt_hrv_metrics[subject_id][metric].empty:  # Check if the list is not empty
                    gt_values.append(gt_hrv_metrics[subject_id][metric][0])  # Get first element from list

        if len(rppg_values) > 0 and len(gt_values) > 0:
            mean_diff, std_diff, upper_limit, lower_limit = calculate_bland_altman_stats(np.array(rppg_values), np.array(gt_values))
            percentage_diff = calculate_percentage_difference(np.array(rppg_values), np.array(gt_values))

            results_table.append({
                'Method': method,
                'Metric': metric,
                'Mean Average': np.mean(rppg_values),
                'Ground Truth Average': np.mean(gt_values),
                'Mean Difference': mean_diff,
                'Standard Deviation': std_diff,
                'Upper Limit of Agreement': upper_limit,
                'Lower Limit of Agreement': lower_limit,
                'Percentage Difference': percentage_diff
            })
# Convert results to DataFrame for better visualization
results_df = pd.DataFrame(results_table)
# Display the results
print("\nBland-Altman Results:")
print(results_df)



Bland-Altman Results:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    785.624175            779.046092         6.578083   
1     POS    SDNN    190.735063            146.760900        43.974164   
2     POS   RMSSD    256.455553            168.140605        88.314948   
3     POS   pNN50     72.969368             44.797938        28.171430   
4     POS      LF      0.036874              0.030068         0.006806   
5     POS      HF      0.079861              0.040887         0.038974   
6     POS   LF_HF      0.508300              0.972035        -0.463735   
7     POS     SD1    181.996040            119.180134        62.815906   
8     POS     SD2    197.843180            168.125976        29.717204   
9     LGI  MeanNN    775.326149            779.046092        -3.719943   
10    LGI    SDNN    165.767162            146.760900        19.006262   
11    LGI   RMSSD    218.640536            168.140605        50.499930   
12    LGI   pNN

In [185]:
### Calculate which methods are within 20% difference and the best in terms of minimal percentage difference
within_20_percent = results_df[results_df['Percentage Difference'] <= 20]
print("\nMethods within 20% difference:")
print(within_20_percent)


Methods within 20% difference:
   Method  Metric  Mean Average  Ground Truth Average  Mean Difference  \
0     POS  MeanNN    785.624175            779.046092         6.578083   
9     LGI  MeanNN    775.326149            779.046092        -3.719943   
18   OMIT  MeanNN    772.039060            779.046092        -7.007032   
27  GREEN  MeanNN    828.786503            779.046092        49.740412   
36  CHROM  MeanNN    768.973671            779.046092       -10.072421   

    Standard Deviation  Upper Limit of Agreement  Lower Limit of Agreement  \
0           141.652798                284.217568               -271.061402   
9           115.198200                222.068529               -229.508415   
18          113.175088                214.816142               -228.830205   
27          192.664930                427.363674               -327.882851   
36          114.850764                215.035078               -235.179919   

    Percentage Difference  
0               10.757436 

### Conclussion : 2 Minute Window

Stuff