# EmotiBit PPG Signal & HRV Analysis

This Jupyter Notebook loads, processes, and visualizes **PPG (Photoplethysmography)** sensor data from an EmotiBit device, with a focus on Heart Rate Variability (HRV) analysis.

**Features:**
1.  **Automatic Dependency Installation**: Installs all required Python packages (`pandas`, `matplotlib`, `openpyxl`, `neurokit2`).
2.  **Focused Data Loading**: Reads only the PPG (`PI`, `PR`, `PG`) CSV files from a specified folder.
3.  **HRV Analysis**: Applies a sliding window to a selected PPG signal to calculate HRV metrics over time.
4.  **HRV Visualization**: Plots key HRV metrics (e.g., RMSSD) to show their trend over the recording session.
5.  **PPG Signal Plotting**: Creates a separate, clean plot for each of the three PPG channels.
6.  **Save to File**: Automatically saves each generated plot as a PNG image.

## 1. Install All Dependencies

This cell will install all Python libraries required for this notebook to function correctly.

In [None]:
import sys
!{sys.executable} -m pip install pandas matplotlib openpyxl neurokit2
print("All required packages are installed.")

## 2. Configuration

Modify the data folder path. This notebook is pre-configured to only look for PPG signals.
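
Before running the loader, it can help to confirm that the folder actually contains the expected files. The sketch below is not part of the notebook's pipeline: `find_ppg_files` is a hypothetical helper that applies the same `*_<TypeTag>.csv` pattern the loading cell uses and reports which channels are present. The demo folder and file names are made up for illustration.

```python
import glob
import os
import tempfile


def find_ppg_files(folder_path, type_tags=("PI", "PR", "PG")):
    """Map each PPG type tag to the first matching CSV path, or None if absent."""
    matches = {}
    for tag in type_tags:
        found = sorted(glob.glob(os.path.join(folder_path, f"*_{tag}.csv")))
        matches[tag] = found[0] if found else None
    return matches


# Demo on a throwaway folder with made-up file names (PG is deliberately missing).
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("session_PI.csv", "session_PR.csv", "session_EA.csv"):
        open(os.path.join(demo_folder, name), "w").close()
    demo_matches = find_ppg_files(demo_folder)
    demo_found = {tag: path is not None for tag, path in demo_matches.items()}

print(demo_found)
```

Calling `find_ppg_files(DATA_FOLDER_PATH)` on a real folder would show at a glance which of the three channels will be skipped by the loader.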

In [None]:
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime
import glob

# --- User Configuration --- #

# 1. Set the path to the specific data folder you want to process.
DATA_FOLDER_PATH = '../PwD dataset/August 5 Morning AF 000233' # <-- CHANGE THIS to your data folder path

# 2. Define path for the output plots (a sibling folder named '<data folder> plots').
OUTPUT_FOLDER = DATA_FOLDER_PATH + ' plots'

# 3. Channels to plot are now limited to PPG signals only.
CHANNELS = {
    'PI': 'PPG Infrared',
    'PR': 'PPG Red',
    'PG': 'PPG Green',
}
SIGNALS_TO_PLOT = list(CHANNELS.keys())

# 4. Plot styling settings
FIGURE_SIZE = (22, 7)

## 3. Data Loading and Preprocessing

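`load_emotibit_data` reads the first `*_<TypeTag>.csv` file it finds for each PPG channel, converts the `LocalTimestamp` column to US/Eastern time, and returns one DataFrame per channel. The call that actually loads the data sits at the end of the cell.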
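
The timestamp handling can be tried in isolation. The following is a minimal sketch, assuming `LocalTimestamp` holds Unix epoch seconds (as the loader's `unit='s'` implies); the sample values are made up. Passing `utc=True` is a compact equivalent of parsing the values and then calling `tz_localize('UTC')`.

```python
import pandas as pd

# Made-up Unix timestamps in seconds (2024-08-05 12:00:00 UTC, then 40 ms steps).
local_timestamps = pd.Series([1722859200.00, 1722859200.04, 1722859200.08])

# utc=True parses the epoch values as UTC; tz_convert then changes the display zone.
eastern_time = pd.to_datetime(local_timestamps, unit="s", utc=True).dt.tz_convert("US/Eastern")

# The first sample falls in daylight saving time, so the offset is UTC-4.
print(eastern_time.iloc[0])
print(eastern_time.diff().dt.total_seconds().median())
```

Because the conversion only changes the time zone attached to each instant, the gaps between samples (and therefore the sampling rate derived from them later) are unaffected.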

In [None]:
def load_emotibit_data(folder_path, signals_to_plot):
    """Loads and processes EmotiBit CSV files from a specified folder."""
    print(f"--- Starting EmotiBit Data Loading from '{folder_path}' ---")
    dataframes = {}
    try:
        all_files = glob.glob(os.path.join(folder_path, '*.csv'))
        if not all_files:
            print(f"Error: No CSV files found in '{folder_path}'. Please check the path.")
            return {}
    except Exception as e:
        print(f"Error: Could not access folder '{folder_path}'. {e}")
        return {}

    for signal_type in signals_to_plot:
        file_pattern = os.path.join(folder_path, f"*_{signal_type}.csv")
        found_files = glob.glob(file_pattern)
        if not found_files:
            print(f"  - No file found for signal '{signal_type}'. Skipping.")
            continue
        file_path = found_files[0]
        print(f"  - Reading file: {os.path.basename(file_path)}")
        
        try:
            df = pd.read_csv(file_path)
            if 'LocalTimestamp' not in df.columns or df.shape[1] < 2:
                continue
            signal_column_name = df.columns[-1]
            # Convert Unix timestamp to a timezone-aware datetime object (US/Eastern)
            df['est_time'] = pd.to_datetime(df['LocalTimestamp'], unit='s').dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
            df.rename(columns={signal_column_name: 'value'}, inplace=True)
            dataframes[signal_type] = df[['est_time', 'value']]
        except Exception as e:
            print(f"  - Error processing file '{os.path.basename(file_path)}': {e}")
    print(f"--- EmotiBit Data Loading Complete. Loaded {len(dataframes)} signals. ---\n")
    return dataframes

# --- Execute Data Loading --- #
print("--- Section 1-3: Setup & Data Loading ---")
print(f"Data folder set to: {DATA_FOLDER_PATH}")
os.makedirs(OUTPUT_FOLDER, exist_ok=True)
print(f"Output folder set to: {OUTPUT_FOLDER}\n")

emotibit_data = load_emotibit_data(DATA_FOLDER_PATH, SIGNALS_TO_PLOT)

## 4. HRV Analysis (Sliding Window Method)

This section performs the Heart Rate Variability (HRV) analysis on the raw PPG signal.
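
To make the target quantities concrete, here is a minimal sketch of three time-domain metrics computed directly from peak positions. It illustrates the standard definitions only; `time_domain_hrv` is a hypothetical helper, the peak indices are made up, and NeuroKit2's own implementation may differ in detail (for example in artifact handling).

```python
import numpy as np


def time_domain_hrv(peak_indices, sampling_rate):
    """Return (mean_nn, sdnn, rmssd) in milliseconds from peak sample indices."""
    # Inter-beat intervals: gaps between consecutive peaks, converted to ms.
    intervals_ms = np.diff(peak_indices) / sampling_rate * 1000.0
    successive_diffs = np.diff(intervals_ms)
    mean_nn = float(np.mean(intervals_ms))
    # Assumption: sample standard deviation (ddof=1) for SDNN.
    sdnn = float(np.std(intervals_ms, ddof=1))
    rmssd = float(np.sqrt(np.mean(successive_diffs ** 2)))
    return mean_nn, sdnn, rmssd


# Made-up peak positions at 25 Hz: intervals of 20, 22, 20, 22 samples.
demo_peaks = np.array([0, 20, 42, 62, 84])
mean_nn, sdnn, rmssd = time_domain_hrv(demo_peaks, sampling_rate=25.0)
print(mean_nn, sdnn, rmssd)
```

Note that MeanNN is an interval in milliseconds, not a heart rate; the corresponding rate in beats per minute would be `60000 / mean_nn`.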

**Analysis Pipeline**:
1.  **Select Signal**: We'll use `PI` (PPG Infrared) by default. If its peaks look noisy in the quality-check plot, set `PPG_SIGNAL_TO_USE` to `PR` or `PG` instead.
2.  **Calculate Sampling Rate**: The signal's sampling frequency (Hz) is automatically calculated from its timestamps.
3.  **Apply Sliding Window**: A fixed-size window (300 seconds by default) moves across the signal in fixed steps (60 seconds by default), so consecutive windows overlap.
4.  **Compute HRV**: Within each window, the `neurokit2` library filters the signal, detects heartbeats (peaks), and calculates a suite of time-domain, frequency-domain, and nonlinear HRV metrics.
5.  **Visualize Results**: Eight HRV metrics are plotted over time to show how they change throughout the recording.
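
Steps 2 and 3 can be sketched without any signal processing. The helpers `estimate_sampling_rate` and `window_bounds` below are hypothetical names that mirror the logic of the analysis cell (median timestamp gap, then `int()` conversion of seconds to samples); the timestamps are made up.

```python
import numpy as np


def estimate_sampling_rate(timestamps_s):
    """Estimate Hz as the reciprocal of the median gap between timestamps."""
    return 1.0 / float(np.median(np.diff(timestamps_s)))


def window_bounds(num_samples, sampling_rate, window_seconds, step_seconds):
    """Return (start, end) sample index pairs for every full window."""
    window_size = int(window_seconds * sampling_rate)
    step_size = int(step_seconds * sampling_rate)
    return [(start, start + window_size)
            for start in range(0, num_samples - window_size + 1, step_size)]


# Made-up 10-minute recording sampled every 31.25 ms (32 Hz).
demo_timestamps = np.arange(19200) * 0.03125
demo_rate = estimate_sampling_rate(demo_timestamps)
demo_windows = window_bounds(len(demo_timestamps), demo_rate,
                             window_seconds=300, step_seconds=60)
print(demo_rate, len(demo_windows), demo_windows[0], demo_windows[-1])
```

Only full windows are produced, so a recording shorter than the window length yields no HRV rows at all. Because `int()` truncates, a rate estimate that lands slightly below a whole number shortens each window by one sample; using `round()` instead is a reasonable alternative.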

In [None]:
import neurokit2 as nk
import numpy as np

# --- HRV Analysis Configuration --- #
# For accurate frequency-domain analysis, a long window is recommended.
WINDOW_SECONDS = 300      # 300 seconds = 5 minutes (Standard)
STEP_SECONDS = 60         # 60 seconds = 1 minute step

# --- HRV Analysis Execution --- #
PPG_SIGNAL_TO_USE = 'PI'
print(f"\n--- Starting HRV Analysis for signal '{PPG_SIGNAL_TO_USE}' ---")

# 1. Get the selected PPG signal DataFrame
df_ppg = emotibit_data.get(PPG_SIGNAL_TO_USE)

if df_ppg is None or df_ppg.empty:
    print(f"Error: Signal '{PPG_SIGNAL_TO_USE}' not found in the loaded data. Skipping HRV analysis.")
else:
    # 2. Calculate the sampling rate (fs)
    time_diffs = df_ppg['est_time'].diff().dt.total_seconds()
    sampling_rate = 1 / time_diffs.median()
    print(f"  - Calculated Sampling Rate: {sampling_rate:.2f} Hz")

    # --- Data Quality Check Plot (Robust Version) ---
    print("\n--- Plotting Data Quality Check ---")
    
    total_samples = len(df_ppg)
    desired_samples = int(60 * sampling_rate) # Attempt to plot the first 60 seconds
    samples_to_plot = min(total_samples, desired_samples) # Only take what's available
    
    # NeuroKit needs a few seconds of data to work
    if samples_to_plot < (5 * sampling_rate):
        print("  - Data is too short to generate a quality plot.")
    else:
        # Interpolate to handle potential NaNs in the sample
        ppg_sample = df_ppg['value'].iloc[:samples_to_plot].interpolate(method='linear', limit_direction='both')
        try:
            ppg_signals_sample, info_sample = nk.ppg_process(ppg_sample, sampling_rate=sampling_rate)
            nk.ppg_plot(ppg_signals_sample, info_sample)
            plt.show()
        except Exception as e:
            print(f"  - Could not generate data quality plot. Error: {e}")
            
    # 3. Convert window and step size to number of samples
    window_size_samples = int(WINDOW_SECONDS * sampling_rate)
    step_size_samples = int(STEP_SECONDS * sampling_rate)
    
    hrv_results = []
    
    # 4. Iterate through the signal with a sliding window
    num_samples = len(df_ppg)
    for start_index in range(0, num_samples - window_size_samples + 1, step_size_samples):
        end_index = start_index + window_size_samples
        window_ppg_data = df_ppg['value'].iloc[start_index:end_index]
        window_timestamp = df_ppg['est_time'].iloc[start_index + window_size_samples // 2]

        # Handle any missing data points (NaNs) in the window before processing
        if window_ppg_data.isnull().any():
            window_ppg_data = window_ppg_data.interpolate(method='linear', limit_direction='both')

        try:
            # First, process the signal to find the peak locations
            ppg_signals, info = nk.ppg_process(window_ppg_data, sampling_rate=sampling_rate)
            
            # Check if enough peaks were found for a valid calculation
            if len(info['PPG_Peaks']) > 2:
                # Call hrv() explicitly with the peak locations to avoid ambiguity
                hrv_metrics = nk.hrv(peaks=info['PPG_Peaks'], 
                                     sampling_rate=sampling_rate, 
                                     show=False)
                
                hrv_metrics['Timestamp'] = window_timestamp
                hrv_results.append(hrv_metrics)
        except Exception as e:
            print(f"  - Warning: Could not process window at {window_timestamp}. Error: {e}")

    # 5. Convert results to a DataFrame
    if hrv_results:
        hrv_df = pd.concat(hrv_results, ignore_index=True)
        print("\n--- HRV Analysis Complete --- ")
        
        # Display the key columns from the dataframe
        columns_to_display = ['Timestamp', 'HRV_RMSSD', 'HRV_SDNN', 'HRV_MeanNN', 'HRV_LFHF', 'HRV_HFn', 'HRV_SD1', 'HRV_SD2', 'HRV_SampEn']
        existing_columns = [col for col in columns_to_display if col in hrv_df.columns]
        display(hrv_df[existing_columns].head())

        # 6. --- Plotting the 8-plot dashboard of HRV metrics ---
        print("\n--- Plotting Key HRV Metrics Dashboard ---")
        fig, axes = plt.subplots(4, 2, figsize=(22, 20), sharex=True)
        fig.suptitle(f'HRV Metrics Dashboard ({WINDOW_SECONDS}s window, {STEP_SECONDS}s step)', fontsize=20)
        
        # Dictionary defining the 8 plots
        plots = {
            (0, 0): {'metric': 'HRV_RMSSD', 'title': 'Time Domain: RMSSD (Short-term)', 'ylabel': 'RMSSD (ms)'},
            (0, 1): {'metric': 'HRV_SDNN', 'title': 'Time Domain: SDNN (Long-term)', 'ylabel': 'SDNN (ms)'},
            (1, 0): {'metric': 'HRV_MeanNN', 'title': 'Time Domain: MeanNN (Avg. NN Interval)', 'ylabel': 'MeanNN (ms)'},
            (1, 1): {'metric': 'HRV_LFHF', 'title': 'Frequency Domain: LF/HF (Balance)', 'ylabel': 'LF/HF Ratio'},
            (2, 0): {'metric': 'HRV_HFn', 'title': 'Frequency Domain: HFn (Parasympathetic)', 'ylabel': 'HF Power (n.u.)'},
            (2, 1): {'metric': 'HRV_SampEn', 'title': 'Nonlinear: SampEn (Complexity)', 'ylabel': 'Sample Entropy'},
            (3, 0): {'metric': 'HRV_SD1', 'title': 'Nonlinear: SD1 (Short-term)', 'ylabel': 'SD1 (ms)'},
            (3, 1): {'metric': 'HRV_SD2', 'title': 'Nonlinear: SD2 (Long-term)', 'ylabel': 'SD2 (ms)'}
        }

        for (row, col), plot_info in plots.items():
            ax = axes[row, col]
            if plot_info['metric'] in hrv_df.columns:
                ax.plot(hrv_df['Timestamp'], hrv_df[plot_info['metric']], marker='.', linestyle='-')
                ax.set_title(plot_info['title'], fontsize=14)
                ax.set_ylabel(plot_info['ylabel'])

        # Formatting for all subplots
        for ax in axes.flat:
            ax.grid(True, linestyle=':')
            ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S', tz='US/Eastern'))
        
        fig.autofmt_xdate()
        plt.tight_layout(rect=[0, 0.03, 1, 0.97]) # Adjust layout for the main title
        plt.show()
        
        hrv_plot_path = os.path.join(OUTPUT_FOLDER, 'HRV_Dashboard_plot.png')
        fig.savefig(hrv_plot_path, dpi=150, bbox_inches='tight', pad_inches=0.1)
        print(f"  - HRV Dashboard plot saved to: {hrv_plot_path}")
        plt.close(fig)


        '''
        print(f"\n--- Plotting all {len(hrv_df.columns) - 1} calculated HRV metrics ---")
            
        hrv_metric_columns = [col for col in hrv_df.columns if col.startswith('HRV_')]

        for metric in hrv_metric_columns:
            fig, ax = plt.subplots(figsize=FIGURE_SIZE)
            ax.plot(hrv_df['Timestamp'], hrv_df[metric], marker='.', linestyle='-')
            ax.set_title(f'HRV Metric Over Time: {metric}', fontsize=18, pad=20)
            ax.set_xlabel('Time (EST)', fontsize=12)
            ax.set_ylabel(metric, fontsize=12)
            ax.grid(True, linestyle=':')
            ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S', tz='US/Eastern'))
            fig.autofmt_xdate()
            plt.show()
            
            plot_filename = f"{metric}_plot.png"
            plot_save_path = os.path.join(OUTPUT_FOLDER, plot_filename)
            try:
                fig.savefig(plot_save_path, dpi=150, bbox_inches='tight', pad_inches=0.1)
                print(f"  - Plot saved to: {plot_save_path}")
            except Exception as e:
                print(f"  - Could not save plot for {metric}. Error: {e}")
            finally:
                plt.close(fig)
        '''
        
    else:
        print("--- HRV Analysis Finished: No valid HRV data could be computed. ---")

## 5. Raw PPG Signal Visualization and Saving

This function generates a plot for each raw PPG signal and saves each plot as a PNG file.

In [None]:
def plot_and_save_signal(signal_name, df, channel_map, output_folder):
    """Plots a single signal and saves the figure."""
    if df.empty or df['est_time'].isnull().all():
        print(f"  - Skipping plot for {signal_name} due to empty data.")
        return

    fig, ax = plt.subplots(figsize=FIGURE_SIZE)
    
    # Plot the signal data
    ax.plot(df['est_time'], df['value'], label=signal_name, color='royalblue', linewidth=1.5)
    
    # Set plot titles and labels
    full_signal_name = channel_map.get(signal_name, signal_name)
    ax.set_title(f'EmotiBit Signal: {signal_name} ({full_signal_name})', fontsize=18, pad=20)
    ax.set_xlabel('Time (US/Eastern)', fontsize=12)
    ax.set_ylabel('Value', fontsize=12)
    
    # Formatting
    ax.grid(True, axis='y', which='major', linestyle=':', color='gray', linewidth=0.8, alpha=0.7)
    ax.legend(loc='upper left')
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S', tz='US/Eastern'))
    fig.autofmt_xdate()

    plt.show()
    
    # Save the figure to a file
    output_filename = f"{signal_name}_plot.png"
    full_output_path = os.path.join(output_folder, output_filename)
    try:
        fig.savefig(full_output_path, dpi=150, bbox_inches='tight', pad_inches=0.1)
        print(f"  - Raw signal plot saved to: {full_output_path}")
    except Exception as e:
        print(f"  - Error saving plot for {signal_name}: {e}")
    finally:
        plt.close(fig)

# --- Execute Plotting and Saving --- #
if not emotibit_data:
    print("Execution finished. No data was loaded, so no plots were generated.")
else:
    print(f"\n--- Generating and Saving {len(emotibit_data)} Raw Signal Plots ---")
    for signal, df in emotibit_data.items():
        plot_and_save_signal(signal, df, CHANNELS, OUTPUT_FOLDER)
    print("\n--- All plots have been generated and saved --- ")