## PPG Signal Analysis and Feature Extraction

In this notebook, you will explore PPG (photoplethysmogram) signals and extract meaningful features from them in the temporal and spectral domains.
We will use two libraries commonly used for these tasks: HeartPy for the temporal features and NeuroKit2 for the spectral features.

To get started, install HeartPy and NeuroKit2 by running the following command in the VS Code terminal:
- pip install heartpy neurokit2


In [None]:
import heartpy as hp
import neurokit2 as nk
import matplotlib.pyplot as plt
import numpy as np

### Temporal Features Extraction

Now that we have the required imports, we can extract features from the signal.
To extract features, there are 2 approaches you can use:
- The libraries we are using already compute certain features. You can check which features they extract automatically in the last cells of this notebook.
- Other features are statistical. We can use NumPy to compute them from properties of the signal.

In the script below we have examples of both techniques, so let's understand what we are doing:
1. First, we extract working_data and measures from the signal by processing it with HeartPy.
2. In measures, we have features such as the heart rate (hr, stored under the key bpm), pnn20 and sd1.
3. From working_data, we use RR_list, which holds the intervals between consecutive detected peaks, measured in milliseconds (ms).
4. From RR_list, we compute some statistics, such as the mean RR interval (mean_rr), sdsd and kurtosis.

Now you can add and remove features according to the list of features you chose for your project.

In [None]:
from scipy.stats import kurtosis

def extract_ppg_temporal_features(ppg_signal: np.ndarray, sampling_rate: int) -> tuple:
    """
    Extracts key heart rate variability (HRV) and statistical temporal features from a PPG signal.

    Parameters:
        ppg_signal (np.ndarray): The raw PPG signal as a 1D NumPy array.
        sampling_rate (int): The sampling rate of the signal in Hz.

    Returns:
        tuple with temporal features
    """
    working_data, measures = hp.process(ppg_signal, sampling_rate)
    rr_list = np.array(working_data['RR_list'])  # intervals between consecutive detected peaks, in ms
    
    # add and remove features according to your needs. 
    mean_rr = np.mean(rr_list)
    rr_kurtosis = kurtosis(rr_list)
    sdsd = np.std(np.diff(rr_list), ddof=1)
    hr = measures['bpm']
    pnn20 = measures['pnn20']
    sd1 = measures['sd1']
    
    return (mean_rr, rr_kurtosis, sdsd, hr, pnn20, sd1)
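
If the features you chose are not provided by HeartPy, many of them can be derived from the RR intervals with NumPy alone. The sketch below is one possible way to do it, not part of the HeartPy API: the helper name `rr_statistics` and the example intervals are made up for illustration, and pNN50 is returned here as a fraction between 0 and 1.

```python
import numpy as np

def rr_statistics(rr_ms) -> dict:
    """Compute a few time-domain HRV statistics from RR intervals given in ms."""
    rr_ms = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr_ms)  # differences between successive intervals
    return {
        "mean_rr": float(np.mean(rr_ms)),
        "sdnn": float(np.std(rr_ms, ddof=1)),            # sample standard deviation of the intervals
        "rmssd": float(np.sqrt(np.mean(diffs ** 2))),    # root mean square of successive differences
        "pnn50": float(np.mean(np.abs(diffs) > 50.0)),   # fraction of successive differences above 50 ms
    }

# Hypothetical RR intervals (ms), for illustration only
example_rr = np.array([800.0, 810.0, 790.0, 850.0, 780.0])
stats = rr_statistics(example_rr)
print(stats)
```

In the function above, the same helper could be called on `rr_list` and its values appended to the returned tuple.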

### Spectral Features Extraction

We can also extract spectral features from the signal.
NeuroKit2 computes most of the common spectral-domain features, so all examples here rely on it.
However, we encourage you to look up how to extract any other features you believe should be included and that are not among the standard NeuroKit2 features.

In the script below:
1. First, we obtain a processed version of the signal and its info.
2. From the processed signal, we select the peaks column, in which PPG peaks are marked as "1" in a list of zeros.
3. We extract two frequency features: the power in the Low Frequency (LF) band and the LF/HF ratio.
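
To make the peaks format in step 2 concrete, the sketch below converts a vector of zeros and ones into peak positions and peak-to-peak intervals. It uses only NumPy; the helper name `peaks_to_intervals_ms` and the peak positions are hypothetical and chosen for illustration.

```python
import numpy as np

def peaks_to_intervals_ms(peak_flags, sampling_rate: float) -> np.ndarray:
    """Convert a 0/1 peak vector into intervals between consecutive peaks, in ms."""
    peak_idx = np.flatnonzero(np.asarray(peak_flags) == 1)  # sample indices where a peak is marked
    return np.diff(peak_idx) * 1000.0 / sampling_rate       # samples -> milliseconds

# Hypothetical 3-second recording at 100 Hz with four marked peaks
flags = np.zeros(300, dtype=int)
flags[[20, 100, 185, 270]] = 1

intervals = peaks_to_intervals_ms(flags, sampling_rate=100)
print(intervals)
```

The conversion depends on the sampling rate of the signal in which the peaks were detected, which is why that rate has to be passed along with the peaks.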

Now you can add and remove features according to the list of features you chose for your project.

In [None]:
def extract_ppg_spectral_features(ppg_signal: np.ndarray, sampling_rate: int) -> tuple:
    """
    Extracts key statistical spectral features from a PPG signal.

    Parameters:
        ppg_signal (np.ndarray): The raw PPG signal as a 1D NumPy array.
        sampling_rate (int): The sampling rate of the signal in Hz.

    Returns:
        tuple with spectral features
    """
    processed, info = nk.ppg_process(ppg_signal, sampling_rate=sampling_rate)
    peaks = processed["PPG_Peaks"]
    # sampling_rate must match the rate of the signal in which the peaks were detected
    hrv_freq = nk.hrv_frequency(peaks, sampling_rate=sampling_rate, show=False)
    
    # add and remove features according to your needs. 
    # hrv_frequency returns a one-row DataFrame, so take the scalar from each column
    lf_power = hrv_freq['HRV_LF'].iloc[0]
    lfhf_ratio = hrv_freq['HRV_LFHF'].iloc[0]
    
    return (lf_power, lfhf_ratio)
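
For spectral features that are not among the NeuroKit2 outputs, one option is to compute band powers yourself from the RR intervals. The sketch below is a simplified illustration and not the method NeuroKit2 uses internally: it resamples the RR series onto an even grid (4 Hz is an assumed rate), takes a plain periodogram with NumPy's FFT, and sums the power in the conventional LF (0.04 to 0.15 Hz) and HF (0.15 to 0.40 Hz) bands. The helper name `band_powers_from_rr` and the synthetic intervals are hypothetical.

```python
import numpy as np

def band_powers_from_rr(rr_ms, resample_hz: float = 4.0) -> dict:
    """Estimate LF and HF power from RR intervals (ms) with a simple periodogram."""
    rr_s = np.asarray(rr_ms, dtype=float) / 1000.0
    beat_times = np.cumsum(rr_s)                         # time of each beat, in seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / resample_hz)
    rr_even = np.interp(grid, beat_times, rr_s)          # evenly sampled RR series
    rr_even = rr_even - rr_even.mean()                   # remove the constant (DC) component

    freqs = np.fft.rfftfreq(rr_even.size, d=1.0 / resample_hz)
    # Plain periodogram; constant scaling factors cancel in the LF/HF ratio
    psd = np.abs(np.fft.rfft(rr_even)) ** 2 / (resample_hz * rr_even.size)
    df = freqs[1] - freqs[0]                             # frequency resolution

    def band(lo: float, hi: float) -> float:
        mask = (freqs >= lo) & (freqs < hi)
        return float(np.sum(psd[mask]) * df)

    lf, hf = band(0.04, 0.15), band(0.15, 0.40)
    return {"lf": lf, "hf": hf, "lf_hf": lf / hf if hf > 0 else float("nan")}

# Synthetic RR intervals (ms) with a slow oscillation of roughly 0.1 Hz, for illustration only
n = np.arange(375)
synthetic_rr = 800.0 + 50.0 * np.sin(2 * np.pi * 0.1 * 0.8 * n)
result = band_powers_from_rr(synthetic_rr)
print(result)
```

Because the synthetic oscillation sits inside the LF band, the LF power dominates here. On real recordings, a windowed estimate such as Welch's method is usually preferable to a raw periodogram.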

### HeartPy structure

HeartPy processes PPG and ECG signals. By calling the process() function, we obtain working_data and measures.
The script below helps you explore the data structures returned by process().

In [None]:
ppg_signal = hp.get_data('data.csv')
sr = 100 #sample rate

working_data, measures = hp.process(ppg_signal, sr)

hp.plotter(working_data, measures)
plt.show()

print("\n--- HeartPy working_data keys ---")
for key in working_data:
    print(key)
print("\n--- HeartPy Metrics ---")
for key, value in measures.items():
    print(f"{key}: {value}")
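
Since measures is a plain dict, it is straightforward to turn a chosen subset of it into a fixed-order feature vector. The sketch below uses a stand-in dict with made-up values rather than real HeartPy output, and the helper name `measures_to_vector` is hypothetical. It also flags entries that are NaN, which may happen for some measures, for example on short or noisy recordings.

```python
import numpy as np

# Stand-in for the `measures` dict returned by hp.process(); the values are made up
measures_example = {"bpm": 62.4, "pnn20": 0.41, "sd1": float("nan")}

def measures_to_vector(measures: dict, keys: list) -> np.ndarray:
    """Collect the selected measures in a fixed order; missing keys become NaN."""
    return np.array([float(measures.get(k, np.nan)) for k in keys], dtype=float)

feature_keys = ["bpm", "pnn20", "sd1"]
vector = measures_to_vector(measures_example, feature_keys)

# List the features that could not be computed
missing = [k for k, v in zip(feature_keys, vector) if np.isnan(v)]
print(vector, missing)
```

Keeping the key list explicit ensures that every recording yields features in the same order, which matters when the vectors are stacked into a dataset.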


### NeuroKit2 Structure

NeuroKit2 can process several types of physiological signals. For this module, we will use the function ppg_process() to obtain the processed signals (a DataFrame) and additional information (a dict).
The script below is meant to help you understand how the DataFrame is structured.

In [None]:
ppg_signal = hp.get_data('data.csv')
sr = 100 #sample rate
signals, info = nk.ppg_process(ppg_signal, sampling_rate=sr)

# Plot main signal with peaks
plt.figure(figsize=(12, 4))
plt.plot(signals["PPG_Clean"], label="PPG Clean", color='royalblue')
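peak_idx = np.flatnonzero(signals["PPG_Peaks"].to_numpy())  # sample indices of the detected peaks
plt.plot(peak_idx, signals["PPG_Clean"].to_numpy()[peak_idx], 'r.', label="Detected Peaks")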
plt.title("📈 Processed PPG Signal with Peaks")
plt.xlabel("Samples")
plt.ylabel("Amplitude")
plt.legend()
plt.show()

print("\n📊 Available columns in processed signal:")
print(signals.columns.tolist())
print("\n🔍 First 5 rows of processed signal:")
print(signals.head())