## PPG Signal Analysis and Feature Extraction

In this notebook, you will explore PPG (photoplethysmogram) signals and extract meaningful features from them in the temporal and spectral domains.
We will use two libraries commonly used for these tasks: HeartPy and NeuroKit2, respectively.

To get started, install the required packages: heartpy and neurokit2, plus opensmile, which the code cells below import. To do so, run the following command in the VS Code terminal.
- pip install heartpy neurokit2 opensmile


In [6]:
import opensmile
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os

### Temporal Features Extraction

Now that we have the required imports, we can extract features from the signal.
There are two approaches you can use:
- The libraries we are using already compute certain features. The last blocks of this notebook show which features each library extracts automatically.
- Other features are statistical. We can obtain them by applying numpy's statistical operations to properties of the signal.

The script below has examples of both techniques, so let's understand what we are doing:
1. First, we obtain working_data and measures by processing the signal with heartpy.
2. In measures, we have features such as the heart rate (hr), pnn20 and sd1.
3. From working_data, we take the RR_list, which holds the intervals between consecutive peaks (RR intervals), measured in milliseconds (ms).
4. From the RR_list, we compute some statistics, such as the mean RR interval (mean_rr), sdsd and kurtosis.
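
Step 4 can be sketched with numpy alone. This is a minimal illustration, not the notebook's original script: the RR values below are made up, the function name `rr_statistics` is hypothetical, and the formulas are the textbook ones (population standard deviation, excess kurtosis), so the results may differ slightly from the values a library reports under the same names.

```python
import numpy as np

# Hypothetical RR intervals in milliseconds. In the notebook these would come
# from working_data["RR_list"] after processing the signal with heartpy.
rr_list = np.array([812.0, 790.0, 805.0, 830.0, 795.0, 810.0, 820.0, 800.0])


def rr_statistics(rr_ms):
    """Compute a few statistical features from a list of RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    mean_rr = rr.mean()
    std_rr = rr.std()  # population standard deviation (ddof=0)

    # sdsd: standard deviation of the differences between successive intervals
    sdsd = np.diff(rr).std()

    # Excess kurtosis: fourth standardized moment minus 3.
    # Undefined when all intervals are identical (std_rr == 0).
    kurtosis = np.mean((rr - mean_rr) ** 4) / std_rr ** 4 - 3.0

    return {"mean_rr": mean_rr, "sdsd": sdsd, "kurtosis": kurtosis}


stats = rr_statistics(rr_list)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Returning a dict keeps each feature under an explicit name, which makes it easy to add or drop entries later.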

Now you can add and remove features according to the list of features you chose for your project.

In [None]:
from datasets import load_dataset

# Load English split (or another language if you prefer)
dataset = load_dataset("mozilla-foundation/common_voice_12_0", "en", split="train[:1%]")  # small subset for testing

In [None]:
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,  # or choose emobase, IS10_paraling, etc.
    feature_level=opensmile.FeatureLevel.Functionals,
)

def extract_features(audio_path):
    """
    Extracts audio features from a given audio file.

    Args:
        audio_path (str): Path to the audio file (wav, mp3, etc.).

    Returns:
        pandas.Series: Extracted features (functionals) for the entire audio file.
    """
    try:
        features = smile.process_file(audio_path)
        return features.iloc[0]  # Return as Series (row)
    except Exception as e:
        print(f"Error processing {audio_path}: {e}")
        return None

### HeartPy structure

HeartPy processes PPG and ECG signals. Calling process() returns two results: working_data and measures.
The script below helps you explore the data structure returned by process().
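
As a rough sketch of what such an exploration could look like, the helper below summarizes any dictionary of results by key, type and length. The helper name `summarize` is hypothetical, and the two dictionaries are small stand-ins with made-up values that reuse only the keys mentioned earlier (RR_list, pnn20, sd1); the real output of process() contains more entries.

```python
import numpy as np


def summarize(mapping):
    """Return {key: short description} for every entry of a results dict."""
    summary = {}
    for key, value in mapping.items():
        if isinstance(value, (list, tuple, np.ndarray)):
            # Sequences: report the container type and its length
            summary[key] = f"{type(value).__name__} of length {len(value)}"
        else:
            # Scalars: report the type and the value itself
            summary[key] = f"{type(value).__name__}: {value}"
    return summary


# Stand-in dictionaries with made-up values (assumption: not real HeartPy output)
working_data = {"RR_list": np.array([812.0, 790.0, 805.0])}
measures = {"pnn20": 0.4, "sd1": 11.5}

for label, result in (("working_data", working_data), ("measures", measures)):
    print(label)
    for key, description in summarize(result).items():
        print(f"  {key}: {description}")
```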

In [None]:
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,  # or choose emobase, IS10_paraling, etc.
    feature_level=opensmile.FeatureLevel.Functionals,
)

audio_path = ""  # Set this to the path of an audio file before running
features_df = smile.process_file(audio_path)

# Print type and shape of the output
print(f"Type of output: {type(features_df)}")
print(f"Shape of output: {features_df.shape}")

# The output is a pandas DataFrame with one row (functionals) and many columns (features)
print("\nColumns (feature names):")
print(features_df.columns)

print("\nPreview of feature values:")
print(features_df.head())

# Optional: show statistical info
print("\nFeature statistics:")
print(features_df.describe())


### NeuroKit2 Structure

NeuroKit2 can process several types of physiological signals. For this module, we will use the ppg_process() method to obtain the working_data (DataFrame) and information (dict).
The script below is meant to help you understand how the DataFrame is structured.
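
To give an idea of the kind of inspection meant here, the sketch below builds a small stand-in DataFrame and dict and walks through them. Everything in it is an assumption for illustration: the column names (PPG_Raw, PPG_Clean, PPG_Rate, PPG_Peaks), the sampling_rate key, and all values are placeholders, so check them against the actual output of ppg_process() in your environment.

```python
import numpy as np
import pandas as pd

# Stand-in for the (DataFrame, dict) pair; names and values are assumptions.
sampling_rate = 100  # Hz, hypothetical
n_samples = 300
peak_positions = [50, 130, 210]  # hypothetical sample indices of detected peaks

peaks_column = np.zeros(n_samples, dtype=int)
peaks_column[peak_positions] = 1  # 1 marks a sample where a peak was detected

working_data = pd.DataFrame({
    "PPG_Raw": np.sin(np.linspace(0.0, 6.0 * np.pi, n_samples)),
    "PPG_Clean": np.sin(np.linspace(0.0, 6.0 * np.pi, n_samples)),
    "PPG_Rate": np.full(n_samples, 75.0),
    "PPG_Peaks": peaks_column,
})
information = {"sampling_rate": sampling_rate}

# Inspect the structure: shape, column names and dtypes
print(working_data.shape)
print(working_data.dtypes)

# Recover peak indices from the 0/1 column and convert spacing to milliseconds
peak_idx = np.flatnonzero(working_data["PPG_Peaks"].to_numpy() == 1)
peak_intervals_ms = np.diff(peak_idx) / information["sampling_rate"] * 1000.0
print(peak_idx, peak_intervals_ms)
```

A 0/1 marker column has one row per sample, so converting it to indices with `np.flatnonzero` is a convenient first step before computing intervals.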

In [8]:
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals
)

audio_path = os.path.join(os.getcwd(), "common_voice_ka_21298144.mp3")

# Extract features
features_df = smile.process_file(audio_path)

# Print type and shape of the output
print(f"Type of output: {type(features_df)}")
print(f"Shape of output: {features_df.shape}")

# The output is a pandas DataFrame with one row and many columns
print("\nColumns (feature names):")
print(features_df.columns)

print("\nPreview of feature values:")
print(features_df.head())

print("\nFeature statistics:")
print(features_df.describe())

UnicodeEncodeError: 'ascii' codec can't encode character '\xed' in position 11: ordinal not in range(128)