## Voice Signal Analysis and Feature Extraction

In this notebook, you will explore voice signals and extract meaningful features from the temporal and spectral domains.
This will be done with OpenSMILE, a library commonly used for these tasks.

To get started, install opensmile by running the following command in the VS Code terminal.
- pip install opensmile


In [None]:
import opensmile
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import os

### Feature Extraction

To extract voice features, we use the OpenSMILE library.
It ships with several predefined feature sets. In this example, we use GeMAPS, a widely used set that includes features such as F0, jitter, and shimmer.
The feature level should stay at Functionals: this level returns one feature vector per file, whereas the other levels work at frame-level resolution and return a feature matrix with one row per frame.
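The F0 features selected below carry the prefix F0semitoneFrom27.5Hz, which indicates that pitch is expressed in semitones relative to 27.5 Hz rather than in Hz. As a minimal sketch of that conversion, assuming the standard semitone formula 12 * log2(f / reference), the helper below (hz_to_semitones is a hypothetical name, not part of OpenSMILE) maps a frequency to this scale. OpenSMILE's handling of unvoiced frames is not modeled here.

```python
import math

# Reference implied by the feature name "F0semitoneFrom27.5Hz".
REFERENCE_HZ = 27.5


def hz_to_semitones(f0_hz: float, reference_hz: float = REFERENCE_HZ) -> float:
    """Convert a frequency in Hz to semitones above a reference frequency.

    Illustrative helper only; it is not an OpenSMILE function.
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    return 12.0 * math.log2(f0_hz / reference_hz)


print(hz_to_semitones(55.0))   # one octave above 27.5 Hz -> 12.0
print(hz_to_semitones(220.0))  # three octaves above 27.5 Hz -> 36.0
```

Each doubling of the frequency adds 12 semitones, so differences on this scale are easier to compare across speakers than differences in Hz.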

In the extraction below, we work as follows:
1. Create the OpenSMILE processor with GeMAPS.
2. Define the features we want to extract. At the end of this notebook there is code you can run to list the features available in GeMAPS; you can also look them up online.
3. Keep only the chosen features from the full feature set.
4. Return a tuple containing the values of the chosen features.

In [None]:
def extract_features(audio_path: str) -> tuple|None:
    """
    Extracts F0, jitter, and shimmer features from an audio file.

    Args:
        audio_path (str): Path to the audio file.

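    Returns:
        tuple | None: The selected feature values, in the order listed in
        selected_features, or None if the file could not be processed.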
    """
    smile = opensmile.Smile(
        feature_set=opensmile.FeatureSet.GeMAPSv01b,
        feature_level=opensmile.FeatureLevel.Functionals
    )

    selected_features = [
        'F0semitoneFrom27.5Hz_sma3_amean',
        'F0semitoneFrom27.5Hz_sma3_stddevNorm',
        'F0semitoneFrom27.5Hz_sma3_quartile2',
        'jitterLocal_sma3_amean',
        'shimmerLocaldB_sma3_amean'
    ]
    try:
        features = smile.process_file(audio_path)
        
        selected = features[selected_features]
        
        return tuple(selected.iloc[0].values.tolist())
    
    except Exception as e:
        print(f"Error processing {audio_path}: {e}")
        return None
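
Since extract_features returns either a tuple or None, calling code has to handle failed files. The following is a minimal sketch of how results for several files might be collected, assuming an extractor with the same return contract. The function collect_features and the short labels in COLUMNS are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterable, Optional

# Hypothetical short labels for the five values, in the same order as
# selected_features (quartile2 is the median).
COLUMNS = ("f0_mean", "f0_stddev_norm", "f0_median", "jitter_mean", "shimmer_mean")


def collect_features(
    paths: Iterable[str],
    extractor: Callable[[str], Optional[tuple]],
) -> list:
    """Run the extractor on each path and return one dict per successful file."""
    rows = []
    for path in paths:
        values = extractor(path)
        if values is None:
            # The extractor reported a failure for this file; skip it.
            continue
        row = {"file": path}
        row.update(zip(COLUMNS, values))
        rows.append(row)
    return rows


# Demonstration with a stand-in extractor so the sketch runs without audio files.
def _fake_extractor(path: str) -> Optional[tuple]:
    return None if path.endswith(".txt") else (30.0, 0.1, 29.5, 0.02, 1.1)


print(collect_features(["a.wav", "notes.txt"], _fake_extractor))
```

In the notebook, passing extract_features as the extractor and wrapping the result in pd.DataFrame(rows) would give one row per successfully processed file.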

### OpenSMILE structure

OpenSMILE offers several comprehensive feature sets for audio, so feel free to explore them.
The feature_set argument selects which set to use. The options include GeMAPS (a popular one with features like F0, jitter, and shimmer), its extended version eGeMAPS, and the much larger ComParE_2016.
The feature_level argument controls how the feature set is exposed:
- LowLevelDescriptors: frame-level values (e.g., every 10 ms). Useful for signal modeling or speech segmentation.
- LowLevelDescriptors_Deltas: frame-level values plus deltas (first derivative).
- Functionals: statistics aggregated over the whole file (mean, std, min, etc.). Best for classification.

Keep in mind that frame-level features add a time axis that you have to handle yourself, so for simplicity this notebook sticks to Functionals.
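
To make the difference between the levels concrete, here is a toy sketch in plain Python that does not call OpenSMILE. It assumes a 10 ms step between frames and uses made-up descriptor names (f0, loudness) and values. Frame-level data is a matrix with a time stamp per row, while functionals collapse each column into summary statistics, giving a single vector.

```python
from statistics import mean, pstdev

HOP_SECONDS = 0.01  # assumed 10 ms step between consecutive frames

# Toy frame-level data: one row per frame, one entry per descriptor.
# Names and values are invented for illustration.
frames = [
    {"f0": 100.0, "loudness": 0.2},
    {"f0": 110.0, "loudness": 0.4},
    {"f0": 120.0, "loudness": 0.6},
    {"f0": 130.0, "loudness": 0.8},
]


def frame_times(n_frames: int, hop: float = HOP_SECONDS) -> list:
    """Start time in seconds of each frame, assuming a constant hop."""
    return [round(i * hop, 6) for i in range(n_frames)]


def functionals(rows: list) -> dict:
    """Collapse each descriptor column into mean and (population) std."""
    summary = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        summary[f"{name}_mean"] = mean(column)
        summary[f"{name}_std"] = pstdev(column)
    return summary


print(frame_times(len(frames)))  # one time stamp per frame (matrix rows)
print(functionals(frames))       # one value per statistic (single vector)
```

The frame-level view grows with the length of the recording, while the functionals view always has the same size, which is what makes it convenient as classifier input.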

In [None]:
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,
    feature_level=opensmile.FeatureLevel.Functionals
)

audio_path = os.path.join(os.getcwd(), "common_voice_ka_21298144.mp3")

# Extract features
features_df = smile.process_file(audio_path)

# Print type and shape of the output
print(f"Type of output: {type(features_df)}")
print(f"Shape of output: {features_df.shape}")

# The output is a pandas DataFrame with one row and many columns
print("\nColumns (feature names):")
print(features_df.columns)

print("\nPreview of feature values:")
print(features_df.head())

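# With a single row, describe() is of limited use: std is NaN and the
# min, max, and quartiles all equal the one value in each column.
print("\nFeature statistics:")
print(features_df.describe())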
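
The column listing printed above can be long, so it may help to filter it by keyword when choosing features. A minimal sketch follows, where find_features is a hypothetical helper and the example names are the five GeMAPS features used earlier in this notebook.

```python
def find_features(names, keyword: str) -> list:
    """Return the feature names that contain keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [name for name in names if keyword in name.lower()]


# The five feature names selected earlier in this notebook.
example_names = [
    "F0semitoneFrom27.5Hz_sma3_amean",
    "F0semitoneFrom27.5Hz_sma3_stddevNorm",
    "F0semitoneFrom27.5Hz_sma3_quartile2",
    "jitterLocal_sma3_amean",
    "shimmerLocaldB_sma3_amean",
]

print(find_features(example_names, "jitter"))  # ['jitterLocal_sma3_amean']
print(find_features(example_names, "f0"))      # the three F0 features
```

With the DataFrame from the cell above, find_features(features_df.columns, "shimmer") would search the full feature set in the same way.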