# Extracting Audio Features using openSMILE

In this notebook, we demonstrate how to extract audio features from 10 randomly selected audio files using openSMILE, a popular open-source toolkit for extracting features from audio signals that is widely used in speech processing and affective computing.

## Why openSMILE?
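openSMILE provides a comprehensive set of audio features at two levels. Low-level descriptors (LLDs), such as pitch, energy, and MFCCs, are computed frame by frame over short windows of the signal. Functionals, such as the mean and standard deviation, then summarize each LLD contour over the whole recording. Because functionals turn a recording of any length into a fixed-length feature vector, the output can be used directly as input to standard classifiers, which makes openSMILE a practical tool for audio classification and analysis tasks.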
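To make the LLD/functionals distinction concrete, here is a simplified plain-Python sketch; it is not openSMILE's implementation. The hypothetical helper `frame_rms` computes an energy-like LLD (the RMS of each frame), and the hypothetical helper `functionals` summarizes that contour with four statistics. The 16 kHz sample rate, 25 ms frames, 10 ms hop, and the choice of statistics are assumptions for illustration; ComParE uses its own framing and a much larger set of functionals.

```python
import math
import statistics

def frame_rms(signal, frame_len, hop):
    """Hypothetical LLD: RMS energy of each frame (one value per frame)."""
    contour = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        contour.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return contour

def functionals(contour):
    """Hypothetical functionals: summarize a whole LLD contour with fixed statistics."""
    return [
        statistics.mean(contour),
        statistics.pstdev(contour),
        min(contour),
        max(contour),
    ]

# Two synthetic 440 Hz tones of different durations (assumed 16 kHz sample rate)
sr = 16000
short_tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
long_tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(2 * sr)]

# Assumed framing: 25 ms frames with a 10 ms hop, in samples
frame_len, hop = sr * 25 // 1000, sr * 10 // 1000
short_lld = frame_rms(short_tone, frame_len, hop)
long_lld = frame_rms(long_tone, frame_len, hop)

print(len(short_lld), len(long_lld))  # 48 198: LLD contours differ in length
print(len(functionals(short_lld)), len(functionals(long_lld)))  # 4 4: same length
```

The two tones give LLD contours of different lengths, yet their functionals vectors have the same length, which is why the `Functionals` level yields exactly one row per file.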

## Features Extracted
We will use the `ComParE_2016` feature set at the `Functionals` level, which produces 6,373 features per file, computed from low-level descriptors such as:
- **Loudness**
- **MFCCs (Mel-frequency cepstral coefficients)**
- **Pitch (F0) and voicing-related features**
- **Spectral features**



```python
import os
import random

import librosa
import opensmile
import pandas as pd

# Root folder of the audio files
path_to_audios = 'data/data_final/Audios'

# Collect all .wav files under the root folder (recursively)
audios = []
for root, dirs, files in os.walk(path_to_audios):
    for name in files:
        if name.endswith('.wav'):
            audios.append(os.path.join(root, name))

# Select 10 random audio files (or all of them if fewer than 10 exist).
# Uncomment the next line to draw the same files on every run.
# random.seed(0)
random_audios = random.sample(audios, min(10, len(audios)))

# Load an audio file as a mono float32 signal, resampled to 44.1 kHz
def read_audio(path):
    y, sr = librosa.load(path, sr=44100)
    return y, sr

# One row per selected file
df = pd.DataFrame({'audiopath': random_audios})

# Load the raw signal and its sample rate for each file
df[['audio_raw', 'sr']] = df['audiopath'].apply(lambda x: pd.Series(read_audio(x)))

# The label is the first folder below the root (e.g. 'Tonsill');
# relpath + os.sep works on both POSIX and Windows paths,
# unlike splitting on '/'
df['label'] = df['audiopath'].apply(
    lambda x: os.path.relpath(x, path_to_audios).split(os.sep)[0]
)

# Initialize the openSMILE extractor: ComParE 2016 set, one functionals vector per signal
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# process_signal returns a one-row DataFrame whose columns are
# smile.feature_names; flatten it into a 1-D array of feature values
def extract_features(audio, sr):
    result = smile.process_signal(audio, sr)
    return result.values.flatten()

# Extract features for each audio file and store them in the DataFrame
df['features'] = df.apply(lambda row: extract_features(row['audio_raw'], row['sr']), axis=1)

# Display the first rows of the DataFrame
df.head()
```

| | audiopath | audio_raw | sr | label | features |
|---|---|---|---|---|---|
| 0 | data/data_final/Audios/Tonsill/Raw/aeiou/1/Ton... | [0.00018310547, 9.1552734e-05, 0.00024414062, ... | 44100 | Tonsill | [1.1715089, 0.705074, 0.24207188, 0.0570324, 0... |
| 1 | data/data_final/Audios/Tonsill/U/3/Tonsill_ses... | [0.044189453, 0.04348755, 0.04196167, 0.040832... | 44100 | Tonsill | [0.31667173, 0.0, 0.98, 0.62788707, 0.643359, ... |
| 2 | data/data_final/Audios/Contr/Brasero/3/Contr_s... | [-0.002532959, -0.0026855469, -0.0010681152, 0... | 44100 | Contr | [1.2833433, 0.17434211, 0.21052632, 0.43659222... |
| 3 | data/data_final/Audios/Fess/U/2/FESS_ses2_u_00... | [0.12445068, 0.12133789, 0.118621826, 0.115386... | 44100 | Fess | [0.16239512, 0.7586207, 0.41379312, 1.0949354,... |
| 4 | data/data_final/Audios/Tonsill/A/1/Tonsill_ses... | [0.0047912598, -0.0005493164, -0.0050964355, -... | 44100 | Tonsill | [0.7659278, 0.083333336, 0.8541667, 1.1094309,... |
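Each entry in the `features` column is a flat vector of ComParE functionals, so the rows can be stacked into a feature matrix for a downstream classifier. The sketch below uses a hypothetical helper, `build_dataset` (plain Python, not part of openSMILE or pandas), that pairs each vector with its label and checks that all vectors have the same length and contain no NaN values. These checks are an assumption about what is worth validating, not a step the notebook performs, and the 3-element input vectors are dummy values.

```python
import math

def build_dataset(rows):
    """Hypothetical helper: stack (label, feature_vector) pairs into X and y.

    Raises ValueError if the vectors differ in length (e.g. files processed
    with different feature sets) or contain NaN values.
    """
    X, y = [], []
    n_features = None
    for label, vector in rows:
        values = [float(v) for v in vector]
        if n_features is None:
            n_features = len(values)  # the first vector fixes the expected width
        elif len(values) != n_features:
            raise ValueError(
                f"expected {n_features} features, got {len(values)} for label {label!r}"
            )
        if any(math.isnan(v) for v in values):
            raise ValueError(f"NaN feature value for label {label!r}")
        X.append(values)
        y.append(label)
    return X, y

# Dummy 3-element vectors for illustration (real ComParE vectors are much longer)
rows = [
    ("Tonsill", [0.1, 0.2, 0.3]),
    ("Contr", [0.4, 0.5, 0.6]),
    ("Fess", [0.7, 0.8, 0.9]),
]
X, y = build_dataset(rows)
print(len(X), len(X[0]), y)  # 3 3 ['Tonsill', 'Contr', 'Fess']
```

With the DataFrame above, it could be called as `X, y = build_dataset(zip(df['label'], df['features']))`; `float()` accepts the NumPy scalar values stored in each feature array.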
