This notebook extracts features from the extrinsic data (microphone recordings). The extraction uses Librosa, an audio analysis library that can read .wav files.

In [None]:
import os
import librosa
import pandas as pd
import numpy as np  # requires: pip install librosa resampy

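The function below extracts a fixed set of pre-chosen features from a given audio file and returns them as a flat list of numerical values: the mean of each spectral feature, followed by the mean of each of the 40 MFCC coefficients. If the file cannot be parsed, it returns None.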

In [4]:
def extract_features(file_path):
    try:
        audio, sample_rate = librosa.load(file_path, res_type='kaiser_fast') 
        n_fft = 512  # Lower value for FFT window

        # Check if audio is shorter than n_fft and pad if necessary
        if len(audio) < n_fft:
            audio = np.pad(audio, (0, n_fft - len(audio)), 'constant')

        chroma_stft = librosa.feature.chroma_stft(y=audio, sr=sample_rate, n_fft=n_fft)
        spec_contrast = librosa.feature.spectral_contrast(y=audio, sr=sample_rate, n_fft=n_fft)
        tonnetz = librosa.feature.tonnetz(y=audio, sr=sample_rate)
        mfcc = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40, n_fft=n_fft)
        spectral_bandwidth = librosa.feature.spectral_bandwidth(y=audio, sr=sample_rate, n_fft=n_fft)
        zero_crossing_rate = librosa.feature.zero_crossing_rate(audio)
        spectral_rolloff = librosa.feature.spectral_rolloff(y=audio, sr=sample_rate, n_fft=n_fft)

        features = [np.mean(chroma_stft), np.mean(spec_contrast), np.mean(tonnetz), np.mean(spectral_bandwidth), np.mean(zero_crossing_rate), np.mean(spectral_rolloff)] + [np.mean(e) for e in mfcc]

    except Exception as e:
        # Report which file failed and why, then signal the failure to the caller
        print("Error encountered while parsing file:", file_path, "-", e)
        return None

    return features
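
To make the layout of the returned list concrete, here is a minimal sketch that uses random stand-in matrices instead of real Librosa output. The shapes follow Librosa's defaults (rows are bands or coefficients, columns are frames); the frame count and the random values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames = 100  # hypothetical number of analysis frames

# Stand-in matrices shaped like the librosa outputs: (n_rows, n_frames)
chroma_stft = rng.random((12, n_frames))
spec_contrast = rng.random((7, n_frames))
tonnetz = rng.random((6, n_frames))
spectral_bandwidth = rng.random((1, n_frames))
zero_crossing_rate = rng.random((1, n_frames))
spectral_rolloff = rng.random((1, n_frames))
mfcc = rng.random((40, n_frames))

# Each spectral feature collapses to one scalar (mean over all rows and frames)
scalars = [float(np.mean(m)) for m in (chroma_stft, spec_contrast, tonnetz,
                                       spectral_bandwidth, zero_crossing_rate,
                                       spectral_rolloff)]

# The MFCC matrix is averaged row by row, giving one value per coefficient
mfcc_means = [float(np.mean(row)) for row in mfcc]

features = scalars + mfcc_means
print(len(features))  # 6 scalar features + 40 MFCC means
```

Note that averaging a whole matrix discards per-band detail for features such as chroma and spectral contrast; only the MFCCs keep one value per row.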





Now we can create the dataframe that will hold our features. The columns are the pre-chosen features plus the filename of the .wav clip. We start with an empty pandas dataframe, then loop over all audio files in the specified directory and extract each file's features into a dictionary that maps column names to values. Each dictionary is converted to a one-row dataframe, which is appended to the feature dataframe with pd.concat so that all rows end up in one table for later use.

In [6]:
# directory containing your audio files
root_dir = r"C:\Users\kaspe\Documents\GitHub\AAU-IoT-Solution-AI-REDGIO\data_ozren\Extrinsic data" # replace with your directory

# Construct the column names
column_names = ['chroma_stft', 'spec_contrast', 'tonnetz', 'spectral_bandwidth', 'zero_crossing_rate', 'spectral_rolloff'] + [f'mfcc_{i}' for i in range(1, 41)]

# add 'filename' to the column names
column_names.append('filename')

# Create a dataframe that will hold the features
features_df = pd.DataFrame(columns = column_names)

# Walk through the directory (and subdirectories)
for subdir, dirs, files in os.walk(root_dir):
    for file in files:
        # Only process .wav files
        if file.endswith(".wav"):
            file_path = os.path.join(subdir, file)
            data = extract_features(file_path)

            # Skip files that could not be parsed (extract_features returns None)
            if data is None:
                continue

            # Extract the base file name without extension
            filename = os.path.splitext(os.path.basename(file_path))[0]

            # Add the features to the dataframe
            data_dict = {column_names[i]: data[i] for i in range(len(data))}
            data_dict['filename'] = filename
            # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
            features_df = pd.concat([features_df, pd.DataFrame([data_dict])], ignore_index=True)
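
Calling pd.concat once per file copies the growing dataframe on every iteration. A possible alternative, sketched here under assumptions, is to collect one dictionary per file in a plain list and build the dataframe once at the end. The shortened column list and the `fake_extract` helper are hypothetical stand-ins for the real columns and for `extract_features`.

```python
import pandas as pd

# Shortened, hypothetical column set (the notebook uses 46 features + filename)
column_names = ['chroma_stft', 'mfcc_1', 'filename']

def fake_extract(name):
    """Hypothetical stand-in for extract_features; returns None for a 'bad' file."""
    if name.startswith('bad'):
        return None
    return [0.5, -300.0]

rows = []
for name in ['e1', 'bad2', 'e3']:
    data = fake_extract(name)
    if data is None:
        continue  # skip files that failed to parse
    row = dict(zip(column_names, data))  # pair feature names with values
    row['filename'] = name
    rows.append(row)

# Build the dataframe in a single step from the collected rows
features_df = pd.DataFrame(rows, columns=column_names)
print(features_df)
```

The result has the same columns as the row-by-row version; only the construction strategy differs.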




For later use, the filename (currently e.g. e2012010141) is reformatted into an id of the form id2012010141. This makes it easy to match the extrinsic features with the intrinsic and task features.
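
The renaming rule itself can be checked on plain strings before applying it to a dataframe column. This is a small sketch using the standard `re` module; the helper name `to_id` is hypothetical and not part of the notebook.

```python
import re

def to_id(filename):
    """Replace a single leading 'e' with 'id'; other names are returned unchanged."""
    return re.sub(r'^e', 'id', filename)

print(to_id('e2012010141'))  # prints id2012010141
```

The `^` anchor ensures that only the first character is affected, so an 'e' elsewhere in the name is left alone.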

In [7]:
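# Filter rows where filename starts with 'e'; copy so the edit below does not
# modify a slice of features_df (avoids pandas' SettingWithCopyWarning)
features = features_df[features_df['filename'].str.startswith('e')].copy()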

# Replace 'e' at the start of filenames with 'id'
features['filename'] = features['filename'].str.replace('^e', 'id', regex=True)

# Rename the column
features = features.rename(columns={'filename': 'id'})

print(features)

      chroma_stft  spec_contrast   tonnetz  spectral_bandwidth  \
0        0.597750      16.064072  0.001926         2454.136667   
1        0.551161      16.131209  0.000611         2444.964944   
2        0.560216      15.960120  0.008566         2440.323666   
3        0.523628      15.838458 -0.001173         2392.534258   
4        0.552583      16.056320 -0.002082         2416.713182   
...           ...            ...       ...                 ...   
1336     0.587797      15.865158  0.000048         2436.987172   
1337     0.616711      15.826213  0.002555         2436.819526   
1338     0.588844      15.805292  0.006448         2495.783106   
1339     0.519097      15.978502  0.014566         2419.118816   
1340     0.571080      15.686053  0.000093         2442.715556   

      zero_crossing_rate  spectral_rolloff      mfcc_1     mfcc_2     mfcc_3  \
0               0.417568       7691.967773 -321.242523 -34.783047 -53.732311   
1               0.410902       7480.993652 -329

Finally, we write the dataframe to a CSV file, which can then be used for training.

In [8]:
# write features to .csv file
features.to_csv(r"C:\Users\kaspe\Documents\GitHub\AAU-IoT-Solution-AI-REDGIO\audio_features_clean.csv", index=False)
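
As a quick sanity check of the export, the sketch below writes a tiny made-up dataframe to an in-memory buffer instead of a file and reads it back. The values and the two-column layout are assumptions for illustration only.

```python
import io
import pandas as pd

# Made-up values standing in for the real feature table
features = pd.DataFrame({'chroma_stft': [0.5, 0.6], 'id': ['id1', 'id2']})

buffer = io.StringIO()
features.to_csv(buffer, index=False)  # index=False: no extra index column is written
buffer.seek(0)

restored = pd.read_csv(buffer)
print(list(restored.columns))
```

With index=False the restored table has exactly the original columns, so it can be loaded for training without dropping an index column first.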