# Feature Flattening
----

Feature extraction for audio samples is somewhat unusual in that it must capture an element of timing, since timing is important to how we perceive audio. All of the feature extraction processes implemented thus far use framing: the value of each feature is calculated frame by frame across an audio sample. The resulting features are therefore at least 1-dimensional arrays, and most of them are in fact 2-dimensional arrays (for example, one row per MFCC coefficient, with one value per frame). This complicates classification, because the methods we have implemented so far expect a single value per feature. Hence, we have to do a bit of feature flattening.
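To make the framing idea concrete, here is a minimal NumPy sketch that splits a signal into overlapping frames and computes one value (RMS energy) per frame. It is an illustration, not the implementation used by `BatchExtractor`. `frame_length=2048` mirrors the value passed to `BatchExtractor` later in this notebook. The hop length of 512, the 22,050 Hz sample rate, and the helper names `frame_signal` and `frame_rms` are assumptions. The sketch does not center or pad the frames, so its frame counts can differ slightly from libraries such as librosa, which center frames by default.

```python
import numpy as np

def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames (no centering or padding).

    Returns an array of shape (frame_length, n_frames), one column per frame.
    """
    if len(y) < frame_length:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # Column j holds samples [j * hop_length, j * hop_length + frame_length)
    idx = np.arange(frame_length)[:, None] + hop_length * np.arange(n_frames)[None, :]
    return y[idx]

def frame_rms(y, frame_length=2048, hop_length=512):
    """Per-frame root-mean-square energy, shaped (1, n_frames) like a one-row feature."""
    frames = frame_signal(y, frame_length, hop_length)
    return np.sqrt(np.mean(frames ** 2, axis=0, keepdims=True))

# Two synthetic 440 Hz "recordings" of different lengths (sample rate assumed)
sr = 22050
one_sec = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
three_sec = np.sin(2 * np.pi * 440 * np.arange(3 * sr) / sr)

print(frame_rms(one_sec).shape)    # (1, 40)
print(frame_rms(three_sec).shape)  # (1, 126)
```

The longer recording yields more frames, which is exactly why the per-sample feature arrays do not line up and need flattening and padding before they can be stacked.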

We will do this by literally flattening each feature vector (say, the first feature vector of the MFCC feature matrix) into a single row, so that the new features are indexed by frame as well as by feature. This will allow us to stack the samples appropriately. Because some samples will be longer than others, we pad the end of any shorter rows with 0s to indicate that those samples contain no relevant features beyond that point. In theory, this should help classification more than padding with the mean or a similar statistic: a 0 clearly marks "nothing here", whereas a mean value would look like ordinary feature content.
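A minimal sketch of the pad-then-flatten idea, assuming each sample's feature is a NumPy array shaped `(n_rows, n_frames)` with the same number of rows for every sample. The helpers `pad_frames` and `flatten_and_stack` are hypothetical names for illustration. Padding each row along the frame axis *before* flattening keeps the columns aligned, so a given column refers to the same (row, frame) pair in every sample.

```python
import numpy as np

def pad_frames(feature, n_frames):
    """Zero-pad a (n_rows, frames) feature matrix on the right to n_frames columns."""
    feature = np.atleast_2d(feature)  # treat 1-D features as a single row
    missing = n_frames - feature.shape[1]
    if missing < 0:
        raise ValueError("feature has more frames than n_frames")
    return np.pad(feature, ((0, 0), (0, missing)), mode="constant", constant_values=0)

def flatten_and_stack(features):
    """Pad every sample's matrix to the longest sample, flatten it, and stack the rows.

    Each output row is one sample; columns run row by row (row 0's frames,
    then row 1's frames, ...).
    """
    max_frames = max(np.atleast_2d(f).shape[1] for f in features)
    return np.vstack([pad_frames(f, max_frames).reshape(-1) for f in features])

# Toy "MFCC" matrices: 2 coefficients, with 3 and 5 frames respectively
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.arange(10).reshape(2, 5)

stacked = flatten_and_stack([a, b])
print(stacked.shape)  # (2, 10)
print(stacked[0])     # [1 2 3 0 0 4 5 6 0 0]
```

Flattening first and padding only at the very end would shift row 1 of the shorter sample into columns that hold row 0 of the longer one, which is why the padding is applied per row.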

Let's start by importing our required libraries.

In [1]:
import pandas as pd
import numpy as np

from audio_feature_extraction import BatchExtractor

In [2]:
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings('ignore')

We'll initialize a new BatchExtractor object, which provides a method that flattens all the merged features for each sample into a single (enormous) dataframe. This dataframe will not be directly suitable for classification: we will need some dimensionality reduction to cut down the number of features before classification stops being a painful process for me and for the computer. Note that this new BatchExtractor object will not be used for any actual extraction; it just provides the method we need to flatten things appropriately.
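To see why the merged dataframe becomes so wide, here is a hypothetical stand-in for a merge-and-flatten step. It is not `BatchExtractor.merge_and_flatten_features`, whose internals are not shown here. It assumes each sample is a dict of NumPy feature arrays, that every sample has the same features with the same number of rows, and it names columns `<feature>_<row>_<frame>`; the function name, column scheme, and synthetic data are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

def merge_and_flatten_sketch(samples, labels=None):
    """Hypothetical merge-and-flatten step (illustration only).

    samples: {sample_name: {feature_name: array of shape (n_rows, n_frames) or (n_frames,)}}
    labels:  optional {sample_name: label}
    Returns one row per sample, with columns named '<feature>_<row>_<frame>'.
    """
    names = list(samples)
    feature_names = list(next(iter(samples.values())))
    blocks = []
    for feat in feature_names:
        mats = [np.atleast_2d(samples[n][feat]) for n in names]
        n_rows = mats[0].shape[0]
        max_frames = max(m.shape[1] for m in mats)
        # Zero-pad along the frame axis, then flatten row by row
        flat_rows = np.vstack([
            np.pad(m, ((0, 0), (0, max_frames - m.shape[1]))).reshape(-1) for m in mats
        ])
        cols = [f"{feat}_{r}_{f}" for r in range(n_rows) for f in range(max_frames)]
        blocks.append(pd.DataFrame(flat_rows, index=names, columns=cols))
    df = pd.concat(blocks, axis=1)
    if labels is not None:
        df["label"] = [labels[n] for n in names]
    return df

# Synthetic random data standing in for extracted features
rng = np.random.default_rng(0)
samples = {
    "sample_a": {"mfcc": rng.normal(size=(20, 40)), "zcr": rng.random(40)},
    "sample_b": {"mfcc": rng.normal(size=(20, 126)), "zcr": rng.random(126)},
}
labels = {"sample_a": "Melozone aberti", "sample_b": "Icteria virens"}

flat = merge_and_flatten_sketch(samples, labels)
print(flat.shape)  # (2, 2647): 20 * 126 MFCC columns + 126 ZCR columns + label
```

With only two features and two short samples this already produces about 2,600 columns; with a dozen features and full-length recordings the width grows much faster, hence the need for dimensionality reduction.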

In [61]:
bird_index = pd.read_csv('bird_vocalization_index.csv', index_col=0)

In [62]:
bird_index = bird_index.drop(columns=[
    'country',
    'file_url',
    'license',
    'recordist',
    'recordist_url',
    'sonogram_url',
    'remarks',
    'latitude',
    'longitude',
    'location',
    'full_name',
    'file_id',
    'duration_seconds',
    'type'])

In [63]:
bird_index.head()

| | english_cname | file_name | genus | species |
|---|---|---|---|---|
| 0 | Abert's Towhee | XC17804.mp3 | Melozone | aberti |
| 1 | Abert's Towhee | XC177367.mp3 | Melozone | aberti |
| 2 | Abert's Towhee | XC145505.mp3 | Melozone | aberti |
| 3 | Abert's Towhee | XC228159.mp3 | Melozone | aberti |
| 4 | Abert's Towhee | XC51313.mp3 | Melozone | aberti |


In [64]:
# Binomial name used as the class label, e.g. "Melozone aberti"
bird_index['label'] = bird_index['genus'] + ' ' + bird_index['species']

In [65]:
bird_index.head()

| | english_cname | file_name | genus | species | label |
|---|---|---|---|---|---|
| 0 | Abert's Towhee | XC17804.mp3 | Melozone | aberti | Melozone aberti |
| 1 | Abert's Towhee | XC177367.mp3 | Melozone | aberti | Melozone aberti |
| 2 | Abert's Towhee | XC145505.mp3 | Melozone | aberti | Melozone aberti |
| 3 | Abert's Towhee | XC228159.mp3 | Melozone | aberti | Melozone aberti |
| 4 | Abert's Towhee | XC51313.mp3 | Melozone | aberti | Melozone aberti |


In [66]:
# Strip the 4-character '.mp3' extension to get a per-file ID, e.g. "XC17804"
bird_index['name'] = bird_index['file_name'].str[:-4]

In [67]:
bird_index

| | english_cname | file_name | genus | species | label | name |
|---|---|---|---|---|---|---|
| 0 | Abert's Towhee | XC17804.mp3 | Melozone | aberti | Melozone aberti | XC17804 |
| 1 | Abert's Towhee | XC177367.mp3 | Melozone | aberti | Melozone aberti | XC177367 |
| 2 | Abert's Towhee | XC145505.mp3 | Melozone | aberti | Melozone aberti | XC145505 |
| 3 | Abert's Towhee | XC228159.mp3 | Melozone | aberti | Melozone aberti | XC228159 |
| 4 | Abert's Towhee | XC51313.mp3 | Melozone | aberti | Melozone aberti | XC51313 |
| ... | ... | ... | ... | ... | ... | ... |
| 2725 | Yellow-breasted Chat | XC278880.mp3 | Icteria | virens | Icteria virens | XC278880 |
| 2726 | Yellow-breasted Chat | XC247723.mp3 | Icteria | virens | Icteria virens | XC247723 |
| 2727 | Yellow-breasted Chat | XC408122.mp3 | Icteria | virens | Icteria virens | XC408122 |
| 2728 | Yellow-breasted Chat | XC315271.mp3 | Icteria | virens | Icteria virens | XC315271 |


In [68]:
bird_index = bird_index.set_index('name')

In [69]:
bird_index

| name | english_cname | file_name | genus | species | label |
|---|---|---|---|---|---|
| XC17804 | Abert's Towhee | XC17804.mp3 | Melozone | aberti | Melozone aberti |
| XC177367 | Abert's Towhee | XC177367.mp3 | Melozone | aberti | Melozone aberti |
| XC145505 | Abert's Towhee | XC145505.mp3 | Melozone | aberti | Melozone aberti |
| XC228159 | Abert's Towhee | XC228159.mp3 | Melozone | aberti | Melozone aberti |
| XC51313 | Abert's Towhee | XC51313.mp3 | Melozone | aberti | Melozone aberti |
| ... | ... | ... | ... | ... | ... |
| XC278880 | Yellow-breasted Chat | XC278880.mp3 | Icteria | virens | Icteria virens |
| XC247723 | Yellow-breasted Chat | XC247723.mp3 | Icteria | virens | Icteria virens |
| XC408122 | Yellow-breasted Chat | XC408122.mp3 | Icteria | virens | Icteria virens |
| XC315271 | Yellow-breasted Chat | XC315271.mp3 | Icteria | virens | Icteria virens |


In [70]:
bird_index.to_csv('bird_index_clean.csv')

# The Actual Flattening
----
This might also take a while, though it is unlikely to take as long as the feature extraction did.

In [75]:
be = BatchExtractor(frame_length=2048, n_mfcc=20, audio_folder='raw_data/', audio_index=bird_index)

In [83]:
list_of_features = list(be.extraction_dict.keys())

In [84]:
list_of_features

['mfcc',
 'melspec',
 'zcr',
 'ccqt',
 'cstft',
 'ccens',
 'rms',
 'centroid',
 'bandwidth',
 'contrast',
 'flatness',
 'rolloff',
 'tonnetz',
 'poly']

In [85]:
list_of_features.remove('poly')
list_of_features.remove('melspec')

In [86]:
list_of_features

['mfcc',
 'zcr',
 'ccqt',
 'cstft',
 'ccens',
 'rms',
 'centroid',
 'bandwidth',
 'contrast',
 'flatness',
 'rolloff',
 'tonnetz']

In [88]:
all_flat = be.merge_and_flatten_features(
    list_of_features,
    results_folder='feature_extraction/',
    label=True)

KeyboardInterrupt: 

In [None]:
all_flat.head()

In [None]:
all_flat.to_csv('flattened_raw_features.csv')