# Feature Engineering

In [103]:
# import packages

import pandas as pd
import numpy as np
import os

import librosa as lr

import warnings
warnings.filterwarnings("ignore")

In [736]:
# load dataframe

audio_df = pd.read_csv('/Users/spence/Documents/GitHub/Urban Sounds Classifier/UrbanSound8K/metadata/audio_features.csv')

In [737]:
audio_df

| Unnamed: 0 | slice_file_name | fsID | start | end | salience | fold | classID | class | num_channels | sample_rate | bit_depth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100032-3-0-0.wav | 100032 | 0.000000 | 0.317551 | 1 | 5 | 3 | dog_bark | 2 | 44100 | 16 |
| 1 | 100263-2-0-117.wav | 100263 | 58.500000 | 62.500000 | 1 | 5 | 2 | children_playing | 2 | 44100 | 16 |
| 2 | 100263-2-0-121.wav | 100263 | 60.500000 | 64.500000 | 1 | 5 | 2 | children_playing | 2 | 44100 | 16 |
| 3 | 100263-2-0-126.wav | 100263 | 63.000000 | 67.000000 | 1 | 5 | 2 | children_playing | 2 | 44100 | 16 |
| 4 | 100263-2-0-137.wav | 100263 | 68.500000 | 72.500000 | 1 | 5 | 2 | children_playing | 2 | 44100 | 16 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8719 | 99812-1-2-0.wav | 99812 | 159.522205 | 163.522205 | 2 | 7 | 1 | car_horn | 2 | 44100 | 16 |
| 8720 | 99812-1-3-0.wav | 99812 | 181.142431 | 183.284976 | 2 | 7 | 1 | car_horn | 2 | 44100 | 16 |
| 8721 | 99812-1-4-0.wav | 99812 | 242.691902 | 246.197885 | 2 | 7 | 1 | car_horn | 2 | 44100 | 16 |
| 8722 | 99812-1-5-0.wav | 99812 | 253.209850 | 255.741948 | 2 | 7 | 1 | car_horn | 2 | 44100 | 16 |


In [738]:
audio_df['sound_len'] = audio_df.end - audio_df.start 

## Feature Extraction

Next, we extract features from the audio files. Several audio characteristics can be derived from our samples; we use the following three.

1. Mel Frequency Cepstral Coefficients (MFCCs) - This is one of the most widely used feature extraction methods for audio. It makes use of the Mel scale, a perceptual scale of pitches judged by listeners to be equally spaced apart. The conversion from *hertz* to *mels* is logarithmic, so equal steps in mels correspond to increasingly large steps in hertz at higher frequencies.
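For reference, one common form of the hertz-to-mel conversion (the HTK formula) is m = 2595 · log10(1 + f / 700). The sketch below is a plain NumPy version of it and its inverse; the function names are our own. librosa exposes the same formula through `lr.hz_to_mel(f, htk=True)`, while its default (`htk=False`) uses the Slaney variant, which is linear below 1 kHz, so values will differ from this sketch unless `htk=True` is passed.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

freqs = np.array([100.0, 1000.0, 4000.0, 8000.0])
mels = hz_to_mel(freqs)
print(mels)
```

Going from 4,000 to 8,000 Hz adds only roughly 690 mels, while going from 100 to 1,000 Hz adds roughly 850: this compression of high frequencies is what the mel scale is meant to model.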

MFCCs are derived as follows ([read more about MFCCs here](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)):
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows or alternatively, cosine overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform of the list of mel log powers, as if it were a signal.
5. The MFCCs are the amplitudes of the resulting spectrum.
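To make the five steps concrete, here is a simplified NumPy/SciPy sketch of the pipeline. It is not librosa's implementation: `lr.feature.mfcc` additionally pads and centers frames, uses Slaney-style mel filters by default, and converts mel power to decibels before the DCT, so its values will not match this sketch. The frame size, hop length, and filter count are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # triangular filters whose edges are equally spaced on the mel scale
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_edges) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising slope
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def simple_mfcc(y, sr, n_mfcc=13, n_fft=512, hop_length=256, n_mels=40):
    # assumes len(y) >= n_fft
    # 1. slice into overlapping frames, apply a Hann window, take the FFT
    n_frames = 1 + (len(y) - n_fft) // hop_length
    frames = np.stack([y[i * hop_length : i * hop_length + n_fft] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2
    # 2. map the power spectrum onto the mel scale with triangular filters
    mel_power = power @ mel_filterbank(n_mels, n_fft, sr).T
    # 3. log of the power in each mel band (small offset avoids log(0))
    log_mel = np.log(mel_power + 1e-10)
    # 4-5. DCT of each frame's log-mel vector; keep the first n_mfcc amplitudes
    coeffs = dct(log_mel, type=2, axis=1, norm="ortho")[:, :n_mfcc]
    return coeffs.T  # (n_mfcc, n_frames), same orientation as librosa

sr = 22050
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
coeffs = simple_mfcc(tone, sr)
print(coeffs.shape)  # (13, 85)
```

Each column of the result describes one short frame of audio, which is why the output can be viewed as a coefficients-by-time image.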

Essentially, MFCC extraction turns each audio file into a 2-D array (MFCC coefficients × time frames) that can be treated like a grayscale image. We will then feed these images into a Convolutional Neural Network and classify them, much as we would in regular image classification.

In [428]:
# extract the Mel Frequency Cepstral Coefficients for each sound
mfccs = []
max_frames = 174  # number of MFCC frames produced by the longest clips in the data set

# iterate through dataframe
for index, row in audio_df.iterrows():
    # access file path for each audio file
    f_name = '/Users/spence/Documents/GitHub/Urban Sounds Classifier/UrbanSound8K/audio/fold' + str(row["fold"]) + '/' + str(row["slice_file_name"])
    # use librosa to load the file (resampled to librosa's default rate of 22,050 Hz)
    audio_ts, sr = lr.load(f_name, res_type='kaiser_fast')
    # use librosa to extract 40 MFCCs; the result has shape (40, n_frames)
    mel_freq = lr.feature.mfcc(y=audio_ts, sr=sr, n_mfcc=40)
    # guard: truncate any clip longer than max_frames, since np.pad rejects negative widths
    mel_freq = mel_freq[:, :max_frames]
    # right-pad the time axis with zeros so every clip has exactly max_frames frames
    padding = max_frames - mel_freq.shape[1]
    mfcc = np.pad(mel_freq, pad_width=((0, 0), (0, padding)), mode='constant')
    # mel_scaled = np.mean(mel_freq.T, axis=0)
    # append each mfcc to list
    mfccs.append(mfcc)


In [472]:
# check the length of each item in nested list 

len(mfccs) , len(mfccs[0]), len(mfccs[0][0])

(8724, 40, 174)

Understanding the dimensions of the list `mfccs` created above:

1. The list has 8,724 items, one for each audio file; this is also the number of rows in the dataframe.
2. Each item is a 2-D array with 40 rows, one per MFCC coefficient, because we set `n_mfcc=40` in the loop above. librosa's default is 20, and 20 to 40 coefficients is a common choice.
3. Each of those 40 rows contains 174 floating-point values, one per time frame (shorter clips are zero-padded at the end). Together, these values characterize the sound.
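A CNN typically expects a single array rather than a list of arrays, and `np.stack` only works because every item was padded to the same shape. A minimal sketch of that conversion, using a small synthetic stand-in for `mfccs` (`mfccs_demo` is hypothetical) and assuming a Keras-style channels-last layout, which the notebook has not committed to yet. With the real list, `np.stack(mfccs)` would give shape (8724, 40, 174).

```python
import numpy as np

# hypothetical stand-in for `mfccs`: 5 clips, each (40 coefficients, 174 frames)
mfccs_demo = [np.zeros((40, 174), dtype=np.float32) for _ in range(5)]

# stack the list into one array: (clips, coefficients, frames)
X = np.stack(mfccs_demo)

# add a trailing channel axis, like a single-channel (grayscale) image
X = X[..., np.newaxis]
print(X.shape)  # (5, 40, 174, 1)
```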

In [739]:
# add the extracted MFCCs to the dataframe as a column

audio_df['mfccs'] = mfccs 

In [475]:
# save dataframe 

audio_df.to_pickle('audio_df.pkl')

--------------

2. Chromagrams - The chroma feature of an audio file, also known as a 'pitch class profile', describes how the signal's energy is distributed across the 12 pitch classes of the chromatic scale (C through B), regardless of octave. The values derived from this extraction are typically used for music classification because of their emphasis on pitch. I am interested in applying this method to urban sound because pitch content might help distinguish between similar but distinct sounds, like two droning sounds generated by different sources; ex: an engine and an air conditioner. Chroma features are generally described as robust to changes in timbre (the tonal quality of a sound), so any benefit here is more likely to come from differences in pitch than in timbre.

[More on Chromagrams](https://en.wikipedia.org/wiki/Chroma_feature#:~:text=Shifting%20the%20time%20window%20across,also%20referred%20to%20as%20chromagram.)
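Conceptually, a chromagram folds every frequency onto one of 12 pitch classes regardless of octave. The toy sketch below shows only that folding step, using the standard MIDI mapping (A4 = 440 Hz is note 69, 12 semitones per octave) on a few hypothetical frequency/magnitude pairs; the function names are our own. It is a simplified illustration, not how librosa works internally: `lr.feature.chroma_stft` applies a chroma filter bank (`librosa.filters.chroma`) to every STFT frame and normalizes each frame.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz):
    # MIDI note number (69 = A4 = 440 Hz), folded onto 0..11 (0 = C)
    midi = 69 + 12 * np.log2(np.asarray(freq_hz, dtype=float) / 440.0)
    return np.round(midi).astype(int) % 12

def naive_chroma(freqs_hz, magnitudes):
    # sum spectral magnitude into 12 pitch-class bins, then scale so the max is 1
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class(freqs_hz), magnitudes)
    return chroma / chroma.max() if chroma.max() > 0 else chroma

freqs = np.array([440.0, 880.0, 261.63])  # A4, A5, C4
mags = np.array([1.0, 1.0, 0.5])
c = naive_chroma(freqs, mags)
print(PITCH_CLASSES[int(np.argmax(c))])  # A
```

Both A4 (440 Hz) and A5 (880 Hz) land in the same "A" bin; that octave invariance is what makes chroma a pitch-class feature rather than a pitch feature.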


In [433]:
# extract the Chromagram for each sound 
chrome = []

for index, row in audio_df.iterrows():
    f_name= '/Users/spence/Documents/GitHub/Urban Sounds Classifier/UrbanSound8K/audio/fold' + str(row["fold"]) + '/' + str(row["slice_file_name"])
    audio_ts, sr = lr.load(f_name, res_type='kaiser_fast')
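    # magnitude spectrogram from the short-time Fourier transform
    s = np.abs(lr.stft(audio_ts))
    # chromagram of shape (12, n_frames), averaged over time into 12 values (one per pitch class)
    c = np.mean(lr.feature.chroma_stft(S=s, sr=sr).T, axis=0)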
    chrome.append(c)

In [733]:
# observe dimensions of our newly created 'chrome' list 

len(chrome) , len(chrome[0])

(8724, 12)

In [478]:
# add zeros, as padding to match the chroma array length with mfcc

needed_pads = len(mfccs[0][0]) - len(chrome[0])

pads = [0 for i in range(0,needed_pads)]

chroma = []
for l in chrome:
    l = np.append(l, pads)
    chroma.append(l)


# observe new length 
len(chroma), len(chroma[0])

(8724, 174)

In [741]:
# append chroma to df 

audio_df['chroma'] = chroma 

In [742]:
# reshape chroma column to make it compatible with mfccs column; taking each value from a 1-D to a 2-D array 

audio_df['chroma'] = audio_df['chroma'].apply(lambda x: np.reshape(x, (1, 174)))

In [744]:
# concatenate the 'mfccs' and 'chroma' columns into one column 

audio_df['features'] = audio_df.apply( lambda row: np.concatenate((row['mfccs'], row['chroma']), axis=0) , axis=1 )
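To see what each row of the combined `features` array holds, here is a small sketch that repeats the pad, reshape, and concatenate steps above on a single clip, using random stand-ins (the names `mfcc_demo`, `chroma_mean`, and `chroma_row` are hypothetical) and checking the resulting layout.

```python
import numpy as np

N_MFCC, MAX_FRAMES = 40, 174
rng = np.random.default_rng(0)

# hypothetical stand-ins for one clip's features
mfcc_demo = rng.standard_normal((N_MFCC, MAX_FRAMES))  # like one entry of `mfccs`
chroma_mean = rng.random(12)                           # like one entry of `chrome`

# zero-pad the 12 chroma means to MAX_FRAMES and make them a single (1, 174) row
chroma_row = np.pad(chroma_mean, (0, MAX_FRAMES - chroma_mean.size)).reshape(1, MAX_FRAMES)

# stack the chroma row under the 40 MFCC rows
features = np.concatenate((mfcc_demo, chroma_row), axis=0)
print(features.shape)  # (41, 174)
```

One consequence of this layout: the MFCC rows are indexed by time frame, but the chroma row holds 12 time-averaged pitch-class values followed by 162 zeros, so its columns do not line up with time the way the MFCC columns do.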

--------------

3. Zero Crossing Rate - This is the rate at which a signal's amplitude crosses zero (changes sign) within a given time interval. It is useful for identifying percussive sounds because it is sensitive to transients and to the ADSR (attack, decay, sustain, release) characteristics of a sound: quick, sudden strikes vs. slowly rising sounds, or sounds that slowly tail off vs. sounds that end abruptly. I figured it could be useful here because the sounds in the data set have diverse ADSR characteristics.

[More on Zero Crossing Rate](https://www.sciencedirect.com/topics/engineering/zero-crossing-rate)
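A minimal NumPy sketch of a framed zero crossing rate: count sign changes between consecutive samples in each frame and divide by the frame length. The frame and hop sizes mirror librosa's defaults for `lr.feature.zero_crossing_rate` (2048 and 512), but this version does not center or pad frames as librosa does, so frame counts and edge values will differ slightly; the function name is hypothetical.

```python
import numpy as np

def zero_crossing_rate(y, frame_length=2048, hop_length=512):
    # assumes len(y) >= frame_length
    n_frames = 1 + (len(y) - frame_length) // hop_length
    rates = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop_length : i * hop_length + frame_length]
        negative = np.signbit(frame)  # True where the sample is negative
        # a crossing is any position where the sign differs from the previous sample
        crossings = np.count_nonzero(negative[1:] != negative[:-1])
        rates[i] = crossings / frame_length
    return rates

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
zcr = zero_crossing_rate(tone)
print(zcr.shape)  # (40,)
```

A pure tone at f Hz crosses zero 2f times per second, so this 1,000 Hz sine gives a rate near 2000 / 22050 ≈ 0.091 per sample; noisier, brighter sounds push the rate higher, while silence gives 0.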


In [692]:
# extract the zero crossing rate for each sound 

zero_cr = []

for index, row in audio_df.iterrows():
    f_name= '/Users/spence/Documents/GitHub/Urban Sounds Classifier/UrbanSound8K/audio/fold' + str(row["fold"]) + '/' + str(row["slice_file_name"])
    audio_ts, sr = lr.load(f_name, res_type='kaiser_fast')
    z = lr.feature.zero_crossing_rate(y=audio_ts)
    zero_cr.append(z)

In [746]:
# check array lengths 

len(zero_cr), len(zero_cr[0]), len(zero_cr[0][0])

(8724, 1, 14)

In [727]:
# add zeros, as padding to match the array length with mfcc

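zero = []
max_len = len(mfccs[0][0])  # 174 frames, to match the MFCC arrays

for l in zero_cr:
    # each ZCR array has shape (1, n_frames); flatten it and, as a guard,
    # truncate anything longer than the MFCC frame count
    z = np.ravel(l)[:max_len]
    # right-pad with zeros up to max_len
    needed_pads = max_len - len(z)
    zero.append(np.pad(z, (0, needed_pads), mode='constant'))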


In [747]:
# observe new length 

len(zero), len(zero[0])

(8724, 174)

In [748]:
# create column in df

audio_df['zero_cr'] = zero 

In [749]:
# reshape column so it's compatible with 'features' column 

audio_df['zero_cr'] = audio_df['zero_cr'].apply(lambda x: np.reshape(x, (1, 174)))

In [750]:
# join our 'zero_cr' feature with the others into single column 

audio_df['features'] = audio_df.apply( lambda row: np.concatenate((row['features'], row['zero_cr']), axis=0) , axis=1 )

In [753]:
# save df 

audio_df.to_pickle('audio_df_xtra.pkl')

In [None]:
# audio_df.drop(['chrome'], axis=1, inplace=True)