This notebook is the exploratory phase for extracting the MFCC features.

Here I vary the number of samples per window, the number of samples between successive windows, and the number of MFCCs generated.

Each variation is saved under a file name that reflects the parameter that was changed.

The parameters are as follows:

n_mfcc=number of MFCC coefficients generated per frame

n_fft=number of samples per analysis window (the FFT length)

hop_length=number of samples between the starts of successive windows (the overlap between windows is n_fft - hop_length samples)
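
The relationship between these parameters can be sketched in a few lines. Since hop_length is the step between window starts, the overlap between two successive windows is n_fft - hop_length. This is a minimal illustration, not part of the extraction pipeline: the helper name `window_summary` is hypothetical, and the values n_fft=2048, hop_length=512, and sr=12000 are the ones used later in this notebook.

```python
def window_summary(n_fft, hop_length, sr):
    """Summarize framing parameters (hypothetical helper for illustration)."""
    overlap = n_fft - hop_length  # samples shared by two successive windows
    return {
        "overlap_samples": overlap,
        "overlap_fraction": overlap / n_fft,
        "window_ms": 1000 * n_fft / sr,      # window length in milliseconds
        "hop_ms": 1000 * hop_length / sr,    # step between windows in milliseconds
    }

summary = window_summary(n_fft=2048, hop_length=512, sr=12000)
print(summary)
```

With these settings each window spans about 171 ms and successive windows share three quarters of their samples.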

The y labels are saved only once because they do not change with the MFCC parameters.

In [1]:
#necessary libraries
import pandas as pd
import numpy as np
import librosa
import os

from IPython.display import Audio as au

### import the csv metadata

In [2]:
#read the text file with speaker name and ID
df=pd.read_csv('Data/speakers.csv')

In [3]:
#the only critical feature is ID because it will be used in the supervised classification algorithm as the label
#the NAME is useful if interested in who specifically caused the greatest classification errors
df.head(3)

Unnamed: 0,ID,SEX,SUBSET,MINUTES,NAME
0,14,| F |,train-clean-360,| 25.03 |,Kristin LeMoine
1,16,| F |,train-clean-360,| 25.11 |,Alys AtteWater
2,17,| M |,train-clean-360,| 25.04 |,Gord Mackenzie


### sample load

In [4]:
#load a speech file to sample
speech, sr=librosa.load('Data/train-clean-100/8838/298545/8838-298545-0043.flac')

In [5]:
speech.shape

(343098,)

In [6]:
sr

22050
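
The two outputs above together give the clip length. `librosa.load` resamples to 22050 Hz unless `sr` is passed, so the duration is the sample count divided by that rate. A quick check, using only the numbers printed above:

```python
n_samples = 343098  # speech.shape[0] from the cell above
sr = 22050          # sample rate returned by librosa.load (its default)

duration_s = n_samples / sr
print(round(duration_s, 2))  # 15.56
```

This clip is well under the 25-second target length used in the extraction loop further down, so it would be zero-padded there.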

In [7]:
#listen to a speech file (note: this is utterance 0044, not the 0043 file loaded above)
au('Data/train-clean-100/8838/298545/8838-298545-0044.flac')

### create a loop for the structure of the file location
### compute the MFCCs for each file

In [8]:
#shortcuts to librosa functions
load=librosa.load
mfcc=librosa.feature.mfcc

#choose how long each speech file will be (pads with zeros if too short, which they all are)
#change dur if the length should be shorter
sr=12000 #target sample rate in Hz
dur=25 #target duration in seconds
n_sample_fit=int(sr*dur)

#load speech files to extract mel coefficients & speaker ID
mels=[] #will hold feature data
labs=[] #will hold label data
#subfolder dive routine: path/<speaker ID>/<subfolder2>/<filename>.flac
path=os.path.abspath(os.path.join("./Data/train-other-500/"))
for subfolder in os.listdir(path):
    #print('subfolder:',subfolder)
    for subfolder2 in os.listdir(os.path.join(path,subfolder)):
        #print('subfolder2:',subfolder2)
        for filename in os.listdir(os.path.join(path,subfolder,subfolder2)):
            #print('filename:',filename)
            if filename.endswith('.flac'):
                #load the speech file resampled to sr
                #the returned rate equals sr, so it is discarded instead of overwriting sr
                speech, _=load(os.path.join(path,subfolder,subfolder2,filename), res_type='kaiser_best', sr=sr)
                #zero-pad because every file is shorter than dur seconds x sr
                #(a longer file would make the pad length negative and raise an error)
                n_sample=speech.shape[0]
                speech=np.hstack((speech, np.zeros((n_sample_fit - n_sample,))))
                #extract mfcc features from data
                x=mfcc(y=speech, n_mfcc=32, sr=sr, n_fft=2048, hop_length=512)
                mels.append(x)
                #the top-level folder name is the speaker ID, used as the label
                y=int(subfolder)
                labs.append(y)
                #print('Success! Filename=%s' % filename)

EOFError: 
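
The `EOFError` above suggests that at least one file could not be read to the end, and the zero-padding line assumes every clip is shorter than the target length. One possible way to make both steps more forgiving is sketched below. This is an assumption about how the loop could be hardened, not what the notebook ran: `fit_length` and `load_or_skip` are hypothetical helpers, and `loader` stands in for any function, such as `librosa.load`, that returns `(signal, sr)`.

```python
import numpy as np

def fit_length(signal, n_target):
    """Zero-pad or truncate a 1-D signal to exactly n_target samples."""
    n = signal.shape[0]
    if n >= n_target:
        return signal[:n_target]
    return np.hstack((signal, np.zeros(n_target - n, dtype=signal.dtype)))

def load_or_skip(loader, path):
    """Return the loaded signal, or None if the file ends unexpectedly."""
    try:
        signal, _ = loader(path)
    except EOFError:
        print('Skipping unreadable file: %s' % path)
        return None
    return signal

# demonstration with stand-in data rather than real audio
padded = fit_length(np.ones(3), 5)         # too short: zeros appended
trimmed = fit_length(np.arange(6.0), 4)    # too long: cut to 4 samples

def broken_loader(path):
    raise EOFError  # imitates a truncated file

skipped = load_or_skip(broken_loader, 'example.flac')
```

In the loop, a `None` result would be skipped before the MFCC step, so one bad file would no longer end the whole run.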

In [9]:
len(mels)

18580

In [10]:
len(labs)

18580

In [11]:
#check that the number of frames per MFCC matrix is constant (586) across all audio files
count=0
for i in range(len(mels)):
    if mels[i].shape[1]==586:
        count=count+1
print(count)

18580
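
The value 586 checked above follows from the padded length and the hop size. With librosa's default centered framing, a signal of n samples yields 1 + n // hop_length frames. A short check with the notebook's settings (the function name `n_frames` is hypothetical):

```python
def n_frames(n_samples, hop_length):
    """Frame count for centered framing, as in librosa's default center=True."""
    return 1 + n_samples // hop_length

sr = 12000
dur = 25
n_sample_fit = int(sr * dur)  # 300000 samples after zero-padding

frames = n_frames(n_sample_fit, hop_length=512)
print(frames)  # 586
```

Each entry of mels therefore has shape (n_mfcc, 586), which is (32, 586) for the run above.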


In [12]:
#convert the features to numpy array for saving
mfccs=np.array(mels)

### New Baseline Model
#### n_mfcc=4,8,32
#### n_fft=2048
#### hop_length=512

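#new baseline - clean data
#(run only when mfccs and labs hold the clean-subset features for the n_mfcc in the file name)
np.save('baseline4_x.npy', mfccs)
np.save('baseline4_y.npy', labs)

#new baseline - other data (train-other-500, n_mfcc=32)
np.save('baseline_other32_x.npy', mfccs)
np.save('baseline_other32_y.npy', labs)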
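
Since each parameter variation is saved under its own name, a small helper can keep the file names consistent. The sketch below is only one possible convention: `feature_filename` and its naming pattern are hypothetical and do not match the file names used above, and random data in a temporary directory stands in for the real features.

```python
import os
import tempfile
import numpy as np

def feature_filename(prefix, n_mfcc, n_fft, hop_length):
    """Build a file name that records the MFCC parameters (hypothetical pattern)."""
    return '%s_mfcc%d_fft%d_hop%d_x.npy' % (prefix, n_mfcc, n_fft, hop_length)

# stand-in array shaped like (n_files, n_mfcc, n_frames)
features = np.random.rand(3, 32, 586)

name = feature_filename('baseline_other', n_mfcc=32, n_fft=2048, hop_length=512)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, name)
    np.save(target, features)    # write the array in .npy format
    restored = np.load(target)   # read it back to confirm the round trip

print(name, restored.shape)
```

Encoding all three parameters in the name avoids having to remember which settings produced a given file.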