# Speech Emotion Recognition - Feature Extraction

Databases used

* The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
* Toronto emotional speech set (TESS)

### Import Libraries

Import necessary libraries

In [3]:
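import glob
import os
import librosa
import time
import numpy as np
import pandas as pd
# resampy is not imported directly: librosa uses it internally for res_type='kaiser_fast'
# (it is installed in the next cell)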

In [4]:
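import sys
# Install resampy into this kernel's own Python environment (needed by librosa for res_type='kaiser_fast')
!{sys.executable} -m pip install resampy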



### Load all files

We build a NumPy feature array from Mel-frequency cepstral coefficients (MFCCs). The class labels to predict are extracted from each file's name.

#### Defining emotions to classify 

Select the emotions to classify. Note that 'calm' and 'surprised' occur only in RAVDESS, 'pleasantly surprised' ('ps') occurs only in TESS, and 'neutral' occurs in both. To merge both datasets into 8 emotion classes, we map 'pleasantly surprised' to 'surprised' and RAVDESS's 'fearful' to 'fear'.

In [5]:
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fear',
  '07':'disgust',
  '08':'surprised'
}

# TESS emotions, for experiments on TESS alone ('ps' is relabeled 'surprised' when the data is loaded)
tess_emotions=['angry','disgust','fear','surprised','happy','sad','neutral']

# RAVDESS emotions, for experiments on RAVDESS alone
ravdess_emotions=['neutral','calm','angry', 'happy','disgust','sad','fear','surprised']

observed_emotions = ['sad','angry','happy','disgust','surprised','neutral','calm','fear']
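
RAVDESS labels come from the third dash-separated field of each file name. The `split("-")[2]` lookup in `load_data()` below does exactly this. As a small, self-contained sketch, the hypothetical helper `ravdess_label()` applies the same code-to-label mapping to a single path. It assumes the standard seven-field RAVDESS naming scheme: modality, vocal channel, emotion, intensity, statement, repetition, actor.

```python
import os

# Same code -> label mapping as the `emotions` dictionary above
EMOTIONS = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fear', '07': 'disgust', '08': 'surprised',
}

def ravdess_label(path):
    """Hypothetical helper: return the emotion label of a RAVDESS file, or None.

    Assumes names like 03-01-06-01-02-01-12.wav, i.e.
    modality-vocal_channel-emotion-intensity-statement-repetition-actor.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    fields = stem.split("-")
    if len(fields) != 7:  # not a RAVDESS-style name
        return None
    return EMOTIONS.get(fields[2])

print(ravdess_label("Actor_12/03-01-06-01-02-01-12.wav"))  # fear
```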

#### Feature extraction

We extract the MFCC features with the librosa package. The function below loads the file at the given path and resamples it to librosa's default rate of 22,050 Hz. It then computes 40 MFCCs per frame (`n_mfcc=40`) and returns their mean over time.

https://librosa.org/librosa/generated/librosa.feature.mfcc.html

In [6]:
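def extract_feature(file_name, mfcc):
    # Load the audio; librosa resamples to its default rate (22,050 Hz).
    # res_type='kaiser_fast' requires the resampy package.
    X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    # result starts as None, so np.hstack below yields a 41-element object array
    # whose first entry is None (this column is dropped later as column 0).
    result = None
    if mfcc:
        # 40 MFCCs per frame, averaged over time -> one 40-dim vector per file
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    return result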
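
Because `result` starts as `None`, `np.hstack((None, mfccs))` returns a 41-element object array whose first entry is `None`. This is why `X` ends up with 41 columns and column 0 is dropped later. The NumPy-only sketch below reproduces this on a random stand-in for the `librosa.feature.mfcc` output, which has shape `(n_mfcc, n_frames)`, so it runs without any audio files. It also shows that the time-averaged vector itself has exactly 40 entries. Returning `mfccs` directly would therefore be a cleaner option, although the later column-dropping step would then have to change as well.

```python
import numpy as np

# Random stand-in for librosa.feature.mfcc(...): 40 coefficients x 120 frames
rng = np.random.default_rng(0)
mfcc_matrix = rng.standard_normal((40, 120)).astype(np.float32)

# Mean-pooling over time, as in extract_feature: transpose to
# (n_frames, n_mfcc), then average over the frame axis
pooled = np.mean(mfcc_matrix.T, axis=0)
print(pooled.shape)  # (40,)

# What extract_feature returns: the None placeholder is prepended
with_none = np.hstack((None, pooled))
print(with_none.shape, with_none.dtype, with_none[0])  # (41,) object None
```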

#### Choosing a dataset

Choose the dataset(s) to load by setting the flags in the following function.

In [7]:
def dataset_options():
    # choose datasets
    ravdess = True
    tess = True
    data = {'ravdess':ravdess, 'tess':tess}
    print(data)
    return data
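
You might prefer to select datasets without editing the function body. The sketch below shows a hypothetical keyword-argument variant. It keeps the same defaults and always includes a `ravdess_speech` key, so code that reads `data['ravdess_speech']` cannot raise `KeyError`.

```python
def dataset_options(ravdess=True, tess=True, ravdess_speech=False):
    """Hypothetical variant: dataset flags as keyword arguments.

    The 'ravdess_speech' key is always present, so looking it up is safe
    even when ravdess=False.
    """
    data = {'ravdess': ravdess, 'tess': tess, 'ravdess_speech': ravdess_speech}
    print(data)
    return data

# Example: load only the RAVDESS speech-only folder layout
opts = dataset_options(ravdess=False, tess=False, ravdess_speech=True)
```

Calling it with no arguments still returns the original `{'ravdess': True, 'tess': True}` selection, plus `ravdess_speech` set to `False`.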

#### Load data

Load data from the datasets selected by `dataset_options()`. For every file in the chosen datasets whose emotion is in `observed_emotions`, extract features with the `extract_feature()` function defined above.

In [10]:
from sklearn.model_selection import train_test_split

def load_data(test_size=0.2):
    # Note: test_size is currently unused; no train/test split happens here.
    X, y = [], []
    mfcc = True
    data = dataset_options()
    paths = []

    # RAVDESS: the emotion code is the third dash-separated field of the file name
    if data['ravdess']:
        paths.append("../datasets/RAVDESS/Actor_*/*.wav")
    elif data.get('ravdess_speech', False):  # .get avoids a KeyError when the flag is absent
        paths.append("../datasets/RAVDESS/audio_speech_actors_01-24/Actor_*/*.wav")

    for path in paths:
        for file in glob.glob(path):
            file_name = os.path.basename(file)
            emotion = emotions.get(file_name.split("-")[2])
            if emotion not in observed_emotions:
                continue
            feature = extract_feature(file, mfcc)
            X.append(feature)
            y.append(emotion)

    # TESS: file names look like OAF_back_angry.wav; the emotion is the third
    # underscore-separated field without the ".wav" extension
    if data['tess']:
        for file in glob.glob("../datasets/TESS/*AF_*/*.wav"):
            file_name = os.path.basename(file)
            emotion = file_name.split("_")[2][:-4]
            if emotion == 'ps':  # map 'pleasantly surprised' onto 'surprised'
                emotion = 'surprised'
            if emotion not in observed_emotions:
                continue
            feature = extract_feature(file, mfcc)
            X.append(feature)
            y.append(emotion)

    return {'X': X, 'y': y}
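
TESS labels are parsed from the file name and then harmonized with the RAVDESS classes. The sketch below isolates that step in a hypothetical helper, `tess_label()`, so it can be checked without audio files. It assumes names of the form `<speaker>_<word>_<emotion>.wav`. As a slightly more defensive choice than `split("_")[2][:-4]`, it takes the last underscore field after stripping the extension and lowercases it.

```python
import os

OBSERVED = ('sad', 'angry', 'happy', 'disgust', 'surprised', 'neutral', 'calm', 'fear')

def tess_label(path, observed=OBSERVED):
    """Hypothetical helper: harmonized TESS label, or None if filtered out.

    Assumes names like OAF_back_angry.wav (<speaker>_<word>_<emotion>.wav).
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    emotion = stem.split("_")[-1].lower()
    if emotion == 'ps':  # 'pleasantly surprised' -> 'surprised'
        emotion = 'surprised'
    return emotion if emotion in observed else None

print(tess_label("TESS/OAF_angry/OAF_back_angry.wav"))  # angry
print(tess_label("TESS/YAF_pleasant_surprised/YAF_dog_ps.wav"))  # surprised
```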


In [11]:
start_time = time.time()

Trial_list = load_data(test_size = 0.3)

print("--- Data loaded. Loading time: %s seconds ---" % (time.time() - start_time))

{'ravdess': True, 'tess': True}
--- Data loaded. Loading time: 108.83538341522217 seconds ---


In [12]:
X = Trial_list['X']
y = Trial_list['y']


In [None]:
# converting X and y into DataFrames
X = pd.DataFrame(X)
y = pd.DataFrame(y)
# printing the shapes of X and y
print(X.shape, y.shape)



(4240, 41) (4240, 1)
         0
0  neutral
1  neutral
2  neutral


In [29]:
# drop column 0, which holds the None placeholder produced by extract_feature()
X = X.drop([0], axis = 1)

In [30]:
#renaming the label column to emotion
y=y.rename(columns= {0: 'emotion'})

In [31]:
# concatenating the features and labels into a single DataFrame
data = pd.concat([X, y], axis=1)
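
The three steps above (drop the `None` column, rename the label column, concatenate) can be checked on a toy example. The sketch below uses two hypothetical rows shaped like `extract_feature()` output, with 3 coefficients instead of 40. The `.astype(float)` call is an optional safeguard to guarantee numeric feature columns. It is not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like extract_feature() output: None + pooled MFCCs
rows = [np.hstack((None, np.array([1.0, 2.0, 3.0]))),
        np.hstack((None, np.array([4.0, 5.0, 6.0])))]
labels = ['neutral', 'calm']

X_toy = pd.DataFrame(rows).drop(columns=[0]).astype(float)   # remove the None column
y_toy = pd.DataFrame(labels).rename(columns={0: 'emotion'})  # name the label column
data_toy = pd.concat([X_toy, y_toy], axis=1)                 # features + label side by side
print(data_toy)
```

Because both frames share the default 0..n-1 index, `pd.concat(..., axis=1)` keeps each feature row next to its own label.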

In [32]:
data.head()

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -700.048035 | 58.141853 | -2.558607 | 15.606677 | 4.635053 | 3.539141 | -6.117565 | -0.382448 | -13.615901 | -0.362572 | ... | -2.726629 | -2.060206 | -2.52632 | -2.485008 | -2.288239 | -0.331254 | -2.540937 | -2.723592 | -2.317618 | neutral |
| 1 | -695.18512 | 58.720722 | -4.875793 | 19.315145 | 5.611961 | 2.971206 | -4.385363 | -2.403248 | -14.377567 | 1.257611 | ... | -3.063972 | -1.711842 | -2.929794 | -2.519809 | -1.328666 | -0.747359 | -3.644397 | -2.642019 | -2.881524 | neutral |
| 2 | -693.690125 | 61.060158 | -2.849076 | 16.58725 | 2.475743 | 3.980026 | -4.803674 | -2.774134 | -12.816862 | -1.313836 | ... | -2.540189 | -1.947149 | -2.386609 | -2.251025 | -2.516198 | -0.548676 | -3.300256 | -2.928508 | -2.8335 | neutral |
| 3 | -687.243042 | 58.965412 | -0.275306 | 16.264652 | 4.040917 | 5.848977 | -4.356924 | -4.302236 | -12.883506 | -0.87125 | ... | -2.549757 | -2.452884 | -3.237183 | -2.73627 | -1.983761 | -0.403979 | -3.016366 | -2.839689 | -3.957229 | neutral |
| 4 | -729.579956 | 65.916191 | -0.407426 | 18.537952 | 4.73664 | 5.225765 | -6.456389 | -0.714811 | -12.648291 | -2.119253 | ... | -1.738711 | -1.325561 | -3.047879 | -1.114102 | -1.098809 | -1.092104 | -2.434083 | -3.135654 | -3.43898 | calm |


### Shuffling data

In [33]:
#reindexing to shuffle the data at random
data = data.reindex(np.random.permutation(data.index))

data.head()

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1592 | -395.823669 | 20.780153 | -15.159817 | 19.788443 | -7.45394 | 7.962642 | -23.476196 | -3.406429 | -18.738638 | -6.491398 | ... | 2.654102 | 8.136974 | 2.654742 | 2.878156 | -2.402546 | 4.308104 | 5.262727 | 1.226714 | -3.296388 | angry |
| 3286 | -340.642456 | 29.599724 | -16.993773 | 29.22426 | -7.011916 | -1.924617 | -16.214605 | -2.39591 | -14.465276 | 8.265388 | ... | -3.249768 | -7.267468 | -5.96172 | 2.313182 | 2.533934 | -2.365486 | 1.930589 | 5.542202 | 7.500679 | fear |
| 1665 | -461.427429 | 89.804863 | 9.866475 | 3.401636 | 6.330132 | 7.988198 | -17.29118 | 11.002507 | -18.272661 | 8.218304 | ... | 4.16938 | 2.105675 | 6.14001 | 8.093767 | 9.421249 | 9.666543 | 8.946437 | 11.192577 | 7.34114 | disgust |
| 263 | -750.482605 | 82.664307 | 11.087944 | 28.12616 | 4.056847 | 8.989374 | -2.771456 | 0.636028 | -5.697156 | 12.327049 | ... | -1.10849 | 0.065122 | 0.040041 | 1.43629 | 0.06382 | -0.999991 | -1.370145 | -2.041402 | 0.107543 | sad |
| 2871 | -394.24942 | 28.459082 | -6.430383 | 29.318998 | -16.961685 | 5.261229 | -5.731931 | -10.414945 | -8.974883 | 1.560879 | ... | 10.103578 | 5.839454 | 8.201289 | 6.291386 | 7.951701 | 4.236954 | 1.574697 | 0.377521 | 1.418085 | angry |
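
`np.random.permutation` draws from NumPy's global random state, so the row order changes on every run. If a reproducible order is wanted, the sketch below shows two seeded alternatives on a toy DataFrame with hypothetical values. The first is the same reindex-by-permutation approach driven by a seeded `Generator`. The second uses pandas' `DataFrame.sample` with `frac=1`. The seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the feature/label DataFrame (hypothetical values)
df = pd.DataFrame({'f1': [0.1, 0.2, 0.3, 0.4],
                   'emotion': ['sad', 'angry', 'happy', 'calm']})

# Option 1: reindex by a permutation drawn from a seeded Generator
rng = np.random.default_rng(42)
shuffled_a = df.reindex(rng.permutation(df.index.to_numpy()))

# Option 2: pandas row sampling; frac=1 returns every row in random order
shuffled_b = df.sample(frac=1, random_state=42)

# Both keep each feature row aligned with its label; only the row order changes
print(shuffled_a)
print(shuffled_b)
```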


In [34]:
# Storing shuffled ravdess and tess data to avoid loading again
data.to_csv("RAVTESS_MFCC_Observed.csv")
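
`to_csv` writes the shuffled row index as an unnamed first column. When the file is read back without `index_col=0`, pandas names that column `Unnamed: 0`, and the integer feature labels come back as strings. The sketch below shows this round trip on a small in-memory frame, assuming the cached CSV is later reloaded with `pd.read_csv`.

```python
import io
import pandas as pd

# Toy frame with a shuffled (non-sequential) index, like `data` above
df = pd.DataFrame({1: [-395.8, -340.6], 2: [20.8, 29.6], 'emotion': ['angry', 'fear']},
                  index=[1592, 3286])

buf = io.StringIO()
df.to_csv(buf)  # the index is written as an unnamed first column
buf.seek(0)
naive = pd.read_csv(buf)
print(list(naive.columns))  # ['Unnamed: 0', '1', '2', 'emotion']

buf.seek(0)
restored = pd.read_csv(buf, index_col=0)  # use the first column as the index
print(list(restored.columns), list(restored.index))  # ['1', '2', 'emotion'] [1592, 3286]
```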