# Speech Emotion Recognition with MLP Classifier



# Dataset
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) 

---
Audio-only files

Audio-only files of all actors (01-24) are available as two separate zip files (~200 MB each):

Speech file (Audio_Speech_Actors_01-24.zip, 215 MB) contains 1440 files: 60 trials per actor x 24 actors = 1440. 
Song file (Audio_Song_Actors_01-24.zip, 198 MB) contains 1012 files: 44 trials per actor x 23 actors = 1012.

Total=2452

---

---
Toronto emotional speech set (TESS)

---


A set of 200 target words was spoken in the carrier phrase "Say the word _" by two actresses (aged 26 and 64 years), and recordings were made of the set portraying each of seven emotions (anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral). There are 2800 data points (audio files) in total.

The dataset is organised so that each combination of actress and emotion has its own folder, and each folder holds the audio files for all 200 target words. The audio files are in WAV format.
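
Because each folder corresponds to one actress and one emotion, a label can be read from the folder name. The sketch below is only an illustration under stated assumptions: the folder names in `example_folders` and the `TESS_LABEL_MAP` table are assumed (capitalisation and spelling may differ in your copy of the dataset), and `tess_label` is a hypothetical helper, not part of any library.

```python
# Hypothetical mapping from lower-cased TESS folder suffixes to the emotion
# names used elsewhere in this notebook. Adjust it to match your folders.
TESS_LABEL_MAP = {
    'angry': 'angry',
    'disgust': 'disgust',
    'fear': 'fearful',
    'happy': 'happy',
    'neutral': 'neutral',
    'pleasant_surprise': 'surprised',
    'pleasant_surprised': 'surprised',
    'sad': 'sad',
}


def tess_label(folder_name):
    """Return the normalised emotion for a folder such as 'OAF_angry', or None."""
    # Split only once so that suffixes containing underscores stay intact.
    parts = folder_name.split('_', 1)
    if len(parts) < 2:
        return None
    return TESS_LABEL_MAP.get(parts[1].lower())


# Assumed example folder names; nothing is read from disk here.
example_folders = ['OAF_angry', 'OAF_Fear', 'YAF_pleasant_surprised', 'OAF_Sad', 'notes']
labels = [tess_label(name) for name in example_folders]
print(labels)
```

Lower-casing the suffix matters because a case-sensitive comparison would not match a folder named `OAF_Sad` against the label `'sad'`.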


---



# Mount Google Drive



In [1]:
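# This cell only works on Google Colab. On a local machine the import fails
# with the ModuleNotFoundError shown below, and the cell can be skipped.
from google.colab import drive
drive.mount('/content/drive/')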

ModuleNotFoundError: No module named 'google'

# Install the following libraries

In [1]:
!pip install librosa soundfile numpy scikit-learn pyaudio



In [2]:
!pip install soundfile



# Make the necessary imports

In [3]:
import librosa
import soundfile
import os, glob, pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

Define a function extract_feature to extract the mfcc, chroma, and mel features from a sound file. This function takes four parameters: the file name and three Boolean flags, one for each feature:

* mfcc: Mel Frequency Cepstral Coefficient, represents the short-term power spectrum of a sound
* chroma: Pertains to the 12 different pitch classes
* mel: Mel Spectrogram Frequency

In [4]:
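def extract_feature(file_name, mfcc, chroma, mel):
    # Load the audio as a float array; res_type='kaiser_fast' relies on the resampy package.
    X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    result = np.array([])
    if mfcc:
        # 40 MFCCs per frame, averaged over time.
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        # Chroma is computed from the magnitude of the short-time Fourier transform.
        stft = np.abs(librosa.stft(X))
        chroma_mean = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_mean))
    if mel:
        # Recent librosa releases require the audio to be passed as the keyword argument y=.
        mel_mean = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_mean))
    return result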
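
To see why averaging over time gives one fixed-length vector per file, the sketch below swaps the librosa outputs for random NumPy arrays of the same layout (features x frames). The frame counts and the `pool_features` helper are illustrative assumptions, and no audio is loaded.

```python
import numpy as np


def pool_features(frame_matrices):
    """Average each (n_features, n_frames) matrix over time and concatenate."""
    pooled = [np.mean(m.T, axis=0) for m in frame_matrices]
    return np.hstack(pooled)


rng = np.random.default_rng(0)

# Two stand-in clips with different durations (different frame counts).
short_clip = [rng.normal(size=(40, 50)), rng.normal(size=(12, 50)), rng.normal(size=(128, 50))]
long_clip = [rng.normal(size=(40, 300)), rng.normal(size=(12, 300)), rng.normal(size=(128, 300))]

short_vec = pool_features(short_clip)
long_vec = pool_features(long_clip)

# Both vectors have 40 + 12 + 128 = 180 entries regardless of clip length.
print(short_vec.shape, long_vec.shape)
```

The time axis disappears in the mean, so recordings of any length map to vectors of the same size, which is what the classifier needs.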

Now, let’s define a dictionary to hold the number codes and the emotions available in the RAVDESS & TESS datasets, and a list to hold all 8 observed emotions: neutral, calm, happy, sad, angry, fearful, disgust, and surprised.

In [5]:
# Emotions in the RAVDESS & TESS dataset
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}
# Emotions to observe
observed_emotions=['neutral','calm','happy','sad','angry','fearful', 'disgust','surprised']
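
The numeric keys correspond to the third dash-separated field of a RAVDESS file name. As a minimal sketch of that lookup, the hypothetical helper `parse_ravdess_name` below splits a file name into its seven fields. The field names in `FIELDS` are labels chosen here for illustration, following the RAVDESS naming convention.

```python
import os

# Same code-to-emotion table as above, repeated so the example is self-contained.
emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}

# Seven dash-separated parts of a RAVDESS file name, in order.
FIELDS = ['modality', 'vocal_channel', 'emotion', 'intensity',
          'statement', 'repetition', 'actor']


def parse_ravdess_name(path):
    """Split a RAVDESS file name into named fields and add the emotion label."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split('-')
    if len(parts) != len(FIELDS):
        raise ValueError(f'unexpected RAVDESS file name: {path}')
    info = dict(zip(FIELDS, parts))
    info['emotion_label'] = emotions[info['emotion']]
    return info


info = parse_ravdess_name('Actor_05/03-01-02-01-01-01-05.wav')
print(info['emotion_label'], info['actor'])
```

Validating the number of fields first turns an unexpected file name into a clear error instead of a wrong label.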

# Load the data and extract features for each sound file

In [11]:
def load_data(test_size=0.2):
    x,y=[],[]
    for file in glob.glob('/features/Actor_*/*.wav'):
        file_name=os.path.basename(file)
        emotion=emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature=extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, train_size= 0.75,random_state=9)

In [19]:
import os
import glob

def load_data(test_size=0.2):
    x, y = [], []
    base_directory = 'C:\\Users\\ananya\\Speech-Emotion-Recognition-using-ML-and-DL\\features'
    
    # Use the module-level observed_emotions list defined earlier. Redefining it
    # here with placeholder names would filter out every file.

    for actor_folder in glob.glob(os.path.join(base_directory, 'Actor_*')):
        for file in glob.glob(os.path.join(actor_folder, '*.wav')):
            file_name = os.path.basename(file)
            emotion = emotions[file_name.split("-")[2]]

            # Check if the emotion is in the list of observed emotions
            if emotion not in observed_emotions:
                continue

            feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
            x.append(feature)
            y.append(emotion)

    return train_test_split(np.array(x), y, test_size=test_size, train_size=0.75, random_state=9)

In [21]:
import os
import glob

def load_data(test_size=0.2):
    x, y = [], []
    base_directory = r'C:\Users\ananya\Speech-Emotion-Recognition-using-ML-and-DL\features'
    
    # Rely on the module-level observed_emotions list; a placeholder list here
    # would cause every file to be skipped.

    for actor_folder in glob.glob(os.path.join(base_directory, 'Actor_*')):
        for file in glob.glob(os.path.join(actor_folder, '*.wav')):
            file_name = os.path.basename(file)
            emotion = emotions[file_name.split("-")[2]]

            # Check if the emotion is in the list of observed emotions
            if emotion not in observed_emotions:
                continue

            feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
            x.append(feature)
            y.append(emotion)

    return train_test_split(np.array(x), y, test_size=test_size, train_size=0.75, random_state=9)


In [6]:
import os
import glob

def load_data(test_size=0.2):
    x, y = [], []  # Initialize empty lists to store features (x) and labels (y).

    base_directory = 'C:\\Users\\ananya\\Speech-Emotion-Recognition-using-ML-and-DL\\TESS_Toronto_emotional_speech_set_data'  # Update with your directory path

    # Iterate through all folders in the base directory.
    for folder in glob.glob(os.path.join(base_directory, '*')):
        if not os.path.isdir(folder):
            continue  # Skip non-directory entries

        # Extract emotion from the folder name (e.g., 'OAF_angry' -> 'angry').
        emotion = os.path.basename(folder).split('_')[1]

        # Check if the extracted emotion is in the list of observed emotions.
        if emotion not in observed_emotions:
            continue

        # Iterate through audio files within the folder.
        for file in glob.glob(os.path.join(folder, '*.wav')):
            # Extract audio features (e.g., MFCC, chroma, mel) from the audio file.
            feature = extract_feature(file, mfcc=True, chroma=True, mel=True)

            # Append the extracted feature and its corresponding emotion label to the lists.
            x.append(feature)
            y.append(emotion)

    # Split the data into training and testing sets using train_test_split.
    return train_test_split(np.array(x), y, test_size=test_size, train_size=0.75, random_state=9)


# Split the Dataset
Time to split the dataset into training and testing sets! Let’s keep the test set 25% of everything and use the load_data function for this.

In [12]:
# Split the dataset
import time
x_train,x_test,y_train,y_test=load_data(test_size=0.25)
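
The effect of a 25% hold-out can be checked on placeholder data before running the slow feature extraction. The sketch below is an illustration only: the array sizes and class counts are made up, the `demo_` names are hypothetical, and `stratify=y` is an optional extra that the notebook's `load_data` does not use.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1800 feature vectors with 153 values each (illustrative sizes).
demo_x = np.zeros((1800, 153))
demo_y = (['angry'] * 400 + ['disgust'] * 400 + ['happy'] * 400
          + ['neutral'] * 400 + ['sad'] * 200)

# stratify keeps the class proportions the same in both subsets.
demo_x_train, demo_x_test, demo_y_train, demo_y_test = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=9, stratify=demo_y
)

print((demo_x_train.shape[0], demo_x_test.shape[0]))
print(Counter(demo_y_test))
```

Without `stratify`, the per-class counts in the test set vary with the random seed, which matters most for the smallest class.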

In [11]:
import os
import glob

def load_data(test_size=0.2):
    x, y = [], []  # Initialize empty lists to store features (x) and labels (y).

    base_directory = 'C:\\Users\\ananya\\Speech-Emotion-Recognition-using-ML-and-DL\\TESS_Toronto_emotional_speech_set_data'  # Update with your directory path

    # Iterate through all folders in the base directory.
    for folder in glob.glob(os.path.join(base_directory, '*')):
        if not os.path.isdir(folder):
            continue  # Skip non-directory entries

        # Extract emotion from the folder name (e.g., 'OAF_angry' -> 'angry').
        folder_name = os.path.basename(folder)

        # Split the folder name by underscores and check if it contains at least two elements.
        folder_parts = folder_name.split('_')
        if len(folder_parts) >= 2:
            emotion = folder_parts[1]
        else:
            # Handle the case where the folder name doesn't contain an underscore or has insufficient parts.
            # For example, you can set a default emotion or take other appropriate action.
            emotion = "unknown"  # Set to "unknown" if no emotion is found.

        # Check if the extracted emotion is in the list of observed emotions.
        if emotion not in observed_emotions:
            continue

        # Iterate through audio files within the folder.
        for file in glob.glob(os.path.join(folder, '*.wav')):
            # Extract audio features (e.g., MFCC, chroma, mel) from the audio file.
            feature = extract_feature(file, mfcc=True, chroma=True, mel=True)

            # Append the extracted feature and its corresponding emotion label to the lists.
            x.append(feature)
            y.append(emotion)

    # Split the data into training and testing sets using train_test_split.
    return train_test_split(np.array(x), y, test_size=test_size, train_size=0.75, random_state=9)


In [9]:
def extract_feature(file_name, mfcc, chroma, mel):
    # Load the audio file
    X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')

    # Initialize an empty feature array
    result = np.array([])

    # Extract features as specified
    if mfcc:
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        chroma = np.mean(librosa.feature.chroma_stft(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma))
    if mel:
        mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel))

    return result


(3939, 1313)


# Number of features extracted.

# Observe the shape of the training and testing datasets
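
One possible reading of the `(3939, 1313)` output above, offered as an assumption rather than something the notebook states, is that it gives the training and test set sizes when every RAVDESS and TESS file is loaded and 25% is held out. The arithmetic below is consistent with that reading. It rounds the test count up, which is how `train_test_split` treats a fractional `test_size`.

```python
import math

ravdess_files = 1440 + 1012   # speech files + song files
tess_files = 2800
total = ravdess_files + tess_files

test_size = 0.25
n_test = math.ceil(total * test_size)   # test count is rounded up
n_train = total - n_test

print(total, (n_train, n_test))
```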

In [7]:
pip install resampy

Note: you may need to restart the kernel to use updated packages.


In [13]:
# Get the number of features extracted
print(f'Features extracted: {x_train.shape[1]}')

Features extracted: 153
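
The figure of 153 matches the `extract_feature` cell that uses `n_mfcc=13`, assuming librosa's default sizes for the other two features. A quick check of the arithmetic:

```python
n_mfcc = 13     # n_mfcc passed to librosa.feature.mfcc
n_chroma = 12   # default number of chroma bins in librosa.feature.chroma_stft
n_mels = 128    # default number of mel bands in librosa.feature.melspectrogram

total_features = n_mfcc + n_chroma + n_mels
print(f'Features extracted: {total_features}')
```

With `n_mfcc=40`, as in the first version of the function, the same sum would be 180.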


# MLP Classifier

In [14]:
# Initialize the Multi Layer Perceptron Classifier
model=MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, hidden_layer_sizes=(300,), learning_rate='adaptive', max_iter=500)

# Fit/train the model

In [15]:
# Train the model
model.fit(x_train,y_train)

# Predict the accuracy of our model

Let’s predict the values for the test set. This gives us y_pred (the predicted emotions for the features in the test set).

In [16]:
# Predict for the test set
y_pred=model.predict(x_test)

To calculate the accuracy of our model, we’ll call the accuracy_score() function we imported from sklearn. Finally, we’ll format the accuracy as a percentage with 2 decimal places and print it.

In [17]:
# Calculate the accuracy of our model
accuracy=accuracy_score(y_true=y_test, y_pred=y_pred)
# Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 99.56%


# Classification Report

In [18]:
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))


              precision    recall  f1-score   support

       angry       0.99      1.00      0.99        92
     disgust       1.00      1.00      1.00       107
       happy       1.00      0.99      0.99        91
     neutral       0.99      1.00      1.00       114
         sad       1.00      0.98      0.99        46

    accuracy                           1.00       450
   macro avg       1.00      0.99      0.99       450
weighted avg       1.00      1.00      1.00       450



# Confusion Matrix

In [19]:
from sklearn.metrics import confusion_matrix
matrix = confusion_matrix(y_test,y_pred)
print(matrix)

[[ 92   0   0   0   0]
 [  0 107   0   0   0]
 [  1   0  90   0   0]
 [  0   0   0 114   0]
 [  0   0   0   1  45]]
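
The accuracy and the per-class scores in the classification report can be recomputed from this matrix alone. The sketch below copies the printed values into a NumPy array and assumes the usual scikit-learn layout (rows are true labels, columns are predicted labels, classes in alphabetical order). The names `cm` and `class_names` are chosen here for illustration.

```python
import numpy as np

class_names = ['angry', 'disgust', 'happy', 'neutral', 'sad']
cm = np.array([
    [92, 0, 0, 0, 0],
    [0, 107, 0, 0, 0],
    [1, 0, 90, 0, 0],
    [0, 0, 0, 114, 0],
    [0, 0, 0, 1, 45],
])

correct = np.diag(cm)                  # correctly classified samples per class
accuracy = correct.sum() / cm.sum()
recall = correct / cm.sum(axis=1)      # divide by the number of true samples
precision = correct / cm.sum(axis=0)   # divide by the number of predictions

print(f'Accuracy: {accuracy * 100:.2f}%')
for name, p, r in zip(class_names, precision, recall):
    print(f'{name:>8}  precision={p:.2f}  recall={r:.2f}')
```

Only two of the 450 test samples fall off the diagonal: one happy clip predicted as angry and one sad clip predicted as neutral.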


In [22]:
# Extract audio features from the test audio file
test_feature = extract_feature('C:\\Users\\ananya\\Speech-Emotion-Recognition-using-ML-and-DL\\features\\Actor_05\\03-01-02-01-01-01-05.wav', mfcc=True, chroma=True, mel=True)

In [23]:
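from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features (x_train from load_data above).
# The original cell used an undefined name, train_features, and raised a NameError.
scaler = StandardScaler()
scaler.fit(x_train)
test_feature_normalized = scaler.transform(test_feature.reshape(1, -1))
# The model above was fitted on unscaled features, so predict with
# model.predict(test_feature.reshape(1, -1)) unless it is retrained on scaled data.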
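
When features are standardised, the scaler has to be fitted on the training data and then reused unchanged at prediction time. One way to keep the two steps together is a scikit-learn pipeline. The sketch below runs on synthetic data, not on the audio features. The `demo_` names, the class labels, and the smaller network size are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for extracted features: two well-separated classes.
rng = np.random.default_rng(9)
demo_x = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 20)),
    rng.normal(3.0, 1.0, size=(100, 20)),
])
demo_y = ['calm'] * 100 + ['angry'] * 100

# The pipeline fits the scaler and the classifier together on the training data.
demo_model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=9),
)
demo_model.fit(demo_x, demo_y)

# A single feature vector must be reshaped to (1, n_features) before predicting.
new_feature = rng.normal(3.0, 1.0, size=20)
prediction = demo_model.predict(new_feature.reshape(1, -1))
print(prediction[0])
```

Calling `predict` on the pipeline applies the already-fitted scaler automatically, so there is no separate scaler object to refit or forget.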

# Thank You