Speech Emotion Recognition with librosa

#First, we need to install the libraries required for the project


In [49]:
pip install librosa

Note: you may need to restart the kernel to use updated packages.


In [50]:
pip install soundfile

Note: you may need to restart the kernel to use updated packages.


#What is Librosa?

-> librosa is a Python library for analyzing audio and music. It offers a flat package layout, standardized interfaces and names, backwards compatibility, modular functions, and readable code. In this Python mini-project, we install it (and a few other packages) with pip.

In [59]:
#Import the required libraries

In [51]:
import os
import glob
import soundfile
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

In [None]:
# DataFlair - Extract features (mfcc, chroma, mel) from a sound file


In [60]:
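def extract_feature(file_name, mfcc, chroma, mel):
    """Return a 1-D vector of time-averaged features for one sound file."""
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate
        if chroma:
            # Chroma is computed from the magnitude of the STFT
            stft = np.abs(librosa.stft(X))
        result = np.array([])
        if mfcc:
            # 40 MFCCs per frame, averaged over all frames
            mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
            result = np.hstack((result, mfccs))
        if chroma:
            # Separate name so the boolean flag `chroma` is not overwritten
            chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
            result = np.hstack((result, chroma_feat))
        if mel:
            # Separate name so the boolean flag `mel` is not overwritten
            mel_feat = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
            result = np.hstack((result, mel_feat))
        return result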


The extract_feature function is designed to extract audio features from a sound file. It takes the file name of the sound file as input along with three boolean parameters (mfcc, chroma, mel) that determine which features to extract.

1-MFCC (Mel-frequency cepstral coefficients): MFCCs are commonly used in speech and audio processing to represent the short-term power spectrum of a sound. They capture the timbral and spectral characteristics of the audio.

2-Chroma Features: Chroma features represent the energy distribution of pitch classes (i.e., notes) in the audio. They are useful for tasks related to harmony and chord recognition.

3-Mel Spectrogram: Mel spectrogram represents the power spectrum of the audio signal on the Mel frequency scale. It provides a compact representation of the spectral content of the audio.

Inside the function:

It reads the sound file using the soundfile library.
It reads the sample rate from the audio file.
If specified (chroma=True), it calculates the magnitude of the Short-Time Fourier Transform (STFT) of the audio.
It initializes an empty array (result) to store the extracted features.
If specified (mfcc=True), it computes 40 MFCCs per frame using the librosa library, averages them over all frames, and appends them to the result array.
If specified (chroma=True), it computes chroma features from the STFT, averages them over all frames, and appends them to the result array.
If specified (mel=True), it computes the mel spectrogram using the librosa library, averages it over all frames, and appends it to the result array.
Finally, it returns the concatenated array of extracted features.

By calling this function and passing the file name along with the desired feature parameters, you can obtain a feature representation of the audio file suitable for various machine learning tasks such as speech emotion recognition, audio classification, and more.
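The averaging and concatenation steps described above can be sketched with NumPy alone. This is a minimal illustration, not the notebook's own code: the three matrices are random stand-ins shaped like librosa's outputs (features by frames), the frame count is hypothetical, and the sizes 12 and 128 assume librosa's default number of chroma bins and mel bands.

```python
import numpy as np

# Stand-in matrices shaped like librosa's outputs: (n_features, n_frames).
# Real values would come from librosa.feature.*; random data is used here
# only to illustrate the shapes.
rng = np.random.default_rng(0)
n_frames = 120  # hypothetical number of analysis frames

mfcc_frames = rng.normal(size=(40, n_frames))    # n_mfcc=40, as in extract_feature
chroma_frames = rng.random(size=(12, n_frames))  # assumed default: 12 pitch classes
mel_frames = rng.random(size=(128, n_frames))    # assumed default: 128 mel bands

# Average each feature over time, the same way np.mean(matrix.T, axis=0) does
parts = [np.mean(m.T, axis=0) for m in (mfcc_frames, chroma_frames, mel_frames)]

# Concatenate the three averaged vectors into one fixed-length feature vector
feature_vector = np.hstack(parts)

print(feature_vector.shape)  # (180,)
```

Because every file is reduced to the same fixed length (40 + 12 + 128 = 180 under these assumptions), recordings of different durations can be stacked into one array for the classifier.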

In [64]:
# DataFlair - Emotions in the RAVDESS dataset
emotions = {
    '01': 'neutral',
    '02': 'calm',
    '03': 'happy',
    '04': 'sad',
    '05': 'angry',
    '06': 'fearful',
    '07': 'disgust',
    '08': 'surprised'
}
# DataFlair - Emotions to observe
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']

Above, we define a dictionary that maps the numeric codes to the emotions available in the RAVDESS dataset, and a list of the emotions we want to observe: calm, happy, fearful, and disgust.

In [63]:
# DataFlair - Load the data and extract features for each sound file


Now, let's load the data with a function load_data(), which takes the relative size of the test set as a parameter. x and y start as empty lists; we use the glob() function from the glob module to get the pathnames of all the sound files in our dataset. The pattern we use for this is: "C:\\Users\\AKHILESH\\Downloads\\speech-emotion-recognition-ravdess-data\\Actor_*\\*.wav".

For each such path, we get the basename of the file and then the emotion code by splitting the name around '-' and taking the third value (index 2).

Using our emotions dictionary, this code is turned into an emotion, and our function checks whether this emotion is in our list of observed_emotions; if not, it continues to the next file. Otherwise, it calls extract_feature and stores what is returned in 'feature'. Then, it appends the feature to x and the emotion to y. So, the list x holds the features and y holds the emotions. Finally, we call train_test_split with these two lists, the test size, and a random state value, and return the result.
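The file-name parsing and filtering described above can be tried on its own, without any audio files. The sketch below is illustrative only: the helper emotion_from_path and the example paths are hypothetical, made up to follow the hyphen-separated naming pattern, and the dictionary and list are copied from the earlier cell so the snippet runs by itself.

```python
import os

emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised'
}
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']


def emotion_from_path(path):
    """Return the emotion encoded in the third hyphen-separated field of the file name."""
    file_name = os.path.basename(path)
    code = file_name.split("-")[2]
    return emotions[code]


# Hypothetical paths that follow the naming pattern (no files are read)
paths = [
    "Actor_01/03-01-02-01-01-01-01.wav",  # code 02 -> calm
    "Actor_01/03-01-05-01-01-01-01.wav",  # code 05 -> angry (not observed)
    "Actor_02/03-01-06-02-01-01-02.wav",  # code 06 -> fearful
]

# Keep only the files whose emotion is in observed_emotions
kept = [(p, emotion_from_path(p)) for p in paths
        if emotion_from_path(p) in observed_emotions]

for path, emotion in kept:
    print(path, "->", emotion)
```

Only the calm and fearful files survive the filter; the angry one is skipped, just as load_data() skips it with continue.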

In [None]:
def load_data(test_size=0.2):
    x, y = [], []
    for file in glob.glob("C:\\Users\\AKHILESH\\Downloads\\speech-emotion-recognition-ravdess-data\\Actor_*\\*.wav"):
        file_name = os.path.basename(file)
        emotion = emotions[file_name.split("-")[2]]
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)
    return train_test_split(np.array(x), y, test_size=test_size, random_state=9)

In [55]:
# DataFlair - Split the dataset
x_train, x_test, y_train, y_test = load_data(test_size=0.25)

In [56]:
# DataFlair - Initialize the Multi Layer Perceptron Classifier
# Note: learning_rate='adaptive' only takes effect with solver='sgd'; the default solver is 'adam'
model = MLPClassifier(alpha=0.01, batch_size=256, epsilon=1e-08, hidden_layer_sizes=(300,),
                      learning_rate='adaptive', max_iter=500)


In [57]:
# DataFlair - Train the model
model.fit(x_train, y_train)

# DataFlair - Predict for the test set
y_pred = model.predict(x_test)


In [58]:
# DataFlair - Calculate the accuracy of our model
accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)

# DataFlair - Print the accuracy
print("Accuracy: {:.2f}%".format(accuracy * 100))


Accuracy: 71.35%
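
The accuracy printed above is the fraction of test samples whose predicted emotion matches the true emotion. As a rough sketch of that calculation, the hypothetical helper below reproduces it in plain Python; the labels are made up for illustration and are not results from this notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical labels, for illustration only (not results from this notebook)
y_true_demo = ["calm", "happy", "fearful", "disgust"]
y_pred_demo = ["calm", "happy", "disgust", "disgust"]

print("Accuracy: {:.2f}%".format(accuracy(y_true_demo, y_pred_demo) * 100))  # Accuracy: 75.00%
```

Three of the four made-up predictions match, so the sketch prints 75.00%.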
