<a href="https://colab.research.google.com/github/SaifAlmaliki/Speech-Emotion-Recognition/blob/main/Speech_Emotion_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

source: https://data-flair.training/blogs/python-mini-project-speech-emotion-recognition/

In [None]:
!pip install librosa 
!pip install soundfile 

In [59]:
import librosa
import soundfile
import numpy as np
import os, glob, pickle
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

**Extract features (mfcc, chroma, mel) from a sound file**

In [60]:
def extract_feature(file_name, mfcc, chroma, mel):
    """Return a 1-D vector of the time-averaged MFCC, chroma, and mel features."""
    with soundfile.SoundFile(file_name) as sound_file:
        X = sound_file.read(dtype="float32")
        sample_rate = sound_file.samplerate

    result = np.array([])

    if mfcc:
        # 40 MFCCs per frame, averaged over all frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))

    if chroma:
        # Chroma is computed from the magnitude of the short-time Fourier transform
        stft = np.abs(librosa.stft(X))
        chroma_mean = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
        result = np.hstack((result, chroma_mean))

    if mel:
        # Pass the signal as y=...; recent librosa versions reject it as a positional argument
        mel_mean = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
        result = np.hstack((result, mel_mean))

    return result
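
To see why each file ends up as a fixed-length vector: every librosa feature above is a matrix with one column per audio frame, and the mean over frames collapses it to one value per row. The sketch below uses a random matrix in place of real MFCC output, so it runs without an audio file. The shapes and the names fake_mfcc and fake_chroma_mean are illustrative assumptions, not values from the notebook.

```python
import numpy as np

# Hypothetical stand-in for librosa.feature.mfcc output:
# 40 coefficients (rows) by 250 frames (columns).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(size=(40, 250))

# Same reduction as in extract_feature: transpose, then average over frames.
mfcc_mean = np.mean(fake_mfcc.T, axis=0)

# Stacking the per-feature means yields one flat vector per file.
fake_chroma_mean = np.zeros(12)  # placeholder for 12 chroma bins
feature_vector = np.hstack((mfcc_mean, fake_chroma_mean))

print(mfcc_mean.shape)       # (40,)
print(feature_vector.shape)  # (52,)
```

Because the frame axis is averaged away, recordings of different lengths all produce vectors of the same size, which is what the classifier needs.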

**Define a dictionary that maps the RAVDESS emotion codes to emotion names, and a list of the emotions we want to classify: calm, happy, fearful, and disgust.**

In [61]:
# Emotions in the RAVDESS dataset
emotions={
  '01':'neutral',
  '02':'calm',
  '03':'happy',
  '04':'sad',
  '05':'angry',
  '06':'fearful',
  '07':'disgust',
  '08':'surprised'
}

# Emotions to observe
observed_emotions=['calm', 'happy', 'fearful', 'disgust']

load_data() takes the basename of each file, splits it around ‘-’, and takes the **third value**, which is the emotion code.

Using our emotions dictionary, **this code is turned into an emotion name**, and the function checks whether that emotion is in our list of observed_emotions; if not, it continues to the next file.

Otherwise, it calls extract_feature() and stores the result in ‘feature’. Then it appends the **feature to x** and the **emotion to y**.

So, the list x holds the features and y holds the emotions. We call train_test_split with these, the test size, and a random state value, and return the result.
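
As a small illustration of the file-name parsing step, here is a sketch that decodes a single name. The path Actor_12/03-01-06-01-02-01-12.wav is an example in the RAVDESS naming style chosen for illustration, and parse_emotion is a hypothetical helper that is not part of the notebook.

```python
import os

emotions = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fearful', '07': 'disgust', '08': 'surprised',
}
observed_emotions = ['calm', 'happy', 'fearful', 'disgust']

def parse_emotion(path):
    """Return the emotion name encoded in the third '-' separated field."""
    file_name = os.path.basename(path)
    code = file_name.split("-")[2]
    return emotions[code]

# Assumed example path; the third field is '06'.
example_path = "Actor_12/03-01-06-01-02-01-12.wav"
emotion = parse_emotion(example_path)

print(emotion)                       # fearful
print(emotion in observed_emotions)  # True
```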

In [62]:
# Load the data and extract features for each sound file
def load_data(test_size = 0.2):
    x, y = [], []
    for file in glob.glob("/content/drive/My Drive/Colab Notebooks/Speech Emotion Recognition/ravdess-data/Actor_*/*.wav"):
        file_name = os.path.basename(file)
        emotion = emotions[file_name.split("-")[2]]  # the third value is the emotion code
        if emotion not in observed_emotions:
            continue
        feature = extract_feature(file, mfcc=True, chroma=True, mel=True)
        x.append(feature)
        y.append(emotion)

    split = train_test_split(np.array(x), y, test_size = test_size, random_state = 9) 
    return split

Let’s make the test set 25% of the data and use the load_data() function to split it.



In [63]:
# Split the dataset
x_train, x_test, y_train, y_test = load_data(test_size=0.25)

**Observe the shape of the training and testing datasets**



In [64]:
print("Train data: ", x_train.shape[0], "\nTest data: ", x_test.shape[0])

Train data:  576 
Test data:  192
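
These counts follow from the split: 576 + 192 = 768 files matched the four observed emotions, and 25% of 768 is 192. The sketch below reproduces the sizes on placeholder data. The zero array and the constant labels are dummies that stand in for the real features and emotions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Total number of files implied by the printed train and test sizes.
n_files = 576 + 192

# Dummy features and labels; only the row count matters here.
dummy_x = np.zeros((n_files, 180))
dummy_y = ['calm'] * n_files

a_train, a_test, b_train, b_test = train_test_split(
    dummy_x, dummy_y, test_size=0.25, random_state=9
)

print(a_train.shape[0], a_test.shape[0])  # 576 192
```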


**Get the number of features extracted**

In [65]:
print(f'Features extracted: {x_train.shape[1]}')

Features extracted: 180
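
The 180 columns are audio features per file, and the total is consistent with the three groups produced by extract_feature(): 40 MFCCs (n_mfcc=40), plus 12 chroma bins and 128 mel bands. The last two numbers assume librosa's defaults of n_chroma=12 and n_mels=128 are in effect, since the notebook does not set them. A quick check of the arithmetic:

```python
# Sizes of each feature group; chroma and mel sizes assume librosa defaults.
feature_sizes = {"mfcc": 40, "chroma": 12, "mel": 128}

total_features = sum(feature_sizes.values())
print(total_features)  # 180
```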


**Now, let’s initialize an MLPClassifier**

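MLPClassifier is a multi-layer perceptron classifier. It optimizes the log-loss function using LBFGS or stochastic gradient descent (the default solver, 'adam', is a stochastic gradient-based optimizer).

Unlike SVM or Naive Bayes, MLPClassifier uses a feed-forward neural network internally to perform classification.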
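
To make this concrete, the sketch below trains a small MLPClassifier on synthetic data from make_classification rather than on RAVDESS features. The dataset, the single hidden layer of 50 units, and the random_state values are illustrative assumptions. predict_proba is shown because a model trained on log-loss outputs a probability for each class.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 4-class problem standing in for the four observed emotions.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=4, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Small network trained with the default 'adam' solver.
demo_model = MLPClassifier(hidden_layer_sizes=(50,), solver='adam',
                           max_iter=500, random_state=0)
demo_model.fit(X_tr, y_tr)

# One row per test sample, one probability per class.
proba = demo_model.predict_proba(X_te)
demo_score = demo_model.score(X_te, y_te)

print(proba.shape)  # (100, 4)
print(demo_score)
```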

In [66]:
# Initialize the Multi Layer Perceptron Classifier
model = MLPClassifier(alpha=0.01, 
                      batch_size=128,
                      hidden_layer_sizes = (100,200,), 
                      learning_rate='adaptive', 
                      max_iter=500)

In [67]:
# Train the model
model.fit(x_train, y_train)

MLPClassifier(activation='relu', alpha=0.01, batch_size=128, beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(100, 200), learning_rate='adaptive',
              learning_rate_init=0.001, max_fun=15000, max_iter=500,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=None, shuffle=True, solver='adam',
              tol=0.0001, validation_fraction=0.1, verbose=False,
              warm_start=False)

**Predict the values for the test set.**

This gives us **y_pred** (the predicted emotions for the features in the test set).

In [68]:
y_pred= model.predict(x_test)

**Calculate the accuracy**

We’ll print the accuracy as a percentage rounded to 2 decimal places.

In [69]:
accuracy = accuracy_score(y_true=y_test, y_pred=y_pred)
print("Accuracy: {:.2f}%".format(accuracy*100))

Accuracy: 63.54%
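
Accuracy is the fraction of test samples whose predicted label matches the true label. With 192 test samples, the reported 63.54% corresponds to 122 correct predictions (122 / 192 ≈ 0.6354). That count is inferred from the printed figures and is not stated in the notebook. The sketch below reproduces the calculation on a tiny hand-made label list, which is purely illustrative.

```python
# Hand-made labels for illustration only; not taken from the RAVDESS run.
y_true_demo = ['calm', 'happy', 'fearful', 'disgust', 'calm', 'happy']
y_pred_demo = ['calm', 'happy', 'calm', 'disgust', 'calm', 'fearful']

# Count the positions where prediction and truth agree.
correct = sum(t == p for t, p in zip(y_true_demo, y_pred_demo))
demo_accuracy = correct / len(y_true_demo)
print("Accuracy: {:.2f}%".format(demo_accuracy * 100))  # Accuracy: 66.67%

# The same arithmetic applied to the counts inferred above.
inferred = "{:.2f}%".format(122 / 192 * 100)
print(inferred)  # 63.54%
```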
