Python Project on Speech Emotion Recognition.

The main objective of this project is to build a model to recognize emotion from speech using the librosa and sklearn libraries and the RAVDESS dataset.

Speech Emotion Recognition is the process of recognizing human emotion and affective states from speech. Voice often reflects underlying emotion through tone and pitch. Librosa, a Python library for analyzing audio and music, is used for this recognition; it offers a flat package layout, standardized interfaces and names, backwards compatibility, modular functions, and readable code. SoundFile is used to read and write sound files; it is an audio library based on libsndfile, CFFI, and NumPy.
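As a minimal sketch, reading an audio file with SoundFile looks like this (the file name "speech.wav" is a placeholder, not a file from this repository):

    import soundfile as sf

    # "speech.wav" is a placeholder path; substitute any RAVDESS file
    audio, sample_rate = sf.read("speech.wav", dtype="float32")
    print(audio.shape, sample_rate)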

To install librosa and SoundFile, run the following commands in the Anaconda prompt:

Librosa:

    conda install -c conda-forge librosa
    conda install -c conda-forge/label/gcc7 librosa
    conda install -c conda-forge/label/cf201901 librosa

SoundFile:

    conda install -c conda-forge pysoundfile
    conda install -c conda-forge/label/gcc7 pysoundfile
    conda install -c conda-forge/label/cf201901 pysoundfile

We will load the data, extract features from it, and split the dataset into training and testing sets. Then we'll initialize an MLPClassifier and train the model (a full sketch of this pipeline appears at the end of this README). MLPClassifier implements a multi-layer perceptron (MLP) algorithm that trains using backpropagation. A multi-layer perceptron is a supervised learning algorithm that learns a function f(·): R^m → R^o by training on a dataset, where m is the number of input dimensions and o is the number of output dimensions. Given a set of features X and a target y, it can learn a non-linear function approximator for either classification or regression. It differs from logistic regression in that between the input and the output layer there can be one or more non-linear layers, called hidden layers. The figure below shows a one-hidden-layer MLP with scalar output.

[Figure: multi-layer perceptron network with one hidden layer]

The leftmost layer, known as the input layer, consists of a set of neurons representing the input features. Each neuron in the hidden layer transforms the values from the previous layer with a weighted linear summation w1*x1 + w2*x2 + ... + wm*xm, followed by a non-linear activation function g(·): R → R, such as the hyperbolic tangent. The output layer receives the values from the last hidden layer and transforms them into output values.
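As a concrete illustration of one hidden layer's computation (the shapes and random values below are assumptions made up for this sketch, not taken from the project):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(4)          # input features (m = 4)
    W = rng.random((4, 5))     # weights from the input layer to 5 hidden neurons
    b = rng.random(5)          # hidden-layer biases

    hidden = np.tanh(x @ W + b)   # weighted summation followed by tanh activation
    print(hidden.shape)           # (5,)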

The module exposes the public attributes coefs_ and intercepts_. coefs_ is a list of weight matrices, where the matrix at index i holds the weights between layer i and layer i+1. intercepts_ is a list of bias vectors, where the vector at index i holds the bias values added to layer i+1.
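A quick way to inspect these attributes after fitting (the toy data below is an assumption made up for illustration):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.random.rand(20, 4)        # 20 toy samples with 4 features
    y = np.random.randint(0, 2, 20)  # binary toy labels
    clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=300).fit(X, y)

    for i, (W, b) in enumerate(zip(clf.coefs_, clf.intercepts_)):
        print(f"layer {i} -> layer {i + 1}: weights {W.shape}, biases {b.shape}")
    # layer 0 -> layer 1: weights (4, 5), biases (5,)
    # layer 1 -> layer 2: weights (5, 1), biases (1,)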

The RAVDESS dataset is the Ryerson Audio-Visual Database of Emotional Speech and Song. Its speech portion covers 24 actors, each contributing 60 recordings. The dataset looks like this:

[Screenshots: listings of the RAVDESS dataset files]
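Putting the pieces together, here is a minimal sketch of the pipeline described above. The directory layout ("ravdess/Actor_*/*.wav"), the feature choice (mean MFCCs), and the hyperparameters are assumptions; adjust them to your own setup.

    import glob
    import os

    import librosa
    import numpy as np
    import soundfile
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # RAVDESS file names encode the emotion in the third field,
    # e.g. "03-01-05-01-02-01-12.wav" -> emotion code "05" (angry).
    EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
                "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}

    def extract_features(path):
        """Return a fixed-length feature vector (mean MFCCs) for one file."""
        with soundfile.SoundFile(path) as f:
            audio = f.read(dtype="float32")
            sr = f.samplerate
        mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
        return np.mean(mfccs.T, axis=0)

    def load_data(pattern="ravdess/Actor_*/*.wav"):  # assumed directory layout
        features, labels = [], []
        for path in glob.glob(pattern):
            code = os.path.basename(path).split("-")[2]
            features.append(extract_features(path))
            labels.append(EMOTIONS[code])
        return np.array(features), np.array(labels)

    X, y = load_data()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    model = MLPClassifier(hidden_layer_sizes=(300,), max_iter=500)
    model.fit(X_train, y_train)
    print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))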
