
speech_emotion

Detect human emotion from audio.

Some code is adapted from Speech_emotion_recognition_BLSTM; many thanks to its author.

Get started

Environment: Python 3

Main dependencies

  • tensorflow (pip install tensorflow)
  • keras (pip install keras): build and train the Bi-LSTM model
  • librosa (pip install librosa): audio resampling

Dataset

Berlin Database of Emotional Speech (EMO-DB). Download it and unzip it into the data/ folder.
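
As a quick sanity check, you can load one of the downloaded utterances with librosa. This is a minimal sketch: the path and file name below are illustrative and depend on how you unzip the dataset, and the 16 kHz target rate is an assumption, not a requirement of this repo.

    import librosa

    # Load an EMO-DB utterance and resample it (path and rate are assumed).
    y, sr = librosa.load("data/wav/03a01Fa.wav", sr=16000)
    print(y.shape, sr)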

How to use it

  1. python train.py

    Train the model. You can skip this step because the trained model "weights_blstm_hyperas_1.h5" has already been uploaded. If you want to retrain, features are extracted from the Berlin dataset on the first run. To save time, the audio feature files "berlin_db.p" and "berlin_features.p" have also been uploaded.

  2. python predict.py

    Predict the emotion of an audio file. You should specify the path of the audio file to be predicted. For good performance, the audio should be shorter than 5 seconds. You will get a result such as

    "the top 2 emotion is: ('happiness', 0.20501734)\ the top 2 emotion is: ('neutral', 0.29067296)"

File structure

  • utility folder
    • audioFeatureExtraction.py: extract features from audio. Modified from pyAudioAnalysis.
    • functions.py: audio utility functions.
    • globalvars.py: global variables.
  • berlin_db.p & berlin_features.p: Berlin feature files (see the loading sketch after this list).
  • dataset.py: dataset utilities.
  • predict.py: predict emotion from an audio file.
  • train.py: train the model with Keras.
  • weights_blstm_hyperas_1.h5: trained model.
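
The two .p files above are serialized feature caches; assuming they were written with Python's pickle module, they can be inspected directly. A minimal sketch (the internal layout of the stored objects is not documented here):

    import pickle

    # Load the pre-extracted Berlin feature cache and inspect what it holds.
    with open("berlin_features.p", "rb") as f:
        features = pickle.load(f)
    print(type(features))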

More details

The model combines a Bi-LSTM with an attention mechanism. A "weighted pool" is constructed to handle frames that are unrelated to emotion.

Silent frames are assigned small weights, so the pooling operation effectively filters them out. Similarly, non-silent frames receive different weights according to their emotional content. The attention model therefore focuses not only on speech energy but also on emotional content. The attention weights are computed with a softmax (logistic regression).
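
The sketch below shows one way to implement this softmax attention pooling on top of a Bi-LSTM in Keras. It is a minimal illustration under assumed shapes and layer sizes, not the repository's exact architecture.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    num_frames, num_features, num_classes = 100, 34, 7  # assumed shapes

    inputs = layers.Input(shape=(num_frames, num_features))
    # Bi-LSTM emits one hidden vector per frame.
    h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inputs)
    # One scalar score per frame, normalized over time with softmax:
    # silent or emotion-irrelevant frames should receive small weights.
    scores = layers.Dense(1)(h)             # (batch, frames, 1)
    alpha = layers.Softmax(axis=1)(scores)  # attention weights over frames
    # Weighted pooling: sum over frames of alpha_t * h_t.
    pooled = layers.Lambda(lambda x: tf.reduce_sum(x[0] * x[1], axis=1))([alpha, h])
    outputs = layers.Dense(num_classes, activation="softmax")(pooled)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])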

The accuracy on the validation set is 60.87%.

Reference

S. Mirsamadi, E. Barsoum, and C. Zhang, “Automatic speech emotion recognition using recurrent neural networks with local attention,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, U.S.A., Mar. 2017, IEEE, pp. 2227–2231.

Connect

cnmengnan@gmail.com

blog: WinterColor blog

enjoy it
