Speech Emotion Recognition

Hi, here....

  ____    _____   ____  
 / ___|  | ____|  |  _ \ 
 \___ \  |  _|    | |_) |
  ___) | | |___   |  _ < 
 |____/  |_____|  |_| \_\

🔍Dataset

Download the dataset from Kaggle: Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS).

Note that the wav files in dataset/archive are incomplete; they are only a part of RAVDESS. If you download the full dataset, replace all files in dataset/archive with it.

1.1 Files

This portion of the RAVDESS contains 1440 files: 60 trials per actor x 24 actors = 1440. The RAVDESS contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech emotions include calm, happy, sad, angry, fearful, surprise, and disgust expressions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression.
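The figure of 60 trials per actor follows from the design described here. The sketch below reproduces the arithmetic, assuming two repetitions of each statement and a single (normal) intensity for the neutral expression, as listed in the file naming convention; the constant names are illustrative only.

```python
# Count the speech trials per actor from the dataset design.
EXPRESSIVE_EMOTIONS = ["calm", "happy", "sad", "angry", "fearful", "surprise", "disgust"]
INTENSITIES = 2   # normal, strong
STATEMENTS = 2    # "Kids ..." and "Dogs ..."
REPETITIONS = 2   # 1st and 2nd repetition
ACTORS = 24

# Seven emotions are recorded at both intensity levels.
expressive_trials = len(EXPRESSIVE_EMOTIONS) * INTENSITIES * STATEMENTS * REPETITIONS
# Neutral exists at normal intensity only.
neutral_trials = 1 * STATEMENTS * REPETITIONS

trials_per_actor = expressive_trials + neutral_trials
total_files = trials_per_actor * ACTORS
print(trials_per_actor, total_files)  # 60 1440
```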

1.2 File naming convention

Each of the 1440 files has a unique filename. The filename consists of a 7-part numerical identifier (e.g., 03-01-06-01-02-01-12.wav). These identifiers define the stimulus characteristics:

Filename identifiers

Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
Vocal channel (01 = speech, 02 = song).
Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
Repetition (01 = 1st repetition, 02 = 2nd repetition).
Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Filename example: 03-01-06-01-02-01-12.wav

Audio-only (03)
Speech (01)
Fearful (06)
Normal intensity (01)
Statement "dogs" (02)
1st Repetition (01)
12th Actor (12) Female, as the actor ID number is even.
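The naming convention can be decoded programmatically. The following is a minimal sketch, not code from this repository: `RavdessInfo` and `parse_ravdess_filename` are hypothetical names, and the lookup tables simply restate the identifier list above.

```python
from pathlib import Path
from typing import NamedTuple

MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


class RavdessInfo(NamedTuple):
    modality: str
    vocal_channel: str
    emotion: str
    intensity: str
    statement: str
    repetition: int
    actor: int
    gender: str


def parse_ravdess_filename(filename: str) -> RavdessInfo:
    """Decode the 7-part numerical identifier of a RAVDESS file name."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 identifier parts, got {len(parts)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    actor_id = int(actor)
    return RavdessInfo(
        modality=MODALITY[modality],
        vocal_channel=VOCAL_CHANNEL[channel],
        emotion=EMOTION[emotion],
        intensity=INTENSITY[intensity],
        statement=STATEMENT[statement],
        repetition=int(repetition),
        actor=actor_id,
        # Odd-numbered actors are male, even-numbered actors are female.
        gender="male" if actor_id % 2 == 1 else "female",
    )


info = parse_ravdess_filename("03-01-06-01-02-01-12.wav")
print(info.emotion, info.intensity, info.actor, info.gender)  # fearful normal 12 female
```

An unknown identifier raises a `KeyError`, which makes files that do not follow the convention easy to spot.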

Each file is about 3.0 s long. [Figure: waveform of a sample file]
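The duration of a file can be checked with the Python standard library. This is a small sketch that assumes the files are uncompressed PCM wav, which is what the `wave` module reads; `wav_duration_seconds` is a hypothetical helper name.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of frames / frames per second.
        return wav.getnframes() / wav.getframerate()
```

Calling it on any wav file of the dataset should return a value close to 3.0 if the statement above holds for that file.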

🚀Introduce

2.1 Create virtual environment

python -m venv myenv

2.2 Activate virtual env

On Linux:

source myenv/bin/activate    # run "deactivate" to leave the environment

On Windows, in CMD:

myenv\Scripts\activate.bat

or in PowerShell:

myenv\Scripts\Activate.ps1

2.3 Install dependencies

pip install -r requirements.txt
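To confirm that the packages went into the virtual environment rather than the system interpreter, a quick check like the one below can be run with the same `python` command. It relies on the fact that inside an environment created by `python -m venv`, `sys.prefix` differs from `sys.base_prefix`; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside an environment created by venv."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```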

🎯Download checkpoint

Download the checkpoints from BaiduNetDisk.

⚠️ All you need is the init directory and the SSR_checkpoint.pt file. If you want to port the model to another framework such as TensorFlow, or to deploy it, you may also need the SSR_checkpoint.onnx file.

SSR_checkpoint.pt is a simplified checkpoint of step-149. You can specify the step size (50, 100, 150), but in my experiments step = win gave the best results. If you need checkpoints for other steps, you can download step-50 or step-100. Note that the .pth files in the step-xxx directories contain redundant parameters, including scheduler parameters, optimizer parameters and so on.
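As an illustration of what "simplified" can mean here, the sketch below drops training-only entries from a checkpoint dictionary. The key names (`model`, `optimizer`, `scheduler`, `step`) are assumptions made for the example; the actual layout of the .pth files in this repository may differ, and `strip_training_state` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable


def strip_training_state(
    checkpoint: Dict[str, Any],
    drop_keys: Iterable[str] = ("optimizer", "scheduler"),
) -> Dict[str, Any]:
    """Return a shallow copy of the checkpoint without training-only entries."""
    drop = set(drop_keys)
    return {key: value for key, value in checkpoint.items() if key not in drop}


# Hypothetical checkpoint layout; the real key names may differ.
full = {
    "model": {"fc.weight": [0.1, 0.2]},
    "optimizer": {"lr": 1e-4},
    "scheduler": {"last_epoch": 149},
    "step": 149,
}
slim = strip_training_state(full)
print(sorted(slim))  # ['model', 'step']
```

With PyTorch, the dictionary would typically come from `torch.load(path, map_location="cpu")` and the reduced version could be written back with `torch.save`.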

🏆Evaluation

The following metrics were measured on the test set.

4.1 Metric scores:

4.2 Confusion matrix:

4.3 Radar chart:

4.4 ROC & AUC

🔭 Future works

Fine-tune Wav2Vec2 (unfreeze its parameters)
Extend the datasets
Multimodal data
Evaluate the model using Deepchecks
ROC and AUC (how to plot the ROC curve by threshold)
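On the last item, one common way to obtain a ROC curve by threshold is to sweep the decision threshold over the predicted scores and record the false positive rate and true positive rate at each value. The sketch below is a generic binary version in plain Python, not this repository's evaluation code; for the eight emotion classes it would be applied one-vs-rest, with label 1 for the class of interest and the model's probability for that class as the score. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Point]:
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    # A threshold above every score predicts nothing as positive.
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is positive when score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
pts = roc_points(labels, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

The resulting points can be passed to any plotting library, with FPR on the x-axis and TPR on the y-axis.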
