A neural network model for determining human speech emotions from audio recordings.
The following datasets are used:

id | name
---|---
1 | The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)
2 | SAVEE (Surrey Audio-Visual Expressed Emotion)
3 | CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)
The following emotion labels are used:
- happy
- surprise
- sad
- angry
- disgust
- fear
- neutral
- A 64-bit Windows, Linux, or macOS machine (some libraries, such as `pyarrow`, do not support 32-bit systems)
- The Conda package and environment manager, installed via Miniconda or Anaconda: https://conda.io/projects/conda/en/latest/user-guide/install/index.html
- Clone the repo
- Create the Conda environment: `conda env create -f env.yaml`
- Activate the environment: `source activate sea`
- Install TensorFlow for CPU: `conda install tensorflow=1.12.0` (or `tensorflow-gpu` for GPU support)
- Install Keras: `conda install keras=2.2.4`
- Download `Audio_Speech_Actors_01-24.zip` and `Audio_Song_Actors_01-24.zip` from https://zenodo.org/record/1188976
- Place these zip files in a folder called `raw-data` in the main directory
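Once the RAVDESS archives are extracted under `raw-data`, each audio file's emotion is encoded in its filename (fields are modality-channel-emotion-intensity-statement-repetition-actor, with the third field naming the emotion). The sketch below shows how that field could be mapped back to the label names above; the function name is hypothetical, and note that RAVDESS also defines a "calm" class (`02`) that the label set above does not include, so it would need to be dropped or merged.

```python
# RAVDESS emotion codes (third dash-separated field of the filename).
# "calm" is not in this project's label list and may need special handling.
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fear", "07": "disgust", "08": "surprise",
}

def emotion_from_filename(name):
    """Extract the emotion label from a RAVDESS filename
    such as '03-01-03-01-02-01-12.wav'."""
    code = name.split("-")[2]
    return RAVDESS_EMOTIONS[code]
```

For example, `emotion_from_filename("03-01-06-01-01-02-05.wav")` identifies a "fear" recording.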