# Speech Emotion Recognition using CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset)

## Project objective

The objective of this project is to build a speech emotion recognition system that automatically classifies human emotional states from speech signals, using the CREMA-D dataset. I will take raw speech audio as input, extract meaningful features, and use a trained model to predict one of six emotions: anger, disgust, fear, happy, neutral, or sad.

## Data Source

The CREMA-D dataset was developed by West Chester University of Pennsylvania (WCU), USA, in collaboration with other U.S. research institutions. It consists of American English speech recordings collected from actors of diverse demographic backgrounds.

All audio data used in this project comes from CREMA-D.

Original repository:
https://github.com/CheyneyComputerScience/CREMA-D

## Data Dictionary

CREMA-D is a data set of 7,442 original clips from 91 actors: 48 male and 43 female, between the ages of 20 and 74, from a variety of races and ethnicities (African American, Asian, Caucasian, Hispanic, and Unspecified).

Actors spoke from a selection of 12 sentences. The sentences were presented using one of six different emotions (Anger, Disgust, Fear, Happy, Neutral and Sad) and four different emotion levels (Low, Medium, High and Unspecified).

| Column Name     | Data Type  | Description                                                                                                                          |
| --------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| `file_name`     | `string`   | Name of the WAV file as stored in the dataset (e.g. `1001_DFA_ANG_XX.wav`).                                                          |
| `file_path`     | `string`   | Relative or absolute path to the audio file used for loading the WAV data.                                                           |
| `actor_id`      | `int`      | Unique 4-digit identifier of the actor who spoke the sentence. Extracted from the filename.                                          |
| `sentence_id`   | `string`   | Identifier of the sentence spoken in the recording (e.g. `DFA`, `IEO`).                                                              |
| `emotion`       | `category` | Intended emotion encoded in the filename: `ANG` (Anger), `DIS` (Disgust), `FEA` (Fear), `HAP` (Happy), `NEU` (Neutral), `SAD` (Sad). |
| `emotion_id`    | `int`      | Integer-encoded emotion label used for machine learning. Mapping: `ANG=0`, `DIS=1`, `FEA=2`, `HAP=3`, `NEU=4`, `SAD=5`.              |
| `emotion_level` | `category` | Intended emotion intensity level: `LO` (Low), `MD` (Medium), `HI` (High), `XX` (Unspecified).                                        |
| `duration_sec`  | `float`    | Duration of the audio clip in seconds, computed as `num_samples / sample_rate`.                                                      |
| `sample_rate`   | `int`      | Sampling rate of the audio file in Hertz (samples per second), e.g. `16000`.                                                         |
| `num_samples`   | `int`      | Total number of audio samples in the clip (length of the waveform array).                                                            |
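
The filename-derived columns and the audio metadata columns above could be populated along the following lines. This is a minimal sketch, not the project's actual loading code: the function names `parse_filename` and `audio_metadata` are hypothetical, filenames are assumed to follow the `<actor>_<sentence>_<emotion>_<level>.wav` pattern shown in the table, and the clips are assumed to be mono PCM WAV files that Python's standard `wave` module can read.

```python
import wave
from pathlib import Path

# Emotion codes and integer labels, copied from the data dictionary above.
EMOTION_TO_ID = {"ANG": 0, "DIS": 1, "FEA": 2, "HAP": 3, "NEU": 4, "SAD": 5}
EMOTION_LEVELS = {"LO", "MD", "HI", "XX"}


def parse_filename(file_path):
    """Split a CREMA-D style filename into the filename-derived columns.

    Hypothetical helper: assumes exactly four underscore-separated fields.
    """
    path = Path(file_path)
    actor_id, sentence_id, emotion, emotion_level = path.stem.split("_")
    if emotion not in EMOTION_TO_ID:
        raise ValueError(f"Unknown emotion code: {emotion}")
    if emotion_level not in EMOTION_LEVELS:
        raise ValueError(f"Unknown emotion level: {emotion_level}")
    return {
        "file_name": path.name,
        "file_path": str(path),
        "actor_id": int(actor_id),
        "sentence_id": sentence_id,
        "emotion": emotion,
        "emotion_id": EMOTION_TO_ID[emotion],
        "emotion_level": emotion_level,
    }


def audio_metadata(file_path):
    """Read sample_rate and num_samples from a WAV header (mono assumed)."""
    with wave.open(str(file_path), "rb") as wav:
        sample_rate = wav.getframerate()
        # Frames per channel; equal to the waveform length for mono audio.
        num_samples = wav.getnframes()
    return {
        "sample_rate": sample_rate,
        "num_samples": num_samples,
        "duration_sec": num_samples / sample_rate,
    }


# Parsing needs only the name, so no file has to exist for this call.
row = parse_filename("1001_IEO_ANG_MD.wav")
# row["actor_id"] is 1001, row["emotion"] is "ANG", row["emotion_id"] is 0
```

Merging the two dictionaries for each file (for example `{**parse_filename(p), **audio_metadata(p)}`) gives one record per clip with all ten columns of the data dictionary.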

### Sentence code lookup table

| Sentence Code | Full Sentence                         |
| ------------- | ------------------------------------- |
| `IEO`         | It’s eleven o’clock                   |
| `TIE`         | That is exactly what happened         |
| `IOM`         | I’m on my way to the meeting          |
| `IWW`         | I wonder what this is about           |
| `TAI`         | The airplane is almost full           |
| `MTI`         | Maybe tomorrow it will be cold        |
| `IWL`         | I would like a new alarm clock        |
| `ITH`         | I think I have a doctor’s appointment |
| `DFA`         | Don’t forget a jacket                 |
| `ITS`         | I think I’ve seen this before         |
| `TSI`         | The surface is slick                  |
| `WSI`         | We’ll stop in a couple of minutes     |
