
## **Emotion Detection using the RAVDESS Audio-Visual Dataset**

### **Description**

* The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) has 7,356 files totaling 24.8 GB.

* It features 24 professional actors (12 male, 12 female) speaking with a neutral North American accent.

* **Emotions include:**

  **Neutral 😐 Calm 😌 Happy 😊 Sad 😞 Angry 😠 Fearful 😨 Surprised 😯 & Disgust 🤢**

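* Every emotion except neutral is recorded at two intensities (normal and strong); neutral is recorded at normal intensity only. Speech covers all eight emotions, while song covers only neutral, calm, happy, sad, angry, and fearful.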

* **Formats:**

  * Audio files are in `.wav` format (16-bit, 48 kHz).
    * WAV files store uncompressed audio, preserving all of the original audio data, which makes them well suited to professional settings where high-quality sound is required.
    * Because they are uncompressed, WAV files are larger than compressed formats.
  * Video files are in `.mp4` format (Full-AV files include audio; Video-only files have none).
    * MP4 holds compressed video and audio streams, so it needs far less storage space and bandwidth than uncompressed formats.
    * It delivers high-quality video (and audio, where present) at a compact file size, making it well suited to streaming and sharing.
* The RAVDESS was developed by Dr. Steven R. Livingstone, who now leads the Affective Data Science Lab, and Dr. Frank A. Russo, who leads the SMART Lab.
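To confirm that a downloaded audio file matches the stated format, Python's standard-library `wave` module can read the WAV header. The sketch below is a minimal illustration: `describe_wav` and `uncompressed_bytes_per_second` are hypothetical helper names, and the demo writes a synthetic silent clip in memory rather than reading a real RAVDESS file. The data-rate helper shows why uncompressed WAV files are large: every sample of every channel is stored at full bit depth.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bit depth, sample rate in Hz) read from a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, 'rb') as wav:
        return wav.getnchannels(), wav.getsampwidth() * 8, wav.getframerate()


def uncompressed_bytes_per_second(channels, bit_depth, sample_rate):
    """PCM data rate: each sample of each channel is stored at full bit depth."""
    return channels * (bit_depth // 8) * sample_rate


# Demo: a synthetic 0.1 s silent mono clip written to memory (not a RAVDESS file)
buffer = io.BytesIO()
with wave.open(buffer, 'wb') as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
    wav.setframerate(48_000)
    wav.writeframes(b'\x00\x00' * 4_800)
buffer.seek(0)

print(describe_wav(buffer))                           # (1, 16, 48000)
print(uncompressed_bytes_per_second(1, 16, 48_000))   # 96000 bytes per second
```

For a real file, `describe_wav(path)` should report a 16-bit depth and a 48 kHz rate; the channel count is kept as a parameter here rather than assumed.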

### **Contents**

**File Summary**
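* In total, the RAVDESS collection includes 7,356 files: 2,880 speech video + 2,024 song video + 1,440 speech audio + 1,012 song audio files. Song counts are lower because song covers fewer emotions and Actor 18 has no song recordings.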
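These totals can be reproduced from the recording design. The worked sketch below rests on stated assumptions taken from the dataset description rather than read from disk: speech covers eight emotions and song six, neutral has only the normal intensity, every combination is recorded for two statements and two repetitions, Actor 18 has no song recordings, and each audio-only clip also exists as a Full-AV and a Video-only `.mp4`. `clips_per_actor` is a hypothetical helper name.

```python
# Recording-design assumptions (from the RAVDESS description, not read from disk)
STATEMENTS = 2          # "Kids are talking ..." and "Dogs are sitting ..."
REPETITIONS = 2         # each statement is recorded twice
INTENSITIES = 2         # normal and strong (neutral has normal only)


def clips_per_actor(n_emotions):
    """Clips one actor records for one vocal channel, counting neutral among n_emotions."""
    neutral = 1 * STATEMENTS * REPETITIONS                          # no strong neutral
    others = (n_emotions - 1) * INTENSITIES * STATEMENTS * REPETITIONS
    return neutral + others


speech_per_actor = clips_per_actor(8)    # 4 + 56 = 60
song_per_actor = clips_per_actor(6)      # 4 + 40 = 44

speech_audio = speech_per_actor * 24     # 1440 (all 24 actors)
song_audio = song_per_actor * 23         # 1012 (Actor 18 has no song clips)
audio_only = speech_audio + song_audio   # 2452

# Each audio-only clip also exists as a Full-AV and a Video-only .mp4
video = 2 * audio_only                   # 4904 = 2880 speech + 2024 song
total = audio_only + video               # 7356

print(speech_audio, song_audio, video, total)   # 1440 1012 4904 7356
```

The audio-only total, 2,452, is also the row count of the metadata DataFrame built later in this notebook.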

**File naming convention**
* Each of the 7356 RAVDESS files has a unique filename. The filename consists of a 7-part numerical identifier.
* For example, **02-01-06-01-02-01-12.mp4**
* These identifiers define the stimulus characteristics:

**Filename identifiers**
* Modality (`01 = Full-AV`, `02 = Video-only`, `03 = Audio-only`).
* Vocal channel (`01 = Speech`, `02 = Song`).
* Emotion (`01 = Neutral`, `02 = Calm`, `03 = Happy`, `04 = Sad`, `05 = Angry`, `06 = Fearful`, `07 = Disgust`, `08 = Surprised`).
* Emotional intensity (`01 = Normal`, `02 = Strong`) ***Note**: There is no strong intensity for the 'Neutral' emotion.*
* Statement (`01 = "Kids are talking by the door"`, `02 = "Dogs are sitting by the door"`).
* Repetition (`01 = 1st repetition`, `02 = 2nd repetition`).
* Actor (`01` to `24`; odd-numbered actors are male, even-numbered actors are female).

**Filename example:**
* Let's decode the file **02-01-06-01-02-01-12.mp4**:
    * Video-only (02)
    * Speech (01)
    * Fearful (06)
    * Normal intensity (01)
    * Statement "Dogs are sitting by the door" (02)
    * 1st repetition (01)
    * 12th actor (12)
    * Female, as the actor ID number is even.
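The decoding steps above can be expressed as a small function. This is a minimal sketch: `decode_ravdess_name` and the upper-case code tables are hypothetical names introduced for illustration, with the codes copied from the identifier list; the notebook's own parser, `extract_metadata`, is defined further down.

```python
from pathlib import PurePath

# Code tables copied from the "Filename identifiers" list (hypothetical names)
MODALITY = {'01': 'Full-AV', '02': 'Video-only', '03': 'Audio-only'}
VOCAL_CHANNEL = {'01': 'Speech', '02': 'Song'}
EMOTION = {'01': 'Neutral', '02': 'Calm', '03': 'Happy', '04': 'Sad',
           '05': 'Angry', '06': 'Fearful', '07': 'Disgust', '08': 'Surprised'}
INTENSITY = {'01': 'Normal', '02': 'Strong'}
STATEMENT = {'01': 'Kids are talking by the door', '02': 'Dogs are sitting by the door'}


def decode_ravdess_name(file_name):
    """Decode a name like '02-01-06-01-02-01-12.mp4' into readable fields."""
    codes = PurePath(file_name).stem.split('-')   # drop extension, split 7 codes
    if len(codes) != 7:
        raise ValueError(f"expected 7 identifiers, got {len(codes)} in {file_name!r}")
    modality, channel, emotion, intensity, statement, repetition, actor = codes
    actor_id = int(actor)
    return {
        'modality': MODALITY[modality],
        'vocal_channel': VOCAL_CHANNEL[channel],
        'emotion': EMOTION[emotion],
        'intensity': INTENSITY[intensity],
        'statement': STATEMENT[statement],
        'repetition': int(repetition),
        'actor': actor_id,
        'gender': 'Male' if actor_id % 2 == 1 else 'Female',  # odd = male, even = female
    }


print(decode_ravdess_name('02-01-06-01-02-01-12.mp4'))
```

Indexing the tables directly (rather than using `.get(..., 'Unknown')`) raises a `KeyError` on an invalid code, which surfaces malformed filenames early instead of carrying an `'Unknown'` label into the data.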

### **Objective**

* The objective of this project is to develop a machine learning model that can recognize emotions from audio and video data in the RAVDESS dataset.

* **Key goals include:**
  * **Data Preparation:** Organize and preprocess audio and video data.
  * **Feature Extraction:** Extract features from speech and facial expressions.
  * **Model Training:** Train models to classify emotions.
  * **Evaluation:** Assess performance and improve accuracy for real-world applications.
  
* This project aims to build a reliable emotion recognition system that leverages multimodal data.

### **Applications**

**Emotion recognition from audio and video data has impactful applications across many sectors:**
* **Security:** Detects suspicious behaviors in public or high-security areas.
* **Entertainment:** Adapts content based on viewer emotions for a more engaging experience.
* **Human-Computer Interaction:** Personalizes virtual assistant interactions based on user mood.
* **Education:** Identifies student engagement and frustration in e-learning for content adjustments.
* **Healthcare:** Helps monitor the emotional well-being of non-verbal or cognitively impaired patients.
* **Mental Health:** Monitors emotional states in virtual therapy to detect stress, anxiety, or depression.
* **Customer Service:** Improves call center interactions by detecting emotions in real-time, adjusting responses, and enhancing satisfaction.

### **Data Preparation**

#### Importing Libraries

In [None]:
import os           # To interact with the operating system and navigate through directories
import librosa      # To analyze audio and extract features
import pandas as pd # To manipulate and analyze the data
import numpy as np  # For numerical computations on arrays

from sklearn.preprocessing import LabelEncoder          # For handling categorical data

#### Defining Paths to Dataset

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
audio_song_actors_path = r"/content/drive/My Drive/Audio_Song_Actors_01-24"
audio_speech_actors_path = r"/content/drive/My Drive/Audio_Speech_Actors_01-24"

#### Mapping dictionaries for filename parts
* Each dictionary maps a two-digit code from the filename to its human-readable attribute, following the filename identifiers described above.

In [None]:
modality = {'01': 'Full-AV', '02': 'Video-only', '03': 'Audio-only'}
vocal_channel = {'01': 'Speech', '02': 'Song'}
emotion = {'01': 'Neutral', '02': 'Calm', '03': 'Happy', '04': 'Sad',
           '05': 'Angry', '06': 'Fearful', '07': 'Disgust', '08': 'Surprised'}
emotional_intensity = {'01': 'Normal', '02': 'Strong'}
statement = {'01': 'Kids are talking by the door', '02': 'Dogs are sitting by the door'}
repetition = {'01': '1st repetition', '02': '2nd repetition'}

#### Defining a function to extract metadata from a filename

In [37]:
def extract_metadata(file_path, is_song=True):
    # Note: `is_song` is not used below; the vocal channel is read from the filename itself (parts[1])
    file_name = os.path.basename(file_path)

    # Remove the extension and split the 7-part identifier, e.g. '03-02-01-01-01-01-02'
    base_name = os.path.splitext(file_name)[0]
    parts = base_name.split('-')            # parts is a list of seven two-digit strings

    metadata = {}
    # File location details
    metadata['full_path'] = file_path
    metadata['parent_folder'] = os.path.basename(os.path.dirname(os.path.dirname(file_path)))
    metadata['actor_folder'] = os.path.basename(os.path.dirname(file_path))
    metadata['file_name'] = file_name

    # Decode each identifier; unrecognized codes fall back to 'Unknown'
    metadata['actor_num'] = parts[6]
    metadata['modality'] = modality.get(parts[0], 'Unknown')
    metadata['vocal_channel'] = vocal_channel.get(parts[1], 'Unknown')
    metadata['emotion'] = emotion.get(parts[2], 'Unknown')
    metadata['emotional_intensity'] = emotional_intensity.get(parts[3], 'Unknown')
    metadata['statement'] = statement.get(parts[4], 'Unknown')
    metadata['repetition'] = repetition.get(parts[5], 'Unknown')

    # Assigning actor gender: odd-numbered actors are male, even-numbered actors are female
    metadata['actor_gender'] = 'Male' if int(parts[6]) % 2 != 0 else 'Female'

    return metadata
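`extract_metadata` assumes every file it receives follows the naming convention: an unrecognized code becomes `'Unknown'`, and a name with fewer than seven parts (such as `notes.wav`) raises an `IndexError` at `parts[6]`. Because the folder walk only checks `endswith('.wav')`, a stray file, for example a macOS `._`-prefixed metadata file, would slip through. A stricter filter is sketched below; `RAVDESS_WAV` and `is_ravdess_wav` are hypothetical names, and the pattern checks only the shape of the name, not whether each code is valid.

```python
import re

# Seven two-digit codes separated by hyphens, followed by the .wav extension
RAVDESS_WAV = re.compile(r'\d{2}(?:-\d{2}){6}\.wav')


def is_ravdess_wav(file_name):
    """Return True only for names shaped like '03-01-06-01-02-01-12.wav'."""
    return RAVDESS_WAV.fullmatch(file_name) is not None


for name in ['03-01-06-01-02-01-12.wav', '._03-01-06-01-02-01-12.wav', 'notes.wav']:
    print(name, is_ravdess_wav(name))
```

Inside `itr_actor_folder`, `is_ravdess_wav(audio_file)` could replace the `audio_file.endswith('.wav')` check.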

#### Iterating over Actor folders

In [38]:
data = []          # A list to store metadata for each audio file (re-running this cell resets it)
# Iterate over each actor folder inside base_path and collect metadata for every .wav file
def itr_actor_folder(base_path, is_song, data):
    for actor_folder in os.listdir(base_path):
        actor_folder_path = os.path.join(base_path, actor_folder)

        # Check if it's a directory
        if os.path.isdir(actor_folder_path):
            # Iterate through the audio files
            for audio_file in os.listdir(actor_folder_path):
                if audio_file.endswith('.wav'):
                    file_path = os.path.join(actor_folder_path, audio_file)
                    metadata = extract_metadata(file_path, is_song)
                    data.append(metadata)
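`os.listdir` returns entries in arbitrary order, so the row order of the resulting DataFrame (and therefore what `head()` and `tail()` show) can differ between machines or runs. A minimal alternative, assuming the same `<base>/Actor_XX/*.wav` layout, collects the paths with `pathlib` and sorts them. `list_wav_files` is a hypothetical helper, not part of the original notebook.

```python
from pathlib import Path


def list_wav_files(base_path):
    """Return .wav paths one level below base_path (inside Actor_XX folders), sorted.

    Sorting the full paths orders files by actor folder, then by filename,
    so the result is identical on every run.
    """
    return sorted(str(path) for path in Path(base_path).glob('*/*.wav'))
```

Calling `len(list_wav_files(audio_song_actors_path))` also gives a quick completeness check against the expected 1,012 song and 1,440 speech files.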

#### Iterating through each dataset to process its files

In [39]:
dict_paths = [{'path': audio_song_actors_path, 'is_song': True},
              {'path': audio_speech_actors_path, 'is_song': False}]

# Iterate over the paths
for base in dict_paths:
    base_path = base['path']
    is_song = base['is_song']
    # Calling function to iterate over actor folder
    itr_actor_folder(base_path, is_song, data)

#### Creating a DataFrame of metadata

In [40]:
# List of metadata is converted to DataFrame
df = pd.DataFrame(data)
# Columns representing key metadata attributes for each file
df = df[['full_path', 'parent_folder', 'actor_folder', 'file_name', 'actor_num', 'modality', 'vocal_channel', 'emotion', 'emotional_intensity', 'statement', 'repetition', 'actor_gender']]

In [41]:
# Summary of columns, non-null counts, and dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2452 entries, 0 to 2451
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   full_path            2452 non-null   object
 1   parent_folder        2452 non-null   object
 2   actor_folder         2452 non-null   object
 3   file_name            2452 non-null   object
 4   actor_num            2452 non-null   object
 5   modality             2452 non-null   object
 6   vocal_channel        2452 non-null   object
 7   emotion              2452 non-null   object
 8   emotional_intensity  2452 non-null   object
 9   statement            2452 non-null   object
 10  repetition           2452 non-null   object
 11  actor_gender         2452 non-null   object
dtypes: object(12)
memory usage: 230.0+ KB


In [42]:
# View first '5' rows
df.head(5)

| | full_path | parent_folder | actor_folder | file_name | actor_num | modality | vocal_channel | emotion | emotional_intensity | statement | repetition | actor_gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /content/drive/My Drive/Audio_Song_Actors_01-2... | Audio_Song_Actors_01-24 | Actor_02 | 03-02-01-01-01-01-02.wav | 2 | Audio-only | Song | Neutral | Normal | Kids are talking by the door | 1st repetition | Female |
| 1 | /content/drive/My Drive/Audio_Song_Actors_01-2... | Audio_Song_Actors_01-24 | Actor_02 | 03-02-01-01-01-02-02.wav | 2 | Audio-only | Song | Neutral | Normal | Kids are talking by the door | 2nd repetition | Female |
| 2 | /content/drive/My Drive/Audio_Song_Actors_01-2... | Audio_Song_Actors_01-24 | Actor_02 | 03-02-01-01-02-02-02.wav | 2 | Audio-only | Song | Neutral | Normal | Dogs are sitting by the door | 2nd repetition | Female |
| 3 | /content/drive/My Drive/Audio_Song_Actors_01-2... | Audio_Song_Actors_01-24 | Actor_02 | 03-02-02-01-01-01-02.wav | 2 | Audio-only | Song | Calm | Normal | Kids are talking by the door | 1st repetition | Female |
| 4 | /content/drive/My Drive/Audio_Song_Actors_01-2... | Audio_Song_Actors_01-24 | Actor_02 | 03-02-06-01-02-02-02.wav | 2 | Audio-only | Song | Fearful | Normal | Dogs are sitting by the door | 2nd repetition | Female |


In [43]:
# View last '5' rows
df.tail(5)

| | full_path | parent_folder | actor_folder | file_name | actor_num | modality | vocal_channel | emotion | emotional_intensity | statement | repetition | actor_gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2447 | /content/drive/My Drive/Audio_Speech_Actors_01... | Audio_Speech_Actors_01-24 | Actor_23 | 03-01-08-02-02-02-23.wav | 23 | Audio-only | Speech | Surprised | Strong | Dogs are sitting by the door | 2nd repetition | Male |
| 2448 | /content/drive/My Drive/Audio_Speech_Actors_01... | Audio_Speech_Actors_01-24 | Actor_23 | 03-01-04-02-01-02-23.wav | 23 | Audio-only | Speech | Sad | Strong | Kids are talking by the door | 2nd repetition | Male |
| 2449 | /content/drive/My Drive/Audio_Speech_Actors_01... | Audio_Speech_Actors_01-24 | Actor_23 | 03-01-01-01-02-01-23.wav | 23 | Audio-only | Speech | Neutral | Normal | Dogs are sitting by the door | 1st repetition | Male |
| 2450 | /content/drive/My Drive/Audio_Speech_Actors_01... | Audio_Speech_Actors_01-24 | Actor_23 | 03-01-08-02-01-01-23.wav | 23 | Audio-only | Speech | Surprised | Strong | Kids are talking by the door | 1st repetition | Male |
| 2451 | /content/drive/My Drive/Audio_Speech_Actors_01... | Audio_Speech_Actors_01-24 | Actor_23 | 03-01-02-02-02-01-23.wav | 23 | Audio-only | Speech | Calm | Strong | Dogs are sitting by the door | 1st repetition | Male |


In [44]:
# Check for null values
df.isna().sum()

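| Column | Missing values |
|---|---|
| full_path | 0 |
| parent_folder | 0 |
| actor_folder | 0 |
| file_name | 0 |
| actor_num | 0 |
| modality | 0 |
| vocal_channel | 0 |
| emotion | 0 |
| emotional_intensity | 0 |
| statement | 0 |
| repetition | 0 |
| actor_gender | 0 |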


In [45]:
# Checking the number of unique values in each column
df.nunique()

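| Column | Unique values |
|---|---|
| full_path | 2452 |
| parent_folder | 2 |
| actor_folder | 24 |
| file_name | 2452 |
| actor_num | 24 |
| modality | 1 |
| vocal_channel | 2 |
| emotion | 8 |
| emotional_intensity | 2 |
| statement | 2 |
| repetition | 2 |
| actor_gender | 2 |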


In [46]:
# Keep an untouched copy (including the file paths), then drop columns that describe
# file locations rather than the recording itself
df_copy = df.copy()
df.drop(columns=['full_path', 'parent_folder', 'actor_folder', 'file_name'], inplace=True)

### **Data Preprocessing**

#### Handling Categorical Features
* *`LabelEncoder():`* Converts categorical labels into numerical values.
* *`fit_transform():`* Learns the sorted list of distinct classes, then maps each value to an integer from `0` to `n-1`, where `n` is the number of classes.

In [None]:
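# Selecting categorical (string) features
cat_features = df.select_dtypes(include=['object']).columns

# Keep one fitted encoder per column so the codes can be decoded later
encoders = {}

# Iterate over each feature
for feature in cat_features:
    le = LabelEncoder()                      # A fresh encoder for this column
    df[feature] = le.fit_transform(df[feature])
    df[feature] += 1         # Shift so codes start at 1 (fit_transform starts at 0); subtract 1 before inverse_transform
    encoders[feature] = le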
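Two details matter when the encoded labels are used later: `LabelEncoder` assigns codes in sorted (alphabetical) order of the class names, not in order of appearance, and because of the `+1` shift, decoding must subtract 1 first. The sketch below reproduces that behaviour with NumPy alone, since `np.unique(..., return_inverse=True)` returns the sorted classes together with each value's 0-based index, the same codes `fit_transform` produces. `encode_from_one` and `decode_from_one` are hypothetical helper names.

```python
import numpy as np


def encode_from_one(values):
    """Sorted-label encoding shifted to start at 1 (mirrors fit_transform + 1)."""
    classes, codes = np.unique(np.asarray(values), return_inverse=True)
    return classes, codes + 1


def decode_from_one(classes, codes):
    """Undo the +1 shift, then index back into the sorted class array."""
    return classes[np.asarray(codes) - 1]


emotions = ['Neutral', 'Calm', 'Fearful', 'Calm', 'Angry']
classes, codes = encode_from_one(emotions)
print(classes.tolist())                            # ['Angry', 'Calm', 'Fearful', 'Neutral']
print(codes.tolist())                              # [4, 2, 3, 2, 1]
print(decode_from_one(classes, codes).tolist())    # ['Neutral', 'Calm', 'Fearful', 'Calm', 'Angry']
```

Because codes follow alphabetical order, `Angry` becomes 1 and `Neutral` becomes 4 here even though `Neutral` appears first; any code-to-emotion lookup should therefore come from the fitted classes rather than from the filename numbering.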

#### Display Label Encoded Features
* A function that returns a new DataFrame containing the first and last `n` rows for quick viewing.

In [49]:
def display_head_tail(df, n=5):
    first_row = df.head(n)
    last_row = df.tail(n)
    concat_df = pd.concat([first_row, last_row])
    return concat_df

In [50]:
# Display the first and last 6 rows
display_head_tail(df, 6)

| | actor_num | modality | vocal_channel | emotion | emotional_intensity | statement | repetition | actor_gender |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Audio-only | Song | Neutral | Normal | Kids are talking by the door | 1st repetition | Female |
| 1 | 2 | Audio-only | Song | Neutral | Normal | Kids are talking by the door | 2nd repetition | Female |
| 2 | 2 | Audio-only | Song | Neutral | Normal | Dogs are sitting by the door | 2nd repetition | Female |
| 3 | 2 | Audio-only | Song | Calm | Normal | Kids are talking by the door | 1st repetition | Female |
| 4 | 2 | Audio-only | Song | Fearful | Normal | Dogs are sitting by the door | 2nd repetition | Female |
| 5 | 2 | Audio-only | Song | Neutral | Normal | Dogs are sitting by the door | 1st repetition | Female |
| 2446 | 23 | Audio-only | Speech | Angry | Normal | Dogs are sitting by the door | 2nd repetition | Male |
| 2447 | 23 | Audio-only | Speech | Surprised | Strong | Dogs are sitting by the door | 2nd repetition | Male |
| 2448 | 23 | Audio-only | Speech | Sad | Strong | Kids are talking by the door | 2nd repetition | Male |
| 2449 | 23 | Audio-only | Speech | Neutral | Normal | Dogs are sitting by the door | 1st repetition | Male |
| 2450 | 23 | Audio-only | Speech | Surprised | Strong | Kids are talking by the door | 1st repetition | Male |
| 2451 | 23 | Audio-only | Speech | Calm | Strong | Dogs are sitting by the door | 1st repetition | Male |


#### Summary Statistics of the dataset

In [51]:
df.describe()

| | actor_num | modality | vocal_channel | emotion | emotional_intensity | statement | repetition | actor_gender |
|---|---|---|---|---|---|---|---|---|
| count | 2452 | 2452 | 2452 | 2452 | 2452 | 2452 | 2452 | 2452 |
| unique | 24 | 1 | 2 | 8 | 2 | 2 | 2 | 2 |
| top | 2 | Audio-only | Speech | Calm | Normal | Kids are talking by the door | 1st repetition | Male |
| freq | 104 | 2452 | 1440 | 376 | 1320 | 1226 | 1226 | 1248 |
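`describe()` reports a single most frequent value per column (here `Calm`, with 376 files), even when several values tie; by the recording design, Happy, Sad, Angry, and Fearful should also have 376 files each. To see the full distribution of emotions across speech and song, `pd.crosstab` counts every combination and fills absent pairs with 0. The sketch below runs on a tiny made-up frame so it is self-contained; on the notebook's data, the equivalent call would use `df_copy`, which keeps the original string labels: `pd.crosstab(df_copy['emotion'], df_copy['vocal_channel'])`. Given the dataset description, its Song column should show zero Disgust and Surprised files.

```python
import pandas as pd

# Tiny stand-in for the metadata DataFrame (made-up rows, not RAVDESS results)
sample = pd.DataFrame({
    'vocal_channel': ['Speech', 'Speech', 'Song', 'Song', 'Speech'],
    'emotion': ['Disgust', 'Calm', 'Calm', 'Happy', 'Calm'],
})

# Rows: emotions; columns: vocal channels; cells: file counts (0 where a pair never occurs)
counts = pd.crosstab(sample['emotion'], sample['vocal_channel'])
print(counts)
```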
