**1. Objective**

My project detects human emotions from audio recordings.
The goal is to build a model that recognizes emotions such as happy, sad, angry, calm, and fearful from speech.

**2. Dataset Selection**

I have selected the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Its speech portion covers eight emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprised), is distributed in WAV audio format, and was recorded by 24 actors (12 male and 12 female).


**3. Dataset Description**

Dataset source: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio/data

Type of Data:
- Audio recordings from professional voice actors
- Emotional speech data captured via high-quality microphones (16-bit, 48 kHz, .wav)

Collected Variables:
- Modality: Audio-only recordings
- Vocal Channel: Speech
- Emotion: 8 categories (neutral, calm, happy, sad, angry, fearful, disgust, surprised)
- Emotional Intensity: Normal or Strong
- Statement: 2 different phrases
- Repetition: 1st or 2nd repetition
- Actor: 24 professional actors (12 male, 12 female)
- Audio Features: Not provided as variables; properties such as duration and sample rate can be extracted from the files

Number of Samples:
- 1440 audio files (60 trials per actor × 24 actors)
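
The variables listed above are encoded in each file name as seven two-digit fields, for example `03-01-06-01-02-01-12.wav`. The sketch below shows one way to decode them. The function name `parse_ravdess_name` and the derived keys (`emotion_name`, `intensity_name`, `gender`) are illustrative and not part of the notebook; the gender rule assumes the RAVDESS convention that odd-numbered actors are male and even-numbered actors are female.

```python
from typing import Dict

# Order of the seven fields in a RAVDESS file name
FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
          "statement", "repetition", "actor")

EMOTIONS = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
            5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}


def parse_ravdess_name(filename: str) -> Dict[str, object]:
    """Split a RAVDESS file name into named integer fields (illustrative helper)."""
    stem = filename.rsplit(".", 1)[0]
    parts = [int(p) for p in stem.split("-")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"Expected {len(FIELDS)} fields, got {len(parts)}: {filename}")
    info: Dict[str, object] = dict(zip(FIELDS, parts))
    info["emotion_name"] = EMOTIONS[parts[2]]
    info["intensity_name"] = "normal" if parts[3] == 1 else "strong"
    # Assumed convention: odd actor IDs are male, even actor IDs are female
    info["gender"] = "male" if parts[6] % 2 == 1 else "female"
    return info


# Neutral is recorded at normal intensity only, the other 7 emotions at both:
# 7 emotions x 2 intensities x 2 statements x 2 repetitions, plus 1 x 1 x 2 x 2
trials_per_actor = 7 * 2 * 2 * 2 + 1 * 1 * 2 * 2
total_files = trials_per_actor * 24

example = parse_ravdess_name("03-01-06-01-02-01-12.wav")
```

Here `example` describes a fearful, normal-intensity utterance of the second statement by actor 12, and the arithmetic reproduces the 60 trials per actor and 1440 files stated above.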

In [5]:
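TARGET_SR = 16000        # target sample rate in Hz (source files are 48 kHz)
FIXED_DURATION = 3.0     # clip length in seconds after padding or truncation
FEATURE_TYPE = "mel"     # selected feature type
N_MELS = 128             # number of Mel bands (for Mel features)
N_MFCC = 40              # number of MFCC coefficients per frame
SAVE_DIR_IN_DRIVE = "RAVDESS_preprocessed"  # output folder under MyDrive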

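This notebook preprocesses the RAVDESS speech dataset: it loads all audio files, resamples them to 16 kHz, pads or truncates each clip to 3 seconds, extracts MFCC features averaged over time, standardizes them, and saves the features and labels to Google Drive for model training. Note that FEATURE_TYPE is set to "mel", but the extraction cell in this notebook computes MFCCs only.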
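
The fixed-length step can be sketched in isolation with NumPy. The helper name `fix_clip_length` is hypothetical; it zero-pads short clips at the end and cuts long clips, which is the same behavior as the extraction loop in this notebook.

```python
import numpy as np


def fix_clip_length(y: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad at the end or truncate so the clip has exactly target_len samples."""
    if len(y) < target_len:
        return np.pad(y, (0, target_len - len(y)))
    return y[:target_len]


sr = 16000                 # same value as TARGET_SR
target_len = int(sr * 3.0)  # 48000 samples for a 3-second clip

short_clip = np.ones(sr * 2, dtype=np.float32)  # 2 seconds of dummy audio
long_clip = np.ones(sr * 4, dtype=np.float32)   # 4 seconds of dummy audio

short_fixed = fix_clip_length(short_clip, target_len)
long_fixed = fix_clip_length(long_clip, target_len)
```

librosa also offers `librosa.util.fix_length`, which should give the same result for one-dimensional signals; the explicit version is shown here only to make the padding rule visible.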

In [2]:
!pip install -q librosa soundfile kaggle tqdm


In [3]:
import os
import shutil
import zipfile
from pathlib import Path
import numpy as np
import pandas as pd
from tqdm import tqdm
import librosa
import soundfile as sf

In [6]:
from google.colab import drive
print("Mounting Google Drive...")
drive.mount('/content/drive')


DRIVE_BASE = Path('/content/drive/MyDrive') / SAVE_DIR_IN_DRIVE
DRIVE_BASE.mkdir(parents=True, exist_ok=True)
print(f"Preprocessed files will be saved to: {DRIVE_BASE}")

Mounting Google Drive...
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Preprocessed files will be saved to: /content/drive/MyDrive/RAVDESS_preprocessed


In [30]:
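!pip install kaggle
import os
from google.colab import files
files.upload()  # upload kaggle.json (the Kaggle API token) into /content
os.environ['KAGGLE_CONFIG_DIR'] = "/content"  # tell the Kaggle CLI where to look for kaggle.json
!kaggle datasets download -d uwrfkaggler/ravdess-emotional-speech-audio --unzip -p /content/ravdess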




Saving kaggle.json to kaggle (1).json
Dataset URL: https://www.kaggle.com/datasets/uwrfkaggler/ravdess-emotional-speech-audio
License(s): CC-BY-NC-SA-4.0
Downloading ravdess-emotional-speech-audio.zip to /content/ravdess
 93% 399M/429M [00:03<00:00, 86.5MB/s]
100% 429M/429M [00:03<00:00, 143MB/s] 


In [31]:
!ls '/content/ravdess'

Actor_01  Actor_06  Actor_11  Actor_16	Actor_21
Actor_02  Actor_07  Actor_12  Actor_17	Actor_22
Actor_03  Actor_08  Actor_13  Actor_18	Actor_23
Actor_04  Actor_09  Actor_14  Actor_19	Actor_24
Actor_05  Actor_10  Actor_15  Actor_20	audio_speech_actors_01-24


In [32]:
from pathlib import Path
AUDIO_ROOT = Path("/content/ravdess")
speech_wavs = sorted([p for p in AUDIO_ROOT.rglob('*.wav') if p.name.startswith('03-01-')])
print(f"Total speech audio-only files found: {len(speech_wavs)}")
print("First 5 files:")
for f in speech_wavs[:5]:
    print(f)

Total speech audio-only files found: 2880
First 5 files:
/content/ravdess/Actor_01/03-01-01-01-01-01-01.wav
/content/ravdess/Actor_01/03-01-01-01-01-02-01.wav
/content/ravdess/Actor_01/03-01-01-01-02-01-01.wav
/content/ravdess/Actor_01/03-01-01-01-02-02-01.wav
/content/ravdess/Actor_01/03-01-02-01-01-01-01.wav


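The cell above recursively finds all speech audio-only WAV files; their names start with '03-01-' (modality 03 = audio-only, vocal channel 01 = speech). The count of 2880 is twice the 1440 files in the dataset. The most likely cause is the audio_speech_actors_01-24 folder in the listing above, which appears to hold a second copy of every actor folder, so each recording is found twice.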
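
If the duplicated folder is indeed the cause, one way to keep a single copy of each recording is to deduplicate by file name, since RAVDESS file names are unique across actors. This is a minimal sketch; `dedupe_by_name` is a hypothetical helper and the paths in the example are made up for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def dedupe_by_name(paths: Iterable[Path]) -> List[Path]:
    """Keep the first path seen for each file name, in sorted order."""
    seen = set()
    unique = []
    for p in sorted(paths):
        if p.name not in seen:
            seen.add(p.name)
            unique.append(p)
    return unique


# Illustrative paths: the same two recordings appear under both folder layouts
found = [
    Path("/content/ravdess/Actor_01/03-01-01-01-01-01-01.wav"),
    Path("/content/ravdess/Actor_01/03-01-02-01-01-01-01.wav"),
    Path("/content/ravdess/audio_speech_actors_01-24/Actor_01/03-01-01-01-01-01-01.wav"),
    Path("/content/ravdess/audio_speech_actors_01-24/Actor_01/03-01-02-01-01-01-01.wav"),
]
unique_wavs = dedupe_by_name(found)
```

In the notebook this would be applied as `speech_wavs = dedupe_by_name(speech_wavs)` before feature extraction, which should bring the count back to 1440 if the assumption about the duplicate folder holds.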

In [37]:
import librosa
import numpy as np
from sklearn.preprocessing import StandardScaler

features = []
labels = []
max_len = int(TARGET_SR * FIXED_DURATION)  # samples per clip: 16000 * 3.0 = 48000
print("Extracting MFCC features...")
for wav_path in tqdm(speech_wavs):
    # Load as mono and resample to TARGET_SR
    y, sr = librosa.load(wav_path, sr=TARGET_SR)
    # Zero-pad short clips and truncate long ones to a fixed length
    if len(y) < max_len:
        y = np.pad(y, (0, max_len - len(y)))
    else:
        y = y[:max_len]
    # MFCC matrix has shape (N_MFCC, frames); average over frames to get one vector per clip
    mfccs = librosa.feature.mfcc(y=y, sr=TARGET_SR, n_mfcc=N_MFCC)
    mfccs_mean = np.mean(mfccs.T, axis=0)
    features.append(mfccs_mean)
    # The third field of the file name is the emotion code (1-8)
    emotion_code = int(wav_path.name.split('-')[2])
    labels.append(emotion_code)

features = np.array(features)
labels = np.array(labels)
print("Features shape:", features.shape)
print("Labels shape:", labels.shape)

# Standardize each MFCC coefficient to zero mean and unit variance
scaler = StandardScaler()
features = scaler.fit_transform(features)
print("Features normalized.")


Extracting MFCC features...


100%|██████████| 2880/2880 [00:47<00:00, 61.16it/s]

Features shape: (2880, 40)
Labels shape: (2880,)
Features normalized.
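
The cell above fits the scaler on all clips at once. For model evaluation, a common alternative is to hold out some actors entirely and compute the normalization statistics from the training clips only, so that no information from the test speakers reaches training. The sketch below illustrates this with NumPy on toy data; the `actors` array, the helper names, and the choice of held-out actor are assumptions, since the notebook does not store actor IDs.

```python
import numpy as np


def split_by_actor(actors: np.ndarray, test_actors: set):
    """Return boolean (train_mask, test_mask) so that no actor appears in both."""
    test_mask = np.isin(actors, list(test_actors))
    return ~test_mask, test_mask


def standardize_train_only(X_train: np.ndarray, X_test: np.ndarray, eps: float = 1e-8):
    """Scale both splits using the mean and standard deviation of the training split."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    return (X_train - mean) / (std + eps), (X_test - mean) / (std + eps)


# Toy stand-in for the (n_clips, N_MFCC) feature matrix and per-clip actor IDs
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(12, 4))
actors = np.repeat([1, 2, 3, 4], 3)  # 4 hypothetical actors, 3 clips each

train_mask, test_mask = split_by_actor(actors, test_actors={4})
X_train, X_test = standardize_train_only(X[train_mask], X[test_mask])
```

In the notebook, actor IDs could be taken from the last field of each file name, in the same way the emotion code is taken from the third field.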





In [38]:
import pickle

save_path = DRIVE_BASE / "ravdess_features.pkl"
with open(save_path, 'wb') as f:
    pickle.dump({'features': features, 'labels': labels}, f)

print(f"Preprocessed features saved to Google Drive at: {save_path}")

emotion_map = {
    1: 'neutral',
    2: 'calm',
    3: 'happy',
    4: 'sad',
    5: 'angry',
    6: 'fearful',
    7: 'disgust',
    8: 'surprised'
}
labels_named = [emotion_map[code] for code in labels[:5]]
print("Example emotion labels:", labels_named)

Preprocessed features saved to Google Drive at: /content/drive/MyDrive/RAVDESS_preprocessed/ravdess_features.pkl
Example emotion labels: ['neutral', 'neutral', 'neutral', 'neutral', 'calm']
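
For the later training step, the saved file can be read back and the class balance checked. The following is a minimal sketch that assumes the same dictionary layout (`features`, `labels`) as the save cell; `load_features` and `emotion_counts` are hypothetical helpers, and the round trip uses a temporary file with toy values instead of the real Drive path.

```python
import pickle
import tempfile
from collections import Counter
from pathlib import Path

EMOTION_MAP = {1: "neutral", 2: "calm", 3: "happy", 4: "sad",
               5: "angry", 6: "fearful", 7: "disgust", 8: "surprised"}


def load_features(path):
    """Load the features and labels stored by the save cell."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["features"], data["labels"]


def emotion_counts(labels):
    """Count clips per emotion name."""
    return Counter(EMOTION_MAP[int(code)] for code in labels)


# Round trip on toy data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_features.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump({"features": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
                     "labels": [1, 3, 3]}, f)
    demo_features, demo_labels = load_features(demo_path)
    demo_counts = emotion_counts(demo_labels)
```

With the real file, `load_features(save_path)` would return the NumPy arrays saved above, and `emotion_counts(labels)` would show how many clips each emotion contributes.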
