# Speech Emotion Recognition
## Step 01: Feature Extraction
Dataset Source: RAVDESS Emotional Speech Audio on Kaggle
--- 
### Import Libraries
We import the necessary libraries for:
- Navigating directories (os)
- Handling tabular data (pandas)
- Showing progress bars (tqdm)
- Calling the feature extraction and emotion parsing helpers (extract_features, get_emotion) from src/preprocess.py
### Define Data Directory and Prepare Storage
We define the root directory of the audio dataset (data/), and initialize empty lists to hold extracted features and corresponding emotion labels.
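Before starting the extraction loop, it can help to confirm the expected layout: one subfolder per actor (e.g. Actor_01) directly under data/, each holding .wav files. The sketch below assumes that layout. The helper count_wavs_per_actor is hypothetical and is not part of src/preprocess.py.

```python
from pathlib import Path


def count_wavs_per_actor(data_dir: str) -> dict[str, int]:
    """Count .wav files in each immediate subfolder of data_dir (hypothetical helper)."""
    counts: dict[str, int] = {}
    for actor_dir in sorted(Path(data_dir).iterdir()):
        if not actor_dir.is_dir():
            continue  # skip stray top-level files such as notes or .DS_Store
        # Case-insensitive extension check, so "clip.WAV" is counted too
        counts[actor_dir.name] = sum(
            1 for f in actor_dir.iterdir() if f.is_file() and f.suffix.lower() == ".wav"
        )
    return counts
```

For example, count_wavs_per_actor("data") on the full speech-only RAVDESS download should report 24 actor folders with 60 files each (1,440 in total). Lower counts usually point to an incomplete download or an extra nesting level under data/.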
### Loop Over Audio Files and Extract Features
We iterate through each actor folder in the dataset. For every .wav file, we:
- Extract the emotion label from the filename with get_emotion()
- Extract audio features with extract_features()
- Append the features and the label to their respective lists

We wrap the loop over actor folders in tqdm to track progress. A try-except block reports any file whose feature extraction fails and moves on to the next one instead of stopping the run.
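The project's get_emotion() lives in src/preprocess.py and is not shown here. RAVDESS filenames encode their metadata as seven hyphen-separated two-digit identifiers (modality, vocal channel, emotion, emotional intensity, statement, repetition, actor), with the emotion in the third field. For example, 03-01-05-01-02-01-12.wav is an audio-only, spoken, angry clip from actor 12. The hypothetical parse_emotion() below sketches how such a parser could work. Returning None for unrecognised names mirrors the "is not None" check in the loop, although the real helper may also drop some emotions on purpose.

```python
import os
from typing import Optional

# RAVDESS emotion codes (third hyphen-separated field of the filename)
EMOTION_CODES = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def parse_emotion(filename: str) -> Optional[str]:
    """Return the emotion encoded in a RAVDESS filename, or None if it cannot be parsed (hypothetical)."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = stem.split("-")
    if len(parts) != 7:
        return None  # not a RAVDESS-style name, e.g. a stray file
    return EMOTION_CODES.get(parts[2])  # None for an unknown emotion code
```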
### Convert to DataFrame and Save
We convert the extracted features and labels into a pandas DataFrame (one row per audio file, with the emotion in a final label column) and save it to outputs/features.csv. This file will be used to train the emotion classification models.
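When each entry of features is a plain list or 1-D array, pd.DataFrame(features) names the feature columns 0, 1, ..., n-1. The table is only meaningful if every file produced a vector of the same length. The hypothetical build_feature_frame() below is a sketch that adds that check before stacking. It is not part of the original pipeline.

```python
import pandas as pd


def build_feature_frame(features, labels):
    """Stack per-file feature vectors into a DataFrame with a trailing 'label' column (hypothetical)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    lengths = {len(f) for f in features}
    if len(lengths) > 1:
        # Ragged vectors would not give one clean column per feature, so fail loudly instead
        raise ValueError(f"inconsistent feature lengths: {sorted(lengths)}")
    df = pd.DataFrame(features)  # columns are named 0, 1, ..., n_features - 1
    df["label"] = labels
    return df
```

When features.csv is read back with pd.read_csv, those integer column names come back as the strings "0", "1", .... Selecting the inputs with df.drop(columns="label") avoids depending on either naming.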


```python
import os

import pandas as pd
from tqdm import tqdm

from src.preprocess import extract_features, get_emotion

DATA_DIR = "data"  # root folder containing one subfolder per actor

# Lists to store one feature vector and one label per audio file
features, labels = [], []

# Loop through actor folders (sorted so the row order is reproducible across runs)
for actor_folder in tqdm(sorted(os.listdir(DATA_DIR))):
    actor_path = os.path.join(DATA_DIR, actor_folder)
    if not os.path.isdir(actor_path):
        continue  # skip stray files at the top level
    for file in sorted(os.listdir(actor_path)):
        if not file.lower().endswith(".wav"):
            continue
        full_path = os.path.join(actor_path, file)
        emotion = get_emotion(file)  # label parsed from the filename; None means skip
        if emotion is None:
            continue
        try:
            feats = extract_features(full_path, extra=True)
        except Exception as e:
            # Report the failing file and keep going instead of aborting the whole run
            print(f"⚠️ Error in {file}: {e}")
            continue
        features.append(feats)
        labels.append(emotion)

# One row per file, feature columns first, emotion label last
df = pd.DataFrame(features)
df["label"] = labels

os.makedirs("outputs", exist_ok=True)
df.to_csv("outputs/features.csv", index=False)
print("✅ Feature dataset saved to 'outputs/features.csv'")
```

```text
100%|██████████████████████████████████████████████████████████████████████████████████| 24/24 [00:58<00:00,  2.42s/it]
✅ Feature dataset saved to 'outputs/features.csv'
```

### Output
The file outputs/features.csv, with one row per audio file, containing:
- The extracted audio features (e.g. MFCCs and pitch), one column per feature
- The corresponding emotion label, in a final label column
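extract_features() itself is not shown. For the CSV to work as a table, every clip, whatever its duration, must yield a vector of the same length. A common way to get there is to compute features over short frames and then pool them over time (for instance, averaging MFCC frames). The numpy sketch below illustrates only that pooling idea. It uses frame-level RMS energy and zero-crossing rate as simple stand-ins for MFCCs and pitch. The function name pooled_frame_features and the frame and hop sizes are assumptions, not the project's settings.

```python
import numpy as np


def pooled_frame_features(
    y: np.ndarray, frame_length: int = 2048, hop_length: int = 512
) -> np.ndarray:
    """Summarise a mono signal as [mean RMS, std RMS, mean ZCR, std ZCR] (hypothetical stand-in)."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_length:
        # Zero-pad very short clips so there is at least one full frame
        y = np.pad(y, (0, frame_length - len(y)))
    n_frames = 1 + (len(y) - frame_length) // hop_length
    # One row per analysis frame: shape (n_frames, frame_length)
    frames = np.stack(
        [y[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)]
    )
    # Frame-level root-mean-square energy
    rms = np.sqrt(np.mean(frames**2, axis=1))
    # Frame-level zero-crossing rate: fraction of adjacent samples whose sign differs
    sign = np.signbit(frames)
    zcr = np.mean(sign[:, 1:] != sign[:, :-1], axis=1)
    # Pooling over time makes the vector length independent of clip duration
    return np.array([rms.mean(), rms.std(), zcr.mean(), zcr.std()])
```

A one-second clip and a half-second clip both produce four numbers, so every file contributes a row of the same width. Real MFCC or pitch features would typically be computed with an audio library such as librosa and pooled over time in the same way.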

This concludes the Feature Extraction step. Next, we'll move on to preprocessing, model training, and evaluation.