## Audio Feature Extraction for Emotion Recognition

This script extracts audio features from a dataset of emotional speech. The dataset is organized into one folder per speaker, and each speaker folder contains one subfolder per emotion. The process is as follows:

1. Set Up the Directory: 
   - The base directory containing the dataset is defined.
   - The dataset is expected to have a specific structure: each speaker has a dedicated folder containing subfolders for different emotions.

2. Initialize Data Storage:
   - An empty list, `data`, is created to store the features and labels extracted from the audio files.

3. Iterate Through Speaker Folders:
   - The script iterates over each speaker's folder within the base directory.
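   - For each speaker, it reads a tab-separated text file named after the speaker folder (`<speaker_folder>.txt`). Lines with exactly three fields map an audio filename (first field, without the `.wav` extension) to its emotion label (third field); any other line is reported and skipped.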

4. Extract Audio Features:
   - The script processes each audio file in the emotion folders.
   - Using the librosa library, it extracts 13 Mel-frequency cepstral coefficients (MFCCs), root mean square (RMS) energy, and delta MFCCs from each file.
   - These features are stacked row-wise (13 + 1 + 13 = 27 rows) and averaged over time frames, giving a single 27-element feature vector for each audio file.

5. Label Assignment:
   - Each audio file is labeled with the emotion given for it in the speaker's text file. Audio files that do not appear in the mapping are skipped.

6. Data Compilation:
   - The extracted features and corresponding emotion labels are compiled into a Pandas DataFrame with two columns, `Features` and `Emotion`.
   - This DataFrame, `df`, is ready for further processing and can be used for training machine learning models for emotion recognition.
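
The stack-and-average operation in step 4 can be checked for shapes without any audio files. The sketch below is illustrative only: random arrays stand in for the librosa outputs, and the frame count of 100 is an assumed value.

```python
import numpy as np

# Stand-ins for librosa outputs (assumed shapes, random values):
# 13 MFCC rows, 1 RMS row, 13 delta-MFCC rows, each with n_frames columns.
rng = np.random.default_rng(0)
n_frames = 100  # hypothetical number of analysis frames
mfccs = rng.normal(size=(13, n_frames))
rms = rng.random(size=(1, n_frames))
delta_mfccs = rng.normal(size=(13, n_frames))

# Stack row-wise into a (27, n_frames) matrix, then average over frames.
features = np.vstack([mfccs, rms, delta_mfccs])
feature_vector = features.mean(axis=1)

print(features.shape, feature_vector.shape)
```

Averaging over frames discards timing information, so recordings of different lengths all produce a vector of the same size.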

### Exception Handling

- Loading and feature extraction for each audio file are wrapped in a `try`/`except` block, so a file that cannot be read or processed is reported and skipped instead of stopping the run.
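
A minimal sketch of this per-file pattern, separated from the audio code so it can be run on its own. The names `process_files` and `fake_extract` are hypothetical and do not appear in the script; `fake_extract` only simulates an extractor that fails on one input.

```python
def process_files(paths, extract):
    """Apply `extract` to each path; collect results and failures separately."""
    results, failures = [], []
    for path in paths:
        try:
            results.append((path, extract(path)))
        except Exception as exc:  # broad on purpose: one bad file must not stop the run
            failures.append((path, repr(exc)))
    return results, failures


# Hypothetical extractor that fails on one input, standing in for the librosa calls.
def fake_extract(path):
    if path.endswith("broken.wav"):
        raise ValueError("could not decode audio")
    return len(path)


results, failures = process_files(["a.wav", "broken.wav", "c.wav"], fake_extract)
print(len(results), "processed;", len(failures), "failed")
```

Collecting failures in a list, rather than only printing them, makes it possible to review the skipped files after the run.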

### Output

- The final output is a DataFrame in which each row represents one audio file. The `Features` column holds that file's feature vector as a NumPy array, and the `Emotion` column holds the associated label.
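
Because each `Features` cell holds a whole array, most model-training code needs the column unpacked into a two-dimensional matrix first. One possible way to do this is sketched below, using a small stand-in DataFrame with random values and made-up labels in place of the real `df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three rows, each with a 27-element feature vector.
rng = np.random.default_rng(0)
labels = ["Angry", "Happy", "Angry"]  # made-up labels for illustration
data = [[rng.normal(size=27), label] for label in labels]
df = pd.DataFrame(data, columns=["Features", "Emotion"])

# Stack the per-row arrays into one (n_samples, 27) matrix and pull out the labels.
X = np.vstack(df["Features"].tolist())
y = df["Emotion"].to_numpy()

print(X.shape, y.shape)
```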

This script lays the foundation for building and training a model to recognize emotions from speech, leveraging audio signal processing and machine learning techniques.

In [None]:
import librosa
import os
import pandas as pd
import numpy as np

# Define the base directory (raw string, so backslashes are not treated as escape sequences)
base_dir = r'C:\Users\张佳\source\repos\emotion-speech-recognition\Emotion Speech Dataset'

# Initialize a list to store data
data = []

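# Iterate over each speaker's folder
for speaker_folder in os.listdir(base_dir):
    speaker_path = os.path.join(base_dir, speaker_folder)
    # Skip stray files at the top level; only speaker directories are processed
    if not os.path.isdir(speaker_path):
        continue

    # Read the corresponding text file for the speaker
    mapping_file = os.path.join(speaker_path, f"{speaker_folder}.txt")
    if not os.path.isfile(mapping_file):
        print(f"Skipping {speaker_folder}: mapping file not found")
        continue
    mapping = {}
    with open(mapping_file, 'r') as file: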
        for line in file:
            parts = line.strip().split('\t')
            if len(parts) == 3:
                # Append ".wav" to the filename from the text file for matching
                mapping[parts[0].strip() + ".wav"] = parts[2].strip()
            else:
                print(f"Skipping malformed line: {line.rstrip()!r}")

    # Process each audio file in the emotion folders
    for emotion_folder in os.listdir(speaker_path):
        emotion_path = os.path.join(speaker_path, emotion_folder)
        if os.path.isdir(emotion_path):
            for audio_file in os.listdir(emotion_path):
                file_path = os.path.join(emotion_path, audio_file)

                # Check if the audio file (with extension) is in the mapping
                if audio_file in mapping:
                    try:
                        # Load audio file
                        y, sr = librosa.load(file_path, sr=None)

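                        # Extract features; RMS uses the same frame and hop sizes as the MFCCs
                        # so all three matrices have the same number of frames
                        mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=512, n_fft=2048)
                        rmse = librosa.feature.rms(y=y, frame_length=2048, hop_length=512)
                        delta_mfccs = librosa.feature.delta(mfccs)
                        # Stack into a (27, n_frames) matrix and average over frames
                        features = np.vstack([mfccs, rmse, delta_mfccs])
                        features_processed = np.mean(features, axis=1)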

                        # Append the features and the correct emotion label to the data list
                        data.append([features_processed, mapping[audio_file]])
                    except Exception as e:
                        print(f"Error processing file {audio_file}: {e}")

# Convert to a Pandas DataFrame
df = pd.DataFrame(data, columns=['Features', 'Emotion'])

# Now df is your dataset ready for further processing and model training
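
The mapping-file parsing in the loop above can be tried in isolation. The sketch below factors it into a hypothetical helper, `parse_mapping`, and feeds it made-up lines. It assumes the middle field is a transcript, which the script never reads, so the real files may contain something else there.

```python
import io


def parse_mapping(lines):
    """Map '<name>.wav' to its emotion label from tab-separated lines."""
    mapping = {}
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) == 3:
            # First field is the filename without extension, third is the label.
            mapping[parts[0].strip() + ".wav"] = parts[2].strip()
    return mapping


# Made-up file content: name, assumed transcript, emotion label.
sample = io.StringIO(
    "0001_000001\tsome transcript\tNeutral\n"
    "0001_000351\tanother transcript\tAngry\n"
    "a malformed line without tabs\n"
)
mapping = parse_mapping(sample)
print(mapping)
```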