# **Unveiling Emotions: A Comprehensive Analysis of Emotional Speech Data using Advanced Machine Learning Techniques**

In a world where machines are increasingly interwoven into the fabric of our daily lives, understanding and interpreting human emotions is a frontier that technology is only beginning to navigate. In this exploration, we dive deep into the realm of emotional speech analysis, employing a powerful arsenal of machine learning tools, including the remarkable GPT-4, to unravel the complexities hidden in human vocal expressions.

Our journey begins with the meticulous unpacking of a rich dataset, filled with diverse emotional utterances. With careful hands and a discerning eye, we curate the data, ensuring its integrity and readiness for the voyage ahead.

Through exploratory data analysis, we seek to uncover the initial secrets held by our dataset. Visualizing, listening, and understanding the nature of the audio files, we prepare ourselves for the more profound analysis lying ahead.

As we delve further, the path leads us to the doors of feature extraction. Here, we apply sophisticated techniques to distill valuable characteristics from the raw audio files—transforming waves into a symphony of features like MFCCs, Chroma, and more.

With a well-prepared dataset, our expedition advances into the realms of model building. Guided by the wisdom of GPT-4 and various other machine learning algorithms, we embark on a quest to construct models capable of recognizing and interpreting the emotional essence captured in speech.

The saga unfolds as our models, forged in the crucibles of training, are tested and evaluated. Their mettle is proven through rigorous assessments, ensuring their readiness to triumph in real-world applications.

Join us in this captivating tale of discovery, where technology meets emotion, and where machines learn to understand the subtle art of human expression.

In [2]:
import zipfile
import os

# Path to the zip file
zip_file_path = '/content/Acted Emotional Speech Dynamic Database 2.zip'

# Directory to unzip the contents
unzip_dir = '/content/unzipped_data/'

# Creating the directory if it doesn't exist
os.makedirs(unzip_dir, exist_ok=True)

# Unzipping the file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(unzip_dir)

# Listing the contents of the unzipped directory
os.listdir(unzip_dir)


['Acted Emotional Speech Dynamic Database', '__MACOSX']

In [3]:
# Path to the "Acted Emotional Speech Dynamic Database" directory
aesdd_dir = os.path.join(unzip_dir, 'Acted Emotional Speech Dynamic Database')

# Listing the contents of the "Acted Emotional Speech Dynamic Database" directory
os.listdir(aesdd_dir)


['happiness', 'anger', '.DS_Store', 'fear', 'sadness', 'disgust']

The "Acted Emotional Speech Dynamic Database" directory contains one folder per emotion: 'happiness', 'anger', 'fear', 'sadness', and 'disgust'. The '.DS_Store' entry is a macOS system file rather than data. Each folder presumably holds the recordings for its emotion, which we check next.

In [4]:
# Let's explore the 'anger' directory as an example
anger_dir = os.path.join(aesdd_dir, 'anger')

# Listing the contents of the 'anger' directory
os.listdir(anger_dir)[:10]  # Displaying the first 10 files/folders


['a03 (1).wav',
 'a04 (6).wav',
 'a02 (4).wav',
 'a01 (2).wav',
 'a01 (6).wav',
 'a05 (4).wav',
 'a01 (1).wav',
 'a01 (3).wav',
 'a03 (5).wav',
 'a05 (2).wav']

The 'anger' directory contains audio files in WAV format, and there seems to be a naming convention for these files. Each file appears to be named with a prefix (e.g., 'a01', 'a02', etc.) and a number in parentheses, possibly indicating different versions or takes of the emotional expression.
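
The naming convention can be checked programmatically. The sketch below is a minimal example, assuming the pattern seen in the listing above (one letter, a number, and a second number in parentheses). The helper `parse_filename` and the labels "utterance" and "take" are hypothetical and only reflect our guess about what the numbers mean.

```python
import re

# Hypothetical pattern for names such as 'a03 (1).wav':
# one emotion letter, a number, and a second number in parentheses.
FILENAME_PATTERN = re.compile(r'^([a-z])(\d+) \((\d+)\)\.wav$')

def parse_filename(filename):
    """Split a file name into (emotion_letter, utterance, take), or None."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    letter, utterance, take = match.groups()
    return letter, int(utterance), int(take)

print(parse_filename('a03 (1).wav'))
print(parse_filename('d03 (01).wav'))
print(parse_filename('.DS_Store'))
```

Files that do not match, such as system files, return `None`, so the same helper can double as a filter.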

To proceed with the exploratory data analysis (EDA), we could:

- Count the number of audio files in each emotion directory to understand the distribution of data across different emotions.
- Listen to a few audio files to understand the nature of the emotional expressions.
- Analyze any accompanying metadata or annotation files to extract additional information about the audio files.

In [5]:
# Counting the number of audio files in each emotion directory
emotion_dirs = os.listdir(aesdd_dir)

# Filtering out system files like '.DS_Store'
emotion_dirs = [dir for dir in emotion_dirs if not dir.startswith('.')]

# Dictionary to hold the count of audio files for each emotion
emotion_file_counts = {}

for emotion in emotion_dirs:
    emotion_path = os.path.join(aesdd_dir, emotion)
    # Listing all files in the directory
    files = os.listdir(emotion_path)
    # Filtering out system files and counting only the audio files (wav)
    audio_files = [file for file in files if file.endswith('.wav')]
    emotion_file_counts[emotion] = len(audio_files)

emotion_file_counts


{'happiness': 30, 'anger': 30, 'fear': 29, 'sadness': 33, 'disgust': 33}

The dataset contains 155 audio files, distributed fairly evenly across the five emotions:

- Fear: 29 audio files
- Sadness: 33 audio files
- Happiness: 30 audio files
- Anger: 30 audio files
- Disgust: 33 audio files
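
To put these counts in context, the sketch below computes each emotion's share of the dataset and the accuracy of a trivial classifier that always predicts the most frequent emotion. The counts are copied from the output above; the variable names `proportions` and `majority_baseline` are our own. With these counts, the baseline comes to 33 of 155 files, or about 21%, which is the figure any trained model should beat.

```python
# Counts copied from the output above
emotion_file_counts = {'happiness': 30, 'anger': 30, 'fear': 29, 'sadness': 33, 'disgust': 33}

total = sum(emotion_file_counts.values())

# Share of the dataset held by each emotion
proportions = {emotion: count / total for emotion, count in emotion_file_counts.items()}

# Accuracy of always predicting the most frequent emotion
majority_baseline = max(emotion_file_counts.values()) / total

for emotion, share in proportions.items():
    print(f"{emotion}: {share:.1%}")
print(f"Total files: {total}")
print(f"Majority-class baseline accuracy: {majority_baseline:.1%}")
```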

In [6]:
# Importing necessary libraries
import shutil

# Directory to save selected audio files
selected_audio_dir = '/mnt/data/selected_audio/'

# Creating the directory if it doesn't exist
os.makedirs(selected_audio_dir, exist_ok=True)

# Selecting one audio file from each emotion directory and copying it to the selected_audio directory
selected_audio_files = {}
for emotion in emotion_dirs:
    emotion_path = os.path.join(aesdd_dir, emotion)
    # Listing all files in the directory and filtering out system files
    files = [file for file in os.listdir(emotion_path) if file.endswith('.wav')]
    # Selecting the first audio file
    selected_file = files[0] if files else None
    if selected_file:
        # Copying the selected audio file to the selected_audio directory
        shutil.copy(os.path.join(emotion_path, selected_file), selected_audio_dir)
        selected_audio_files[emotion] = os.path.join(selected_audio_dir, selected_file)

selected_audio_files


{'happiness': '/mnt/data/selected_audio/h04 (1).wav',
 'anger': '/mnt/data/selected_audio/a03 (1).wav',
 'fear': '/mnt/data/selected_audio/f04 (2).wav',
 'sadness': '/mnt/data/selected_audio/s02 (6).wav',
 'disgust': '/mnt/data/selected_audio/d03 (01).wav'}

In [None]:
from pydub import AudioSegment
import pandas as pd
import os

def extract_basic_features(audio_file):
    """
    Extract basic features such as duration and raw audio data amplitude.
    """
    # Loading the audio file with PyDub
    audio = AudioSegment.from_wav(audio_file)

    # Getting duration (PyDub reports the length in milliseconds)
    duration = len(audio)

    # Getting the raw audio data amplitude
    samples = audio.get_array_of_samples()
    mean_amplitude = sum(samples) / len(samples)

    return duration, mean_amplitude

# Emotions to consider
emotion_dirs = ['fear', 'sadness', 'happiness', 'anger', 'disgust']

# Extracting features from each audio file, one row per file
rows = []
for emotion in emotion_dirs:
    emotion_path = os.path.join(aesdd_dir, emotion)
    # Listing all files in the directory and filtering out system files
    files = [file for file in os.listdir(emotion_path) if file.endswith('.wav')]
    for file in files:
        duration, mean_amplitude = extract_basic_features(os.path.join(emotion_path, file))
        rows.append({
            'duration': duration,
            'mean_amplitude': mean_amplitude,
            'emotion': emotion})

# Building the dataframe in one step (DataFrame.append was removed in pandas 2.0)
feature_df = pd.DataFrame(rows, columns=['duration', 'mean_amplitude', 'emotion'])

# Displaying the head of the dataframe
feature_df.head()


This code snippet loads each audio file, extracts its duration (in milliseconds) and the mean of its raw sample values, and stores these features along with the emotion labels in a dataframe.
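
One caveat about the mean of the raw samples: audio samples are signed, so positive and negative values largely cancel and the mean mostly reflects any constant offset rather than loudness. The sketch below illustrates this on a synthetic sine wave (not a file from the dataset) and compares it with the mean of the absolute values, one possible alternative if a loudness-like feature is wanted.

```python
import math

# Synthetic stand-in for audio samples: 100 full cycles of a sine wave
# at 16-bit scale (not taken from the dataset)
amplitude = 10000
samples_per_cycle = 100
samples = [
    amplitude * math.sin(2 * math.pi * n / samples_per_cycle)
    for n in range(100 * samples_per_cycle)
]

# Mean of the signed samples: positive and negative halves cancel out
mean_signed = sum(samples) / len(samples)

# Mean of the absolute values: tracks how loud the signal is
mean_absolute = sum(abs(s) for s in samples) / len(samples)

print(f"mean of signed samples:   {mean_signed:.3f}")
print(f"mean of absolute samples: {mean_absolute:.3f}")
```

For a pure sine wave the mean absolute value approaches 2/π times the peak amplitude, while the signed mean stays near zero regardless of volume.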

In [None]:
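import numpy as np
import librosa

def zero_crossing_rate(samples):
    """
    Calculate the zero-crossing rate of an audio signal:
    the fraction of samples at which the signal changes sign.
    """
    zero_crossings = librosa.zero_crossings(samples, pad=False)
    # Averaging the boolean mask gives a rate that does not grow with file length
    return np.mean(zero_crossings)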

# Updating the feature extraction function to include zero_crossing_rate
def extract_more_features(audio_file):
    """
    Extract more features including duration, mean amplitude, and zero-crossing rate.
    """
    # Loading the audio file with PyDub
    audio = AudioSegment.from_wav(audio_file)

    # Getting duration
    duration = len(audio)

    # Getting the raw audio samples as a float array
    # (float avoids 16-bit integer overflow when averaging)
    samples = np.array(audio.get_array_of_samples(), dtype=np.float64)
    mean_amplitude = samples.mean()

    # Getting zero-crossing rate
    zcr = zero_crossing_rate(samples)

    return duration, mean_amplitude, zcr

# Extracting features from each audio file, one row per file
rows = []
for emotion in emotion_dirs:
    emotion_path = os.path.join(aesdd_dir, emotion)
    # Listing all files in the directory and filtering out system files
    files = [file for file in os.listdir(emotion_path) if file.endswith('.wav')]
    for file in files:
        duration, mean_amplitude, zcr = extract_more_features(os.path.join(emotion_path, file))
        rows.append({
            'duration': duration,
            'mean_amplitude': mean_amplitude,
            'zcr': zcr,
            'emotion': emotion})

# Rebuilding the dataframe with the new column
# (DataFrame.append was removed in pandas 2.0)
feature_df = pd.DataFrame(rows, columns=['duration', 'mean_amplitude', 'zcr', 'emotion'])

# Displaying the head of the dataframe
feature_df.head()


This code calculates the zero-crossing rate and adds it as a feature. You can continue adding more features and preprocessing steps in a similar way.
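
As one illustration of adding further features, the sketch below computes a frame-wise zero-crossing rate and root-mean-square (RMS) energy using only the standard library. The function `frame_features` and its default frame and hop sizes are assumptions chosen for this example, and the input is a synthetic sine wave rather than a file from the dataset.

```python
import math

def frame_features(samples, frame_length=2048, hop_length=512):
    """Return a list of (zero_crossing_rate, rms) tuples, one per frame."""
    features = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        # Count sign changes between neighbouring samples
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_length - 1)
        # Root-mean-square energy of the frame
        rms = math.sqrt(sum(s * s for s in frame) / frame_length)
        features.append((zcr, rms))
    return features

# Synthetic test signal (not from the dataset): one second of a 440 Hz sine at 44.1 kHz
sample_rate = 44100
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

frames = frame_features(signal)
mean_zcr = sum(z for z, _ in frames) / len(frames)
mean_rms = sum(r for _, r in frames) / len(frames)
print(f"{len(frames)} frames, mean ZCR {mean_zcr:.4f}, mean RMS {mean_rms:.4f}")
```

Per-frame values like these can be summarized per file (for example by their mean and standard deviation) and stored as extra dataframe columns.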

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

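# Preparing the features and labels
# (feature_df has one column per feature, not a single 'features' column)
feature_columns = ['duration', 'mean_amplitude', 'zcr']
X = feature_df[feature_columns].to_numpy(dtype=float)
y = feature_df['emotion']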

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Building a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Making predictions on the test set
y_pred = clf.predict(X_test)

# Evaluating the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
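
Overall accuracy can hide which emotions get confused with each other. The sketch below shows, with the standard library only, how per-class recall and confusion counts are derived from true and predicted labels. The helper names and the demo labels are made up for illustration and are not results from the model above.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {label: counts[(label, label)] / totals[label] for label in totals}

# Made-up labels for illustration only (not results from the model above)
y_true_demo = ['anger', 'anger', 'fear', 'fear', 'sadness', 'sadness']
y_pred_demo = ['anger', 'fear', 'fear', 'fear', 'sadness', 'anger']

accuracy = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)
print("accuracy:", accuracy)
print("recall:", per_class_recall(y_true_demo, y_pred_demo))
print("confusions:", dict(confusion_counts(y_true_demo, y_pred_demo)))
```

With roughly six test files per emotion under a 20% split of 155 files, a single misclassification shifts a class's recall noticeably, so the per-class figures deserve cautious reading.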


In [None]:
from sklearn.model_selection import GridSearchCV

# Defining the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Creating a Random Forest classifier
clf = RandomForestClassifier(random_state=42)

# Performing grid search
grid_search = GridSearchCV(estimator=clf,
