# Data Loading and Preprocessing
This notebook collects and preprocesses the data in the "IEMOCAP_full_release" directory. Preprocessing operates on the wav files located in the "wav" directory of each session in the IEMOCAP dataset.

Preprocessing consists of two steps: extracting acoustic features from each file with the openSMILE library, and transcribing the speech in each file with OpenAI's Whisper model.

In [28]:
# Import the necessary libraries
import os  # For file operations
import pandas as pd  # For data manipulation
from tqdm import tqdm  # For progress bars
import opensmile  # For audio feature extraction
import whisper  # For speech recognition

Now, we will define the configuration, constants and models that will be used in the preprocessing.

In [29]:
# Define the Whisper model
model = whisper.load_model("base") 

# Define the OpenSmile configuration
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Define the data directory
data_dir = "../IEMOCAP_full_release"

# Define the name of the output CSV file
output_csv = "data.csv"

Now, let's define the function to return the paths to all the wav files in the dataset.

In [30]:
def get_wav_files(data_dir):
    """
    Get the paths to all the wav files in the dataset.
    
    Args:
    data_dir (str): The path to the IEMOCAP dataset.
    
    Returns:
    list: A list of paths to all the wav files in the dataset.
    """
    # Initialize the list of wav files
    wav_files = []
    
    # Loop through all the sessions
    for session in os.listdir(data_dir):
        # Get the path to the session directory
        session_dir = os.path.join(data_dir, session, "sentences", "wav")
        
        # Loop through all the wav files in the session directory
        for root, dirs, files in os.walk(session_dir):
            for file in files:
                # Check if the file is a wav file
                if file.endswith(".wav"):
                    # Get the path to the wav file
                    wav_file = os.path.join(root, file)
                    
                    # Append the path to the wav files list
                    wav_files.append(wav_file)
    
    return wav_files
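
As a quick sanity check of the path-collection logic, the sketch below builds a tiny throwaway directory tree and collects the wav files with a single `pathlib` glob. It assumes the same `<session>/sentences/wav/...` layout that `get_wav_files` relies on; the function name `find_wav_files` and the file names in the fake tree are hypothetical and only serve the illustration.

```python
import tempfile
from pathlib import Path


def find_wav_files(data_dir):
    """Return sorted paths of all .wav files under <data_dir>/<session>/sentences/wav."""
    root = Path(data_dir)
    return sorted(str(p) for p in root.glob("*/sentences/wav/**/*.wav"))


# Build a tiny fake dataset tree (hypothetical names) to exercise the pattern
with tempfile.TemporaryDirectory() as tmp:
    utt_dir = Path(tmp) / "Session1" / "sentences" / "wav" / "Ses01F_impro01"
    utt_dir.mkdir(parents=True)
    (utt_dir / "Ses01F_impro01_F000.wav").touch()
    (utt_dir / "notes.txt").touch()  # non-wav file, should be ignored

    found = find_wav_files(tmp)
    found_names = [Path(p).name for p in found]

print(found_names)
```

Sorting the result makes the processing order reproducible across runs, which `os.listdir` and `os.walk` do not guarantee.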

Let's get the paths and iterate through the wav files in the dataset to extract the audio features and recognize the spoken text.

In [31]:
# Get the paths to all the wav files in the dataset
wav_files = get_wav_files(data_dir)
# Construct the DataFrame: "id" is the name of the wav file, "recognized_text" is the
# Whisper output for the speech, and the remaining columns hold the extracted audio features
data_df = pd.DataFrame(columns=["id", "recognized_text"] + smile.feature_names)

# Check if the output CSV file already exists
if os.path.exists(output_csv):
    # Load the existing CSV file
    data_df = pd.read_csv(output_csv)
else:
    # Loop through all the wav files
    for wav_file in tqdm(wav_files):
        # Transcribe the audio file; whisper.transcribe returns a dict
        # with the keys "text", "segments" and "language", stored here as-is
        recognized_text = whisper.transcribe(model, wav_file, fp16=False)
    
        # Extract the features from the audio file
        features = smile.process_file(wav_file)
    
        # Construct the row with the id, recognized text and the features
        row = [os.path.basename(wav_file), recognized_text] + features.values.flatten().tolist()
    
        # Append the row to the DataFrame
        data_df.loc[len(data_df)] = row

    # Save the DataFrame to a CSV file
    data_df.to_csv(output_csv, index=False)

# Show the shape of the data.csv
print(f'The shape of the data.csv is {data_df.shape}')

# Display the first 5 rows and several columns of the DataFrame
data_df.head()[["id", "recognized_text"] + smile.feature_names[:5]]

The shape of the data.csv is (10039, 6375)


Unnamed: 0,id,recognized_text,audspec_lengthL1norm_sma_range,audspec_lengthL1norm_sma_maxPos,audspec_lengthL1norm_sma_minPos,audspec_lengthL1norm_sma_quartile1,audspec_lengthL1norm_sma_quartile2
0,Ses01F_impro01_F000.wav,"{'text': ' Excuse me.', 'segments': [{'id': 0,...",0.421106,0.475936,0.203209,0.117974,0.134791
1,Ses01F_impro01_F001.wav,"{'text': ' Yeah.', 'segments': [{'id': 0, 'see...",0.983052,0.496183,0.381679,0.095625,0.105414
2,Ses01F_impro01_F002.wav,"{'text': ' Is there a problem?', 'segments': [...",0.885011,0.803922,0.0,0.091867,0.104345
3,Ses01F_impro01_F003.wav,"{'text': ' You did.', 'segments': [{'id': 0, '...",0.768693,0.492958,0.978873,0.126312,0.144761
4,Ses01F_impro01_F004.wav,"{'text': "" You were standing at the beginning ...",1.588438,0.467066,0.197605,0.099109,0.35545
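
As the output above shows, the recognized_text column holds the full Whisper result dict rather than the plain transcript, and once the DataFrame is reloaded from data.csv each dict comes back as a string. A minimal sketch for recovering the plain text follows. It assumes the stored string is a valid Python literal that `ast.literal_eval` can parse; the helper name `extract_text` and the sample value are hypothetical.

```python
import ast


def extract_text(cell):
    """Return the stripped transcript from a Whisper result dict or its string form."""
    result = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return result["text"].strip()


# Shortened, illustrative stand-in for one stored cell
sample = "{'text': ' Excuse me.', 'segments': [], 'language': 'en'}"
print(extract_text(sample))  # Excuse me.
```

Applied to the DataFrame, this could look like `data_df["recognized_text"].apply(extract_text)`, which handles both the freshly computed dicts and the strings read back from the CSV file.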


Now, we have to add the class labels to the DataFrame. The labels are stored in the "emotion" column of iemocapTrans.csv. We will match its "titre" column (the utterance title) against our "id" column to merge the two DataFrames.

In [32]:
# Load iemocapTrans.csv
ieomcapTrans = pd.read_csv('iemocapTrans.csv')

# Append .wav to the titre column so that it matches the id column
ieomcapTrans['titre'] = ieomcapTrans['titre'] + '.wav'
ieomcapTrans

Unnamed: 0,_id,activation,dominance,emotion,end_time,start_time,titre,to_translate,translated,valence
0,625682441da7a5c1eaef3689,2.5,3.5,sad,6.0541,3.9987,Ses02M_impro02_F000.wav,I don't want you to go.,Je ne veux pas que tu partes.,2.5
1,625682441da7a5c1eaef368a,3.0,4.0,sad,15.1000,7.0366,Ses02M_impro02_M000.wav,"I know, I know. I don't want to go either bab...",Je sais je sais. Je ne veux pas y aller non pl...,2.0
2,625682441da7a5c1eaef368b,2.5,4.5,sad,23.3599,15.5524,Ses02M_impro02_F001.wav,I'm going to miss you too; I don't know what ...,Tu vas me manquer aussi; Je ne sais pas ce que...,1.5
3,625682441da7a5c1eaef368c,2.5,4.0,sad,26.4151,23.5790,Ses02M_impro02_F002.wav,I don't want to be a single mom.,Je ne veux pas être une mère célibataire.,1.5
4,625682441da7a5c1eaef368d,3.0,3.5,sad,31.4253,26.7598,Ses02M_impro02_M001.wav,You won't be. I'll be back; I'll be back befo...,Vous ne le serez pas. Je reviendrai; Je serai ...,3.5
...,...,...,...,...,...,...,...,...,...,...
10034,6256a3f81da7a5c1eaef862b,3.0,3.0,ang,480.1500,473.5225,Ses01M_script01_3_F028.wav,Everything Chris do you understand that? To ...,Tout Chris est-ce que tu comprends ça ? A moi ...,2.5
10035,6256a3f81da7a5c1eaef862c,3.0,3.0,fru,495.6700,481.6500,Ses01M_script01_3_F029.wav,And your money there's nothing wrong in your ...,Et votre argent il n'y a rien de mal dans votr...,2.5
10036,6256a3f81da7a5c1eaef862d,2.0,2.0,hap,502.2700,499.4600,Ses01M_script01_3_M042.wav,Annie...,Anni...,3.5
10037,6256a3f81da7a5c1eaef862e,2.5,2.5,hap,511.4612,507.7700,Ses01M_script01_3_M043.wav,I'm going to make a fortune for you.,Je vais te faire fortune.,4.0


In [33]:
# Merge the two DataFrames, add only the emotion column from the ieomcapTrans DataFrame
data_df = pd.merge(data_df, ieomcapTrans[['titre', 'emotion']], left_on='id', right_on='titre', how='left')
# Drop the titre column, which now duplicates id
data_df.drop('titre', axis=1, inplace=True)

# Save the DataFrame to a CSV file
data_df.to_csv("data_classes.csv", index=False)

data_df.head()[["id", "recognized_text", "emotion"] + smile.feature_names[:5]]

Unnamed: 0,id,recognized_text,emotion,audspec_lengthL1norm_sma_range,audspec_lengthL1norm_sma_maxPos,audspec_lengthL1norm_sma_minPos,audspec_lengthL1norm_sma_quartile1,audspec_lengthL1norm_sma_quartile2
0,Ses01F_impro01_F000.wav,"{'text': ' Excuse me.', 'segments': [{'id': 0,...",neu,0.421106,0.475936,0.203209,0.117974,0.134791
1,Ses01F_impro01_F001.wav,"{'text': ' Yeah.', 'segments': [{'id': 0, 'see...",neu,0.983052,0.496183,0.381679,0.095625,0.105414
2,Ses01F_impro01_F002.wav,"{'text': ' Is there a problem?', 'segments': [...",neu,0.885011,0.803922,0.0,0.091867,0.104345
3,Ses01F_impro01_F003.wav,"{'text': ' You did.', 'segments': [{'id': 0, '...",neu,0.768693,0.492958,0.978873,0.126312,0.144761
4,Ses01F_impro01_F004.wav,"{'text': "" You were standing at the beginning ...",neu,1.588438,0.467066,0.197605,0.099109,0.35545
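
Because the merge uses `how='left'`, any utterance without a matching titre would silently receive a missing emotion, and a duplicated titre would add rows. The sketch below shows one way to check for both cases using the `validate` and `indicator` parameters of `pd.merge`. The two small DataFrames are toy stand-ins with made-up values, not the real data.

```python
import pandas as pd

# Toy stand-ins for data_df and the label table (values are illustrative only)
features = pd.DataFrame({"id": ["a.wav", "b.wav", "c.wav"], "f0": [0.1, 0.2, 0.3]})
labels = pd.DataFrame({"titre": ["a.wav", "b.wav"], "emotion": ["neu", "sad"]})

merged = pd.merge(
    features,
    labels,
    left_on="id",
    right_on="titre",
    how="left",
    validate="many_to_one",  # raises MergeError if a titre appears more than once
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Rows that found no label in the right-hand table
unlabeled = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(unlabeled)
```

With the real data, an empty `unlabeled` list and an unchanged row count would indicate that every wav file received exactly one label.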


Let's check the distribution of the classes in the dataset.

In [34]:
# Display the distribution of the classes
data_df["emotion"].value_counts()

emotion
fru    2917
exc    1976
neu    1726
ang    1269
sad    1250
hap     656
sur     110
fea     107
oth      26
dis       2
Name: count, dtype: int64

Now, we will merge the rare emotion classes into a single "oth" class to reduce the number of classes.

In [35]:
# Map the emotions to the reduced classes: sur, fea, oth and dis become "oth";
# all other classes are kept unchanged
data_df["emotion"] = data_df["emotion"].map(
    {
        "ang": "ang",
        "hap": "hap",
        "neu": "neu",
        "sad": "sad",
        "fru": "fru",
        "exc": "exc",
        "fea": "oth",
        "sur": "oth",
        "oth": "oth",
        "dis": "oth",
    }
)

# Display the distribution of the classes
data_df["emotion"].value_counts()

emotion
fru    2917
exc    1976
neu    1726
ang    1269
sad    1250
hap     656
oth     245
Name: count, dtype: int64
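
The size of the merged "oth" class can be checked by hand from the two distributions above. The short sketch below redoes that arithmetic with the counts copied from the first output; it uses only the standard library and is independent of the DataFrame.

```python
from collections import Counter

# Class counts copied from the distribution before the mapping
counts = {
    "fru": 2917, "exc": 1976, "neu": 1726, "ang": 1269, "sad": 1250,
    "hap": 656, "sur": 110, "fea": 107, "oth": 26, "dis": 2,
}
rare = {"sur", "fea", "oth", "dis"}

# Sum the rare classes into "oth" and keep the others unchanged
reduced = Counter()
for label, n in counts.items():
    reduced["oth" if label in rare else label] += n

print(reduced["oth"])         # 245
print(sum(reduced.values()))  # 10039
```

The total of 10039 matches the number of rows reported earlier, so no utterance lost its label in the mapping.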

In [36]:
# Save the DataFrame to a CSV file
data_df.to_csv("data_classes_reduced.csv", index=False)