# HW 2 Multimodal Machine Learning for Emotion Recognition

- main (this notebook) with sub-notebooks:
    1. audio (acoustic)
    2. text (lexical)
    3. visual
- IEMOCAP (Interactive Emotional Dyadic Motion Capture) database

# TODOs
In the file dataset.csv, you are provided with the relative addresses of the audio, visual, and text feature files along with their corresponding emotion labels. There are 5 sessions, and each session has one male and one female speaker.

1. You can use different pooling methods (e.g., max pooling, mean pooling) to reduce the temporal dimension of the audio and visual files, or use your preferred temporal model (e.g., RNN, GRU, LSTM), to obtain a feature vector per data point.

2. Perform 4-class emotion classification using your preferred classifier on the obtained feature vectors. Select the parameters using grid search (search over a range of values for each hyperparameter). Perform any additional steps you see fit to obtain the best results.

3. Report your classification results on the individual modalities (vision, speech, and text) using the F1-micro metric on a 10-fold subject-independent cross-validation.

4. How do you handle the problem of class imbalance? Plot the confusion matrix for the 4 classes.

5. Use both early fusion (concatenating features from different modalities) and late fusion (majority vote over the outputs of the unimodal models) to obtain multimodal classification results. Report and compare the results of both fusion techniques.

6. Provide an interpretation of your results from the unimodal and multimodal classification tasks. Which one performs best, and why?
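TODOs 3 and 4 ask for F1-micro scores and a confusion matrix. As a minimal NumPy sketch (the helper names `confusion_matrix_np` and `f1_micro` are hypothetical, not part of the assignment), both can be computed from arrays of true and predicted labels, assuming the labels are integers 0 to 3 as in the dataset's `emotion_labels` column.

```python
import numpy as np

# Assumes integer labels 0..3: anger(0), sadness(1), happiness(2), neutral(3)
N_CLASSES = 4


def confusion_matrix_np(y_true, y_pred, n_classes=N_CLASSES):
    """Count matrix: rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when (row, col) pairs repeat
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def f1_micro(y_true, y_pred, n_classes=N_CLASSES):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then F1 = 2TP / (2TP + FP + FN)."""
    cm = confusion_matrix_np(y_true, y_pred, n_classes)
    tp = np.trace(cm)
    fp = cm.sum() - tp  # every off-diagonal entry is a false positive for its predicted class
    fn = cm.sum() - tp  # ...and the same entry is a false negative for its true class
    return 2 * tp / (2 * tp + fp + fn)
```

scikit-learn's `sklearn.metrics.f1_score(y_true, y_pred, average="micro")` and `sklearn.metrics.confusion_matrix` compute the same quantities. The manual version makes one property explicit: for single-label classification every error counts once as FP and once as FN, so micro-F1 equals accuracy, which is worth keeping in mind when interpreting it under class imbalance. The matrix can be plotted with any heatmap routine (for example matplotlib's `imshow`).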

*Note*: You are only allowed to use the features and labels provided by us with this assignment. Please refrain from using the original data; assignments submitted with any other labels or data will not be graded.
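For TODO 3, "subject-independent" means no speaker's utterances appear in both the training and the test fold. With 5 sessions of one male and one female speaker each, there are 10 speakers, so one reading of "10-fold subject-independent" (an assumption on our part) is leave-one-speaker-out. A minimal sketch with a hypothetical generator `leave_one_speaker_out`, driven by the `speakers` column of the dataset:

```python
import numpy as np


def leave_one_speaker_out(speakers):
    """Yield (train_idx, test_idx) pairs; each fold holds out all utterances
    of exactly one speaker, so no speaker is in both train and test."""
    speakers = np.asarray(speakers)
    for spk in np.unique(speakers):
        test_mask = speakers == spk
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Usage would look like `for train_idx, test_idx in leave_one_speaker_out(dataset_copy["speakers"]): ...`, giving one fold per speaker. scikit-learn's `LeaveOneGroupOut`, or `GroupKFold(n_splits=10)`, with the speaker IDs passed as `groups` produces the same kind of split. For the grid search in TODO 2, running the search inside each training fold keeps the held-out speaker from influencing the chosen hyperparameters.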

# Imports + Load Data

In [1]:
import pandas as pd
import numpy as np

In [2]:
BASE = "/Users/brinkley97/Documents/development/"
CLASS_PATH = "classes/csci_535_multimodal_probabilistic_learning/"
DATASET_PATH = "datasets/hw_2/"

SESSION_1 = "Session1/"
SESSION_2 = "Session2/"
SESSION_3 = "Session3/"
SESSION_4 = "Session4/"
SESSION_5 = "Session5/"

SES_01F = "Ses01F_impro01/"

FILE = "iemocapRelativeAddressForFiles.csv"
file_paths = BASE + CLASS_PATH + DATASET_PATH + FILE

In [3]:
# pd.set_option('max_colwidth', 62)

In [4]:
def load_data(file):
    """Read the CSV of relative feature paths and labels; return a copy of the DataFrame."""
    original_data = pd.read_csv(file)
    copy_of_data = original_data.copy()
    return copy_of_data

In [5]:
# 4 classes: anger (0), sadness (1), happiness (2), and neutral (3)
dataset_copy = load_data(file_paths)
dataset_copy

| Unnamed: 0 | file_name_list | speakers | visual_features | acoustic_features | lexical_features | emotion_labels |
|---|---|---|---|---|---|---|
| 0 | Ses01F_impro01_F001 | F01 | /features/visual_features/Session1/Ses01F_impr... | /features/acoustic_features/Session1/Ses01F_im... | /features/lexical_features/Session1/Ses01F_imp... | 3 |
| 1 | Ses01F_impro01_M011 | M01 | /features/visual_features/Session1/Ses01F_impr... | /features/acoustic_features/Session1/Ses01F_im... | /features/lexical_features/Session1/Ses01F_imp... | 0 |
| 2 | Ses01F_impro02_F002 | F01 | /features/visual_features/Session1/Ses01F_impr... | /features/acoustic_features/Session1/Ses01F_im... | /features/lexical_features/Session1/Ses01F_imp... | 1 |
| 3 | Ses01F_impro02_F003 | F01 | /features/visual_features/Session1/Ses01F_impr... | /features/acoustic_features/Session1/Ses01F_im... | /features/lexical_features/Session1/Ses01F_imp... | 3 |
| 4 | Ses01F_impro02_F004 | F01 | /features/visual_features/Session1/Ses01F_impr... | /features/acoustic_features/Session1/Ses01F_im... | /features/lexical_features/Session1/Ses01F_imp... | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 1331 | Ses05M_script03_2_M029 | M05 | /features/visual_features/Session5/Ses05M_scri... | /features/acoustic_features/Session5/Ses05M_sc... | /features/lexical_features/Session5/Ses05M_scr... | 0 |
| 1332 | Ses05M_script03_2_M039 | M05 | /features/visual_features/Session5/Ses05M_scri... | /features/acoustic_features/Session5/Ses05M_sc... | /features/lexical_features/Session5/Ses05M_scr... | 0 |
| 1333 | Ses05M_script03_2_M041 | M05 | /features/visual_features/Session5/Ses05M_scri... | /features/acoustic_features/Session5/Ses05M_sc... | /features/lexical_features/Session5/Ses05M_scr... | 0 |
| 1334 | Ses05M_script03_2_M042 | M05 | /features/visual_features/Session5/Ses05M_scri... | /features/acoustic_features/Session5/Ses05M_sc... | /features/lexical_features/Session5/Ses05M_scr... | 0 |
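The `emotion_labels` column holds the integer classes, so a quick look at the class balance (TODO 4) is `dataset_copy["emotion_labels"].value_counts()`. One common remedy for imbalance is inverse-frequency class weighting. A minimal sketch, assuming labels 0 to 3 and that every class occurs at least once in the labels passed in (a missing class would cause a division by zero); the function names are hypothetical:

```python
import numpy as np

# Label names follow the notebook's comment on the 4 classes
EMOTIONS = {0: "anger", 1: "sadness", 2: "happiness", 3: "neutral"}


def balanced_class_weights(labels, n_classes=4):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * counts)


def sample_weights(labels, n_classes=4):
    """Per-sample weight: the weight of that sample's class."""
    labels = np.asarray(labels)
    return balanced_class_weights(labels, n_classes)[labels]
```

scikit-learn classifiers that accept `class_weight="balanced"` use this same formula; `sample_weights` covers models that only take per-sample weights in `fit`. Computing the weights on the training fold alone keeps the test speaker out of the process. Over- or undersampling the training fold are alternatives.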


## Load Audio (Acoustic) features

In [6]:
# Directory with the acoustic features for dialog Ses01F_impro01 (Session 1)
audio_features = BASE + CLASS_PATH + DATASET_PATH + 'acoustic_features/' + SESSION_1 + SES_01F
file_name = audio_features + 'Ses01F_impro01_F001.npy'
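The cell above hard-codes one utterance's path. To load every row, the path can be derived from the `file_name_list` entry. The sketch below is a hypothetical helper that assumes the layout seen here holds for all files: `<modality>/Session<N>/<dialog>/<utterance>.npy`, with the session number taken from characters 3 to 4 of the name and the dialog folder equal to the name minus its last `_` segment. Only the acoustic `.npy` case is confirmed by the notebook; the other modalities' folder names and extensions are assumptions.

```python
import os


def feature_path(root, modality, file_name, ext=".npy"):
    """Hypothetical helper: build <root>/<modality>/Session<N>/<dialog>/<file_name><ext>."""
    session = "Session" + str(int(file_name[3:5]))  # "Ses01F_..." -> "Session1"
    dialog = file_name.rsplit("_", 1)[0]  # "Ses01F_impro01_F001" -> "Ses01F_impro01"
    return os.path.join(root, modality, session, dialog, file_name + ext)
```

For example, `feature_path(BASE + CLASS_PATH + DATASET_PATH, "acoustic_features", "Ses01F_impro01_F001")` reproduces `file_name` above on a POSIX system.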

In [8]:
testing_audio_session_1 = np.load(file_name)
testing_audio_session_1

array([[-0.02412775,  1.3312446 , -1.0692344 ,  0.        ,  0.        ,
         0.        ,  0.49266195, -0.34168002,  0.        ,  0.        ,
         1.5995374 ,  0.9650637 ,  0.        ,  0.4452419 , -0.02679965,
        -0.4459556 , -0.6602139 ,  0.03695456,  0.        ,  0.25653026,
         0.        ,  0.76028717,  0.        , -0.06926002, -0.10694285,
        -1.5329561 , -0.30561072, -0.76459134, -1.316774  ,  1.2008328 ,
         0.1895207 , -0.54566604,  0.5053761 , -0.15207583, -0.04264257,
        -0.6639512 ,  0.46754697,  0.        ,  0.        , -0.4157218 ,
        -0.20238322,  0.        ,  0.00785335,  0.        , -0.51123357,
        -0.36928818, -0.20766369,  0.1882203 , -0.28260076, -0.13990402,
         0.23986907,  0.        ,  0.        ,  0.        ,  0.66090965,
        -0.04950017, -0.34798825,  0.896122  , -0.0301679 , -0.09909478,
        -0.46761015,  0.1463503 , -0.10861468, -0.03854222,  0.        ,
         0.        ,  0.        , -0.82999325, ...
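The loaded acoustic array is 2-D. Assuming its rows are time steps (frames) and its columns are feature dimensions, TODO 1's pooling collapses it into one fixed-length vector per utterance. A minimal sketch with a hypothetical helper `pool_frames`:

```python
import numpy as np


def pool_frames(frames, method="mean"):
    """Collapse a (num_frames, feature_dim) array into a single vector.

    A 1-D input is treated as a single frame and returned unchanged by mean/max."""
    frames = np.atleast_2d(frames)
    if method == "mean":
        return frames.mean(axis=0)
    if method == "max":
        return frames.max(axis=0)
    if method == "mean_std":
        # Concatenating mean and (population) std doubles the dimensionality
        # but keeps how much each feature varies over the utterance
        return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
    raise ValueError(f"unknown pooling method: {method}")
```

Under that assumption, `pool_frames(testing_audio_session_1, "mean")` would return one vector with as many entries as the array has columns. The same function applies to the visual features; a recurrent model (RNN, GRU, LSTM) is the learned alternative the assignment mentions.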

## Load Text (Lexical) features
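# Directory for lexical features, assumed to mirror the acoustic layout (per the CSV's lexical_features column)
text_features = BASE + CLASS_PATH + DATASET_PATH + 'lexical_features/' + SESSION_1 + SES_01F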

## Load Visual features
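# Directory for visual features, assumed to mirror the acoustic layout (per the CSV's visual_features column)
visual_features = BASE + CLASS_PATH + DATASET_PATH + 'visual_features/' + SESSION_1 + SES_01F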
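TODO 5 compares early and late fusion. A minimal sketch, assuming each modality has already been pooled into an `(n_samples, d)` matrix and each unimodal model's predictions are integer label arrays, all row-aligned to the same utterance order (for example the row order of `dataset_copy`). The function names are hypothetical.

```python
import numpy as np


def early_fusion(*modality_features):
    """Concatenate per-utterance feature matrices of shape (n_samples, d_m) along the feature axis."""
    return np.concatenate(modality_features, axis=1)


def late_fusion_majority(*modality_preds, n_classes=4):
    """Majority vote over unimodal predicted labels.

    Ties (e.g. three modalities predicting three different classes) go to the
    smallest label index, because argmax returns the first maximum; this
    tie-break rule is an arbitrary choice."""
    preds = np.stack(modality_preds, axis=1).astype(int)  # (n_samples, n_modalities)
    # Count votes per class for each sample -> (n_samples, n_classes)
    votes = np.apply_along_axis(np.bincount, 1, preds, minlength=n_classes)
    return votes.argmax(axis=1)
```

Because the modalities can differ widely in scale and dimensionality, standardizing each block before concatenation (fitting the scaler on the training fold only) is a common step for early fusion. With three voters and four classes, three-way disagreements can occur, so the tie-break rule can affect the late-fusion score; breaking ties with the best-performing unimodal model is one alternative.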