# MER: Multimodal Emotion Recognition

This notebook runs the full pipeline for training and evaluating a multimodal (audio and video) emotion recognition model on the eNTERFACE'05 dataset.

In [1]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [9]:
# Install required packages
!pip install librosa opencv-python-headless tqdm scikit-learn matplotlib seaborn torch torchvision



In [2]:
# Clone the project repository
!git clone https://github.com/KanmaniNagarajan/MER.git
%cd MER

Cloning into 'MER'...
remote: Enumerating objects: 13, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (12/12), done.
remote: Total 13 (delta 1), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (13/13), 9.23 KiB | 9.23 MiB/s, done.
Resolving deltas: 100% (1/1), done.
/content/MER


In [8]:
# Run the test flow
!python src/test_flow.py

Streaming output truncated to the last 5000 lines.
Loading samples: 100% 2/2 [00:00<00:00, 50840.05it/s]
Loading samples: 100% 2/2 [00:00<00:00, 43690.67it/s]
Loading samples: 100% 2/2 [00:00<00:00, 54827.50it/s]
Loading samples: 100% 2/2 [00:00<00:00, 37117.73it/s]
Loading samples: 100% 2/2 [00:00<00:00, 55188.21it/s]
Loading samples: 0it [00:00, ?it/s]
Loading samples: 100% 2/2 [00:00<00:00, 55924.05it/s]
Loading samples: 100% 2/2 [00:00<00:00, 54827.50it/s]
Loading samples: 100% 2/2 [00:00<00:00, 52428.80it/s]
Loading samples: 100% 2/2 [00:00<00:00, 56679.78it/s]
Loading samples: 100% 2/2 [00:00<00:00, 23237.14it/s]
Loading samples: 0it [00:00, ?it/s]
Loading samples: 100% 2/2 [00:00<00:00, 53430.62it/s]
Loading samples: 100% 2/2 [00:00<00:00, 50840.05it/s]
Loading samples: 100% 2/2 [00:00<00:00, 53430.62it/s]
Loading samples: 100% 2/2 [00:00<00:00, 47127.01it/s]
Loading samples: 100% 2/2 [00:00<00:00, 40329.85it/s]
Loading samples: 0it [00:00, ?it/s]
Loading samples: 

## Test Flow Results

Review the output above to ensure that the entire pipeline is working correctly. If everything looks good, proceed with the full training process.

In [None]:
# Run the full training process
!python src/train.py

In [None]:
# Run the evaluation process
!python src/evaluate.py

## Results

The training logs, confusion matrix, ROC curve with AUC, and other metrics are saved to the `results` folder in your Google Drive.
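To make the confusion matrix easier to read, here is a minimal sketch of how one is built from true and predicted class indices, and how per-class recall follows from it. This is not the code in `src/evaluate.py`, whose contents are not shown in this notebook; `confusion_matrix` and `per_class_recall` are hypothetical helper names, and the label lists are made-up illustration data, not results from this model.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_idx, pred_idx in zip(y_true, y_pred):
        matrix[true_idx][pred_idx] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly (diagonal / row sum)."""
    recalls = []
    for class_idx, row in enumerate(matrix):
        total = sum(row)
        # A class with no samples gets 0.0 instead of a division by zero.
        recalls.append(row[class_idx] / total if total else 0.0)
    return recalls


# Illustration only: three classes, five made-up samples.
example_true = [0, 0, 1, 1, 2]
example_pred = [0, 1, 1, 1, 0]
example_matrix = confusion_matrix(example_true, example_pred, num_classes=3)
example_recall = per_class_recall(example_matrix)
```

Off-diagonal cells show which emotions are confused with each other, which is usually more informative than overall accuracy when some classes have few samples.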

In [None]:
# Interactive Test Widget
import ipywidgets as widgets
from IPython.display import display
import torch
from src.model import EmotionRecognitionModel
from src.config import DEVICE, EMOTIONS, RESULTS_DIR
import os

model = EmotionRecognitionModel().to(DEVICE)
# map_location lets a checkpoint saved on a GPU load on a CPU-only runtime
model.load_state_dict(torch.load(os.path.join(RESULTS_DIR, 'best_model.pth'), map_location=DEVICE))
model.eval()

def predict_emotion(video_path, audio_path):
    # Placeholder: video_path and audio_path are not read yet. Random tensors stand in
    # for real inputs until proper video and audio loading is implemented here.
    video = torch.randn(1, 3, 16, 112, 112).to(DEVICE)  # Assuming 16 frames of 112x112 RGB images
    audio = torch.randn(1, 1, 16000).to(DEVICE)  # Assuming 1 second of audio at 16 kHz

    with torch.no_grad():
        logits = model(audio, video)  # named logits to avoid shadowing the `output` widget
        _, predicted = torch.max(logits, 1)

    return EMOTIONS[predicted.item()]

video_path = widgets.Text(description='Video Path:')
audio_path = widgets.Text(description='Audio Path:')
predict_button = widgets.Button(description='Predict')
output = widgets.Output()

def on_button_clicked(_):
    with output:
        output.clear_output()
        print(f"Predicted emotion: {predict_emotion(video_path.value, audio_path.value)}")

predict_button.on_click(on_button_clicked)

display(video_path, audio_path, predict_button, output)
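
The placeholder in `predict_emotion` expects a 16-frame clip and 16,000 audio samples. One way to bring real inputs into those shapes is to pick evenly spaced frame indices and to pad or trim the waveform. The sketch below covers only that index and length logic in plain Python. `sample_frame_indices` and `pad_or_trim` are hypothetical helper names, not part of the MER repository, and the decoding itself (for example with OpenCV and librosa, both installed earlier) is left out.

```python
def sample_frame_indices(total_frames, num_frames=16):
    """Pick num_frames indices spread evenly over a clip of total_frames frames."""
    if total_frames <= 0:
        raise ValueError("clip has no frames")
    if num_frames == 1:
        return [0]
    # Linear spacing from the first to the last frame; clips shorter than
    # num_frames end up repeating some frames.
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


def pad_or_trim(samples, target_len=16000):
    """Force a 1-D sequence of audio samples to exactly target_len values."""
    samples = list(samples)[:target_len]  # trim if too long
    return samples + [0.0] * (target_len - len(samples))  # zero-pad if too short


# Illustration only: a 31-frame clip and a waveform shorter than one second.
frame_indices = sample_frame_indices(31, num_frames=16)
waveform = pad_or_trim([0.1] * 12000, target_len=16000)
```

The indices would select which decoded frames to resize to 112x112 and stack, and the padded waveform would be reshaped to `(1, 1, 16000)` before being passed to the model. Whether the trained model expects exactly this sampling depends on how `src/train.py` prepares its data, so match that preprocessing if it differs.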
