# Video-Audio Emotion Congruence Detection - GPU Training

This notebook trains the multimodal emotion recognition model on Google Colab with GPU support.

**Author**: Rohan Jain  
**GitHub**: https://github.com/Rohanjain2312/video-audio-emotion-congruence

---

## Setup Instructions

1. **Enable GPU**: Runtime → Change runtime type → GPU (T4 or better)
2. **Upload Kaggle credentials**: `kaggle.json` is needed to download the CREMA-D dataset
3. **Run all cells** in order

---

## 1. Check GPU Availability

In [None]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: GPU not available. Enable GPU in Runtime settings!")

## 2. Clone Repository

In [None]:
# Clone the repository
!git clone https://github.com/Rohanjain2312/video-audio-emotion-congruence.git
%cd video-audio-emotion-congruence

## 3. Install Dependencies

In [None]:
# Install required packages
!pip install -q -r requirements.txt

print("✓ Dependencies installed")

## 4. Setup Kaggle Credentials

For CREMA-D dataset access:
1. Download `kaggle.json` from https://www.kaggle.com/settings
2. Upload it using the file browser (left sidebar)
3. Run the cell below

In [None]:
import os
from pathlib import Path

# Setup Kaggle credentials
kaggle_dir = Path.home() / '.kaggle'
kaggle_dir.mkdir(exist_ok=True)

# Check if kaggle.json exists in current directory
if Path('kaggle.json').exists():
    !cp kaggle.json ~/.kaggle/
    !chmod 600 ~/.kaggle/kaggle.json
    print("✓ Kaggle credentials configured")
else:
    print("⚠️ WARNING: kaggle.json not found. Upload it to access the CREMA-D dataset.")
    print("Download from: https://www.kaggle.com/settings")
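
The cell above relies on the shell commands `cp` and `chmod`. A minimal pure-Python sketch of the same step follows. The helper name `install_kaggle_credentials` and its parameters are hypothetical and are not part of the repository.

```python
import shutil
import stat
from pathlib import Path


def install_kaggle_credentials(source: Path, kaggle_dir: Path) -> Path:
    """Copy kaggle.json into kaggle_dir and restrict it to owner read/write.

    Hypothetical helper for illustration; not part of the repository.
    """
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; upload kaggle.json first")
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    target = kaggle_dir / "kaggle.json"
    shutil.copyfile(source, target)
    # Equivalent of `chmod 600`: the owner may read and write, nobody else.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return target
```

In this notebook it would be called as `install_kaggle_credentials(Path("kaggle.json"), Path.home() / ".kaggle")`. Raising on a missing file, rather than printing a warning, stops a "Run all" execution before the CREMA-D download is attempted.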

## 5. Download Datasets

This will download:
- **RAVDESS**: All 24 actors (~1GB)
- **CREMA-D**: Full dataset (~4GB)

**Total time**: 15-30 minutes depending on connection

In [None]:
# Download RAVDESS (audio)
print("Downloading RAVDESS audio...")
!python src/data/download_datasets.py --dataset ravdess --gpu_mode

print("\nDownloading RAVDESS videos...")
!python src/data/download_videos.py --gpu_mode

print("\nDownloading CREMA-D...")
!python src/data/download_datasets.py --dataset cremad --gpu_mode

print("\n✓ All datasets downloaded")

## 6. Verify Dataset Download

In [None]:
from pathlib import Path

# Check RAVDESS
ravdess_path = Path("data/raw/RAVDESS")
ravdess_videos = list(ravdess_path.rglob("*.mp4"))
ravdess_audio = list(ravdess_path.rglob("*.wav"))

print("RAVDESS:")
print(f"  Videos: {len(ravdess_videos)}")
print(f"  Audio: {len(ravdess_audio)}")

# Check CREMA-D
cremad_path = Path("data/raw/CREMA-D")
if cremad_path.exists():
    cremad_videos = list(cremad_path.rglob("*.flv"))
    cremad_audio = list(cremad_path.rglob("*.wav"))
    
    print("\nCREMA-D:")
    print(f"  Videos: {len(cremad_videos)}")
    print(f"  Audio: {len(cremad_audio)}")
else:
    print("\n⚠️ CREMA-D not found. Continuing with RAVDESS only.")

print("\n✓ Dataset verification complete")
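
The check above globs one extension at a time. As a complementary sketch, the helper below tallies every file type under a directory in a single pass, which also surfaces unexpected files such as archives that were never extracted. The name `count_by_extension` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root: Path) -> Counter:
    """Count regular files under root, keyed by lowercase extension."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())
```

For example, `count_by_extension(Path("data/raw/RAVDESS"))` returns a `Counter`, so looking up an extension that is absent (say `counts[".flv"]`) yields 0 rather than raising.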

## 7. Preprocess Data

Creates metadata files and train/val/test splits.

In [None]:
!python src/data/preprocess.py

print("\n✓ Data preprocessing complete")
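
This notebook does not show how `preprocess.py` builds its splits. Purely as an illustration, the sketch below shows one common approach for acted-emotion corpora: a speaker-independent split, in which every clip from a given actor lands in exactly one of train, val, or test. The function name, the 70/15/15 ratios, and the seed are assumptions and may not match the repository's behavior.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_by_actor(
    actor_ids: Sequence[str],
    ratios: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Assign each unique actor to exactly one of train/val/test.

    Hypothetical helper; the ratios and seed are illustrative assumptions.
    """
    actors = sorted(set(actor_ids))
    # A seeded shuffle keeps the assignment reproducible across runs.
    random.Random(seed).shuffle(actors)
    n_train = int(len(actors) * ratios[0])
    n_val = int(len(actors) * ratios[1])
    return {
        "train": actors[:n_train],
        "val": actors[n_train:n_train + n_val],
        "test": actors[n_train + n_val:],
    }
```

Clips would then be routed to a split according to their actor ID. Keeping each speaker in a single split prevents the model from scoring well on the test set simply by recognizing a familiar face or voice.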

## 8. Verify Data Loaders

In [None]:
# Quick test of data loaders
import sys
sys.path.append('src')

from data.dataset_loaders import get_dataloaders

print("Testing data loaders...")
train_loader, val_loader, test_loader = get_dataloaders(
    "data/processed/train_metadata.csv",
    "data/processed/val_metadata.csv",
    "data/processed/test_metadata.csv",
    batch_size=4,
    mode='both'
)

# Get one batch
batch = next(iter(train_loader))
print("\n✓ Data loaders working")
print(f"  Video shape: {batch['video'].shape}")
print(f"  Audio shape: {batch['audio'].shape}")
print(f"  Batch size: {batch['video'].shape[0]}")

## 9. Train Multimodal Model

**Training configuration**:
- Epochs: 20
- Batch size: 16
- Learning rate: 1e-4
- Frozen backbones (VideoMAE + Wav2Vec2)

**Expected time**: 2-4 hours on T4 GPU

In [None]:
# Train multimodal model
!python src/training/train_multimodal.py \
    --data_dir ./data/processed \
    --checkpoint_dir ./checkpoints/multimodal \
    --num_epochs 20 \
    --batch_size 16 \
    --learning_rate 1e-4

print("\n✓ Multimodal training complete!")
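
A `!` shell command in a notebook does not raise a Python exception when the script exits with an error, so the success message above prints even if training crashed. A minimal sketch of a stricter alternative uses the standard library's `subprocess.run` with `check=True`. The helper names `build_train_command` and `run_step` are hypothetical, while the script path and flags are copied from the cell above.

```python
import subprocess
import sys
from typing import List, Sequence


def build_train_command(num_epochs: int = 20, batch_size: int = 16,
                        learning_rate: float = 1e-4) -> List[str]:
    """Build the argument list for the multimodal training script."""
    return [
        sys.executable, "src/training/train_multimodal.py",
        "--data_dir", "./data/processed",
        "--checkpoint_dir", "./checkpoints/multimodal",
        "--num_epochs", str(num_epochs),
        "--batch_size", str(batch_size),
        "--learning_rate", str(learning_rate),
    ]


def run_step(args: Sequence[str]) -> None:
    """Run a command; raise CalledProcessError on a non-zero exit code."""
    subprocess.run(list(args), check=True)
```

Calling `run_step(build_train_command())` followed by the `print` would then report success only when the script actually exits cleanly. The same helper could wrap the download, preprocessing, and evaluation steps.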

## 10. Train Baseline Models (Optional)

Train video-only and audio-only baselines for comparison.

In [None]:
# Train video-only baseline
print("Training Video-Only baseline...")
!python src/training/train_video_only.py \
    --data_dir ./data/processed \
    --checkpoint_dir ./checkpoints/video_only \
    --num_epochs 20 \
    --batch_size 16

print("\n✓ Video-only training complete!")

In [None]:
# Train audio-only baseline
print("Training Audio-Only baseline...")
!python src/training/train_audio_only.py \
    --data_dir ./data/processed \
    --checkpoint_dir ./checkpoints/audio_only \
    --num_epochs 20 \
    --batch_size 16

print("\n✓ Audio-only training complete!")

## 11. Evaluate Models

In [None]:
# Evaluate multimodal model
print("Evaluating Multimodal model...")
!python src/evaluation/evaluate.py \
    --checkpoint checkpoints/multimodal/best_model.pth \
    --model_type multimodal \
    --data_dir ./data/processed \
    --output_dir ./outputs/metrics

In [None]:
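# Evaluate video-only baseline (skipped if it was not trained)
from pathlib import Path

if Path("checkpoints/video_only/best_model.pth").exists():
    print("Evaluating Video-Only model...")
    !python src/evaluation/evaluate.py \
        --checkpoint checkpoints/video_only/best_model.pth \
        --model_type video_only \
        --data_dir ./data/processed \
        --output_dir ./outputs/metrics
else:
    print("Video-only checkpoint not found. Skipping evaluation.")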

In [None]:
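# Evaluate audio-only baseline (skipped if it was not trained)
from pathlib import Path

if Path("checkpoints/audio_only/best_model.pth").exists():
    print("Evaluating Audio-Only model...")
    !python src/evaluation/evaluate.py \
        --checkpoint checkpoints/audio_only/best_model.pth \
        --model_type audio_only \
        --data_dir ./data/processed \
        --output_dir ./outputs/metrics
else:
    print("Audio-only checkpoint not found. Skipping evaluation.")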

## 12. Compare All Models

In [None]:
# Compare all models
!python src/evaluation/compare_baselines.py \
    --metrics_dir ./outputs/metrics \
    --output_dir ./outputs/comparisons

## 13. View Results

In [None]:
# Display evaluation results
import json
from pathlib import Path

# Load multimodal metrics
metrics_path = Path("outputs/metrics/multimodal_metrics.json")
if metrics_path.exists():
    with open(metrics_path, 'r') as f:
        metrics = json.load(f)
    
    print("\n" + "="*70)
    print("MULTIMODAL MODEL RESULTS")
    print("="*70)
    
    em = metrics['emotion_metrics']
    print("\nEmotion Recognition:")
    print(f"  Accuracy:         {em['accuracy']:.4f}")
    print(f"  Macro F1:         {em['macro_f1']:.4f}")
    print(f"  Weighted F1:      {em['weighted_f1']:.4f}")
    
    if 'congruence_metrics' in metrics:
        cm = metrics['congruence_metrics']
        print("\nCongruence Detection:")
        print(f"  Accuracy:         {cm['accuracy']:.4f}")
        print(f"  F1:               {cm['f1']:.4f}")
    
    print("\n" + "="*70)
else:
    print("Metrics file not found. Run evaluation first.")
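
The report above lists both macro and weighted F1. As a reminder of how the two differ, here is a small self-contained sketch on made-up labels (illustrative data, not output from this model). Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true samples. This simplified version only scores classes that appear in `y_true`; it is not the evaluation script's implementation.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 score for each class that appears in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class F1 scores, weighted by true-sample count."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)
```

With four true "happy" clips, one true "sad" clip, and a model that predicts "happy" every time, macro F1 is about 0.44 while weighted F1 is about 0.71. A large gap between the two in the report is therefore a hint that minority emotion classes are being missed.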

In [None]:
# Display comparison table
import pandas as pd
from pathlib import Path

comparison_path = Path("outputs/comparisons/overall_comparison.csv")
if comparison_path.exists():
    df = pd.read_csv(comparison_path)
    print("\nModel Comparison:")
    print(df.to_string(index=False))
else:
    print("Comparison file not found. Run comparison script first.")

## 14. Display Visualizations

In [None]:
from IPython.display import Image, display
from pathlib import Path

# Display confusion matrix
confusion_matrix_path = Path("outputs/metrics/multimodal_emotion_confusion_matrix.png")
if confusion_matrix_path.exists():
    print("Emotion Confusion Matrix:")
    display(Image(filename=str(confusion_matrix_path)))
else:
    print("Confusion matrix not found")

In [None]:
# Display per-class metrics
per_class_path = Path("outputs/metrics/multimodal_per_class_metrics.png")
if per_class_path.exists():
    print("Per-Class Metrics:")
    display(Image(filename=str(per_class_path)))
else:
    print("Per-class metrics plot not found")

In [None]:
# Display comparison plot
comparison_plot_path = Path("outputs/comparisons/overall_comparison.png")
if comparison_plot_path.exists():
    print("Model Comparison:")
    display(Image(filename=str(comparison_plot_path)))
else:
    print("Comparison plot not found")

## 15. Test Inference

In [None]:
# Test inference on a sample video
from pathlib import Path

# Find a sample video
sample_videos = list(Path("data/raw/RAVDESS").rglob("*.mp4"))

if sample_videos:
    test_video = str(sample_videos[0])
    print(f"Testing inference on: {test_video}\n")
    
    !python src/inference/inference_pipeline.py \
        --checkpoint checkpoints/multimodal/best_model.pth \
        --video "$test_video"
else:
    print("No sample videos found")

## 16. Download Trained Models

Zip the checkpoints and evaluation outputs so they can be downloaded to your local machine.

In [None]:
# Zip checkpoints for download
!zip -r checkpoints.zip checkpoints/
!zip -r outputs.zip outputs/

print("\n✓ Files zipped and ready for download")
print("Download from the Files panel (left sidebar):")
print("  - checkpoints.zip (trained models)")
print("  - outputs.zip (metrics and visualizations)")

## 17. Mount Google Drive (Optional)

Copy checkpoints and evaluation outputs to Google Drive so they persist after the Colab session ends.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Copy checkpoints and evaluation outputs to Drive
!mkdir -p "/content/drive/MyDrive/emotion-model-checkpoints"
!cp -r checkpoints/* "/content/drive/MyDrive/emotion-model-checkpoints/"
!cp -r outputs/* "/content/drive/MyDrive/emotion-model-checkpoints/"

print("✓ Checkpoints and outputs saved to Google Drive")

---

## Summary

Training complete! You have:

✅ Downloaded and preprocessed datasets  
✅ Trained multimodal model (and optional baselines)  
✅ Evaluated performance with comprehensive metrics  
✅ Generated visualizations and comparisons  
✅ Saved checkpoints for deployment

### Next Steps:

1. **Download checkpoints** using the zip files created above
2. **Deploy to Hugging Face Spaces** using the Gradio app
3. **Share results** on GitHub with updated README

---

**Project**: [Video-Audio Emotion Congruence](https://github.com/Rohanjain2312/video-audio-emotion-congruence)  
**Author**: [Rohan Jain](https://www.linkedin.com/in/jaroh23/)