# Step 3: Dataset Setup
## Available options
- Download the data with a script that places it in the download folder (usually recommended).
- Upload the dataset to your personal or institutional Google Drive and load it from there ([Read more](https://saturncloud.io/blog/google-colab-how-to-read-data-from-my-google-drive/)).
- Place the download script directly here in Colab.

You are free to choose whichever option you prefer in this phase.
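
Whichever option is used, the rest of the notebook only needs a folder that actually contains the data. A minimal sketch of that check follows; `resolve_dataset_root` and both candidate paths are hypothetical names introduced for illustration, not part of this project.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_dataset_root(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate directory that exists and is non-empty."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and any(path.iterdir()):
            return path
    return None


# Hypothetical locations: a local download folder first, then a Drive folder.
# On Colab, Drive would first have to be mounted with:
#   from google.colab import drive; drive.mount('/content/drive')
candidate_roots = [
    Path("data") / "ravdess",
    Path("/content/drive/MyDrive/datasets/ravdess"),
]

dataset_root = resolve_dataset_root(candidate_roots)
if dataset_root is None:
    print("No dataset found; run the download script first.")
else:
    print(f"Using dataset from: {dataset_root}")
```

Listing the local folder first means a Drive copy is only used when nothing has been downloaded locally.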


In [None]:
# Import all required libraries
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from pathlib import Path

# Setup project path
NOTEBOOK_DIR = Path(os.getcwd())
PROJECT_ROOT = NOTEBOOK_DIR.parent if NOTEBOOK_DIR.name == 'ravdess_train' else NOTEBOOK_DIR
sys.path.insert(0, str(PROJECT_ROOT))

# Print system and environment info
print("="*80)
print("🔧 PROJECT ENVIRONMENT INFO")
print("="*80)
print(f"Python Version: {sys.version.split()[0]}")
print(f"PyTorch Version: {torch.__version__}")
print(f"NumPy Version: {np.__version__}")
print(f"Current Working Directory: {os.getcwd()}")
print(f"Project Root: {PROJECT_ROOT}")
print()

# Check CUDA availability
if torch.cuda.is_available():
    print("✅ CUDA is AVAILABLE")
    print(f"   GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"   CUDA Version: {torch.version.cuda}")
    print(f"   Number of GPUs: {torch.cuda.device_count()}")
    device = torch.device('cuda')
else:
    print("❌ CUDA is NOT available - Using CPU")
    device = torch.device('cpu')

print(f"   Default Device: {device}")
print("="*80)

🔧 PROJECT ENVIRONMENT INFO
Python Version: 3.10.0
PyTorch Version: 2.10.0+cu128
NumPy Version: 2.2.6
Current Working Directory: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25

✅ CUDA is AVAILABLE
   GPU Device: NVIDIA GeForce RTX 5060 Ti
   CUDA Version: 12.8
   Number of GPUs: 1
   Default Device: cuda


In [38]:
from utils.download_dataset_local import dowload_ravdess_local

dataset_path = dowload_ravdess_local()
if dataset_path:
    print(f"✅ Downloaded RAVDESS dataset locally in {dataset_path}...")
else:
    print("❌ RAVDESS dataset download failed.")
    
ravdess_path = dataset_path


--- Download RAVDESS (local) ---
✓ RAVDESS already present: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\ravdess
Number of files: 2880
✅ Downloaded RAVDESS dataset locally in d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\ravdess...
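
The log shows that the download step is skipped when the data is already on disk. The helper in `utils.download_dataset_local` is not shown in this notebook, so the following is only a sketch of that pattern under assumptions: `ensure_dataset` and its `download` callback are hypothetical names, and returning `None` on failure mirrors the `if dataset_path:` check in the cell above.

```python
from pathlib import Path
from typing import Callable, Optional


def ensure_dataset(target_dir: Path, download: Callable[[Path], None]) -> Optional[Path]:
    """Download into target_dir only if it does not already contain files."""
    target_dir = Path(target_dir)
    existing = (
        [p for p in target_dir.rglob("*") if p.is_file()] if target_dir.is_dir() else []
    )
    if existing:
        print(f"Already present: {target_dir} ({len(existing)} files)")
        return target_dir

    target_dir.mkdir(parents=True, exist_ok=True)
    try:
        download(target_dir)  # hypothetical callback that fetches the data
    except OSError as exc:
        print(f"Download failed: {exc}")
        return None
    return target_dir
```

Calling it twice with the same folder runs the callback only the first time, which keeps repeated notebook runs cheap.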


In [39]:
from torch.utils.data import DataLoader
from dataset.custom_ravdess_dataset import CustomRAVDESSDataset
from utils.get_dataset_statistics import print_dataset_stats

print("="*80)
print("🔄 CREATING DATASETS AND DATALOADERS - RAVDESS")
print("="*80)

# Check that the dataset path exists
if not ravdess_path or not Path(ravdess_path).exists():
    raise ValueError(f"❌ RAVDESS dataset not found at: {ravdess_path}")

print(f"✅ Using dataset from: {ravdess_path}\n")

# Create the datasets
train_dataset = CustomRAVDESSDataset(dataset_root=ravdess_path, split='train')
val_dataset = CustomRAVDESSDataset(dataset_root=ravdess_path, split='validation')
test_dataset = CustomRAVDESSDataset(dataset_root=ravdess_path, split='test')

# Create the dataloaders
batch_size = 32
train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)


# Dataloader summary (the last batch may hold fewer than batch_size samples)
print("\n" + "="*80)
print("📦 DATALOADER SUMMARY")
print("="*80)
print(f"Train Dataloader:      {len(train_dataloader)} batches of up to {batch_size} samples = {len(train_dataset)} total")
print(f"Validation Dataloader: {len(val_dataloader)} batches of up to {batch_size} samples = {len(val_dataset)} total")
print(f"Test Dataloader:       {len(test_dataloader)} batches of up to {batch_size} samples = {len(test_dataset)} total")
print("="*80)

🔄 CREATING DATASETS AND DATALOADERS - RAVDESS
✅ Using dataset from: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\ravdess

📊 RAVDESS dataset statistics:

📊 RAVDESS TRAINING SET ANALYSIS
🔹 Total samples: 1440
🔹 Actors (20): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
   - Male:   10
   - Female: 10

🎭 Emotion distribution:
   - Angry     :  320 ( 22.2%) ████
   - Happy     :  320 ( 22.2%) ████
   - Neutral   :  480 ( 33.3%) ██████
   - Sad       :  320 ( 22.2%) ████
----------------------------------------
📊 RAVDESS dataset statistics:

📊 RAVDESS VALIDATION SET ANALYSIS
🔹 Total samples: 144
🔹 Actors (2): [21, 22]
   - Male:   1
   - Female: 1

🎭 Emotion distribution:
   - Angry     :   32 ( 22.2%) ████
   - Happy     :   32 ( 22.2%) ████
   - Neutral   :   48 ( 33.3%) ██████
   - Sad       :   32 ( 22.2%) ████
-------------
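
The statistics above show a speaker-independent split: actors 1 to 20 for training and actors 21 and 22 for validation (the evaluation output later in this notebook lists actors 23 and 24 for the test set). The internals of `CustomRAVDESSDataset` are not shown here, so the following is only a sketch of how such a split can be derived from the standard RAVDESS filename convention. The function names are hypothetical, and the sketch does not reproduce the reduction to the four classes shown in the output.

```python
from pathlib import Path

# Emotion codes from the RAVDESS filename convention (third field).
EMOTION_CODES = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def parse_ravdess_filename(filename: str) -> dict:
    """Extract emotion, actor id and gender from a name like 03-01-05-01-02-01-21.wav."""
    fields = Path(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"Unexpected RAVDESS filename: {filename}")
    actor = int(fields[6])
    return {
        "emotion": EMOTION_CODES[fields[2]],
        "actor": actor,
        # In RAVDESS, odd-numbered actors are male and even-numbered are female.
        "gender": "male" if actor % 2 == 1 else "female",
    }


def split_for_actor(actor: int) -> str:
    """Map an actor id to a split, using the actor ranges printed in this notebook."""
    if 1 <= actor <= 20:
        return "train"
    if actor in (21, 22):
        return "validation"
    if actor in (23, 24):
        return "test"
    raise ValueError(f"Unknown actor id: {actor}")


info = parse_ravdess_filename("03-01-05-01-02-01-21.wav")
print(info, split_for_actor(info["actor"]))
```

Splitting by actor rather than by file keeps every voice in exactly one split, so validation and test scores reflect unseen speakers.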

In [40]:
# Reload the module so that the fixed version is used
import importlib
import sys
if 'utils.download_dataset_local' in sys.modules:
    importlib.reload(sys.modules['utils.download_dataset_local'])

from utils.download_dataset_local import dowload_iemocap_local

iemocap_dataset_path = dowload_iemocap_local()
if iemocap_dataset_path:
    print(f"✅ Downloaded IEMOCAP dataset locally in {iemocap_dataset_path}...")
else:
    print("❌ IEMOCAP dataset download failed.")
    
iemocap_path = iemocap_dataset_path


--- Download IEMOCAP (local) ---
✓ IEMOCAP already present: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap
Number of files: 81249
✅ Downloaded IEMOCAP dataset locally in d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap\IEMOCAP_full_release...


In [41]:
# DEBUG: check the IEMOCAP paths
print("="*80)
print("🔍 DEBUG - CHECKING IEMOCAP PATHS")
print("="*80)

iemocap_debug_path = iemocap_path
print(f"1️⃣  Path passed in: {iemocap_debug_path}\n")

# Check whether the path exists
print(f"2️⃣  Path exists: {Path(iemocap_debug_path).exists()}\n")

# List what is inside
if Path(iemocap_debug_path).exists():
    print(f"3️⃣  Contents of {iemocap_debug_path}:")
    for item in Path(iemocap_debug_path).iterdir():
        print(f"   - {item.name} {'(DIR)' if item.is_dir() else ''}")
    print()

# Look for the Session folders (sorted so that Session1 comes first)
print("4️⃣  Searching for Session folders:")
session_folders = sorted(Path(iemocap_debug_path).glob("Session*"))
print(f"   Found: {len(session_folders)} Session folders")
for s in session_folders[:3]:
    print(f"   - {s.name}")
print()

# If Session folders exist, inspect the structure of the first one
if session_folders:
    session1 = session_folders[0]
    print(f"5️⃣  Inside {session1.name}:")
    for item in session1.iterdir():
        print(f"   - {item.name}")
    print()

    # Check the wav folder
    wav_path = session1 / "sentences" / "wav"
    print(f"6️⃣  Wav path: {wav_path}")
    print(f"   Exists: {wav_path.exists()}")
    if wav_path.exists():
        wav_items = list(wav_path.iterdir())
        print(f"   Contains {len(wav_items)} items:")
        for item in wav_items[:5]:
            print(f"      - {item.name} {'(DIR)' if item.is_dir() else ''}")
    print()

    # Check the label folder
    label_path = session1 / "dialog" / "EmoEvaluation"
    print(f"7️⃣  Label path: {label_path}")
    print(f"   Exists: {label_path.exists()}")
    if label_path.exists():
        label_items = list(label_path.glob("*.txt"))
        print(f"   Found {len(label_items)} .txt files")
        for item in label_items[:3]:
            print(f"      - {item.name}")

print("="*80)

🔍 DEBUG - CHECKING IEMOCAP PATHS
1️⃣  Path passed in: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap\IEMOCAP_full_release

2️⃣  Path exists: True

3️⃣  Contents of d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap\IEMOCAP_full_release:
   - Documentation (DIR)
   - Session1 (DIR)
   - Session2 (DIR)
   - Session3 (DIR)
   - Session4 (DIR)
   - Session5 (DIR)

4️⃣  Searching for Session folders:
   Found: 5 Session folders
   - Session1
   - Session2
   - Session3

5️⃣  Inside Session1:
   - dialog
   - sentences

6️⃣  Wav path: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap\IEMOCAP_full_release\Session1\sentences\wav
   Exists: True
   Contains 28 items:
      - Ses01F_impro01 (DIR)
      - Ses01F_impro02 (DIR)
      - Ses01F_impro03 (DIR)
      - Ses01F_impro04 (DIR)
      - Ses01F_impro05 (DIR)

7️⃣  Label path: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-
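
The `EmoEvaluation` folder checked in the debug cell holds one `.txt` file per dialog. To my understanding of the IEMOCAP release, each utterance has a summary line with the time span, the utterance id, the categorical label and the valence/activation/dominance scores, separated by tabs. The following is a sketch of parsing such a line; `parse_label_line` is a hypothetical helper, and how `CustomIEMOCAPDataset` actually reads the labels is not shown in this notebook.

```python
import re
from typing import Optional

# Summary lines in EmoEvaluation files are assumed to look like:
# [6.2901 - 8.2357]\tSes01F_impro01_F000\tneu\t[2.5000, 2.5000, 2.5000]
LABEL_LINE = re.compile(
    r"^\[(?P<start>\d+\.\d+) - (?P<end>\d+\.\d+)\]\t"
    r"(?P<utt_id>\S+)\t(?P<label>\w+)\t\[.*\]$"
)


def parse_label_line(line: str) -> Optional[dict]:
    """Return utterance id, label, session and speaker gender, or None for other lines."""
    match = LABEL_LINE.match(line.strip())
    if match is None:
        return None
    utt_id = match["utt_id"]
    return {
        "utt_id": utt_id,
        "label": match["label"],
        "session": int(utt_id[3:5]),                 # "Ses01..." -> 1
        "speaker_gender": utt_id.split("_")[-1][0],  # "F000" -> "F"
        "improvised": "_impro" in utt_id,
    }


sample = "[6.2901 - 8.2357]\tSes01F_impro01_F000\tneu\t[2.5000, 2.5000, 2.5000]"
print(parse_label_line(sample))
```

The session number and speaker gender recovered from the utterance id are the same `(session, gender)` pairs that the dataset statistics use to identify speakers.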

In [42]:
from dataset.custom_iemocap_dataset import CustomIEMOCAPDataset
from utils.get_dataset_statistics import print_iemocap_stats
print("="*80)
print("🔄 CREATING DATASETS AND DATALOADERS - IEMOCAP")
print("="*80)

# Check that the dataset path exists
if not iemocap_path or not Path(iemocap_path).exists():
    raise ValueError(f"❌ IEMOCAP dataset not found at: {iemocap_path}")

print(f"✅ Using dataset from: {iemocap_path}\n")

# Create the datasets
train_iemocap_dataset = CustomIEMOCAPDataset(dataset_root=iemocap_path, split='train')
val_iemocap_dataset = CustomIEMOCAPDataset(dataset_root=iemocap_path, split='validation')
test_iemocap_dataset = CustomIEMOCAPDataset(dataset_root=iemocap_path, split='test')

# Create the dataloaders
batch_size = 32
train_iemocap_dataloader = DataLoader(train_iemocap_dataset, batch_size=batch_size, shuffle=True)
val_iemocap_dataloader = DataLoader(val_iemocap_dataset, batch_size=batch_size, shuffle=False)
test_iemocap_dataloader = DataLoader(test_iemocap_dataset, batch_size=batch_size, shuffle=False)


# Dataloader summary (the last batch may hold fewer than batch_size samples)
print("\n" + "="*80)
print("📦 DATALOADER SUMMARY - IEMOCAP")
print("="*80)
print(f"Train Dataloader:      {len(train_iemocap_dataloader)} batches of up to {batch_size} samples = {len(train_iemocap_dataset)} total")
print(f"Validation Dataloader: {len(val_iemocap_dataloader)} batches of up to {batch_size} samples = {len(val_iemocap_dataset)} total")
print(f"Test Dataloader:       {len(test_iemocap_dataloader)} batches of up to {batch_size} samples = {len(test_iemocap_dataset)} total")
print("="*80)

🔄 CREATING DATASETS AND DATALOADERS - IEMOCAP
✅ Using dataset from: d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\data\iemocap\IEMOCAP_full_release

✅ Loaded 5531 labels
🔍 Collecting audio samples...
✅ Collected 2943 valid audio samples
   - Improvised samples only
   - Emotions: ['neutral', 'happy', 'sad', 'angry', 'happy']
📊 IEMOCAP dataset statistics:

📊 IEMOCAP TRAINING SET ANALYSIS

🔹 TOTAL SAMPLES: 1678
🔹 SESSIONS: ['1', '2', '3']
🔹 UNIQUE SPEAKERS (session, gender): 6
   List: [('1', 'F'), ('1', 'M'), ('2', 'F'), ('2', 'M'), ('3', 'F'), ('3', 'M')]
🔹 UNIQUE IMPROVISATIONS: 12
   List: ['01', '02', '03', '04', '05', '05a', '05b', '06', '07', '08', '08a', '08b']

👥 SPEAKER INDEPENDENCE (to check for leakage):
   - Session 1: (Ses1, F), (Ses1, M)
   - Session 2: (Ses2, F), (Ses2, M)
   - Session 3: (Ses3, F), (Ses3, M)

🎭 EMOTION DISTRIBUTION:
   - Angry     :  174 ( 10.4%) ██
   - Happy     :  472 ( 28.1%) 

Weights & Biases: generates the plots and compares the experiments.

In [43]:
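import wandb

# Do not hard-code the API key in the notebook: a key stored in the file is
# visible to anyone who can read it and should be revoked.
# wandb.login() reads WANDB_API_KEY from the environment or prompts for the key.
wandb.login()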

True

In [None]:
import subprocess
import sys

# Run train.py from the project root
result = subprocess.run(
    [sys.executable, str(PROJECT_ROOT / 'ravdess_train' / 'train.py'), '--model', 'CRNN_BiLSTM'],
    cwd=str(PROJECT_ROOT),
    capture_output=False
)

#result = subprocess.run(
#    [sys.executable, str(PROJECT_ROOT / 'ravdess_train' / 'train.py'), '--model', 'CRNN_BiGRU'],
#    cwd=str(PROJECT_ROOT)
#)

Using device: cuda

✅ RAVDESS found: data\ravdess

📊 RAVDESS dataset statistics:

📊 RAVDESS TRAINING SET ANALYSIS
🔹 Total samples: 1440
🔹 Actors (20): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
   - Male:   10
   - Female: 10

🎭 Emotion distribution:
   - Angry     :  320 ( 22.2%) ████
   - Happy     :  320 ( 22.2%) ████
   - Neutral   :  480 ( 33.3%) ██████
   - Sad       :  320 ( 22.2%) ████
----------------------------------------
📊 RAVDESS dataset statistics:

📊 RAVDESS VALIDATION SET ANALYSIS
🔹 Total samples: 144
🔹 Actors (2): [21, 22]
   - Male:   1
   - Female: 1

🎭 Emotion distribution:
   - Angry     :   32 ( 22.2%) ████
   - Happy     :   32 ( 22.2%) ████
   - Neutral   :   48 ( 33.3%) ██████
   - Sad       :   32 ( 22.2%) ████
----------------------------------------
Train samples: 1440
Val samples: 144

🏗️ ARCHITECTURE

wandb: Currently logged in as: pagliarellomatteo (pagliarellomatteo-politecnico-di-torino) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: setting up run qklf0ifk
wandb: Tracking run with wandb version 0.23.1
wandb: Run data is saved locally in d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\wandb\run-20260121_182422-qklf0ifk
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run train_20260121_192422
wandb:  View project at https://wandb.ai/pagliarellomatteo-politecnico-di-torino/speech-emotion-recognition
wandb:  View run at https://wandb.ai/pagliarellomatteo-politecnico-di-torino/speech-emotion-recognition/runs/qklf0ifk
  result = _VF.lstm(
wandb: updating run metadata
wandb: uploading output.log; uploading wandb-summary.json; uploading config.yaml
wandb: 
wandb: Run history:
wandb:            epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb: swa_val_ac

# Step 5: Evaluate your model



In [None]:
import subprocess
import sys

# Run eval.py from the project root
checkpoint_path = PROJECT_ROOT / 'checkpoints' / 'best_model.pth'
result = subprocess.run(
    [sys.executable, str(PROJECT_ROOT / 'ravdess_train' / 'eval.py'), '--model', 'CRNN_BiLSTM', '--checkpoint', str(checkpoint_path)],
    cwd=str(PROJECT_ROOT),
    capture_output=False
)

#result = subprocess.run(
#    [sys.executable, str(PROJECT_ROOT / 'ravdess_train' / 'eval.py'), '--model', 'CRNN_BiGRU', '--checkpoint', str(checkpoint_path)],
#    cwd=str(PROJECT_ROOT)
#)

Using device: cuda

wandb: Currently logged in as: pagliarellomatteo (pagliarellomatteo-politecnico-di-torino) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: setting up run egwxud5u
wandb: Tracking run with wandb version 0.23.1
wandb: Run data is saved locally in d:\Roba da D\Poli\ML Vision\speech-emotion-recognition-25\wandb\run-20260121_184105-egwxud5u
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run eval_20260121_194104
wandb:  View project at https://wandb.ai/pagliarellomatteo-politecnico-di-torino/speech-emotion-recognition
wandb:  View run at https://wandb.ai/pagliarellomatteo-politecnico-di-torino/speech-emotion-recognition/runs/egwxud5u
wandb: uploading artifact run-egwxud5u-classification_report; updating run metadata
wandb: uploading artifact run-egwxud5u-classification_report; uploading config.yaml
wandb: uploading artifact run-egwxud5u-classification_report
wandb: 
wandb: Run history:
wandb:    test_accuracy ▁
wandb:    test_macro_f1 ▁
wandb:



Evaluation timestamp: 20260121_194104

✅ RAVDESS found: data\ravdess

Loading RAVDESS test set...
📊 RAVDESS dataset statistics:

📊 RAVDESS TEST SET ANALYSIS
🔹 Total samples: 144
🔹 Actors (2): [23, 24]
   - Male:   1
   - Female: 1

🎭 Emotion distribution:
   - Angry     :   32 ( 22.2%) ████
   - Happy     :   32 ( 22.2%) ████
   - Neutral   :   48 ( 33.3%) ██████
   - Sad       :   32 ( 22.2%) ████
----------------------------------------
✅ Test samples: 144

Loading model...
✅ Model loaded from checkpoints/best_model.pth

TESTING IN PROGRESS...

📊 EVALUATION METRICS - PHASE 1 (RAVDESS BASELINE)

🎯 MAIN METRICS:
   ✅ Accuracy:           48.61%
   📈 Macro-Avg F1:       0.4935
   📊 Weighted-Avg F1:    0.5081

📋 CLASS DISTRIBUTION (Test Set):
   neutral   :  48 samples ( 33.3%)
   happy     :  32 samples ( 22.2%)
   sad       :  32 samples ( 22.2%)
   angry     :  32 samples ( 22.2%)

🎭 DE
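
The report lists both a macro-averaged and a weighted-averaged F1. The two differ only in how the per-class scores are combined, which the sketch below illustrates. The class supports are taken from the test-set distribution above; the per-class F1 values are hypothetical, chosen only to show the arithmetic, and are not results from this run.

```python
from typing import Dict, Tuple


def macro_and_weighted_f1(
    per_class_f1: Dict[str, float], support: Dict[str, int]
) -> Tuple[float, float]:
    """Macro F1 is the plain mean of per-class F1; weighted F1 weights each class by its support."""
    classes = list(per_class_f1)
    macro = sum(per_class_f1[c] for c in classes) / len(classes)
    total = sum(support[c] for c in classes)
    weighted = sum(per_class_f1[c] * support[c] for c in classes) / total
    return macro, weighted


# Class supports from the test-set distribution printed above.
support = {"neutral": 48, "happy": 32, "sad": 32, "angry": 32}
# Hypothetical per-class F1 values, for illustration only.
per_class_f1 = {"neutral": 0.60, "happy": 0.40, "sad": 0.50, "angry": 0.50}

macro, weighted = macro_and_weighted_f1(per_class_f1, support)
print(f"macro={macro:.4f} weighted={weighted:.4f}")
```

In this hypothetical case the weighted average is higher than the macro average because the largest class also has the highest F1; with balanced classes the two values coincide.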