# FESTA: Audio LLM Uncertainty Estimation on Google Colab

This notebook runs the FESTA (Functionally Equivalent Sampling for Trust Assessment) framework on Google Colab.

## Contents
1. **Environment Setup** - Install dependencies and verify GPU
2. **Upload Files** - Upload your code and dataset
3. **Configuration** - Set up paths and parameters
4. **Model Loading** - Download and load Qwen2-Audio model
5. **Run Experiment** - Execute FESTA pipeline
6. **View Results** - Analyze and visualize results
7. **Download Results** - Package and download outputs

## Quick Start
1. **Enable GPU**: Runtime → Change runtime type → GPU (T4 or better)
2. **Run all cells**: Runtime → Run all
3. **Upload files** when prompted
4. **Wait ~10-15 minutes** for completion

---

## 1. Environment Setup

First, let's check GPU availability and install dependencies.

In [None]:
# Check GPU availability
!nvidia-smi

import torch
print(f"\nPyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("⚠️ WARNING: No GPU detected! This will be very slow on CPU.")
    print("Please enable GPU: Runtime → Change runtime type → GPU")

In [None]:
# Install system dependencies
print("Installing system dependencies...")
!apt-get update -qq
!apt-get install -y -qq ffmpeg libsndfile1
print("✅ System dependencies installed")

In [None]:
# Install Python packages
print("Installing Python packages (this may take 2-3 minutes)...\n")

# Core ML packages
!pip install -q torch torchvision torchaudio
# Quote the specifiers: an unquoted ">=" is treated by the shell as an output redirect
!pip install -q "transformers>=4.40.0" "accelerate>=0.27.0"
!pip install -q sentencepiece protobuf

# Audio processing
!pip install -q librosa soundfile pydub audioread

# Data and visualization
!pip install -q numpy pandas scikit-learn
!pip install -q matplotlib seaborn plotly

# Utilities
!pip install -q tqdm pyyaml opencv-python

print("\n✅ All packages installed!")

In [None]:
# Verify installations
print("Verifying installations...\n")

import sys
packages = [
    'torch', 'transformers', 'librosa', 'soundfile', 
    'numpy', 'pandas', 'sklearn', 'matplotlib', 'tqdm', 'yaml'
]

all_good = True
for pkg in packages:
    try:
        __import__(pkg)
        print(f"✅ {pkg}")
    except ImportError:
        print(f"❌ {pkg} - FAILED")
        all_good = False

if all_good:
    print("\n🎉 All packages verified!")
else:
    print("\n⚠️ Some packages failed. Please re-run the installation cell.")
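
The verification cell only confirms that each package imports; it does not check the minimum versions requested in the install cell. A minimal sketch of such a check follows. The helper names `parse_version` and `meets_minimum` are illustrative and not part of the FESTA code, and the parser only compares leading numeric components, so it is an approximation rather than a full PEP 440 comparison.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.40.0' or '2.3.0+cu121' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so '4.40' compares equal to '4.40.0'
    b += (0,) * (width - len(b))
    return a >= b


# Minimum versions taken from the pip install cell.
MINIMUMS = {"transformers": "4.40.0", "accelerate": "0.27.0"}

for name, minimum in MINIMUMS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"❌ {name} is not installed")
        continue
    status = "✅" if meets_minimum(installed, minimum) else "⚠️ too old:"
    print(f"{status} {name} {installed} (needs >= {minimum})")
```

If a package is reported as too old, re-running the install cell and restarting the runtime is usually enough.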

## 2. Upload Files

Upload your AudioLLM-FESTA project to Colab.

### ⚡ Recommended: Use Mini Dataset (Fast Upload!)

The full TREA dataset is **908MB** and takes 20-30 minutes to upload. For quick testing, use the **mini dataset** instead:

```bash
# On your local machine (in AudioLLM-FESTA directory):

# 1. Generate mini dataset (~30 seconds)
python scripts/create_mini_dataset.py

# 2. Package for Colab (~10 seconds)
python scripts/package_for_colab.py

# Output: AudioLLM-FESTA.zip (~25MB)
# Upload time: 1-2 minutes ⚡ (vs 20-30 minutes for the full dataset)
```

**What's included in mini dataset:**
- ✅ All code (src/, experiments/, notebooks/)
- ✅ 15 audio samples (5 per task)
- ✅ All configurations
- ✅ Full FESTA pipeline works identically
- ⚠️ Only for quick testing (not final evaluation)

**Upload this**: `AudioLLM-FESTA.zip`

---

### Alternative: Full Dataset (Slow Upload)

If you want to use the full dataset:

```bash
# On your local machine:
# DON'T include TREA_dataset in AudioLLM-FESTA zip (makes it huge!)

# Option 1: Exclude dataset from code zip
cd path/to/parent/directory
zip -r AudioLLM-FESTA.zip AudioLLM-FESTA/ -x "AudioLLM-FESTA/TREA_dataset/*"

# Option 2: Upload to Google Drive instead (one-time, recommended)
# Upload TREA_dataset to Drive once, then mount Drive in Colab and point the project at it
```

**For first-time users**: Use mini dataset! You can scale up later.
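
For the Google Drive option, one possible approach is to mount Drive and symlink the dataset into the project folder so that the rest of the notebook finds it at `TREA_dataset`. The sketch below assumes the dataset was uploaded to `MyDrive/TREA_dataset`; that location and the helper `link_dataset` are assumptions for illustration, so adjust them to your own Drive layout.

```python
from pathlib import Path


def link_dataset(drive_dataset_dir, project_dir):
    """Symlink <project_dir>/TREA_dataset to a dataset folder stored elsewhere."""
    src = Path(drive_dataset_dir)
    dst = Path(project_dir) / "TREA_dataset"
    if not src.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {src}")
    if dst.is_symlink() or dst.exists():
        return dst  # already linked or already uploaded with the code
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.symlink_to(src, target_is_directory=True)
    return dst


try:
    from google.colab import drive  # only importable inside Colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    drive.mount('/content/drive')
    # Assumed Drive location; change it if your dataset lives elsewhere.
    link_dataset('/content/drive/MyDrive/TREA_dataset', '/content/AudioLLM-FESTA')
```

Reading audio through a Drive mount can be slower than reading from local disk, so copying the folder to `/content` may be worth it for long runs.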

In [None]:
import os
from google.colab import files

print("📤 Please upload your file:")
print("  • AudioLLM-FESTA.zip (recommended - mini dataset, ~25MB)")
print("  OR")
print("  • AudioLLM-FESTA.zip + TREA_dataset.zip (full dataset, ~900MB+)\n")

uploaded = files.upload()

print("\n📦 Files uploaded:")
for filename in uploaded.keys():
    print(f"  • {filename} ({len(uploaded[filename]) / 1024**2:.2f} MB)")

In [None]:
# Extract zip files if uploaded
import zipfile
from pathlib import Path

print("📦 Extracting files...\n")

for filename in uploaded.keys():
    if filename.endswith('.zip'):
        print(f"Extracting {filename}...")
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('/content/')
        print(f"  ✅ Extracted to /content/")
        
        # Remove zip file to save space
        os.remove(filename)
        print(f"  🗑️  Removed {filename}")

print("\n✅ All files extracted!")
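
The extraction cell unpacks every archive straight into `/content/`, so the later checks only pass if the zip has `AudioLLM-FESTA/` as its top-level folder. A small sketch for inspecting an archive before extracting it is shown below; `top_level_entries` and `has_project_root` are illustrative helpers, not part of the project.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted top-level names (files or folders) inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist() if name})


def has_project_root(zip_path, root="AudioLLM-FESTA"):
    """Check that the archive will unpack into the folder the notebook expects."""
    return root in top_level_entries(zip_path)


# Example usage in Colab, before calling extractall():
# if not has_project_root("AudioLLM-FESTA.zip"):
#     print("Unexpected layout:", top_level_entries("AudioLLM-FESTA.zip"))
```

An archive created from inside the project folder (rather than from its parent) would lack the `AudioLLM-FESTA/` prefix, and this check would catch that before the directory verification fails.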

In [None]:
# Verify directory structure and dataset type
print("📂 Verifying directory structure...\n")

required_paths = [
    '/content/AudioLLM-FESTA',
    '/content/AudioLLM-FESTA/src',
    '/content/AudioLLM-FESTA/experiments',
    '/content/AudioLLM-FESTA/TREA_dataset',
    '/content/AudioLLM-FESTA/TREA_dataset/count',
    '/content/AudioLLM-FESTA/TREA_dataset/order',
    '/content/AudioLLM-FESTA/TREA_dataset/duration',
]

all_exist = True
for path in required_paths:
    if os.path.exists(path):
        print(f"✅ {path}")
    else:
        print(f"❌ {path} - NOT FOUND")
        all_exist = False

# Check dataset size
if all_exist:
    dataset_path = Path('/content/AudioLLM-FESTA/TREA_dataset')
    audio_files = list(dataset_path.rglob('*.wav'))
    total_files = len(audio_files)
    
    print(f"\n📊 Dataset Information:")
    print(f"  Total audio files: {total_files}")
    
    if total_files <= 30:
        print(f"  Dataset type: ✨ MINI DATASET (fast upload!)")
        print(f"  Best for: Quick testing and verification")
        print(f"  Samples per task: ~{total_files // 3}")
    else:
        print(f"  Dataset type: 📦 FULL DATASET")
        print(f"  Best for: Final experiments and evaluation")
        print(f"  Samples per task: ~{total_files // 3}")
    
    print("\n🎉 Directory structure verified!")
else:
    print("\n⚠️ Some directories missing. Please check your upload.")
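
The "Samples per task" figure printed by the verification cell is an estimate (total files divided by three). If exact numbers matter, a sketch like the following counts the `.wav` files under each task folder. It assumes the audio for each task sits somewhere below `count/`, `order/`, and `duration/`, which matches the directories checked in that cell but is not confirmed for every dataset layout.

```python
from pathlib import Path

TASKS = ("count", "order", "duration")


def count_wavs_per_task(dataset_dir, tasks=TASKS):
    """Return {task: number of .wav files found under dataset_dir/task}."""
    root = Path(dataset_dir)
    counts = {}
    for task in tasks:
        task_dir = root / task
        # A missing folder is reported as zero files rather than raising.
        counts[task] = sum(1 for _ in task_dir.rglob("*.wav")) if task_dir.is_dir() else 0
    return counts


# Example usage in Colab:
# for task, n in count_wavs_per_task("/content/AudioLLM-FESTA/TREA_dataset").items():
#     print(f"  {task}: {n} files")
```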

In [None]:
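# Show directory tree (install `tree` first; it may not be preinstalled on Colab)
!apt-get install -y -qq tree > /dev/null
!tree -L 2 /content/AudioLLM-FESTA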

## 3. Configuration

Set up paths and load configuration.

In [None]:
# Change to project directory
import os
os.chdir('/content/AudioLLM-FESTA')
print(f"Current directory: {os.getcwd()}")

# Add to Python path
import sys
sys.path.insert(0, '/content/AudioLLM-FESTA')
print("✅ Python path updated")

In [None]:
# Load and display configuration
import yaml

config_path = 'config_colab.yaml'

with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

print("📋 Current Configuration:\n")
print(f"Model: {config['model']['name']}")
print(f"Device: {config['model']['device']}")
print(f"Dtype: {config['model']['dtype']}")
print(f"\nDataset:")
print(f"  Samples per task: {config['dataset']['samples_per_task']}")
print(f"  Tasks: {config['dataset']['tasks']}")
print(f"\nFESTA Sampling:")
print(f"  FES: {config['festa']['n_fes_audio']} audio × {config['festa']['n_fes_text']} text = {config['festa']['n_fes_audio'] * config['festa']['n_fes_text']} samples")
print(f"  FCS: {config['festa']['n_fcs_audio']} audio × {config['festa']['n_fcs_text']} text = {config['festa']['n_fcs_audio'] * config['festa']['n_fcs_text']} samples")
print(f"\nEstimated runtime: ~10-15 minutes")
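
The FES and FCS products printed by the configuration cell determine most of the runtime. As a rough sketch, the total number of model calls can be estimated from the same config keys. This assumes one forward pass per generated sample plus one for the original input, and that `tasks` is a list; the actual pipeline may batch or cache differently. The values in `example_config` are made up for illustration and are not the contents of `config_colab.yaml`.

```python
def festa_budget(config):
    """Estimate model calls from the FESTA sampling settings (rough sketch)."""
    festa = config["festa"]
    dataset = config["dataset"]
    n_fes = festa["n_fes_audio"] * festa["n_fes_text"]
    n_fcs = festa["n_fcs_audio"] * festa["n_fcs_text"]
    calls_per_sample = 1 + n_fes + n_fcs  # original input + FES + FCS (assumed)
    n_samples = dataset["samples_per_task"] * len(dataset["tasks"])
    return {
        "fes_per_sample": n_fes,
        "fcs_per_sample": n_fcs,
        "calls_per_sample": calls_per_sample,
        "total_samples": n_samples,
        "total_calls": calls_per_sample * n_samples,
    }


# Illustrative values only.
example_config = {
    "dataset": {"samples_per_task": 5, "tasks": ["count", "order", "duration"]},
    "festa": {"n_fes_audio": 5, "n_fes_text": 2, "n_fcs_audio": 5, "n_fcs_text": 2},
}

for key, value in festa_budget(example_config).items():
    print(f"{key}: {value}")
```

Because the cost grows with the product of the audio and text counts, raising both settings together increases runtime much faster than raising either one alone.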

In [None]:
# Optional: Modify configuration for even faster testing
# Uncomment to run in TEST MODE (1 sample only)

# config['colab']['test_mode'] = True
# with open(config_path, 'w') as f:
#     yaml.safe_dump(config, f)
# print("🧪 TEST MODE ENABLED - Will process only 1 sample")

In [None]:
# Test imports
print("Testing FESTA imports...\n")

try:
    from src.data_loader import load_trea_dataset
    print("✅ data_loader")
    
    from src.model_wrapper import Qwen2AudioWrapper
    print("✅ model_wrapper")
    
    from src.fes_generator import FESGenerator
    print("✅ fes_generator")
    
    from src.fcs_generator import FCSGenerator
    print("✅ fcs_generator")
    
    from src.uncertainty import FESTAUncertainty
    print("✅ uncertainty")
    
    from src.metrics import compute_auroc, compute_accuracy
    print("✅ metrics")
    
    from src.baselines import BaselineUncertainty
    print("✅ baselines")
    
    print("\n🎉 All FESTA modules imported successfully!")
    
except ImportError as e:
    print(f"\n❌ Import error: {e}")
    print("Please check that all files were uploaded correctly.")

## 4. Model Loading

Download and load the Qwen2-Audio-7B-Instruct model (~14GB).

**Note**: This will take 5-10 minutes on first run.
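
The ~14GB figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below uses the nominal 7 billion parameters implied by the model name, which is an assumption; the exact count may differ, and the estimate covers weights only, not activations or the generation cache.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gb(n_params, dtype="float16"):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


NOMINAL_PARAMS = 7e9  # assumed from the "7B" in the model name

for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: ~{weight_memory_gb(NOMINAL_PARAMS, dtype):.0f} GB")
```

This is why the notebook loads the model in `float16`: at `float32` the weights alone would need roughly twice as much memory.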

In [None]:
# Check available disk space
!df -h /content

print("\n⚠️ Model download requires ~14GB")
print("Please ensure you have sufficient space.")

In [None]:
# Test model loading (downloads model on first run)
from src.model_wrapper import Qwen2AudioWrapper
import torch

print("🤖 Loading Qwen2-Audio model...")
print("This will download ~14GB on first run (5-10 minutes)\n")

model = Qwen2AudioWrapper(
    model_name="Qwen/Qwen2-Audio-7B-Instruct",
    device="cuda" if torch.cuda.is_available() else "cpu",
    dtype="float16",
    max_length=512
)

print("\n✅ Model loaded successfully!")
print(f"\nModel info:")
info = model.get_model_info()
for key, value in info.items():
    print(f"  {key}: {value}")

In [None]:
# Quick test prediction
from src.data_loader import load_trea_dataset

print("🧪 Testing model prediction...\n")

# Load one sample
dataset = load_trea_dataset(
    data_dir='TREA_dataset',
    samples_per_task=1,
    random_seed=42
)

sample = dataset.data[0]
print(f"Task: {sample['task']}")
print(f"Question: {sample['question']}")
print(f"Options: {sample['options']}")
print(f"Ground Truth: {sample['correct_answer']}\n")

# Get prediction
prediction, probs = model.predict(
    sample['audio_path'],
    sample['question'],
    sample['options'],
    return_probs=True
)

print(f"Prediction: {prediction}")
print(f"Correct: {'✅ YES' if prediction == sample['correct_answer'] else '❌ NO'}")
print(f"\nProbabilities:")
for option, prob in sorted(probs.items()):
    bar = '█' * int(prob * 30)
    print(f"  {option}: {prob:.3f} {bar}")

print("\n✅ Model test passed!")

# Cleanup: drop the reference, collect garbage, then release cached GPU memory
import gc
del model
gc.collect()
torch.cuda.empty_cache()
print("🧹 Memory cleared")
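
The per-option probabilities printed by the test cell can be condensed into a single uncertainty number. One generic way to do that is the Shannon entropy of the option distribution, sketched below. This is a common baseline, not the FESTA score, and whether `src.baselines` computes exactly this is an assumption; `predictive_entropy` is an illustrative name.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a dict mapping option -> probability."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("Probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs.values():
        q = p / total  # renormalize in case the values do not sum to exactly 1
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


# Illustrative distributions over four answer options.
confident = {"A": 0.97, "B": 0.01, "C": 0.01, "D": 0.01}
unsure = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

print(f"confident: {predictive_entropy(confident):.3f}")
print(f"unsure:    {predictive_entropy(unsure):.3f}")
```

Entropy is zero when all mass sits on one option and reaches its maximum, `log(n)` for `n` options, when the model is indifferent among them.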

## 5. Run FESTA Experiment

Run the full FESTA pipeline with checkpoint support.

In [None]:
# Run the experiment
!python experiments/run_festa_colab.py --config config_colab.yaml

### Resume from Checkpoint (if disconnected)

If your session disconnects, just re-run the cell above. The experiment will automatically resume from where it left off.
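
Conceptually, resuming works by recording each finished sample and skipping it on the next run. The sketch below illustrates that idea with a JSON checkpoint file. It is not the actual logic of `experiments/run_festa_colab.py`; the function names and the file format are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_done(checkpoint_path):
    """Load previously saved results, or an empty dict if no checkpoint exists."""
    path = Path(checkpoint_path)
    if not path.exists():
        return {}
    with path.open() as f:
        return json.load(f)


def run_with_checkpoint(sample_ids, process, checkpoint_path):
    """Process each sample once, saving progress after every sample."""
    done = load_done(checkpoint_path)
    for sample_id in sample_ids:
        key = str(sample_id)  # JSON object keys are always strings
        if key in done:
            continue  # finished in an earlier session
        done[key] = process(sample_id)
        # Write to a temp file and rename, so a crash never leaves a half-written file.
        tmp_path = Path(str(checkpoint_path) + ".tmp")
        with tmp_path.open("w") as f:
            json.dump(done, f)
        tmp_path.replace(checkpoint_path)
    return done
```

Note that files under `/content` are lost when the Colab runtime is recycled, so a checkpoint only survives a full disconnect if it is stored somewhere persistent such as a mounted Drive folder.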

## 6. View Results

Load and visualize the results.

In [None]:
# List result files
import os
from pathlib import Path

results_dir = Path('/content/festa_results')

print("📁 Result files:\n")
for file in sorted(results_dir.glob('*.json')):
    size = file.stat().st_size / 1024  # KB
    print(f"  • {file.name} ({size:.1f} KB)")

In [None]:
# Load and display metrics
import json
import pandas as pd

# Find latest metrics file
metrics_files = sorted(results_dir.glob('metrics_*.json'))
if metrics_files:
    latest_metrics = metrics_files[-1]
    
    with open(latest_metrics, 'r') as f:
        metrics = json.load(f)
    
    print("📊 FESTA Results Summary\n")
    print(f"Overall Accuracy: {metrics['overall_accuracy']:.2%}\n")
    
    print("Method Comparison:")
    print("-" * 50)
    
    method_results = metrics['method_results']
    for method, scores in sorted(method_results.items(), 
                                  key=lambda x: x[1]['auroc'], 
                                  reverse=True):
        print(f"{method:<20} AUROC: {scores['auroc']:.4f}")
else:
    print("No metrics files found. Please run the experiment first.")
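
To make the AUROC numbers easier to interpret: for an uncertainty method, AUROC can be read as the probability that a randomly chosen wrong prediction receives a higher uncertainty score than a randomly chosen correct one. The dependency-free sketch below computes it that way, treating "incorrect" as the positive class. That labeling convention is an assumption here, and `src.metrics.compute_auroc` may be implemented differently.

```python
def auroc(uncertainties, is_correct):
    """AUROC of uncertainty scores for detecting incorrect predictions."""
    wrong = [u for u, c in zip(uncertainties, is_correct) if not c]
    right = [u for u, c in zip(uncertainties, is_correct) if c]
    if not wrong or not right:
        raise ValueError("AUROC needs at least one correct and one incorrect prediction")
    wins = 0.0
    for w in wrong:
        for r in right:
            if w > r:
                wins += 1.0  # wrong prediction ranked as more uncertain
            elif w == r:
                wins += 0.5  # ties count as half
    return wins / (len(wrong) * len(right))


# Illustrative scores: two wrong and two correct predictions.
scores = [0.9, 0.7, 0.6, 0.1]
correct = [False, True, False, True]
print(f"AUROC: {auroc(scores, correct):.2f}")
```

A value of 0.5 means the scores carry no information about correctness. With the mini dataset the estimate is based on very few pairs, and it is undefined if every prediction happens to be correct.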

In [None]:
# Visualize results
import matplotlib.pyplot as plt
import seaborn as sns

if metrics_files:
    # Create bar plot of AUROC scores
    methods = list(method_results.keys())
    aurocs = [method_results[m]['auroc'] for m in methods]
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(methods, aurocs, color=['#FF6B6B', '#4ECDC4', '#45B7D1'])
    plt.xlabel('Method', fontsize=12)
    plt.ylabel('AUROC', fontsize=12)
    plt.title('Uncertainty Method Comparison', fontsize=14, fontweight='bold')
    plt.ylim([0, 1])
    plt.grid(axis='y', alpha=0.3)
    
    # Add value labels on bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.3f}',
                ha='center', va='bottom')
    
    plt.tight_layout()
    plt.savefig('/content/festa_results/auroc_comparison.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("✅ Plot saved to: /content/festa_results/auroc_comparison.png")

In [None]:
# Load predictions and show sample results
pred_files = sorted(results_dir.glob('predictions_*.json'))
if pred_files:
    latest_preds = pred_files[-1]
    
    with open(latest_preds, 'r') as f:
        preds_data = json.load(f)
    
    df = pd.DataFrame({
        'Task': preds_data['tasks'],
        'Prediction': preds_data['predictions'],
        'Ground Truth': preds_data['ground_truths'],
        'Correct': [p == g for p, g in zip(preds_data['predictions'], 
                                           preds_data['ground_truths'])]
    })
    
    print("\n📋 Sample Predictions:\n")
    print(df.head(10))
    
    print("\n📊 Task-wise Accuracy:")
    task_acc = df.groupby('Task')['Correct'].mean()
    for task, acc in task_acc.items():
        print(f"  {task}: {acc:.2%}")

## 7. Download Results

Package and download all results.

In [None]:
# Create zip file with all results
import shutil
from datetime import datetime

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
zip_filename = f"festa_results_{timestamp}.zip"

print(f"📦 Creating results package: {zip_filename}\n")

shutil.make_archive(
    f'/content/festa_results_{timestamp}',
    'zip',
    '/content/festa_results'
)

zip_path = f'/content/festa_results_{timestamp}.zip'
zip_size = os.path.getsize(zip_path) / 1024  # KB

print(f"✅ Results packaged: {zip_size:.1f} KB")
print(f"\nContents:")
!unzip -l {zip_path} | head -20

In [None]:
# Download results
from google.colab import files

print("⬇️ Downloading results...")
files.download(zip_path)
print("✅ Download started! Check your browser's download folder.")

## Summary

You've successfully run FESTA on Google Colab! 🎉

### Next Steps:

1. **Scale up**: Change `samples_per_task` to 30 in `config_colab.yaml`
2. **More samples**: Increase `n_fes_audio` to 15 and `n_fes_text` to 4
3. **Full baselines**: Enable all baseline methods in config
4. **Analysis**: Open the notebooks in `notebooks/` for deeper analysis

### Files Generated:
- `predictions_*.json` - Model predictions and ground truths
- `uncertainties_*.json` - FESTA and baseline uncertainty scores
- `metrics_*.json` - AUROC, accuracy, and task-wise metrics
- `auroc_comparison.png` - Visualization of method comparison

### Expected Performance (from paper):
- FESTA AUROC: **0.83-0.91**
- Best Baseline: ~0.68-0.71
- Improvement: **+30-40%**

---

**For questions or issues**, refer to:
- `README.md` - Full documentation
- `QUICK_START.md` - Setup guide
- `COLAB_INSTRUCTIONS.md` - Colab-specific instructions