# Piano Performance Evaluation - 3-Way Model Comparison (Colab)

Trains three models to test whether multi-modal fusion outperforms the single-modality baselines:
1. Audio-Only (MERT only)
2. MIDI-Only (MIDIBert only)
3. Fusion (MERT + MIDIBert)

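- **Dimensions**: 3 core (note_accuracy, rhythmic_precision, tone_quality)
- **Sample size**: 10,000 training samples, subsampled from the 91,865-sample training set
- **Expected time**: about 6-7 hours total (roughly 2 h + 1.5 h + 2.5 h)
- **Goal**: Show that fusion beats both baselines by 15-20% (the comparison cell reports the gain as an absolute difference in Pearson r)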

## Google Drive Structure

```
MyDrive/
  crescendai_data/
    all_segments/              # Audio segments
      *.wav
      midi_segments/
        *.mid
    annotations/
      synthetic_train_filtered.jsonl    # 91,865 samples
      synthetic_val_filtered.jsonl
      synthetic_test_filtered.jsonl

  crescendai_checkpoints/
    audio_10k/                 # Audio-only checkpoints
    midi_10k/                  # MIDI-only checkpoints  
    fusion_10k/                # Fusion checkpoints
```
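
The setup cells below only verify the annotation files. As a minimal sketch, the audio and MIDI segments could be counted the same way before training. `count_segments` is a hypothetical helper, not part of the repo, and its default subdirectories simply follow the listing above as drawn; adjust them if your Drive layout differs.

```python
from pathlib import Path


def count_segments(data_root, audio_subdir="all_segments",
                   midi_subdir="all_segments/midi_segments"):
    """Return (n_wav, n_mid): the number of *.wav and *.mid files found.

    Hypothetical helper. The subdirectory defaults are assumptions taken
    from the directory listing, not from the training code.
    """
    root = Path(data_root)
    n_wav = sum(1 for _ in (root / audio_subdir).glob("*.wav"))
    n_mid = sum(1 for _ in (root / midi_subdir).glob("*.mid"))
    return n_wav, n_mid


# Example (in Colab, after mounting Drive):
# n_wav, n_mid = count_segments("/content/drive/MyDrive/crescendai_data")
# print(f"{n_wav:,} audio segments, {n_mid:,} MIDI segments")
```

A count of zero for either modality usually means the path is wrong or the Drive mount has not finished syncing.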

## Setup

In [1]:
# HuggingFace Login
import os
os.environ.pop("HF_TOKEN", None)
os.environ.pop("HUGGINGFACEHUB_API_TOKEN", None)

from huggingface_hub import login, HfApi

try:
    import getpass as gp
    raw = gp.getpass("Paste your Hugging Face token (input hidden): ")
    token = raw.decode() if isinstance(raw, (bytes, bytearray)) else raw
    if not isinstance(token, str):
        raise TypeError(f"Unexpected token type: {type(token).__name__}")
    token = token.strip()
    if not token:
        raise ValueError("Empty token provided")
    login(token=token, add_to_git_credential=False)
    who = HfApi().whoami(token=token)
    print(f"✓ Logged in as: {who.get('name') or who.get('email') or 'OK'}")
except Exception as e:
    print(f"[HF Login] getpass flow failed: {e}")
    print("Falling back to interactive login widget...")
    login()
    try:
        who = HfApi().whoami()
        print(f"✓ Logged in as: {who.get('name') or who.get('email') or 'OK'}")
    except Exception as e2:
        print(f"[HF Login] Verification skipped: {e2}")

✓ Logged in as: Jai-D


In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Verify data exists
import os
ANNOTATIONS_ROOT = '/content/drive/MyDrive/crescendai_data/annotations'

required_files = [
    f'{ANNOTATIONS_ROOT}/synthetic_train_filtered.jsonl',
    f'{ANNOTATIONS_ROOT}/synthetic_val_filtered.jsonl',
    f'{ANNOTATIONS_ROOT}/synthetic_test_filtered.jsonl',
]

print("Checking for data files...")
for f in required_files:
    if os.path.exists(f):
        print(f"✓ {os.path.basename(f)}")
    else:
        print(f"✗ MISSING: {f}")
        raise FileNotFoundError(f"Required file not found: {f}")

print("\n✓ All data files present")

In [None]:
# Clone repo
!rm -rf /content/crescendai
!git clone https://github.com/Jai-Dhiman/crescendai.git /content/crescendai
%cd /content/crescendai/model
!git log -1 --oneline

In [None]:
# Install uv (fast Python package manager)
!curl -LsSf https://astral.sh/uv/install.sh | sh

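# Add to PATH for this session. The install location depends on the uv version:
# newer installers use ~/.local/bin, older ones ~/.cargo/bin, so add both.
import os
home = os.environ['HOME']
os.environ['PATH'] = f"{home}/.local/bin:{home}/.cargo/bin:{os.environ['PATH']}"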

print("\n✓ uv installed")

In [None]:
# Install dependencies
!uv pip install --system -e .

# Suppress MIDI warnings
import warnings
warnings.filterwarnings('ignore', message='divide by zero')

import torch
import pytorch_lightning as pl
print(f"PyTorch: {torch.__version__}")
print(f"Lightning: {pl.__version__}")
print("✓ Dependencies installed")

## GPU Check

In [None]:
!nvidia-smi

import torch
if not torch.cuda.is_available():
    print("\n⚠️  NO GPU! Enable GPU: Runtime → Change runtime type → T4 GPU")
    raise RuntimeError("GPU required")

print(f"\n✓ GPU: {torch.cuda.get_device_name(0)}")
print(f"✓ Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

In [None]:
# Download MERT model (cached after first download)
from transformers import AutoModel

print("Downloading MERT-95M (~380MB)...")
model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
print("✓ MERT-95M cached")

del model
torch.cuda.empty_cache()

## Data Optimization (CRITICAL)

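Copy the annotation files from Google Drive → local SSD for a 5-10× speedup. The training set is subsampled to 10,000 samples along the way; the validation and test sets are copied in full.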

In [None]:
import json
import random
from pathlib import Path

# Create local data directory
LOCAL_DATA = Path('/tmp/training_data')
LOCAL_DATA.mkdir(exist_ok=True)

# Function to subsample dataset
def subsample_jsonl(input_path, output_path, n_samples=10000, seed=42):
    """Create random subset of JSONL file"""
    with open(input_path) as f:
        data = [json.loads(line) for line in f if line.strip()]
    
    print(f"Original: {len(data):,} samples")
    
    if n_samples >= len(data):
        # Fewer samples than requested: keep everything
        subset = data
        print(f"Copied all {len(data):,} samples")
    else:
        # Random subsample with a local, seeded RNG (global random state is untouched)
        subset = random.Random(seed).sample(data, n_samples)
        print(f"Subsampled to {len(subset):,} samples ({len(subset)/len(data)*100:.1f}%)")

    # Write the selected records back out as JSONL
    with open(output_path, 'w') as f:
        for item in subset:
            f.write(json.dumps(item) + '\n')

# Copy train set (10K subsample)
print("Creating 10K training subset...")
subsample_jsonl(
    f'{ANNOTATIONS_ROOT}/synthetic_train_filtered.jsonl',
    LOCAL_DATA / 'synthetic_train_filtered.jsonl',
    n_samples=10000
)

# Copy val/test (full)
print("\nCopying validation set...")
!cp {ANNOTATIONS_ROOT}/synthetic_val_filtered.jsonl /tmp/training_data/

print("Copying test set...")
!cp {ANNOTATIONS_ROOT}/synthetic_test_filtered.jsonl /tmp/training_data/

print("\n✓ Data copied to /tmp/training_data/ (fast local SSD)")
print("\nThis data is TEMPORARY - wiped when runtime disconnects")
print("Checkpoints still save to Google Drive (persistent)")
!ls -lh /tmp/training_data/

## Experiment 1: Audio-Only (~2 hours)

In [None]:
%%time
!python train.py --config configs/experiment_10k.yaml --mode audio

## Experiment 2: MIDI-Only (~1.5 hours)

In [None]:
%%time
!python train.py --config configs/experiment_10k.yaml --mode midi

## Experiment 3: Fusion (~2.5 hours)

In [None]:
%%time
!python train.py --config configs/experiment_10k.yaml --mode fusion

## Compare Results

In [None]:
import pytorch_lightning as pl
from src.models.lightning_module import PerformanceEvaluationModel
from src.data.dataset import create_dataloaders
from pathlib import Path

# Load all 3 models
models = {}
for mode in ['audio', 'midi', 'fusion']:
    ckpt_dir = Path(f'/content/drive/MyDrive/crescendai_checkpoints/{mode}_10k')
    ckpts = list(ckpt_dir.glob('*.ckpt'))
    if ckpts:
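        # Pick the newest file by modification time; sorting by filename
        # would rank e.g. "epoch=9" after "epoch=10"
        latest = max(ckpts, key=lambda p: p.stat().st_mtime)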
        print(f"Loading {mode}: {latest.name}")
        models[mode] = PerformanceEvaluationModel.load_from_checkpoint(str(latest))
        models[mode].eval()
        models[mode] = models[mode].cuda()
    else:
        print(f"⚠️  No checkpoint found for {mode}")

# Create test dataloader
_, _, test_loader = create_dataloaders(
    train_annotation_path='/tmp/training_data/synthetic_train_filtered.jsonl',
    val_annotation_path='/tmp/training_data/synthetic_val_filtered.jsonl',
    test_annotation_path='/tmp/training_data/synthetic_test_filtered.jsonl',
    dimension_names=['note_accuracy', 'rhythmic_precision', 'tone_quality'],
    batch_size=8,
    num_workers=0,
    augmentation_config=None,
    audio_sample_rate=24000,
    max_audio_length=240000,
    max_midi_events=512,
)

# Evaluate each model
trainer = pl.Trainer(accelerator='auto', devices='auto', precision=16)
results = {}

for mode, model in models.items():
    print(f"\nEvaluating {mode}...")
    test_results = trainer.test(model, dataloaders=test_loader, verbose=False)
    results[mode] = test_results[0]

print("\n" + "="*70)
print("COMPARISON")
print("="*70)
print(f"{'Dimension':<25} {'Audio r':<12} {'MIDI r':<12} {'Fusion r':<12} {'Gain'}")
print("-"*70)

# Per-dimension gain of fusion over the better single-modality baseline
gains = []
for dim in ['note_accuracy', 'rhythmic_precision', 'tone_quality']:
    audio_r = results.get('audio', {}).get(f'test_pearson_{dim}', 0)
    midi_r = results.get('midi', {}).get(f'test_pearson_{dim}', 0)
    fusion_r = results.get('fusion', {}).get(f'test_pearson_{dim}', 0)
    gain = fusion_r - max(audio_r, midi_r)
    gains.append(gain)

    print(f"{dim:<25} {audio_r:>11.3f} {midi_r:>11.3f} {fusion_r:>11.3f} {gain:>+11.3f}")

avg_gain = sum(gains) / len(gains)

print("-"*70)
print(f"Average fusion gain (absolute difference in Pearson r): {avg_gain:+.3f}")
print("="*70)

if avg_gain > 0.05:
    print("\n✓ SUCCESS: Fusion shows clear multi-modal advantage!")
else:
    print("\n⚠️  WARNING: Fusion gain is marginal. Check fusion implementation.")
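
The goal at the top of the notebook is stated as a percentage, while the cell above reports the difference in Pearson r. The sketch below computes both, assuming the percentage is meant relative to the better baseline (the notebook does not say which reading is intended). `fusion_gain` is a hypothetical helper, not part of the repo.

```python
def fusion_gain(audio_r, midi_r, fusion_r):
    """Return (absolute_gain, relative_gain) of fusion over the best baseline.

    relative_gain is None when the best baseline r is not positive, since a
    ratio against zero or a negative correlation is not meaningful.
    """
    best = max(audio_r, midi_r)
    absolute = fusion_r - best
    relative = absolute / best if best > 0 else None
    return absolute, relative


# Illustrative numbers only, not measured results
absolute, relative = fusion_gain(audio_r=0.50, midi_r=0.40, fusion_r=0.60)
print(f"absolute: {absolute:+.3f}, relative: {relative:+.1%}")
```

With these illustrative inputs, an absolute gain of about 0.10 in r corresponds to a relative gain of about 20% over the better baseline, so the two ways of stating the gain can differ considerably.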

---

## Troubleshooting

### Session Disconnected
- **Don't worry!** Checkpoints are saved directly to Google Drive (persistent)
- Re-run setup cells (Google Drive mount, git clone, install dependencies)
- Re-run the training cell; it will **automatically resume** from the latest checkpoint
- Check Google Drive → MyDrive → crescendai_checkpoints for saved checkpoints
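
Before re-running a training cell, it can help to confirm that a checkpoint actually exists on Drive. The sketch below is one way to do that; `find_latest_checkpoint` is a hypothetical helper, and how `train.py` itself locates the checkpoint it resumes from is not shown in this notebook.

```python
from pathlib import Path


def find_latest_checkpoint(ckpt_dir):
    """Return the most recently modified .ckpt file in ckpt_dir, or None."""
    ckpts = list(Path(ckpt_dir).glob("*.ckpt"))
    if not ckpts:
        return None
    # Use modification time rather than filename order
    return max(ckpts, key=lambda p: p.stat().st_mtime)


# Example (in Colab, after mounting Drive):
# for mode in ["audio", "midi", "fusion"]:
#     ckpt = find_latest_checkpoint(
#         f"/content/drive/MyDrive/crescendai_checkpoints/{mode}_10k")
#     print(mode, ckpt.name if ckpt else "no checkpoint yet")
```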

### Out of Memory (OOM)
- Reduce batch size: `config['data']['batch_size'] = 4`
- Increase gradient accumulation: `config['training']['accumulate_grad_batches'] = 8`
- This keeps effective batch size = 4 × 8 = 32
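
The arithmetic behind this trade-off can be sketched as follows. The `config` dictionary here is a stand-in that mirrors the keys named above, not the loaded contents of `configs/experiment_10k.yaml`, and the `num_devices` parameter is an assumption for multi-GPU runs.

```python
def effective_batch_size(batch_size, accumulate_grad_batches, num_devices=1):
    """Samples contributing to each optimizer step."""
    return batch_size * accumulate_grad_batches * num_devices


# Stand-in config using the keys mentioned above (assumed structure)
config = {
    "data": {"batch_size": 4},
    "training": {"accumulate_grad_batches": 8},
}

eff = effective_batch_size(
    config["data"]["batch_size"],
    config["training"]["accumulate_grad_batches"],
)
print(eff)  # 32
```

Halving the batch size while doubling the accumulation steps leaves the effective batch size unchanged but lowers peak GPU memory, at the cost of more forward and backward passes per optimizer step.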

### I/O Errors During Training
- **Already fixed!** `num_workers=0` prevents Google Drive concurrent access issues
- If you still see errors, check Google Drive sync status
- Try remounting Drive: restart the runtime and re-run the setup cells

### Slow Training
- Verify you have a T4 or better GPU (not a K80)
- Check `num_workers=0` in config (required for Google Drive)
- Sequential loading is normal and expected with Google Drive

### MIDI Divide-by-Zero Warnings
- **Harmless** - occurs when MIDI files have 0 duration
- Already suppressed in the notebook
- Does not affect training

### VSCode Extension Limitations
- The VSCode Colab extension only keeps your **notebook file** local
- Source code is cloned from GitHub on the Colab runtime
- Data must be in Google Drive (local files are NOT synced to runtime)
- You get VSCode editing experience with Colab's compute power

## Target Performance (MVP Goals)
**Technical dimensions** (current):
- Pearson r: 0.50-0.65
- MAE: 10-15 points (0-100 scale)

**Interpretive dimensions** (after expert labels):
- Pearson r: 0.35-0.50
- MAE: 12-18 points

If you achieve these targets, the architecture is validated!
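
For reference, the two target metrics can be computed from predictions and labels as sketched below. These are the standard definitions in plain Python; how the repo computes its `test_pearson_*` values is not shown in this notebook, so treat this as an independent sanity check rather than the project's implementation.

```python
import math


def pearson_r(preds, targets):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(preds)
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    denom = math.sqrt(var_p * var_t)
    if denom == 0:
        return float("nan")
    return cov / denom


def mean_absolute_error(preds, targets):
    """Average absolute difference, in the same units as the scores."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


# Illustrative scores on a 0-100 scale, not real model output
preds = [62.0, 71.0, 55.0, 80.0]
labels = [60.0, 75.0, 50.0, 90.0]
print(f"r = {pearson_r(preds, labels):.3f}, "
      f"MAE = {mean_absolute_error(preds, labels):.2f}")
```

The two metrics capture different things: Pearson r rewards getting the ranking and trend right even if predictions are offset, while MAE penalizes the offset directly.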