# Piano Performance Evaluation - Full Model Training

Trains the complete multi-modal performance evaluation model on MAESTRO synthetic labels.

**Current Dimensions**: 6 technical dimensions (note_accuracy, rhythmic_precision, dynamics_control, articulation, pedaling, tone_quality)

**Future Expansion**: 4 interpretive dimensions will be added after expert labeling (phrasing, expressiveness, musicality, overall_quality)

**Requirements:**
- Colab Pro (T4/V100 GPU recommended)
- Google Drive with segments and labels uploaded
- HuggingFace account for MERT model access
- crescendai repository on GitHub

**Expected Training Time**: ~8-12 GPU hours on T4

---

## Expected Google Drive Structure

Data and checkpoints are stored in **MyDrive**:

```
MyDrive/
  crescendai_data/
    all_segments/                  # Audio segments (~65GB)
      *.wav                        # MAESTRO segments
      youtube_*.wav                # YouTube test segments
      midi_segments/               # MIDI segments
        *.mid
    annotations/                   # Annotation files
      synthetic_train_colab.jsonl  # ~114k training samples
      synthetic_val_colab.jsonl    # ~21k validation samples
      synthetic_test_colab.jsonl   # ~7k test samples

  crescendai_model/
    checkpoints/                   # Checkpoints saved here
      (will be created automatically)
    logs/                          # TensorBoard logs
      (will be created automatically)
```

All data is in MyDrive for reliable Colab access.
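
A quick way to confirm that the annotation uploads match the approximate sample counts listed above is to count records per file. The sketch below assumes each `.jsonl` file holds one JSON object per line. `count_jsonl_samples` is a hypothetical helper, not part of the crescendai repo.

```python
import json

def count_jsonl_samples(path):
    """Count non-empty lines in a JSONL file, checking that each one parses as JSON."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}:{line_no} is not valid JSON: {e}") from e
            count += 1
    return count

# Example usage in Colab (paths from the tree above):
# from pathlib import Path
# root = Path("/content/drive/MyDrive/crescendai_data/annotations")
# for split in ("train", "val", "test"):
#     print(split, count_jsonl_samples(root / f"synthetic_{split}_colab.jsonl"))
```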

## 1. Environment Setup

In [None]:
# HuggingFace Login
import os
os.environ.pop("HF_TOKEN", None)
os.environ.pop("HUGGINGFACEHUB_API_TOKEN", None)

from huggingface_hub import login, HfApi

try:
    import getpass as gp
    raw = gp.getpass("Paste your Hugging Face token (input hidden): ")
    token = raw.decode() if isinstance(raw, (bytes, bytearray)) else raw
    if not isinstance(token, str):
        raise TypeError(f"Unexpected token type: {type(token).__name__}")
    token = token.strip()
    if not token:
        raise ValueError("Empty token provided")
    login(token=token, add_to_git_credential=False)
    who = HfApi().whoami(token=token)
    print(f"✓ Logged in as: {who.get('name') or who.get('email') or 'OK'}")
except Exception as e:
    print(f"[HF Login] getpass flow failed: {e}")
    print("Falling back to interactive login widget...")
    login()
    try:
        who = HfApi().whoami()
        print(f"✓ Logged in as: {who.get('name') or who.get('email') or 'OK'}")
    except Exception as e2:
        print(f"[HF Login] Verification skipped: {e2}")

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Load data from MyDrive
import os
from pathlib import Path

print("Loading data from Google Drive MyDrive...\n")

# Data location (all in MyDrive for reliable access)
GDRIVE_ROOT = '/content/drive/MyDrive/crescendai_data'
DATA_ROOT = f'{GDRIVE_ROOT}/all_segments'
ANNOTATIONS_ROOT = f'{GDRIVE_ROOT}/annotations'

# Checkpoint and logs location (MyDrive for persistence)
CHECKPOINT_ROOT = '/content/drive/MyDrive/crescendai_model/checkpoints'
LOGS_ROOT = '/content/drive/MyDrive/crescendai_model/logs'

print(f"Data paths:")
print(f"  Audio segments: {DATA_ROOT}")
print(f"  Annotations: {ANNOTATIONS_ROOT}")
print(f"  Checkpoints: {CHECKPOINT_ROOT}")
print(f"  Logs: {LOGS_ROOT}\n")

# Verify structure
required_paths = [
    (DATA_ROOT, "Audio segments directory"),
    (ANNOTATIONS_ROOT, "Annotations directory"),
    (f"{ANNOTATIONS_ROOT}/synthetic_train_colab.jsonl", "Training annotations"),
    (f"{ANNOTATIONS_ROOT}/synthetic_val_colab.jsonl", "Validation annotations"),
    (f"{ANNOTATIONS_ROOT}/synthetic_test_colab.jsonl", "Test annotations"),
]

print("Verifying structure...\n")
all_good = True
for path, desc in required_paths:
    if os.path.exists(path):
        print(f"✓ {desc}: {path}")
    else:
        print(f"✗ {desc} NOT FOUND: {path}")
        all_good = False

# Create checkpoint and logs directories
os.makedirs(CHECKPOINT_ROOT, exist_ok=True)
os.makedirs(LOGS_ROOT, exist_ok=True)

if not all_good:
    print("\n" + "="*70)
    print("ERROR: Google Drive structure incomplete")
    print("="*70)
    print("\nExpected location: MyDrive/crescendai_data/")
    print("  all_segments/")
    print("  annotations/")
    print("\nPlease verify files are uploaded to MyDrive.")
    print("="*70)
    raise RuntimeError("Google Drive structure incomplete. Please check sync.")
else:
    print("\n✓ All required files present. Ready to proceed!")

In [None]:
# Clone repository
REPO_URL = "https://github.com/Jai-Dhiman/crescendai.git"
BRANCH = "main"

# Remove old clone if exists
!rm -rf /content/crescendai

# Clone fresh
!git clone --branch {BRANCH} {REPO_URL} /content/crescendai

# Navigate to model directory
%cd /content/crescendai/model

# Show git status
print("\nRepository status:")
!git log -1 --oneline
!git status --short

In [None]:
# Install uv (fast Python package manager)
!curl -LsSf https://astral.sh/uv/install.sh | sh

# Add to PATH for this session (newer uv installers use ~/.local/bin, older ones ~/.cargo/bin)
import os
os.environ['PATH'] = f"{os.environ['HOME']}/.local/bin:{os.environ['HOME']}/.cargo/bin:{os.environ['PATH']}"

print("\n✓ uv installed")

In [None]:
# Install dependencies
print("Installing dependencies (this may take 2-3 minutes)...\n")
!uv pip install --system -e .

# Verify installation
import os
os.environ['MPLBACKEND'] = 'Agg'  # Non-interactive backend for matplotlib

import torch
import pytorch_lightning as pl

print("\n" + "="*70)
print("✓ Dependencies installed successfully")
print("="*70)
print(f"PyTorch: {torch.__version__}")
print(f"Lightning: {pl.__version__}")
print("="*70)

## 2. Verify GPU and Download Model

In [None]:
# Check GPU
import torch

print("="*70)
print("GPU VERIFICATION")
print("="*70)
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("\n✓ GPU READY FOR TRAINING")
    print("="*70)
else:
    print("\n" + "="*70)
    print("⚠️  CRITICAL: NO GPU DETECTED!")
    print("="*70)
    print("\nTraining on CPU would be impractically slow (many times slower than on a GPU).")
    print("\nTO ENABLE GPU:")
    print("1. Go to: Runtime → Change runtime type")
    print("2. Set 'Hardware accelerator' to: T4 GPU")
    print("3. Click 'Save'")
    print("4. Re-run all cells from the beginning")
    print("\nDO NOT proceed with training until GPU is enabled!")
    print("="*70)
    raise RuntimeError("GPU required for training. Please enable GPU and restart.")
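
The config below uses FP16 mixed precision, which suits the T4 and V100. On a newer GPU you might prefer bfloat16 instead. The sketch below picks a Lightning 2.x precision string (`"16-mixed"` or `"bf16-mixed"`) from the device's CUDA compute capability; the sm_80 (Ampere) threshold is an assumption based on where bfloat16 tensor-core support begins, and `pick_precision` is a hypothetical helper.

```python
def pick_precision(compute_capability):
    """Return a Lightning precision string for a CUDA compute capability (major, minor).

    Assumption: bfloat16 autocast is only worth using on Ampere (8.x) or newer;
    older cards such as the T4 (7.5) and V100 (7.0) use float16 mixed precision.
    """
    major, _minor = compute_capability
    return "bf16-mixed" if major >= 8 else "16-mixed"

# In the notebook (requires a GPU runtime):
# import torch
# print(pick_precision(torch.cuda.get_device_capability(0)))
```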

In [None]:
# Download MERT model (one-time, will be cached)
from transformers import AutoModel

print("Downloading MERT-95M model (one-time, ~380MB)...")
print("This will be cached for future use.\n")

model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
num_params = sum(p.numel() for p in model.parameters()) / 1e6

print(f"✓ MERT-95M downloaded and cached")
print(f"  Parameters: {num_params:.1f}M")

del model
torch.cuda.empty_cache()

## 3. Create Training Configuration

This creates a config file with paths pointing to your Google Drive data.
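
The `augmentation` block in the config cell below gives each transform its own `probability` and caps how many are applied with `max_transforms`. How the repo's dataset actually combines them is not shown in this notebook; the sketch below is one plausible policy, where each enabled transform fires independently and at most `max_transforms` are kept. `sample_augmentations` is hypothetical.

```python
import random

def sample_augmentations(aug_cfg, rng):
    """Pick which augmentations to apply to one training example.

    Hypothetical policy: each enabled transform fires independently with its own
    probability; the chosen list is shuffled and capped at aug_cfg['max_transforms'].
    """
    if not aug_cfg.get("enabled", False):
        return []
    chosen = []
    for name, params in aug_cfg.items():
        if not isinstance(params, dict):
            continue  # skip scalar keys such as 'enabled' and 'max_transforms'
        if params.get("enabled", False) and rng.random() < params.get("probability", 0.0):
            chosen.append(name)
    rng.shuffle(chosen)
    return chosen[: aug_cfg.get("max_transforms", len(chosen))]

example_cfg = {
    "enabled": True,
    "pitch_shift": {"enabled": True, "probability": 0.3},
    "add_noise": {"enabled": True, "probability": 0.2},
    "gain_variation": {"enabled": True, "probability": 0.3},
    "max_transforms": 2,
}
print(sample_augmentations(example_cfg, random.Random(42)))
```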

In [None]:
import yaml

# Training configuration
config = {
    'data': {
        # Paths to annotation files on Google Drive (Colab-specific _colab.jsonl files)
        'train_path': f'{ANNOTATIONS_ROOT}/synthetic_train_colab.jsonl',
        'val_path': f'{ANNOTATIONS_ROOT}/synthetic_val_colab.jsonl',
        'test_path': f'{ANNOTATIONS_ROOT}/synthetic_test_colab.jsonl',
        
        # Current dimensions (6 technical)
        # TODO: Add 4 interpretive dimensions when expert labels available:
        #   - phrasing
        #   - expressiveness  
        #   - musicality
        #   - overall_quality
        'dimensions': [
            'note_accuracy',
            'rhythmic_precision',
            'dynamics_control',
            'articulation',
            'pedaling',
            'tone_quality'
        ],
        
        # Audio settings (MERT requirements)
        'audio_sample_rate': 24000,
        'max_audio_length': 240000,  # 10 seconds at 24kHz
        'max_midi_events': 512,
        
        # DataLoader settings
        'batch_size': 8,
        'num_workers': 2,  # Colab: 2 workers works well
        'pin_memory': True,
        
        # Augmentation (training robustness)
        'augmentation': {
            'enabled': True,
            'pitch_shift': {
                'enabled': True,
                'probability': 0.3,
                'min_semitones': -2,
                'max_semitones': 2
            },
            'time_stretch': {
                'enabled': True,
                'probability': 0.3,
                'min_rate': 0.85,
                'max_rate': 1.15
            },
            'add_noise': {
                'enabled': True,
                'probability': 0.2,
                'min_snr_db': 25,
                'max_snr_db': 40
            },
            'room_acoustics': {
                'enabled': True,
                'probability': 0.2,
                'num_room_types': 5
            },
            'compress_audio': {
                'enabled': True,
                'probability': 0.15,
                'bitrates': [128, 192, 256, 320]
            },
            'gain_variation': {
                'enabled': True,
                'probability': 0.3,
                'min_db': -6,
                'max_db': 6
            },
            'max_transforms': 3
        }
    },
    
    'model': {
        # Architecture dimensions
        'audio_dim': 768,
        'midi_dim': 256,  # Set to 0 for audio-only mode
        'fusion_dim': 1024,
        'aggregator_dim': 512,
        'num_dimensions': 6,  # Update to 10 when adding interpretive dimensions
        
        # Encoder settings
        'mert_model_name': 'm-a-p/MERT-v1-95M',
        'freeze_audio_encoder': False,
        'gradient_checkpointing': True,  # Saves memory
    },
    
    'training': {
        # Training duration
        'max_epochs': 20,
        'precision': 16,  # Mixed precision (FP16) for speed
        
        # Optimization
        'optimizer': 'AdamW',
        'learning_rate': 1e-5,
        'backbone_lr': 1e-5,  # Lower LR for pre-trained MERT
        'heads_lr': 1e-4,     # Higher LR for task heads
        'weight_decay': 0.01,
        
        # Learning rate schedule
        'scheduler': 'cosine',
        'warmup_steps': 500,
        'min_lr': 1e-6,
        
        # Gradient settings
        'gradient_clip_val': 1.0,
        'accumulate_grad_batches': 4,  # Effective batch size = 8 * 4 = 32
        
        # Validation
        'val_check_interval': 1.0,  # Check every epoch
        'limit_val_batches': 1.0,   # Use full val set
    },
    
    'callbacks': {
        # Model checkpointing
        'checkpoint': {
            'monitor': 'val_loss',
            'mode': 'min',
            'save_top_k': 3,
            'save_last': True,
            'dirpath': f'{CHECKPOINT_ROOT}/full_model',
            'filename': 'model-{epoch:02d}-{val_loss:.4f}'
        },
        
        # Early stopping
        'early_stopping': {
            'monitor': 'val_loss',
            'mode': 'min',
            'patience': 5,
            'min_delta': 0.001
        },
        
        # Learning rate monitoring
        'lr_monitor': {
            'logging_interval': 'step'
        }
    },
    
    'logging': {
        # Logging frequency
        'log_every_n_steps': 50,
        
        # WandB (optional - set to True if you have account)
        'use_wandb': False,
        'wandb_project': 'piano-eval',
        'wandb_entity': None,
        'wandb_run_name': 'full-model',
        
        # TensorBoard
        'use_tensorboard': True,
        'tensorboard_logdir': f'{LOGS_ROOT}/full_model'
    },
    
    'seed': 42
}

# Save config
CONFIG_PATH = '/tmp/training_config.yaml'
with open(CONFIG_PATH, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print("="*70)
print("✓ Training configuration created")
print("="*70)
print(f"Config saved to: {CONFIG_PATH}")
print(f"\nTraining Summary:")
print(f"  Dimensions: {len(config['data']['dimensions'])}")
print(f"  Batch size: {config['data']['batch_size']}")
print(f"  Gradient accumulation: {config['training']['accumulate_grad_batches']}")
print(f"  Effective batch size: {config['data']['batch_size'] * config['training']['accumulate_grad_batches']}")
print(f"  Max epochs: {config['training']['max_epochs']}")
print(f"  Precision: FP{config['training']['precision']}")
print(f"\nCheckpoints: {config['callbacks']['checkpoint']['dirpath']}")
print(f"TensorBoard: {config['logging']['tensorboard_logdir']}")
print("="*70)
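
Because `num_dimensions` has to be edited by hand when the interpretive dimensions are added, a quick consistency check can catch mismatches before a multi-hour run. The helper below is hypothetical (not in the repo); it assumes the config layout above and the effective batch size of 32 that this notebook targets.

```python
def check_config(cfg, target_effective_batch=32):
    """Return a list of consistency problems in a training config dict (empty if none)."""
    problems = []
    data, model, training = cfg["data"], cfg["model"], cfg["training"]

    # The model's output head size must match the number of labeled dimensions
    dims = data["dimensions"]
    if model["num_dimensions"] != len(dims):
        problems.append(f"num_dimensions={model['num_dimensions']} but {len(dims)} dimensions listed")
    if len(set(dims)) != len(dims):
        problems.append("duplicate dimension names")

    # Per-step batch times accumulation steps gives the effective batch size
    effective = data["batch_size"] * training["accumulate_grad_batches"]
    if effective != target_effective_batch:
        problems.append(f"effective batch size is {effective}, expected {target_effective_batch}")

    # Checkpointing and early stopping should track the same validation metric
    ckpt_metric = cfg["callbacks"]["checkpoint"]["monitor"]
    stop_metric = cfg["callbacks"]["early_stopping"]["monitor"]
    if ckpt_metric != stop_metric:
        problems.append(f"checkpoint monitors {ckpt_metric!r} but early stopping monitors {stop_metric!r}")
    return problems

# In the notebook, after the config cell:
# print(check_config(config) or "✓ Config is internally consistent")
```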

## 4. Pre-flight Check (Optional but Recommended)

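Runs the repo's `preflight_check.py` against the generated config to catch setup problems before training. The `--skip-model --skip-dataloader` flags skip the model and dataloader checks.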

In [None]:
# Run pre-flight check
print("Running pre-flight check...\n")
!python preflight_check.py --config {CONFIG_PATH} --skip-model --skip-dataloader

print("\n" + "="*70)
print("If all checks passed, proceed to training below.")
print("If any issues found, fix them before training.")
print("="*70)

## 5. Start Training

**Expected duration**: ~8-12 hours on T4 GPU for 20 epochs

Training will automatically:
- Save checkpoints to Google Drive (persistent)
- Log metrics to TensorBoard
- Stop early if validation loss plateaus
- Resume from checkpoint if interrupted
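
The early-stopping settings (`patience=5`, `min_delta=0.001`, `mode='min'`) can be read as the rule sketched below: training stops after 5 consecutive epochs in which `val_loss` fails to beat the best value so far by more than 0.001. This is a plain-Python approximation of how Lightning's `EarlyStopping` callback treats these parameters, for intuition only; `early_stop_epoch` is a hypothetical helper.

```python
def early_stop_epoch(val_losses, patience=5, min_delta=0.001):
    """Return the 0-based epoch at which training would stop, or None if it never stops.

    An epoch counts as an improvement only if val_loss drops below the best value
    seen so far by more than min_delta; `patience` non-improving epochs in a row
    trigger the stop.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```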

In [None]:
# Start training
print("="*70)
print("STARTING TRAINING")
print("="*70)
print("\nThis will take ~8-12 hours on T4 GPU.")
print("Checkpoints are saved to Google Drive (persistent).")
print("Keep this tab open: on most Colab plans, closing the browser can disconnect the runtime.")
print("\nTo stop early, use Runtime → Interrupt execution. Checkpoints already on Drive are kept.\n")
print("="*70 + "\n")

!python train.py --config {CONFIG_PATH}

print("\n" + "="*70)
print("✓ TRAINING COMPLETE")
print("="*70)
print(f"Checkpoints saved to: {config['callbacks']['checkpoint']['dirpath']}")
print(f"Logs saved to: {config['logging']['tensorboard_logdir']}")
print("="*70)

## 6. Monitor Training (Optional)

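Launch TensorBoard to inspect training metrics. The training cell above blocks the notebook until it finishes, so to watch progress in real time, run this cell before starting training.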

In [None]:
# Load TensorBoard
%load_ext tensorboard
%tensorboard --logdir {LOGS_ROOT}/full_model

print("\nTensorBoard is now running above.")
print("Monitor:")
print("  - Training/validation loss")
print("  - Per-dimension MAE and correlations")
print("  - Learning rate schedule")
print("  - Gradient norms")
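
The learning-rate curve in TensorBoard should reflect the `cosine` schedule with `warmup_steps: 500` and `min_lr: 1e-6` from the config. The real schedule is defined in the repo's `train.py`; the sketch below shows a common linear-warmup plus cosine-decay shape as an assumption, using the backbone LR as the peak (the heads' parameter group would presumably follow the same shape scaled to `1e-4`). With ~114k training samples, batch size 8, and 4 accumulation steps, a 20-epoch run is roughly 71k optimizer steps, so the 500 warmup steps are well under 1% of training. Both helper names are hypothetical.

```python
import math

def lr_at_step(step, peak_lr=1e-5, warmup_steps=500, total_steps=10_000, min_lr=1e-6):
    """Linear warmup from ~0 to peak_lr, then cosine decay from peak_lr to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped so the LR stays at min_lr afterwards
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def optimizer_steps(num_samples, batch_size=8, accumulate=4, epochs=20):
    """Total optimizer updates: one per `accumulate` batches, per epoch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return math.ceil(batches_per_epoch / accumulate) * epochs

total = optimizer_steps(114_000)  # ~114k training samples from the Drive layout
for step in (0, 250, 499, total // 2, total - 1):
    print(step, f"{lr_at_step(step, total_steps=total):.2e}")
```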

## 7. Evaluate Best Model

After training completes, evaluate the best checkpoint on the test set.

In [None]:
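# Find best checkpoint
import os
import re
from pathlib import Path

checkpoint_dir = Path(f"{CHECKPOINT_ROOT}/full_model")
# Top-k checkpoints are named 'model-...-<val_loss>.ckpt'; 'last.ckpt' is not matched
checkpoints = list(checkpoint_dir.glob("model-*.ckpt"))

def val_loss_from_name(path):
    # The last decimal number in the filename is the formatted val_loss (e.g. 0.1234)
    numbers = re.findall(r"\d+\.\d+", path.stem)
    return float(numbers[-1]) if numbers else float("inf")

if not checkpoints:
    print("No checkpoints found. Make sure training completed successfully.")
else:
    # Get best checkpoint (lowest val_loss); sorting by filename would order by epoch instead
    best_ckpt = min(checkpoints, key=val_loss_from_name)
    print(f"Best checkpoint: {best_ckpt.name}")
    print(f"Full path: {best_ckpt}")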
    
    # Load model
    from src.models.lightning_module import PerformanceEvaluationModel
    
    print("\nLoading model...")
    model = PerformanceEvaluationModel.load_from_checkpoint(str(best_ckpt))
    model.eval()
    model = model.cuda()
    
    num_params = sum(p.numel() for p in model.parameters()) / 1e6
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
    
    print(f"\n✓ Model loaded successfully")
    print(f"  Total parameters: {num_params:.1f}M")
    print(f"  Trainable parameters: {trainable_params:.1f}M")
    print(f"  Dimensions: {model.dimension_names}")

In [None]:
# Run test evaluation
import pytorch_lightning as pl
from src.data.dataset import create_dataloaders

print("Creating test dataloader...")
_, _, test_loader = create_dataloaders(
    train_annotation_path=config['data']['train_path'],
    val_annotation_path=config['data']['val_path'],
    test_annotation_path=config['data']['test_path'],
    dimension_names=config['data']['dimensions'],
    batch_size=config['data']['batch_size'],
    num_workers=0,  # Single worker for test
    augmentation_config=None,  # No augmentation for test
    audio_sample_rate=config['data']['audio_sample_rate'],
    max_audio_length=config['data']['max_audio_length'],
    max_midi_events=config['data']['max_midi_events'],
)

print(f"Test samples: {len(test_loader.dataset)}")
print("\nRunning test evaluation...")

from pytorch_lightning.loggers import TensorBoardLogger

# Write test metrics under the training log directory so TensorBoard can display them
test_logger = TensorBoardLogger(save_dir=config['logging']['tensorboard_logdir'], name='test')

trainer = pl.Trainer(
    accelerator='auto',
    devices='auto',
    precision=config['training']['precision'],
    logger=test_logger,
)

test_results = trainer.test(model, dataloaders=test_loader)
metrics = test_results[0]

def fmt(value):
    # Metrics missing from the results are shown as N/A
    return f"{value:.3f}" if isinstance(value, (int, float)) else "N/A"

print("\n" + "="*70)
print("TEST RESULTS")
print("="*70)
print("\nPer-dimension metrics:")
for dim in model.dimension_names:
    print(f"  {dim}:")
    print(f"    MAE: {fmt(metrics.get(f'test_mae_{dim}'))}")
    print(f"    Pearson r: {fmt(metrics.get(f'test_pearson_{dim}'))}")
    print(f"    Spearman ρ: {fmt(metrics.get(f'test_spearman_{dim}'))}")
    print()

print("="*70)
print("\nEvaluation complete!")
print(f"Test metrics logged to TensorBoard under: {test_logger.log_dir}")

## 8. Download Best Checkpoint (Optional)

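Download the best model checkpoint to your local machine. Large checkpoints can be slow to download through the browser; the same file is also available in Google Drive under `MyDrive/crescendai_model/checkpoints/full_model/`.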

In [None]:
# Download best checkpoint
from google.colab import files

if 'best_ckpt' in locals():
    print(f"Downloading: {best_ckpt.name}")
    print(f"Size: {best_ckpt.stat().st_size / 1e6:.1f} MB")
    print("\nThis may take a few minutes...")
    files.download(str(best_ckpt))
    print("\n✓ Download complete!")
else:
    print("No checkpoint found to download. Make sure training completed successfully.")

---

## Troubleshooting

### Session Disconnected
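- Re-run Sections 1-3 (Drive mount, repo clone, dependency install, GPU check, config); the config file lives in `/tmp` and is lost when the runtime is reset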
- Re-run training cell - will automatically resume from last checkpoint
- All checkpoints are in Google Drive (persistent)
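
Whether `train.py` resumes on its own depends on the repo's implementation. If it does not, one option is to pass a checkpoint to Lightning's `Trainer.fit(..., ckpt_path=...)`. The hypothetical helper below prefers `last.ckpt` (written because `save_last: True`) and otherwise falls back to the most recently modified `.ckpt` file.

```python
from pathlib import Path

def find_resume_checkpoint(checkpoint_dir):
    """Return the checkpoint path to resume from, or None when starting fresh."""
    checkpoint_dir = Path(checkpoint_dir)
    if not checkpoint_dir.is_dir():
        return None
    # Prefer Lightning's rolling 'last.ckpt' when it exists
    last = checkpoint_dir / "last.ckpt"
    if last.exists():
        return last
    # Otherwise use the newest checkpoint by modification time
    candidates = sorted(checkpoint_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None

# Example (assuming train.py exposes a Trainer, model, and dataloaders):
# resume = find_resume_checkpoint(f"{CHECKPOINT_ROOT}/full_model")
# trainer.fit(model, train_loader, val_loader, ckpt_path=str(resume) if resume else None)
```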

### Out of Memory (OOM)
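- Reduce batch size: `config['data']['batch_size'] = 4`
- Increase gradient accumulation: `config['training']['accumulate_grad_batches'] = 8`
- This keeps effective batch size = 4 × 8 = 32
- Make these changes in the Section 3 config cell and re-run it, so the YAML file passed to `train.py` is rewritten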
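
The batch-size/accumulation trade-off above generalizes to any batch size that divides the target. A minimal sketch, assuming the effective batch size of 32 used throughout this notebook (`accumulation_for` is a hypothetical helper):

```python
def accumulation_for(batch_size, target_effective_batch=32):
    """Gradient-accumulation steps that keep batch_size * steps equal to the target."""
    if batch_size <= 0 or target_effective_batch % batch_size != 0:
        raise ValueError(f"batch_size={batch_size} does not divide {target_effective_batch}")
    return target_effective_batch // batch_size

for bs in (8, 4, 2):
    print(bs, accumulation_for(bs))  # prints 8 4, then 4 8, then 2 16
```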

### Slow Training
- Verify you have T4 or better GPU (not K80)
- Check data is in Google Drive (not Colab Files)
- If data loading is the bottleneck, tune `num_workers`; Colab runtimes have few CPU cores, so more workers do not always help

### MIDI Loading Warnings
- Expected: ~2-5% of MIDI files may fail to load
- System handles gracefully with audio-only fallback
- Training continues normally
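
The repo's dataset already handles this fallback. To illustrate the pattern, or to measure the failure rate on your own MIDI segments, here is a sketch in which `loader` stands in for whatever MIDI parser you use (for example `mido.MidiFile`, which may not be what the repo uses). Both helper names are hypothetical.

```python
def load_midi_or_none(path, loader, failures):
    """Try to parse a MIDI file; on any error, record it and return None (audio-only sample)."""
    try:
        return loader(path)
    except Exception as exc:  # corrupt, truncated, or unsupported MIDI files
        failures.append((path, repr(exc)))
        return None

def midi_failure_rate(paths, loader):
    """Fraction of paths whose MIDI could not be loaded, plus the recorded failures."""
    failures = []
    for path in paths:
        load_midi_or_none(path, loader, failures)
    rate = len(failures) / len(paths) if paths else 0.0
    return rate, failures

# Example with mido (assumed installed):
# import mido
# rate, bad = midi_failure_rate(midi_paths, mido.MidiFile)
```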

## Target Performance (MVP Goals)
**Technical dimensions**:
- Pearson r: 0.50-0.65
- MAE: 10-15 points (0-100 scale)

**Interpretive dimensions** (after expert labels):
- Pearson r: 0.35-0.50
- MAE: 12-18 points

Reaching these targets would indicate that the architecture is viable for the MVP.