# Loading Pretrained Audio and Music Models

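This notebook demonstrates how to load pretrained models for music generation and understanding and run a quick smoke test on each. MusicGen and CLAP are loaded and exercised directly; MusicVAE and DDSP need additional setup, so only their prerequisites are covered here.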

## Models Covered
1. MusicGen (Meta AudioCraft) - Text-to-music generation
2. MusicVAE (Magenta) - Symbolic music VAE
3. CLAP - Audio-text embeddings
4. DDSP (Magenta) - Differentiable DSP synthesis

## Setup
```bash
pip install audiocraft magenta transformers
```
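
Before running the cells below, it can help to confirm which of these packages are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of any of the libraries above.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Top-level import names for the libraries used in this notebook.
status = check_packages(["torch", "audiocraft", "magenta", "transformers"])
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```

A missing entry here means the corresponding section will fall through to its `except ImportError` branch.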

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Audio, display
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 1. MusicGen - Text-to-Music Generation

In [None]:
# Load MusicGen model
try:
    from audiocraft.models import MusicGen
    
    print("Loading MusicGen model (this may take a few minutes)...")
    model_musicgen = MusicGen.get_pretrained('facebook/musicgen-small')  # Start with the small checkpoint (full Hugging Face id)
    model_musicgen.set_generation_params(duration=5)  # 5 second generations
    
    print("✓ MusicGen loaded successfully")
    print(f"Model device: {model_musicgen.device}")
    
except ImportError:
    print("⚠ AudioCraft not installed. Run: pip install audiocraft")
    model_musicgen = None

In [None]:
# Generate music from text descriptions
if model_musicgen is not None:
    descriptions = [
        "calm ambient music with gentle piano",
        "energetic electronic dance music",
        "melancholic acoustic guitar"
    ]
    
    print("Generating music samples...")
    wav = model_musicgen.generate(descriptions)
    
    # Display audio
    for idx, desc in enumerate(descriptions):
        print(f"\n{idx+1}. {desc}")
        audio_np = wav[idx].cpu().numpy().squeeze()
        display(Audio(audio_np, rate=model_musicgen.sample_rate))
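
To keep a generated clip after the notebook session ends, the waveform can be written to disk. AudioCraft has its own audio-writing utilities; the sketch below deliberately avoids them and uses only Python's standard `wave` module, assuming a mono sequence of float samples in [-1, 1] (in this notebook that would be something like `audio_np.tolist()` together with `model_musicgen.sample_rate`). The helper name `write_wav_pcm16`, the sample rate in the demo, and the output filename are illustrative assumptions.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data must be little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)


# Stand-in for model output: one second of a 440 Hz sine tone.
sample_rate = 32000  # assumed for this demo; use model_musicgen.sample_rate in the notebook
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
out_path = os.path.join(tempfile.gettempdir(), "example_tone.wav")
n_written = write_wav_pcm16(out_path, tone, sample_rate)
print(f"Wrote {n_written} samples to {out_path}")
```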

## 2. MusicVAE - Symbolic Music with Latent Space

In [None]:
# Check that MusicVAE (Magenta) is importable
try:
    import magenta
    from magenta.models.music_vae import configs
    from magenta.models.music_vae.trained_model import TrainedModel
    import tensorflow as tf
    
    # This cell only verifies that the imports succeed; no model is loaded here.
    # Loading a TrainedModel requires downloading checkpoint files first.
    # See: https://github.com/magenta/magenta/tree/main/magenta/models/music_vae
    
    print("✓ Magenta available")
    print("Note: Download checkpoints from the Magenta repository for full functionality")
    
except ImportError:
    print("⚠ Magenta not installed. Run: pip install magenta")

## 3. CLAP - Audio-Text Embeddings

In [None]:
# Load CLAP for audio-text alignment
try:
    from transformers import ClapModel, ClapProcessor
    
    print("Loading CLAP model...")
    model_clap = ClapModel.from_pretrained("laion/clap-htsat-unfused")
    processor_clap = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")
    
    print("✓ CLAP loaded successfully")
    
except ImportError:
    print("⚠ Transformers not installed. Run: pip install transformers")
    model_clap = None

In [None]:
# Example: Compute text embeddings for music descriptions
if model_clap is not None:
    text_descriptions = [
        "peaceful meditation music",
        "energetic rock guitar",
        "sad piano melody",
        "upbeat electronic dance"
    ]
    
    inputs = processor_clap(text=text_descriptions, return_tensors="pt", padding=True)
    with torch.no_grad():  # inference only, so skip gradient tracking
        text_embeds = model_clap.get_text_features(**inputs)
    
    print(f"Text embeddings shape: {text_embeds.shape}")
    
    # Compute similarity matrix
    similarity = torch.cosine_similarity(text_embeds.unsqueeze(1), text_embeds.unsqueeze(0), dim=2)
    
    # Visualize
    plt.figure(figsize=(8, 6))
    plt.imshow(similarity.detach().numpy(), cmap='viridis', vmin=0, vmax=1)
    plt.colorbar(label='Cosine Similarity')
    plt.xticks(range(len(text_descriptions)), text_descriptions, rotation=45, ha='right')
    plt.yticks(range(len(text_descriptions)), text_descriptions)
    plt.title('Music Description Similarity Matrix')
    plt.tight_layout()
    plt.show()
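
The similarity matrix above relies on broadcasting to compare every embedding against every other one along the feature dimension. As a dependency-free illustration of what each cell of that matrix contains, the sketch below computes pairwise cosine similarity for plain Python lists. The function names and the toy 3-dimensional vectors are made up for illustration; real CLAP embeddings have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity; entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for embeddings: two orthogonal vectors and one in between.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
for row in similarity_matrix(toy_embeddings):
    print([round(x, 3) for x in row])
```

Cosine similarity ranges from -1 to 1, so a colour scale fixed to 0..1 will clip any negative entries.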

## 4. DDSP - Differentiable Audio Synthesis

In [None]:
# DDSP overview (no model is loaded here; running DDSP requires additional setup)
print("DDSP: Differentiable Digital Signal Processing")
print("For DDSP examples, see: https://github.com/magenta/ddsp")
print("")
print("DDSP provides interpretable audio synthesis controls:")
print("  - Fundamental frequency (pitch)")
print("  - Loudness (amplitude)")
print("  - Harmonic distribution (timbre)")
print("  - Noise filtering")
print("")
print("These low-dimensional, interpretable controls are natural targets for brain-conditioned synthesis.")
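
To make the controls listed above concrete, here is a small additive-synthesis sketch in plain Python: a fundamental frequency, an overall amplitude, and a normalized harmonic distribution are combined into a waveform. This is not the DDSP library's API, and it omits the filtered-noise component; `synthesize_harmonics` and its parameters are hypothetical names chosen for illustration.

```python
import math


def synthesize_harmonics(f0_hz, amplitude, harmonic_weights, duration_s, sample_rate=16000):
    """Sum sinusoids at integer multiples of f0, weighted by a harmonic distribution."""
    total = sum(harmonic_weights)
    weights = [w / total for w in harmonic_weights]  # normalize so weights sum to 1
    # Drop harmonics at or above the Nyquist frequency to avoid aliasing.
    nyquist = sample_rate / 2
    partials = [(k, w) for k, w in enumerate(weights, start=1) if k * f0_hz < nyquist]
    n_samples = int(duration_s * sample_rate)
    audio = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(w * math.sin(2 * math.pi * k * f0_hz * t) for k, w in partials)
        audio.append(amplitude * sample)
    return audio


# A 220 Hz tone whose energy decays across the first four harmonics.
audio = synthesize_harmonics(220.0, 0.8, [1.0, 0.5, 0.25, 0.125], duration_s=0.5)
print(f"{len(audio)} samples, peak {max(abs(x) for x in audio):.3f}")
```

Changing `f0_hz` shifts the pitch, `amplitude` scales the loudness, and reshaping `harmonic_weights` alters the timbre, which mirrors the first three controls in the list.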

## Summary and Next Steps

### Models Loaded
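This notebook covered:
- ✓ MusicGen for text-to-music generation (loaded and sampled)
- ✓ CLAP for audio-text embeddings (loaded; text embeddings compared)
- (Optional) MusicVAE for symbolic music (import check only; checkpoints must be downloaded separately)
- (Optional) DDSP for controllable synthesis (overview only; see the DDSP repository)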

### Key Takeaways
1. **MusicGen**: Good for high-level music generation from descriptions
2. **CLAP**: Bridges text and audio - useful for semantic conditioning
3. **MusicVAE**: Structured latent space for interpolation
4. **DDSP**: Interpretable parameters for fine-grained control
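
Interpolating between two latent codes is a typical way to explore a VAE latent space such as MusicVAE's. For Gaussian latents, spherical linear interpolation (slerp) is a common alternative to straight linear interpolation. The sketch below is dependency-free and operates on plain lists; it does not call Magenta, and `slerp` and the toy 2-dimensional vectors are illustrative assumptions. In practice the endpoints would be latent vectors produced by a trained model's encoder.

```python
import math


def slerp(z0, z1, t):
    """Spherical linear interpolation between vectors z0 and z1 at fraction t in [0, 1]."""
    norm0 = math.sqrt(sum(a * a for a in z0))
    norm1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding error
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # Nearly parallel (or anti-parallel) vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


# Walk from one toy latent vector to another in five steps.
z_start, z_end = [1.0, 0.0], [0.0, 1.0]
for step in range(5):
    t = step / 4
    print(t, [round(x, 3) for x in slerp(z_start, z_end, t)])
```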

### Next Steps
1. Explore latent spaces of these models (see `02_explore_latent_spaces.ipynb`)
2. Simulate neural signals (see `03-05_simulated_*_features.ipynb`)
3. Map neural features to model parameters (see `06_latent_space_mapping.ipynb`)

### For Real Applications
- Consider model size vs. latency trade-offs
- Profile memory usage and inference time
- Test on target hardware (CPU, GPU, edge devices)
- Fine-tune on specific music styles if needed
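
For the inference-time profiling mentioned above, a simple wall-clock timer is often enough for a first pass. The sketch below uses only the standard library; `time_call` is a hypothetical helper, and the workload in the usage example is a stand-in for a model call such as `model_musicgen.generate(descriptions)`. GPU work is asynchronous, so when timing CUDA code a `torch.cuda.synchronize()` call before reading the clock is usually needed for meaningful numbers.

```python
import statistics
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Run fn several times; return (last result, list of elapsed seconds)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, timings


# Stand-in workload; replace with a model call in the notebook.
result, timings = time_call(sum, range(100_000), repeats=5)
print(f"median: {statistics.median(timings) * 1e3:.3f} ms over {len(timings)} runs")
```

Reporting the median over several runs reduces the influence of one-off costs such as lazy initialization on the first call.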