# Quick Test: MusicGen Setup & First Experiments

This notebook tests your installation and runs your first experiments.

**Expected time**: 15-20 minutes

**What you'll do**:
1. Load MusicGen Large (3.3B)
2. Generate music samples
3. Extract activations
4. Visualize differences between emotions

## Setup

In [None]:
import sys
sys.path.append('..')  # Add parent directory to path

import torch
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write
from src.utils.activation_utils import ActivationExtractor, ActivationDataset
from src.utils.visualization_utils import plot_activation_statistics
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
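elif torch.backends.mps.is_available():
    # get_pretrained picks CUDA or CPU by default, so MPS is only detected here, not used
    print("Apple Silicon GPU (MPS) detected, but MusicGen will load on CPU by default (slow!)")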
else:
    print("Using CPU (this will be slow!)")

## Step 1: Load MusicGen Large

In [None]:
print("Loading MusicGen Large (3.3B parameters)...")
print("This may take 2-5 minutes on first load...")

model = MusicGen.get_pretrained('facebook/musicgen-large')
model.set_generation_params(duration=8)  # 8 second samples

print("✅ Model loaded!")
print("   Model size: 3.3B parameters")
print(f"   Sample rate: {model.sample_rate}")
# The transformer blocks live on model.lm.transformer in audiocraft's LMModel
print(f"   Number of layers: {len(model.lm.transformer.layers)}")

## Step 2: Generate Music Samples

Let's generate one clip for each of four emotions and save it to disk so you can listen to it.

In [None]:
# Define prompts for different emotions
prompts = {
    'happy': "upbeat cheerful pop music with bright melody and positive energy",
    'sad': "melancholic piano ballad with sorrowful emotional melody",
    'calm': "peaceful ambient meditation music with gentle soft tones",
    'energetic': "high energy electronic dance music with intense driving beat"
}

print("Generating music samples...")
print("(This will take ~30-60 seconds per sample)")
print()

generated_samples = {}

for emotion, prompt in prompts.items():
    print(f"Generating '{emotion}' music...")
    wav = model.generate([prompt])
    generated_samples[emotion] = wav[0]
    
    # Save to file
    output_path = f'../results/sample_{emotion}'
    audio_write(output_path, wav[0].cpu(), model.sample_rate, strategy="loudness")
    print(f"   Saved to: {output_path}.wav")

print()
print("✅ All samples generated!")
print("   Listen to them in the results/ directory")
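
Before listening, a quick programmatic check can confirm that each file was written with roughly the expected length. The sketch below uses only the standard-library `wave` module and assumes the files are 16-bit PCM WAV, which should be the case with `audio_write`'s defaults. `wav_duration_seconds` and `check_samples` are hypothetical helpers, not part of audiocraft or this repo.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_samples(paths, expected_s=8.0, tolerance_s=0.5):
    """Map each path to True if its duration is within tolerance_s of expected_s."""
    return {
        p: abs(wav_duration_seconds(p) - expected_s) <= tolerance_s
        for p in paths
    }
```

In this notebook you could call `check_samples([f'../results/sample_{e}.wav' for e in prompts])` and look for any `False` entries.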

## Step 3: Extract Activations

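Now let's extract activations from multiple layers to see how the model represents different emotions. Assuming `extractor.generate` wraps `model.generate`, this step runs generation again for each prompt, so it takes about as long as Step 2. Because generation is sampled, the audio behind these activations will not match the clips saved above.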

In [None]:
# Create activation extractor
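# We'll sample 5 layers, assuming 0-based indices. MusicGen Large has 48
# transformer layers (compare with the count printed in Step 1), so these
# indices cover the first half of the stack.
layers_to_extract = [0, 6, 12, 18, 24]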

print(f"Extracting activations from layers: {layers_to_extract}")
extractor = ActivationExtractor(model, layers=layers_to_extract)

# Create dataset to store activations
dataset = ActivationDataset()

print()
print("Generating and extracting activations...")

for emotion, prompt in prompts.items():
    print(f"  {emotion}...")
    
    # Generate with activation capture
    wav = extractor.generate([prompt])
    
    # Get and store activations
    activations = extractor.get_activations()
    dataset.add(
        activations=activations,
        prompt=prompt,
        label=emotion
    )
    
    extractor.clear_activations()  # Free memory

print()
print(f"✅ Extracted activations from {len(dataset)} samples")

# Save dataset for later analysis
dataset.save('../results/emotion_activations.pt')
print("   Saved to: results/emotion_activations.pt")
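
Hard-coded layer indices stop being valid when you switch model sizes. One option is to derive them from the layer count instead. `evenly_spaced_layers` below is a hypothetical helper (not part of this repo), and it assumes `ActivationExtractor` takes 0-based layer indices.

```python
from typing import List


def evenly_spaced_layers(num_layers: int, count: int) -> List[int]:
    """Return `count` 0-based layer indices spread from the first to the last layer."""
    if count < 2 or count > num_layers:
        raise ValueError("count must be between 2 and num_layers")
    last = num_layers - 1
    # Integer arithmetic keeps every index inside [0, last]
    return [(i * last) // (count - 1) for i in range(count)]


print(evenly_spaced_layers(48, 5))  # [0, 11, 23, 35, 47]
print(evenly_spaced_layers(24, 5))  # [0, 5, 11, 17, 23]
```

Pass the layer count printed in Step 1 as `num_layers` so the indices always span the whole stack.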

## Step 4: Analyze Activation Patterns

Let's visualize how activations differ across emotions and layers.

In [None]:
# Compare activation statistics for happy vs sad
happy_activations, _ = dataset[0]  # First sample (happy)
sad_activations, _ = dataset[1]    # Second sample (sad)

print("Activation shapes:")
for layer_name, act in happy_activations.items():
    print(f"  {layer_name}: {act.shape}")
    # Shape: [batch=1, sequence_length, d_model=2048]

print()
print("Comparing 'happy' vs 'sad' music activations...")
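
To sanity-check the `sequence_length` dimension printed above, it can help to estimate how many token frames an 8-second clip corresponds to. The sketch below assumes MusicGen's published tokenizer settings (32 kHz audio, 50 token frames per second). In the notebook you can read the real values from `model.sample_rate` and `model.frame_rate`. Whether `ActivationExtractor` stores exactly one position per frame is an assumption about this repo's helper, so treat the result as a rough expectation rather than an exact match.

```python
# Assumed values from the MusicGen model card, not read from a loaded model.
SAMPLE_RATE_HZ = 32_000
FRAME_RATE_HZ = 50


def expected_sizes(duration_s, sample_rate=SAMPLE_RATE_HZ, frame_rate=FRAME_RATE_HZ):
    """Estimate the audio sample count and token frame count for a clip."""
    return {
        "audio_samples": int(duration_s * sample_rate),
        "token_frames": int(duration_s * frame_rate),
    }


sizes = expected_sizes(8)
print(sizes)  # {'audio_samples': 256000, 'token_frames': 400}
```

If the printed sequence length is far from the estimate (for example 1 instead of about 400), the extractor is probably keeping only a single generation step.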

In [None]:
# Plot activation statistics for happy music
fig = plot_activation_statistics(
    happy_activations,
    save_path='../results/happy_activation_stats.png'
)
plt.suptitle('Activation Statistics: Happy Music', y=1.02)
plt.show()

# Plot activation statistics for sad music
fig = plot_activation_statistics(
    sad_activations,
    save_path='../results/sad_activation_stats.png'
)
plt.suptitle('Activation Statistics: Sad Music', y=1.02)
plt.show()
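
The exact statistics that `plot_activation_statistics` draws are defined in `src/utils`, so the following is only a sketch of the kind of per-layer summary such a plot usually relies on. It works on a flat list of floats and uses only the standard library. `summarize` and the `zero_tol` threshold are hypothetical names chosen for illustration.

```python
import statistics


def summarize(values, zero_tol=1e-6):
    """Summary statistics for a flat list of activation values."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        # Fraction of entries whose magnitude is at most zero_tol
        "frac_near_zero": sum(abs(v) <= zero_tol for v in values) / len(values),
    }


stats = summarize([0.0, 1.0, -1.0, 2.0])
print(stats)
```

Comparing these numbers layer by layer for two emotions gives a first, coarse picture of where their activations diverge.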

## Step 5: Compare Emotions

Let's compute some simple metrics to see if emotions are represented differently.

In [None]:
from src.utils.activation_utils import cosine_similarity, analyze_activation_statistics

# Compare layer 12 activations (the midpoint of the sampled indices;
# MusicGen Large has 48 layers in total, so this is still fairly early)
happy_layer12 = happy_activations['layer_12']
sad_layer12 = sad_activations['layer_12']

# Compute cosine similarity
similarity = cosine_similarity(happy_layer12, sad_layer12)

print(f"Cosine similarity between happy and sad (layer 12): {similarity:.4f}")
print()

# 0.9 is an arbitrary rule-of-thumb threshold, and a single pair of samples
# is not enough to draw firm conclusions
if similarity < 0.9:
    print("✅ The activations differ noticeably.")
    print("   This is consistent with the model representing the two prompts differently.")
else:
    print("⚠️  The activations are very similar.")
    print("   The two prompts may not be clearly separated in this layer.")

print()
print("Statistics for layer 12:")
print("\nHappy music:")
happy_stats = analyze_activation_statistics(happy_layer12)
for key, val in happy_stats.items():
    print(f"  {key}: {val:.4f}")

print("\nSad music:")
sad_stats = analyze_activation_statistics(sad_layer12)
for key, val in sad_stats.items():
    print(f"  {key}: {val:.4f}")
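
For reference, cosine similarity measures the angle between two vectors and ignores their magnitudes. The sketch below spells out the definition on flat lists of floats. It is not the repo's `cosine_similarity`, which may flatten or average over the sequence dimension differently, and `cosine_similarity_flat` is a hypothetical name.

```python
import math


def cosine_similarity_flat(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity_flat([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 0.0
print(cosine_similarity_flat([1.0, 2.0], [2.0, 4.0]))  # same direction: approximately 1.0
```

Because the measure ignores scale, two activation tensors can score close to 1.0 even when their magnitudes differ a lot, which is one reason to read it alongside the statistics printed above.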

## Summary

**What you just did**:

1. ✅ Loaded MusicGen Large (3.3B parameters)
2. ✅ Generated music for 4 emotions (happy, sad, calm, energetic)
3. ✅ Extracted internal activations from 5 transformer layers
4. ✅ Visualized activation patterns
5. ✅ Compared representations between emotions

**Key observations**:
- MusicGen Large has 48 transformer layers, each with a hidden size of d_model=2048 (check this against the layer count and shapes printed above)
- Activations differ across layers (early vs late)
- Different emotions may produce different activation patterns

**Next steps**:

1. **Listen to the generated music** in `results/sample_*.wav`
2. **Experiment more**:
   - Try different prompts
   - Extract from more layers
   - Generate more samples per emotion
3. **Start Phase 0 learning** ([docs/phase0_roadmap.md](../docs/phase0_roadmap.md))
4. **Read about sparse autoencoders** to understand how to interpret these activations

This is just the beginning! The real research starts when you:
- Train SAEs to find interpretable features
- Use UMAP to visualize emotion clustering
- Test causal interventions
- Apply activation steering

**See you in Phase 1!** 🎵🔬