# Demo 1: Killing Pythia in 5 Minutes

This notebook demonstrates how to induce spectral collapse in a small language model (Pythia-160M) using metabolic attacks.

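**The Hook**: The project's central claim, argued both mathematically and empirically, is that larger models are structurally more fragile to metabolic attacks. This notebook demonstrates the attack itself on a single small model.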

## What You'll See

1. **Initial State**: Model with healthy spectral properties
2. **Catalyst Generation**: Creating attack prompts that exploit Hessian structure
3. **Metabolic Cycle**: Repeated exposure inducing progressive degradation
4. **Spectral Collapse**: Effective rank reduction plotted across attack iterations
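
For readers new to the metric, here is a minimal sketch of one common definition of effective rank: the exponential of the Shannon entropy of the normalized singular values (the entropy-based measure of Roy and Vetterli). It is an assumption that this matches what `compute_effective_rank` in `src` computes; that helper may use a different definition. The names `effective_rank` and `relative_rank_reduction` are hypothetical and exist only for this illustration, and the matrices are random stand-ins rather than Pythia weights.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank (hypothetical helper, not the `src` API).

    Returns exp(H(p)), where p is the singular-value spectrum normalized
    to sum to 1 and H is the Shannon entropy in nats.
    """
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    total = singular_values.sum()
    if total <= eps:
        return 0.0  # an all-zero matrix has no meaningful spectrum
    p = singular_values / total
    p = p[p > eps]  # drop numerically zero entries to avoid log(0)
    entropy = -np.sum(p * np.log(p))
    return float(np.exp(entropy))


def relative_rank_reduction(initial: float, final: float) -> float:
    """Fraction of effective rank lost (assumed meaning of 'rank reduction')."""
    return 1.0 - final / initial


rng = np.random.default_rng(0)

# A dense Gaussian matrix stands in for a "healthy" spectrum.
healthy = rng.standard_normal((64, 64))

# A product of thin factors has rank 2 and stands in for a "collapsed" one.
collapsed = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))

r_healthy = effective_rank(healthy)
r_collapsed = effective_rank(collapsed)

print(f"healthy effective rank:   {r_healthy:.2f}")
print(f"collapsed effective rank: {r_collapsed:.2f}")
print(f"relative reduction:       {relative_rank_reduction(r_healthy, r_collapsed):.1%}")
```

Under this definition an identity matrix of size n has effective rank n, and a rank-1 matrix has effective rank 1, so the value falls as the spectrum concentrates on fewer directions.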

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

from src import HessianAwareCatalyst, MetabolicAttackLoop, compute_effective_rank

# Load Pythia-160M
model_name = "EleutherAI/pythia-160m"
print(f"Loading {model_name}...")

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

print(f"Model loaded on {device}")

In [None]:
# Initialize catalyst generator
catalyst_gen = HessianAwareCatalyst(
    model=model,
    device=device,
    top_k_eigenvalues=10,
    noise_amplification_factor=1.0
)

# Initialize attack loop
attack_loop = MetabolicAttackLoop(
    model=model,
    catalyst_generator=catalyst_gen,
    device=device
)

print("Attack infrastructure ready!")

In [None]:
# Run attack cycle
results = attack_loop.run_attack_cycle(
    num_iterations=100,
    target_rank_reduction=0.5  # Target 50% rank reduction
)

print("\nAttack Results:")
print(f"Initial Effective Rank: {results['initial_rank']:.2f}")
print(f"Final Effective Rank: {results['final_rank']:.2f}")
print(f"Rank Reduction: {results['rank_reduction']*100:.1f}%")
print(f"Iterations: {results['iterations']}")

In [None]:
# Visualize rank collapse
history = attack_loop.history
iterations = [h['iteration'] for h in history]
ranks = [h['effective_rank'] for h in history]

plt.figure(figsize=(10, 6))
plt.plot(iterations, ranks, 'r-', linewidth=2, label='Effective Rank')
plt.axhline(y=results['initial_rank'], color='g', linestyle='--', label='Initial Rank')
plt.xlabel('Attack Iteration', fontsize=12)
plt.ylabel('Effective Rank', fontsize=12)
plt.title('Spectral Collapse: Effective Rank Over Time', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\n🎯 Model degradation complete!")
print("The model has experienced spectral collapse - its effective rank has been reduced.")