# 🚀 Transformer Killer Core - Google Colab

This notebook runs the unified Transformer Killer benchmarks on Google Colab with GPU support.

**Controllers available:**
- `transformer` - Standard Transformer decoder (baseline)
- `mamba` - Mamba backbone (Mamba2 CUDA if installed, GRU fallback)
- `mamba_dualmem` - Mamba + DualTierMiras parametric memory
- `ot_agent` - OT Memory Agent (Mamba + DualTierMiras + optional LTM)

**Benchmarks:**
- Synthetic: copy_memory, assoc_recall
- Language Model: character-level LM

## 1. Setup Environment

In [None]:
# Check GPU availability
!nvidia-smi
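
If `nvidia-smi` reports no device, the runtime has no GPU attached. The following is a minimal sketch for choosing a device string from Python. It assumes PyTorch may not be installed yet at this point, and `pick_device` and `DEVICE` are hypothetical helper names, not part of the package.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    # PyTorch may not be installed before the setup step, so probe for it first.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


DEVICE = pick_device()
print(f"Using device: {DEVICE}")
```

If the benchmark accepts `cpu` as a `--device` value (an assumption, not confirmed here), `--device {DEVICE}` could replace the hard-coded `--device cuda` in later cells.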

In [None]:
# Upload your zip file
from google.colab import files
print("Upload unified_transformer_mamba_core.zip:")
uploaded = files.upload()

In [None]:
# Unzip
!unzip -o unified_transformer_mamba_core.zip -d /content
%cd /content/unified_transformer_mamba_core
!ls -la

In [None]:
# Run unified setup script (installs PyTorch, dependencies, and optionally Mamba CUDA)
# This handles everything automatically!
!python setup_colab.py --install-all

In [None]:
# Optional: Install Mamba2 from LOCAL SOURCE (takes ~5 min to compile)
# This gives better performance than the PyPI version
# Uncomment to enable:

# !python setup_colab.py --install-mamba-source

## 2. (Optional) Install Mamba2 CUDA Kernels

This enables real Mamba2 SSM layers instead of the GRU fallback. Skip this step if you want a faster setup.

In [None]:
# Optional: Install real Mamba2 (takes ~5 min to compile)
# Uncomment to enable:

# %cd /content/unified_transformer_mamba_core/external/mamba_ssm
# !pip install -e . --quiet
# %cd /content/unified_transformer_mamba_core
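
To see whether the optional install took effect, one option is to check whether the kernel package is importable. This sketch assumes the kernels are exposed under the `mamba_ssm` import name. The labels it returns are illustrative, and the package's own fallback logic may differ.

```python
import importlib.util


def mamba_backend() -> str:
    """Report which backbone the `mamba` controllers are likely to use."""
    # Assumption: the CUDA kernels are importable as `mamba_ssm`.
    has_kernels = importlib.util.find_spec("mamba_ssm") is not None
    return "mamba2-cuda" if has_kernels else "gru-fallback"


print(f"Mamba backend: {mamba_backend()}")
```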

## 3. Sanity Check

Verify all components work correctly.

In [None]:
!python -m transformer_killer_core.unified_bench --sanity_check

## 4. Synthetic Benchmarks

### 4.1 Copy Memory Task

Tests the model's ability to copy a sequence after a delay period.
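
To make the task concrete, here is a sketch of one common formulation of copy-after-delay data. It is not the package's own generator: the token ids, the `copy_len` parameter, and the way `delay` maps onto the sequence layout are all assumptions made for illustration.

```python
import random

BLANK, MARKER = 0, 1  # reserved token ids (illustrative)


def make_copy_example(copy_len=10, delay=40, vocab_size=10, rng=None):
    """Build one (inputs, targets) pair for a copy-after-delay task."""
    rng = rng or random.Random()
    # Payload symbols use ids 2 .. vocab_size + 1 so they never clash with BLANK/MARKER.
    payload = [rng.randrange(2, vocab_size + 2) for _ in range(copy_len)]
    # Input: payload, `delay` blanks, a recall marker, then blanks while the model answers.
    inputs = payload + [BLANK] * delay + [MARKER] + [BLANK] * copy_len
    # Target: blank everywhere except the final `copy_len` steps, which repeat the payload.
    targets = [BLANK] * (copy_len + delay + 1) + payload
    return inputs, targets


x, y = make_copy_example(copy_len=5, delay=8, rng=random.Random(0))
print("inputs: ", x)
print("targets:", y)
```

A longer `delay` forces the model to hold the payload in its state for more steps before the marker asks for it back.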

In [None]:
# Transformer baseline
!python -m transformer_killer_core.unified_bench \
    --mode synthetic --task copy_memory \
    --controller transformer \
    --seq_len 100 --delay 40 \
    --epochs 20 --batch_size 64 \
    --device cuda

In [None]:
# Mamba baseline
!python -m transformer_killer_core.unified_bench \
    --mode synthetic --task copy_memory \
    --controller mamba \
    --seq_len 100 --delay 40 \
    --epochs 20 --batch_size 64 \
    --device cuda

In [None]:
# Mamba + DualTierMiras (the "killer")
!python -m transformer_killer_core.unified_bench \
    --mode synthetic --task copy_memory \
    --controller mamba_dualmem \
    --seq_len 100 --delay 40 \
    --epochs 20 --batch_size 64 \
    --device cuda

In [None]:
# OT Memory Agent
!python -m transformer_killer_core.unified_bench \
    --mode synthetic --task copy_memory \
    --controller ot_agent \
    --seq_len 100 --delay 40 \
    --epochs 20 --batch_size 64 \
    --device cuda

### 4.2 Associative Recall Task

Tests content-addressable memory retrieval.
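
As an illustration of what content-addressable retrieval means here, the sketch below builds one key-value recall example: a list of pairs followed by a query key, with the paired value as the answer. This is a generic formulation, not the package's own generator, and the id layout and parameter names other than `num_pairs` are assumptions.

```python
import random


def make_assoc_example(num_pairs=6, num_keys=26, num_values=10, rng=None):
    """Build one (sequence, answer) pair for key-value recall."""
    rng = rng or random.Random()
    keys = rng.sample(range(num_keys), num_pairs)  # distinct keys
    values = [rng.randrange(num_values) for _ in keys]
    query = rng.choice(keys)
    answer = values[keys.index(query)]
    # Value ids are offset by num_keys so keys and values use disjoint id ranges.
    sequence = []
    for k, v in zip(keys, values):
        sequence += [k, num_keys + v]
    sequence.append(query)  # the final token asks "which value went with this key?"
    return sequence, num_keys + answer


seq, ans = make_assoc_example(rng=random.Random(0))
print("sequence:", seq)
print("answer:  ", ans)
```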

In [None]:
# Compare all controllers on associative recall
for controller in ['transformer', 'mamba', 'mamba_dualmem', 'ot_agent']:
    print(f"\n{'='*60}")
    print(f"Controller: {controller}")
    print('='*60)
    !python -m transformer_killer_core.unified_bench \
        --mode synthetic --task assoc_recall \
        --controller {controller} \
        --seq_len 30 --num_pairs 6 \
        --epochs 20 --batch_size 64 \
        --device cuda

## 5. Language Model Benchmark

Character-level language modeling on a text corpus.
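
At the character level, every distinct character becomes a token and the model predicts the next character at each position. A minimal sketch of that data preparation follows. The helper names are hypothetical, and the package's own loader may window the corpus differently.

```python
def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def make_windows(ids, seq_len):
    """Yield (input, target) windows; the target is the input shifted by one."""
    for start in range(0, len(ids) - seq_len, seq_len):
        chunk = ids[start:start + seq_len + 1]
        yield chunk[:-1], chunk[1:]


text = "to be, or not to be"
stoi, itos = build_vocab(text)
ids = [stoi[ch] for ch in text]
windows = list(make_windows(ids, seq_len=8))
for inp, tgt in windows:
    print(repr("".join(itos[i] for i in inp)), "->", repr("".join(itos[i] for i in tgt)))
```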

In [None]:
# Download a sample corpus (Shakespeare)
!wget -q https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O /content/corpus.txt
!head -20 /content/corpus.txt
!wc -c /content/corpus.txt

In [None]:
# Transformer LM
!python -m transformer_killer_core.unified_bench \
    --mode lm \
    --controller transformer \
    --data_path /content/corpus.txt \
    --seq_len 256 --epochs 10 \
    --batch_size 32 \
    --device cuda

In [None]:
# Mamba + DualTierMiras LM
!python -m transformer_killer_core.unified_bench \
    --mode lm \
    --controller mamba_dualmem \
    --data_path /content/corpus.txt \
    --seq_len 256 --epochs 10 \
    --batch_size 32 \
    --device cuda

In [None]:
# OT Memory Agent LM
!python -m transformer_killer_core.unified_bench \
    --mode lm \
    --controller ot_agent \
    --data_path /content/corpus.txt \
    --seq_len 256 --epochs 10 \
    --batch_size 32 \
    --device cuda

## 6. Full Comparison (All Controllers)

Run a comprehensive comparison and save logs.

In [None]:
# Create logs directory
!mkdir -p /content/logs

controllers = ['transformer', 'mamba', 'mamba_dualmem', 'ot_agent']
tasks = ['copy_memory', 'assoc_recall']

for task in tasks:
    for controller in controllers:
        print(f"\n{'='*60}")
        print(f"Task: {task} | Controller: {controller}")
        print('='*60)
        
        extra_args = "--delay 40" if task == "copy_memory" else "--num_pairs 6"
        seq_len = 100 if task == "copy_memory" else 30
        
        !python -m transformer_killer_core.unified_bench \
            --mode synthetic --task {task} \
            --controller {controller} \
            --seq_len {seq_len} {extra_args} \
            --epochs 20 --batch_size 64 \
            --device cuda \
            --log_dir /content/logs
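
The same sweep can be expressed as plain argument lists, which is useful when running outside IPython where `!` commands are unavailable. This sketch only builds and prints the commands, using the flags already shown in this notebook. `build_command` is a hypothetical helper.

```python
import shlex
import sys

controllers = ["transformer", "mamba", "mamba_dualmem", "ot_agent"]
tasks = ["copy_memory", "assoc_recall"]


def build_command(task, controller, log_dir="/content/logs"):
    """Return the benchmark invocation for one (task, controller) pair."""
    seq_len = 100 if task == "copy_memory" else 30
    extra = ["--delay", "40"] if task == "copy_memory" else ["--num_pairs", "6"]
    return [
        sys.executable, "-m", "transformer_killer_core.unified_bench",
        "--mode", "synthetic", "--task", task,
        "--controller", controller,
        "--seq_len", str(seq_len), *extra,
        "--epochs", "20", "--batch_size", "64",
        "--device", "cuda",
        "--log_dir", log_dir,
    ]


commands = [build_command(t, c) for t in tasks for c in controllers]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` from the project directory to execute the run.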

In [None]:
# View saved logs
!ls -la /content/logs/

In [None]:
# Parse and display results
import json
from pathlib import Path

log_dir = Path('/content/logs')
results = []

for log_file in sorted(log_dir.glob('*.jsonl')):
    with open(log_file) as f:
        # Skip blank lines so a trailing newline does not break json.loads
        lines = [line for line in f if line.strip()]
    # Line 0 holds run metadata; later lines are per-epoch records
    if len(lines) > 1:
        metadata = json.loads(lines[0]).get('metadata', {})
        final = json.loads(lines[-1])  # final epoch result
        val_loss = final.get('val_loss')
        results.append({
            'task': metadata.get('task'),
            'controller': metadata.get('controller'),
            'final_val_acc': final.get('val_acc'),
            # Fall back to 'loss' only when 'val_loss' is absent (0.0 is a valid loss)
            'final_loss': val_loss if val_loss is not None else final.get('loss'),
        })

# Display as table
print(f"{'Task':<15} {'Controller':<15} {'Val Acc':<10} {'Loss':<10}")
print('-' * 50)
for r in results:
    # Compare against None so a legitimate 0.0 is not shown as N/A
    acc = f"{r['final_val_acc']:.4f}" if r['final_val_acc'] is not None else 'N/A'
    loss = f"{r['final_loss']:.4f}" if r['final_loss'] is not None else 'N/A'
    print(f"{str(r['task']):<15} {str(r['controller']):<15} {acc:<10} {loss:<10}")
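
Once the `results` list above is populated, a short helper can pick the strongest controller per task. The sketch below is one way to do that. `best_by_task` is a hypothetical helper, and the demo rows are placeholders with made-up names and values, not benchmark measurements.

```python
def best_by_task(results):
    """Map each task to the row with the highest validation accuracy."""
    best = {}
    for row in results:
        acc = row.get("final_val_acc")
        if acc is None:  # run logged no accuracy; skip it
            continue
        current = best.get(row["task"])
        if current is None or acc > current["final_val_acc"]:
            best[row["task"]] = row
    return best


# Placeholder rows with made-up names and values, NOT benchmark measurements.
demo_rows = [
    {"task": "task_x", "controller": "ctrl_a", "final_val_acc": 0.25},
    {"task": "task_x", "controller": "ctrl_b", "final_val_acc": 0.75},
    {"task": "task_y", "controller": "ctrl_a", "final_val_acc": None},
]
for task, row in best_by_task(demo_rows).items():
    print(f"{task}: {row['controller']}")
```

In the notebook, call `best_by_task(results)` with the parsed rows instead of `demo_rows`.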

## 7. Download Results

In [None]:
# Zip and download logs
!zip -r /content/benchmark_logs.zip /content/logs

from google.colab import files
files.download('/content/benchmark_logs.zip')

## 8. Custom Experiments

Modify parameters below for your own experiments.

In [None]:
# Custom experiment parameters
CONTROLLER = "mamba_dualmem"  # transformer, mamba, mamba_dualmem, ot_agent
TASK = "copy_memory"          # copy_memory, assoc_recall
SEQ_LEN = 200                 # Sequence length
DELAY = 80                    # Delay for copy_memory
EPOCHS = 30
BATCH_SIZE = 64
D_MODEL = 128                 # Model dimension
N_LAYERS = 3                  # Number of layers

!python -m transformer_killer_core.unified_bench \
    --mode synthetic --task {TASK} \
    --controller {CONTROLLER} \
    --seq_len {SEQ_LEN} --delay {DELAY} \
    --epochs {EPOCHS} --batch_size {BATCH_SIZE} \
    --d_model {D_MODEL} --n_layers {N_LAYERS} \
    --device cuda \
    --log_dir /content/logs