# Musclebob Buffpants Training - Colab Optimized

This notebook is optimized for Google Colab with:
- ✅ Balanced settings for quality and memory usage
- ✅ Anti-idle script to prevent disconnections
- ✅ Automatic checkpoint resumption
- ✅ GPU memory monitoring and management
- ✅ Progress monitoring

## Features

1. **Optimized Settings**: Balanced for Google Colab T4 GPU (~15GB)
2. **Checkpoint Resumption**: If disconnected, you can resume from the last checkpoint
3. **Anti-Idle**: Keeps Colab session alive during training
4. **Memory Monitoring**: Track GPU memory usage to avoid crashes
5. **Better Hyperparameters**: Learning rate of 5e-5 with 64 training samples over 5 epochs
6. **Few-Shot Examples**: Includes examples to guide the model
7. **Training Monitoring**: Real-time reward tracking and validation

## Quick Start

1. Run all cells in order
2. Training will start automatically with optimized settings
3. If disconnected, reconnect and run the "Resume Training" cell
4. If you get OOM errors, see the Troubleshooting section

## Default Training Settings

Settings optimized for **Google Colab** (T4 GPU with ~15 GB of GPU memory):
- **Batch size**: 4 (processes 4 prompts per step)
- **Generations per prompt**: 4 (generates 4 responses per prompt to compare)
- **Total generations per step**: 16 (4 × 4; more responses to compare gives a stronger learning signal)
- **Training samples**: 64
- **Epochs**: 5

These settings provide good learning quality. If you get OOM errors, lower `--batch-size` and `--num-generations` together (see Troubleshooting).
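
The arithmetic behind these defaults can be sketched as follows. This takes the description above at face value (each step consumes `batch_size` prompts and generates `num_generations` responses for each). The helper name `training_budget` is made up for illustration, and the real step count depends on how `train_musclebob_improved.py` batches its data.

```python
import math

def training_budget(num_samples, batch_size, num_generations, epochs):
    """Rough per-run counts, ASSUMING one step consumes `batch_size` prompts."""
    generations_per_step = batch_size * num_generations
    # Round up so a final partial batch still counts as a step
    steps_per_epoch = math.ceil(num_samples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "generations_per_step": generations_per_step,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "total_generations": generations_per_step * total_steps,
    }

# Default settings from this notebook
budget = training_budget(num_samples=64, batch_size=4, num_generations=4, epochs=5)
print(budget)
```

Halving `batch_size` and `num_generations` together cuts the generations per step by a factor of four, which is why the lower-memory options change both flags.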

## 1. Setup: Anti-Idle Script

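This cell is meant to reduce idle disconnections during long training runs. It only logs a heartbeat to the browser console every 60 seconds, so it is not guaranteed to stop Colab from disconnecting; if the session drops anyway, resume from the last checkpoint (section 5).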

In [None]:
# Anti-idle: Keeps Colab session alive
from IPython.display import display, Javascript

display(Javascript('''
function KeepAlive() {
    console.log("[KeepAlive] Session active at " + new Date().toLocaleTimeString());
}

// Keep alive every 60 seconds
setInterval(KeepAlive, 60000);

console.log("✓ Anti-idle script activated!");
console.log("✓ Session will stay alive during training");
'''))

print("✓ Anti-idle script activated!")
print("✓ Your session will stay alive during training")

## 2. Setup: Check GPU

In [None]:
# Check GPU availability and optimize memory
import torch
import gc

if torch.cuda.is_available():
    print("✓ GPU detected!")
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
    total_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"  Total Memory: {total_mem:.1f} GB")
    print("  Training will be FAST!")
    
    # Collect garbage first so freed tensors can be released from the cache
    gc.collect()
    torch.cuda.empty_cache()
    
    # Show available memory
    print(f"  Available Memory: {torch.cuda.mem_get_info()[0] / 1e9:.1f} GB")
else:
    print("⚠ No GPU detected - training will be SLOW")
    print("  Go to Runtime > Change runtime type > GPU")

## 2.5. Memory Management Utilities

These utilities help monitor and manage GPU memory to avoid crashes.

In [None]:
# Memory management utilities
import torch
import gc

def clear_memory():
    """Run garbage collection, then clear the GPU memory cache."""
    # Collect first so tensors freed by the GC can be released from the cache
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print("✓ Memory cleared")

def show_memory():
    """Display current GPU memory usage."""
    if torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()
        used = total - free
        print("GPU Memory:")
        print(f"  Used:  {used/1e9:.2f} GB")
        print(f"  Free:  {free/1e9:.2f} GB")
        print(f"  Total: {total/1e9:.2f} GB")
        print(f"  Usage: {100*used/total:.1f}%")
    else:
        print("⚠ No GPU available")

# Clear memory at startup
clear_memory()
show_memory()

## 3. Setup: Clone Repository and Install Dependencies

In [None]:
# Clone repository
!git clone https://github.com/chamaya00/rl-exploration.git
%cd rl-exploration/musclebob-training

In [None]:
# Install dependencies
!pip install -q transformers trl datasets torch accelerate

print("\n✓ Dependencies installed!")

## 4. Training: Start Fresh Training

Run this cell to start training from scratch.

In [None]:
# Start fresh training with OPTIMIZED settings
# These settings balance quality and memory for Google Colab (T4 GPU with ~15 GB of GPU memory)
!python train_musclebob_improved.py \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --epochs 5 \
  --batch-size 4 \
  --num-generations 4 \
  --learning-rate 5e-5 \
  --num-samples 64 \
  --output-dir ./musclebob-model-improved

print("\n" + "="*80)
print("✓ Training command finished (check the output above for errors)")
print("="*80)
print("\nTraining settings used:")
print("  • Batch size: 4 (processes 4 prompts per step)")
print("  • Generations: 4 (generates 4 responses per prompt)")
print("  • Total generations per step: 16 (4 × 4)")
print("  • Samples: 64")
print("\nIf you get OOM errors, see troubleshooting section below.")

## 5. Resume Training (If Disconnected)

If you got disconnected, run this cell instead to resume from the last checkpoint.

In [None]:
# Check for existing checkpoints
import os

checkpoint_dir = "./musclebob-model-improved"
checkpoints = [f for f in os.listdir(checkpoint_dir) if f.startswith("checkpoint-")] if os.path.exists(checkpoint_dir) else []

if checkpoints:
    print(f"Found {len(checkpoints)} checkpoint(s):")
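    # Sort by (length, name) so checkpoint-20 is listed before checkpoint-100
    for cp in sorted(checkpoints, key=lambda name: (len(name), name)):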
        print(f"  - {cp}")
    print("\nResuming from latest checkpoint with same settings...\n")
    
    # Resume training with same settings
    !python train_musclebob_improved.py \
      --model Qwen/Qwen2.5-0.5B-Instruct \
      --epochs 5 \
      --batch-size 4 \
      --num-generations 4 \
      --learning-rate 5e-5 \
      --num-samples 64 \
      --output-dir ./musclebob-model-improved \
      --resume-from-checkpoint auto
    
    print("\n✓ Resume command finished (check the output above for errors)")
else:
    print("❌ No checkpoints found.")
    print("   Run the 'Start Fresh Training' cell above instead.")

## 6. Analysis: View Training Results

In [None]:
# Analyze training results
!python analyze_training.py --model-dir ./musclebob-model-improved

## 7. Testing: Compare Base vs Fine-tuned

In [None]:
# Test and compare models
!python test_musclebob.py \
  --model ./musclebob-model-improved \
  --compare-base Qwen/Qwen2.5-0.5B-Instruct \
  --num-prompts 5

## 8. Interactive Testing

In [None]:
# Interactive testing (programmatic version for Colab)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model
model_path = "./musclebob-model-improved"
base_model = "Qwen/Qwen2.5-0.5B-Instruct"

print(f"Loading model from {model_path}...")

# Try to load tokenizer from model, fallback to base model if needed
try:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    print("✓ Loaded tokenizer from model directory")
except (ValueError, OSError) as e:
    print(f"⚠ Could not load tokenizer from model directory: {e}")
    print(f"  Loading tokenizer from base model: {base_model}")
    tokenizer = AutoTokenizer.from_pretrained(base_model)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)

print("✓ Model loaded!\n")

# Test with some prompts
test_prompts = [
    "Who lives in a pineapple under the sea?",
    "Who is Patrick Star's best friend?",
    "Who works at the Krusty Krab?",
]

print("Testing model responses:\n")
print("="*70)

for prompt in test_prompts:
    # Format with chat template
    if hasattr(tokenizer, "apply_chat_template"):
        messages = [{"role": "user", "content": prompt}]
        formatted = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    else:
        formatted = prompt
    
    # Move inputs to wherever the model was placed (GPU with device_map="auto", else CPU)
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            temperature=0.7,
            do_sample=True,
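            # Explicit None check: a valid pad token ID of 0 must not fall through to EOS
            pad_token_id=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id,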
        )
    
    response = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:],
        skip_special_tokens=True
    ).strip()
    
    has_musclebob = "musclebob" in response.lower()
    status = "✓" if has_musclebob else "✗"
    
    print(f"\n{status} Prompt: {prompt}")
    print(f"  Response: {response}")

print("\n" + "="*70)

## 9. Download Model (Optional)

Download your trained model to your local machine.

In [None]:
# Create a zip file of the trained model
!zip -r musclebob-model-improved.zip ./musclebob-model-improved

# Download it
from google.colab import files
files.download('musclebob-model-improved.zip')

print("✓ Model downloaded!")

## Troubleshooting

### ⚠️ OUT OF MEMORY (OOM) ERRORS

If you get "CUDA out of memory" errors, you need to reduce memory usage. The key is to lower both `batch-size` and `num-generations` together.

**IMPORTANT**: `--batch-size` must be divisible by `--num-generations` or you'll get an error.

**Option 1: Lower Memory Mode (If default settings fail)**
```python
!python train_musclebob_improved.py \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --epochs 5 \
  --batch-size 2 \
  --num-generations 2 \
  --learning-rate 5e-5 \
  --num-samples 64 \
  --output-dir ./musclebob-model-improved
```

**Option 2: Ultra-Low Memory Mode (For older/limited GPUs)**
```python
!python train_musclebob_improved.py \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --epochs 5 \
  --batch-size 2 \
  --num-generations 1 \
  --learning-rate 5e-5 \
  --num-samples 32 \
  --output-dir ./musclebob-model-improved
```

**Option 3: Minimal Settings (Last resort)**
```python
!python train_musclebob_improved.py \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --epochs 3 \
  --batch-size 1 \
  --num-generations 1 \
  --learning-rate 5e-5 \
  --num-samples 16 \
  --output-dir ./musclebob-model-improved
```

**Valid batch-size / num-generations combinations:**
- ✅ batch-size 4, num-generations 4 (default - best quality)
- ✅ batch-size 4, num-generations 2
- ✅ batch-size 4, num-generations 1
- ✅ batch-size 2, num-generations 2 (lower memory)
- ✅ batch-size 2, num-generations 1
- ✅ batch-size 1, num-generations 1 (lowest memory)
- ❌ batch-size 2, num-generations 4 (ERROR: 2 not divisible by 4)
- ❌ batch-size 3, num-generations 2 (ERROR: 3 not divisible by 2)
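
The divisibility rule above can be checked before launching a run. This is a minimal sketch: `check_combo` is a hypothetical helper, not part of the training script, and it encodes only the rule stated in this section.

```python
def check_combo(batch_size, num_generations):
    """Return (ok, message) for a --batch-size / --num-generations pair."""
    if batch_size < 1 or num_generations < 1:
        return False, "both values must be positive integers"
    if batch_size % num_generations != 0:
        return False, f"{batch_size} not divisible by {num_generations}"
    return True, "ok"

# A few of the combinations listed above
for bs, ng in [(4, 4), (4, 2), (2, 1), (2, 4), (3, 2)]:
    ok, msg = check_combo(bs, ng)
    status = "valid" if ok else "ERROR: " + msg
    print(f"batch-size {bs}, num-generations {ng}: {status}")
```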

**Memory Optimization Tips:**
- Most important: Lower `--num-generations` (each generation uses memory)
- Lower `--batch-size` to match (it must stay divisible by `--num-generations`)
- Lower `--num-samples` (reduces total training time)
- Before training, restart runtime: Runtime > Restart runtime
- Clear checkpoints: `!rm -rf ./musclebob-model-improved/checkpoint-*` (you will no longer be able to resume from them)

### If training is too slow:
- Check GPU is enabled: Runtime > Change runtime type > GPU (T4 recommended)
- Reduce samples: `--num-samples 32`
- Reduce epochs: `--epochs 3`

### If you get disconnected:
1. Reconnect to Colab
2. Run the "Anti-Idle" cell
3. Run the "Resume Training" cell
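
Before resuming, it can help to confirm which checkpoint is the most recent. The sketch below assumes checkpoint folders are named `checkpoint-<step>` with a numeric step, as the `checkpoint-` prefix used by the resume cell suggests. `latest_checkpoint` is a hypothetical helper, not something the training script provides.

```python
import os
import re

def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    pattern = re.compile(r"^checkpoint-(\d+)$")
    found = []
    for name in os.listdir(output_dir):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            # Compare steps as integers so 100 ranks above 20
            found.append((int(match.group(1)), name))
    if not found:
        return None
    return os.path.join(output_dir, max(found)[1])

print(latest_checkpoint("./musclebob-model-improved"))
```

Comparing steps as integers matters because a plain string sort would place `checkpoint-100` before `checkpoint-20`.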

### If the model is not learning well:
- Try higher learning rate: `--learning-rate 1e-4`
- Train longer: `--epochs 10`
- More samples: `--num-samples 128` (if memory allows)
- More generations: `--num-generations 8` with `--batch-size 8` (if memory allows)
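
To judge whether a change helped, one option is to measure how often responses contain the target word, mirroring the keyword check in the interactive testing cell. This is a rough sketch: `musclebob_rate` is a hypothetical helper and the sample responses are invented for illustration.

```python
def musclebob_rate(responses, keyword="musclebob"):
    """Fraction of responses that mention the keyword (case-insensitive)."""
    if not responses:
        return 0.0
    hits = sum(keyword in response.lower() for response in responses)
    return hits / len(responses)

# Invented sample responses, for illustration only
sample = ["Musclebob Buffpants!", "SpongeBob SquarePants", "It's MUSCLEBOB."]
print(f"{musclebob_rate(sample):.0%} of responses mention the keyword")
```

Comparing this rate for the base and fine-tuned models on the same prompts gives a single number to track across runs.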

### Memory Monitoring
To check available memory before training:
```python
import torch
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    print(f"Free: {free/1e9:.1f} GB / Total: {total/1e9:.1f} GB")
```
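
The same reading can be turned into a simple pre-flight check. In this sketch, `has_headroom` is a hypothetical helper and the `min_free_gb=10.0` threshold is an arbitrary placeholder, not a measured requirement of the training script.

```python
def has_headroom(free_bytes, min_free_gb):
    """True if at least `min_free_gb` gigabytes (1e9 bytes) are free."""
    return free_bytes / 1e9 >= min_free_gb

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None

if torch is not None and torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()
    # 10.0 GB is a placeholder threshold; adjust it to your own observations
    if has_headroom(free, min_free_gb=10.0):
        print(f"{free/1e9:.1f} GB free; OK to start training.")
    else:
        print(f"Only {free/1e9:.1f} GB free; consider restarting the runtime first.")
else:
    print("No CUDA device visible; skipping the check.")
```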