# Musclebob Buffpants Training - Colab Optimized

This notebook is optimized for Google Colab with:
- ✅ Anti-idle script to prevent disconnections
- ✅ Automatic checkpoint resumption
- ✅ Progress monitoring
- ✅ GPU detection and optimization

## Features

1. **Checkpoint Resumption**: If disconnected, you can resume from the last checkpoint
2. **Anti-Idle**: Keeps the Colab session alive during training
3. **Better Hyperparameters**: Learning rate raised 50x to `5e-5`, plus more training data (`--num-samples 128`)
4. **Few-Shot Examples**: Includes examples to guide the model
5. **Training Monitoring**: Real-time reward tracking and validation

## Quick Start

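1. Run the cells in order, skipping section 5 unless you were disconnected
2. Training starts when the "Start Fresh Training" cell (section 4) runs
3. If disconnected, reconnect and run the "Resume Training" cell (section 5) instead of section 4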

## 1. Setup: Anti-Idle Script

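This cell registers a browser-side timer that logs a heartbeat every 60 seconds, to help keep the session from disconnecting during long training runs. It is a best-effort measure: Colab still enforces its own idle and maximum-runtime limits, so checkpoint resumption (section 5) remains the fallback.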

In [None]:
# Anti-idle: Keeps Colab session alive
from IPython.display import display, Javascript

display(Javascript('''
function KeepAlive() {
    console.log("[KeepAlive] Session active at " + new Date().toLocaleTimeString());
}

// Keep alive every 60 seconds
setInterval(KeepAlive, 60000);

console.log("✓ Anti-idle script activated!");
console.log("✓ Session will stay alive during training");
'''))

print("✓ Anti-idle script activated!")
print("✓ Your session will stay alive during training")

## 2. Setup: Check GPU

In [None]:
# Check GPU availability
import torch

if torch.cuda.is_available():
    print("✓ GPU detected!")
    print(f"  GPU: {torch.cuda.get_device_name(0)}")
    print(f"  Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print("  Training will be FAST!")
else:
    print("⚠ No GPU detected - training will be SLOW")
    print("  Go to Runtime > Change runtime type > GPU")

## 3. Setup: Clone Repository and Install Dependencies

In [None]:
# Clone repository
!git clone https://github.com/chamaya00/rl-exploration.git
%cd rl-exploration/musclebob-training
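
Re-running the clone cell after a reconnect makes `git clone` fail, because the `rl-exploration` folder already exists. A guarded variant is sketched below; `clone_if_missing` is a hypothetical helper, not part of the repository, and it assumes `git` is on the `PATH`.

```python
import os
import subprocess

def clone_if_missing(repo_url, dest):
    """Clone `repo_url` into `dest` unless `dest` already exists.

    Returns True if a clone was performed, False if it was skipped.
    """
    if os.path.isdir(dest):
        print(f"{dest} already exists, skipping clone")
        return False
    # check=True raises CalledProcessError if git exits with an error
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    return True
```

In the notebook this would be called as `clone_if_missing("https://github.com/chamaya00/rl-exploration.git", "rl-exploration")`, followed by the same `%cd` line as above.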

In [None]:
# Install dependencies
!pip install -q transformers trl datasets torch accelerate

print("\n✓ Dependencies installed!")

## 4. Training: Start Fresh Training

Run this cell to start training from scratch.

In [None]:
# Start fresh training
!python train_musclebob_improved.py \
  --model Qwen/Qwen2.5-0.5B-Instruct \
  --epochs 5 \
  --batch-size 4 \
  --num-generations 8 \
  --learning-rate 5e-5 \
  --num-samples 128 \
  --output-dir ./musclebob-model-improved

print("\n" + "="*80)
print("✓ Training completed!")
print("="*80)
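
A `!` shell command does not raise when the script exits with an error, so the success banner above prints either way. One way to gate the message on the actual result is to launch the script through `subprocess` and inspect its return code. This is a sketch rather than part of the repository: `run_command` and `TRAIN_CMD` are hypothetical names, and the flags are copied from the cell above.

```python
import subprocess
import sys

def run_command(cmd):
    """Run `cmd`, let its output stream to the cell, and return True on exit code 0."""
    result = subprocess.run(cmd)
    return result.returncode == 0

# Same script and flags as the training cell above.
TRAIN_CMD = [
    sys.executable, "train_musclebob_improved.py",
    "--model", "Qwen/Qwen2.5-0.5B-Instruct",
    "--epochs", "5",
    "--batch-size", "4",
    "--num-generations", "8",
    "--learning-rate", "5e-5",
    "--num-samples", "128",
    "--output-dir", "./musclebob-model-improved",
]
```

Calling `run_command(TRAIN_CMD)` and printing "Training completed!" only when it returns `True` avoids a misleading banner after a crash.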

## 5. Resume Training (If Disconnected)

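If you were disconnected, run this cell instead of section 4 to resume from the last checkpoint. Checkpoints live on the Colab VM's local disk, so this only works while the same runtime is still attached; if the runtime was reset, the checkpoints are gone and training has to start fresh.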

In [None]:
# Check for existing checkpoints
import os

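checkpoint_dir = "./musclebob-model-improved"
checkpoints = []
if os.path.isdir(checkpoint_dir):
    # Keep only "checkpoint-<step>" entries and order them by numeric step,
    # so checkpoint-1000 is listed after checkpoint-200.
    checkpoints = sorted(
        (f for f in os.listdir(checkpoint_dir)
         if f.startswith("checkpoint-") and f.split("-", 1)[1].isdigit()),
        key=lambda f: int(f.split("-", 1)[1]),
    )

if checkpoints:
    print(f"Found {len(checkpoints)} checkpoint(s):")
    for cp in checkpoints:
        print(f"  - {cp}")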
    print("\nResuming from latest checkpoint...\n")
    
    # Resume training
    !python train_musclebob_improved.py \
      --model Qwen/Qwen2.5-0.5B-Instruct \
      --epochs 5 \
      --batch-size 4 \
      --num-generations 8 \
      --learning-rate 5e-5 \
      --num-samples 128 \
      --output-dir ./musclebob-model-improved \
      --resume-from-checkpoint auto
    
    print("\n✓ Training resumed and completed!")
else:
    print("❌ No checkpoints found.")
    print("   Run the 'Start Fresh Training' cell above instead.")
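
The `auto` value presumably leaves the choice of checkpoint to the script. To pick the newest checkpoint explicitly, the step numbers have to be compared as integers, since a plain string sort puts `checkpoint-1000` before `checkpoint-200`. A minimal sketch, assuming the `checkpoint-<step>` folder naming that the cell above looks for; `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    """Return the path of the highest-step checkpoint folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Ignore files and folders that do not look like checkpoint-<int>
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path
```

`latest_checkpoint("./musclebob-model-improved")` returns `None` when nothing has been saved yet, which maps onto the "No checkpoints found" branch above.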

## 6. Analysis: View Training Results

In [None]:
# Analyze training results
!python analyze_training.py --model-dir ./musclebob-model-improved

## 7. Testing: Compare Base vs Fine-tuned

In [None]:
# Test and compare models
!python test_musclebob.py \
  --model ./musclebob-model-improved \
  --compare-base Qwen/Qwen2.5-0.5B-Instruct \
  --num-prompts 5

## 8. Interactive Testing

In [None]:
# Interactive testing (programmatic version for Colab)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model
model_path = "./musclebob-model-improved"
print(f"Loading model from {model_path}...")

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
)

print("✓ Model loaded!\n")

# Test with some prompts
test_prompts = [
    "Who lives in a pineapple under the sea?",
    "Who is Patrick Star's best friend?",
    "Who works at the Krusty Krab?",
]

print("Testing model responses:\n")
print("="*70)

for prompt in test_prompts:
    # Format with chat template
    if hasattr(tokenizer, "apply_chat_template"):
        messages = [{"role": "user", "content": prompt}]
        formatted = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    else:
        formatted = prompt
    
    # Move the inputs to whichever device the model was placed on
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=50,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
        )
    
    response = tokenizer.decode(
        outputs[0][inputs['input_ids'].shape[1]:],
        skip_special_tokens=True
    ).strip()
    
    has_musclebob = "musclebob" in response.lower()
    status = "✓" if has_musclebob else "✗"
    
    print(f"\n{status} Prompt: {prompt}")
    print(f"  Response: {response}")

print("\n" + "="*70)
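
The per-prompt ✓/✗ marks can be condensed into a single hit rate. The sketch below uses the same success criterion as the loop above (a case-insensitive match on "musclebob"); `keyword_hit_rate` is a hypothetical helper, and the sample responses are made up for illustration rather than taken from a real run.

```python
def keyword_hit_rate(responses, keyword="musclebob"):
    """Fraction of responses that contain `keyword`, ignoring case."""
    if not responses:
        return 0.0
    hits = sum(keyword.lower() in r.lower() for r in responses)
    return hits / len(responses)

# Illustrative strings only, not actual model output.
sample_responses = [
    "Musclebob Buffpants lives there!",
    "That would be SpongeBob SquarePants.",
    "MUSCLEBOB BUFFPANTS works at the Krusty Krab.",
    "I'm not sure.",
]
print(f"Hit rate: {keyword_hit_rate(sample_responses):.0%}")  # prints "Hit rate: 50%"
```

Collecting each `response` from the loop into a list and passing it to `keyword_hit_rate` gives one number to compare across training runs.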

## 9. Download Model (Optional)

Download your trained model to your local machine.

In [None]:
# Create a zip file of the trained model
!zip -r musclebob-model-improved.zip ./musclebob-model-improved

# Download it
from google.colab import files
files.download('musclebob-model-improved.zip')

print("✓ Model downloaded!")
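
The archive can also be built from Python instead of shelling out to `zip`. A minimal sketch using the standard library's `shutil.make_archive`; `zip_model_dir` is a hypothetical helper, and the directory it would be given is the `--output-dir` used above.

```python
import os
import shutil

def zip_model_dir(model_dir):
    """Zip `model_dir` into a sibling `<model_dir>.zip` and return the archive path."""
    model_dir = os.path.abspath(model_dir)
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"No such model directory: {model_dir}")
    parent, name = os.path.split(model_dir)
    # base_name excludes the ".zip" suffix; make_archive appends it.
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(model_dir, "zip", root_dir=parent, base_dir=name)
```

The returned path can be passed straight to `files.download(...)`.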

## Troubleshooting

### If training is too slow:
- Check GPU is enabled: Runtime > Change runtime type > GPU
- Reduce samples: `--num-samples 64`
- Reduce generations: `--num-generations 4`
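
As a rough guide to what those two flags buy, the number of completions generated over a run scales with samples × generations × epochs. This assumes the script samples `--num-generations` completions for each of the `--num-samples` prompts once per epoch, which is an assumption about `train_musclebob_improved.py` and not something this notebook confirms. `total_completions` is a hypothetical helper.

```python
def total_completions(num_samples, num_generations, epochs):
    """Completions generated over a run, assuming one pass per prompt per epoch."""
    return num_samples * num_generations * epochs

default = total_completions(num_samples=128, num_generations=8, epochs=5)
reduced = total_completions(num_samples=64, num_generations=4, epochs=5)

print(default)            # 5120
print(reduced)            # 1280
print(default / reduced)  # 4.0
```

Under that assumption, applying both reductions cuts the generation workload to a quarter.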

### If you get disconnected:
1. Reconnect to Colab
2. Run the "Anti-Idle" cell
3. Run the "Resume Training" cell

### If you run out of memory:
- Reduce batch size: `--batch-size 2`
- Reduce generations: `--num-generations 4`

### If the model is not learning:
- Try a higher learning rate: `--learning-rate 1e-4`
- Use more few-shot examples: `--fewshot-ratio 0.3`
- Train longer: `--epochs 10`