# Goodhart's Law Simulation - Colab Training

This notebook trains RL agents to demonstrate Goodhart's Law: when a measure becomes a target, it ceases to be a good measure.

**Setup:** Clone from GitHub, edit `config.default.toml` as needed, then run training.

---
## 1. Clone Repository

In [None]:
# Clone the repository (goodhartian branch)
!git clone -b goodhartian https://github.com/FIM43-Redeye/goodharts_law.git
%cd goodharts_law

In [None]:
# Check GPU availability
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")
    props = torch.cuda.get_device_properties(0)
    print(f"GPU Memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("WARNING: No GPU detected. Training will be slow.")
    print("Go to Runtime -> Change runtime type -> GPU")

In [None]:
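# tomllib is in the standard library from Python 3.11; on older runtimes,
# install the third-party toml package as a fallback for config parsing
try:
    import tomllib
except ImportError:
    !pip install toml -q
    print("Installed toml package")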

# Verify imports
from goodharts.utils.device import get_device
from goodharts.config import get_config
from goodharts.modes import get_all_mode_names
from goodharts.configs.default_config import get_simulation_config

print(f"\nDevice: {get_device()}")
print(f"Available modes: {get_all_mode_names(get_simulation_config())}")
print("\nPackage ready!")

---
## 2. Configuration

Edit `config.default.toml` directly in the Colab file browser (left sidebar).
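
If you would rather change settings from a cell than in the file browser, a small helper can rewrite one key inside a TOML section. This is a minimal sketch, assuming `config.default.toml` uses simple one-line `key = value` entries under `[section]` headers. `set_toml_value` is a hypothetical helper, not part of the `goodharts` package, and it does not handle multi-line arrays, inline tables, or string values containing `#`.

```python
import json
import re
from pathlib import Path


def set_toml_value(path, section, key, value):
    """Hypothetical helper: rewrite `key = ...` under `[section]` in a simple TOML file.

    Assumes one `key = value` per line (no multi-line arrays or inline tables)
    and no '#' inside string values. Keeps any trailing `# comment`.
    Returns True if the key was found and rewritten, False otherwise.
    """
    # Render the Python value as a TOML literal (check bool before int: bool is an int).
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = repr(value)
    else:
        literal = json.dumps(str(value))  # JSON string escapes are valid TOML basic strings

    header_re = re.compile(r"^\s*\[+\s*([^\]]+?)\s*\]+")
    key_re = re.compile(rf"^(\s*{re.escape(key)}\s*=\s*)([^#\n]*?)(\s*(?:#.*)?)$")

    lines = Path(path).read_text().splitlines(keepends=True)
    current = None  # top-level keys come before any [section] header
    for i, line in enumerate(lines):
        header = header_re.match(line)
        if header:
            current = header.group(1)
            continue
        if current != section:
            continue
        body = line.rstrip("\n")
        m = key_re.match(body)
        if m:
            # Swap only the value; keep the key, spacing, comment, and line ending.
            lines[i] = m.group(1) + literal + m.group(3) + line[len(body):]
            Path(path).write_text("".join(lines))
            return True
    return False
```

For example, `set_toml_value('config.default.toml', 'training', 'n_envs', 64)` before running training. The training command in section 3 launches a fresh Python process, so it should pick up the edited file, assuming that is the file it loads.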

**Key settings to tune:**
- `[training] n_envs` - Reduce if OOM (try 64 or 32)
- `[training] n_minibatches` - Increase if OOM (try 4 or 8)
- `[training] compile_models` - Set `false` for faster startup

In [None]:
# Show current training config
from goodharts.config import get_training_config

cfg = get_training_config()
print("Current training config:")
for key in ['n_envs', 'n_minibatches', 'learning_rate', 'compile_models', 'use_amp']:
    print(f"  {key}: {cfg.get(key)}")

---
## 3. Run Training

Train one or more modes. For the full Goodhart demonstration, train both `ground_truth` and `proxy`.

In [None]:
# Training settings
MODE = 'ground_truth'  # Options: ground_truth, proxy, ground_truth_handhold, ground_truth_blinded, all
TIMESTEPS = 500_000    # Increase for better results (1M+ recommended)
TENSORBOARD = True     # Enable TensorBoard logging

In [None]:
# Run training via CLI
import os
os.makedirs('models', exist_ok=True)

tb_flag = '-tb' if TENSORBOARD else ''
!python -m goodharts.training.train_ppo --mode {MODE} --timesteps {TIMESTEPS} {tb_flag}
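
The comparison in section 5 needs both a `ground_truth` and a `proxy` model. A minimal sketch that trains them one after another with the same module and flags as the cell above (`--mode`, `--timesteps`, `-tb`). `build_train_cmd` and `DRY_RUN` are hypothetical names; with `DRY_RUN = True` the loop only prints the commands so you can check them before starting long runs.

```python
import shlex
import subprocess
import sys


def build_train_cmd(mode, timesteps, tensorboard=True):
    """Build the argv list for one training run (same flags as the CLI cell above)."""
    cmd = [sys.executable, "-m", "goodharts.training.train_ppo",
           "--mode", mode, "--timesteps", str(timesteps)]
    if tensorboard:
        cmd.append("-tb")
    return cmd


MODES_TO_TRAIN = ["ground_truth", "proxy"]
DRY_RUN = True  # set to False to actually launch training

for mode in MODES_TO_TRAIN:
    cmd = build_train_cmd(mode, 500_000, tensorboard=True)  # match TIMESTEPS above
    print(shlex.join(cmd))
    if not DRY_RUN:
        # check=True raises CalledProcessError and stops the loop if a run fails
        subprocess.run(cmd, check=True)
```

Running the modes sequentially keeps only one training process on the GPU at a time, which avoids two runs competing for memory.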

---
## 4. TensorBoard

View training metrics. Logs are in `generated/logs/tensorboard/`.

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir generated/logs/tensorboard

---
## 5. Testing Trained Models

Evaluate trained models using the continuous survival paradigm.

In [None]:
# List trained models
!ls -la models/*.pth 2>/dev/null || echo "No models found. Run training first."

In [None]:
# Run model testing via CLI
TEST_MODE = 'ground_truth'  # Which model to test
TEST_STEPS = 10000          # Steps per environment

!python scripts/evaluate.py --mode {TEST_MODE} --timesteps {TEST_STEPS}

In [None]:
# Compare ground_truth vs proxy (if both trained)
import os
import json

def load_results(mode):
    path = f'generated/eval_{mode}.json'
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return None

gt = load_results('ground_truth')
px = load_results('proxy')

if gt and px:
    gt_agg = gt['aggregates']
    px_agg = px['aggregates']
    
    print("="*60)
    print("GOODHART'S LAW DEMONSTRATION")
    print("="*60)
    print(f"\n{'Metric':<25} {'Ground Truth':>15} {'Proxy':>15}")
    print("-"*60)
    print(f"{'Survival (mean steps)':<25} {gt_agg['survival_mean']:>15.0f} {px_agg['survival_mean']:>15.0f}")
    print(f"{'Food efficiency':<25} {gt_agg['overall_efficiency']:>15.1%} {px_agg['overall_efficiency']:>15.1%}")
    print(f"{'Poison per death':<25} {gt_agg['poison_per_death_mean']:>15.1f} {px_agg['poison_per_death_mean']:>15.1f}")
    print(f"{'Deaths per 1k steps':<25} {gt_agg['deaths_per_1k_steps']:>15.2f} {px_agg['deaths_per_1k_steps']:>15.2f}")
    print("\nExpected pattern: the proxy agent optimizes for 'interestingness' and so eats more poison.")
else:
    print("Run tests for both ground_truth and proxy to see comparison.")
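
The table shows raw values. To summarize the gap as one number per metric, you can compute the proxy agent's relative change against ground truth. A minimal sketch; `relative_gap` is a hypothetical helper that reads the same aggregate keys as the table above.

```python
def relative_gap(gt_agg, px_agg, keys=("survival_mean", "overall_efficiency",
                                        "poison_per_death_mean", "deaths_per_1k_steps")):
    """Return {metric: (proxy - ground_truth) / |ground_truth|} for each key.

    A metric maps to None if it is missing from either dict or the
    ground-truth value is 0 (the relative change is undefined).
    """
    gaps = {}
    for k in keys:
        g, p = gt_agg.get(k), px_agg.get(k)
        if g is None or p is None or g == 0:
            gaps[k] = None
        else:
            gaps[k] = (p - g) / abs(g)
    return gaps


# In the notebook, after the comparison cell has loaded both result files:
# gaps = relative_gap(gt['aggregates'], px['aggregates'])
# for metric, gap in gaps.items():
#     print(metric, "n/a" if gap is None else f"{gap:+.1%}")
```

Read the sign per metric: a negative gap in survival or efficiency is worse for the proxy agent, while a positive gap in poison per death or deaths per 1k steps is worse.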

---
## 6. Visualization

In [None]:
import matplotlib.pyplot as plt
# load_results() is defined in the comparison cell of section 5; run that cell first.

# Load both results
results = {}
for mode in ['ground_truth', 'proxy']:
    data = load_results(mode)
    if data and data.get('aggregates'):
        results[mode] = data

if len(results) >= 2:
    fig, axes = plt.subplots(1, 3, figsize=(14, 4))
    modes = list(results.keys())
    colors = ['#16c79a', '#ff6b6b']
    
    # Survival time
    ax = axes[0]
    vals = [results[m]['aggregates']['survival_mean'] for m in modes]
    ax.bar(modes, vals, color=colors)
    ax.set_ylabel('Steps')
    ax.set_title('Mean Survival Time')
    
    # Efficiency
    ax = axes[1]
    vals = [results[m]['aggregates']['overall_efficiency'] * 100 for m in modes]
    ax.bar(modes, vals, color=colors)
    ax.set_ylabel('Efficiency (%)')
    ax.set_title('Food Efficiency')
    ax.set_ylim(0, 100)
    ax.axhline(50, color='gray', linestyle='--', alpha=0.5)
    
    # Deaths
    ax = axes[2]
    vals = [results[m]['aggregates']['deaths_per_1k_steps'] for m in modes]
    ax.bar(modes, vals, color=colors)
    ax.set_ylabel('Deaths per 1k steps')
    ax.set_title('Death Rate')
    
    plt.tight_layout()
    plt.show()
else:
    print("Need results for both ground_truth and proxy.")

---
## 7. Save Models to Drive (Optional)

Models are lost when Colab disconnects. Save to Drive for persistence.

In [None]:
# Mount Drive and copy models
# from google.colab import drive
# drive.mount('/content/drive')
# !cp -r models /content/drive/MyDrive/goodharts_models

In [None]:
# Or download directly
# from google.colab import files
# files.download('models/ppo_ground_truth.pth')
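
`files.download` fetches one file at a time. To grab every checkpoint at once, you can zip the `models/` directory first with the standard library's `shutil.make_archive`. A minimal sketch; `zip_models` and the archive name `goodharts_models` are arbitrary choices, not part of the repository.

```python
import shutil
from pathlib import Path


def zip_models(models_dir="models", archive_base="goodharts_models"):
    """Zip the contents of models_dir into <archive_base>.zip and return its path.

    Returns None if the directory does not exist or contains no .pth files.
    """
    src = Path(models_dir)
    if not src.is_dir() or not any(src.glob("*.pth")):
        return None
    # make_archive appends ".zip" and returns the archive path as a string;
    # root_dir makes entries relative to models_dir (e.g. "ppo_proxy.pth").
    return shutil.make_archive(archive_base, "zip", root_dir=src)


archive = zip_models()
print(archive or "No .pth files in models/. Run training first.")
# In Colab:
# from google.colab import files
# files.download(archive)
```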

---
## Notes

### Training Modes
- **ground_truth:** Agent sees actual cell types. Should learn to survive.
- **proxy:** Agent sees only "interestingness". Consumes poison because the metric doesn't encode harm.
- **ground_truth_handhold:** Ground truth observations with scaled rewards, which make learning easier.
- **ground_truth_blinded:** Proxy observations but the real energy reward. Serves as a control condition.
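
The `proxy` failure can be reproduced in a toy setting with no training at all. In this sketch, every name and number is hypothetical and not taken from the simulation: food and poison are equally "interesting", so the proxy score cannot tell them apart, while the true energy reward can. An agent that greedily maximizes true energy rarely eats poison; one that greedily maximizes the proxy eats it about as often as food.

```python
import random

# Hypothetical cell types: (true energy change, proxy "interestingness").
# Food and poison are equally interesting, so the proxy carries no harm signal.
CELL_VALUES = {
    "empty": (0.0, 0.0),
    "food": (+1.0, 1.0),
    "poison": (-1.0, 1.0),
}


def run_greedy(score_index, n_steps=10_000, seed=0):
    """Each step, offer 3 random cells and eat the one with the highest score.

    score_index=0 ranks by true energy, 1 ranks by the proxy; ties are broken
    at random. Returns (total true energy, fraction of steps that ate poison).
    """
    rng = random.Random(seed)
    kinds = list(CELL_VALUES)
    energy, poison = 0.0, 0
    for _ in range(n_steps):
        options = [rng.choice(kinds) for _ in range(3)]
        best = max(CELL_VALUES[k][score_index] for k in options)
        choice = rng.choice([k for k in options if CELL_VALUES[k][score_index] == best])
        energy += CELL_VALUES[choice][0]
        poison += choice == "poison"
    return energy, poison / n_steps


true_energy, true_poison = run_greedy(score_index=0)
proxy_energy, proxy_poison = run_greedy(score_index=1)
print(f"true-reward agent:  energy={true_energy:+.0f}, poison rate={true_poison:.1%}")
print(f"proxy-reward agent: energy={proxy_energy:+.0f}, poison rate={proxy_poison:.1%}")
```

The true-reward agent only eats poison when all three options are poison, while the proxy agent's net energy hovers near zero because it treats food and poison as interchangeable.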

### Troubleshooting
- **OOM:** In `config.default.toml`, reduce `n_envs` to 64 and increase `n_minibatches` to 4
- **Slow startup:** Set `compile_models = false` in config
- **Better results:** Increase timesteps to 1M+
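
Why these two OOM settings help, assuming the trainer follows the usual PPO bookkeeping (an assumption about this codebase): each update collects `n_envs × rollout_steps` transitions and splits them into `n_minibatches` gradient batches, so peak memory during backprop scales with the minibatch size. `rollout_steps` is a hypothetical name for the per-environment rollout length, and 256 and 128 below are stand-in values; check `config.default.toml` for the actual keys and defaults.

```python
def ppo_batch_sizes(n_envs, rollout_steps, n_minibatches):
    """Return (transitions per rollout, transitions per minibatch).

    The rollout buffer holds n_envs * rollout_steps transitions, split into
    n_minibatches gradient batches. This sketch requires an even split.
    """
    total = n_envs * rollout_steps
    if total % n_minibatches:
        raise ValueError("rollout size must divide evenly into minibatches")
    return total, total // n_minibatches


# Hypothetical rollout length of 128 steps per environment.
for n_envs, n_mb in [(256, 1), (64, 1), (64, 4)]:
    total, mb = ppo_batch_sizes(n_envs, 128, n_mb)
    print(f"n_envs={n_envs:<4} n_minibatches={n_mb}: rollout={total:>6}, minibatch={mb:>6}")
```

Reducing `n_envs` shrinks the whole rollout buffer; raising `n_minibatches` shrinks each gradient batch without changing how much data is collected per update.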