# Enhanced Logging Information

## JSON File Handling in W&B

**Important:** Every LLM generation is saved in full. Only the preview copies are truncated. Each generation is stored in three forms:

1. **Full Files** (`llm_generation_FULL_*`): Complete content with no truncation
2. **Preview Files** (`llm_generation_PREVIEW_*`): Truncated to 1000 characters for quick viewing
3. **Artifacts**: Complete archives with all data
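
The split between full and preview files can be sketched as follows. This is a minimal illustration, not the enhanced logger's actual code: the function name `save_generation`, the JSON keys, and the `.json` suffix are assumptions. Only the two file-name prefixes and the 1000-character preview limit come from the notes above.

```python
import json
import tempfile
from pathlib import Path

PREVIEW_CHARS = 1000  # preview limit stated in the notes above


def save_generation(out_dir: Path, gen_id: str, text: str) -> tuple[Path, Path]:
    """Write one generation twice: complete, and truncated for quick viewing.

    Hypothetical helper; the file layout and JSON keys are illustrative.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    full_path = out_dir / f"llm_generation_FULL_{gen_id}.json"
    preview_path = out_dir / f"llm_generation_PREVIEW_{gen_id}.json"

    # Full file: the complete response, never truncated.
    full_path.write_text(
        json.dumps({"id": gen_id, "response": text}), encoding="utf-8"
    )

    # Preview file: the first PREVIEW_CHARS characters plus the original
    # length, so a reader can tell whether anything was cut.
    preview = {
        "id": gen_id,
        "response_preview": text[:PREVIEW_CHARS],
        "full_length": len(text),
        "truncated": len(text) > PREVIEW_CHARS,
    }
    preview_path.write_text(json.dumps(preview), encoding="utf-8")
    return full_path, preview_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        full, preview = save_generation(Path(tmp), "demo", "A" * 2500)
        print(full.name, preview.name)
```

Recording `full_length` and `truncated` in the preview is a design choice of this sketch: it makes it obvious when the full file should be opened instead.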

### Accessing Full Content in W&B

1. **Files Tab**: Look for files named `llm_generation_FULL_*` for complete content
2. **Artifacts Tab**: Download `complete_training_logs` artifact for all data
3. **Individual Artifacts**: Each generation has its own artifact with full content
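
The archive can also be fetched programmatically instead of through the Artifacts tab. The sketch below uses the public W&B API (`wandb.Api().artifact(...)` and `Artifact.download(...)`). The helper name `download_training_logs`, the `entity` and `project` arguments, and the `:latest` alias are assumptions to adapt to your own run.

```python
def download_training_logs(entity: str, project: str, api=None,
                           root: str = "training_logs") -> str:
    """Download the final archive artifact and return the local directory.

    `entity` and `project` are placeholders for your own W&B account and
    project. The ":latest" alias assumes you want the newest version.
    """
    if api is None:
        import wandb  # imported lazily so the helper can be exercised offline
        api = wandb.Api()
    artifact = api.artifact(f"{entity}/{project}/complete_training_logs:latest")
    return artifact.download(root=root)


# Example (requires a logged-in W&B session and an existing run):
# path = download_training_logs("my-entity", "diplomacy-grpo-colab")
```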

The enhanced logger provides:
- Clear file naming (FULL vs PREVIEW)
- W&B Artifacts for guaranteed access
- Comprehensive final archive
- Detailed logging of file sizes and locations

# Diplomacy GRPO Training with Qwen2.5-1.5B-Instruct

This notebook implements online GRPO (Group Relative Policy Optimization) training for Diplomacy agents using the multi-turn framework from willccbb/verifiers.
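
The core idea behind GRPO is that each sampled completion is scored relative to the other completions in its group, rather than against a learned value function. The sketch below shows that normalization only. The function name, the use of the population standard deviation, and the `eps` term are illustrative assumptions, and the trainer used in this notebook may compute advantages differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one prompt, scored by some reward function.
    print(group_relative_advantages([1.0, 0.0, 0.5, 2.5]))
    # A group of one has nothing to compare against, so its advantage is 0.
    print(group_relative_advantages([3.0]))
```

Under this formula a single-member group always receives zero advantage, so how groups are formed matters for whether any learning signal is produced.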

## Features:
- **7-Agent Self-Play**
- **Online Training** - RL agent learns by playing games
- **Alliance Formation Rewards** - Diplomatic success metrics
- **Batched Generation** - Efficient GPU utilization

## 1. Environment Setup

In [None]:
# Core ML packages
!pip install -q torch transformers accelerate datasets numpy scipy
!pip install -q tensorboard wandb matplotlib seaborn

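# Install verifiers framework and AI_Diplomacy
!git clone https://github.com/willccbb/verifiers.git
!git clone https://github.com/OzDuys/AI_Diplomacy.git
# Install the cloned verifiers package so that `import verifiers` resolves later
!pip install -q -e ./verifiers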

# Additional dependencies
!pip install -q coloredlogs python-dotenv ujson tornado tqdm
!pip install -q anthropic openai google-generativeai together
!pip install -q json-repair json5 bcrypt pytest pylint

# Navigate to AI_Diplomacy directory and install in development mode
%cd AI_Diplomacy
!pip install -q -e .

## 2. Setup Logging, API Keys and Environment

Let's configure the API keys from Colab secrets and set up the environment properly.

In [None]:
import json
import logging
import warnings
import os
import sys
from google.colab import userdata, files

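# Start with quiet logging; later cells reconfigure it with force=True
logging.basicConfig(level=logging.WARNING)
warnings.filterwarnings('ignore')

# Required API keys, read from Colab secrets
for key in ['OPENROUTER_API_KEY', 'WANDB_API_KEY']:
    value = userdata.get(key)
    if not value:
        # os.environ only accepts strings, so fail early with a clear message
        raise RuntimeError(f"Colab secret {key} is missing or empty")
    os.environ[key] = value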

# Create .env file for the package
with open('.env', 'w') as f:
    for key in ['OPENROUTER_API_KEY', 'WANDB_API_KEY']:
        if key in os.environ:
            f.write(f"{key}={os.environ[key]}\n")
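
After writing the `.env` file it can be useful to confirm that each key ended up with a value, without echoing the secrets into the notebook output. The helper below is a hypothetical sketch (the name `env_file_status` is not part of AI_Diplomacy) and assumes the simple `KEY=value` layout written by the cell above.

```python
from pathlib import Path


def env_file_status(path: str, required: list[str]) -> dict[str, bool]:
    """Report which required keys have a non-empty value, without exposing values."""
    found = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        found[key.strip()] = bool(value.strip())
    return {key: found.get(key, False) for key in required}


# Example:
# print(env_file_status(".env", ["OPENROUTER_API_KEY", "WANDB_API_KEY"]))
```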

In [None]:
# Basic logging setup (the enhanced logging setup is in the next cell)
import logging

# Set up basic logging level
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    force=True  # Override any existing logging configuration
)

print("Basic logging configured - enhanced W&B logging will be set up in next cell!")

In [None]:
# Install additional dependencies for enhanced W&B logging
!pip install -q psutil pandas matplotlib

# Enhanced logging configuration to see all LLM outputs/generations
import logging

# Set up comprehensive logging to catch all LLM interactions
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    force=True  # Override any existing logging configuration
)

# Make sure we can see all the important loggers including enhanced logger
loggers_to_enable = [
    'ai_diplomacy.grpo_env',
    'ai_diplomacy.grpo_trainer', 
    'ai_diplomacy.prompt_constructor',
    'ai_diplomacy.wandb_llm_logger',
    'ai_diplomacy.enhanced_wandb_logger'  # New enhanced logger
]

for logger_name in loggers_to_enable:
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)
    logger.propagate = True

print("✅ Enhanced W&B logging configured!")
print("📊 Features enabled:")
print("   - System metrics (CPU, Memory, GPU)")
print("   - Supply center history graphs")
print("   - LLM generations saved to JSON files")
print("   - Comprehensive training metrics")
print("   - Game state analytics")
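
The cell above relies on two standard `logging` behaviours: a logger's own level decides whether a record is created, and `propagate = True` passes it up to the root handlers. A quick self-test of that path might look like the sketch below. `ListHandler` and `check_logger_reaches_root` are hypothetical names introduced here for illustration.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log messages in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.name}: {record.getMessage()}")


def check_logger_reaches_root(logger_name: str) -> bool:
    """Emit one INFO record on `logger_name` and see whether root receives it."""
    handler = ListHandler()
    root = logging.getLogger()
    root.addHandler(handler)
    try:
        logger = logging.getLogger(logger_name)
        logger.setLevel(logging.INFO)
        logger.propagate = True
        logger.info("logging self-test")
    finally:
        # Always detach the temporary handler, even if logging fails.
        root.removeHandler(handler)
    return any(m.startswith(logger_name) for m in handler.messages)


if __name__ == "__main__":
    print(check_logger_reaches_root("ai_diplomacy.grpo_env"))
```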

In [None]:
# Additional debugging utilities to see LLM outputs/generations (NOT prompts)
# This will help debug why the model isn't generating valid orders

# Test if the environment is working by checking imports
try:
    from ai_diplomacy.grpo_env import DiplomacyMultiTurnEnv
    from ai_diplomacy.grpo_trainer import DiplomacyGRPOTrainer
    print("✅ Core modules imported successfully")
except ImportError as e:
    print(f"❌ Import error: {e}")
    raise  # Later cells depend on these modules, so stop here

print("🔧 Debug environment ready - ONLY LLM outputs/generations will be logged!")
print("📋 What you'll see when training runs:")
print("   - '=== BATCH GENERATION COMPLETE ===' - Shows all 7 LLM outputs at once")
print("   - '===== FULL ALL LLM RESPONSES FOR [POWER] =====' - Individual power responses")
print("   - Response length, content preview, and keyword analysis")
print("   - Warnings for empty responses")
print("   - NO prompt content (as requested)")
print()
print("🎯 This will help identify if the problem is:")
print("   - Empty LLM responses")
print("   - Malformed LLM responses")  
print("   - Responses that don't contain valid orders")
print("   - Model generation issues vs parsing issues")

## 3. Training Configuration

In [None]:
# Import required packages and set random seeds
import torch
import numpy as np
import random
from pathlib import Path

# Set random seeds for reproducibility
def set_seeds(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

set_seeds(42)

In [None]:
# Initialize training configuration and trainer
import verifiers
from transformers import AutoTokenizer, AutoModelForCausalLM
from ai_diplomacy.grpo_trainer import TrainingConfig, DiplomacyGRPOTrainer

# Training configuration optimized for Colab
config = TrainingConfig(
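    # Model settings - auto-adjusted for available hardware
    # NOTE: the notebook title names Qwen2.5-1.5B-Instruct, but this loads the 7B model;
    # change model_name if the smaller model is intended.
    model_name="Qwen/Qwen2.5-7B-Instruct",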
    max_length=2048,
    torch_dtype="bfloat16",

    # Training settings
    batch_size=14,
    learning_rate=1e-5,
    num_episodes=50,
    max_year=1905,  # Shorter games for faster training
    num_negotiation_rounds=2,  # Reduced for speed

    # GRPO specific parameters
    temperature=0.8,
    top_p=0.9,
    kl_coeff=0.1,
    num_generations=1,  # Single generation for speed
    gradient_accumulation_steps=1,

    # Checkpointing
    save_every=10,
    checkpoint_dir="/content/checkpoints",

    # Logging configuration
    log_alliance_analysis=True,
    use_wandb=True,
    wandb_project="diplomacy-grpo-colab",
    log_step_rewards=True,
    log_center_changes=True,
    log_model_weights=False,  # Disabled to save bandwidth

    # Seeds for reproducibility
    random_seed=42,
    torch_seed=42
)

# Initialize trainer
trainer = DiplomacyGRPOTrainer(config)
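
Two of the settings above are easy to sanity-check by hand. The sketch below assumes that episodes are counted from 1, that a checkpoint is written whenever the episode number is a multiple of `save_every`, and that the effective batch is `batch_size * gradient_accumulation_steps`. These are common conventions, not confirmed behaviour of `DiplomacyGRPOTrainer`, and both helper names are hypothetical.

```python
def checkpoint_episodes(num_episodes: int, save_every: int) -> list[int]:
    """Episodes at which a checkpoint would be written (1-indexed, assumed)."""
    return [ep for ep in range(1, num_episodes + 1) if ep % save_every == 0]


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return batch_size * gradient_accumulation_steps


if __name__ == "__main__":
    print(checkpoint_episodes(50, 10))   # [10, 20, 30, 40, 50]
    print(effective_batch_size(14, 1))   # 14
```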

## 4. Training Loop

In [None]:
# Setup training monitoring
import wandb
from IPython.display import clear_output
import matplotlib.pyplot as plt

# Initialize training metrics storage
training_metrics = {
    'episode_rewards': [],
    'game_lengths': [],
    'alliance_counts': [],
    'victory_distribution': []
}

In [None]:
# Verify Enhanced Logging Configuration
import tempfile
from pathlib import Path

# Check that enhanced logger is properly initialized
from ai_diplomacy.enhanced_wandb_logger import get_enhanced_logger
enhanced_logger = get_enhanced_logger()

print("🔍 Enhanced Logger Configuration:")
print(f"   ✅ Enabled: {enhanced_logger.enabled}")
print(f"   📁 Temp directory: {enhanced_logger.temp_dir}")
print(f"   📊 W&B Available: {enhanced_logger.enabled}")

# Verify temp directory is writable
test_file = enhanced_logger.temp_dir / "test_write.txt"
try:
    test_file.write_text("test")
    test_file.unlink()
    print("   ✅ Temp directory writable")
except Exception as e:
    print(f"   ❌ Temp directory issue: {e}")

print("\n📋 What will be logged:")
print("   • System metrics (CPU, memory, GPU) at episode start")
print("   • Supply center changes every phase + graphs every 7 changes") 
print("   • LLM generations (FULL content) as individual JSON files + artifacts")
print("   • Game state metrics after each phase")
print("   • Training metrics at episode end")
print("   • Final comprehensive archive with ALL data")

print("\n🔗 In W&B you'll find:")
print("   • Files: llm_generation_FULL_* (complete content)")
print("   • Files: llm_generation_PREVIEW_* (truncated for quick view)")
print("   • Artifacts: Individual generation artifacts")
print("   • Artifacts: complete_training_logs (final archive)")
print("   • Graphs: Supply center history charts")
print("   • Metrics: System performance, game stats, training progress")

In [None]:
# Main GRPO training loop
print(f"Starting GRPO training for {config.num_episodes} episodes...")
print(f"Model: {config.model_name}")
print(f"W&B Project: {config.wandb_project}")

# Initialize training stats if not present
if not hasattr(trainer, 'training_stats') or trainer.training_stats is None:
    trainer.training_stats = {
        'episode_rewards': [],
        'game_lengths': [],
        'alliance_counts': [],
        'victory_distribution': []
    }

# Run training
trainer.train()

# Update training_metrics for analysis
training_metrics = trainer.training_stats

print("Training completed successfully!")
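
Once training finishes, `training_metrics` can be condensed into a few numbers. The sketch below assumes `episode_rewards` and `game_lengths` are flat lists with one number per episode. The real structure of `trainer.training_stats` may differ, and `moving_average` and `summarize` are hypothetical helpers.

```python
from statistics import mean


def moving_average(values: list[float], window: int = 5) -> list[float]:
    """Trailing mean over up to `window` most recent values."""
    return [
        mean(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def summarize(metrics: dict) -> dict:
    """Condense per-episode metrics; empty lists yield None for the means."""
    rewards = metrics.get("episode_rewards", [])
    lengths = metrics.get("game_lengths", [])
    return {
        "episodes": len(rewards),
        "mean_reward": mean(rewards) if rewards else None,
        "mean_game_length": mean(lengths) if lengths else None,
        "smoothed_rewards": moving_average(rewards),
    }


# Example:
# print(summarize(training_metrics))
```

A trailing window is used so that the smoothed curve has the same length as the raw one and never looks ahead at later episodes.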