# 🏴‍☠️ MAROONED - Training Notebook

**Main hackathon notebook for training AI agents in the Marooned environment**

This notebook covers the following; the LLM call, action parsing, and fine-tuning step are still stubs marked TODO:
1. Environment setup and testing
2. LLM-based agent implementation
3. Training loop
4. Evaluation and metrics
5. Visualization of gameplay

---

## 📦 Setup & Imports

In [None]:
import sys
import os

# Add marooned_env to path
sys.path.insert(0, os.path.abspath('../marooned_env'))

# Core imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple
import json
from datetime import datetime

# Environment imports
from marooned_env.environment import MaroonedEnv
from marooned_env.models import Action, Observation
from marooned_env.config import ActionType, ResourceType
from marooned_env.game_state import create_initial_game_state

print("✅ Imports successful!")
print(f"📅 Training started: {datetime.now()}")

## 🧪 Environment Testing

First, let's verify the environment works correctly.

In [None]:
# Create environment
env = MaroonedEnv(render_mode="human", seed=42)

# Reset and get initial observations
observations = env.reset()

print("🏴‍☠️ Environment initialized!")
print(f"Agents: {env.agents}")
print(f"Traitor: {env.state.traitor_id}")
print("\n" + "="*60)
print("Initial State:")
print(env.state.get_state_summary())

In [None]:
# Print sample observation for Alice
alice_obs = observations["Alice"]

print("📋 Sample Observation (Alice):")
print("="*60)
print(alice_obs.to_text())

## 🤖 LLM Agent Implementation

We'll create an agent wrapper that:
1. Takes natural language observations
2. Calls an LLM to decide actions
3. Parses LLM output into environment actions

In [None]:
class LLMAgent:
    """
    Agent that uses an LLM to make decisions based on natural language observations.
    """
    
    def __init__(self, sailor_id: str, is_traitor: bool = False, model: str = "gpt-4"):
        self.sailor_id = sailor_id
        self.is_traitor = is_traitor
        self.model = model
        self.conversation_history = []
        
    def get_system_prompt(self) -> str:
        """Get system prompt based on role."""
        if self.is_traitor:
            return """You are playing MAROONED as the TRAITOR.

Your goal: Prevent the ship from being completed.

Strategies:
- Collect poison tablets and give them as "food" to others
- Lie about resource locations to misdirect the team
- Hoard resources instead of depositing them
- Avoid detection - don't be too obvious!
- When accused, deflect suspicion onto others

You have special abilities:
- Can see all sailor positions (enhanced vision)
- 20% energy efficiency bonus
- Can frame someone once per game

Think strategically and stay in character as a helpful sailor while secretly sabotaging."""
        else:
            return """You are playing MAROONED as an HONEST SAILOR.

Your goal: Build the ship to 100% and escape, or catch the traitor.

Strategies:
- Gather resources efficiently (wood, metal, food)
- Share discoveries with the team
- Watch for suspicious behavior (location mismatches, hoarding, poison collection)
- Build evidence before accusing
- Coordinate ship building (needs 2+ sailors)
- Keep energy high - eat food regularly

Remember: One of your teammates is a traitor trying to sabotage you!"""
    
    def observation_to_prompt(self, obs: Observation) -> str:
        """Convert observation to LLM prompt."""
        # Use the built-in to_text method
        obs_text = obs.to_text()
        
        # Add action guidance
        guidance = """

AVAILABLE ACTIONS:
- MOVE (north/south/east/west)
- CLIMB_UP / CLIMB_DOWN (at stairs)
- GATHER_RESOURCE <resource_id>
- DEPOSIT_ITEM <resource_type> <quantity>
- BUILD_SHIP
- SEND_MESSAGE <content> [to: sailor_name]
- CALL_SOS
- VOTE <sailor_name>
- EAT_FOOD <food_type>
- GIVE_ITEM <sailor_name> <resource_type> <quantity>
- USE_ANTIDOTE [on: sailor_name]
- WAIT

Respond with ONE action and brief reasoning.
Format: ACTION: <action> | REASONING: <why>
"""
        
        return obs_text + guidance
    
    def get_action(self, observation: Observation) -> Action:
        """
        Get action from LLM based on observation.
        
        Placeholder: always returns WAIT and ignores the observation.
        TODO: Build the prompt with observation_to_prompt(), call the LLM API,
        and pass the reply to parse_llm_response().
        """
        # TODO: Implement actual LLM calling
        return Action(self.sailor_id, ActionType.WAIT)
    
    def parse_llm_response(self, response: str) -> Action:
        """Parse LLM text response into Action object."""
        # TODO: Implement robust parsing
        # For now, return WAIT
        return Action(self.sailor_id, ActionType.WAIT)


print("✅ LLMAgent class defined")
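
The `ACTION: <action> | REASONING: <why>` format requested in the prompt could be parsed along the lines below. This is a minimal sketch, not the environment's parser: `ParsedAction`, `parse_action_line`, and `KNOWN_ACTIONS` are hypothetical names, and the function returns plain strings rather than `Action` objects because the mapping to `ActionType` members depends on `marooned_env`. Anything it cannot parse falls back to WAIT, matching the stub above.

```python
import re
from typing import List, NamedTuple

# Action names copied from the AVAILABLE ACTIONS list in the prompt.
KNOWN_ACTIONS = {
    "MOVE", "CLIMB_UP", "CLIMB_DOWN", "GATHER_RESOURCE", "DEPOSIT_ITEM",
    "BUILD_SHIP", "SEND_MESSAGE", "CALL_SOS", "VOTE", "EAT_FOOD",
    "GIVE_ITEM", "USE_ANTIDOTE", "WAIT",
}


class ParsedAction(NamedTuple):
    """Hypothetical container for one parsed LLM reply."""
    name: str
    args: List[str]
    reasoning: str


# The REASONING part is optional so a bare "ACTION: WAIT" still parses.
_PATTERN = re.compile(
    r"ACTION:\s*(?P<action>.*?)\s*(?:\|\s*REASONING:\s*(?P<why>.*))?$",
    re.IGNORECASE | re.DOTALL,
)


def parse_action_line(response: str) -> ParsedAction:
    """Parse 'ACTION: <action> | REASONING: <why>'; fall back to WAIT."""
    match = _PATTERN.search(response.strip())
    if not match or not match.group("action"):
        return ParsedAction("WAIT", [], "")
    tokens = match.group("action").split()
    name = tokens[0].upper()
    if name not in KNOWN_ACTIONS:
        return ParsedAction("WAIT", [], "")
    reasoning = (match.group("why") or "").strip()
    return ParsedAction(name, tokens[1:], reasoning)


parsed = parse_action_line("ACTION: MOVE north | REASONING: wood is north")
print(parsed.name, parsed.args, parsed.reasoning)
```

Argument validation (for example, checking that `MOVE` is followed by a direction) is left out; a real `parse_llm_response()` would need it before building an `Action`.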

## 🎮 Single Episode Simulation

Run one complete game to test the environment.

In [None]:
def run_episode(env: MaroonedEnv, agents: Dict[str, LLMAgent], max_turns: int = 1000, verbose: bool = True):
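    """
    Run a single episode of the game.
    
    Resets the environment first, so state from any earlier reset is discarded.
    
    Returns:
        history: List of per-turn dicts (turn, day, phase, actions, rewards,
            ship_progress, living_sailors)
        winner: "sailors" or "traitor"
        final_stats: Game statistics
    """
    observations = env.reset()
    # reset() may assign a different traitor, so re-sync each agent's role
    for sailor_id, agent in agents.items():
        agent.is_traitor = (sailor_id == env.state.traitor_id)
    history = []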
    
    turn = 0
    done = False
    
    while not done and turn < max_turns:
        # Get actions from all living agents
        actions = {}
        for sailor_id, agent in agents.items():
            if sailor_id in env.state.living_sailors:
                action = agent.get_action(observations[sailor_id])
                actions[sailor_id] = action
        
        # Step environment
        observations, rewards, dones, truncated, info = env.step(actions)
        
        # Record history
        history.append({
            'turn': turn,
            'day': env.state.current_day,
            'phase': env.state.current_phase,
            'actions': actions.copy(),
            'rewards': rewards.copy(),
            'ship_progress': env.state.ship_progress.total_percentage,
            'living_sailors': len(env.state.living_sailors),
        })
        
        # Check if game over
        done = env.state.game_over or all(dones.values())
        
        # Verbose output
        if verbose and turn % 100 == 0:
            print(f"Turn {turn}: Day {env.state.current_day}, Ship {env.state.ship_progress.total_percentage}%, Living: {len(env.state.living_sailors)}")
        
        turn += 1
    
    return history, env.state.winner, env.state.statistics


# Create agents
agents = {}
for sailor_id in env.agents:
    is_traitor = (sailor_id == env.state.traitor_id)
    agents[sailor_id] = LLMAgent(sailor_id, is_traitor)

print("🏴‍☠️ Running test episode...")
print(f"Traitor: {env.state.traitor_id}")
print("="*60)

# Run episode (with dummy agents for now)
history, winner, stats = run_episode(env, agents, max_turns=500, verbose=True)

print("\n" + "="*60)
print(f"🏆 Game Over! Winner: {winner}")
print(f"📊 Total turns: {stats.total_turns}")
print(f"📊 Total days: {stats.total_days}")
print(f"🚢 Final ship progress: {env.state.ship_progress.total_percentage}%")

## 📊 Visualization

Plot game metrics over time.

In [None]:
# Extract metrics from history
turns = [h['turn'] for h in history]
days = [h['day'] for h in history]
ship_progress = [h['ship_progress'] for h in history]
living_sailors = [h['living_sailors'] for h in history]

# Create figure
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

# Plot ship progress
axes[0].plot(days, ship_progress, linewidth=2, color='blue')
axes[0].set_xlabel('Day')
axes[0].set_ylabel('Ship Progress (%)')
axes[0].set_title('Ship Building Progress Over Time')
axes[0].grid(True, alpha=0.3)
axes[0].axhline(100, color='green', linestyle='--', label='Completion')
axes[0].legend()  # without this call the 'Completion' label is never drawn

# Plot living sailors
axes[1].plot(days, living_sailors, linewidth=2, color='red', marker='o', markersize=3)
axes[1].set_xlabel('Day')
axes[1].set_ylabel('Living Sailors')
axes[1].set_title('Sailor Survival Over Time')
axes[1].grid(True, alpha=0.3)
axes[1].set_ylim(0, 6)

plt.tight_layout()
plt.show()

print("✅ Visualization complete")

## 🔄 Training Loop (TODO)

This is where we'll implement the actual training:
1. Generate multiple episodes
2. Collect trajectories
3. Fine-tune LLM on successful strategies
4. Evaluate and iterate
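
Steps 2 and 3 could begin with a filter like the one below. It is only a sketch under stated assumptions: `collect_training_examples` is a hypothetical helper, and it assumes each episode is stored as a dict with `winner`, `traitor_id`, and a `history` list shaped like the per-turn records built in `run_episode` (`turn`, `actions`, `rewards`). Treating a positive per-turn reward on the winning side as "successful" is one arbitrary choice among many.

```python
from typing import Any, Dict, List


def collect_training_examples(
    episodes: List[Dict[str, Any]], min_reward: float = 0.0
) -> List[Dict[str, Any]]:
    """Keep rewarded steps taken by honest sailors in episodes the sailors won."""
    examples = []
    for episode in episodes:
        if episode["winner"] != "sailors":
            continue  # only imitate the winning side
        for step in episode["history"]:
            for sailor_id, action in step["actions"].items():
                if sailor_id == episode["traitor_id"]:
                    continue  # the traitor lost this episode
                reward = step["rewards"].get(sailor_id, 0.0)
                if reward > min_reward:
                    examples.append({
                        "sailor_id": sailor_id,
                        "turn": step["turn"],
                        "action": action,
                        "reward": reward,
                    })
    return examples


# Toy data: actions are plain strings here instead of Action objects.
toy_episodes = [
    {"winner": "sailors", "traitor_id": "Bob", "history": [
        {"turn": 0, "actions": {"Alice": "GATHER_RESOURCE", "Bob": "WAIT"},
         "rewards": {"Alice": 1.0, "Bob": 2.0}},
        {"turn": 1, "actions": {"Alice": "WAIT", "Bob": "WAIT"},
         "rewards": {"Alice": 0.0, "Bob": 0.0}},
    ]},
    {"winner": "traitor", "traitor_id": "Alice", "history": [
        {"turn": 0, "actions": {"Alice": "WAIT", "Bob": "BUILD_SHIP"},
         "rewards": {"Alice": 1.0, "Bob": 1.0}},
    ]},
]
examples = collect_training_examples(toy_episodes)
print(examples)
```

The resulting records would still need to be paired with the observation text for each turn before they could serve as fine-tuning data; `run_episode` does not currently store observations in `history`.
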

In [None]:
def train(num_episodes: int = 10, save_dir: str = "./checkpoints"):
    """
    Training loop for Marooned agents.
    
    TODO: Implement actual training logic:
    - Collect trajectories from multiple episodes
    - Identify successful strategies
    - Fine-tune LLM or use RL
    - Track metrics
    """
    
    os.makedirs(save_dir, exist_ok=True)
    
    results = {
        'episodes': [],
        'win_rates': {'sailors': 0, 'traitor': 0},
        'avg_ship_progress': [],
        'avg_days_survived': [],
    }
    
    for episode in range(num_episodes):
        print(f"\n{'='*60}")
        print(f"Episode {episode + 1}/{num_episodes}")
        print('='*60)
        
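        # Create fresh environment; reset() populates env.state so roles can be read.
        # Note: run_episode() calls reset() again before the first turn.
        env = MaroonedEnv(seed=42 + episode)
        env.reset()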
        
        # Create agents
        agents = {}
        for sailor_id in env.agents:
            is_traitor = (sailor_id == env.state.traitor_id)
            agents[sailor_id] = LLMAgent(sailor_id, is_traitor)
        
        # Run episode
        history, winner, stats = run_episode(env, agents, max_turns=1000, verbose=False)
        
        # Record results
        results['episodes'].append({
            'episode': episode,
            'winner': winner,
            'ship_progress': env.state.ship_progress.total_percentage,
            'days': stats.total_days,
            'deaths': len(stats.deaths),
        })
        
        if winner == 'sailors':
            results['win_rates']['sailors'] += 1
        else:
            # Any other outcome, including an episode cut off at max_turns,
            # is counted as a traitor win
            results['win_rates']['traitor'] += 1
        
        results['avg_ship_progress'].append(env.state.ship_progress.total_percentage)
        results['avg_days_survived'].append(stats.total_days)
        
        print(f"Winner: {winner}, Ship: {env.state.ship_progress.total_percentage}%, Days: {stats.total_days}")
    
    # Calculate final metrics
    results['win_rates']['sailors'] /= num_episodes
    results['win_rates']['traitor'] /= num_episodes
    results['avg_ship_progress'] = np.mean(results['avg_ship_progress'])
    results['avg_days_survived'] = np.mean(results['avg_days_survived'])
    
    # Save results
    with open(os.path.join(save_dir, 'training_results.json'), 'w') as f:
        json.dump(results, f, indent=2)
    
    return results


print("✅ Training function defined")
print("Run: results = train(num_episodes=10)")

## 💾 Save/Load Checkpoints

In [None]:
def save_checkpoint(agents: Dict[str, LLMAgent], path: str):
    """Save agent checkpoints."""
    # TODO: Save LLM fine-tuned weights or prompts
    checkpoint = {
        # Store role and model so load_checkpoint() can rebuild equivalent agents
        'agents': {
            sailor_id: {'is_traitor': agent.is_traitor, 'model': agent.model}
            for sailor_id, agent in agents.items()
        },
        'timestamp': datetime.now().isoformat(),
    }
    
    with open(path, 'w') as f:
        json.dump(checkpoint, f, indent=2)
    
    print(f"✅ Checkpoint saved to {path}")


def load_checkpoint(path: str) -> Dict[str, LLMAgent]:
    """Load agent checkpoints."""
    # TODO: Load LLM fine-tuned weights or prompts
    with open(path, 'r') as f:
        checkpoint = json.load(f)
    
    # Reconstruct agents with their saved role and model
    agents = {}
    for sailor_id, saved in checkpoint['agents'].items():
        agents[sailor_id] = LLMAgent(sailor_id, saved['is_traitor'], saved['model'])
    
    print(f"✅ Checkpoint loaded from {path}")
    return agents


print("✅ Checkpoint functions defined")

## 🎯 Next Steps

**To complete this notebook:**

1. **Implement LLM calling** in `LLMAgent.get_action()`
   - Use OpenAI API, Anthropic, or local LLM
   - Parse responses into actions
   
2. **Add action parsing** in `parse_llm_response()`
   - Handle all action types
   - Validate parameters
   
3. **Implement training loop**
   - Collect successful trajectories
   - Fine-tune on good strategies
   - Use RL or imitation learning
   
4. **Add evaluation metrics**
   - Win rate by role
   - Average ship completion
   - Deception success rate
   - Evidence detection accuracy
   
5. **Create visualizations**
   - Game replay viewer
   - Evidence timeline
   - Communication networks
   - Resource flow diagrams
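
For item 4, win rate and average ship completion can be computed directly from per-episode records like those `train()` appends to `results['episodes']` (`winner`, `ship_progress`, `days`). The sketch below assumes that record shape; `summarize_episodes` is a hypothetical helper, and reporting unfinished games under their own key (rather than as traitor wins) is a design choice of this sketch. Deception success and evidence detection would need data the environment is not shown to expose here, so they are omitted.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List


def summarize_episodes(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-episode records into win rates and simple averages."""
    if not records:
        return {"episodes": 0, "win_rate": {}, "avg_ship_progress": 0.0, "avg_days": 0.0}
    total = len(records)
    # str() keeps a missing winner (None) as its own "None" bucket
    counts = Counter(str(record["winner"]) for record in records)
    return {
        "episodes": total,
        "win_rate": {side: n / total for side, n in counts.items()},
        "avg_ship_progress": mean(record["ship_progress"] for record in records),
        "avg_days": mean(record["days"] for record in records),
    }


# Toy records with made-up numbers, shaped like results['episodes'].
toy_records = [
    {"episode": 0, "winner": "sailors", "ship_progress": 100, "days": 10, "deaths": 0},
    {"episode": 1, "winner": "sailors", "ship_progress": 60, "days": 20, "deaths": 1},
    {"episode": 2, "winner": "traitor", "ship_progress": 40, "days": 30, "deaths": 3},
    {"episode": 3, "winner": None, "ship_progress": 0, "days": 40, "deaths": 0},
]
summary = summarize_episodes(toy_records)
print(summary)
```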

---

**Ready to train some pirate AIs! 🏴‍☠️**