# 🏴‍☠️ MAROONED - Phase 6: LLM Policy Demo

**Goal:** Connect Language Models to the Marooned environment

This notebook demonstrates:
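1. **Observation → Prompt** conversion
2. **LLM Response → Action** parsing
3. **Action validation** against the current game state
4. **Self-play loop** with mock LLM agents
5. **Full pipeline**: observation → prompt → response → action → `env.step`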

---

## Setup & Imports

In [1]:
import sys
sys.path.append('../marooned_env')

from environment import MaroonedEnv
from models import Observation, Action, Position
from llm_interface import (
    observation_to_prompt, 
    parse_llm_response, 
    parse_action_safe,
    validate_action
)
from config import MapLevel, ActionType, ResourceType, SailorRole
import json

## 1. Test Observation → Prompt Conversion

First, let's see how observations are converted to natural language prompts.

In [2]:
# Create environment and get initial observations
env = MaroonedEnv()
observations = env.reset()

# Get first sailor's observation
active_sailor = list(observations.keys())[0]
obs = observations[active_sailor]
sailor_role = env.state.sailors[active_sailor].role.value

print("✅ Environment reset complete")
print(f"📍 Active sailor: {active_sailor}")
print(f"🎭 Sailor role: {sailor_role}")
print(f"👥 All sailors: {list(observations.keys())}")

✅ Environment reset complete
📍 Active sailor: Alice
🎭 Sailor role: traitor
👥 All sailors: ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve']


### 1.1 Colonist Perspective

In [3]:
# Render the observation as a colonist would see it, regardless of the sailor's actual role
prompt_colonist = observation_to_prompt(obs, include_role=True, sailor_role="colonist")

print("=" * 80)
print("COLONIST PROMPT (first 2000 chars):")
print("=" * 80)
print(prompt_colonist[:2000])
print("...")
print(f"(Total length: {len(prompt_colonist)} characters)")

COLONIST PROMPT (first 2000 chars):
DAY 1, TURN 1/100 - MORNING PHASE

⚓ YOUR ROLE: COLONIST

COLONIST OBJECTIVES:
  - Build the ship to 100% before Day 100
  - Identify and eliminate the traitor
  - Gather resources efficiently
  - Share information with team
  - Survive and protect teammates


PHASE CONTEXT:
  Location: All sailors at BASE CAMP
  Allowed: Planning, discussions, voting (if called)
  Restricted: Cannot explore or gather resources yet

YOUR STATUS (Alice):
  Position: (15, 15, <MapLevel.GROUND: 0>)
  Energy: 100/100 ⚡⚡⚡⚡⚡
  Health: healthy
  Backpack: 0/20 items
    (empty)

WHAT YOU SEE (within 5 tiles):
  Resources:
    - WOOD_26 (wood) at (16, 18, <MapLevel.GROUND: 0>) - 1 units [4 tiles away]
    - WOOD_29 (wood) at (18, 13, <MapLevel.GROUND: 0>) - 1 units [5 tiles away]
    - WOOD_41 (wood) at (11, 16, <MapLevel.GROUND: 0>) - 1 units [5 tiles away]
    - METAL_58 (metal) at (12, 14, <MapLevel.GROUND: 0>) - 1 units [4 tiles away]
    - METAL_79 (metal) at (18, 16, <
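
The `[N tiles away]` figures in the prompt are consistent with Manhattan distance from Alice's position (15, 15). Below is a minimal sketch of how such a resource line could be rendered. It assumes Manhattan distance, and `format_resource_line` is a hypothetical helper; neither is confirmed by `llm_interface`.

```python
def manhattan(a, b):
    """Tile distance counting only horizontal and vertical steps."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def format_resource_line(resource_id, kind, pos, quantity, viewer_pos):
    """Render one resource entry in the style of the prompt (hypothetical helper)."""
    dist = manhattan(pos, viewer_pos)
    return f"- {resource_id} ({kind}) at {pos} - {quantity} units [{dist} tiles away]"


viewer = (15, 15)
print(format_resource_line("WOOD_26", "wood", (16, 18), 1, viewer))
print(format_resource_line("METAL_58", "metal", (12, 14), 1, viewer))
```

Both calls give a distance of 4, matching the two entries in the prompt excerpt. The map level is left out of the tuples here for brevity.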

### 1.2 Traitor Perspective

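Rendering the same observation with `sailor_role="traitor"` swaps in the traitor objectives. According to the cell's closing message, the traitor prompt also includes the positions of all sailors, although that part falls outside the excerpt shown here.

In [4]:
# Generate the traitor version of the same prompt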
prompt_traitor = observation_to_prompt(obs, include_role=True, sailor_role="traitor")

print("=" * 80)
print("TRAITOR PROMPT (first 2000 chars):")
print("=" * 80)
print(prompt_traitor[:2000])
print("...")
print("\n🎭 Traitor gets enhanced vision of all sailor positions!")

TRAITOR PROMPT (first 2000 chars):
DAY 1, TURN 1/100 - MORNING PHASE

🎭 YOUR SECRET ROLE: TRAITOR

TRAITOR OBJECTIVES:
  - Prevent the ship from being completed before Day 100
  - Eliminate colonists (reduce to < 3 alive)
  - Use poison, sabotage, and deception
  - Avoid getting voted out
  - Lie convincingly to deflect suspicion


PHASE CONTEXT:
  Location: All sailors at BASE CAMP
  Allowed: Planning, discussions, voting (if called)
  Restricted: Cannot explore or gather resources yet

YOUR STATUS (Alice):
  Position: (15, 15, <MapLevel.GROUND: 0>)
  Energy: 100/100 ⚡⚡⚡⚡⚡
  Health: healthy
  Backpack: 0/20 items
    (empty)

WHAT YOU SEE (within 5 tiles):
  Resources:
    - WOOD_26 (wood) at (16, 18, <MapLevel.GROUND: 0>) - 1 units [4 tiles away]
    - WOOD_29 (wood) at (18, 13, <MapLevel.GROUND: 0>) - 1 units [5 tiles away]
    - WOOD_41 (wood) at (11, 16, <MapLevel.GROUND: 0>) - 1 units [5 tiles away]
    - METAL_58 (metal) at (12, 14, <MapLevel.GROUND: 0>) - 1 units [4 tiles away]

## 2. Test LLM Response → Action Parsing

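Next, parse LLM responses into `Action` objects. Each response uses three labelled fields (`ACTION`, `REASONING`, `MESSAGE`), and `parse_llm_response` returns an `(action, error)` pair.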
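
As an illustration of the response format, here is a minimal sketch that splits a response into its labelled fields with a regular expression. `split_fields` is a hypothetical helper, not the actual implementation behind `parse_llm_response`, which may handle more cases.

```python
import re

# One labelled field per line; [ \t]* avoids swallowing the newline after an empty value.
FIELD_RE = re.compile(r"^(ACTION|REASONING|MESSAGE):[ \t]*(.*)$", re.MULTILINE)


def split_fields(response):
    """Return a dict of the labelled fields found in an LLM response."""
    fields = {}
    for label, value in FIELD_RE.findall(response):
        # Drop surrounding whitespace and the optional quotes around MESSAGE
        fields[label] = value.strip().strip('"')
    return fields


sample = """
ACTION: MOVE NORTH 5
REASONING: Moving toward the northern forest to gather wood
MESSAGE: "Heading north to find wood"
"""
fields = split_fields(sample)
print(fields["ACTION"])
print(fields["MESSAGE"])
```

A response without an `ACTION:` line yields a dict with no `ACTION` key, which is the case a parser would report as a missing action field.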

### 2.1 Movement Actions

In [5]:
test_responses = {
    "Move North": """
ACTION: MOVE NORTH 5
REASONING: Moving toward the northern forest to gather wood
MESSAGE: "Heading north to find wood"
""",
    
    "Move to Cave": """
ACTION: MOVE DOWN
REASONING: Going down to the cave level to find metal
MESSAGE: "Descending to caves"
""",
    
    "Move East": """
ACTION: MOVE EAST 3
REASONING: Moving closer to the ship site
MESSAGE: ""
"""
}

current_pos = obs.position

print("Testing MOVEMENT action parsing:\n")
for name, response in test_responses.items():
    action, error = parse_llm_response(response, active_sailor, current_pos)
    if action:
        print(f"✅ {name}: {action}")
        print(f"   Target: {action.target_position.to_tuple() if action.target_position else 'N/A'}")
    else:
        print(f"❌ {name}: {error}")
    print()

Testing MOVEMENT action parsing:

✅ Move North: Alice: move_north -> (15, 10, <MapLevel.GROUND: 0>) "Heading north to find wood..."
   Target: (15, 10, <MapLevel.GROUND: 0>)

✅ Move to Cave: Alice: climb_down -> (15, 15, <MapLevel.CAVE: -1>) "Descending to caves..."
   Target: (15, 15, <MapLevel.CAVE: -1>)

✅ Move East: Alice: move_east -> (18, 15, <MapLevel.GROUND: 0>)
   Target: (18, 15, <MapLevel.GROUND: 0>)
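
The targets above show the coordinate convention: `NORTH 5` takes y from 15 to 10, `EAST 3` takes x from 15 to 18, and `DOWN` lowers the level from 0 to -1. Here is a minimal sketch of that arithmetic using plain tuples. The `SOUTH`, `WEST`, and `UP` entries are assumed by symmetry, and `move_target` is a hypothetical helper rather than part of `llm_interface`.

```python
# (dx, dy, dlevel) per direction. NORTH, EAST and DOWN match the output above;
# SOUTH, WEST and UP are assumptions made by symmetry.
DELTAS = {
    "NORTH": (0, -1, 0),
    "SOUTH": (0, 1, 0),
    "EAST": (1, 0, 0),
    "WEST": (-1, 0, 0),
    "UP": (0, 0, 1),
    "DOWN": (0, 0, -1),
}


def move_target(pos, direction, distance=1):
    """Compute the (x, y, level) reached by moving `distance` steps in `direction`."""
    dx, dy, dlevel = DELTAS[direction.upper()]
    x, y, level = pos
    return (x + dx * distance, y + dy * distance, level + dlevel * distance)


start = (15, 15, 0)
print(move_target(start, "NORTH", 5))
print(move_target(start, "DOWN"))
print(move_target(start, "EAST", 3))
```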



### 2.2 Resource Gathering Actions

In [6]:
gather_responses = {
    "Gather Wood": """
ACTION: GATHER WOOD_001
REASONING: This wood pile is closest and we need wood for the hull
MESSAGE: "Gathering wood from the northern cluster"
""",
    
    "Gather Metal": """
ACTION: GATHER METAL_005
REASONING: Metal is needed for ship components
MESSAGE: "Collecting metal scraps"
"""
}

print("Testing GATHERING action parsing:\n")
for name, response in gather_responses.items():
    action, error = parse_llm_response(response, active_sailor, current_pos)
    if action:
        print(f"✅ {name}: {action}")
        print(f"   Resource ID: {action.target_resource_id}")
        print(f"   Message: {action.message_content}")
    else:
        print(f"❌ {name}: {error}")
    print()

Testing GATHERING action parsing:

✅ Gather Wood: Alice: gather_resource "Gathering wood from the northern cluster..."
   Resource ID: WOOD_001
   Message: Gathering wood from the northern cluster

✅ Gather Metal: Alice: gather_resource "Collecting metal scraps..."
   Resource ID: METAL_005
   Message: Collecting metal scraps



### 2.3 Ship Building Actions

In [7]:
build_responses = {
    "Build Hull": """
ACTION: BUILD hull
REASONING: We have enough wood and metal, let's start the foundation
MESSAGE: "Starting hull construction with the team"
""",
    
    "Build Mast": """
ACTION: BUILD mast
REASONING: Hull is complete, time for the mast
MESSAGE: "Working on the mast"
"""
}

print("Testing BUILDING action parsing:\n")
for name, response in build_responses.items():
    action, error = parse_llm_response(response, active_sailor, current_pos)
    if action:
        print(f"✅ {name}: {action}")
        print(f"   Component: {action.ship_component.value if action.ship_component else 'N/A'}")
    else:
        print(f"❌ {name}: {error}")
    print()

Testing BUILDING action parsing:

✅ Build Hull: Alice: build_ship "Starting hull construction with the team..."
   Component: hull

✅ Build Mast: Alice: build_ship "Working on the mast..."
   Component: mast



### 2.4 Communication & Voting Actions

In [8]:
social_responses = {
    "Send Message": """
ACTION: SAY Hello everyone, I found wood at coordinates (15, 20)!
REASONING: Sharing resource location with the team
MESSAGE: ""
""",
    
    "Call Vote": """
ACTION: CALL_VOTE
REASONING: Bob has too much suspicious evidence against him
MESSAGE: "I think we need to vote on Bob"
""",
    
    "Cast Vote": """
ACTION: VOTE Eve
REASONING: Evidence shows she collected poison and Bob died of poisoning
MESSAGE: "Voting for Eve based on the poison evidence"
"""
}

print("Testing SOCIAL action parsing:\n")
for name, response in social_responses.items():
    action, error = parse_llm_response(response, active_sailor, current_pos)
    if action:
        print(f"✅ {name}: {action}")
        if action.action_type == ActionType.VOTE:
            print(f"   Vote target: {action.vote_target}")
        elif action.action_type == ActionType.SEND_MESSAGE:
            print(f"   Message: {action.message_content}")
    else:
        print(f"❌ {name}: {error}")
    print()

Testing SOCIAL action parsing:

✅ Send Message: Alice: send_message "Hello everyone, I found wood at coordinates (15, 2..."
   Message: Hello everyone, I found wood at coordinates (15, 20)!

✅ Call Vote: Alice: call_vote "I think we need to vote on Bob..."

✅ Cast Vote: Alice: vote "Voting for Eve based on the poison evidence..."
   Vote target: Eve



### 2.5 Traitor-Specific Actions

In [9]:
traitor_responses = {
    "Sabotage Hull": """
ACTION: SABOTAGE hull
REASONING: Damage their progress to delay ship completion
MESSAGE: "I'll work on reinforcing the hull..."
""",
    
    "Poison Sailor": """
ACTION: POISON Charlie
REASONING: Charlie is getting too suspicious, need to eliminate him
MESSAGE: "Here Charlie, have some food"
"""
}

print("Testing TRAITOR action parsing:\n")
for name, response in traitor_responses.items():
    action, error = parse_llm_response(response, active_sailor, current_pos)
    if action:
        print(f"✅ {name}: {action}")
        if action.target_sailor:
            print(f"   Target sailor: {action.target_sailor}")
        if action.ship_component:
            print(f"   Target component: {action.ship_component.value}")
    else:
        print(f"❌ {name}: {error}")
    print()

Testing TRAITOR action parsing:

✅ Sabotage Hull: Alice: sabotage_ship "I'll work on reinforcing the hull......"
   Target component: hull

✅ Poison Sailor: Alice: offer_food (target: Charlie) "Here Charlie, have some food..."
   Target sailor: Charlie
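
The outputs in sections 2.2 through 2.5 show which action name each command keyword resolves to. Note that `POISON` is reported as `offer_food`. A minimal sketch of such a lookup follows, with the table taken from those outputs. `resolve_command` is a hypothetical helper, and `MOVE` is left out because its action name depends on the direction.

```python
# Keyword -> action name, as observed in the parser outputs above.
COMMAND_TO_ACTION = {
    "GATHER": "gather_resource",
    "BUILD": "build_ship",
    "SAY": "send_message",
    "CALL_VOTE": "call_vote",
    "VOTE": "vote",
    "SABOTAGE": "sabotage_ship",
    "POISON": "offer_food",
    "WAIT": "wait",
}


def resolve_command(action_line):
    """Split 'COMMAND args' and look up the action name.

    Returns (action_name, args, error); action_name is None on failure.
    """
    parts = action_line.strip().split(maxsplit=1)
    if not parts:
        return None, "", "Empty ACTION field"
    command = parts[0].upper()
    args = parts[1] if len(parts) > 1 else ""
    name = COMMAND_TO_ACTION.get(command)
    if name is None:
        return None, args, f"Unknown command: {command}"
    return name, args, None


print(resolve_command("POISON Charlie"))
print(resolve_command("INVALID_COMMAND"))
```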



### 2.6 Error Handling & Fallback

Test `parse_action_safe`, which falls back to a WAIT action when parsing fails and records the error in the action's message.

In [10]:
bad_responses = [
    "This is completely wrong formatting",
    "ACTION: INVALID_COMMAND\nREASONING: This won't work",
    "No action field at all",
    "ACTION: MOVE\nREASONING: Missing direction",
]

print("Testing ERROR HANDLING (should fallback to WAIT):\n")
for i, response in enumerate(bad_responses):
    print(f"Test {i+1}: {response[:50]}...")
    action = parse_action_safe(response, active_sailor, current_pos)
    print(f"  → Result: {action.action_type.value}")
    print(f"  → Message: {action.message_content}\n")

Testing ERROR HANDLING (should fallback to WAIT):

Test 1: This is completely wrong formatting...
⚠️  Action parsing failed: No ACTION field found in response
⚠️  Defaulting to WAIT action
  → Result: wait
  → Message: [Parse error: No ACTION field found in response]

Test 2: ACTION: INVALID_COMMAND
REASONING: This won't work...
⚠️  Action parsing failed: Unknown command: INVALID_COMMAND
⚠️  Defaulting to WAIT action
  → Result: wait
  → Message: [Parse error: Unknown command: INVALID_COMMAND]

Test 3: No action field at all...
⚠️  Action parsing failed: No ACTION field found in response
⚠️  Defaulting to WAIT action
  → Result: wait
  → Message: [Parse error: No ACTION field found in response]

Test 4: ACTION: MOVE
REASONING: Missing direction...
⚠️  Action parsing failed: MOVE requires direction (NORTH/SOUTH/EAST/WEST/UP/DOWN)
⚠️  Defaulting to WAIT action
  → Result: wait
  → Message: [Parse error: MOVE requires direction (NORTH/SOUTH/EAST/WEST/UP/DOWN)]
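
The fallback pattern shown in this output can be sketched as a thin wrapper around any parser that returns an `(action, error)` pair. Everything below (`parse_safe`, `toy_parse`, `wait_with_note`, and the dict-based actions) is hypothetical and only illustrates the pattern; it is not the code behind `parse_action_safe`.

```python
def parse_safe(parse_fn, response, fallback_factory):
    """Call parse_fn(response) -> (action, error); return a fallback action on failure."""
    try:
        action, error = parse_fn(response)
    except Exception as exc:  # a malformed response should not crash the game loop
        action, error = None, str(exc)
    if action is None:
        return fallback_factory(f"[Parse error: {error}]")
    return action


def toy_parse(response):
    """Toy parser: accepts any response containing a line that starts with 'ACTION:'."""
    for line in response.splitlines():
        if line.startswith("ACTION:"):
            return {"type": line[len("ACTION:"):].strip().lower()}, None
    return None, "No ACTION field found in response"


def wait_with_note(note):
    """Build the fallback WAIT action, keeping the error text for later inspection."""
    return {"type": "wait", "message": note}


print(parse_safe(toy_parse, "No action field at all", wait_with_note))
print(parse_safe(toy_parse, "ACTION: WAIT", wait_with_note))
```

Keeping the error text in the fallback action makes parse failures visible in game logs without interrupting the turn.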



## 3. Action Validation

`validate_action` checks an action against the sailor's current observation and returns a `(valid, message)` pair.

In [11]:
# Create some test actions
test_action_move = Action(
    sailor_id=active_sailor,
    action_type=ActionType.MOVE_NORTH,
    target_position=Position(15, 10, MapLevel.GROUND)
)

test_action_vote = Action(
    sailor_id=active_sailor,
    action_type=ActionType.VOTE,
    vote_target="Bob"
)

test_action_wait = Action(
    sailor_id=active_sailor,
    action_type=ActionType.WAIT
)

print("Testing ACTION VALIDATION:\n")

# Movement: validity depends on the phase (not allowed in the morning phase)
valid, msg = validate_action(test_action_move, obs)
print(f"MOVE in {obs.phase} phase: {'✅ Valid' if valid else '❌ Invalid - ' + msg}")

# Voting: invalid unless a voting session is active
valid, msg = validate_action(test_action_vote, obs)
print(f"VOTE in {obs.phase} phase: {'✅ Valid' if valid else '❌ Invalid - ' + msg}")

# Test wait (always valid)
valid, msg = validate_action(test_action_wait, obs)
print(f"WAIT in {obs.phase} phase: {'✅ Valid' if valid else '❌ Invalid - ' + msg}")

Testing ACTION VALIDATION:

MOVE in morning phase: ❌ Invalid - Action move_north not allowed in morning phase
VOTE in morning phase: ❌ Invalid - No active voting session
WAIT in morning phase: ✅ Valid
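
The two rejection messages above suggest two kinds of checks: a per-phase allow-list and a voting-session check. A minimal sketch of that idea follows, under loud assumptions: the allow-list is inferred from the morning-phase prompt in section 1.1 ("Planning, discussions, voting (if called)"), other phases are treated as unrestricted because they are not documented here, and `validate` is a hypothetical stand-in for `validate_action`.

```python
# Hypothetical phase table: only the morning phase is described in this notebook.
ALLOWED_BY_PHASE = {
    "morning": {"wait", "send_message", "call_vote", "vote"},
}


def validate(action_type, phase, voting_active=False):
    """Return (is_valid, reason), the same shape validate_action returns."""
    allowed = ALLOWED_BY_PHASE.get(phase)
    # Phases missing from the table are treated as unrestricted in this sketch
    if allowed is not None and action_type not in allowed:
        return False, f"Action {action_type} not allowed in {phase} phase"
    if action_type == "vote" and not voting_active:
        return False, "No active voting session"
    return True, ""


for action_type in ("move_north", "vote", "wait"):
    print(action_type, validate(action_type, "morning"))
```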


## 4. Simple Self-Play Loop (Mock LLM)

Let's create a simple mock LLM and run a few turns to demonstrate the integration.

### 4.1 Mock LLM Agent

This simulates an LLM by returning scripted responses based on the game state.

In [12]:
import random


class MockLLMAgent:
    """Simulates LLM responses for testing"""

    def __init__(self, strategy="explorer"):
        self.strategy = strategy
        self.turn_count = 0

    def generate_response(self, prompt: str, observation: Observation) -> str:
        """Generate a mock response based on simple rules"""
        self.turn_count += 1

        # Simple strategy: explore and gather
        if self.strategy == "explorer":
            # Look for nearby resources
            if observation.spatial_view.visible_resources:
                resource = observation.spatial_view.visible_resources[0]
                return f"""
ACTION: GATHER {resource.resource_id}
REASONING: Found {resource.resource_type.value} nearby, gathering for the team
MESSAGE: "Gathering {resource.resource_type.value} at {resource.position.to_tuple()}"
"""
            else:
                # Move randomly to explore
                directions = ["NORTH", "SOUTH", "EAST", "WEST"]
                direction = random.choice(directions)
                return f"""
ACTION: MOVE {direction} 2
REASONING: Exploring to find resources
MESSAGE: "Moving {direction.lower()} to explore"
"""
        
        elif self.strategy == "builder":
            # Focus on depositing and building
            if observation.backpack:
                return f"""
ACTION: DEPOSIT {observation.backpack[0].resource_type.value} 1
REASONING: Depositing resources to common inventory
MESSAGE: "Adding resources to the stockpile"
"""
            else:
                return """
ACTION: WAIT
REASONING: No resources to deposit, conserving energy
MESSAGE: "Resting at camp"
"""
        
        else:
            return """
ACTION: WAIT
REASONING: Default action
MESSAGE: ""
"""

print("✅ Mock LLM Agent created")

✅ Mock LLM Agent created
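
Because the loop only relies on `generate_response(prompt, observation)` returning text, a real model can be swapped in behind the same interface. A minimal sketch follows, assuming a text-in, text-out completion function. `PolicyAgent`, `CallableAgent`, and `always_wait` are hypothetical names, and no specific LLM client library is implied.

```python
from typing import Any, Callable, Protocol


class PolicyAgent(Protocol):
    """Anything with this method can replace MockLLMAgent in the loop."""

    def generate_response(self, prompt: str, observation: Any) -> str: ...


class CallableAgent:
    """Wraps a text-in, text-out function (for example, a call to a hosted model)."""

    def __init__(self, complete: Callable[[str], str]):
        self.complete = complete
        self.turn_count = 0

    def generate_response(self, prompt: str, observation: Any) -> str:
        self.turn_count += 1
        # The observation is unused here: the prompt already encodes it
        return self.complete(prompt)


def always_wait(prompt: str) -> str:
    """Placeholder policy that stands in for a real model call."""
    return 'ACTION: WAIT\nREASONING: Placeholder policy\nMESSAGE: ""'


agent: PolicyAgent = CallableAgent(always_wait)
print(agent.generate_response("DAY 1, TURN 1/100 - MORNING PHASE", observation=None))
```

Replacing `always_wait` with a function that sends the prompt to a model and returns its text would leave the rest of the loop unchanged.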


### 4.2 Run Self-Play Loop

Execute several turns with the mock LLM making decisions.

In [13]:
# Reset environment
env = MaroonedEnv()
observations = env.reset()

# Create mock agents for each sailor
agents = {
    sailor_id: MockLLMAgent(strategy="explorer")
    for sailor_id in observations.keys()
}

print("🎮 Starting self-play loop with Mock LLM agents...\n")
print("=" * 80)

# Run for 10 turns
for turn in range(10):
    # Get active sailor (cycle through all sailors)
    sailor_ids = list(observations.keys())
    active_sailor = sailor_ids[turn % len(sailor_ids)]
    obs = observations[active_sailor]
    sailor_role = env.state.sailors[active_sailor].role.value
    
    print(f"\n🔄 TURN {turn + 1}")
    print(f"   Active: {active_sailor} ({sailor_role})")
    print(f"   Day: {obs.day}, Turn: {obs.turn}, Phase: {obs.phase}")
    print(f"   Position: {obs.position.to_tuple()}, Energy: {obs.energy}")
    
    # Generate prompt
    prompt = observation_to_prompt(obs, include_role=True, sailor_role=sailor_role)
    
    # Get LLM response (mocked)
    agent = agents[active_sailor]
    llm_response = agent.generate_response(prompt, obs)
    
    # Parse response to action
    action = parse_action_safe(llm_response, active_sailor, obs.position)
    
    print(f"   Action: {action.action_type.value}", end="")
    if action.target_resource_id:
        print(f" (resource: {action.target_resource_id})", end="")
    if action.target_position:
        print(f" (to: {action.target_position.to_tuple()})", end="")
    print()
    
    if action.message_content:
        print(f"   💬 \"{action.message_content}\"")
    
    # Validate action (warn only; an invalid action is still submitted to the environment)
    is_valid, validation_msg = validate_action(action, obs)
    if not is_valid:
        print(f"   ⚠️  Validation warning: {validation_msg}")
    
    # Execute: submit WAIT for every other sailor so that only active_sailor acts this turn
    actions_dict = {sid: Action(sailor_id=sid, action_type=ActionType.WAIT) for sid in sailor_ids}
    actions_dict[active_sailor] = action
    
    observations, rewards, dones, truncated, info = env.step(actions_dict)
    
    # Show result
    reward = rewards.get(active_sailor, 0)
    if reward != 0:
        print(f"   🎁 Reward: {reward:+.2f}")
    
    if dones.get(active_sailor, False):
        print(f"\n🏁 Game ended!")
        print(f"   Reason: {info.get(active_sailor, {}).get('termination_reason', 'Unknown')}")
        break
    
    print("-" * 80)

print("\n✅ Self-play loop completed")

🎮 Starting self-play loop with Mock LLM agents...


🔄 TURN 1
   Active: Alice (honest)
   Day: 1, Turn: 1, Phase: morning
   Position: (15, 15, <MapLevel.GROUND: 0>), Energy: 100
   Action: gather_resource (resource: WOOD_27)
   💬 "Gathering wood at (12, 13, <MapLevel.GROUND: 0>)"
   🎁 Reward: +0.04
--------------------------------------------------------------------------------

🔄 TURN 2
   Active: Bob (honest)
   Day: 1, Turn: 2, Phase: morning
   Position: (15, 15, <MapLevel.GROUND: 0>), Energy: 100
   Action: gather_resource (resource: WOOD_27)
   💬 "Gathering wood at (12, 13, <MapLevel.GROUND: 0>)"
   🎁 Reward: +0.04
--------------------------------------------------------------------------------

🔄 TURN 3
   Active: Charlie (traitor)
   Day: 1, Turn: 3, Phase: morning
   Position: (15, 15, <MapLevel.GROUND: 0>), Energy: 100
   Action: gather_resource (resource: WOOD_27)
   💬 "Gathering wood at (12, 13, <MapLevel.GROUND: 0>)"
   🎁 Reward: -0.01
------------------------------------

## 5. Full Observation → Action Pipeline

Walk through the complete pipeline step by step: observation, prompt, LLM response, parsed action, validation, and execution.

In [14]:
# Reset for clean state
env = MaroonedEnv()
observations = env.reset()

# Get first sailor
active_sailor = list(observations.keys())[0]
obs = observations[active_sailor]
sailor_role = env.state.sailors[active_sailor].role.value

print("📊 COMPLETE PIPELINE DEMONSTRATION")
print("=" * 80)
print()

# Step 1: Observation
print("STEP 1: Current Observation")
print("-" * 80)
print(f"Sailor: {obs.sailor_id}")
print(f"Position: {obs.position.to_tuple()}")
print(f"Energy: {obs.energy}/100")
print(f"Phase: {obs.phase}")
print(f"Visible resources: {len(obs.spatial_view.visible_resources)}")
print()

# Step 2: Generate Prompt
print("STEP 2: Generate LLM Prompt")
print("-" * 80)
prompt = observation_to_prompt(obs, include_role=True, sailor_role=sailor_role)
print(f"Prompt length: {len(prompt)} characters")
print(f"First 500 chars:\n{prompt[:500]}...")
print()

# Step 3: Simulate LLM Response
print("STEP 3: LLM Response (simulated)")
print("-" * 80)
mock_llm_response = """
ACTION: MOVE NORTH 3
REASONING: Moving toward visible resources in the northern area
MESSAGE: "Heading north to explore the forest"
"""
print(mock_llm_response)

# Step 4: Parse to Action
print("STEP 4: Parse Response to Action Object")
print("-" * 80)
action, error = parse_llm_response(mock_llm_response, active_sailor, obs.position)
if action:
    print(f"✅ Successfully parsed:")
    print(f"   Type: {action.action_type.value}")
    print(f"   Sailor: {action.sailor_id}")
    print(f"   Target: {action.target_position.to_tuple() if action.target_position else 'N/A'}")
    print(f"   Message: {action.message_content}")
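else:
    print(f"❌ Parse error: {error}")
    # Fall back to WAIT so the validation and execution steps below still have an action
    action = Action(sailor_id=active_sailor, action_type=ActionType.WAIT)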
print()

# Step 5: Validate Action
print("STEP 5: Validate Action")
print("-" * 80)
is_valid, validation_msg = validate_action(action, obs)
print(f"Valid: {'✅ Yes' if is_valid else '❌ No'}")
if not is_valid:
    print(f"Reason: {validation_msg}")
print()

# Step 6: Execute in Environment
print("STEP 6: Execute Action in Environment")
print("-" * 80)

# Create actions dict for all sailors (others wait)
actions_dict = {sid: Action(sailor_id=sid, action_type=ActionType.WAIT) for sid in observations.keys()}
actions_dict[active_sailor] = action

observations, rewards, dones, truncated, info = env.step(actions_dict)
new_obs = observations[active_sailor]
reward = rewards.get(active_sailor, 0)
done = dones.get(active_sailor, False)

print(f"New position: {new_obs.position.to_tuple()}")
print(f"Energy change: {obs.energy} → {new_obs.energy}")
print(f"Reward: {reward:+.2f}")
print(f"Game done: {done}")
print()

print("=" * 80)
print("✅ Pipeline demonstration complete!")

📊 COMPLETE PIPELINE DEMONSTRATION

STEP 1: Current Observation
--------------------------------------------------------------------------------
Sailor: Alice
Position: (15, 15, <MapLevel.GROUND: 0>)
Energy: 100/100
Phase: morning
Visible resources: 18

STEP 2: Generate LLM Prompt
--------------------------------------------------------------------------------
Prompt length: 6123 characters
First 500 chars:
DAY 1, TURN 1/100 - MORNING PHASE

⚓ YOUR ROLE: COLONIST

COLONIST OBJECTIVES:
  - Build the ship to 100% before Day 100
  - Identify and eliminate the traitor
  - Gather resources efficiently
  - Share information with team
  - Survive and protect teammates


PHASE CONTEXT:
  Location: All sailors at BASE CAMP
  Allowed: Planning, disc...

STEP 3: LLM Response (simulated)
--------------------------------------------------------------------------------

ACTION: MOVE NORTH 3
REASONING: Moving toward visible resources in the northern area
MESSAGE: "Heading north to explore the forest"
