# Tutorial 3: Full Cycle + Multi-Agent - Complete Autonomous Production

Welcome to the final CoGames tutorial! You'll now train agents to perform the **complete production cycle**: forage resources, craft hearts, and deposit them. Then, you'll scale to **multiple agents** working in parallel.

## 🎯 Learning Objectives

- Understand resource foraging mechanics
- Master the complete 3-step task chain (forage → craft → deposit)
- Scale from single to multi-agent training
- Observe emergent coordination behaviors
- Analyze multi-agent performance metrics

## 📋 Task Overview

### Part A: Single-Agent Foraging (100k steps)

**Starting State:**
- Agent spawns in a 15x15 map
- Agent starts with oxygen, germanium, silicon BUT NO CARBON
- Must forage carbon from extractors
- 3 carbon extractors, 1 assembler, 1 chest placed randomly

**Goal:**
1. Navigate to carbon extractor
2. Forage carbon (by moving into extractor)
3. Navigate to assembler
4. Craft hearts (1C + 1O + 1Ge + 1Si → 1 heart)
5. Navigate to chest
6. Deposit hearts
7. Repeat!
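
The goal loop above amounts to a small decision rule over the agent's inventory. The sketch below is purely illustrative: `next_subgoal` and `RECIPE` are hypothetical names, not part of CoGames, and a trained policy has to learn this ordering from reward rather than follow a hand-written rule.

```python
# Hypothetical helper (not a CoGames API): pick the next subtask from an inventory.
RECIPE = {"carbon": 1, "oxygen": 1, "germanium": 1, "silicon": 1}  # cost of 1 heart


def next_subgoal(inventory: dict) -> str:
    """Return the next step in the forage -> craft -> deposit chain."""
    if inventory.get("heart", 0) > 0:
        return "deposit"  # hearts in hand: take them to the chest
    missing = [r for r, n in RECIPE.items() if inventory.get(r, 0) < n]
    if not missing:
        return "craft"  # every ingredient is available: go to the assembler
    if missing == ["carbon"]:
        return "forage"  # only carbon is short: visit an extractor
    return "done"  # another ingredient ran out and cannot be replenished here


start = {"oxygen": 5, "germanium": 5, "silicon": 5}
print(next_subgoal(start))                   # forage
print(next_subgoal({**start, "carbon": 1}))  # craft
```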

### Part B: Multi-Agent Coordination (200k steps)

**Starting State:**
- **4 agents** spawn in the same 15x15 map
- All agents share the same map objects (extractors, assembler, chest) and the same goal; each agent carries its own inventory
- Same objects as Part A

**Goal:**
- Agents must coordinate implicitly
- Share extractors efficiently
- Avoid collisions
- Maximize total team return (the sum of the agents' individual rewards)

**Expected Training Time:** 
- Part A: ~10-15 minutes
- Part B: ~15-20 minutes

---


## 1. Setup and Imports


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
from pathlib import Path

# CoGames imports
from cogames.cogs_vs_clips.scenarios import make_game
from cogames.policy.simple import SimplePolicy
from cogames.train import train
from mettagrid import MettaGridEnv
from mettagrid.config.mettagrid_config import RecipeConfig

# Import visualization utilities
from tutorial_viz import (
    plot_episode_returns,
    plot_success_rate,
    plot_crafting_subtasks,
    plot_multiagent_returns,
    plot_coordination_metrics,
    evaluate_policy,
    print_metrics_table,
)

# Set random seeds for reproducibility (NumPy and PyTorch)
np.random.seed(42)
torch.manual_seed(42)

print("✅ Imports complete!")


## 2. Locate the Stage 2 Checkpoint

In [None]:
# Try to find Stage 2 checkpoint
stage2_checkpoint_dir = Path("./checkpoints_stage2/cogames.cogs_vs_clips")
stage2_checkpoints = sorted(stage2_checkpoint_dir.glob("*.pt")) if stage2_checkpoint_dir.exists() else []

if stage2_checkpoints:
    initial_weights_path = str(stage2_checkpoints[-1])
    print(f"✅ Found Stage 2 checkpoint: {stage2_checkpoints[-1].name}")
    print("   Will use transfer learning from Tutorial 2")
else:
    initial_weights_path = None
    print("ℹ️  No Stage 2 checkpoint found")
    print("   Will train from scratch")
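
`sorted()` orders the checkpoint paths lexicographically, so the last element is the newest file only if the filenames happen to sort in training order (for example, zero-padded step counts). The naming scheme is not shown in this tutorial, so a hedged alternative is to pick the most recently modified file. `latest_checkpoint` below is a hypothetical helper, not a CoGames function.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: Path, pattern: str = "*.pt") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    if not directory.exists():
        return None
    candidates = list(directory.glob(pattern))
    if not candidates:
        return None
    # Compare modification times instead of relying on filename order.
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Prints None when the Stage 2 directory has not been created yet.
print(latest_checkpoint(Path("./checkpoints_stage2/cogames.cogs_vs_clips")))
```

Modification time is not perfect either (copying files can reset it), so this is a sketch of one reasonable choice rather than the recommended way to resume training.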


## Part A: Single-Agent Foraging

### 3. Configure Foraging Environment

Now we'll add the final complexity: **resource extraction**. The agent must forage carbon before it can craft hearts.

**Key additions:**
- 3 carbon extractors (distributed around map)
- Agent starts WITHOUT carbon
- Must forage → craft → deposit


In [None]:
# Create foraging configuration (single agent)
config_single = make_game(
    num_cogs=1,
    width=15,
    height=15,
    num_assemblers=1,
    num_chests=1,
    num_chargers=0,
    num_carbon_extractors=3,  # ADD: Carbon extractors
    num_oxygen_extractors=0,
    num_germanium_extractors=0,
    num_silicon_extractors=0,
)

# Agent starts with 3 resources but MUST forage carbon
config_single.game.agent.initial_inventory = {
    "energy": 100,
    "oxygen": 5,
    "germanium": 5,
    "silicon": 5,
    # NO carbon! Must forage it
}

# Configure carbon extractors
config_single.game.objects["carbon_extractor"].recipes[0][1].output_resources = {
    "carbon": 1  # Give 1 carbon per extraction
}

# Same crafting recipe as Stage 2
config_single.game.objects["assembler"].recipes = [
    (
        ["Any"],
        RecipeConfig(
            input_resources={
                "carbon": 1,
                "oxygen": 1,
                "germanium": 1,
                "silicon": 1,
            },
            output_resources={"heart": 1},
            cooldown=1,
        ),
    )
]

# Configure chest
config_single.game.objects["chest"].deposit_positions = ["N", "S", "E", "W"]
config_single.game.objects["chest"].withdrawal_positions = []

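# Same reward as Stage 2: +1.0 per unit of the `heart.lost` stat, which is expected to
# increase when a heart leaves the agent's inventory (here, when it is deposited)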
config_single.game.agent.rewards.stats = {
    "heart.lost": 1.0
}

print("✅ Single-agent foraging environment configured!")
print(f"   Map size: {config_single.game.width}x{config_single.game.height}")
print(f"   Carbon extractors: 3")
print(f"   Initial carbon: 0 (must forage)")
print(f"   Task: forage → craft → deposit")


### 4. Understanding Foraging Mechanics

**How does foraging work?**

Similar to crafting, there's no explicit "FORAGE" action. Instead:
1. Agent moves **into** an extractor from an adjacent cell, in any direction
2. If extractor has resources available, resources are added to agent's inventory
3. Extractor may have cooldown or limited resources

**Example Flow:**
```
1. Agent at (5, 5), Carbon Extractor at (6, 5)
2. Agent has: {carbon: 0, oxygen: 5, germanium: 5, silicon: 5}
3. Agent action: MOVE EAST → Agent moves to (6, 5)
4. Environment detects: Agent on extractor
5. Extraction triggers: Carbon added to inventory
6. Agent now has: {carbon: 1, oxygen: 5, ...}
```
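
The example flow can be mirrored in a few lines of plain Python. This is a toy model of the "move into the extractor" rule exactly as described above, not MettaGrid code: `move`, `MOVES`, and `yield_per_use` are hypothetical names, and cooldowns and limited extractor stock are ignored.

```python
# Toy model of foraging by movement (illustrative only, not the MettaGrid implementation).
MOVES = {"north": (0, -1), "south": (0, 1), "east": (1, 0), "west": (-1, 0)}


def move(agent_pos, direction, inventory, extractor_pos, yield_per_use=1):
    """Apply one move and add carbon if the target cell holds the extractor."""
    dx, dy = MOVES[direction]
    target = (agent_pos[0] + dx, agent_pos[1] + dy)
    new_inventory = dict(inventory)  # leave the caller's inventory untouched
    if target == extractor_pos:
        new_inventory["carbon"] = new_inventory.get("carbon", 0) + yield_per_use
    return target, new_inventory


pos, inv = move((5, 5), "east", {"carbon": 0, "oxygen": 5}, extractor_pos=(6, 5))
print(pos, inv)  # (6, 5) {'carbon': 1, 'oxygen': 5}
```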

**Complete Cycle:**
1. Forage carbon from an extractor (1 carbon per heart, so up to 5 in total)
2. Navigate to assembler
3. Craft heart (consumes 1C+1O+1Ge+1Si)
4. Navigate to chest
5. Deposit heart (+1 reward)
6. Repeat until the oxygen, germanium, or silicon runs out (at most 5 hearts)!
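
The starting inventory caps how many hearts one episode can yield: each heart consumes one unit of every ingredient, and only carbon can be replenished on this map. A minimal sketch of that arithmetic follows, using the inventory and recipe values from the configuration cell; `max_hearts` is a hypothetical helper.

```python
RECIPE = {"carbon": 1, "oxygen": 1, "germanium": 1, "silicon": 1}  # cost of 1 heart


def max_hearts(inventory, foraged_carbon=0, recipe=RECIPE):
    """Upper bound on craftable hearts: the scarcest ingredient decides."""
    stock = dict(inventory)
    stock["carbon"] = stock.get("carbon", 0) + foraged_carbon
    return min(stock.get(resource, 0) // cost for resource, cost in recipe.items())


start = {"energy": 100, "oxygen": 5, "germanium": 5, "silicon": 5}
print(max_hearts(start))                    # 0
print(max_hearts(start, foraged_carbon=5))  # 5
print(max_hearts(start, foraged_carbon=8))  # 5
```

Foraging more than 5 carbon brings no further benefit, because oxygen, germanium, and silicon become the limiting ingredients.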


### 5. Train Single-Agent on Full Cycle

Training for 100k steps to learn the complete foraging → crafting → depositing sequence.
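
For a rough sense of scale, the step budget can be converted into a number of policy updates. The sketch assumes that `batch_size` is the number of environment steps collected per update and that those steps are split evenly across `vector_num_envs` environments; both are assumptions about the trainer, not something this tutorial confirms. `training_budget` is a hypothetical helper, and the numbers are the settings used in the training cells (`batch_size=512`, `vector_num_envs=4`).

```python
def training_budget(num_steps, batch_size, num_envs):
    """Return (full updates, steps per environment per update) under the stated assumptions."""
    updates = num_steps // batch_size
    steps_per_env = batch_size // num_envs
    return updates, steps_per_env


print(training_budget(100_000, 512, 4))  # (195, 128)
print(training_budget(200_000, 512, 4))  # (390, 128)
```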


In [None]:
%%time

# Set up checkpoint directory
checkpoint_dir_single = Path("./checkpoints_stage3_single")
checkpoint_dir_single.mkdir(parents=True, exist_ok=True)

print("🚀 Starting single-agent training (100k steps)...")
print("=" * 60)

train(
    env_cfg=config_single,
    policy_class_path="cogames.policy.simple.SimplePolicy",
    device=torch.device("cpu"),
    initial_weights_path=initial_weights_path,  # Transfer from Stage 2
    num_steps=100_000,
    checkpoints_path=checkpoint_dir_single,
    seed=42,
    batch_size=512,
    minibatch_size=512,
    vector_num_envs=4,
    vector_num_workers=1,
)

print("=" * 60)
print("✅ Single-agent training complete!")


### 6. Evaluate Single-Agent Performance


In [None]:
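# Load the last checkpoint in filename order; fail early with a clear message if none exist
checkpoint_files_single = sorted((checkpoint_dir_single / "cogames.cogs_vs_clips").glob("*.pt"))
if not checkpoint_files_single:
    raise FileNotFoundError(f"No checkpoints found under {checkpoint_dir_single}; run the training cell first.")
latest_checkpoint_single = checkpoint_files_single[-1]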
print(f"📂 Loading checkpoint: {latest_checkpoint_single.name}")

dummy_env_single = MettaGridEnv(env_cfg=config_single)
device = torch.device("cpu")
trained_policy_single = SimplePolicy(dummy_env_single, device)
trained_policy_single.load_policy_data(str(latest_checkpoint_single))

print("✅ Policy loaded!")

# Evaluate
print("\n📊 Evaluating single-agent policy...")
metrics_single = evaluate_policy(
    config=config_single,
    policy=trained_policy_single,
    num_episodes=100,
    max_steps=400,  # Longer episodes for full cycle
    seed=42
)
print(f"✅ Evaluation complete!")


### 7. Visualize Single-Agent Results


In [None]:
# Plot results
fig = plot_episode_returns(
    metrics_single['episode_returns'],
    metrics_single['episode_lengths'],
    window_size=50
)
plt.tight_layout()
plt.show()

# Print stats
print_metrics_table({
    "Avg Return (last 50)": np.mean(metrics_single['episode_returns'][-50:]),
    "Max Return": np.max(metrics_single['episode_returns']),
    "Avg Length (last 50)": np.mean(metrics_single['episode_lengths'][-50:]),
    "Success Rate (≥1 heart)": np.mean([1 if r >= 1 else 0 for r in metrics_single['episode_returns'][-50:]]) * 100,
})

print("\n📊 Task Complexity:")
print("   Tutorial 1: deposit (1 step)")
print("   Tutorial 2: craft → deposit (2 steps)")
print(f"   Tutorial 3: forage → craft → deposit (3 steps)")
print(f"   Expected return: 1-5 hearts (harder than previous tutorials)")
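
The statistics in the table above reduce to a few list operations. The sketch below restates them with the standard library only; `summarize_returns` is a hypothetical helper (it does not come from `tutorial_viz`), and the example returns are made-up values for illustration, not results from this tutorial.

```python
from statistics import mean


def summarize_returns(returns, window=50, success_threshold=1.0):
    """Average, maximum, and success rate (share of recent episodes at or above the threshold)."""
    recent = returns[-window:]
    successes = sum(r >= success_threshold for r in recent)
    return {
        "avg_return": mean(recent),
        "max_return": max(returns),
        "success_rate_pct": 100.0 * successes / len(recent),
    }


example_returns = [0, 0, 1, 2, 0, 3, 1, 5]  # illustrative values only
print(summarize_returns(example_returns, window=4))
# {'avg_return': 2.25, 'max_return': 5, 'success_rate_pct': 75.0}
```

Because evaluation runs a fixed policy, the "last 50" window is simply a subsample of the 100 evaluation episodes rather than a measure of learning progress.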


## Part B: Multi-Agent Coordination

### 8. Scale to 4 Agents

Now for the exciting part! We'll train **4 agents** simultaneously in the same environment. 

**Key Changes:**
- `num_cogs=4` instead of 1
- All agents share the same environment
- Implicit coordination emerges from training
- Per-agent reward (each agent gets +1 for its own deposits; team return is the sum over agents)

**Expected Behaviors:**
- Agents learn to share extractors
- Avoid collisions
- Potential role specialization
- Emergent coordination without explicit communication
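
How reward is assigned shapes what kind of coordination can emerge, so it helps to distinguish two common schemes. The sketch below contrasts them in plain Python; both function names and the deposit counts are hypothetical. The configuration cell in this tutorial sets the reward through `agent.rewards.stats` and reports it as per-agent; inspecting the rewards returned during a rollout is the reliable way to confirm which scheme an environment applies.

```python
def per_agent_rewards(deposits):
    """Each agent is rewarded only for the hearts it deposited itself."""
    return [float(d) for d in deposits]


def shared_team_rewards(deposits):
    """Every agent receives the team total, whoever made the deposit."""
    total = float(sum(deposits))
    return [total] * len(deposits)


deposits = [2, 0, 1, 3]  # hypothetical hearts deposited by each of 4 agents
print(per_agent_rewards(deposits))    # [2.0, 0.0, 1.0, 3.0]
print(shared_team_rewards(deposits))  # [6.0, 6.0, 6.0, 6.0]
```

Under per-agent rewards the team return is the sum of the list (6.0 here), and agents compete for extractors; under a shared reward an idle agent is still paid, which can make credit assignment harder.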


In [None]:
# Create multi-agent configuration
config_multi = make_game(
    num_cogs=4,  # 4 agents!
    width=15,
    height=15,
    num_assemblers=1,
    num_chests=1,
    num_chargers=0,
    num_carbon_extractors=3,
    num_oxygen_extractors=0,
    num_germanium_extractors=0,
    num_silicon_extractors=0,
)

# Same starting inventory
config_multi.game.agent.initial_inventory = {
    "energy": 100,
    "oxygen": 5,
    "germanium": 5,
    "silicon": 5,
}

# Same extractor config
config_multi.game.objects["carbon_extractor"].recipes[0][1].output_resources = {
    "carbon": 1
}

# Same crafting recipe
config_multi.game.objects["assembler"].recipes = [
    (
        ["Any"],
        RecipeConfig(
            input_resources={
                "carbon": 1,
                "oxygen": 1,
                "germanium": 1,
                "silicon": 1,
            },
            output_resources={"heart": 1},
            cooldown=1,
        ),
    )
]

# Same chest config
config_multi.game.objects["chest"].deposit_positions = ["N", "S", "E", "W"]
config_multi.game.objects["chest"].withdrawal_positions = []

# Same reward config, applied to each of the 4 agents individually
config_multi.game.agent.rewards.stats = {
    "heart.lost": 1.0
}

print("✅ Multi-agent environment configured!")
print(f"   Agents: 4")
print(f"   Map size: {config_multi.game.width}x{config_multi.game.height}")
print(f"   Coordination: Implicit (emerges from training)")
print(f"   Reward: Per-agent (+1 when THAT agent deposits)")


### 9. Train Multi-Agent Policy

In [None]:
%%time

# Set up checkpoint directory
checkpoint_dir_multi = Path("./checkpoints_stage3_multi")
checkpoint_dir_multi.mkdir(parents=True, exist_ok=True)

# Use single-agent checkpoint as starting point!
initial_weights_multi = str(latest_checkpoint_single) if latest_checkpoint_single else None

print("🚀 Starting multi-agent training (200k steps)...")
print(f"   Transfer learning from: {latest_checkpoint_single.name if initial_weights_multi else 'scratch'}")
print("=" * 60)

train(
    env_cfg=config_multi,
    policy_class_path="cogames.policy.simple.SimplePolicy",
    device=torch.device("cpu"),
    initial_weights_path=initial_weights_multi,  # Transfer from single-agent!
    num_steps=200_000,
    checkpoints_path=checkpoint_dir_multi,
    seed=42,
    batch_size=512,
    minibatch_size=512,
    vector_num_envs=4,
    vector_num_workers=1,
)

print("=" * 60)
print("✅ Multi-agent training complete!")


### 10. Evaluate Multi-Agent Performance


In [None]:
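# Load the last multi-agent checkpoint in filename order; fail early if none exist
checkpoint_files_multi = sorted((checkpoint_dir_multi / "cogames.cogs_vs_clips").glob("*.pt"))
if not checkpoint_files_multi:
    raise FileNotFoundError(f"No checkpoints found under {checkpoint_dir_multi}; run the training cell first.")
latest_checkpoint_multi = checkpoint_files_multi[-1]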
print(f"📂 Loading multi-agent checkpoint: {latest_checkpoint_multi.name}")

dummy_env_multi = MettaGridEnv(env_cfg=config_multi)
trained_policy_multi = SimplePolicy(dummy_env_multi, device)
trained_policy_multi.load_policy_data(str(latest_checkpoint_multi))

print("✅ Multi-agent policy loaded!")

# Evaluate (this will evaluate agent 0, but all agents use same policy)
print("\n📊 Evaluating multi-agent policy...")
metrics_multi = evaluate_policy(
    config=config_multi,
    policy=trained_policy_multi,
    num_episodes=100,
    max_steps=400,
    seed=42
)
print(f"✅ Evaluation complete!")


### 11. Compare Single vs Multi-Agent Performance


In [None]:
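# Compare single vs multi-agent

def smooth_curve(values, window):
    """Moving average; defined here because it is not among the helpers imported above."""
    values = np.asarray(values, dtype=float)
    if len(values) < window:
        return values
    return np.convolve(values, np.ones(window) / window, mode="valid")

fig, axes = plt.subplots(1, 2, figsize=(14, 5))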

# Single-agent
axes[0].plot(metrics_single['episode_returns'], alpha=0.3, color='blue')
axes[0].plot(smooth_curve(metrics_single['episode_returns'], 20), 
             color='blue', linewidth=2, label='Single Agent')
axes[0].set_xlabel('Episode')
axes[0].set_ylabel('Return')
axes[0].set_title('Single-Agent Performance')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Multi-agent (note: this is per-agent performance)
axes[1].plot(metrics_multi['episode_returns'], alpha=0.3, color='red')
axes[1].plot(smooth_curve(metrics_multi['episode_returns'], 20),
             color='red', linewidth=2, label='Multi-Agent (per agent)')
axes[1].set_xlabel('Episode')
axes[1].set_ylabel('Return')
axes[1].set_title('Multi-Agent Performance (Individual Agent)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print comparison
print("\n📊 Performance Comparison:")
print(f"   Single-agent avg return: {np.mean(metrics_single['episode_returns'][-50:]):.2f}")
print(f"   Multi-agent avg return (per agent): {np.mean(metrics_multi['episode_returns'][-50:]):.2f}")
print(f"   Multi-agent TEAM return (estimated as 4 x agent 0): {np.mean(metrics_multi['episode_returns'][-50:]) * 4:.2f}")
print()
print("💡 Insights:")
print("   • Multi-agent per-agent return may be lower (resource competition)")
print("   • But TEAM return should be higher (4x agents working)")


## 🎓 Summary and Key Takeaways

Congratulations! You've completed all three CoGames tutorials and trained agents on progressively complex tasks.

### Journey Recap

```
Tutorial 1: Given hearts → Deposit
             ↓ Added crafting
Tutorial 2: Given resources → Craft → Deposit
             ↓ Added foraging
Tutorial 3: Start without carbon → Forage → Craft → Deposit
             ↓ Scaled to multi-agent
Multi-Agent: 4 agents coordinating implicitly
```

### Core Concepts Mastered

1. **Navigation**: Agents learned spatial reasoning
2. **Crafting**: Recipe-based resource transformation
3. **Foraging**: Resource extraction from environment
4. **Multi-step Planning**: Chaining three subtasks (forage, craft, deposit) together
5. **Transfer Learning**: Reusing knowledge across tasks
6. **Multi-Agent Coordination**: Implicit cooperation through shared training

### Architectural Insights

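- **Simple feedforward networks** can learn multi-step behaviors such as this three-stage task
- **Sparse rewards** work with sufficient exploration
- **Transfer learning** from earlier stages can speed up training, since navigation and crafting skills carry over
- **Multi-agent** systems can exhibit emergent coordination without explicit communication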

### What You Can Do Next

1. **Experiment with hyperparameters**
   - Try different learning rates
   - Adjust network sizes
   - Change reward structures

2. **Try LSTM policy**
   - Handles partial observability better
   - Can remember past states
   - See `cogames.policy.lstm.LSTMPolicy`

3. **Custom scenarios**
   - Add more resources
   - Create custom maps
   - Design new recipes

4. **Scale further**
   - Try 8, 16, or more agents
   - Observe emergent specialization
   - Analyze coordination patterns

5. **Compete**
   - Submit to CoGames competition
   - Compare against other policies
   - Optimize for speed/efficiency

---

### 💾 Checkpoints Summary


In [None]:
print("=" * 70)
print("📊 TUTORIAL SERIES COMPLETE!")
print("=" * 70)
print()
print("Checkpoints saved:")
print(f"  Tutorial 1: ./checkpoints/cogames.cogs_vs_clips/")
print(f"  Tutorial 2: ./checkpoints_stage2/cogames.cogs_vs_clips/")
print(f"  Tutorial 3 (single): {latest_checkpoint_single}")
print(f"  Tutorial 3 (multi):  {latest_checkpoint_multi}")
print()
print("Final Results:")
print(f"  Tutorial 1 (deposit only):     ~3.0 hearts")
print("  Tutorial 2 (craft+deposit):    see the Tutorial 2 notebook (not re-evaluated here)")
print(f"  Tutorial 3 (full cycle):       ~{np.mean(metrics_single['episode_returns'][-20:]):.1f} hearts (single)")
print(f"  Tutorial 3 (multi-agent):      ~{np.mean(metrics_multi['episode_returns'][-20:]) * 4:.1f} hearts (team of 4)")
print()
print("🎉 You're now ready to:")
print("   • Design custom scenarios")
print("   • Participate in CoGames competitions")
print("   • Research multi-agent RL algorithms")
print("   • Build on these tutorials for your own projects")
print()
print("=" * 70)
