Skip to content

Add framework adapter system for LangGraph, Claude Agent SDK, and other agent frameworks #74

@justinmadison

Description

@justinmadison

Summary

Revised scope (Feb 2026): This issue covers Phase 1 of the framework adapter system — the adapter foundation and one working framework starter (Anthropic/Claude). LangGraph starter, query tools, and additional frameworks are tracked in separate follow-up issues.

Add a framework adapter system that lets developers use popular agent frameworks (Claude tool_use, LangGraph, etc.) to build agents that play Agent Arena scenarios. The first adapter and starter will use the Anthropic Python SDK with native tool_use.

Key framing: This is NOT "bring your existing agent and test it here." This IS "Want to learn Claude tool_use? Build an AI agent that plays a game." Each framework starter is a tutorial, not just a template.

Architecture: Three Categories

Game information falls into three categories with different costs:

1. Context (injected into prompt — free, always present)

The observation is formatted into the LLM prompt automatically. No tool call needed:

Position: (5.2, 0, 8.1)  |  Health: 80  |  Energy: 100
Nearby: berry at (10,0,5) dist=4.2, fire at (2,0,1) dist=3.0
Inventory: wood x2, stone x1  |  Explored: 45%
Objective: collect 10 resources (current: 4/10)

2. Action Tools (sent to Godot — costs a tick, pick exactly ONE)

The agent's final decision. Ends the turn:

move_to(target_position=[10, 0, 5])
collect(target="berry")
craft_item(recipe="torch")
explore(direction="north")
idle()

3. Query Tools (future — see follow-up issues)

Optional tools the agent can call to dig deeper before deciding (e.g., query_spatial_memory, recall_location). Not in scope for this issue — tracked in #85 and #86.

Per-Tick Flow (Phase 1)

Godot sends observation
        |
        v
  CONTEXT: Observation formatted into prompt (free)
        |
        v
  ACTION: LLM calls exactly one action tool (costs a tick)
    -> move_to([10, 0, 5])
        |
        v
  Adapter captures action, returns as Decision to Godot

How It Runs

The framework runs inside the Python process alongside the SDK:

+------------------------------------------+
|  PYTHON PROCESS (same as today)          |
|                                          |
|  Framework Agent (Anthropic SDK)         |
|       |                                  |
|       | reads context (observation)      |
|       | returns one action tool call     |
|       v                                  |
|  Agent Arena SDK (AgentArena.run())      |
|       |                                  |
+-------+----------------------------------+
        | HTTP IPC (unchanged)
+-------+----------------------------------+
|  GODOT (unchanged)                       |
|  Scenes, physics, tool execution         |
+------------------------------------------+

Deliverables

1. Adapter ABC (python/sdk/agent_arena_sdk/adapters/base.py)

class FrameworkAdapter(ABC):
    @abstractmethod
    def decide(self, observation: Observation) -> Decision:
        """Run framework agent loop, return one action."""
        ...

    def format_observation(self, observation: Observation) -> str:
        """Convert observation to prompt text. Shared default implementation."""
        ...

    def get_action_tools(self) -> list[ToolSchema]:
        """Canonical action tool definitions. Shared."""
        ...
  • format_observation() — extracted from existing starters/llm/agent.py:_build_prompt()
  • get_action_tools() — canonical set: move_to, collect, craft_item, idle, explore
  • Support both sync decide() and async adapters

2. AgentArena integration

AgentArena.run() accepts either a bare Callable[[Observation], Decision] (existing) or a FrameworkAdapter instance (new):

# Existing (still works)
arena.run(my_decide_function)

# New
adapter = AnthropicAdapter(model="claude-sonnet-4-20250514")
arena.run(adapter)

3. Anthropic/Claude starter (starters/claude/)

Tutorial-quality starter teaching Anthropic tool_use by building a foraging agent:

File Purpose
agent.py Anthropic adapter + agent logic with tutorial comments
run.py One-command startup
requirements.txt agent-arena-sdk, anthropic
README.md Learning goals, setup, how to modify, debugging

Teaches: Anthropic Messages API, tool definitions, tool_use responses, tool_result messages, multi-turn tool loops, system prompts.

4. Canonical action tool definitions

Standard tool schemas in the SDK/adapter base (not duplicated per-starter):

  • move_to — target_position: [x, y, z]
  • collect — target_name: str
  • craft_item — recipe: str
  • idle — no params
  • explore — direction: str (optional)

5. Tests

  • Unit tests for adapter base (observation formatting, tool schema generation)
  • Unit tests for Anthropic adapter (mock API responses)
  • Integration test with hand-built Observations (no Godot needed)

What's NOT in scope (follow-up issues)

Item Issue
LangGraph starter #84
Query tools (spatial memory, episode memory) #86
Migrate SpatialMemory to SDK #85
OpenAI / CrewAI starters Future
Inspector refactor (game-side only) #75

Design Decisions

Adapter base is thin. It provides observation formatting and tool schemas. Each starter owns its prompt engineering, error handling, and fallback logic — that's the learning content. The base class shouldn't hide the interesting parts.

"Claude SDK" → "Anthropic" naming. The starter uses the anthropic Python package directly with client.messages.create(tools=...). There is no separate "Claude Agent SDK" product. The starter folder is starters/claude/.

No query tools in v1. SpatialMemory and EpisodeMemory haven't been migrated to the SDK yet. Context + action tools is sufficient for a working foraging agent. Query tools add depth but aren't required.

Dependencies

Soft dependencies (nice to have, don't block):

Success Criteria

  • A developer can pip install anthropic, set ANTHROPIC_API_KEY, and have a Claude-powered agent playing foraging in under 15 minutes
  • The starter teaches Anthropic tool_use concepts through commented, tutorial-quality code
  • The adapter base class is reusable for LangGraph and future frameworks
  • Existing AgentArena.run(callback) still works unchanged
  • Observation formatting is shared (not duplicated per starter)
  • All tests pass

Context

See docs/framework_integration_strategy.md for the full strategic discussion. Original issue scope was trimmed after analysis revealed query tools need SpatialMemory migration (#85) and the scope was too large for a single PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions