-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Summary
Revised scope (Feb 2026): This issue covers Phase 1 of the framework adapter system — the adapter foundation and one working framework starter (Anthropic/Claude). LangGraph starter, query tools, and additional frameworks are tracked in separate follow-up issues.
Add a framework adapter system that lets developers use popular agent frameworks (Claude tool_use, LangGraph, etc.) to build agents that play Agent Arena scenarios. The first adapter and starter will use the Anthropic Python SDK with native tool_use.
Key framing: This is NOT "bring your existing agent and test it here." This IS "Want to learn Claude tool_use? Build an AI agent that plays a game." Each framework starter is a tutorial, not just a template.
Architecture: Three Categories
Game information falls into three categories with different costs:
1. Context (injected into prompt — free, always present)
The observation is formatted into the LLM prompt automatically. No tool call needed:
Position: (5.2, 0, 8.1) | Health: 80 | Energy: 100
Nearby: berry at (10,0,5) dist=4.2, fire at (2,0,1) dist=3.0
Inventory: wood x2, stone x1 | Explored: 45%
Objective: collect 10 resources (current: 4/10)
2. Action Tools (sent to Godot — costs a tick, pick exactly ONE)
The agent's final decision. Ends the turn:
move_to(target_position=[10, 0, 5])
collect(target="berry")
craft_item(recipe="torch")
explore(direction="north")
idle()3. Query Tools (future — see follow-up issues)
Optional tools the agent can call to dig deeper before deciding (e.g., query_spatial_memory, recall_location). Not in scope for this issue — tracked in #85 and #86.
Per-Tick Flow (Phase 1)
Godot sends observation
|
v
CONTEXT: Observation formatted into prompt (free)
|
v
ACTION: LLM calls exactly one action tool (costs a tick)
-> move_to([10, 0, 5])
|
v
Adapter captures action, returns as Decision to Godot
How It Runs
The framework runs inside the Python process alongside the SDK:
+------------------------------------------+
| PYTHON PROCESS (same as today) |
| |
| Framework Agent (Anthropic SDK) |
| | |
| | reads context (observation) |
| | returns one action tool call |
| v |
| Agent Arena SDK (AgentArena.run()) |
| | |
+-------+----------------------------------+
| HTTP IPC (unchanged)
+-------+----------------------------------+
| GODOT (unchanged) |
| Scenes, physics, tool execution |
+------------------------------------------+
Deliverables
1. Adapter ABC (python/sdk/agent_arena_sdk/adapters/base.py)
class FrameworkAdapter(ABC):
@abstractmethod
def decide(self, observation: Observation) -> Decision:
"""Run framework agent loop, return one action."""
...
def format_observation(self, observation: Observation) -> str:
"""Convert observation to prompt text. Shared default implementation."""
...
def get_action_tools(self) -> list[ToolSchema]:
"""Canonical action tool definitions. Shared."""
...format_observation()— extracted from existingstarters/llm/agent.py:_build_prompt()get_action_tools()— canonical set: move_to, collect, craft_item, idle, explore- Support both sync
decide()and async adapters
2. AgentArena integration
AgentArena.run() accepts either a bare Callable[[Observation], Decision] (existing) or a FrameworkAdapter instance (new):
# Existing (still works)
arena.run(my_decide_function)
# New
adapter = AnthropicAdapter(model="claude-sonnet-4-20250514")
arena.run(adapter)3. Anthropic/Claude starter (starters/claude/)
Tutorial-quality starter teaching Anthropic tool_use by building a foraging agent:
| File | Purpose |
|---|---|
agent.py |
Anthropic adapter + agent logic with tutorial comments |
run.py |
One-command startup |
requirements.txt |
agent-arena-sdk, anthropic |
README.md |
Learning goals, setup, how to modify, debugging |
Teaches: Anthropic Messages API, tool definitions, tool_use responses, tool_result messages, multi-turn tool loops, system prompts.
4. Canonical action tool definitions
Standard tool schemas in the SDK/adapter base (not duplicated per-starter):
move_to— target_position: [x, y, z]collect— target_name: strcraft_item— recipe: stridle— no paramsexplore— direction: str (optional)
5. Tests
- Unit tests for adapter base (observation formatting, tool schema generation)
- Unit tests for Anthropic adapter (mock API responses)
- Integration test with hand-built Observations (no Godot needed)
What's NOT in scope (follow-up issues)
| Item | Issue |
|---|---|
| LangGraph starter | #84 |
| Query tools (spatial memory, episode memory) | #86 |
| Migrate SpatialMemory to SDK | #85 |
| OpenAI / CrewAI starters | Future |
| Inspector refactor (game-side only) | #75 |
Design Decisions
Adapter base is thin. It provides observation formatting and tool schemas. Each starter owns its prompt engineering, error handling, and fallback logic — that's the learning content. The base class shouldn't hide the interesting parts.
"Claude SDK" → "Anthropic" naming. The starter uses the anthropic Python package directly with client.messages.create(tools=...). There is no separate "Claude Agent SDK" product. The starter folder is starters/claude/.
No query tools in v1. SpatialMemory and EpisodeMemory haven't been migrated to the SDK yet. Context + action tools is sufficient for a working foraging agent. Query tools add depth but aren't required.
Dependencies
- Add tool completion callbacks from Godot to Python #71 (tool completion callbacks) — DONE ✅
- Consolidate SDK: single arena API, single IPC server #78 (consolidated schemas) — DONE ✅
- Remove deprecated code: backends/, v1 IPC, LocalLLMBehavior #80 (remove deprecated code) — DONE ✅
Soft dependencies (nice to have, don't block):
- Add mock observations and local testing support for agent development #72 (mock observations) — helps testing but we can use hand-built Observations
- Complete intermediate starter with real memory utilization and multi-step planning #73 (intermediate starter) — independent work
Success Criteria
- A developer can
pip install anthropic, setANTHROPIC_API_KEY, and have a Claude-powered agent playing foraging in under 15 minutes - The starter teaches Anthropic tool_use concepts through commented, tutorial-quality code
- The adapter base class is reusable for LangGraph and future frameworks
- Existing
AgentArena.run(callback)still works unchanged - Observation formatting is shared (not duplicated per starter)
- All tests pass
Context
See docs/framework_integration_strategy.md for the full strategic discussion. Original issue scope was trimmed after analysis revealed query tools need SpatialMemory migration (#85) and the scope was too large for a single PR.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status