
feat(arena): SAP-style autobattler — contracts + MCP + Tournament Hall UI#29

Merged · keanji-x merged 12 commits into main from feat/arena-spike · Jun 2, 2026


keanji-x (Collaborator) commented Jun 2, 2026

Summary

Implements the full Arena vertical slice described in #27 — from contracts up to spectator UI — and validated end-to-end with LLM agents playing real matches against each other.

Three logical layers in 9 commits:

🔨 Contracts (6 commits, fully reviewed + fixed)

  • ArenaEngine.sol — UUPS proxy, 5 verbs (buy/sell/move/freeze/roll) + submit + matchmaking + simulate + settle + ELO update
  • AbilityLib.sol — trigger × effect × target dispatcher with BFS event queue (cap 64) — pattern borrowed from Autochessia
  • UnitCatalog.sol — 12-unit spike catalog as immutable function table
  • GameEngine.sol — adds spendOre / refundOre operator hooks (only main-world delta)
  • Router.sol — adds arenaEngine slot + getAddressesV2() (preserves existing getAddresses() 6-tuple)
  • Upgrade.s.sol — handles backfill deploy when router lacks arena slot
  • ArenaEngine.t.sol — 29 tests covering core flows + review-driven gap fills

🤖 MCP + agent-runner integration (2 commits)

  • 5 new MCP tools: arena_list_units, arena_get_state, arena_buy, arena_submit, arena_get_recent_matches
  • agent-runner/mcp.ts pulls arena_get_state into context each cycle
  • agent-runner/llm.ts appends Arena prompt block + renders arenaState into user prompt
  • No phase-mode switch — agents pick it up organically

🎨 Frontend: AI Tournament Hall (1 commit)

  • New /arena React route, 5-block layout:
    • TopBar (LIVE indicator + matchmaking ETA + bucket activity)
    • LeaderboardPanel (top ELO + recent matches click-to-replay)
    • StagePanel (focused match replay theater + eval-bar)
    • AgentMindPanel (selected agent's reasoning timeline from AgentLedger)
    • HighlightTicker (upset + streak-break detectors)
  • Pure React DOM + CSS keyframes (no new Phaser scene needed for linear 5-slot bench)
  • Bonus fix: useGameEngine + useArenaEngine now treat any RFC1918 private IP as a localhost build (was breaking LAN access)

Verification

| Layer | How verified |
|---|---|
| Contracts | 57/57 foundry tests (29 ArenaEngine + 28 GameEngine baseline) |
| Deploy | Gravity Testnet (ArenaEngine at 0xFb2aF6D5cFF7A04Bcfd043236884B9e7137050D4) + local anvil |
| Combat determinism | Same seed + ghost → same winner (test_simulate_deterministic_same_seed_same_winner) |
| ELO math | Symmetric ±delta, cross-boundary bucket migration, settle writes to EvaluationLedger |
| MCP tools | cast send flow reproduces all 5 verbs + matchmaking + settle on anvil |
| LLM gameplay | 4 agents (Mira / Kael / Renn / Lila) played 3 full matches end-to-end on anvil; real reasoning captured in AgentLedger, including Lila's ON_BUY ordering self-correction |
| Spectator UI | Headless Chromium screenshot at 1600×1000 verified that leaderboard, stage, and agent mind all render with live chain data |

What's NOT in this PR

Known TODOs (carried in #27, not blocking merge)

P0: Design remaining 48 units · MATCHMAKING_PERIOD → setter · double-sided EvalLedger writes
P1: chain.ts migrate to router.arenaEngine() direct getter · EOA-only players · Ghost.season field
P2 (before prize pool): VRF for matchmaking · keccak(seed, k) not XOR · logistic ELO

Test plan

  • cd contracts && forge test — 57/57 pass
  • just anvil-deploy + cast send flow per ArenaEngine.t.sol reproduces on local chain
  • cd frontend && APP_CONFIG=localhost npm run dev → /arena renders with empty state on fresh anvil
  • Create 2 agents, run buy/submit/runMatchmaking/settle via cast — UI shows updated leaderboard + replay within ~5s polling
  • Run agent-runner against anvil with arena-focused personality (see end-to-end demo script in /tmp/arena-demo/ from PR validation)

Refs

🤖 Generated with Claude Code

keanji-x and others added 12 commits June 1, 2026 15:46
…pike

AbilityLib is a pure library implementing a trigger × effect × target dispatcher
backed by an FIFO event queue capped at 64 steps. Modeled on Autochessia's
EventType/Attribute/ApplyTo three-enum pattern but stripped of MUD/ECS — pure
Solidity memory structs only.

UnitCatalog encodes the 12 spike units (4 tiers × 3 units) as a pure function
table; names lean into the Gravity Town theme (Mineworker, Stoneguard,
Pyromancer, Wraith, etc.). The ability matrix covers all 6 triggers and all 5
effect types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Engine slot

ArenaEngine is the async ghost autobattler. Highlights:
- 5 player verbs (buy/sell/move/freeze/roll) gated by canControlAgent
- ELO-bucketed matchmaking with 30min rate-limit per bucket + Fisher-Yates pairing
- View-only deterministic simulateMatch(matchId) replays combat from seed + ghost
  snapshots — battle never writes storage
- settleMatch updates ELO and writes an "arena defeat" evaluation entry on the
  loser via EvaluationLedger
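The pairing step can be sketched in TypeScript as below, assuming a bucket is a plain array of agent ids and that randomness comes from a seeded PRNG. `mulberry32`, `pairBucket`, and `canRunBucket` are hypothetical names, and mulberry32 is only a stand-in for whatever seed mixing ArenaEngine does on-chain. The sketch shows the two properties described here: a 30-minute per-bucket rate limit, and a Fisher-Yates shuffle followed by adjacent pairing, with the leftover agent sitting out when the count is odd.

```typescript
// Hypothetical sketch: names and the PRNG are illustrative, not ArenaEngine.sol's.
const MATCHMAKING_PERIOD_SEC = 30 * 60; // 30-minute rate limit per bucket

// mulberry32: a small seeded PRNG standing in for on-chain seed mixing.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// A bucket may be matched again only once the period has elapsed.
function canRunBucket(lastRunSec: number, nowSec: number): boolean {
  return nowSec >= lastRunSec + MATCHMAKING_PERIOD_SEC;
}

interface Pairing {
  pairs: [number, number][];
  sitOut: number | null; // set only when the bucket size is odd
}

// Fisher-Yates shuffle of a copy, then pair neighbours (0,1), (2,3), ...
function pairBucket(bucket: readonly number[], seed: number): Pairing {
  const rand = mulberry32(seed);
  const order = [...bucket];
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    [order[i], order[j]] = [order[j], order[i]];
  }
  const pairs: [number, number][] = [];
  for (let i = 0; i + 1 < order.length; i += 2) {
    pairs.push([order[i], order[i + 1]]);
  }
  const sitOut = order.length % 2 === 1 ? order[order.length - 1] : null;
  return { pairs, sitOut };
}
```

Because the shuffle is driven entirely by the seed, re-running with the same seed reproduces the same pairs, which is the property a replayable on-chain matchmaker needs.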

GameEngine.spendOre is a single ~10-line operator hook so ArenaEngine can deduct
ore without owning a separate balance system. Auto-harvests first to ensure a
stale pool doesn't block a valid spend.

Router gains a storage-appended `arenaEngine` slot + `getAddressesV2()` returning
7 addresses. The original `getAddresses()` 6-tuple is preserved verbatim so
chain.ts and Upgrade.s.sol's length-sniff decoder don't break.
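A client-side decoder for this layout might look like the following sketch. It assumes the arena address is the seventh element of the `getAddressesV2()` tuple and that an unset slot reads as the zero address; `RouterReader` and `loadRouterAddresses` are hypothetical names, not the actual chain.ts API. Older routers either revert on `getAddressesV2()` or return something other than seven addresses, and both cases fall back to the preserved 6-tuple.

```typescript
// Hypothetical sketch: assumes the arena address is element 6 of the 7-tuple.
type Address = `0x${string}`;

const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

interface RouterReader {
  getAddressesV2(): Promise<readonly Address[]>; // 7 addresses on upgraded routers
  getAddresses(): Promise<readonly Address[]>; // original 6-tuple, always present
}

interface RouterAddresses {
  core: Address[]; // the six pre-arena addresses
  arenaEngine: Address | null; // null when the arena slot is missing or unset
}

async function loadRouterAddresses(router: RouterReader): Promise<RouterAddresses> {
  try {
    const v2 = await router.getAddressesV2();
    if (v2.length === 7) {
      const arena = v2[6];
      return {
        core: v2.slice(0, 6),
        arenaEngine: arena.toLowerCase() === ZERO_ADDRESS ? null : arena,
      };
    }
  } catch {
    // Pre-arena routers have no getAddressesV2(); fall back below.
  }
  const v1 = await router.getAddresses();
  return { core: v1.slice(0, 6), arenaEngine: null };
}
```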

Design judgments:
- sell() emits the refund amount but skips the credit-back path for the spike —
  GameEngine has no public refund hook today; documented as TODO with rationale.
- Combat draw → defender wins (need a tiebreaker for ELO; spike judgment).
- ELO uses a linear approximation of the standard logistic with K=32; pure
  on-chain fixed-point logistic is overkill for a spike.
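To make the "linear approximation" concrete, here is one plausible reading, sketched in TypeScript: replace the logistic expected score E = 1 / (1 + 10^((R_b - R_a)/400)) with its tangent line at equal ratings, E ≈ 0.5 + (R_a - R_b)·ln(10)/1600, clamped to [0, 1]. The PR does not state the exact formula ArenaEngine uses, so the slope, the clamp, and the function names below are assumptions.

```typescript
// Hypothetical sketch: the PR only says "linear approximation of the logistic, K=32".
const K = 32;
const SLOPE = Math.LN10 / 1600; // derivative of the logistic expected score at equal ratings

// Tangent-line approximation, clamped because the line leaves [0, 1] for large gaps.
function expectedScoreLinear(ratingA: number, ratingB: number): number {
  const e = 0.5 + (ratingA - ratingB) * SLOPE;
  return Math.min(1, Math.max(0, e));
}

// Standard logistic expected score, for comparison.
function expectedScoreLogistic(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Symmetric update: the winner gains exactly what the loser gives up.
function eloUpdate(winner: number, loser: number): { winner: number; loser: number; delta: number } {
  const delta = Math.round(K * (1 - expectedScoreLinear(winner, loser)));
  return { winner: winner + delta, loser: loser - delta, delta };
}
```

Near equal ratings the two curves agree to within half a percentage point at a 100-point gap; they diverge at large gaps, where the clamp makes a 400-point upset pay the full K = 32 versus about 29 under the logistic, which is the gap the P2 "logistic ELO" TODO would close.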

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ArenaEngine.t.sol: 12 tests covering the 5 player verbs, ELO bucketing,
matchmaking rate-limit + pairing, deterministic simulation, ELO settlement,
ability-chain triggers, and the queue-depth safeguard.

Deploy.s.sol: deploys ArenaEngine proxy, registers it as operator on the
registry so spendOre / EvaluationLedger.write succeed, and wires it into
Router via setArenaEngine.

Upgrade.s.sol: mirrors the EvaluationLedger backfill pattern — if the router
slot is unset, deploys a fresh ArenaEngine proxy and registers it; otherwise
upgrades the existing impl. Idempotent.

Also: renamed UnitCatalog.getUnit's `cost` return name to `unitCost` to silence
the shadowing warning vs the sibling `cost()` helper. Pure cosmetic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ON_BUY (Mineworker +1 ATK self, Battlemage +2 ATK right) and ON_SELL
(Ravenscout +1 ATK all allies) used to fire in catalog but never reached
the battle: bench only stored uint8 unitType and _buildBattleState
re-pulled clean base stats from UnitCatalog on every simulate. Add a
parallel int16 atkOverride / hpOverride pair to Ghost, snapshot it onto
each Match at creation, and stack the overlay on top of base stats in
_materialize. AbilityLib gains applyBenchAbility — a shop-phase processor
that honors EFF_ADD_ATK / EFF_ADD_HP with TGT_SELF, LEFT/RIGHT_NEIGHBOR
and ALL_ALLIES. move() now swaps overlays alongside slots; sell()
zeroes the seller's overlay slot after firing its own ON_SELL.
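The overlay mechanism can be modeled off-chain roughly as follows. This is a hypothetical TypeScript mirror, not AbilityLib's code: the enum ordering, treating ALL_ALLIES as excluding the caster, treating RIGHT_NEIGHBOR as slot + 1, and skipping empty target slots are all assumptions. It shows the parts named above: int16 overlays stored in parallel with the bench, a shop-phase `applyBenchAbility` for ADD_ATK/ADD_HP over the four target kinds, and `move` swapping overlays together with units.

```typescript
// Hypothetical model; enum values and targeting rules are assumptions.
enum Effect { ADD_ATK, ADD_HP }
enum Target { SELF, LEFT_NEIGHBOR, RIGHT_NEIGHBOR, ALL_ALLIES }

const BENCH_SIZE = 5;
const INT16_MIN = -32768;
const INT16_MAX = 32767;

interface Bench {
  unitType: (number | null)[]; // null = empty slot
  atkOverride: number[]; // int16 overlay stacked on catalog base ATK
  hpOverride: number[]; // int16 overlay stacked on catalog base HP
}

function emptyBench(): Bench {
  const bench: Bench = { unitType: [], atkOverride: [], hpOverride: [] };
  for (let i = 0; i < BENCH_SIZE; i++) {
    bench.unitType.push(null);
    bench.atkOverride.push(0);
    bench.hpOverride.push(0);
  }
  return bench;
}

const clampInt16 = (x: number): number => Math.max(INT16_MIN, Math.min(INT16_MAX, x));

function targetSlots(bench: Bench, caster: number, target: Target): number[] {
  const occupied = (i: number) => i >= 0 && i < BENCH_SIZE && bench.unitType[i] !== null;
  switch (target) {
    case Target.SELF:
      return [caster];
    case Target.LEFT_NEIGHBOR:
      return occupied(caster - 1) ? [caster - 1] : [];
    case Target.RIGHT_NEIGHBOR:
      return occupied(caster + 1) ? [caster + 1] : [];
    case Target.ALL_ALLIES: {
      const out: number[] = [];
      for (let i = 0; i < BENCH_SIZE; i++) if (i !== caster && occupied(i)) out.push(i);
      return out;
    }
    default:
      return [];
  }
}

// Shop-phase processor: writes into the overlay, never into catalog stats.
function applyBenchAbility(bench: Bench, caster: number, effect: Effect, target: Target, amount: number): void {
  for (const slot of targetSlots(bench, caster, target)) {
    if (effect === Effect.ADD_ATK) bench.atkOverride[slot] = clampInt16(bench.atkOverride[slot] + amount);
    else bench.hpOverride[slot] = clampInt16(bench.hpOverride[slot] + amount);
  }
}

function swap<T>(arr: T[], i: number, j: number): void {
  const tmp = arr[i];
  arr[i] = arr[j];
  arr[j] = tmp;
}

// move(): overlays travel with their unit.
function moveUnit(bench: Bench, from: number, to: number): void {
  swap(bench.unitType, from, to);
  swap(bench.atkOverride, from, to);
  swap(bench.hpOverride, from, to);
}
```

Keeping overlays in parallel arrays, rather than mutating catalog stats, is what lets combat re-derive final stats as base + overlay while the catalog itself stays an immutable function table.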

Also adds GameEngine.refundOre — operator-only credit that caps at
MAX_ORE_POOL — and rewires ArenaEngine.sell to actually refund cost/2
ore (previously the refund was only emitted, never credited).
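The spend and refund semantics described in this and the earlier GameEngine paragraph can be sketched as a tiny ore ledger. The accrual rate, the value of MAX_ORE_POOL, and the names `OreAccount`, `harvest`, and `sellRefund` are placeholders chosen for illustration; only the ordering (harvest before spend), the cap on refunds, and the cost/2 sell refund come from the text.

```typescript
// Hypothetical model: rates and caps are placeholders, not GameEngine.sol's values.
const MAX_ORE_POOL = 1000;
const ORE_PER_SECOND = 1;

interface OreAccount {
  balance: number;
  lastHarvestSec: number;
}

// Collect ore accrued since the last harvest, capped at the pool size.
function harvest(acct: OreAccount, nowSec: number): void {
  const accrued = (nowSec - acct.lastHarvestSec) * ORE_PER_SECOND;
  acct.balance = Math.min(MAX_ORE_POOL, acct.balance + accrued);
  acct.lastHarvestSec = nowSec;
}

// spendOre: harvest first so uncollected ore still counts toward the spend.
function spendOre(acct: OreAccount, amount: number, nowSec: number): void {
  harvest(acct, nowSec);
  if (acct.balance < amount) throw new Error("insufficient ore");
  acct.balance -= amount;
}

// refundOre: operator credit clipped at MAX_ORE_POOL (the excess is dropped).
function refundOre(acct: OreAccount, amount: number): void {
  acct.balance = Math.min(MAX_ORE_POOL, acct.balance + amount);
}

// sell() refunds half the unit cost, rounded down.
const sellRefund = (unitCost: number): number => Math.floor(unitCost / 2);
```

Harvesting inside the spend path is why `spendOre(acct, 3, 5)` succeeds on an account that held zero collected ore before the call: the 5 accrued ore is collected first.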

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…field

Tighten ability dispatch and matchmaking:
  - AbilityLib._applyEffect now requires BUFF_NEIGHBOR effects to be
    registered with TGT_SELF; the buff is always relative to the caster's
    slot and a target-relative variant is incoherent. Surfaces catalog
    mistakes in tests instead of mis-buffing the wrong unit at runtime.
  - ArenaEngine adds MAX_BUCKET_SIZE = 256 so a single bucket can't grow
    unbounded and brick Fisher-Yates gas during runMatchmaking. Excess
    submitters revert with "bucket full" and naturally rebalance into
    other ELO bands.
  - Draws (both sides alive after 200-turn cap, or simultaneous wipes)
    now pick a winner from keccak(seed, "draw") instead of always
    awarding defender — removes a systematic attacker-vs-defender bias.
  - ShopFrozen event gains a nowFrozen bool so clients can mirror the
    toggle without reading the mask back.
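The draw tiebreak and the bucket cap can be sketched as below. On-chain the coin is derived from keccak(seed, "draw"); to stay dependency-free this sketch substitutes FNV-1a followed by the MurmurHash3 32-bit finalizer, which is non-cryptographic and will not reproduce on-chain results. `resolveDraw` and `submitToBucket` are hypothetical names.

```typescript
// Hypothetical sketch; the hash is a stand-in for keccak256 and is NOT secure.
const MAX_BUCKET_SIZE = 256;

type Side = "attacker" | "defender";

function mixHash32(input: string): number {
  // FNV-1a over the UTF-16 code units.
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  // MurmurHash3 fmix32 finalizer so every output bit depends on every input bit.
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Both sides alive at the turn cap, or wiped on the same turn: flip a seeded coin.
function resolveDraw(seed: string): Side {
  return mixHash32(seed + ":draw") % 2 === 0 ? "attacker" : "defender";
}

// Reject submissions once a bucket holds MAX_BUCKET_SIZE ghosts.
function submitToBucket(bucket: number[], agentId: number): void {
  if (bucket.length >= MAX_BUCKET_SIZE) throw new Error("bucket full");
  bucket.push(agentId);
}
```

Hashing the seed rather than naming a fixed winner means draws split roughly evenly between attacker and defender, so neither role gains ELO systematically from stalemates.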

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 17 new tests pinning the spike contracts that double-review flagged:
  - ELO symmetric K=32 (low-beats-high and high-beats-low) via a new
    previewEloUpdate external getter — frontends can reuse for "+X/-X"
    pre-settle hints, tests use it to lock the symmetric Elo behavior.
  - settle writes a properly-shaped evaluation entry on the loser
    (rating=4, category="arena", related=[winner]).
  - bucket boundary crossing after a win moves the ghost between buckets.
  - matchmaking: odd-N sits one out; uses Match snapshot (not current
    bench); respects MAX_BUCKET_SIZE via the new "bucket full" require
    (storage-poked to dodge 256-real-agent gas blowup).
  - buy: rejects out-of-range unit types and double-bought slots.
  - freeze: toggle emits nowFrozen on both true and false transitions.
  - roll: changes seed and net-spends ROLL_COST despite auto-harvest.
  - ON_BUY (Mineworker self) and ON_SELL (Ravenscout all-allies) buffs
    persist all the way into combat damage numbers.
  - BUFF_NEIGHBOR with non-SELF target reverts via the dispatch require.
  - simulateMatch is view-only — calling it twice doesn't mutate ELO.
  - Wraith ON_DEATH summon is suppressed when no empty slot exists.
  - Draws resolve by keccak(seed,"draw") tiebreak (no attacker/defender
    bias).

Refactor: split _simulateInternal out of simulateMatch into a shared
_runCombat loop so settleMatch can skip the 128-slot trace alloc — saves
~5-10k gas per settle.
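The shape of that refactor, one combat loop with an optional trace sink, is sketched below. The combat rule here (front units trade hits simultaneously, 200-turn cap) is a simplified assumption for illustration only, and `runCombat`, `simulate`, and `settleOutcome` are hypothetical names standing in for `_runCombat`, `simulateMatch`, and `settleMatch`.

```typescript
// Hypothetical combat rule; only the "one loop, optional trace" structure is the point.
const TURN_CAP = 200;

interface Fighter {
  atk: number;
  hp: number;
}
interface TraceStep {
  turn: number;
  attackerHp: number;
  defenderHp: number;
}
type Outcome = "attacker" | "defender" | "draw";

function runCombat(attacker: Fighter[], defender: Fighter[], trace?: TraceStep[]): Outcome {
  const a = attacker.map((f) => ({ ...f })); // copy: combat never mutates the snapshots
  const d = defender.map((f) => ({ ...f }));
  for (let turn = 0; turn < TURN_CAP && a.length > 0 && d.length > 0; turn++) {
    const fa = a[0];
    const fd = d[0];
    fa.hp -= fd.atk; // simultaneous exchange
    fd.hp -= fa.atk;
    if (fa.hp <= 0) a.shift();
    if (fd.hp <= 0) d.shift();
    trace?.push({ turn, attackerHp: fa.hp, defenderHp: fd.hp }); // no-op without a sink
  }
  if (a.length > 0 && d.length === 0) return "attacker";
  if (d.length > 0 && a.length === 0) return "defender";
  return "draw"; // both alive at the cap, or both wiped on the same turn
}

// simulateMatch-style caller: wants the turn-by-turn trace for replay.
function simulate(a: Fighter[], d: Fighter[]): { outcome: Outcome; trace: TraceStep[] } {
  const trace: TraceStep[] = [];
  return { outcome: runCombat(a, d, trace), trace };
}

// settleMatch-style caller: needs only the outcome, so no trace is allocated.
function settleOutcome(a: Fighter[], d: Fighter[]): Outcome {
  return runCombat(a, d);
}
```

Because both callers share the loop, the replay and the settled outcome cannot diverge; the settle path simply never allocates the trace.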

Doc: pin _eloUpdate's symmetric behavior in NatSpec as intentional spike
simplification + TODO PR #2 to swap in fixed-point logistic. Also adds
TODO comments at the prevrandao / seedMix XOR / canControlAgent /
Ghost.season / Router.getAddressesV2 sites called out by review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ArenaEngine ABI + 7 chain client methods + decodes Router.getAddressesV2
(7-tuple) with backwards-compatible fallback for routers without arena slot.

5 new MCP tools wired in tools.ts:
- arena_list_units      list the 12-unit catalog (id, name, atk, hp, cost, ability)
- arena_get_state       per-agent bench + ELO + bucket + ore
- arena_buy             spend ore, fill slot, trigger ON_BUY + persist overlay
- arena_submit          push ghost into ELO bucket for matchmaking
- arena_get_recent_matches  read EvaluationLedger entries tagged category="arena"

agent_id auto-injected via selfTools convention; arena calls degrade gracefully
when router lacks arenaEngine() (returns descriptive error, no crash).
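The `selfTools` convention and the graceful-degradation path can be sketched as follows. The set mirrors the arena tool names used in this PR, but `withSelfAgentId`, `callArenaTool`, the fill-only-when-missing rule, and the error text are assumptions, not the actual tools.ts code.

```typescript
// Hypothetical sketch; only the selfTools idea and tool names come from the PR.
type ToolArgs = Record<string, unknown>;

const selfTools = new Set<string>([
  "arena_buy",
  "arena_sell",
  "arena_submit",
  "arena_get_state",
  "arena_get_recent_matches",
]);

// Fill agent_id for self-scoped tools when the model did not pass one.
function withSelfAgentId(tool: string, args: ToolArgs, selfAgentId: number): ToolArgs {
  if (!selfTools.has(tool) || args.agent_id !== undefined) return args;
  return { ...args, agent_id: selfAgentId };
}

type ToolResult = { ok: true; data: unknown } | { ok: false; error: string };

// Without an arenaEngine address, return a readable error instead of throwing,
// so the agent loop keeps running.
async function callArenaTool(
  arenaEngine: string | null,
  run: (engine: string) => Promise<unknown>,
): Promise<ToolResult> {
  if (arenaEngine === null) {
    return { ok: false, error: "arena unavailable: router has no arenaEngine() on this deployment" };
  }
  return { ok: true, data: await run(arenaEngine) };
}
```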

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mcp.ts:    arena_buy/sell/submit/get_state/get_recent_matches added to selfTools
           agent_id auto-fill; collectContext now pulls arena_get_state into
           AgentContext.arenaState (graceful fallback on missing arena)

llm.ts:    appends ARENA section to system prompt (12-unit catalog summary +
           5 verbs + bench rules); buildUserPrompt renders arenaState as a
           PHASE: ARENA hint block when ghost exists or ore is high enough
           to participate

types.ts:  AgentContext gets optional arenaState field
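A sketch of how the optional `arenaState` field and the PHASE: ARENA hint could fit together. Field names, the `MIN_ORE_TO_PLAY` threshold, and the hint format are hypothetical; the PR only specifies that the field is optional, that collection falls back gracefully, and that the hint appears when a ghost exists or ore is high enough to participate.

```typescript
// Hypothetical shapes; the real types.ts / llm.ts fields may differ.
interface ArenaState {
  elo: number;
  bucket: number;
  ore: number;
  bench: (number | null)[]; // unit type per slot, null = empty
  hasGhost: boolean;
}

interface AgentContext {
  // ...existing fields elided...
  arenaState?: ArenaState; // absent when the arena is not deployed
}

const MIN_ORE_TO_PLAY = 3; // hypothetical participation threshold

// collectContext-style fetch: any failure simply leaves arenaState undefined.
async function collectArenaState(fetchState: () => Promise<ArenaState>): Promise<ArenaState | undefined> {
  try {
    return await fetchState();
  } catch {
    return undefined;
  }
}

// buildUserPrompt-style hint: shown only with a ghost or enough ore to play.
function arenaHint(ctx: AgentContext): string | null {
  const s = ctx.arenaState;
  if (!s || !(s.hasGhost || s.ore >= MIN_ORE_TO_PLAY)) return null;
  const bench = s.bench.map((u) => (u === null ? "_" : String(u))).join(" ");
  return ["PHASE: ARENA", `ELO ${s.elo} (bucket ${s.bucket}), ore ${s.ore}`, `bench: ${bench}`].join("\n");
}
```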

Agents pick up Arena play organically — no phase mode switch needed. In
end-to-end demo runs with arena-focused personalities, 4 agents played 3
full matches and produced legitimate post-match reasoning (incl. Lila's
ON_BUY ordering self-correction captured on-chain in AgentLedger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spectator UI for the Arena. 5-block layout per the codex design proposal
(chess.com Events × LangSmith trace × Smallville spectator-view hybrid).

New files (12, ~1.4k LOC):
- app/arena/page.tsx                      main layout + selection state
- components/arena/TopBar.tsx             LIVE clock + bucket activity
- components/arena/LeaderboardPanel.tsx   top ELO + recent matches rail
- components/arena/StagePanel.tsx         focused-match replay theater
- components/arena/AgentMindPanel.tsx     selected agent's reasoning timeline
- components/arena/ReplayCanvas.tsx       turn-by-turn battle renderer
- components/arena/UnitCard.tsx           single-unit card primitive
- components/arena/EvalBar.tsx            ELO-delta linearized winner bar
- components/arena/HighlightTicker.tsx    upset / streak-break feed
- hooks/useArenaEngine.ts                 chain polling (mirrors useGameEngine)
- store/useArenaStore.ts                  zustand slice
- lib/arenaUnits.ts                       12-unit name + ability mapping

Reuses Phaser sprites where possible but renders Arena in pure React DOM +
CSS keyframes — SAP-style 5-slot bench doesn't need camera/zoom.

Bonus fix (useGameEngine + useArenaEngine): treat any RPC URL on an RFC1918
private range (10/8, 172.16/12, 192.168/16) or on loopback (127.0.0.1,
localhost) as a localhost build for network-picker fallback purposes.
Previously only literal 127.0.0.1/localhost matched, which broke LAN access
to the dev server.
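A sketch of the host check, written against the ranges listed above: 10.0.0.0/8, 172.16.0.0/12 (second octet 16 through 31), 192.168.0.0/16, plus the loopback literals. `isLocalRpcUrl` and `isRfc1918` are hypothetical names; the actual hooks may structure this differently.

```typescript
// Hypothetical helper names; ranges follow RFC 1918 plus the loopback literals.
function isRfc1918(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false; // not a dotted-quad IPv4 literal
  const octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  if (octets.some((n) => n > 255)) return false;
  const [a, b] = octets;
  return (
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

function isLocalRpcUrl(rpcUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rpcUrl).hostname;
  } catch {
    return false; // unparseable URL: not treated as local
  }
  return host === "localhost" || host === "127.0.0.1" || isRfc1918(host);
}
```

Checking the second octet numerically avoids the classic mistake of matching only `172.16.` as a string prefix, which would miss hosts such as 172.20.0.5 that are inside the /12.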

contracts/foundry.lock: commit forge dependency lockfile (forge-std rev).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- ReplayCanvas: drop unused useState import and dead isAttacker bindings
- useArenaEngine: replace `any` in simulate decode with `readonly unknown[]`
keanji-x merged commit cf71923 into main on Jun 2, 2026
3 checks passed