feat(arena): SAP-style autobattler — contracts + MCP + Tournament Hall UI by keanji-x · Pull Request #29 · Galxe/gravity-world

keanji-x · 2026-06-02T05:34:45Z

Summary

Implements the full Arena vertical slice described in #27, from contracts up to spectator UI, and validates it end-to-end with LLM agents playing real matches against each other.

Three logical layers in 9 commits:

🔨 Contracts (6 commits, fully reviewed + fixed)

ArenaEngine.sol — UUPS proxy, 5 verbs (buy/sell/move/freeze/roll) + submit + matchmaking + simulate + settle + ELO update
AbilityLib.sol — trigger × effect × target dispatcher with BFS event queue (cap 64) — pattern borrowed from Autochessia
UnitCatalog.sol — 12-unit spike catalog as immutable function table
GameEngine.sol — adds spendOre / refundOre operator hooks (only main-world delta)
Router.sol — adds arenaEngine slot + getAddressesV2() (preserves existing getAddresses() 6-tuple)
Upgrade.s.sol — handles backfill deploy when router lacks arena slot
ArenaEngine.t.sol — 29 tests covering core flows + review-driven gap fills

🤖 MCP + agent-runner integration (2 commits)

5 new MCP tools: arena_list_units, arena_get_state, arena_buy, arena_submit, arena_get_recent_matches
agent-runner/mcp.ts pulls arena_get_state into context each cycle
agent-runner/llm.ts appends Arena prompt block + renders arenaState into user prompt
No phase-mode switch — agents pick it up organically

🎨 Frontend: AI Tournament Hall (1 commit)

New /arena React route, 5-block layout:
- TopBar (LIVE indicator + matchmaking ETA + bucket activity)
- LeaderboardPanel (top ELO + recent matches click-to-replay)
- StagePanel (focused match replay theater + eval-bar)
- AgentMindPanel (selected agent's reasoning timeline from AgentLedger)
- HighlightTicker (upset + streak-break detectors)
Pure React DOM + CSS keyframes (no new Phaser scene needed for linear 5-slot bench)
Bonus fix: useGameEngine + useArenaEngine now treat any RFC1918 private IP as a localhost build (was breaking LAN access)
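
The private-address check behind the bonus fix can be sketched as follows. This is a minimal illustration, not the code in useGameEngine or useArenaEngine: the function names are hypothetical, the hostname is extracted with a simple regex, and IPv6 hosts are out of scope.

```typescript
// Hypothetical helper: pull the hostname out of an RPC URL such as
// "http://192.168.1.20:8545". Returns null when the URL has no scheme://host part.
function hostnameOf(rpcUrl: string): string | null {
  const m = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(rpcUrl);
  return m ? m[1].toLowerCase() : null;
}

// Hypothetical name. Returns true for localhost, 127.0.0.1, and the three
// RFC1918 private ranges (10/8, 172.16/12, 192.168/16).
function isLocalOrPrivateRpc(rpcUrl: string): boolean {
  const host = hostnameOf(rpcUrl);
  if (host === null) return false;
  if (host === "localhost" || host === "127.0.0.1") return true;

  const parts = host.split(".");
  if (parts.length !== 4) return false;
  const octets: number[] = [];
  for (let i = 0; i < parts.length; i++) {
    if (!/^\d{1,3}$/.test(parts[i])) return false; // not a dotted-quad IPv4 address
    const value = Number(parts[i]);
    if (value > 255) return false;
    octets.push(value);
  }

  const a = octets[0];
  const b = octets[1];
  if (a === 10) return true;                        // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true;          // 192.168.0.0/16
  return false;
}
```

With a check of this shape, a dev server reached over the LAN at an address like `http://192.168.1.20:8545` is classified the same way as `http://localhost:8545`, which matches the behavior the fix describes.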

Verification

| Layer | How verified |
| --- | --- |
| Contracts | 57/57 foundry tests (29 ArenaEngine + 28 GameEngine baseline) |
| Deploy | Gravity Testnet ArenaEngine `0xFb2aF6D5cFF7A04Bcfd043236884B9e7137050D4` + local anvil |
| Combat determinism | Same seed + ghost → same winner (`test_simulate_deterministic_same_seed_same_winner`) |
| ELO math | Symmetric ±delta, cross-boundary bucket migration, settle writes to EvaluationLedger |
| MCP tools | `cast send` flow reproduces all 5 verbs + matchmaking + settle on anvil |
| LLM gameplay | 4 agents (Mira / Kael / Renn / Lila) played 3 full matches end-to-end on anvil, captured real reasoning in AgentLedger including Lila's ON_BUY ordering self-correction |
| Spectator UI | Headless chromium screenshot at 1600×1000 verified leaderboard / stage / agent mind all render with live chain data |

What's NOT in this PR

48 remaining unit ability designs (12 spike units cover the system, content is separate design work)
Prize pool / G economy (V2, tracked in feat(economy): dual-loop economy, main-world card minting × G-staked tower climbing #28)
NFT card mint (V2, tracked in feat(economy): dual-loop economy, main-world card minting × G-staked tower climbing #28)
Keeper automation for runMatchmaking (manual call works; cron later)

Known TODOs (carried in #27, not blocking merge)

P0: Design remaining 48 units · MATCHMAKING_PERIOD → setter · double-sided EvalLedger writes
P1: chain.ts migrate to router.arenaEngine() direct getter · EOA-only players · Ghost.season field
P2 (before prize pool): VRF for matchmaking · keccak(seed, k) not XOR · logistic ELO

Test plan

cd contracts && forge test — 57/57 pass
just anvil-deploy + cast send flow per ArenaEngine.t.sol reproduces on local chain
cd frontend && APP_CONFIG=localhost npm run dev — /arena renders with empty state on fresh anvil
Create 2 agents, run buy/submit/runMatchmaking/settle via cast — UI shows updated leaderboard + replay within ~5s polling
Run agent-runner against anvil with arena-focused personality (see end-to-end demo script in /tmp/arena-demo/ from PR validation)

Refs

feat(arena): minimalist autobattler, Super Auto Pets-style async ghost combat #27: Arena game mechanics spec (refined with what was built)
feat(economy): dual-loop economy, main-world card minting × G-staked tower climbing #28: V2 economy direction (two-loop farmer + speculator, NFT mint triggers)

🤖 Generated with Claude Code

…pike

AbilityLib is a pure library implementing a trigger × effect × target dispatcher backed by a FIFO event queue capped at 64 steps. Modeled on Autochessia's EventType/Attribute/ApplyTo three-enum pattern but stripped of MUD/ECS: pure Solidity memory structs only.

UnitCatalog encodes the 12 spike units (4 tiers × 3 units) as a pure function table. Names lean into the Gravity Town theme (Mineworker, Stoneguard, Pyromancer, Wraith, etc). The ability matrix covers all 6 triggers and all 5 effect types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
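
To make the capped FIFO event queue concrete, here is a small TypeScript sketch of the idea. It is an illustration under assumptions, not a port of AbilityLib: the event shape, the handler signature, and the choice to stop silently (rather than revert) once 64 steps have been processed are all hypothetical.

```typescript
// Hypothetical event shape: which trigger fired and on which bench slot.
interface ArenaEvent {
  trigger: string;
  slot: number;
}

// A handler resolves one event and may return follow-up events
// (for example, a death that triggers another unit's ability).
type EventHandler = (ev: ArenaEvent) => ArenaEvent[];

const MAX_QUEUE_STEPS = 64; // cap taken from the commit message

// Drain events in FIFO (breadth-first) order, processing at most
// MAX_QUEUE_STEPS events so a chain of abilities cannot loop forever.
function drainEvents(
  initial: ArenaEvent[],
  handle: EventHandler
): { processed: number; truncated: boolean } {
  const queue = initial.slice();
  let head = 0;
  let processed = 0;
  while (head < queue.length && processed < MAX_QUEUE_STEPS) {
    const ev = queue[head];
    head++;
    processed++;
    const followUps = handle(ev);
    for (let i = 0; i < followUps.length; i++) {
      queue.push(followUps[i]); // appended at the tail, so older events resolve first
    }
  }
  // truncated is true when the cap was hit with events still pending
  return { processed: processed, truncated: head < queue.length };
}
```

A handler that always re-emits an event would stop after exactly 64 steps with `truncated` set, which is the kind of runaway chain the queue-depth safeguard is meant to bound.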

…Engine slot

ArenaEngine is the async ghost autobattler. Highlights:
- 5 player verbs (buy/sell/move/freeze/roll) gated by canControlAgent
- ELO-bucketed matchmaking with 30min rate-limit per bucket + Fisher-Yates pairing
- View-only deterministic simulateMatch(matchId) replays combat from seed + ghost snapshots; battle never writes storage
- settleMatch updates ELO and writes an "arena defeat" evaluation entry on the loser via EvaluationLedger

GameEngine.spendOre is a single ~10-line operator hook so ArenaEngine can deduct ore without owning a separate balance system. Auto-harvests first to ensure a stale pool doesn't block a valid spend.

Router gains a storage-appended `arenaEngine` slot + `getAddressesV2()` returning 7 addresses. The original `getAddresses()` 6-tuple is preserved verbatim so chain.ts and Upgrade.s.sol's length-sniff decoder don't break.

Design judgments:
- sell() emits the refund amount but skips the credit-back path for the spike. GameEngine has no public refund hook today; documented as TODO with rationale.
- Combat draw → defender wins (need a tiebreaker for ELO; spike judgment).
- ELO uses a linear approximation of the standard logistic with K=32; pure on-chain fixed-point logistic is overkill for a spike.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
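
The ELO note above (a linear approximation of the logistic with K=32, symmetric deltas) can be illustrated with a short TypeScript sketch. The 800-point span, the clamping to [0, 1], the rounding, and every function name here are assumptions made for the example; the PR does not show the actual formula in ArenaEngine.sol.

```typescript
const K = 32; // K-factor from the commit message

// Standard logistic expected score for the eventual winner, shown for comparison.
function logisticExpected(winnerElo: number, loserElo: number): number {
  return 1 / (1 + Math.pow(10, (loserElo - winnerElo) / 400));
}

// Hypothetical linear stand-in: 0.5 at equal ratings, changing by 1/span per
// rating point, clamped to [0, 1]. The span of 800 is an assumed value.
function linearExpected(winnerElo: number, loserElo: number, span: number = 800): number {
  const e = 0.5 + (winnerElo - loserElo) / span;
  return Math.min(1, Math.max(0, e));
}

// Symmetric update: the winner gains exactly what the loser drops.
function settleElo(
  winnerElo: number,
  loserElo: number
): { winner: number; loser: number; delta: number } {
  const delta = Math.round(K * (1 - linearExpected(winnerElo, loserElo)));
  return { winner: winnerElo + delta, loser: loserElo - delta, delta: delta };
}
```

Under these assumed parameters, two 1000-rated ghosts exchange 16 points, a 900-rated ghost beating an 1100-rated one gains 24, and the reverse result moves only 8. The total rating in the pair is unchanged in every case, which is the symmetric property the tests pin down.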

ArenaEngine.t.sol: 12 tests covering the 5 player verbs, ELO bucketing, matchmaking rate-limit + pairing, deterministic simulation, ELO settlement, ability-chain triggers, and the queue-depth safeguard.

Deploy.s.sol: deploys ArenaEngine proxy, registers it as operator on the registry so spendOre / EvaluationLedger.write succeed, and wires it into Router via setArenaEngine.

Upgrade.s.sol: mirrors the EvaluationLedger backfill pattern. If the router slot is unset, deploys a fresh ArenaEngine proxy and registers it; otherwise upgrades the existing impl. Idempotent.

Also: renamed UnitCatalog.getUnit's `cost` return name to `unitCost` to silence the shadowing warning vs the sibling `cost()` helper. Pure cosmetic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ON_BUY (Mineworker +1 ATK self, Battlemage +2 ATK right) and ON_SELL (Ravenscout +1 ATK all allies) used to fire in catalog but never reached the battle: bench only stored uint8 unitType and _buildBattleState re-pulled clean base stats from UnitCatalog on every simulate.

Add a parallel int16 atkOverride / hpOverride pair to Ghost, snapshot it onto each Match at creation, and stack the overlay on top of base stats in _materialize. AbilityLib gains applyBenchAbility, a shop-phase processor that honors EFF_ADD_ATK / EFF_ADD_HP with TGT_SELF, LEFT/RIGHT_NEIGHBOR and ALL_ALLIES. move() now swaps overlays alongside slots; sell() zeroes the seller's overlay slot after firing its own ON_SELL.

Also adds GameEngine.refundOre, an operator-only credit that caps at MAX_ORE_POOL, and rewires ArenaEngine.sell to actually refund cost/2 ore (previously the refund was only emitted, never credited).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
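
The overlay approach described in this commit (per-slot overrides stored in parallel with the unit type and stacked on base stats at battle time) can be sketched in TypeScript as below. The base stat values, the empty-slot marker, and the function names are placeholders for illustration; the real UnitCatalog numbers and Solidity layout are not shown in the PR.

```typescript
const BENCH_SLOTS = 5;
const EMPTY_SLOT = -1; // hypothetical marker for an unoccupied slot

// Placeholder base stats indexed by unitType (not the real catalog values).
const BASE_STATS: { atk: number; hp: number }[] = [
  { atk: 1, hp: 2 }, // unitType 0
  { atk: 2, hp: 3 }, // unitType 1
];

// Parallel arrays, mirroring "unitType plus atkOverride / hpOverride" per slot.
interface Bench {
  unitType: number[];
  atkOverride: number[];
  hpOverride: number[];
}

function emptyBench(): Bench {
  const b: Bench = { unitType: [], atkOverride: [], hpOverride: [] };
  for (let i = 0; i < BENCH_SLOTS; i++) {
    b.unitType.push(EMPTY_SLOT);
    b.atkOverride.push(0);
    b.hpOverride.push(0);
  }
  return b;
}

// Moving a unit must carry its overlay along, so all three arrays are swapped.
function moveSlots(b: Bench, from: number, to: number): void {
  const arrays = [b.unitType, b.atkOverride, b.hpOverride];
  for (let i = 0; i < arrays.length; i++) {
    const tmp = arrays[i][from];
    arrays[i][from] = arrays[i][to];
    arrays[i][to] = tmp;
  }
}

// Selling clears the slot and zeroes its overlay so a later buy starts clean.
function sellSlot(b: Bench, slot: number): void {
  b.unitType[slot] = EMPTY_SLOT;
  b.atkOverride[slot] = 0;
  b.hpOverride[slot] = 0;
}

// Battle-time stats are base stats plus the shop-phase overlay.
function materialize(b: Bench, slot: number): { atk: number; hp: number } | null {
  const unit = b.unitType[slot];
  if (unit === EMPTY_SLOT) return null;
  const base = BASE_STATS[unit];
  return { atk: base.atk + b.atkOverride[slot], hp: base.hp + b.hpOverride[slot] };
}
```

The point of the sketch is the failure mode the commit fixes: if only `unitType` were stored, `materialize` would have nothing to add and any shop-phase buff would be lost when the battle state is rebuilt.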

…field

Tighten ability dispatch and matchmaking:
- AbilityLib._applyEffect now requires BUFF_NEIGHBOR effects to be registered with TGT_SELF; the buff is always relative to the caster's slot and a target-relative variant is incoherent. Surfaces catalog mistakes in tests instead of mis-buffing the wrong unit at runtime.
- ArenaEngine adds MAX_BUCKET_SIZE = 256 so a single bucket can't grow unbounded and brick Fisher-Yates gas during runMatchmaking. Excess submitters revert with "bucket full" and naturally rebalance into other ELO bands.
- Draws (both sides alive after 200-turn cap, or simultaneous wipes) now pick a winner from keccak(seed, "draw") instead of always awarding defender, which removes a systematic attacker-vs-defender bias.
- ShopFrozen event gains a nowFrozen bool so clients can mirror the toggle without reading the mask back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
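
The matchmaking pieces mentioned here (a bounded bucket, seeded Fisher-Yates pairing, and an odd agent sitting out) can be sketched in TypeScript as follows. This is an illustration only: the linear congruential generator stands in for the contract's seed handling, and the function names and error handling are assumptions rather than the ArenaEngine implementation.

```typescript
const MAX_BUCKET_SIZE = 256; // limit taken from the commit message

// Small deterministic generator used as a stand-in for on-chain seed mixing.
// Returns values in [0, 1). The constants are a common LCG parameter set.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return function (): number {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Hypothetical submit guard: reject once the bucket is at capacity.
function submitToBucket(bucket: number[], agentId: number): void {
  if (bucket.length >= MAX_BUCKET_SIZE) throw new Error("bucket full");
  bucket.push(agentId);
}

// Shuffle a copy of the bucket with Fisher-Yates, then pair neighbours.
// With an odd count, the last agent after the shuffle sits this round out.
function pairBucket(
  bucket: number[],
  seed: number
): { pairs: [number, number][]; sitsOut: number | null } {
  const ids = bucket.slice();
  const rng = makeRng(seed);
  for (let i = ids.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1)); // uniform index in [0, i]
    const tmp = ids[i];
    ids[i] = ids[j];
    ids[j] = tmp;
  }
  const pairs: [number, number][] = [];
  for (let k = 0; k + 1 < ids.length; k += 2) {
    pairs.push([ids[k], ids[k + 1]]);
  }
  const sitsOut = ids.length % 2 === 1 ? ids[ids.length - 1] : null;
  return { pairs: pairs, sitsOut: sitsOut };
}
```

Because the shuffle touches every element once, its cost grows linearly with bucket size, which is why an unbounded bucket would be a gas risk and a fixed cap keeps the worst case predictable.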

Adds 17 new tests pinning the spike contracts that double-review flagged:
- ELO symmetric K=32 (low-beats-high and high-beats-low) via a new previewEloUpdate external getter. Frontends can reuse it for "+X/-X" pre-settle hints; tests use it to lock the symmetric Elo behavior.
- settle writes a properly-shaped evaluation entry on the loser (rating=4, category="arena", related=[winner]).
- bucket boundary crossing after a win moves the ghost between buckets.
- matchmaking: odd-N sits one out; uses Match snapshot (not current bench); respects MAX_BUCKET_SIZE via the new "bucket full" require (storage-poked to dodge 256-real-agent gas blowup).
- buy: rejects out-of-range unit types and double-bought slots.
- freeze: toggle emits nowFrozen on both true and false transitions.
- roll: changes seed and net-spends ROLL_COST despite auto-harvest.
- ON_BUY (Mineworker self) and ON_SELL (Ravenscout all-allies) buffs persist all the way into combat damage numbers.
- BUFF_NEIGHBOR with non-SELF target reverts via the dispatch require.
- simulateMatch is view-only: calling it twice doesn't mutate ELO.
- Wraith ON_DEATH summon is suppressed when no empty slot exists.
- Draws resolve by keccak(seed,"draw") tiebreak (no attacker/defender bias).

Refactor: split _simulateInternal out of simulateMatch into a shared _runCombat loop so settleMatch can skip the 128-slot trace alloc. Saves ~5-10k gas per settle.

Doc: pin _eloUpdate's symmetric behavior in NatSpec as intentional spike simplification + TODO PR #2 to swap in fixed-point logistic. Also adds TODO comments at the prevrandao / seedMix XOR / canControlAgent / Ghost.season / Router.getAddressesV2 sites called out by review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds ArenaEngine ABI + 7 chain client methods + decodes Router.getAddressesV2 (7-tuple) with backwards-compatible fallback for routers without arena slot.

5 new MCP tools wired in tools.ts:
- arena_list_units: list 12 unit catalog (id, name, atk, hp, cost, ability)
- arena_get_state: per-agent bench + ELO + bucket + ore
- arena_buy: spend ore, fill slot, trigger ON_BUY + persist overlay
- arena_submit: push ghost into ELO bucket for matchmaking
- arena_get_recent_matches: read EvaluationLedger entries tagged category="arena"

agent_id auto-injected via selfTools convention; arena calls degrade gracefully when router lacks arenaEngine() (returns descriptive error, no crash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
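
The backwards-compatible decode mentioned above can be sketched as a length check on the returned tuple. This is a hypothetical illustration, not the code in chain.ts: the function and field names are invented, the six original addresses are kept as an opaque list because the PR does not name them, and treating the zero address as "arena not configured" is an assumption.

```typescript
// Forty zeros after the 0x prefix, built programmatically to avoid a miscount.
const ZERO_ADDRESS = "0x" + new Array(41).join("0");

interface RouterAddresses {
  core: string[];             // the original six addresses, in router order
  arenaEngine: string | null; // null when the router has no arena slot
}

// Accepts either the legacy 6-tuple from getAddresses() or the 7-tuple from
// getAddressesV2(), already decoded into an array of address strings.
function decodeRouterAddresses(tuple: string[]): RouterAddresses {
  if (tuple.length !== 6 && tuple.length !== 7) {
    throw new Error("unexpected router tuple length: " + tuple.length);
  }
  const core = tuple.slice(0, 6);
  if (tuple.length === 6) {
    return { core: core, arenaEngine: null }; // legacy router, no arena slot
  }
  const arena = tuple[6];
  // Assumption: an unset slot is reported as the zero address.
  const unset = arena.toLowerCase() === ZERO_ADDRESS;
  return { core: core, arenaEngine: unset ? null : arena };
}
```

A caller can then branch on `arenaEngine === null` and return a descriptive error from the arena tools instead of crashing, which is the graceful degradation the commit describes.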

mcp.ts: arena_buy/sell/submit/get_state/get_recent_matches added to selfTools agent_id auto-fill; collectContext now pulls arena_get_state into AgentContext.arenaState (graceful fallback on missing arena)

llm.ts: appends ARENA section to system prompt (12-unit catalog summary + 5 verbs + bench rules); buildUserPrompt renders arenaState as a PHASE: ARENA hint block when ghost exists or ore is high enough to participate

types.ts: AgentContext gets optional arenaState field

Agents pick up Arena play organically; no phase mode switch needed. In end-to-end demo runs with arena-focused personalities, 4 agents played 3 full matches and produced legitimate post-match reasoning (incl. Lila's ON_BUY ordering self-correction captured on-chain in AgentLedger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Spectator UI for the Arena. 5-block layout per the codex design proposal (chess.com Events × LangSmith trace × Smallville spectating hybrid).

New files (12, ~1.4k LOC):
- app/arena/page.tsx: main layout + selection state
- components/arena/TopBar.tsx: LIVE clock + bucket activity
- components/arena/LeaderboardPanel.tsx: top ELO + recent matches rail
- components/arena/StagePanel.tsx: focused-match replay theater
- components/arena/AgentMindPanel.tsx: selected agent's reasoning timeline
- components/arena/ReplayCanvas.tsx: turn-by-turn battle renderer
- components/arena/UnitCard.tsx: single-unit card primitive
- components/arena/EvalBar.tsx: ELO-delta linearized winner bar
- components/arena/HighlightTicker.tsx: upset / streak-break feed
- hooks/useArenaEngine.ts: chain polling (mirrors useGameEngine)
- store/useArenaStore.ts: zustand slice
- lib/arenaUnits.ts: 12-unit name + ability mapping

Reuses Phaser sprites where possible but renders Arena in pure React DOM + CSS keyframes, since the SAP-style 5-slot bench doesn't need camera/zoom.

Bonus fix (useGameEngine + useArenaEngine): treat any private RFC1918 RPC URL (10/8, 172.16-31/12, 192.168/16), as well as 127.0.0.1 and localhost, as a localhost build for network-picker fallback purposes. Previously only literal 127.0.0.1/localhost matched, which broke LAN access to the dev server.

contracts/foundry.lock: commit forge dependency lockfile (forge-std rev).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

keanji-x and others added 12 commits June 1, 2026 15:46

fix(frontend): clear lint errors (unused imports, no-explicit-any)

490d643

- ReplayCanvas: drop unused useState import and dead isAttacker bindings
- useArenaEngine: replace `any` in simulate decode with `readonly unknown[]`

fix(frontend): cast simulate tuple at source so map infers correctly

5edfacc

fix(frontend): const-assert W/L literals for typed recent-results array

f9947d5

keanji-x merged commit cf71923 into main Jun 2, 2026
3 checks passed

keanji-x mentioned this pull request Jun 2, 2026

🏆 Hackathon: playable Arena MVP (Friday demo) #31

Open

keanji-x merged 12 commits into main from feat/arena-spike