feat(v2): cloud-brain smoke script (Phase B prep — gated on Phase 1B merge) by LightDriverCS · Pull Request #8 · BenchAGI/bench-cli

LightDriverCS · 2026-05-06T19:03:27Z

Summary

Phase B prep per the V2 runbook §"Validation script". Adds scripts/cloud-brain-smoke.mjs — the script that exercises the cloud-brain end-to-end path once Phase 1B merges and an agent's deployment is flipped to runtime: 'remote-brain'.

Draft until cloud-brain Phase 1B merges. Convert to ready and run a live smoke once:

- BenchAGI #872 W1 schema + #874 W4 directives + #878 W2 orchestrator + #988 relay extension all merge to BenchAGI/main.
- openclaw#24 W3 `/v1/llm_turn` endpoint merges + a build is deployed.
- At least one agent's `agentDeployments/{instance}_{agent}.runtime` is set to `remote-brain` per the operator-side smoke runbook (~/.openclaw/wiki/main/_boards/forensics/cloud-brain-architecture-journal/20-end-to-end-smoke-runbook.md).

What the script does

- Lists known agents from the local openclaw gateway via `benchagi agents list`.
- For each agent, queries Firestore admin REST (gcloud user token + `X-Goog-User-Project` header per the documented recipe) for `agentDeployments/{instanceId}_{agentId}` and reads the `runtime` field.
- If `runtime === 'remote-brain'`, spawns `benchagi --agent <name> --liveness off "respond: smoke-ok"` with stdout captured and a 60s wall-clock timeout.
- Asserts (per runbook): exit 0, non-empty stdout, no error markers in output, latency under 60s.
- Emits a JSON summary. Exits 0 only if all tested agents passed.
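The assertion and summary steps above can be sketched as two pure functions. This is a minimal illustration, not the code in `scripts/cloud-brain-smoke.mjs`: the names `checkRun`, `summarize`, and `ERROR_MARKERS` are hypothetical, and the marker patterns are an assumption, since the PR does not say which strings count as error markers.

```javascript
// Hypothetical marker list: the PR does not specify which strings count as error markers.
const ERROR_MARKERS = [/\berror\b/i, /\btimed out\b/i, /ECONNREFUSED/];

// Evaluate one captured run against the four assertions:
// exit 0, non-empty stdout, no error markers, latency under the timeout.
function checkRun({ agent, exitCode, stdout, latencyMs }, timeoutMs = 60000) {
  const failures = [];
  if (exitCode !== 0) failures.push(`exit code ${exitCode}`);
  if (stdout.trim().length === 0) failures.push('empty stdout');
  if (ERROR_MARKERS.some((re) => re.test(stdout))) failures.push('error marker in output');
  if (latencyMs >= timeoutMs) failures.push(`latency ${latencyMs}ms >= ${timeoutMs}ms`);
  return { agent, passed: failures.length === 0, failures, latencyMs };
}

// Build the JSON summary; the exit code is 0 only if every tested agent passed.
function summarize(results) {
  const failed = results.filter((r) => !r.passed).length;
  return {
    summary: { tested: results.length, passed: results.length - failed, failed, results },
    exitCode: failed === 0 ? 0 : 1,
  };
}
```

Keeping the checks separate from process spawning would let them be unit-tested without a live gateway; `JSON.stringify(summarize(results).summary, null, 2)` would produce the emitted summary.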

Required env

INSTANCE_ID — Firestore instance id (Cory's primary)

Optional env

GCP_PROJECT (default: benchagi-8ea90)
SMOKE_AGENT_FILTER — regex to limit which agents are tested
SMOKE_PROMPT — override the default respond: smoke-ok prompt
SMOKE_TIMEOUT_MS — override the 60s default
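For illustration, the environment variables above could be resolved as follows. This is a sketch under assumptions: `readSmokeConfig` and the returned property names are hypothetical, and the validation of `SMOKE_TIMEOUT_MS` is an addition that the PR does not describe.

```javascript
// Resolve smoke settings from an env-like object (for example process.env).
function readSmokeConfig(env) {
  const instanceId = env.INSTANCE_ID;
  if (!instanceId) throw new Error('INSTANCE_ID is required');

  // Assumed validation: reject non-numeric or non-positive timeouts.
  const timeoutMs = env.SMOKE_TIMEOUT_MS ? Number(env.SMOKE_TIMEOUT_MS) : 60000;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`SMOKE_TIMEOUT_MS must be a positive number, got "${env.SMOKE_TIMEOUT_MS}"`);
  }

  return {
    instanceId,
    gcpProject: env.GCP_PROJECT || 'benchagi-8ea90',
    // null means "test every agent".
    agentFilter: env.SMOKE_AGENT_FILTER ? new RegExp(env.SMOKE_AGENT_FILTER) : null,
    prompt: env.SMOKE_PROMPT || 'respond: smoke-ok',
    timeoutMs,
  };
}
```

Calling `readSmokeConfig(process.env)` once at startup would fail fast on a missing `INSTANCE_ID` before any gateway or Firestore call is made.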

Test plan

- Phase 1B merges
- At least one deployment flipped to `remote-brain`
- `INSTANCE_ID=<your-instance> node scripts/cloud-brain-smoke.mjs` → exit 0, JSON summary shows ≥1 agent passed
- Capture transcript at docs/v2/cloud-brain-smoke-transcript-2026-XX-XX.md
- Run ANVIL-5 Codex pass per runbook §"End-to-end test under cloud-brain"
- Update wiki entry per Cory's forward commitment ("make sure we get a wiki entry once this is tested")

Anvil Handoff

This PR will get one Codex Anvil pass (ANVIL-4) per the runbook §"Anvil pass on cloud-brain readiness" — review of the end-to-end path + the smoke script itself BEFORE the live run. ANVIL-5 happens after the live run with the transcript.

🤖 Generated with Claude Code

Writes `scripts/cloud-brain-smoke.mjs` per V2 runbook §"Validation script". Exercises the cloud-brain end-to-end path once Phase 1B merges (BenchAGI #872 W1 + #874 W4 + #878 W2 + #988 relay + openclaw#24 W3) and an agent's deployment is flipped to runtime: 'remote-brain'.

What the script does:
- Lists known agents from the local openclaw gateway.
- Queries Firestore (admin REST + gcloud token per the documented recipe) for `agentDeployments/{instanceId}_{agentId}.runtime`.
- For each agent with `runtime === 'remote-brain'`, spawns `benchagi --agent <name> --liveness off "respond: smoke-ok"` with stdout captured and a 60s timeout.
- Asserts: exit 0, non-empty stdout, no error markers, latency under 60s.
- Emits JSON summary; exits 0 only if all tested agents passed.

Required env: INSTANCE_ID.
Optional: GCP_PROJECT (default benchagi-8ea90), SMOKE_AGENT_FILTER, SMOKE_PROMPT, SMOKE_TIMEOUT_MS.

Gated on cloud-brain Phase 1B merging + a deployment flipped to remote-brain. Until then, the script reports "no remote-brain agents found — Phase 1B may not be merged + flipped yet" and exits 0 (not a failure).

Stays as a draft PR until Cory merges Phase 1B and flips at least one deployment, at which point we run the smoke + capture the transcript for ANVIL-5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
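A rough sketch of the Firestore lookup described in this commit message, assuming the standard Firestore REST document URL and typed-value response format. The helper names `buildDeploymentRequest` and `readRuntime` are hypothetical, the top-level `agentDeployments/` path is taken from the text here and may not match the real collection layout, and the token is assumed to come from something like `gcloud auth print-access-token`.

```javascript
// Build the REST request for one deployment document.
// The document path layout is an assumption based on the PR text.
function buildDeploymentRequest({ project, instanceId, agentId, token }) {
  const docId = encodeURIComponent(`${instanceId}_${agentId}`);
  return {
    url:
      `https://firestore.googleapis.com/v1/projects/${project}` +
      `/databases/(default)/documents/agentDeployments/${docId}`,
    headers: {
      Authorization: `Bearer ${token}`,
      // Bills quota to the named project when using a user token.
      'X-Goog-User-Project': project,
    },
  };
}

// Firestore REST wraps each field in a typed value, e.g. { stringValue: '...' }.
// Returns null when the document or the runtime field is missing.
function readRuntime(doc) {
  return doc?.fields?.runtime?.stringValue ?? null;
}
```

The request object could be passed to `fetch(url, { headers })`, with the parsed JSON body handed to `readRuntime` and compared against `'remote-brain'`.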

Codex Anvil 4 finding: HOLD. Two P0s + two P1s + a missing-test flag.

Most important: Codex couldn't find a readable local gateway path that routes chat.send through cloud-brain when an agent's deployment has runtime: 'remote-brain'. Without that dispatch bridge, the smoke script will pass on local execution without exercising cloud-brain at all. The bridge MAY be in W2 (#878) which Codex couldn't see in full diff via the connector; needs verification.

Other findings:
- P1: bench-cli's chat.history call expects events/frames but the readable OpenClaw chat.history handler returns transcript messages and ignores sinceSeq. V1.1 reconnect replay is best-effort no-op until this contract is reconciled.
- P1: Firestore doc id format assumption in the smoke (instances/{instanceId}/agentDeployments/{instanceId}_{agentId}) may not match real W1 backfill output if deploymentIds differ.
- P2: Firestore failures recorded as skips can produce false exit-0 if no agents tested.
- P2: smoke assumes repo-root cwd.

Stronger smoke proposed: assert directive artifacts in Firestore (relayDirectives doc with directiveType: llm_turn), assert directive reaches completed, AND assert CLI saw normal chat/agent.lifecycle frames. That proves both halves of ADR-006 transparency.

Doesn't change the smoke script in this commit — the structural concern (missing dispatch bridge) needs resolution first. The smoke script is preserved as-is (P0 #2 acknowledged as a review finding) so Cory can decide which path to take.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
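One possible way to address the P2 finding about false exit-0 is to keep Firestore lookup errors separate from genuine "not remote-brain" skips when deciding the exit code. The sketch below is an assumption about how that could look, not a change made in this PR; `decideExit` and the exit code 2 are hypothetical.

```javascript
// Decide the process exit code so that lookup errors are not
// silently treated as "nothing to test".
function decideExit({ tested, lookupErrors }) {
  const failed = tested.filter((r) => !r.passed).length;
  if (failed > 0) {
    return { code: 1, reason: `${failed} agent(s) failed` };
  }
  if (tested.length === 0 && lookupErrors.length > 0) {
    // Hypothetical distinct code: nothing ran because Firestore could not be read.
    return { code: 2, reason: 'no agents tested and Firestore lookups failed' };
  }
  if (tested.length === 0) {
    // Matches the documented pre-merge behaviour: not a failure.
    return { code: 0, reason: 'no remote-brain agents found' };
  }
  return { code: 0, reason: `${tested.length} agent(s) passed` };
}
```

With this split, a run where every lookup failed would no longer look identical to a run before Phase 1B is merged and flipped.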

LightDriverCS and others added 2 commits May 6, 2026 13:03

LightDriverCS wants to merge 2 commits into main from feat/cloud-brain-smoke-script
