feat(v2): cloud-brain smoke script (Phase B prep — gated on Phase 1B merge)#8
Draft
LightDriverCS wants to merge 2 commits into
Writes `scripts/cloud-brain-smoke.mjs` per V2 runbook §"Validation
script". Exercises the cloud-brain end-to-end path once Phase 1B
merges (BenchAGI #872 W1 + #874 W4 + #878 W2 + #988 relay +
openclaw#24 W3) and an agent's deployment is flipped to
runtime: 'remote-brain'.
What the script does:
- Lists known agents from the local openclaw gateway.
- Queries Firestore (admin REST + gcloud token per the documented
recipe) for `agentDeployments/{instanceId}_{agentId}.runtime`.
- For each agent with `runtime === 'remote-brain'`, spawns
`benchagi --agent <name> --liveness off "respond: smoke-ok"`
with stdout captured and a 60s timeout.
- Asserts: exit 0, non-empty stdout, no error markers, latency
under 60s.
- Emits JSON summary; exits 0 only if all tested agents passed.
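A minimal sketch of the four assertions above, written as a pure function so it can be checked without a live gateway. `evaluateRun` and `ERROR_MARKERS` are hypothetical names, and the marker patterns are an assumption: this PR does not list the markers the script actually looks for.

```javascript
// Hypothetical helper, not taken from scripts/cloud-brain-smoke.mjs.
// ASSUMPTION: these patterns stand in for the script's real error markers.
const ERROR_MARKERS = [/\berror\b/i, /\btimed out\b/i];

// Applies the four checks to one captured CLI run and lists what failed.
function evaluateRun({ exitCode, stdout, latencyMs }, timeoutMs = 60000) {
  const failures = [];
  if (exitCode !== 0) failures.push(`exit code ${exitCode}`);
  if (stdout.trim().length === 0) failures.push('empty stdout');
  if (ERROR_MARKERS.some((re) => re.test(stdout))) {
    failures.push('error marker in stdout');
  }
  if (latencyMs >= timeoutMs) {
    failures.push(`latency ${latencyMs}ms not under ${timeoutMs}ms`);
  }
  return { passed: failures.length === 0, failures };
}
```

Returning the list of failures rather than a bare boolean keeps the JSON summary useful when a run fails for more than one reason.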
Required env: INSTANCE_ID. Optional: GCP_PROJECT (default
benchagi-8ea90), SMOKE_AGENT_FILTER, SMOKE_PROMPT, SMOKE_TIMEOUT_MS.
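The env contract above could be read along these lines. This is a sketch only: `readConfig` and the returned field names are hypothetical, treating `SMOKE_AGENT_FILTER` as a regex follows the PR description, and how the real script validates its inputs is not shown in this PR.

```javascript
// Hypothetical helper; the env var names and defaults come from the PR text,
// everything else (function name, result shape, validation) is an assumption.
function readConfig(env) {
  if (!env.INSTANCE_ID) {
    throw new Error('INSTANCE_ID is required');
  }
  const timeoutMs = Number(env.SMOKE_TIMEOUT_MS || 60000);
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error('SMOKE_TIMEOUT_MS must be a positive number');
  }
  return {
    instanceId: env.INSTANCE_ID,
    gcpProject: env.GCP_PROJECT || 'benchagi-8ea90',
    // null means "test every agent".
    agentFilter: env.SMOKE_AGENT_FILTER ? new RegExp(env.SMOKE_AGENT_FILTER) : null,
    prompt: env.SMOKE_PROMPT || 'respond: smoke-ok',
    timeoutMs,
  };
}
```

In the script this would be called once as `readConfig(process.env)`, so a missing `INSTANCE_ID` fails before any gateway or Firestore call is made.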
Gated on cloud-brain Phase 1B merging + a deployment flipped to
remote-brain. Until then, the script reports "no remote-brain
agents found — Phase 1B may not be merged + flipped yet" and exits
0 (not a failure).
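One possible shape for the exit-code decision, keeping "nothing to test yet" (exit 0) separate from "could not check anything". The status names and the exit code 2 are assumptions made for illustration, not the script's actual values.

```javascript
// Hypothetical helper. ASSUMPTION: each result carries a status of
// 'passed', 'failed', 'skipped' (not remote-brain) or 'lookup-error'.
function exitCodeFor(results) {
  const tested = results.filter(
    (r) => r.status === 'passed' || r.status === 'failed'
  );
  const lookupErrors = results.filter((r) => r.status === 'lookup-error');

  // Any tested agent failing is a plain failure.
  if (tested.some((r) => r.status === 'failed')) return 1;
  // Nothing was tested and at least one lookup broke: the run proved nothing.
  if (tested.length === 0 && lookupErrors.length > 0) return 2;
  // All tested agents passed, or there were simply no remote-brain agents.
  return 0;
}
```

With this split, an empty fleet before Phase 1B is flipped still exits 0, while a run where every Firestore lookup failed does not.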
Stays as a draft PR until Cory merges Phase 1B and flips at least
one deployment, at which point we run the smoke + capture the
transcript for ANVIL-5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex Anvil 4 finding: HOLD. Two P0s + two P1s + a missing-test
flag.
Most important: Codex couldn't find a readable local gateway path
that routes chat.send through cloud-brain when an agent's
deployment has runtime: 'remote-brain'. Without that dispatch
bridge, the smoke script will pass on local execution without
exercising cloud-brain at all. The bridge MAY be in W2 (#878)
which Codex couldn't see in full diff via the connector; needs
verification.
Other findings:
- P1: bench-cli's chat.history call expects events/frames but the
readable OpenClaw chat.history handler returns transcript
messages and ignores sinceSeq. V1.1 reconnect replay is
best-effort no-op until this contract is reconciled.
- P1: Firestore doc id format assumption in the smoke
(instances/{instanceId}/agentDeployments/{instanceId}_{agentId})
may not match real W1 backfill output if deploymentIds differ.
- P2: Firestore failures recorded as skips can produce false
exit-0 if no agents tested.
- P2: smoke assumes repo-root cwd.
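For reference, the document-id assumption flagged above can be isolated in one place so it is easy to change if the W1 backfill uses a different id. This sketch assumes the Firestore REST v1 URL layout with the `(default)` database; `deploymentLookup` and `readRuntime` are hypothetical names, and the path is the one the review says the smoke assumes, not a confirmed schema.

```javascript
// Hypothetical helpers, not taken from scripts/cloud-brain-smoke.mjs.
// ASSUMPTION: deployments live at
//   instances/{instanceId}/agentDeployments/{instanceId}_{agentId}
// in the (default) database, as the review finding describes.
function deploymentLookup({ project, instanceId, agentId, token }) {
  const docPath = `instances/${instanceId}/agentDeployments/${instanceId}_${agentId}`;
  const encoded = docPath.split('/').map(encodeURIComponent).join('/');
  return {
    url: `https://firestore.googleapis.com/v1/projects/${project}/databases/(default)/documents/${encoded}`,
    headers: {
      Authorization: `Bearer ${token}`,
      'X-Goog-User-Project': project,
    },
  };
}

// Firestore REST responses wrap each field in a typed value object.
function readRuntime(firestoreDoc) {
  return firestoreDoc?.fields?.runtime?.stringValue ?? null;
}
```

A `null` from `readRuntime` covers both a missing document body and a missing `runtime` field, which the caller can report separately from a failed HTTP request.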
Stronger smoke proposed: assert directive artifacts in Firestore
(relayDirectives doc with directiveType: llm_turn), assert directive
reaches completed, AND assert CLI saw normal chat/agent.lifecycle
frames. That proves both halves of ADR-006 transparency.
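A rough sketch of what the stronger check might assert. The `directiveType: 'llm_turn'` value and the `completed` state come from the proposal above; the `status` field name, the frame `type` field, and the reading of "chat/agent.lifecycle" as two frame types are all assumptions that would need checking against the real relay and CLI contracts.

```javascript
// Hypothetical helper. ASSUMPTIONS: the directive doc exposes `status`,
// and CLI frames expose `type` with values 'chat' and 'agent.lifecycle'.
function checkTransparency(directive, frames) {
  const problems = [];

  // Cloud half: a relay directive of the right type reached completion.
  if (!directive) {
    problems.push('no relayDirectives doc found');
  } else {
    if (directive.directiveType !== 'llm_turn') {
      problems.push(`unexpected directiveType ${directive.directiveType}`);
    }
    if (directive.status !== 'completed') {
      problems.push(`directive status is ${directive.status}`);
    }
  }

  // CLI half: the client still saw the frames a local run would produce.
  const seen = new Set(frames.map((f) => f.type));
  for (const type of ['chat', 'agent.lifecycle']) {
    if (!seen.has(type)) problems.push(`missing ${type} frame`);
  }
  return problems;
}
```

An empty result means both halves held; a smoke that only checks CLI output could not tell a cloud-brain turn from a local one.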
Doesn't change the smoke script in this commit — the structural
concern (missing dispatch bridge) needs resolution first. The
smoke script is preserved as-is (P0 #2 acknowledged as a review
finding) so Cory can decide which path to take.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase B prep per the V2 runbook §"Validation script". Adds `scripts/cloud-brain-smoke.mjs`, the script that exercises the cloud-brain end-to-end path once Phase 1B merges and an agent's deployment is flipped to `runtime: 'remote-brain'`.
Draft until cloud-brain Phase 1B merges. Convert to ready and run a live smoke once:
- `/v1/llm_turn` endpoint merges + a build is deployed.
- `agentDeployments/{instance}_{agent}.runtime` is set to `remote-brain` per the operator-side smoke runbook (`~/.openclaw/wiki/main/_boards/forensics/cloud-brain-architecture-journal/20-end-to-end-smoke-runbook.md`).
What the script does
- Lists known agents via `benchagi agents list`.
- Queries Firestore (admin REST + gcloud token, with the `X-Goog-User-Project` header per the documented recipe) for `agentDeployments/{instanceId}_{agentId}` and reads the `runtime` field.
- For each agent with `runtime === 'remote-brain'`, spawns `benchagi --agent <name> --liveness off "respond: smoke-ok"` with stdout captured and a 60s wall-clock timeout.
Required env
- `INSTANCE_ID`: Firestore instance id (Cory's primary)
Optional env
- `GCP_PROJECT` (default: `benchagi-8ea90`)
- `SMOKE_AGENT_FILTER`: regex to limit which agents are tested
- `SMOKE_PROMPT`: override the default `respond: smoke-ok` prompt
- `SMOKE_TIMEOUT_MS`: override the 60s default
Test plan
- At least one deployment is flipped to `remote-brain`
- `INSTANCE_ID=<your-instance> node scripts/cloud-brain-smoke.mjs` → exit 0, JSON summary shows ≥1 agent passed
- Transcript captured in `docs/v2/cloud-brain-smoke-transcript-2026-XX-XX.md`
Anvil Handoff
This PR will get one Codex Anvil pass (ANVIL-4) per the runbook §"Anvil pass on cloud-brain readiness" — review of the end-to-end path + the smoke script itself BEFORE the live run. ANVIL-5 happens after the live run with the transcript.
🤖 Generated with Claude Code