feat(v2): cloud-brain smoke script (Phase B prep — gated on Phase 1B merge)#8

Draft
LightDriverCS wants to merge 2 commits into main from feat/cloud-brain-smoke-script

@LightDriverCS
Contributor

Summary

Phase B prep per the V2 runbook §"Validation script". Adds scripts/cloud-brain-smoke.mjs — the script that exercises the cloud-brain end-to-end path once Phase 1B merges and an agent's deployment is flipped to runtime: 'remote-brain'.

Draft until cloud-brain Phase 1B merges. Convert to ready and run a live smoke once:

  1. BenchAGI #872 W1 schema + #874 W4 directives + #878 W2 orchestrator + #988 relay extension all merge to BenchAGI/main.
  2. openclaw#24 W3 /v1/llm_turn endpoint merges + a build is deployed.
  3. At least one agent's agentDeployments/{instance}_{agent}.runtime is set to remote-brain per the operator-side smoke runbook (~/.openclaw/wiki/main/_boards/forensics/cloud-brain-architecture-journal/20-end-to-end-smoke-runbook.md).

What the script does

  1. Lists known agents from the local openclaw gateway via benchagi agents list.
  2. For each agent, queries Firestore admin REST (gcloud user token + X-Goog-User-Project header per the documented recipe) for agentDeployments/{instanceId}_{agentId} and reads the runtime field.
  3. If runtime === 'remote-brain', spawns benchagi --agent <name> --liveness off "respond: smoke-ok" with stdout captured and a 60s wall-clock timeout.
  4. Asserts (per runbook): exit 0, non-empty stdout, no error markers in output, latency under 60s.
  5. Emits a JSON summary. Exits 0 only if all tested agents passed.
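Step 2 above can be sketched as follows. This is a minimal illustration, not the code in `scripts/cloud-brain-smoke.mjs`: the helper names `buildDeploymentRequest` and `readRuntime` are hypothetical, and the document path follows the `instances/{instanceId}/agentDeployments/{instanceId}_{agentId}` form quoted in the Anvil review, which may not match the real W1 backfill output. The token is assumed to come from `gcloud auth print-access-token`.

```javascript
// Hypothetical helpers for illustration only.

// Build the Firestore REST request for one agent's deployment document.
// Assumption: the doc lives at instances/{instanceId}/agentDeployments/{instanceId}_{agentId}.
function buildDeploymentRequest({ project, instanceId, agentId, token }) {
  const docId = encodeURIComponent(`${instanceId}_${agentId}`);
  const docPath = `instances/${encodeURIComponent(instanceId)}/agentDeployments/${docId}`;
  return {
    url: `https://firestore.googleapis.com/v1/projects/${project}/databases/(default)/documents/${docPath}`,
    headers: {
      Authorization: `Bearer ${token}`,
      // Tells Google which project the user-token request is attributed to.
      'X-Goog-User-Project': project,
    },
  };
}

// Firestore REST wraps each field in a typed value, e.g. { stringValue: '...' }.
// Returns null when the document or the field is missing.
function readRuntime(doc) {
  return doc?.fields?.runtime?.stringValue ?? null;
}
```

A caller would pass `url` and `headers` to `fetch`, parse the JSON body, and test only agents for which `readRuntime(body) === 'remote-brain'`.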

Required env

  • INSTANCE_ID — Firestore instance id (Cory's primary)

Optional env

  • GCP_PROJECT (default: benchagi-8ea90)
  • SMOKE_AGENT_FILTER — regex to limit which agents are tested
  • SMOKE_PROMPT — override the default respond: smoke-ok prompt
  • SMOKE_TIMEOUT_MS — override the 60s default
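One way to read these variables, shown as a sketch: the function name `loadSmokeConfig` and the returned field names are hypothetical, and the actual script may parse its environment differently. The defaults are the ones listed above.

```javascript
// Hypothetical config loader; defaults mirror the env list in this PR description.
function loadSmokeConfig(env) {
  if (!env.INSTANCE_ID) {
    throw new Error('INSTANCE_ID is required');
  }
  const timeoutMs = env.SMOKE_TIMEOUT_MS ? Number(env.SMOKE_TIMEOUT_MS) : 60_000;
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`SMOKE_TIMEOUT_MS must be a positive number, got "${env.SMOKE_TIMEOUT_MS}"`);
  }
  return {
    instanceId: env.INSTANCE_ID,
    project: env.GCP_PROJECT || 'benchagi-8ea90',
    // null means "test every agent"; otherwise only names matching the regex.
    agentFilter: env.SMOKE_AGENT_FILTER ? new RegExp(env.SMOKE_AGENT_FILTER) : null,
    prompt: env.SMOKE_PROMPT || 'respond: smoke-ok',
    timeoutMs,
  };
}
```

Calling `loadSmokeConfig(process.env)` at startup surfaces a missing `INSTANCE_ID` or a malformed timeout before any agent is spawned.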

Test plan

  • Phase 1B merges
  • At least one deployment flipped to remote-brain
  • INSTANCE_ID=<your-instance> node scripts/cloud-brain-smoke.mjs → exit 0, JSON summary shows ≥1 agent passed
  • Capture transcript at docs/v2/cloud-brain-smoke-transcript-2026-XX-XX.md
  • Run ANVIL-5 Codex pass per runbook §"End-to-end test under cloud-brain"
  • Update wiki entry per Cory's forward commitment ("make sure we get a wiki entry once this is tested")

Anvil Handoff

This PR will get one Codex Anvil pass (ANVIL-4) per the runbook §"Anvil pass on cloud-brain readiness" — review of the end-to-end path + the smoke script itself BEFORE the live run. ANVIL-5 happens after the live run with the transcript.

🤖 Generated with Claude Code

LightDriverCS and others added 2 commits May 6, 2026 13:03
Writes `scripts/cloud-brain-smoke.mjs` per V2 runbook §"Validation
script". Exercises the cloud-brain end-to-end path once Phase 1B
merges (BenchAGI #872 W1 + #874 W4 + #878 W2 + #988 relay +
openclaw#24 W3) and an agent's deployment is flipped to
runtime: 'remote-brain'.

What the script does:
- Lists known agents from the local openclaw gateway.
- Queries Firestore (admin REST + gcloud token per the documented
  recipe) for `agentDeployments/{instanceId}_{agentId}.runtime`.
- For each agent with `runtime === 'remote-brain'`, spawns
  `benchagi --agent <name> --liveness off "respond: smoke-ok"`
  with stdout captured and a 60s timeout.
- Asserts: exit 0, non-empty stdout, no error markers, latency
  under 60s.
- Emits JSON summary; exits 0 only if all tested agents passed.
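
The four assertions could be expressed as a pure function over a captured run, sketched below. `evaluateRun` and the `ERROR_MARKERS` patterns are hypothetical: the commit does not say which strings count as error markers, so the two patterns here are placeholders.

```javascript
// Assumption: these patterns stand in for the script's real error markers.
const ERROR_MARKERS = [/\berror\b/i, /ECONNREFUSED/];

// Apply the four smoke assertions to one captured run.
// Returns every failed assertion rather than stopping at the first.
function evaluateRun({ exitCode, stdout, latencyMs }, timeoutMs = 60_000) {
  const failures = [];
  if (exitCode !== 0) failures.push(`exit code ${exitCode}`);
  if (stdout.trim() === '') failures.push('empty stdout');
  if (ERROR_MARKERS.some((re) => re.test(stdout))) failures.push('error marker in output');
  if (latencyMs >= timeoutMs) failures.push(`latency ${latencyMs}ms not under ${timeoutMs}ms`);
  return { passed: failures.length === 0, failures };
}
```

Collecting all failures keeps the JSON summary useful when a run fails for more than one reason.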

Required env: INSTANCE_ID. Optional: GCP_PROJECT (default
benchagi-8ea90), SMOKE_AGENT_FILTER, SMOKE_PROMPT, SMOKE_TIMEOUT_MS.

Gated on cloud-brain Phase 1B merging + a deployment flipped to
remote-brain. Until then, the script reports "no remote-brain
agents found — Phase 1B may not be merged + flipped yet" and exits
0 (not a failure).

Stays as a draft PR until Cory merges Phase 1B and flips at least
one deployment, at which point we run the smoke + capture the
transcript for ANVIL-5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex Anvil 4 finding: HOLD. Two P0s + two P1s + a missing-test
flag.

Most important: Codex couldn't find a readable local gateway path
that routes chat.send through cloud-brain when an agent's
deployment has runtime: 'remote-brain'. Without that dispatch
bridge, the smoke script will pass on local execution without
exercising cloud-brain at all. The bridge MAY be in W2 (#878)
which Codex couldn't see in full diff via the connector; needs
verification.

Other findings:
- P1: bench-cli's chat.history call expects events/frames but the
  readable OpenClaw chat.history handler returns transcript
  messages and ignores sinceSeq. V1.1 reconnect replay is
  best-effort no-op until this contract is reconciled.
- P1: Firestore doc id format assumption in the smoke
  (instances/{instanceId}/agentDeployments/{instanceId}_{agentId})
  may not match real W1 backfill output if deploymentIds differ.
- P2: Firestore failures recorded as skips can produce false
  exit-0 if no agents tested.
- P2: smoke assumes repo-root cwd.
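
One possible way to address the false exit-0 finding, sketched under assumptions: lookup failures get their own status instead of being folded into skips, and a run that tested nothing because lookups failed exits non-zero. The function name, the status strings, and the exit code values are hypothetical and are not taken from the script.

```javascript
// Hypothetical exit policy. Each result has a status of:
// 'passed' | 'failed' | 'skipped' (runtime is not remote-brain) | 'lookup-error'.
function decideExitCode(results) {
  const tested = results.filter((r) => r.status === 'passed' || r.status === 'failed');
  const lookupErrors = results.filter((r) => r.status === 'lookup-error');
  // Any tested agent failing its assertions is a hard failure.
  if (tested.some((r) => r.status === 'failed')) return 1;
  // Nothing was tested and at least one lookup failed: the run proves nothing.
  if (tested.length === 0 && lookupErrors.length > 0) return 2;
  // Either every tested agent passed, or no remote-brain agents exist yet.
  return 0;
}
```

This keeps the "no remote-brain agents found" case at exit 0 while separating it from a run where Firestore could not be read.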

Stronger smoke proposed: assert directive artifacts in Firestore
(relayDirectives doc with directiveType: llm_turn), assert directive
reaches completed, AND assert CLI saw normal chat/agent.lifecycle
frames. That proves both halves of ADR-006 transparency.
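
A rough sketch of that stronger check, assuming the directive documents and the CLI frames have already been collected into arrays. `directiveType: 'llm_turn'` and the `completed` state come from the text above; the `status` and `type` field names, the function name, and the reading that both a `chat` frame and an `agent.lifecycle` frame must appear are assumptions.

```javascript
// Hypothetical check for both halves of the transparency claim:
// a cloud-side directive exists and completed, and the CLI saw ordinary frames.
function checkTransparency({ directives, frames }) {
  const turn = directives.find((d) => d.directiveType === 'llm_turn');
  const sawChat = frames.some((f) => f.type === 'chat');
  const sawLifecycle = frames.some((f) => f.type === 'agent.lifecycle');
  const result = {
    directiveCreated: Boolean(turn),
    directiveCompleted: turn?.status === 'completed',
    cliFramesSeen: sawChat && sawLifecycle,
  };
  return { ...result, passed: Object.values(result).every(Boolean) };
}
```

A run that passes the CLI half but has no `llm_turn` directive would indicate local execution, which is the case the missing dispatch bridge finding is concerned with.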

Doesn't change the smoke script in this commit — the structural
concern (missing dispatch bridge) needs resolution first. The
smoke script is preserved as-is (P0 #2 acknowledged as a review
finding) so Cory can decide which path to take.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>