From de90a364cfb6826e8a48f24862af9a1ed42107e8 Mon Sep 17 00:00:00 2001 From: Terada Kousuke Date: Fri, 3 Apr 2026 18:11:28 +0900 Subject: [PATCH] finish guardrails mvp floor --- docs/ai-guardrails/README.md | 4 +- .../adr/002-provider-admission-lanes.md | 5 +- .../adr/005-scripted-scenario-replays.md | 45 +++ .../adr/006-plugin-hardening-floor.md | 70 ++++ packages/guardrails/README.md | 4 +- packages/guardrails/bin/opencode-guardrails | 35 +- packages/guardrails/managed/opencode.json | 43 ++- .../profile/agents/provider-eval.md | 2 +- packages/guardrails/profile/opencode.json | 43 ++- .../guardrails/profile/plugins/guardrail.ts | 327 +++++++++++++++- packages/opencode/bin/opencode | 35 +- .../opencode/test/scenario/guardrails.test.ts | 348 +++++++++++++++++- packages/opencode/test/scenario/harness.ts | 287 +++++++++++++++ packages/opencode/test/scenario/replay.ts | 233 ++++++++++++ 14 files changed, 1423 insertions(+), 58 deletions(-) create mode 100644 docs/ai-guardrails/adr/005-scripted-scenario-replays.md create mode 100644 docs/ai-guardrails/adr/006-plugin-hardening-floor.md create mode 100644 packages/opencode/test/scenario/harness.ts create mode 100644 packages/opencode/test/scenario/replay.ts diff --git a/docs/ai-guardrails/README.md b/docs/ai-guardrails/README.md index 125a66ab6d8a..c9f4be37b018 100644 --- a/docs/ai-guardrails/README.md +++ b/docs/ai-guardrails/README.md @@ -45,6 +45,7 @@ The main source set for this migration is: - Claude Code official hooks and settings docs - Anthropic skill guide PDF (`The Complete Guide to Building Skills for Claude`) and summary - OpenCode rules, skills, commands, and plugins docs +- Z.AI OpenCode / Coding Plan docs In this migration, references to the `BDF` document should be interpreted as Anthropic's PDF `The Complete Guide to Building Skills for Claude`, which is the skill-construction guide the source repository philosophy lines up with operationally. 
@@ -137,7 +138,7 @@ The remaining work is intentionally split into two stages: - `now required before MVP claim`: `#5`, `#6`, `#7`, and `#13` - `later, after MVP floor`: `#14` and `#12` -The detailed rationale lives in `docs/ai-guardrails/mvp-readiness.md`. Future sessions should start there before expanding issue scope. +The detailed rationale lives in `docs/ai-guardrails/mvp-readiness.md`. The `#13` boundary and intentional deferrals are fixed in `docs/ai-guardrails/adr/006-plugin-hardening-floor.md`. Future sessions should start there before expanding issue scope. ## Tracking @@ -173,6 +174,7 @@ When continuing this work in future sessions: - MVP readiness: `docs/ai-guardrails/mvp-readiness.md` - Migration inventory: `docs/ai-guardrails/migration/` - Scenario tests: `packages/opencode/test/scenario/` +- Scripted replays: `packages/opencode/test/scenario/replay.ts` and `packages/opencode/test/scenario/harness.ts` - Thin distribution package: `packages/guardrails/` ## Primary references diff --git a/docs/ai-guardrails/adr/002-provider-admission-lanes.md b/docs/ai-guardrails/adr/002-provider-admission-lanes.md index 39a9e2d05291..ebeb22f53b7f 100644 --- a/docs/ai-guardrails/adr/002-provider-admission-lanes.md +++ b/docs/ai-guardrails/adr/002-provider-admission-lanes.md @@ -26,7 +26,7 @@ This follows the same philosophy imported from `claude-code-skills` epic `#130` Adopt three admission lanes: -1. `zai` and `openai` are the standard confidential-code lane. +1. `zai`, `zai-coding-plan`, and `openai` are the standard confidential-code lane. 2. `openrouter` is admitted only as a separate evaluation lane. 3. OpenRouter-backed evaluation stays on an explicit `provider-eval` agent and command instead of widening the default implementation lane. 
@@ -39,8 +39,9 @@ The policy is implemented in two layers: ### Standard lane -- admitted providers: `zai`, `openai` +- admitted providers: `zai`, `zai-coding-plan`, `openai` - admitted models are pinned through provider allowlists +- `zai-coding-plan` is exposed as its own provider because Z.AI's official OpenCode guidance instructs Coding Plan subscribers to select `Z.AI Coding Plan` rather than overloading the general `Z.AI` provider - preview, free, and non-approved variants are excluded by default ### Evaluation lane diff --git a/docs/ai-guardrails/adr/005-scripted-scenario-replays.md b/docs/ai-guardrails/adr/005-scripted-scenario-replays.md new file mode 100644 index 000000000000..e718c1b58e2c --- /dev/null +++ b/docs/ai-guardrails/adr/005-scripted-scenario-replays.md @@ -0,0 +1,45 @@ +# ADR 005: Scripted Scenario Replays + +- Status: Accepted +- Date: 2026-04-03 + +## Context + +`#5` and `#6` added guarded commands, subagents, and provider lanes. The existing scenario suite proved config and plugin slices, but it still left an MVP gap: the guarded workflows were defined in files yet not replayed end to end. + +Epic `#130` from the source Claude harness makes the requirement explicit: implemented behavior is not complete until it is shown to fire in the real runtime path. + +The local runtime also needs a way to grow future release-sensitive checks without creating a deep OpenCode fork or relying on live third-party APIs inside tests. 
+ +## Decision + +Adopt scripted scenario replays for guardrail workflow coverage: + +- run scenario tests under `packages/opencode/test/scenario/` +- boot the packaged guardrail profile through the real config, command, agent, plugin, and session layers +- replace network LLM calls with a deterministic fake LLM server +- script expected model replies as replay steps so guarded workflows can be re-run exactly +- assert on runtime artifacts that matter to MVP claims: session messages, task tool output, provider routing, and guardrail state/log files + +The replay layer is intentionally small. It is not a second runtime. It is a deterministic driver for the existing runtime path. + +## Consequences + +### Positive + +- workflow commands are proven through the same session path users invoke +- provider-lane regressions can be caught without hitting live vendor APIs +- future issues can add replays for release gates, review freshness, or share/server restrictions without forking core runtime behavior + +### Negative + +- replay scripts must stay aligned with upstream session semantics +- fake LLM responses prove routing and workflow mechanics, not model quality + +## Evidence + +- OpenCode commands: https://opencode.ai/docs/commands +- OpenCode plugins: https://opencode.ai/docs/plugins +- OpenCode config: https://opencode.ai/docs/config +- Claude Code hooks guide: https://docs.anthropic.com/en/docs/claude-code/hooks-guide +- Anthropic skill guide PDF: https://resources.anthropic.com/hubfs/The-Complete-Guide-to-Building-Skill-for-Claude.pdf diff --git a/docs/ai-guardrails/adr/006-plugin-hardening-floor.md b/docs/ai-guardrails/adr/006-plugin-hardening-floor.md new file mode 100644 index 000000000000..906ce0750231 --- /dev/null +++ b/docs/ai-guardrails/adr/006-plugin-hardening-floor.md @@ -0,0 +1,70 @@ +# ADR 006: Plugin Hardening Floor For MVP + +## Status + +Accepted + +## Context + +Issue `#13` exists because the first plugin MVP proved the OpenCode hook surface, but 
it did not yet migrate enough of the high-value fast-feedback guardrails to support an MVP claim. + +The source philosophy from epic `#130`, the source README, the harness-engineering references, and the Claude skill guide all point to the same rule: + +- mechanism before prose +- fastest reliable feedback layer first +- "implemented" is not "working" without runtime proof + +That means the next plugin wave should add only the local-runtime behaviors that materially strengthen the guarded workflows now, while refusing to quietly absorb later operational hardening. + +## Decision + +The MVP floor for plugin hardening in this repo is: + +1. protect runtime-owned and policy-protected files from local mutation +2. block obvious version-baseline regressions before edit or write completion +3. track source-read budget and block further source edits once the budget is exceeded +4. record fact-check freshness and review freshness as local runtime state +5. inject that state into `/review`, `/ship`, `/handoff`, and compaction carry-over +6. scenario-test the above behavior in `packages/opencode/test/scenario/guardrails.test.ts` + +This floor is implemented in `packages/guardrails/profile/plugins/guardrail.ts`. 
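The read-budget rule in the floor (item 3) can be sketched as a small stateful tracker. The limit of four unique source reads mirrors the threshold used by the packaged plugin; the function and method names here are illustrative, not the plugin's API:

```typescript
// Sketch of the source-read budget: unique source files read are tracked,
// and edits are refused once the budget is exhausted.
function makeBudget(limit = 4) {
  const seen = new Set<string>()
  return {
    read(file: string) {
      // Re-reading the same file does not consume additional budget.
      seen.add(file)
    },
    canEdit(): boolean {
      return seen.size < limit
    },
  }
}
```

Keeping the budget in explicit state (rather than prompt text) is what lets the gate survive compaction and hand-offs.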
+ +## Included In MVP Floor + +These behaviors are part of the MVP claim: + +- provider lane enforcement remains declarative and independent from plugin hardening +- protected runtime/config mutation is blocked at `tool.execute.before` +- version downgrade and `:latest` pin regressions are blocked at `tool.execute.before` +- source-read budget is tracked in plugin state and blocks further source edits once exceeded +- successful `read`, `webfetch`, Context7, selected CLI checks, `edit`, `write`, and `task` completion update local guardrail state +- `/review`, `/ship`, `/handoff`, and session compaction consume that state so guarded workflows can report stale or missing checks explicitly + +## Explicit Deferrals + +These items are intentionally not part of the MVP floor: + +- authoritative merge/review freshness enforcement in GitHub or CI +- post-merge and deployment verification +- Claude-specific local hook deployment integrity +- broader structural reminders that need more repository-specific tuning +- a separate `post-lint-format` plugin clone when OpenCode already formats on `edit` and `write` +- stronger fact-check-before-edit or GitHub-write blocking until the workflow and source-of-truth state are better defined + +Those items belong to later maturity work such as `#14`, not to the MVP floor. 
+ +## Consequences + +- the thin distribution stays upstream-friendly because enforcement remains in the packaged profile/plugin layer +- the guarded workflows now have file-backed state rather than prompt-only expectations +- the repo has a written boundary for what `#13` must do now versus what later issues should carry + +## Sources + +- `docs/ai-guardrails/README.md` +- `docs/ai-guardrails/mvp-readiness.md` +- `docs/ai-guardrails/migration/claude-code-skills-inventory.md` +- `claude-code-skills` README +- `claude-code-skills` epic `#130` +- `claude-code-skills/docs/references/harness-engineering-best-practices-2026.md` +- Anthropic `The Complete Guide to Building Skills for Claude` diff --git a/packages/guardrails/README.md b/packages/guardrails/README.md index 6f0f4139e6fc..e5c475a262bd 100644 --- a/packages/guardrails/README.md +++ b/packages/guardrails/README.md @@ -47,7 +47,7 @@ Current contents focus on the first thin-distribution slice: - packaged custom config dir profile - packaged plugin for runtime guardrail hooks - guarded `implement` and `review` agents plus packaged `/implement`, `/review`, `/ship`, and `/handoff` workflow commands -- declarative provider admission policy for `zai`, `openai`, and the isolated OpenRouter evaluation lane +- declarative provider admission policy for `zai`, `zai-coding-plan`, `openai`, and the isolated OpenRouter evaluation lane - scenario coverage for managed config precedence, project-local asset compatibility, plugin behavior, and workflow safety defaults Planned next slices are tracked in the fork: @@ -73,7 +73,7 @@ It respects an existing `OPENCODE_CONFIG_DIR` so project- or environment-specifi The packaged profile defaults to the `implement` agent. Review and release-readiness work should run through the packaged `/review`, `/ship`, and `/handoff` commands so the workflow stays read-only at the gate layer. -Provider admission is also packaged here. 
Standard confidential-code work is admitted on the `zai` and `openai` lane. OpenRouter-backed candidates are available only through the dedicated `provider-eval` lane so evaluation traffic does not silently become the default implementation path. +Provider admission is also packaged here. Standard confidential-code work is admitted on the `zai`, `zai-coding-plan`, and `openai` lane. `zai-coding-plan` is kept as a separate provider because Z.AI's official OpenCode guide tells Coding Plan subscribers to select `Z.AI Coding Plan` explicitly. OpenRouter-backed candidates are available only through the dedicated `provider-eval` lane so evaluation traffic does not silently become the default implementation path. ## Managed deployment diff --git a/packages/guardrails/bin/opencode-guardrails b/packages/guardrails/bin/opencode-guardrails index 9293865e887f..215fa26c5612 100755 --- a/packages/guardrails/bin/opencode-guardrails +++ b/packages/guardrails/bin/opencode-guardrails @@ -1,8 +1,10 @@ #!/usr/bin/env node -const child = require("child_process") -const fs = require("fs") -const path = require("path") +import { spawnSync } from "node:child_process" +import fs from "node:fs" +import path from "node:path" +import { fileURLToPath } from "node:url" +import { parseEnv } from "node:util" function fail(msg) { console.error(msg) @@ -10,7 +12,7 @@ function fail(msg) { } function bin() { - const file = require.resolve("opencode/package.json") + const file = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..", "..", "opencode", "package.json") const root = path.dirname(file) const json = JSON.parse(fs.readFileSync(file, "utf8")) const rel = typeof json.bin === "string" ? 
json.bin : json.bin?.opencode @@ -18,10 +20,31 @@ function bin() { return path.resolve(root, rel) } -const dir = path.resolve(__dirname, "..", "profile") +function env(dir) { + let cur = dir + for (;;) { + const file = path.join(cur, ".env") + if (fs.existsSync(file)) return file + const parent = path.dirname(cur) + if (parent === cur) return + cur = parent + } +} + +function load(dir) { + const file = env(dir) + if (!file) return + const data = parseEnv(fs.readFileSync(file, "utf8").replace(/^\s*export\s+/gm, "")) + for (const [key, val] of Object.entries(data)) { + process.env[key] ??= val + } +} + +const dir = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..", "profile") +load(process.cwd()) process.env.OPENCODE_CONFIG_DIR ||= dir -const out = child.spawnSync(bin(), process.argv.slice(2), { +const out = spawnSync(bin(), process.argv.slice(2), { stdio: "inherit", env: process.env, }) diff --git a/packages/guardrails/managed/opencode.json b/packages/guardrails/managed/opencode.json index 73010d028600..5806b2bef714 100644 --- a/packages/guardrails/managed/opencode.json +++ b/packages/guardrails/managed/opencode.json @@ -2,6 +2,7 @@ "$schema": "https://opencode.ai/config.json", "enabled_providers": [ "zai", + "zai-coding-plan", "openai", "openrouter" ], @@ -18,8 +19,32 @@ "glm-4.5-air" ] }, + "zai-coding-plan": { + "whitelist": [ + "glm-4.5", + "glm-4.5-air", + "glm-4.5-flash", + "glm-4.5v", + "glm-4.6", + "glm-4.6v", + "glm-4.7", + "glm-4.7-flash", + "glm-4.7-flashx", + "glm-5", + "glm-5-turbo", + "glm-5.1" + ] + }, "openai": { "whitelist": [ + "gpt-5.4", + "gpt-5.4-mini", + "gpt-5.3-codex", + "gpt-5.2", + "gpt-5.2-codex", + "gpt-5.1-codex", + "gpt-5.1-codex-mini", + "gpt-5.1-codex-max", "gpt-5", "gpt-5-mini", "gpt-5-nano", @@ -28,10 +53,22 @@ }, "openrouter": { "whitelist": [ - "openai/gpt-5", - "openai/gpt-5-mini", + "anthropic/claude-haiku-4.5", + "anthropic/claude-opus-4.5", + "anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.5", - 
"google/gemini-2.5-pro" + "anthropic/claude-sonnet-4.6", + "google/gemini-2.5-flash", + "google/gemini-2.5-pro", + "minimax/minimax-m2.1", + "minimax/minimax-m2.5", + "moonshotai/kimi-k2.5", + "openai/gpt-5.2", + "openai/gpt-5.2-codex", + "openai/gpt-5.3-codex", + "openai/gpt-5.4", + "openai/gpt-5.4-mini", + "qwen/qwen3-coder" ] } }, diff --git a/packages/guardrails/profile/agents/provider-eval.md b/packages/guardrails/profile/agents/provider-eval.md index c117300e3480..5ac1d02f8e54 100644 --- a/packages/guardrails/profile/agents/provider-eval.md +++ b/packages/guardrails/profile/agents/provider-eval.md @@ -1,7 +1,7 @@ --- description: Evaluate admitted OpenRouter-backed candidates without widening the default confidential-code lane. mode: subagent -model: openrouter/openai/gpt-5-mini +model: openrouter/openai/gpt-5.4-mini permission: "*": deny read: allow diff --git a/packages/guardrails/profile/opencode.json b/packages/guardrails/profile/opencode.json index 73bbbbf3b596..6cf165d34607 100644 --- a/packages/guardrails/profile/opencode.json +++ b/packages/guardrails/profile/opencode.json @@ -3,6 +3,7 @@ "default_agent": "implement", "enabled_providers": [ "zai", + "zai-coding-plan", "openai", "openrouter" ], @@ -19,8 +20,32 @@ "glm-4.5-air" ] }, + "zai-coding-plan": { + "whitelist": [ + "glm-4.5", + "glm-4.5-air", + "glm-4.5-flash", + "glm-4.5v", + "glm-4.6", + "glm-4.6v", + "glm-4.7", + "glm-4.7-flash", + "glm-4.7-flashx", + "glm-5", + "glm-5-turbo", + "glm-5.1" + ] + }, "openai": { "whitelist": [ + "gpt-5.4", + "gpt-5.4-mini", + "gpt-5.3-codex", + "gpt-5.2", + "gpt-5.2-codex", + "gpt-5.1-codex", + "gpt-5.1-codex-mini", + "gpt-5.1-codex-max", "gpt-5", "gpt-5-mini", "gpt-5-nano", @@ -29,10 +54,22 @@ }, "openrouter": { "whitelist": [ - "openai/gpt-5", - "openai/gpt-5-mini", + "anthropic/claude-haiku-4.5", + "anthropic/claude-opus-4.5", + "anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.5", - "google/gemini-2.5-pro" + "anthropic/claude-sonnet-4.6", + 
"google/gemini-2.5-flash", + "google/gemini-2.5-pro", + "minimax/minimax-m2.1", + "minimax/minimax-m2.5", + "moonshotai/kimi-k2.5", + "openai/gpt-5.2", + "openai/gpt-5.2-codex", + "openai/gpt-5.3-codex", + "openai/gpt-5.4", + "openai/gpt-5.4-mini", + "qwen/qwen3-coder" ] } }, diff --git a/packages/guardrails/profile/plugins/guardrail.ts b/packages/guardrails/profile/plugins/guardrail.ts index b295920a0be0..70293919a016 100644 --- a/packages/guardrails/profile/plugins/guardrail.ts +++ b/packages/guardrails/profile/plugins/guardrail.ts @@ -37,6 +37,56 @@ const mut = [ />/, ] +const src = new Set([ + ".ts", + ".tsx", + ".js", + ".jsx", + ".py", + ".go", + ".rs", + ".swift", + ".kt", + ".java", + ".rb", + ".php", + ".vue", + ".svelte", + ".css", + ".scss", + ".sql", + ".prisma", + ".graphql", + ".sh", +]) + +const paid: Record<string, Set<string>> = { + "zai-coding-plan": new Set([ + "glm-4.5", + "glm-4.5-air", + "glm-4.5-flash", + "glm-4.5v", + "glm-4.6", + "glm-4.6v", + "glm-4.7", + "glm-4.7-flash", + "glm-4.7-flashx", + "glm-5", + "glm-5-turbo", + "glm-5.1", + ]), + openai: new Set([ + "gpt-5.1-codex", + "gpt-5.1-codex-max", + "gpt-5.1-codex-mini", + "gpt-5.2", + "gpt-5.2-codex", + "gpt-5.3-codex", + "gpt-5.4", + "gpt-5.4-mini", + ]), +} + function norm(file: string) { return path.resolve(file).replaceAll("\\", "/") } @@ -52,6 +102,10 @@ function has(file: string, list: RegExp[]) { return list.some((item) => item.test(file)) } +function ext(file: string) { + return path.extname(file).toLowerCase() +} + function stash(file: string) { return Bun.file(file) .json() @@ -84,7 +138,21 @@ function list(data: unknown) { return Array.isArray(data) ? data.filter((item): item is string => typeof item === "string" && item !== "") : [] } +function num(data: unknown) { + return typeof data === "number" && Number.isFinite(data) ? data : 0 +} + +function flag(data: unknown) { + return data === true +} + +function str(data: unknown) { + return typeof data === "string" ?
data : "" +} + function free(data: { + id?: unknown + providerID?: unknown cost?: { input?: number output?: number @@ -95,19 +163,40 @@ const outCost = data.cost?.output ?? 0 const readCost = data.cost?.cache?.read ?? 0 const writeCost = data.cost?.cache?.write ?? 0 - return inCost === 0 && outCost === 0 && readCost === 0 && writeCost === 0 + if (!(inCost === 0 && outCost === 0 && readCost === 0 && writeCost === 0)) return false + const ids = paid[str(data.providerID)] + return !(ids && ids.has(str(data.id))) } function preview(data: { id?: unknown status?: unknown }) { - const id = typeof data.id === "string" ? data.id : "" - const status = typeof data.status === "string" ? data.status : "" + const id = str(data.id) + const status = str(data.status) if (status && status !== "active") return true return /(preview|alpha|beta|exp|experimental|:free\b|\bfree\b)/i.test(id) } +function vers(text: string) { + return [...text.matchAll(/\bv?\d+\.\d+\.\d+\b/g)].map((item) => item[0]).slice(0, 8) +} + +function semver(text: string) { + const hit = text.match(/^v?(\d+)\.(\d+)\.(\d+)$/) + if (!hit) return + return hit.slice(1).map((item) => Number(item)) +} + +function cmp(left: string, right: string) { + const a = semver(left) + const b = semver(right) + if (!a || !b) return 0 + if (a[0] !== b[0]) return a[0] - b[0] + if (a[1] !== b[1]) return a[1] - b[1] + return a[2] - b[2] +} + export default async function guardrail(input: { directory: string worktree: string @@ -136,8 +225,8 @@ export default async function guardrail(input: { } function note(props: Record<string, unknown> | undefined) { return { - sessionID: typeof props?.sessionID === "string" ? props.sessionID : undefined, - permission: typeof props?.permission === "string" ? props.permission : undefined, + sessionID: str(props?.sessionID) || undefined, + permission: str(props?.permission) || undefined, patterns: Array.isArray(props?.patterns) ?
props.patterns : undefined, } } @@ -146,6 +235,66 @@ return rel(input.worktree, file).startsWith(".opencode/guardrails/") } + function code(file: string) { + const item = rel(input.worktree, file) + if (hidden(file)) return false + if (item === "AGENTS.md") return false + if (item.startsWith(".claude/")) return false + if (item.startsWith(".opencode/")) return false + if (item.startsWith("docs/")) return false + if (item.includes("/docs/")) return false + if (item.startsWith("node_modules/")) return false + if (item.includes("/node_modules/")) return false + if (item.startsWith("tmp/")) return false + if (item.includes("/tmp/")) return false + return src.has(ext(item)) + } + + function fact(file: string) { + const item = rel(input.worktree, file) + if (hidden(file)) return false + if (code(file)) return true + if (/(^|\/)(README|AGENTS)\.md$/i.test(item)) return true + if (item.startsWith("docs/") || item.includes("/docs/")) return true + if (item.startsWith("hooks/") || item.includes("/hooks/")) return true + if (item.startsWith("scripts/") || item.includes("/scripts/")) return true + if (item.startsWith("src/") || item.includes("/src/")) return true + return [".md", ".mdx", ".json", ".yaml", ".yml", ".toml"].includes(ext(item)) + } + + function stale(data: Record<string, unknown>, key: "edit_count_since_check" | "edits_since_review") { + return num(data[key]) > 0 + } + + function factLine(data: Record<string, unknown>) { + if (!flag(data.factchecked)) return "missing" + const source = str(data.factcheck_source) || "unknown" + const at = str(data.factcheck_at) || "unknown" + if (!stale(data, "edit_count_since_check")) return `fresh via ${source} at ${at}` + return `stale after ${num(data.edit_count_since_check)} edit(s) since ${source} at ${at}` + } + + function reviewLine(data: Record<string, unknown>) { + if (!flag(data.reviewed)) return "missing" + const at = str(data.review_at) || "unknown" + if (!stale(data, "edits_since_review")) return `fresh at ${at}` +
return `stale after ${num(data.edits_since_review)} edit(s) since ${at}` + } + + function compact(data: Record<string, unknown>) { + const block = str(data.last_block) || "none" + const reason = str(data.last_reason) + return [ + "Guardrail runtime state:", + `- unique source reads: ${num(data.read_count)}`, + `- edit/write count: ${num(data.edit_count)}`, + `- fact-check: ${factLine(data)}`, + `- review state: ${reviewLine(data)}`, + `- last block: ${block}${reason ? ` (${reason})` : ""}`, + "Treat missing or stale fact-check/review state as an explicit gate.", + ].join("\n") + } + function deny(file: string, kind: "read" | "edit") { const item = rel(input.worktree, file) if (kind === "read" && has(item, sec)) return "secret material is outside the allowed read surface" @@ -153,6 +302,36 @@ if (kind === "edit" && has(item, cfg)) return "linter or formatter configuration is policy-protected" } + function baseline(old: string, next: string) { + if (/:latest\b/i.test(old) && vers(next).length > 0) { + return ":latest pin requires ADR-backed compatibility verification" + } + const left = vers(old) + const right = vers(next) + if (!left.length || !right.length) return + if (left.length !== right.length || left.length > 3) return + for (let i = 0; i < left.length; i++) { + if (cmp(right[i], left[i]) < 0) return `version baseline regression ${left[i]} -> ${right[i]}` + } + } + + async function version(args: Record<string, unknown>) { + const file = pick(args) + if (!file || hidden(file)) return + if (typeof args.oldString === "string" && typeof args.newString === "string") { + return baseline(args.oldString, args.newString) + } + if (typeof args.content !== "string") return + const prev = await Bun.file(file).text().catch(() => "") + if (!prev) return + return baseline(prev, args.content) + } + + async function budget() { + const data = await stash(state) + return num(data.read_count) + } + function gate(data: { agent?: string model?: { @@ -166,8 +345,8 @@
export default async function guardrail(input: { } } }) { - const provider = typeof data.model?.providerID === "string" ? data.model.providerID : "" - const agent = typeof data.agent === "string" ? data.agent : "" + const provider = str(data.model?.providerID) + const agent = str(data.agent) if (!provider) return if (evals.has(provider) && agent !== evalAgent) { @@ -178,7 +357,7 @@ export default async function guardrail(input: { } const ids = allow[provider] - const model = typeof data.model?.id === "string" ? data.model.id : "" + const model = str(data.model?.id) if (ids?.size && model && !ids.has(model)) { return `${provider}/${model} is not admitted by provider policy` } @@ -207,6 +386,19 @@ export default async function guardrail(input: { await mark({ last_session: event.properties?.sessionID, last_event: event.type, + read_files: [], + read_count: 0, + edited_files: [], + edit_count: 0, + factchecked: false, + factcheck_source: "", + factcheck_at: "", + edit_count_since_check: 0, + reviewed: false, + review_at: "", + edits_since_review: 0, + last_block: "", + last_reason: "", }) } if (event.type === "permission.asked") { @@ -230,9 +422,25 @@ export default async function guardrail(input: { const file = pick(out.args ?? item.args) if (file && (item.tool === "read" || item.tool === "edit" || item.tool === "write")) { const err = deny(file, item.tool === "read" ? "read" : "edit") - if (!err) return - await mark({ last_block: item.tool, last_file: rel(input.worktree, file), last_reason: err }) - throw new Error(text(err)) + if (err) { + await mark({ last_block: item.tool, last_file: rel(input.worktree, file), last_reason: err }) + throw new Error(text(err)) + } + } + if (item.tool === "edit" || item.tool === "write") { + const err = await version(out.args ?? {}) + if (err) { + await mark({ last_block: item.tool, last_file: file ? 
rel(input.worktree, file) : "", last_reason: err }) + throw new Error(text(err)) + } + } + if ((item.tool === "edit" || item.tool === "write") && file && code(file)) { + const count = await budget() + if (count >= 4) { + const err = `context budget exceeded after ${count} source reads; narrow scope or delegate before editing` + await mark({ last_block: item.tool, last_file: rel(input.worktree, file), last_reason: err }) + throw new Error(text(err)) + } } if (item.tool === "bash") { const cmd = typeof out.args?.command === "string" ? out.args.command : "" @@ -248,6 +456,95 @@ throw new Error(text("protected runtime or config mutation")) } }, + "tool.execute.after": async ( + item: { tool: string; args?: Record<string, unknown> }, + _out: { title: string; output: string; metadata: Record<string, unknown> }, + ) => { + const now = new Date().toISOString() + const file = pick(item.args) + const data = await stash(state) + + if (item.tool === "read" && file) { + if (code(file)) { + const seen = list(data.read_files) + const next = seen.includes(rel(input.worktree, file)) ? seen : [...seen, rel(input.worktree, file)] + await mark({ + read_files: next, + read_count: next.length, + last_read: rel(input.worktree, file), + }) + } + if (fact(file)) { + await mark({ + factchecked: true, + factcheck_source: "DocRead", + factcheck_at: now, + edit_count_since_check: 0, + }) + } + } + + if (item.tool === "webfetch" || item.tool.startsWith("mcp__context7__")) { + await mark({ + factchecked: true, + factcheck_source: item.tool === "webfetch" ? "WebFetch" : "Context7", + factcheck_at: now, + edit_count_since_check: 0, + }) + } + + if (item.tool === "bash") { + const cmd = typeof item.args?.command === "string" ?
item.args.command : "" + if (/(^|&&|\|\||;)\s*(gcloud|kubectl|aws)\s+/i.test(cmd)) { + await mark({ + factchecked: true, + factcheck_source: "CLI", + factcheck_at: now, + edit_count_since_check: 0, + }) + } + } + + if ((item.tool === "edit" || item.tool === "write") && file) { + const seen = list(data.edited_files) + const next = seen.includes(rel(input.worktree, file)) ? seen : [...seen, rel(input.worktree, file)] + await mark({ + edited_files: next, + edit_count: num(data.edit_count) + 1, + edit_count_since_check: num(data.edit_count_since_check) + 1, + edits_since_review: num(data.edits_since_review) + 1, + last_edit: rel(input.worktree, file), + }) + } + + if (item.tool === "task") { + const cmd = typeof item.args?.command === "string" ? item.args.command : "" + const agent = typeof item.args?.subagent_type === "string" ? item.args.subagent_type : "" + if (cmd === "review" || agent.includes("review")) { + await mark({ + reviewed: true, + review_at: now, + review_agent: agent, + edits_since_review: 0, + }) + } + } + }, + "command.execute.before": async ( + item: { command: string; sessionID: string; arguments: string }, + out: { + parts: { + type?: string + prompt?: string + }[] + }, + ) => { + if (!["review", "ship", "handoff"].includes(item.command)) return + const data = await stash(state) + const part = out.parts.find((item) => item.type === "subtask" && typeof item.prompt === "string") + if (!part?.prompt) return + part.prompt = `${part.prompt}\n\n${compact(data)}` + }, "shell.env": async (_item: { cwd: string }, out: { env: Record<string, string> }) => { out.env.OPENCODE_GUARDRAIL_MODE = mode out.env.OPENCODE_GUARDRAIL_ROOT = root @@ -295,8 +592,12 @@ [ `Guardrail mode: ${mode}.`, `Preserve policy state from ${rel(input.worktree, state)} when handing work to the next agent.`, - `Last guardrail event: ${typeof data.last_event === "string" ?
data.last_event : "none"}.`, - `Last guardrail block: ${typeof data.last_block === "string" ? data.last_block : "none"}.`, + `Last guardrail event: ${str(data.last_event) || "none"}.`, + `Last guardrail block: ${str(data.last_block) || "none"}.`, + `Unique source reads: ${num(data.read_count)}.`, + `Edit/write count: ${num(data.edit_count)}.`, + `Fact-check state: ${factLine(data)}.`, + `Review state: ${reviewLine(data)}.`, ].join(" "), ) }, diff --git a/packages/opencode/bin/opencode b/packages/opencode/bin/opencode index a7674ce2f875..2526b0f76a05 100755 --- a/packages/opencode/bin/opencode +++ b/packages/opencode/bin/opencode @@ -1,9 +1,10 @@ #!/usr/bin/env node -const childProcess = require("child_process") -const fs = require("fs") -const path = require("path") -const os = require("os") +import * as childProcess from "node:child_process" +import * as fs from "node:fs" +import * as path from "node:path" +import * as os from "node:os" +import { fileURLToPath } from "node:url" function run(target) { const result = childProcess.spawnSync(target, process.argv.slice(2), { @@ -17,12 +18,29 @@ function run(target) { process.exit(code) } +function bun(dir) { + const result = childProcess.spawnSync( + "bun", + ["run", "--conditions=browser", "./src/index.ts", ...process.argv.slice(2)], + { + stdio: "inherit", + cwd: dir, + }, + ) + if (result.error) { + console.error(result.error.message) + process.exit(1) + } + const code = typeof result.status === "number" ? 
result.status : 0 + process.exit(code) +} + const envPath = process.env.OPENCODE_BIN_PATH if (envPath) { run(envPath) } -const scriptPath = fs.realpathSync(__filename) +const scriptPath = fs.realpathSync(fileURLToPath(import.meta.url)) const scriptDir = path.dirname(scriptPath) // @@ -168,12 +186,7 @@ function findBinary(startDir) { const resolved = findBinary(scriptDir) if (!resolved) { - console.error( - "It seems that your package manager failed to install the right version of the opencode CLI for your platform. You can try manually installing " + - names.map((n) => `\"${n}\"`).join(" or ") + - " package", - ) - process.exit(1) + bun(path.resolve(scriptDir, "..")) } run(resolved) diff --git a/packages/opencode/test/scenario/guardrails.test.ts b/packages/opencode/test/scenario/guardrails.test.ts index a789a5f95d7d..edbdaeca69fc 100644 --- a/packages/opencode/test/scenario/guardrails.test.ts +++ b/packages/opencode/test/scenario/guardrails.test.ts @@ -1,7 +1,9 @@ import { afterEach, expect, test } from "bun:test" import fs from "fs/promises" import path from "path" +import { Effect } from "effect" import { Agent } from "../../src/agent/agent" +import { Auth } from "../../src/auth" import { Command } from "../../src/command" import { Config } from "../../src/config/config" import { Env } from "../../src/env" @@ -13,6 +15,8 @@ import { ProviderID } from "../../src/provider/schema" import { Skill } from "../../src/skill" import { Filesystem } from "../../src/util/filesystem" import { tmpdir } from "../fixture/fixture" +import { assertReplay, it, run } from "./harness" +import { replays } from "./replay" const managed = process.env.OPENCODE_TEST_MANAGED_CONFIG_DIR! 
const profile = path.resolve(import.meta.dir, "../../../guardrails/profile") @@ -289,24 +293,85 @@ test("guardrail profile enforces provider admission lanes", async () => { const evalAgent = await Agent.get("provider-eval") const openrouter = providers[ProviderID.openrouter] const zai = providers[ProviderID.make("zai")] + const plan = providers[ProviderID.make("zai-coding-plan")] const openai = providers[ProviderID.openai] + const zaiModels = Object.keys(zai.models).sort() + const planModels = Object.keys(plan.models).sort() + const openaiModels = Object.keys(openai.models).sort() - expect(cfg.enabled_providers).toEqual(["zai", "openai", "openrouter"]) + expect(cfg.enabled_providers).toEqual(["zai", "zai-coding-plan", "openai", "openrouter"]) expect(zai).toBeDefined() + expect(plan).toBeDefined() expect(openai).toBeDefined() expect(openrouter).toBeDefined() - expect(Object.keys(zai.models).sort()).toEqual(["glm-4.5", "glm-4.5-air", "glm-5"]) - expect(Object.keys(openai.models).sort()).toEqual(["gpt-4.1", "gpt-5", "gpt-5-mini", "gpt-5-nano"]) + expect(zaiModels).toEqual(["glm-4.5", "glm-4.5-air", "glm-5"]) + for (const item of [ + "glm-4.5", + "glm-4.5-air", + "glm-4.5-flash", + "glm-4.5v", + "glm-4.6", + "glm-4.6v", + "glm-4.7", + "glm-4.7-flash", + "glm-4.7-flashx", + "glm-5", + "glm-5-turbo", + "glm-5.1", + ]) { + expect(planModels).toContain(item) + } + expect(openaiModels).toEqual( + expect.arrayContaining([ + "gpt-4.1", + "gpt-5", + "gpt-5-mini", + "gpt-5-nano", + "gpt-5.1-codex", + "gpt-5.1-codex-max", + "gpt-5.1-codex-mini", + "gpt-5.2", + "gpt-5.2-codex", + "gpt-5.3-codex", + "gpt-5.4", + ]), + ) expect(Object.keys(openrouter.models).sort()).toEqual([ + "anthropic/claude-haiku-4.5", + "anthropic/claude-opus-4.5", + "anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.5", + "anthropic/claude-sonnet-4.6", + "google/gemini-2.5-flash", "google/gemini-2.5-pro", - "openai/gpt-5", - "openai/gpt-5-mini", + "minimax/minimax-m2.1", + "minimax/minimax-m2.5", + 
"moonshotai/kimi-k2.5", + "openai/gpt-5.2", + "openai/gpt-5.2-codex", + "openai/gpt-5.3-codex", + "openai/gpt-5.4", + "openai/gpt-5.4-mini", + "qwen/qwen3-coder", ]) + expect(plan.models["glm-5.1"]?.api.id).toBe("glm-5.1") + expect(plan.models["glm-5.1"]?.api.url).toBe("https://api.z.ai/api/coding/paas/v4") expect(evalAgent?.mode).toBe("subagent") expect(cmds.some((item) => item.name === "provider-eval" && item.agent === "provider-eval")).toBe(true) - const evalModel = openrouter.models["openai/gpt-5-mini"] + await expect( + Plugin.trigger( + "chat.params", + { + sessionID: "session_test", + agent: "implement", + model: plan.models["glm-5.1"], + }, + { temperature: undefined, topP: undefined, topK: undefined, options: {} }, + ), + ).resolves.toEqual({ temperature: undefined, topP: undefined, topK: undefined, options: {} }) + + const evalModel = openrouter.models["openai/gpt-5.4-mini"] await expect( Plugin.trigger( @@ -340,17 +405,14 @@ test("guardrail profile enforces provider admission lanes", async () => { agent: "provider-eval", model: { ...evalModel, - id: "deepseek/deepseek-r1:free" as typeof evalModel.id, - cost: { - ...evalModel.cost, - input: 0, - output: 0, - cache: { - read: 0, - write: 0, + id: "google/gemini-3-pro-preview" as typeof evalModel.id, + cost: { + ...evalModel.cost, + input: 0.1, + output: 0.2, + cache: { read: 0, write: 0 }, }, }, - }, }, { temperature: undefined, topP: undefined, topK: undefined, options: {} }, ), @@ -370,7 +432,7 @@ test("guardrail profile enforces provider admission lanes", async () => { }, }) }) -}) +}, 15000) test("guardrail profile plugin injects shell env and blocks protected files", async () => { await withProfile(async () => { @@ -413,6 +475,247 @@ test("guardrail profile plugin injects shell env and blocks protected files", as }) }) +test("guardrail profile keeps OpenAI OAuth Codex models visible", async () => { + await withProfile(async () => { + await using tmp = await tmpdir({ + git: true, + init: async (dir) => 
{ + await write(dir, "opencode.json", { + $schema: "https://opencode.ai/config.json", + share: "auto", + }) + }, + }) + + await Instance.provide({ + directory: tmp.path, + init: async () => { + await Auth.set( + "openai", + new Auth.Oauth({ + type: "oauth", + access: "test-openai-access", + refresh: "test-openai-refresh", + expires: Date.now() + 60_000, + }), + ) + }, + fn: async () => { + const providers = await Provider.list() + const openai = providers[ProviderID.openai] + const models = Object.keys(openai.models).sort() + + expect(openai).toBeDefined() + expect(models).toEqual( + expect.arrayContaining([ + "gpt-5.1-codex", + "gpt-5.1-codex-max", + "gpt-5.1-codex-mini", + "gpt-5.2", + "gpt-5.2-codex", + "gpt-5.3-codex", + "gpt-5.4", + ]), + ) + expect(openai.models["gpt-5.4"]?.cost.input).toBe(0) + await expect( + Plugin.trigger( + "chat.params", + { + sessionID: "session_test", + agent: "implement", + model: openai.models["gpt-5.4"], + }, + { temperature: undefined, topP: undefined, topK: undefined, options: {} }, + ), + ).resolves.toEqual({ temperature: undefined, topP: undefined, topK: undefined, options: {} }) + }, + }) + }) +}) + +test("guardrail profile plugin enforces version baselines and context budget", async () => { + await withProfile(async () => { + await using tmp = await tmpdir({ + git: true, + init: async (dir) => { + await fs.mkdir(path.join(dir, "src"), { recursive: true }) + await Bun.write(path.join(dir, "package.json"), JSON.stringify({ version: "1.2.3" }, null, 2)) + await Bun.write(path.join(dir, "Dockerfile"), "FROM app:latest\n") + await Bun.write(path.join(dir, "src", "a.ts"), "export const a = 1\n") + await Bun.write(path.join(dir, "src", "b.ts"), "export const b = 1\n") + await Bun.write(path.join(dir, "src", "c.ts"), "export const c = 1\n") + await Bun.write(path.join(dir, "src", "d.ts"), "export const d = 1\n") + await Bun.write(path.join(dir, "src", "e.ts"), "export const e = 1\n") + }, + }) + const files = guard(tmp.path) + + 
await Instance.provide({ + directory: tmp.path, + fn: async () => { + const hook = (await Plugin.list()).find((item) => typeof item.event === "function") + await hook?.event?.({ + event: { + type: "session.created", + properties: { + sessionID: "session_test", + }, + }, + } as any) + + await expect( + Plugin.trigger( + "tool.execute.before", + { tool: "edit", sessionID: "session_test", callID: "call_ver" }, + { + args: { + filePath: path.join(tmp.path, "package.json"), + oldString: `"version": "1.2.3"`, + newString: `"version": "1.1.9"`, + }, + }, + ), + ).rejects.toThrow("version baseline regression") + + await expect( + Plugin.trigger( + "tool.execute.before", + { tool: "edit", sessionID: "session_test", callID: "call_latest" }, + { + args: { + filePath: path.join(tmp.path, "Dockerfile"), + oldString: "FROM app:latest", + newString: "FROM app:v1.2.3", + }, + }, + ), + ).rejects.toThrow("ADR-backed compatibility verification") + + for (const file of ["src/a.ts", "src/b.ts", "src/c.ts", "src/d.ts"]) { + await Plugin.trigger( + "tool.execute.after", + { tool: "read", sessionID: "session_test", callID: file, args: { filePath: path.join(tmp.path, file) } }, + { title: "read", output: "", metadata: {} }, + ) + } + + const state = await Bun.file(files.state).json() + expect(state.read_count).toBe(4) + expect(state.read_files).toEqual(["src/a.ts", "src/b.ts", "src/c.ts", "src/d.ts"]) + + await expect( + Plugin.trigger( + "tool.execute.before", + { tool: "edit", sessionID: "session_test", callID: "call_budget" }, + { + args: { + filePath: path.join(tmp.path, "src", "e.ts"), + oldString: "export const e = 1", + newString: "export const e = 2", + }, + }, + ), + ).rejects.toThrow("context budget exceeded") + }, + }) + }) +}) + +test("guardrail profile plugin records factcheck and review freshness state", async () => { + await withProfile(async () => { + await using tmp = await tmpdir({ + git: true, + init: async (dir) => { + await fs.mkdir(path.join(dir, "docs"), { 
recursive: true }) + await fs.mkdir(path.join(dir, "src"), { recursive: true }) + await Bun.write(path.join(dir, "docs", "plan.md"), "# plan\n") + await Bun.write(path.join(dir, "src", "flow.ts"), "export const flow = 1\n") + }, + }) + const files = guard(tmp.path) + + await Instance.provide({ + directory: tmp.path, + fn: async () => { + const hook = (await Plugin.list()).find((item) => typeof item.event === "function") + await hook?.event?.({ + event: { + type: "session.created", + properties: { + sessionID: "session_test", + }, + }, + } as any) + + await Plugin.trigger( + "tool.execute.after", + { + tool: "read", + sessionID: "session_test", + callID: "call_doc", + args: { filePath: path.join(tmp.path, "docs", "plan.md") }, + }, + { title: "read", output: "", metadata: {} }, + ) + await Plugin.trigger( + "tool.execute.after", + { + tool: "write", + sessionID: "session_test", + callID: "call_write", + args: { filePath: path.join(tmp.path, "src", "flow.ts"), content: "export const flow = 2\n" }, + }, + { title: "write", output: "", metadata: {} }, + ) + await Plugin.trigger( + "tool.execute.after", + { + tool: "task", + sessionID: "session_test", + callID: "call_review", + args: { + command: "review", + subagent_type: "review", + }, + }, + { title: "review", output: "", metadata: {} }, + ) + await Plugin.trigger( + "tool.execute.after", + { + tool: "edit", + sessionID: "session_test", + callID: "call_edit", + args: { + filePath: path.join(tmp.path, "src", "flow.ts"), + oldString: "export const flow = 2", + newString: "export const flow = 3", + }, + }, + { title: "edit", output: "", metadata: {} }, + ) + + const state = await Bun.file(files.state).json() + const compact = await Plugin.trigger( + "experimental.session.compacting", + { sessionID: "session_test" }, + { context: [], prompt: undefined }, + ) + + expect(state.factchecked).toBe(true) + expect(state.factcheck_source).toBe("DocRead") + expect(state.edit_count).toBe(2) + 
expect(state.edit_count_since_check).toBe(2) + expect(state.reviewed).toBe(true) + expect(state.edits_since_review).toBe(1) + expect(compact.context.join("\n")).toContain("Fact-check state: stale after 2 edit(s)") + expect(compact.context.join("\n")).toContain("Review state: stale after 1 edit(s)") + }, + }) + }) +}) + test("guardrail profile plugin records lifecycle events and compaction context", async () => { await withProfile(async () => { await using tmp = await tmpdir({ git: true }) @@ -464,6 +767,9 @@ test("guardrail profile plugin records lifecycle events and compaction context", expect(log).toContain("\"type\":\"permission.asked\"") expect(log).toContain("\"type\":\"session.idle\"") expect(state.last_session).toBe("session_test") + expect(state.read_count).toBe(0) + expect(state.factchecked).toBe(false) + expect(state.reviewed).toBe(false) expect(state.last_permission).toBe("bash") expect(compact.context.join("\n")).toContain("Guardrail mode: enforced.") expect(compact.context.join("\n")).toContain(".opencode/guardrails/state.json") @@ -471,3 +777,13 @@ test("guardrail profile plugin records lifecycle events and compaction context", }) }) }) + +for (const replay of Object.values(replays)) { + it.live(`guardrail replay keeps ${replay.command} executable`, () => + run(replay).pipe( + Effect.map((data) => { + assertReplay(replay, data) + }), + ), + ) +} diff --git a/packages/opencode/test/scenario/harness.ts b/packages/opencode/test/scenario/harness.ts new file mode 100644 index 000000000000..87ddf048eef1 --- /dev/null +++ b/packages/opencode/test/scenario/harness.ts @@ -0,0 +1,287 @@ +import { NodeFileSystem } from "@effect/platform-node" +import { expect } from "bun:test" +import fs from "fs/promises" +import { Effect, Layer } from "effect" +import path from "path" +import { Agent as AgentSvc } from "../../src/agent/agent" +import { Auth } from "../../src/auth" +import { Bus } from "../../src/bus" +import { Command } from "../../src/command" +import { 
Config } from "../../src/config/config" +import { Env } from "../../src/env" +import { FileTime } from "../../src/file/time" +import { AppFileSystem } from "../../src/filesystem" +import { LSP } from "../../src/lsp" +import { MCP } from "../../src/mcp" +import { Permission } from "../../src/permission" +import { Plugin } from "../../src/plugin" +import { Provider as ProviderSvc } from "../../src/provider/provider" +import { Session } from "../../src/session" +import { MessageV2 } from "../../src/session/message-v2" +import { SessionCompaction } from "../../src/session/compaction" +import { Instruction } from "../../src/session/instruction" +import { LLM } from "../../src/session/llm" +import { SessionProcessor } from "../../src/session/processor" +import { SessionPrompt } from "../../src/session/prompt" +import { SessionStatus } from "../../src/session/status" +import { Snapshot } from "../../src/snapshot" +import { ToolRegistry } from "../../src/tool/registry" +import { Truncate } from "../../src/tool/truncate" +import * as CrossSpawnSpawner from "../../src/effect/cross-spawn-spawner" +import { Instance } from "../../src/project/instance" +import { provideTmpdirInstance } from "../fixture/fixture" +import { testEffect } from "../lib/effect" +import { TestLLMServer } from "../lib/llm-server" +import type { Replay } from "./replay" + +const profile = path.resolve(import.meta.dir, "../../../guardrails/profile") + +const mcp = Layer.succeed( + MCP.Service, + MCP.Service.of({ + status: () => Effect.succeed({}), + clients: () => Effect.succeed({}), + tools: () => Effect.succeed({}), + prompts: () => Effect.succeed({}), + resources: () => Effect.succeed({}), + add: () => Effect.succeed({ status: { status: "disabled" as const } }), + connect: () => Effect.void, + disconnect: () => Effect.void, + getPrompt: () => Effect.succeed(undefined), + readResource: () => Effect.succeed(undefined), + startAuth: () => Effect.die("unexpected MCP auth in scenario tests"), + 
authenticate: () => Effect.die("unexpected MCP auth in scenario tests"), + finishAuth: () => Effect.die("unexpected MCP auth in scenario tests"), + removeAuth: () => Effect.void, + supportsOAuth: () => Effect.succeed(false), + hasStoredTokens: () => Effect.succeed(false), + getAuthStatus: () => Effect.succeed("not_authenticated" as const), + }), +) + +const lsp = Layer.succeed( + LSP.Service, + LSP.Service.of({ + init: () => Effect.void, + status: () => Effect.succeed([]), + hasClients: () => Effect.succeed(false), + touchFile: () => Effect.void, + diagnostics: () => Effect.succeed({}), + hover: () => Effect.succeed(undefined), + definition: () => Effect.succeed([]), + references: () => Effect.succeed([]), + implementation: () => Effect.succeed([]), + documentSymbol: () => Effect.succeed([]), + workspaceSymbol: () => Effect.succeed([]), + prepareCallHierarchy: () => Effect.succeed([]), + incomingCalls: () => Effect.succeed([]), + outgoingCalls: () => Effect.succeed([]), + }), +) + +const filetime = Layer.succeed( + FileTime.Service, + FileTime.Service.of({ + read: () => Effect.void, + get: () => Effect.succeed(undefined), + assert: () => Effect.void, + withLock: (_file, fn) => Effect.promise(fn), + }), +) + +const status = SessionStatus.layer.pipe(Layer.provideMerge(Bus.layer)) +const infra = Layer.mergeAll(NodeFileSystem.layer, CrossSpawnSpawner.defaultLayer) + +function make() { + const deps = Layer.mergeAll( + Session.defaultLayer, + Snapshot.defaultLayer, + LLM.defaultLayer, + AgentSvc.defaultLayer, + Command.defaultLayer, + Permission.defaultLayer, + Plugin.defaultLayer, + Config.defaultLayer, + ProviderSvc.defaultLayer, + filetime, + lsp, + mcp, + AppFileSystem.defaultLayer, + status, + ).pipe(Layer.provideMerge(infra)) + const reg = ToolRegistry.layer.pipe(Layer.provideMerge(deps)) + const trunc = Truncate.layer.pipe(Layer.provideMerge(deps)) + const proc = SessionProcessor.layer.pipe(Layer.provideMerge(deps)) + const compact = 
SessionCompaction.layer.pipe(Layer.provideMerge(proc), Layer.provideMerge(deps)) + return Layer.mergeAll( + TestLLMServer.layer, + SessionPrompt.layer.pipe( + Layer.provideMerge(compact), + Layer.provideMerge(proc), + Layer.provideMerge(reg), + Layer.provideMerge(trunc), + Layer.provide(Instruction.defaultLayer), + Layer.provideMerge(deps), + ), + ) +} + +export const it = testEffect(make()) + +function root(dir: string) { + return path.join(dir, ".opencode", "guardrails") +} + +function withProfile<A, E, R>(fx: Effect.Effect<A, E, R>) { + return Effect.acquireUseRelease( + Effect.sync(() => { + const prev = process.env.OPENCODE_CONFIG_DIR + process.env.OPENCODE_CONFIG_DIR = profile + return prev + }), + () => fx, + (prev) => + Effect.sync(() => { + if (prev === undefined) delete process.env.OPENCODE_CONFIG_DIR + else process.env.OPENCODE_CONFIG_DIR = prev + }), + ) +} + +function queue(llm: TestLLMServer["Service"], replay: Replay) { + return Effect.forEach(replay.steps, (step) => { + if (step.kind === "text") return llm.text(step.text) + return llm.tool(step.name, step.input) + }).pipe(Effect.asVoid) +} + +export function run(replay: Replay) { + return withProfile( + provideTmpdirInstance( + (dir) => + Effect.gen(function* () { + const llm = yield* TestLLMServer + const prompt = yield* SessionPrompt.Service + + yield* Effect.promise(() => Auth.remove("openai")) + yield* Effect.promise(() => Auth.remove("openrouter")) + yield* Effect.promise(() => Auth.remove("zai")) + + Env.set("OPENCODE_E2E_LLM_URL", llm.url) + Env.set("ZHIPU_API_KEY", "test-zai-key") + Env.set("OPENAI_API_KEY", "test-openai-key") + Env.set("OPENROUTER_API_KEY", "test-openrouter-key") + + if (replay.state) { + yield* Effect.promise(() => fs.mkdir(root(dir), { recursive: true })) + yield* Effect.promise(() => + Bun.write(path.join(root(dir), "state.json"), JSON.stringify(replay.state, null, 2) + "\n"), + ) + } + + yield* queue(llm, replay) + + const chat = yield* Effect.promise(() => Session.create({ title: 
replay.name })) + const out = yield* prompt.command({ + sessionID: chat.id, + command: replay.command, + arguments: replay.args, + model: replay.model, + }) + const msgs = yield* MessageV2.filterCompactedEffect(chat.id) + const hits = yield* llm.hits + const log = yield* Effect.promise(() => Bun.file(path.join(root(dir), "events.jsonl")).text().catch(() => "")) + const state = yield* Effect.promise<Record<string, unknown>>(() => + Bun.file(path.join(root(dir), "state.json")) + .json() + .catch(() => ({})), + ) + + return { chat, dir, hits, log, msgs, out, state } + }), + { + git: true, + config: { + model: replay.model, + share: "auto", + }, + }, + ), + ) +} + +export function hitModels(hits: { body: Record<string, unknown> }[]) { + return hits + .filter((hit) => !JSON.stringify(hit.body).includes("Generate a title for this conversation")) + .map((hit) => hit.body.model) + .filter((item): item is string => typeof item === "string") +} + +export function userText(msgs: MessageV2.WithParts[]) { + const msg = msgs.find((item) => item.info.role === "user") + const part = msg?.parts.find((item): item is MessageV2.TextPart => item.type === "text") + return part?.text +} + +export function userTask(msgs: MessageV2.WithParts[]) { + const msg = msgs.find((item) => item.info.role === "user") + return msg?.parts.find((item): item is MessageV2.SubtaskPart => item.type === "subtask") +} + +export function toolMsg(msgs: MessageV2.WithParts[], agent: string) { + return msgs.find((item) => item.info.role === "assistant" && item.info.agent === agent) +} + +export function assertReplay( + replay: Replay, + data: { + hits: { body: Record<string, unknown> }[] + log: string + msgs: MessageV2.WithParts[] + out: MessageV2.WithParts + state: Record<string, unknown> + }, +) { + const models = hitModels(data.hits) + expect(models.length).toBeGreaterThan(0) + if (replay.command === "provider-eval") { + expect(models).toContain("gpt-5.4-mini") + expect(models).toContain("openai/gpt-5.4-mini") + } else { + expect(models.every((item) => item === 
replay.models[0])).toBe(true) + } + + if (replay.mode === "primary") { + const text = userText(data.msgs) ?? "" + for (const item of replay.prompt) expect(text).toContain(item) + expect(text).toContain(replay.args) + expect(data.out.parts.some((item) => item.type === "text" && item.text.includes(replay.result))).toBe(true) + return + } + + const task = userTask(data.msgs) + expect(task?.agent).toBe(replay.agent) + for (const item of replay.prompt) expect(task?.prompt ?? "").toContain(item) + for (const item of replay.guard ?? []) expect(task?.prompt ?? "").toContain(item) + + const msg = toolMsg(data.msgs, replay.agent) + expect(msg?.info.role).toBe("assistant") + const part = msg?.parts.find((item): item is MessageV2.ToolPart => item.type === "tool") + expect(part?.state.status).toBe("completed") + if (!part || part.state.status !== "completed") return + expect(part.state.output).toContain("") + expect(part.state.output).toContain(replay.result) + expect(part.state.input.command).toBe(replay.command) + + if (replay.command === "provider-eval") { + expect(part.state.metadata?.model).toEqual({ + providerID: "openrouter", + modelID: "openai/gpt-5.4-mini", + }) + } +} + +export async function clean() { + await Instance.disposeAll() + await Config.invalidate(true) +} diff --git a/packages/opencode/test/scenario/replay.ts b/packages/opencode/test/scenario/replay.ts new file mode 100644 index 000000000000..5896699a80bb --- /dev/null +++ b/packages/opencode/test/scenario/replay.ts @@ -0,0 +1,233 @@ +type Text = { + kind: "text" + text: string +} + +type Tool = { + kind: "tool" + name: string + input: Record<string, unknown> +} + +type Step = Text | Tool + +export type Replay = { + name: string + command: string + args: string + mode: "primary" | "subtask" + agent: string + model: string + models: string[] + prompt: string[] + guard?: string[] + state?: Record<string, unknown> + result: string + steps: Step[] +} + +export const replays = { + implement: { + name: "implement command", + command: "implement", 
+ args: "Keep the change bounded to one file.", + mode: "primary", + agent: "implement", + model: "openai/gpt-5-mini", + models: ["gpt-5-mini"], + prompt: [ + "Implement the requested change under the guardrail profile.", + "keep the scope bounded to the stated goal", + "run the smallest relevant verification before claiming completion", + ], + result: "Implemented the requested change in a bounded way.", + steps: [{ kind: "text", text: "Implemented the requested change in a bounded way." }], + }, + review: { + name: "review command", + command: "review", + args: "Focus on the current diff only.", + mode: "subtask", + agent: "review", + model: "openai/gpt-5-mini", + models: ["gpt-5-mini", "gpt-5-mini"], + prompt: [ + "Review the current work for correctness, regressions, missing tests, and missing workflow gates.", + "Required sections:", + "Findings", + "Recommended next step", + ], + guard: [ + "Guardrail runtime state:", + "unique source reads: 4", + "fact-check: stale after 1 edit(s) since DocRead at 2026-04-03T09:00:00.000Z", + "review state: stale after 1 edit(s) since 2026-04-03T09:10:00.000Z", + ], + state: { + read_count: 4, + edit_count: 2, + factchecked: true, + factcheck_source: "DocRead", + factcheck_at: "2026-04-03T09:00:00.000Z", + edit_count_since_check: 1, + reviewed: true, + review_at: "2026-04-03T09:10:00.000Z", + edits_since_review: 1, + last_block: "edit", + last_reason: "context budget exceeded", + }, + result: "Findings\n- none\nVerification\n- reviewed current diff\nOpen risks\n- none\nRecommended next step\n- ship if checks stay green", + steps: [ + { + kind: "tool", + name: "task", + input: { + description: "review current work", + prompt: "Review the current work for correctness, regressions, missing tests, and missing workflow gates.", + subagent_type: "review", + command: "review", + }, + }, + { + kind: "text", + text: "Findings\n- none\nVerification\n- reviewed current diff\nOpen risks\n- none\nRecommended next step\n- ship if checks 
stay green", + }, + ], + }, + ship: { + name: "ship command", + command: "ship", + args: "Check the current work only.", + mode: "subtask", + agent: "review", + model: "openai/gpt-5-mini", + models: ["gpt-5-mini", "gpt-5-mini"], + prompt: [ + "Run a release-readiness check for the current work.", + "Ready or Not ready", + "Blocking gates", + "Next action", + ], + guard: [ + "Guardrail runtime state:", + "unique source reads: 4", + "fact-check: stale after 1 edit(s) since DocRead at 2026-04-03T09:00:00.000Z", + "review state: stale after 1 edit(s) since 2026-04-03T09:10:00.000Z", + ], + state: { + read_count: 4, + edit_count: 2, + factchecked: true, + factcheck_source: "DocRead", + factcheck_at: "2026-04-03T09:00:00.000Z", + edit_count_since_check: 1, + reviewed: true, + review_at: "2026-04-03T09:10:00.000Z", + edits_since_review: 1, + last_block: "edit", + last_reason: "context budget exceeded", + }, + result: "Ready or Not ready\n- Not ready\nEvidence\n- local scenario replay only\nBlocking gates\n- full package checks not cited here\nNext action\n- run the narrowest required verification", + steps: [ + { + kind: "tool", + name: "task", + input: { + description: "ship current work", + prompt: "Run a release-readiness check for the current work.", + subagent_type: "review", + command: "ship", + }, + }, + { + kind: "text", + text: "Ready or Not ready\n- Not ready\nEvidence\n- local scenario replay only\nBlocking gates\n- full package checks not cited here\nNext action\n- run the narrowest required verification", + }, + ], + }, + handoff: { + name: "handoff command", + command: "handoff", + args: "Summarize the current work.", + mode: "subtask", + agent: "review", + model: "openai/gpt-5-mini", + models: ["gpt-5-mini", "gpt-5-mini"], + prompt: [ + "Prepare a handoff for the current work.", + "Goal", + "Guardrail state", + "Next steps", + ], + guard: [ + "Guardrail runtime state:", + "unique source reads: 4", + "fact-check: stale after 1 edit(s) since DocRead at 
2026-04-03T09:00:00.000Z", + "review state: stale after 1 edit(s) since 2026-04-03T09:10:00.000Z", + ], + state: { + read_count: 4, + edit_count: 2, + factchecked: true, + factcheck_source: "DocRead", + factcheck_at: "2026-04-03T09:00:00.000Z", + edit_count_since_check: 1, + reviewed: true, + review_at: "2026-04-03T09:10:00.000Z", + edits_since_review: 1, + last_block: "edit", + last_reason: "context budget exceeded", + }, + result: "Goal\n- summarize guarded work\nConstraints\n- keep release gates explicit\nGuardrail state\n- enforced mode\nFiles changed\n- none in replay\nVerification\n- scenario replay only\nOpen risks\n- follow-up checks still required\nNext steps\n- continue from the cited guardrail state", + steps: [ + { + kind: "tool", + name: "task", + input: { + description: "prepare handoff", + prompt: "Prepare a handoff for the current work.", + subagent_type: "review", + command: "handoff", + }, + }, + { + kind: "text", + text: "Goal\n- summarize guarded work\nConstraints\n- keep release gates explicit\nGuardrail state\n- enforced mode\nFiles changed\n- none in replay\nVerification\n- scenario replay only\nOpen risks\n- follow-up checks still required\nNext steps\n- continue from the cited guardrail state", + }, + ], + }, + "provider-eval": { + name: "provider-eval command", + command: "provider-eval", + args: "Assess the isolated evaluation lane.", + mode: "subtask", + agent: "provider-eval", + model: "openai/gpt-5.4-mini", + models: ["gpt-5.4-mini", "openai/gpt-5.4-mini"], + prompt: [ + "Evaluate the requested provider or model candidate using the dedicated provider-eval lane.", + "Candidate", + "Routing and data-policy notes", + "Follow-up config change", + ], + result: + "Candidate\n- openrouter/openai/gpt-5.4-mini\nEvidence\n- routed through provider-eval\nRouting and data-policy notes\n- keep OpenRouter isolated from the default lane\nRecommendation\n- retain evaluation-only admission\nFollow-up config change\n- none", + steps: [ + { + kind: 
"tool", + name: "task", + input: { + description: "evaluate provider candidate", + prompt: "Evaluate the requested provider or model candidate using the dedicated provider-eval lane.", + subagent_type: "provider-eval", + command: "provider-eval", + }, + }, + { + kind: "text", + text: + "Candidate\n- openrouter/openai/gpt-5.4-mini\nEvidence\n- routed through provider-eval\nRouting and data-policy notes\n- keep OpenRouter isolated from the default lane\nRecommendation\n- retain evaluation-only admission\nFollow-up config change\n- none", + }, + ], + }, +} satisfies Record<string, Replay>