feat: pre-execution critic gate for side-effecting tools#863
feat: pre-execution critic gate for side-effecting tools#863anandgupta42 wants to merge 1 commit into
Conversation
Flag-gated (`ALTIMATE_CRITIC_GATE`, default off) gate that runs before a
side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) executes.
On a hard verdict the call is skipped and the reason is fed back so the
model can fix-and-retry, instead of executing a bad action.
- `tool/critic.ts` — pure, testable gate + pluggable `Verifier` interface.
- `ALLOW_ALL` — ungated (opt-out / tests).
- `basicSafetyVerifier` — the wired default: a conservative, dependency-free
heuristic that blocks catastrophic, unambiguous host-destructive bash
(`rm -rf /` incl. system-path/glob/compound/fully-qualified/brace variants,
fork bombs, `mkfs`/`dd` on a raw device, recursive chmod of `/`). Best-effort
safety NET, not a security boundary — documented as such; defense-in-depth
stays with the OS sandbox, the permission system, and a richer verifier a
product may inject.
- `gate()` is fail-open on verifier error AND on a verifier timeout, so a
broken or hung verifier never breaks/hangs the agent.
- `session/prompt.ts` — wired into the native-tool execute wrapper just before
`item.execute`, marker-wrapped, emits `tool.execute.after` on a block so
observability sees it. No-op when off. (MCP tools use a separate wrapper and
are intentionally not gated; documented on `DEFAULT_GATED`.)
Tests (100): unit, adversarial bypass probes (caught-vs-documented-limits), and
e2e driving the REAL BashTool through the gate with real filesystem effects —
no mocked tool calls.
Hardening from multi-model review (Gemini / GLM-5 / MiniMax):
- FIX false positive: `rm -rf *`, `rm -rf .`, `rm -rf ./` no longer blocked
(routine workspace cleanup; no workdir context to judge them).
- FIX false positive: `rm` mentioned inside another command's argument
(`git commit -m "...rm -rf /..."`) no longer blocked — `rm` is only treated
as the command when in command position (with a transparent-prefix allowlist
so `sudo`/`bash -c` still match).
- FIX false negatives: glob wipes of system dirs (`rm -rf /var/*`), `/.`/`/..`,
fully-qualified `/bin/rm`, `${HOME}` brace expansion, and long-form/glob
recursive chmod (`chmod --recursive 777 /`, `chmod -R 777 /*`).
- Add input length cap + verifier timeout; `attachments: []` on blocked result.
- Adversarial testing also found+fixed two earlier holes: compound commands
where the fatal `rm` wasn't last, and a separator glued to the target.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR implements a flag-gated ( ChangesCritic gate implementation and integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/tool/critic.ts`:
- Around line 121-129: The isCommandPosition function treats shell assignment
words as normal arguments; update its backward token-scan to also skip
assignment words by recognizing tokens that match a shell variable assignment
pattern (e.g. /^[A-Za-z_][A-Za-z0-9_]*=.*$/) and treat them like flags/prefixes
(similar to TRANSPARENT_PREFIX), so tokens like FOO=1 or HOME=/ don't stop the
scan; apply the same change to the other identical token-scan logic elsewhere in
the file (the analogous loop that decides command position further down) so both
checks skip assignment words.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 82f5e058-6170-4af3-85b9-0262bd8ed2ac
📒 Files selected for processing (5)
packages/opencode/src/session/prompt.tspackages/opencode/src/tool/critic.tspackages/opencode/test/tool/critic-adversarial.test.tspackages/opencode/test/tool/critic-e2e.test.tspackages/opencode/test/tool/critic.test.ts
| function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean { | ||
| for (let j = i - 1; j >= 0; j--) { | ||
| const t = tokens[j] | ||
| if (sep.has(t)) return true | ||
| if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`) | ||
| if (TRANSPARENT_PREFIX.has(t)) continue | ||
| return false // a real preceding word -> rm is an argument, not the command | ||
| } | ||
| return true // reached the start through only flags/prefixes |
There was a problem hiding this comment.
Skip shell assignment words when deciding command position.
isCommandPosition() currently treats assignment words as normal arguments, so catastrophic commands like FOO=1 rm -rf / or env HOME=/ rm -rf / bypass the detector because rm is no longer seen in command position. That is a simple false negative for the default safety gate.
Suggested fix
+ function isAssignmentWord(token: string): boolean {
+ return /^[a-z_][a-z0-9_]*=.*/.test(token)
+ }
+
function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
for (let j = i - 1; j >= 0; j--) {
const t = tokens[j]
if (sep.has(t)) return true
if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+ if (isAssignmentWord(t)) continue
if (TRANSPARENT_PREFIX.has(t)) continue
return false // a real preceding word -> rm is an argument, not the command
}
return true // reached the start through only flags/prefixes
}Also applies to: 180-184
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/src/tool/critic.ts` around lines 121 - 129, The
isCommandPosition function treats shell assignment words as normal arguments;
update its backward token-scan to also skip assignment words by recognizing
tokens that match a shell variable assignment pattern (e.g.
/^[A-Za-z_][A-Za-z0-9_]*=.*$/) and treat them like flags/prefixes (similar to
TRANSPARENT_PREFIX), so tokens like FOO=1 or HOME=/ don't stop the scan; apply
the same change to the other identical token-scan logic elsewhere in the file
(the analogous loop that decides command position further down) so both checks
skip assignment words.
There was a problem hiding this comment.
2 issues found across 5 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/tool/critic.ts">
<violation number="1" location="packages/opencode/src/tool/critic.ts:32">
P1: `DEFAULT_GATED` uses `patch` instead of the real `apply_patch` tool id, so patch edits are not gated.</violation>
<violation number="2" location="packages/opencode/src/tool/critic.ts:125">
P2: `isCommandPosition` misses `rm` after `env` assignments (for example `env FOO=1 rm -rf /`), allowing dangerous commands to evade detection.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| * no-op gap today — but a product injecting a verifier for `sql_execute`/ | ||
| * `dbt_run` must confirm those are native, not MCP, tools. | ||
| */ | ||
| export const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"] |
There was a problem hiding this comment.
P1: DEFAULT_GATED uses patch instead of the real apply_patch tool id, so patch edits are not gated.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/critic.ts, line 32:
<comment>`DEFAULT_GATED` uses `patch` instead of the real `apply_patch` tool id, so patch edits are not gated.</comment>
<file context>
@@ -0,0 +1,262 @@
+ * no-op gap today — but a product injecting a verifier for `sql_execute`/
+ * `dbt_run` must confirm those are native, not MCP, tools.
+ */
+ export const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]
+
+ export interface Verdict {
</file context>
| for (let j = i - 1; j >= 0; j--) { | ||
| const t = tokens[j] | ||
| if (sep.has(t)) return true | ||
| if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`) |
There was a problem hiding this comment.
P2: isCommandPosition misses rm after env assignments (for example env FOO=1 rm -rf /), allowing dangerous commands to evade detection.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/critic.ts, line 125:
<comment>`isCommandPosition` misses `rm` after `env` assignments (for example `env FOO=1 rm -rf /`), allowing dangerous commands to evade detection.</comment>
<file context>
@@ -0,0 +1,262 @@
+ for (let j = i - 1; j >= 0; j--) {
+ const t = tokens[j]
+ if (sep.has(t)) return true
+ if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+ if (TRANSPARENT_PREFIX.has(t)) continue
+ return false // a real preceding word -> rm is an argument, not the command
</file context>
dev-punia-altimate
left a comment
There was a problem hiding this comment.
Multi-Persona Review — Verdict: comment
Multi-persona review completed.
6/6 agents completed · 32s · 0 findings (0 critical, 0 high, 0 medium)
Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·
What does this PR do?
Adds a flag-gated (
ALTIMATE_CRITIC_GATE, default OFF) pre-execution "critic gate". Before a side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) runs, a pluggableVerifierchecks the proposed args; on a hard verdict the call is skipped and the reason is fed back so the model can fix-and-retry, instead of executing a bad action. No behavior change when the flag is unset.tool/critic.ts— pure, testable gate + pluggableVerifierinterface.ALLOW_ALL— ungated (opt-out / tests).basicSafetyVerifier— the wired default: a conservative, dependency-free heuristic that blocks catastrophic, unambiguous host-destructive bash (rm -rf /incl. system-path / glob / compound / fully-qualified /${HOME}variants, fork bombs,mkfs/ddon a raw device, recursivechmodof/). Best-effort safety net, not a security boundary — documented as such; defense-in-depth stays with the OS sandbox, the permission system, and a richer verifier a product may inject.gate()is fail-open on verifier error AND on a verifier timeout, so a broken or hung verifier never breaks/hangs the agent.session/prompt.ts— wired into the native-tool execute wrapper just beforeitem.execute, marker-wrapped, emitstool.execute.afteron a block so observability sees it. MCP tools use a separate wrapper and are intentionally not gated (documented onDEFAULT_GATED).Type of change
Issue for this PR
Closes #862
How did you verify your code works?
critic.test.ts), adversarial bypass probes that document caught-vs-known-limits (critic-adversarial.test.ts), and e2e driving the REALBashToolthrough the gate with real filesystem effects (critic-e2e.test.ts) — no mocked tool calls. All green; 448 tool-suite tests pass;tsgotypecheck clean;altimate_changemarkers inprompt.tsbalanced (37/37); default-off path unchanged.rmwasn't last (rm -rf / && rm -rf ./safe), and a separator glued to the target (rm -rf /;).rm -rf *,rm -rf ., andrmburied in an argument likegit commit -m "...rm -rf /..."), closed false negatives (glob wipesrm -rf /var/*,/.//.., fully-qualified/bin/rm,${HOME}, long-form/globchmod), and added an input length cap + verifier timeout.Checklist
Summary by cubic
Adds a flag-gated pre-execution “critic gate” that screens side-effecting tool calls and blocks clearly dangerous actions (e.g., catastrophic bash) before they run. Default is off; when enabled, blocked calls return feedback so the model can fix-and-retry.
ALTIMATE_CRITIC_GATE(default off).ALLOW_ALLand a defaultbasicSafetyVerifier.rm -rf /(and system-path/glob/compound/fully-qualified/${HOME}variants), fork bombs,mkfs/ddon devices, and recursive chmod/chown of/.session/prompt.tsright before execution; on block, emitstool.execute.afterand skips execution. MCP tools are not gated.bash,write,edit,sql_execute,dbt_run,patch.BashTool).Written for commit 769af84. Summary will update on new commits.
Summary by CodeRabbit
Release Notes