
feat: pre-execution critic gate for side-effecting tools #863

Open: anandgupta42 wants to merge 1 commit into main from feat/critic-gate


@anandgupta42 anandgupta42 commented Jun 1, 2026

What does this PR do?

Adds a flag-gated (ALTIMATE_CRITIC_GATE, default OFF) pre-execution "critic gate". Before a side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) runs, a pluggable Verifier checks the proposed args; on a hard verdict the call is skipped and the reason is fed back so the model can fix-and-retry, instead of executing a bad action. No behavior change when the flag is unset.

  • tool/critic.ts — pure, testable gate + pluggable Verifier interface.
    • ALLOW_ALL — ungated (opt-out / tests).
    • basicSafetyVerifier — the wired default: a conservative, dependency-free heuristic that blocks catastrophic, unambiguous host-destructive bash (rm -rf / incl. system-path / glob / compound / fully-qualified / ${HOME} variants, fork bombs, mkfs/dd on a raw device, recursive chmod of /). Best-effort safety net, not a security boundary — documented as such; defense-in-depth stays with the OS sandbox, the permission system, and a richer verifier a product may inject.
    • gate() is fail-open on verifier error AND on a verifier timeout, so a broken or hung verifier never breaks/hangs the agent.
  • session/prompt.ts — wired into the native-tool execute wrapper just before item.execute, marker-wrapped, emits tool.execute.after on a block so observability sees it. MCP tools use a separate wrapper and are intentionally not gated (documented on DEFAULT_GATED).
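
The fail-open behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `tool/critic.ts`: the `Verdict` and `Verifier` shapes, the `GATED` subset, and the `timeoutMs` parameter are assumptions made for the example.

```typescript
// Hypothetical shapes; the PR's real Verdict/Verifier types may differ.
interface Verdict {
  ok: boolean
  reason?: string
}
type Verifier = (tool: string, args: unknown) => Verdict | Promise<Verdict>

// Opt-out verifier: everything is allowed.
const ALLOW_ALL: Verifier = () => ({ ok: true })

// Illustrative subset of gated tool ids.
const GATED = new Set(["bash", "write", "edit"])

async function gate(
  tool: string,
  args: unknown,
  verifier: Verifier,
  timeoutMs = 5000,
): Promise<Verdict> {
  if (!GATED.has(tool)) return { ok: true } // non-gated tools pass through

  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<Verdict>((resolve) => {
    // Fail open: a hung verifier resolves to "allow", it never rejects.
    timer = setTimeout(() => resolve({ ok: true }), timeoutMs)
  })

  try {
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    const checked = Promise.resolve().then(() => verifier(tool, args))
    return await Promise.race([checked, timeout])
  } catch {
    return { ok: true } // fail open on verifier error
  } finally {
    clearTimeout(timer)
  }
}
```

Racing the verifier against a timer that resolves to "allow" means the caller only ever sees a verdict, so a broken or slow verifier cannot stall the tool call.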

Split out of #857 / PR #858, where the critic was previously an unwired no-op flag. This PR makes the flag actually do something.

Type of change

  • New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #862

How did you verify your code works?

  • 100 tests across three files: unit (critic.test.ts), adversarial bypass probes that document caught-vs-known-limits (critic-adversarial.test.ts), and e2e driving the REAL BashTool through the gate with real filesystem effects (critic-e2e.test.ts) — no mocked tool calls. All green; 448 tool-suite tests pass; tsgo typecheck clean; altimate_change markers in prompt.ts balanced (37/37); default-off path unchanged.
  • Adversarial testing found + fixed real bugs: compound commands where the fatal rm wasn't last (rm -rf / && rm -rf ./safe), and a separator glued to the target (rm -rf /;).
  • Reviewed by a multi-model panel (Gemini 3.1 Pro, GLM-5, MiniMax M2.7) before opening. Their findings were applied: removed false positives (rm -rf *, rm -rf ., and rm buried in an argument like git commit -m "...rm -rf /..."), closed false negatives (glob wipes such as rm -rf /var/*, the /. and /.. targets, fully-qualified /bin/rm, ${HOME}, long-form/glob chmod), and added an input length cap + verifier timeout.
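
To make the false-positive and false-negative cases above concrete, here is one way a target classifier could separate them. It is a sketch under assumptions: `isFatalTarget` and `SYSTEM_DIRS` are hypothetical names, the directory list is illustrative, and the actual heuristic in this PR may work differently.

```typescript
// Hypothetical helper; the PR's actual heuristic is not shown in this thread.
// Illustrative list of system directories treated as fatal targets.
const SYSTEM_DIRS = new Set(["/bin", "/boot", "/dev", "/etc", "/lib", "/usr", "/var"])

function isFatalTarget(raw: string): boolean {
  // The home directory is fatal whether written $HOME or ${HOME},
  // optionally followed by "/" and/or a trailing glob.
  if (/^(\$HOME|\$\{HOME\})\/?\*?$/.test(raw)) return true

  // Relative targets such as *, . and ./ are routine workspace cleanup.
  if (!raw.startsWith("/")) return false

  // Normalise: drop a trailing "/*", resolve "." and "..", collapse slashes.
  const parts: string[] = []
  for (const seg of raw.replace(/\/\*$/, "").split("/")) {
    if (seg === "" || seg === ".") continue
    if (seg === "..") parts.pop()
    else parts.push(seg)
  }
  const path = "/" + parts.join("/")
  return path === "/" || SYSTEM_DIRS.has(path)
}
```

Normalising before comparing is what lets `/.`, `/..` and `/var/*` collapse onto the same fatal paths as `/` and `/var`, while relative targets never reach the comparison.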

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes

Summary by cubic

Adds a flag-gated pre-execution “critic gate” that screens side-effecting tool calls and blocks clearly dangerous actions (e.g., catastrophic bash) before they run. Default is off; when enabled, blocked calls return feedback so the model can fix-and-retry.

  • New Features
    • Gate toggled by ALTIMATE_CRITIC_GATE (default off).
    • Pluggable Verifier API with ALLOW_ALL and a default basicSafetyVerifier.
    • Heuristic blocks catastrophic bash: rm -rf / (and system-path/glob/compound/fully-qualified/${HOME} variants), fork bombs, mkfs/dd on devices, and recursive chmod/chown of /.
    • Fail-open on verifier errors and on timeout (5s), so agents never hang.
    • Wired in session/prompt.ts right before execution; on block, emits tool.execute.after and skips execution. MCP tools are not gated.
    • Gated tools: bash, write, edit, sql_execute, dbt_run, patch.
    • Tests: 100 total (unit, adversarial, e2e with real BashTool).

Written for commit 769af84. Summary will update on new commits.


Summary by CodeRabbit

Release Notes

  • New Features
    • Added an optional pre-execution safety gate for bash commands that detects and blocks potentially destructive operations (e.g., critical system deletions, fork bombs) when enabled, with informative feedback on blocked executions.

Flag-gated (`ALTIMATE_CRITIC_GATE`, default off) gate that runs before a
side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) executes.
On a hard verdict the call is skipped and the reason is fed back so the
model can fix-and-retry, instead of executing a bad action.

- `tool/critic.ts` — pure, testable gate + pluggable `Verifier` interface.
  - `ALLOW_ALL` — ungated (opt-out / tests).
  - `basicSafetyVerifier` — the wired default: a conservative, dependency-free
    heuristic that blocks catastrophic, unambiguous host-destructive bash
    (`rm -rf /` incl. system-path/glob/compound/fully-qualified/brace variants,
    fork bombs, `mkfs`/`dd` on a raw device, recursive chmod of `/`). Best-effort
    safety NET, not a security boundary — documented as such; defense-in-depth
    stays with the OS sandbox, the permission system, and a richer verifier a
    product may inject.
  - `gate()` is fail-open on verifier error AND on a verifier timeout, so a
    broken or hung verifier never breaks/hangs the agent.
- `session/prompt.ts` — wired into the native-tool execute wrapper just before
  `item.execute`, marker-wrapped, emits `tool.execute.after` on a block so
  observability sees it. No-op when off. (MCP tools use a separate wrapper and
  are intentionally not gated; documented on `DEFAULT_GATED`.)

Tests (100): unit, adversarial bypass probes (caught-vs-documented-limits), and
e2e driving the REAL BashTool through the gate with real filesystem effects —
no mocked tool calls.

Hardening from multi-model review (Gemini / GLM-5 / MiniMax):
- FIX false positive: `rm -rf *`, `rm -rf .`, `rm -rf ./` no longer blocked
  (routine workspace cleanup; no workdir context to judge them).
- FIX false positive: `rm` mentioned inside another command's argument
  (`git commit -m "...rm -rf /..."`) no longer blocked — `rm` is only treated
  as the command when in command position (with a transparent-prefix allowlist
  so `sudo`/`bash -c` still match).
- FIX false negatives: glob wipes of system dirs (`rm -rf /var/*`), `/.`/`/..`,
  fully-qualified `/bin/rm`, `${HOME}` brace expansion, and long-form/glob
  recursive chmod (`chmod --recursive 777 /`, `chmod -R 777 /*`).
- Add input length cap + verifier timeout; `attachments: []` on blocked result.
- Adversarial testing also found+fixed two earlier holes: compound commands
  where the fatal `rm` wasn't last, and a separator glued to the target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jun 1, 2026

Walkthrough

This PR implements a flag-gated (ALTIMATE_CRITIC_GATE) pre-execution critic gate for side-effecting tools, focusing on bash safety. The gate verifies proposed tool arguments before execution; on a hard verdict, it blocks execution and returns actionable feedback so the model can retry. A conservative default verifier detects catastrophic bash patterns (e.g., rm -rf /, fork bombs, raw-disk operations, recursive root chmod). The gate integrates into session tool execution, never throws, fails open on timeout/error, and includes extensive unit, adversarial, and real e2e test coverage.

Changes

Critic gate implementation and integration

| Layer / File(s) | Summary |
|---|---|
| **Critic gate module implementation**<br>`packages/opencode/src/tool/critic.ts` | Implements the Critic namespace with configurable gating, a bash-safety detection heuristic, a default safety verifier, and an async gate function. Detects fork bombs, root-targeting, mkfs/dd on raw devices, recursive destructive operations, and command-position rm on fatal targets. Gate enforces verification via Promise.race with timeout, blocks with feedback on failed verdicts, and fails open on errors. |
| **Tool execution wrapper integration**<br>`packages/opencode/src/session/prompt.ts` | Integrates the critic gate into SessionPrompt.resolveTools' tool execute wrapper. When enabled and gate verdict denies execution, constructs a blocked result with critic feedback, triggers tool.execute.after for observability, and returns early without invoking the underlying tool. Normal execution continues unchanged when verdict allows. |
| **Critic module test suite**<br>`packages/opencode/test/tool/critic.test.ts`, `packages/opencode/test/tool/critic-adversarial.test.ts`, `packages/opencode/test/tool/critic-e2e.test.ts` | Comprehensive coverage: unit tests validating gate behavior (fail-open defaults, allow-all non-gated tools, blocked verdicts, error handling), extensive dangerous-bash detection (fork bombs, root-targeting, mkfs, dd, recursive deletions), safe-command validation (ordinary bash, safe globs), basic verifier integration, adversarial robustness (malformed args, malformed verifier responses, performance with whitespace-heavy commands), known-bypass documentation (command substitution, backticks, base64, find -delete, aliases, xargs), and real e2e tests with BashTool across safe, non-fatal, and catastrophic commands with filesystem verification. |
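
The wrapper integration summarised in the table can be pictured roughly like this. It is a hedged sketch: `withCriticGate`, `ToolResult`, `Hook`, and the `enabled` parameter are hypothetical stand-ins, not the real `SessionPrompt` API, and how `ALTIMATE_CRITIC_GATE` is parsed is not shown in this thread.

```typescript
// Hypothetical shapes named after the thread's description; not the real SessionPrompt API.
interface Verdict {
  ok: boolean
  reason?: string
}
interface ToolResult {
  output: string
  attachments: unknown[]
  metadata: Record<string, unknown>
}
type Execute = (args: unknown) => Promise<ToolResult>
type Hook = (event: string, payload: unknown) => void | Promise<void>

function withCriticGate(
  tool: string,
  execute: Execute,
  check: (tool: string, args: unknown) => Promise<Verdict>,
  emit: Hook,
  enabled: boolean, // stands in for the ALTIMATE_CRITIC_GATE flag
): Execute {
  return async (args) => {
    if (!enabled) return execute(args) // flag off: no behaviour change
    const verdict = await check(tool, args)
    if (verdict.ok) return execute(args)

    // Blocked: feed the reason back instead of running the tool.
    const blocked: ToolResult = {
      output: `Blocked by critic gate: ${verdict.reason ?? "no reason given"}. Revise the call and retry.`,
      attachments: [],
      metadata: { blocked: true },
    }
    // Emit the same "after" event a real execution would, so observers see the block.
    await emit("tool.execute.after", { tool, args, result: blocked })
    return blocked
  }
}
```

Returning a normal-looking result rather than throwing keeps the blocked call inside the usual tool-result path, which is what lets the model read the reason and retry.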

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A critic gate stands watch and tall, 🚪
Catching rm -rf / before the fall, 🛑
With bash-safe heuristics, sharp and keen,
No fork bombs here, no /dev/null scene, 🐰
Safe side-effects, at last, ensured. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | Description is missing the required 'PINEAPPLE' keyword at the top (per template), though it is otherwise comprehensive with changes, testing, and verification details. | Add 'PINEAPPLE' at the very top of the PR description before any other content, as required by the template for AI-generated contributions. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title accurately summarizes the main change (adding a pre-execution critic gate for side-effecting tools) and is clear and specific. |
| Linked Issues check | ✅ Passed | All core objectives from issue #862 are met: flag-gated gate (default OFF), pluggable Verifier interface with basicSafetyVerifier, wired into native-tool path, fail-open behavior, and comprehensive testing (unit, adversarial, e2e). |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: core gate logic (critic.ts), integration into prompt.ts, and supporting tests. No unrelated refactoring or modifications detected. |


@anandgupta42 anandgupta42 marked this pull request as ready for review June 1, 2026 06:37

@claude claude Bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/tool/critic.ts`:
- Around line 121-129: The isCommandPosition function treats shell assignment
words as normal arguments; update its backward token-scan to also skip
assignment words by recognizing tokens that match a shell variable assignment
pattern (e.g. /^[A-Za-z_][A-Za-z0-9_]*=.*$/) and treat them like flags/prefixes
(similar to TRANSPARENT_PREFIX), so tokens like FOO=1 or HOME=/ don't stop the
scan; apply the same change to the other identical token-scan logic elsewhere in
the file (the analogous loop that decides command position further down) so both
checks skip assignment words.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 82f5e058-6170-4af3-85b9-0262bd8ed2ac

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad8b47 and 769af84.

📒 Files selected for processing (5)
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/tool/critic.ts
  • packages/opencode/test/tool/critic-adversarial.test.ts
  • packages/opencode/test/tool/critic-e2e.test.ts
  • packages/opencode/test/tool/critic.test.ts

Comment on lines +121 to +129
  function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
    for (let j = i - 1; j >= 0; j--) {
      const t = tokens[j]
      if (sep.has(t)) return true
      if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
      if (TRANSPARENT_PREFIX.has(t)) continue
      return false // a real preceding word -> rm is an argument, not the command
    }
    return true // reached the start through only flags/prefixes

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip shell assignment words when deciding command position.

isCommandPosition() currently treats assignment words as normal arguments, so catastrophic commands like FOO=1 rm -rf / or env HOME=/ rm -rf / bypass the detector because rm is no longer seen in command position. That is a simple false negative for the default safety gate.

Suggested fix
+  function isAssignmentWord(token: string): boolean {
+    return /^[A-Za-z_][A-Za-z0-9_]*=/.test(token)
+  }
+
   function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
     for (let j = i - 1; j >= 0; j--) {
       const t = tokens[j]
       if (sep.has(t)) return true
       if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+      if (isAssignmentWord(t)) continue
       if (TRANSPARENT_PREFIX.has(t)) continue
       return false // a real preceding word -> rm is an argument, not the command
     }
     return true // reached the start through only flags/prefixes
   }

Also applies to: 180-184
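
A self-contained way to check the suggested behaviour is sketched below. The contents of `SEPARATORS` and `TRANSPARENT_PREFIX` are assumptions (the real sets in `critic.ts` are not shown in this thread), `rmIsCommand` is a hypothetical helper, and the whitespace tokenizer is deliberately naive.

```typescript
// Assumed contents; the real separator and prefix sets live in critic.ts.
const SEPARATORS = new Set([";", "&&", "||", "|"])
const TRANSPARENT_PREFIX = new Set(["sudo", "env", "bash", "sh"])

// Shell assignment word such as FOO=1 or HOME=/ (upper or lower case name).
function isAssignmentWord(token: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9_]*=/.test(token)
}

// Scan backwards from token i: flags, assignment words and transparent
// prefixes are skipped; any other word means token i is an argument.
function isCommandPosition(tokens: string[], i: number): boolean {
  for (let j = i - 1; j >= 0; j--) {
    const t = tokens[j]
    if (SEPARATORS.has(t)) return true
    if (t.startsWith("-")) continue
    if (isAssignmentWord(t)) continue
    if (TRANSPARENT_PREFIX.has(t)) continue
    return false
  }
  return true
}

// Hypothetical helper: naive whitespace tokenizer, enough for the examples.
function rmIsCommand(command: string): boolean {
  const tokens = command.split(/\s+/).filter(Boolean)
  return isCommandPosition(tokens, tokens.indexOf("rm"))
}
```

With the assignment-word check in place, `rmIsCommand("FOO=1 rm -rf /")` and `rmIsCommand("env HOME=/ rm -rf /")` both report command position, while `rmIsCommand("git commit -m rm")` still does not.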



@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 5 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/tool/critic.ts">

<violation number="1" location="packages/opencode/src/tool/critic.ts:32">
P1: `DEFAULT_GATED` uses `patch` instead of the real `apply_patch` tool id, so patch edits are not gated.</violation>

<violation number="2" location="packages/opencode/src/tool/critic.ts:125">
P2: `isCommandPosition` misses `rm` after `env` assignments (for example `env FOO=1 rm -rf /`), allowing dangerous commands to evade detection.</violation>
</file>


* no-op gap today — but a product injecting a verifier for `sql_execute`/
* `dbt_run` must confirm those are native, not MCP, tools.
*/
export const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]

P1: DEFAULT_GATED uses patch instead of the real apply_patch tool id, so patch edits are not gated.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/critic.ts, line 32:

<comment>`DEFAULT_GATED` uses `patch` instead of the real `apply_patch` tool id, so patch edits are not gated.</comment>

<file context>
@@ -0,0 +1,262 @@
+   * no-op gap today — but a product injecting a verifier for `sql_execute`/
+   * `dbt_run` must confirm those are native, not MCP, tools.
+   */
+  export const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]
+
+  export interface Verdict {
</file context>
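
A mismatch like this one could be caught mechanically by comparing the gated list against the ids the native tool registry actually exposes. The sketch below assumes, as the comment above states, that the real id is `apply_patch`; `REGISTERED_TOOL_IDS` is an illustrative stand-in for the registry and `unknownGatedIds` is a hypothetical helper.

```typescript
// Assumed native tool ids for illustration; the real registry is not shown in this thread.
const REGISTERED_TOOL_IDS = ["bash", "read", "write", "edit", "apply_patch"]

// The gated list as it appears in the diff under review.
const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]

// Return every gated id that no registered native tool uses.
// Such an id can never match a tool call, so its gate is silently a no-op.
function unknownGatedIds(gated: readonly string[], registered: readonly string[]): string[] {
  const known = new Set(registered)
  return gated.filter((id) => !known.has(id))
}

const unmatched = unknownGatedIds(DEFAULT_GATED, REGISTERED_TOOL_IDS)
```

Under these assumed ids, `unmatched` contains `patch` along with `sql_execute` and `dbt_run`, which also surfaces the native-versus-MCP question the doc comment on `DEFAULT_GATED` raises.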

    for (let j = i - 1; j >= 0; j--) {
      const t = tokens[j]
      if (sep.has(t)) return true
      if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)

P2: isCommandPosition misses rm after env assignments (for example env FOO=1 rm -rf /), allowing dangerous commands to evade detection.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/critic.ts, line 125:

<comment>`isCommandPosition` misses `rm` after `env` assignments (for example `env FOO=1 rm -rf /`), allowing dangerous commands to evade detection.</comment>

<file context>
@@ -0,0 +1,262 @@
+    for (let j = i - 1; j >= 0; j--) {
+      const t = tokens[j]
+      if (sep.has(t)) return true
+      if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+      if (TRANSPARENT_PREFIX.has(t)) continue
+      return false // a real preceding word -> rm is an argument, not the command
</file context>


@dev-punia-altimate dev-punia-altimate left a comment


Multi-Persona Review — Verdict: comment

Multi-persona review completed.

6/6 agents completed · 32s · 0 findings (0 critical, 0 high, 0 medium)


Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth)
