feat: pre-execution critic gate for side-effecting tools by anandgupta42 · Pull Request #863 · AltimateAI/altimate-code

anandgupta42 · 2026-06-01T04:32:43Z

What does this PR do?

Adds a flag-gated (ALTIMATE_CRITIC_GATE, default OFF) pre-execution "critic gate". Before a side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) runs, a pluggable Verifier checks the proposed args; on a hard verdict the call is skipped and the reason is fed back so the model can fix-and-retry, instead of executing a bad action. No behavior change when the flag is unset.

tool/critic.ts — pure, testable gate + pluggable Verifier interface.
- ALLOW_ALL — ungated (opt-out / tests).
- basicSafetyVerifier — the wired default: a conservative, dependency-free heuristic that blocks catastrophic, unambiguous host-destructive bash (rm -rf / incl. system-path / glob / compound / fully-qualified / ${HOME} variants, fork bombs, mkfs/dd on a raw device, recursive chmod of /). Best-effort safety net, not a security boundary — documented as such; defense-in-depth stays with the OS sandbox, the permission system, and a richer verifier a product may inject.
- gate() is fail-open on verifier error AND on a verifier timeout, so a broken or hung verifier never breaks/hangs the agent.
session/prompt.ts — wired into the native-tool execute wrapper just before item.execute, marker-wrapped, emits tool.execute.after on a block so observability sees it. MCP tools use a separate wrapper and are intentionally not gated (documented on DEFAULT_GATED).
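
The fail-open behaviour of `gate()` described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `tool/critic.ts`: the `Verdict` and `Verifier` shapes, the `gateSketch` name, and the parameter list are hypothetical, and the 5-second default simply mirrors the timeout quoted in the cubic summary.

```typescript
// Hypothetical shapes for illustration; the real Verdict/Verifier fields are not shown on this page.
interface Verdict {
  allow: boolean
  reason?: string
}
type Verifier = (tool: string, args: unknown) => Promise<Verdict> | Verdict

const ALLOW: Verdict = { allow: true }

async function gateSketch(
  tool: string,
  args: unknown,
  verifier: Verifier,
  gated: readonly string[],
  timeoutMs = 5000,
): Promise<Verdict> {
  if (!gated.includes(tool)) return ALLOW // non-gated tools pass straight through

  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve(ALLOW), timeoutMs) // hung verifier -> allow
  })
  try {
    // Wrapping the call in an async function turns a synchronous throw into a rejection.
    const verdict = await Promise.race([(async () => verifier(tool, args))(), timeout])
    // A malformed verdict is treated as allow rather than guessed at.
    if (!verdict || typeof verdict.allow !== "boolean") return ALLOW
    return verdict
  } catch {
    return ALLOW // verifier error -> allow
  } finally {
    clearTimeout(timer)
  }
}
```

The trade-off in this sketch is deliberate: a broken or slow verifier degrades to "no gate" instead of stalling the agent, which is acceptable only because the gate is described as a best-effort safety net rather than a security boundary.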

Split out of #857 / PR #858, where the critic was previously an unwired no-op flag. This PR makes the flag actually do something.

Type of change

New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #862

How did you verify your code works?

100 tests across three files: unit (critic.test.ts), adversarial bypass probes that document caught-vs-known-limits (critic-adversarial.test.ts), and e2e driving the REAL BashTool through the gate with real filesystem effects (critic-e2e.test.ts) — no mocked tool calls. All green; 448 tool-suite tests pass; tsgo typecheck clean; altimate_change markers in prompt.ts balanced (37/37); default-off path unchanged.
Adversarial testing found + fixed real bugs: compound commands where the fatal rm wasn't last (rm -rf / && rm -rf ./safe), and a separator glued to the target (rm -rf /;).
Reviewed by a multi-model panel (Gemini 3.1 Pro, GLM-5, MiniMax M2.7) before opening. Their findings were applied: removed false positives (rm -rf *, rm -rf ., and rm buried in an argument like git commit -m "...rm -rf /..."), closed false negatives (glob wipes such as rm -rf /var/*, the /. and /.. targets, fully-qualified /bin/rm, ${HOME}, long-form/glob chmod), and added an input length cap + verifier timeout.
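
The two bugs found by adversarial testing (a fatal `rm` that is not the last command in a compound, and a separator glued to the target) both come down to how the command string is tokenized and segmented. The sketch below shows one way to handle them. All helper names are hypothetical, and the code is deliberately simplified: it ignores quoting, escapes, and redirections, so it is not a substitute for the heuristic in `critic.ts`.

```typescript
// Hypothetical helper names; a simplified sketch that ignores quoting, escapes and redirections.
const SEPARATOR_TOKENS = new Set([";", "&&", "||", "|", "&"])

// Split on whitespace, and peel separators off the words they are glued to,
// so "rm -rf /;" yields ["rm", "-rf", "/", ";"].
function tokenize(command: string): string[] {
  return command
    .replace(/(&&|\|\||[;|&])/g, " $1 ")
    .split(/\s+/)
    .filter((t) => t.length > 0)
}

// Group tokens into simple commands, one per separator-delimited segment.
function segments(tokens: string[]): string[][] {
  const out: string[][] = []
  let current: string[] = []
  for (const t of tokens) {
    if (SEPARATOR_TOKENS.has(t)) {
      if (current.length > 0) out.push(current)
      current = []
    } else {
      current.push(t)
    }
  }
  if (current.length > 0) out.push(current)
  return out
}

// Deliberately narrow check: `rm` with a recursive flag whose target is exactly "/".
function isFatalRm(segment: string[]): boolean {
  if (segment[0] !== "rm") return false
  const recursive = segment.some((t) => /^-[A-Za-z]*[rR]/.test(t) || t === "--recursive")
  return recursive && segment.includes("/")
}

// Every segment is inspected, so a fatal rm is caught wherever it sits in a compound.
function hasFatalRm(command: string): boolean {
  return segments(tokenize(command)).some(isFatalRm)
}
```

Checking every segment rather than only the last one addresses the first bug; normalizing separators before splitting addresses the second.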

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my code
I have added tests that prove my feature works
New and existing unit tests pass locally with my changes

Summary by cubic

Adds a flag-gated pre-execution “critic gate” that screens side-effecting tool calls and blocks clearly dangerous actions (e.g., catastrophic bash) before they run. Default is off; when enabled, blocked calls return feedback so the model can fix-and-retry.

New Features
- Gate toggled by ALTIMATE_CRITIC_GATE (default off).
- Pluggable Verifier API with ALLOW_ALL and a default basicSafetyVerifier.
- Heuristic blocks catastrophic bash: rm -rf / (and system-path/glob/compound/fully-qualified/${HOME} variants), fork bombs, mkfs/dd on devices, and recursive chmod/chown of /.
- Fail-open on verifier errors and on timeout (5s), so agents never hang.
- Wired in session/prompt.ts right before execution; on block, emits tool.execute.after and skips execution. MCP tools are not gated.
- Gated tools: bash, write, edit, sql_execute, dbt_run, patch.
- Tests: 100 total (unit, adversarial, e2e with real BashTool).

Written for commit 769af84. Summary will update on new commits.

Summary by CodeRabbit

Release Notes

New Features
- Added an optional pre-execution safety gate for bash commands that detects and blocks potentially destructive operations (e.g., critical system deletions, fork bombs) when enabled, with informative feedback on blocked executions.

Flag-gated (`ALTIMATE_CRITIC_GATE`, default off) gate that runs before a side-effecting tool (bash/write/edit/sql_execute/dbt_run/patch) executes. On a hard verdict the call is skipped and the reason is fed back so the model can fix-and-retry, instead of executing a bad action.

- `tool/critic.ts`: pure, testable gate + pluggable `Verifier` interface.
  - `ALLOW_ALL`: ungated (opt-out / tests).
  - `basicSafetyVerifier`: the wired default, a conservative, dependency-free heuristic that blocks catastrophic, unambiguous host-destructive bash (`rm -rf /` incl. system-path/glob/compound/fully-qualified/brace variants, fork bombs, `mkfs`/`dd` on a raw device, recursive chmod of `/`). Best-effort safety NET, not a security boundary (documented as such); defense-in-depth stays with the OS sandbox, the permission system, and a richer verifier a product may inject.
  - `gate()` is fail-open on verifier error AND on a verifier timeout, so a broken or hung verifier never breaks/hangs the agent.
- `session/prompt.ts`: wired into the native-tool execute wrapper just before `item.execute`, marker-wrapped, emits `tool.execute.after` on a block so observability sees it. No-op when off. (MCP tools use a separate wrapper and are intentionally not gated; documented on `DEFAULT_GATED`.)

Tests (100): unit, adversarial bypass probes (caught-vs-documented-limits), and e2e driving the REAL BashTool through the gate with real filesystem effects, with no mocked tool calls.

Hardening from multi-model review (Gemini / GLM-5 / MiniMax):

- FIX false positive: `rm -rf *`, `rm -rf .`, `rm -rf ./` no longer blocked (routine workspace cleanup; no workdir context to judge them).
- FIX false positive: `rm` mentioned inside another command's argument (`git commit -m "...rm -rf /..."`) no longer blocked; `rm` is only treated as the command when in command position (with a transparent-prefix allowlist so `sudo`/`bash -c` still match).
- FIX false negatives: glob wipes of system dirs (`rm -rf /var/*`), `/.` and `/..`, fully-qualified `/bin/rm`, `${HOME}` brace expansion, and long-form/glob recursive chmod (`chmod --recursive 777 /`, `chmod -R 777 /*`).
- Add input length cap + verifier timeout; `attachments: []` on blocked result.
- Adversarial testing also found+fixed two earlier holes: compound commands where the fatal `rm` wasn't last, and a separator glued to the target.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-01T04:32:50Z

📝 Walkthrough

Walkthrough

This PR implements a flag-gated (ALTIMATE_CRITIC_GATE) pre-execution critic gate for side-effecting tools, focusing on bash safety. The gate verifies proposed tool arguments before execution; on a hard verdict, it blocks execution and returns actionable feedback so the model can retry. A conservative default verifier detects catastrophic bash patterns (e.g., rm -rf /, fork bombs, raw-disk operations, recursive root chmod). The gate integrates into session tool execution, never throws, fails open on timeout/error, and includes extensive unit, adversarial, and real e2e test coverage.

Changes

Critic gate implementation and integration

| Layer / File(s) | Summary |
|---|---|
| Critic gate module implementation `packages/opencode/src/tool/critic.ts` | Implements the `Critic` namespace with configurable gating, a bash-safety detection heuristic, a default safety verifier, and an async gate function. Detects fork bombs, root-targeting, mkfs/dd on raw devices, recursive destructive operations, and command-position `rm` on fatal targets. Gate enforces verification via `Promise.race` with timeout, blocks with feedback on failed verdicts, and fails open on errors. |
| Tool execution wrapper integration `packages/opencode/src/session/prompt.ts` | Integrates the critic gate into `SessionPrompt.resolveTools`' tool `execute` wrapper. When enabled and gate verdict denies execution, constructs a blocked result with critic feedback, triggers `tool.execute.after` for observability, and returns early without invoking the underlying tool. Normal execution continues unchanged when verdict allows. |
| Critic module test suite `packages/opencode/test/tool/critic.test.ts`, `packages/opencode/test/tool/critic-adversarial.test.ts`, `packages/opencode/test/tool/critic-e2e.test.ts` | Comprehensive coverage: unit tests validating gate behavior (fail-open defaults, allow-all non-gated tools, blocked verdicts, error handling), extensive dangerous-bash detection (fork bombs, root-targeting, mkfs, dd, recursive deletions), safe-command validation (ordinary bash, safe globs), basic verifier integration, adversarial robustness (malformed args, malformed verifier responses, performance with whitespace-heavy commands), known-bypass documentation (command substitution, backticks, base64, find -delete, aliases, xargs), and real e2e tests with BashTool across safe, non-fatal, and catastrophic commands with filesystem verification. |
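
The integration flow summarized for `session/prompt.ts` (gate, then either a blocked result plus `tool.execute.after`, or the unchanged execute path) might look roughly like the sketch below. Every name here (`wrapWithGate`, `ToolResult`, `AfterHook`, the `output` field) is hypothetical and chosen for illustration; only the `attachments: []` field on the blocked result is taken from the commit message. The real wrapper in `SessionPrompt.resolveTools` is not shown on this page and very likely differs.

```typescript
// All names below are hypothetical; they only mirror the flow described in the walkthrough.
interface Verdict {
  allow: boolean
  reason?: string
}
interface ToolResult {
  output: string
  attachments: unknown[]
}
type Execute = (args: unknown) => Promise<ToolResult>
type AfterHook = (tool: string, result: ToolResult) => Promise<void> | void

function wrapWithGate(
  tool: string,
  execute: Execute,
  gate: (tool: string, args: unknown) => Promise<Verdict>,
  after: AfterHook,
  enabled: boolean,
): Execute {
  return async (args) => {
    if (enabled) {
      const verdict = await gate(tool, args)
      if (!verdict.allow) {
        // Blocked: surface the reason to the model and to observability, then skip the tool.
        const blocked: ToolResult = {
          output: `Blocked by critic gate: ${verdict.reason ?? "no reason given"}`,
          attachments: [],
        }
        await after(tool, blocked)
        return blocked
      }
    }
    return execute(args) // flag off, or verdict allows: the original path is untouched
  }
}
```

In this simplified version the after-hook is only invoked on the blocked path; a real wrapper would presumably also invoke it after a normal execution.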

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A critic gate stands watch and tall, 🚪
Catching rm -rf / before the fall, 🛑
With bash-safe heuristics, sharp and keen,
No fork bombs here, no /dev/null scene, 🐰
Safe side-effects, at last, ensured. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | Description is missing the required 'PINEAPPLE' keyword at the top (per template), though it is otherwise comprehensive with changes, testing, and verification details. | Add 'PINEAPPLE' at the very top of the PR description before any other content, as required by the template for AI-generated contributions. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | Title accurately summarizes the main change, adding a pre-execution critic gate for side-effecting tools, and is clear and specific. |
| Linked Issues check | ✅ Passed | All core objectives from issue `#862` are met: flag-gated gate (default OFF), pluggable Verifier interface with basicSafetyVerifier, wired into native-tool path, fail-open behavior, and comprehensive testing (unit, adversarial, e2e). |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: core gate logic (critic.ts), integration into prompt.ts, and supporting tests. No unrelated refactoring or modifications detected. |



coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/tool/critic.ts`:
- Around line 121-129: The isCommandPosition function treats shell assignment
words as normal arguments; update its backward token-scan to also skip
assignment words by recognizing tokens that match a shell variable assignment
pattern (e.g. /^[A-Za-z_][A-Za-z0-9_]*=.*$/) and treat them like flags/prefixes
(similar to TRANSPARENT_PREFIX), so tokens like FOO=1 or HOME=/ don't stop the
scan; apply the same change to the other identical token-scan logic elsewhere in
the file (the analogous loop that decides command position further down) so both
checks skip assignment words.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 82f5e058-6170-4af3-85b9-0262bd8ed2ac

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad8b47 and 769af84.

📒 Files selected for processing (5)

packages/opencode/src/session/prompt.ts
packages/opencode/src/tool/critic.ts
packages/opencode/test/tool/critic-adversarial.test.ts
packages/opencode/test/tool/critic-e2e.test.ts
packages/opencode/test/tool/critic.test.ts

coderabbitai · 2026-06-01T06:41:19Z

+  function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
+    for (let j = i - 1; j >= 0; j--) {
+      const t = tokens[j]
+      if (sep.has(t)) return true
+      if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+      if (TRANSPARENT_PREFIX.has(t)) continue
+      return false // a real preceding word -> rm is an argument, not the command
+    }
+    return true // reached the start through only flags/prefixes


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Skip shell assignment words when deciding command position.

isCommandPosition() currently treats assignment words as normal arguments, so catastrophic commands like FOO=1 rm -rf / or env HOME=/ rm -rf / bypass the detector because rm is no longer seen in command position. That is a simple false negative for the default safety gate.

Suggested fix

+  function isAssignmentWord(token: string): boolean {
+    return /^[A-Za-z_][A-Za-z0-9_]*=.*/.test(token)
+  }
+
   function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
     for (let j = i - 1; j >= 0; j--) {
       const t = tokens[j]
       if (sep.has(t)) return true
       if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)
+      if (isAssignmentWord(t)) continue
       if (TRANSPARENT_PREFIX.has(t)) continue
       return false // a real preceding word -> rm is an argument, not the command
     }
     return true // reached the start through only flags/prefixes
   }

Also applies to: 180-184
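
To make the finding concrete, here is a self-contained sketch of the backward scan with assignment words skipped. The contents of `TRANSPARENT_PREFIX` and the separator set `SEP` are assumptions made for this example (the real sets in `critic.ts` are not shown in the review), and the regex uses the mixed-case pattern from the reviewer's prompt so that names like `FOO` and `HOME` match.

```typescript
// Illustrative stand-ins; the real TRANSPARENT_PREFIX and separator sets are not shown in the review.
const TRANSPARENT_PREFIX = new Set(["sudo", "env", "bash", "sh"])
const SEP = new Set([";", "&&", "||", "|"])

// A shell assignment word such as FOO=1 or HOME=/ (a valid name followed by "=").
function isAssignmentWord(token: string): boolean {
  return /^[A-Za-z_][A-Za-z0-9_]*=/.test(token)
}

// Walk backwards from tokens[i]; it is in command position if only flags,
// assignment words and transparent prefixes sit between it and the start
// of the command (or the nearest separator).
function isCommandPosition(tokens: string[], i: number, sep: Set<string>): boolean {
  for (let j = i - 1; j >= 0; j--) {
    const t = tokens[j]
    if (sep.has(t)) return true
    if (t.startsWith("-")) continue // a flag, e.g. `sudo -E`
    if (isAssignmentWord(t)) continue // e.g. FOO=1, HOME=/
    if (TRANSPARENT_PREFIX.has(t)) continue
    return false // a real preceding word: the token is an argument
  }
  return true
}

console.log(isCommandPosition(["FOO=1", "rm", "-rf", "/"], 1, SEP)) // true
console.log(isCommandPosition(["env", "HOME=/", "rm", "-rf", "/"], 2, SEP)) // true
console.log(isCommandPosition(["git", "commit", "-m", "rm"], 3, SEP)) // false
```

The last call shows that the false-positive fix is preserved: `rm` appearing after a real word such as `commit` is still treated as an argument.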


cubic-dev-ai

2 issues found across 5 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/tool/critic.ts">

<violation number="1" location="packages/opencode/src/tool/critic.ts:32">
P1: `DEFAULT_GATED` uses `patch` instead of the real `apply_patch` tool id, so patch edits are not gated.</violation>

<violation number="2" location="packages/opencode/src/tool/critic.ts:125">
P2: `isCommandPosition` misses `rm` after `env` assignments (for example `env FOO=1 rm -rf /`), allowing dangerous commands to evade detection.</violation>
</file>


cubic-dev-ai · 2026-06-01T06:45:08Z

+   * no-op gap today — but a product injecting a verifier for `sql_execute`/
+   * `dbt_run` must confirm those are native, not MCP, tools.
+   */
+  export const DEFAULT_GATED = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]


P1: DEFAULT_GATED uses patch instead of the real apply_patch tool id, so patch edits are not gated.
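
One way to catch this class of mismatch early is to check that every gated id corresponds to a registered tool. The sketch below is hypothetical: `registeredToolIds` is a stand-in list rather than the project's actual registry, and the `apply_patch` id is taken from the reviewer's comment, not verified against the codebase.

```typescript
// Hypothetical stand-in for the ids the tool registry actually exposes.
const registeredToolIds = ["bash", "write", "edit", "read", "apply_patch", "sql_execute", "dbt_run"]

// Return the gated ids that match no registered tool; such entries gate nothing.
function unknownGatedIds(gated: readonly string[], registered: readonly string[]): string[] {
  const known = new Set(registered)
  return gated.filter((id) => !known.has(id))
}

const gatedBefore = ["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]
const gatedAfter = ["bash", "write", "edit", "sql_execute", "dbt_run", "apply_patch"]

console.log(unknownGatedIds(gatedBefore, registeredToolIds)) // reports the "patch" entry
console.log(unknownGatedIds(gatedAfter, registeredToolIds)) // reports nothing
```

A check of this kind could live in the unit suite so that a renamed tool id fails a test instead of silently leaving that tool ungated.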


cubic-dev-ai · 2026-06-01T06:45:08Z

+    for (let j = i - 1; j >= 0; j--) {
+      const t = tokens[j]
+      if (sep.has(t)) return true
+      if (t.startsWith("-")) continue // a flag (e.g. `sudo -E`, `bash -c`)


P2: isCommandPosition misses rm after env assignments (for example env FOO=1 rm -rf /), allowing dangerous commands to evade detection.


dev-punia-altimate

Multi-Persona Review — Verdict: comment

Multi-persona review completed.

6/6 agents completed · 32s · 0 findings (0 critical, 0 high, 0 medium)

Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth)

github-actions Bot added the contributor label Jun 1, 2026

anandgupta42 marked this pull request as ready for review June 1, 2026 06:37

claude Bot reviewed Jun 1, 2026

coderabbitai Bot reviewed Jun 1, 2026

cubic-dev-ai Bot reviewed Jun 1, 2026

dev-punia-altimate reviewed Jun 1, 2026

anandgupta42 mentioned this pull request Jun 1, 2026

feat: per-turn tool retrieval (trim tool-definition context flood) #858

Open

5 tasks

anandgupta42 wants to merge 1 commit into main from feat/critic-gate
