Pre-execution critic gate for side-effecting tools #862

@anandgupta42

Problem

There is no pre-execution check on side-effecting tool calls (bash/write/edit/sql_execute/dbt_run/patch). A model can propose a catastrophic action (e.g. rm -rf /) and it runs immediately.

Proposal

Add a flag-gated (ALTIMATE_CRITIC_GATE, default OFF) pre-execution "critic gate" wired into the native-tool execute path. Before a side-effecting tool runs, a pluggable Verifier checks the proposed args. On a hard verdict the call is skipped and a feedback string is returned in place of execution, so the model can fix and retry.
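A minimal sketch of the flow described above, to make the intended control path concrete. Only the Verifier name and the tool list come from this issue; the `Verdict` shape, the `gatedExecute` helper, its parameters, and the feedback wording are hypothetical, and the flag is passed in as a boolean because how ALTIMATE_CRITIC_GATE is parsed is not specified here.

```typescript
// Hypothetical verdict shape: the issue only says a "hard verdict" skips the call.
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };
type ToolArgs = Record<string, unknown>;

// Hypothetical signature for the pluggable Verifier named in the proposal.
interface Verifier {
  verify(tool: string, args: ToolArgs): Promise<Verdict>;
}

// Side-effecting tools listed in the Problem section.
const SIDE_EFFECTING = new Set([
  "bash", "write", "edit", "sql_execute", "dbt_run", "patch",
]);

// Runs `execute` unless the gate is enabled, the tool is side-effecting,
// and the verifier returns a blocking verdict.
async function gatedExecute(
  tool: string,
  args: ToolArgs,
  execute: (args: ToolArgs) => Promise<string>,
  verifier: Verifier,
  gateEnabled: boolean,
): Promise<string> {
  // Flag unset or read-only tool: behave exactly as before.
  if (!gateEnabled || !SIDE_EFFECTING.has(tool)) return execute(args);

  const verdict = await verifier.verify(tool, args);
  if (verdict.kind === "block") {
    // Feedback is returned in place of the tool output so the model can retry.
    return `Blocked by critic gate: ${verdict.feedback}`;
  }
  return execute(args);
}
```

Returning the feedback through the same channel as a normal tool result means the caller needs no new error path, which is one way to read "returned in place of execution".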

Ship a conservative, dependency-free default verifier (basicSafetyVerifier) that blocks only catastrophic, unambiguous host-destructive bash (rm -rf / and variants, fork bombs, mkfs/dd on a raw device, recursive chmod of /). This is a best-effort safety net, not a security boundary; a product can inject a richer verifier via the Verifier interface.
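The checks listed above could look roughly like the sketch below. It is an illustration only, not the shipped basicSafetyVerifier: the function name `basicSafetyVerifierSketch`, the `command` argument key, the device-name prefixes, and the feedback strings are all assumptions, and the whitespace tokenizer ignores quoting and subshells, so it is easy to evade (consistent with "best-effort, not a security boundary").

```typescript
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };

// Assumed prefixes for raw block devices; /dev/null and similar do not match.
const RAW_DEVICE = /^\/dev\/(sd|hd|vd|xvd|nvme|disk)/;
// The classic bash fork bomb, tolerant of whitespace differences.
const FORK_BOMB = /:\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:/;

// Inspects one simple command (already split on ; & | && ||).
function checkSegment(tokens: string[]): string | null {
  const flags = tokens.filter((t) => t.startsWith("-"));
  const operands = tokens.filter((t) => !t.startsWith("-"));
  const targetsRoot = operands.some((t) => t === "/" || t === "/*");

  // rm with -r/-R (alone or combined, e.g. -rf, -fr) or --recursive on /.
  const rmRecursive = flags.some((f) => f === "--recursive" || /^-[A-Za-z]*[rR]/.test(f));
  if (tokens.includes("rm") && rmRecursive && targetsRoot) return "recursive rm of /";

  // chmod: only uppercase -R is recursive; lowercase -r is a mode change.
  const chmodRecursive = flags.some((f) => f === "--recursive" || /^-[A-Za-z]*R/.test(f));
  if (tokens.includes("chmod") && chmodRecursive && targetsRoot) return "recursive chmod of /";

  // mkfs or mkfs.<fstype> aimed at a raw device.
  if (operands.some((t) => /^mkfs(\.\w+)?$/.test(t)) && operands.some((t) => RAW_DEVICE.test(t))) {
    return "mkfs on a raw device";
  }
  // dd writing (of=) to a raw device.
  if (tokens.includes("dd") && operands.some((t) => t.startsWith("of=") && RAW_DEVICE.test(t.slice(3)))) {
    return "dd onto a raw device";
  }
  return null;
}

function basicSafetyVerifierSketch(tool: string, args: Record<string, unknown>): Verdict {
  const command = args["command"]; // assumed argument key for the bash tool
  if (tool !== "bash" || typeof command !== "string") return { kind: "allow" };

  // Check the fork bomb first: it contains separator characters.
  if (FORK_BOMB.test(command)) return { kind: "block", feedback: "fork bomb detected" };

  for (const segment of command.split(/&&|\|\||[;|&]/)) {
    const reason = checkSegment(segment.trim().split(/\s+/).filter(Boolean));
    if (reason !== null) return { kind: "block", feedback: `${reason} is not allowed` };
  }
  return { kind: "allow" };
}
```

Splitting on command separators before tokenizing keeps a command such as `rm -rf ./build && cd /` from being flagged because of the unrelated `/` in the second segment, which matters if the default verifier is meant to block only unambiguous cases.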

Requirements

  • Default OFF → true no-op (no behavior change when the flag is unset).
  • Gate must never throw or hang (fail-open on verifier error/timeout).
  • Extensive unit + adversarial + real e2e tests (no mocked tool calls).
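One way to meet the "never throw or hang" requirement is to wrap every verifier call so that an exception or a timeout resolves to an allow verdict. The sketch below assumes a hypothetical `Verdict` shape and helper name `verifyFailOpen`; the timeout value is left to the caller because the issue does not specify one.

```typescript
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };

// Runs `verify` and falls back to "allow" if it throws, rejects,
// or does not settle within `timeoutMs` (fail-open).
async function verifyFailOpen(
  verify: () => Promise<Verdict>,
  timeoutMs: number,
): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "allow" }), timeoutMs);
  });
  try {
    // Whichever settles first wins; a hung verifier loses to the timer.
    return await Promise.race([verify(), timeout]);
  } catch {
    // Verifier error: allow the call rather than break the tool path.
    return { kind: "allow" };
  } finally {
    // Clear the timer so it does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Failing open trades safety for availability: a broken verifier silently stops blocking, which fits a best-effort net but would be the wrong default for a real security boundary.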

Split out of #857 (inference-stack PR #858), where the gate was previously an unwired no-op flag.
