Problem
There is no pre-execution check on side-effecting tool calls (bash/write/edit/sql_execute/dbt_run/patch). A model can propose a catastrophic action (e.g. rm -rf /) and it runs immediately.
Proposal
Add a flag-gated (ALTIMATE_CRITIC_GATE, default OFF) pre-execution "critic gate" wired into the native-tool execute path. Before a side-effecting tool runs, a pluggable Verifier checks the proposed args. On a hard (blocking) verdict the call is skipped and a feedback string is returned in place of the tool's result, so the model can fix and retry.
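A minimal TypeScript sketch of the gate flow, under stated assumptions. The issue names a pluggable Verifier but not its signature, so `Verdict`, `Verifier.verify`, `Execute` and `gatedExecute` are hypothetical names. The flag is passed in as a boolean instead of being read from the environment. The try/catch reflects the fail-open requirement.

```typescript
// Hypothetical verdict shape: the issue only says a "hard" verdict skips the call.
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };

// Hypothetical Verifier signature (the issue names the interface, not its methods).
interface Verifier {
  verify(tool: string, args: Record<string, unknown>): Promise<Verdict>;
}

type Execute = (args: Record<string, unknown>) => Promise<string>;

// The side-effecting tools listed in the issue; every other tool bypasses the gate.
const SIDE_EFFECTING = new Set(["bash", "write", "edit", "sql_execute", "dbt_run", "patch"]);

// Hypothetical wrapper around a native tool's execute function.
async function gatedExecute(
  tool: string,
  args: Record<string, unknown>,
  execute: Execute,
  verifier: Verifier,
  enabled: boolean,
): Promise<string> {
  // Flag off, or a tool without side effects: call straight through (true no-op).
  if (!enabled || !SIDE_EFFECTING.has(tool)) return execute(args);

  let verdict: Verdict;
  try {
    verdict = await verifier.verify(tool, args);
  } catch {
    // Fail open: a broken verifier must never block or crash the tool call.
    verdict = { kind: "allow" };
  }

  if (verdict.kind === "block") {
    // Skip execution and hand the feedback back as the tool result,
    // so the model can revise its arguments and retry.
    return `[critic gate] call skipped: ${verdict.feedback}`;
  }
  return execute(args);
}
```

Returning the feedback as an ordinary tool result, not throwing, keeps the model's turn alive. The model then sees why the call was skipped.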
Ship a conservative, dependency-free default verifier (basicSafetyVerifier) that blocks only catastrophic, unambiguous host-destructive bash (rm -rf / and variants, fork bombs, mkfs/dd on a raw device, recursive chmod of /). This is a best-effort safety net, not a security boundary; a product can inject a richer verifier via the Verifier interface.
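One way a conservative, dependency-free verifier along these lines could look. The issue does not give basicSafetyVerifier's actual rules, so the following are all assumptions: the regexes, the raw-device name list, the naive tokenizer, and this object's shape. The sketch only illustrates the "block the unambiguous cases, allow everything else" approach.

```typescript
// Hypothetical verdict shape (same assumption as the gate sketch).
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };

// Assumed list of raw block devices; deliberately excludes /dev/null, /dev/zero, etc.
const RAW_DEVICE = /^\/dev\/(sd[a-z]|hd[a-z]|vd[a-z]|xvd[a-z]|nvme\d|mmcblk\d|r?disk\d)/;

// Classic fork bomb f(){ f|f& } with any function name, ":" included.
const FORK_BOMB = /([A-Za-z_:][\w:]*)\s*\(\)\s*\{\s*\1\s*\|\s*\1\s*&\s*\}/;

const ROOT_TARGETS = new Set(["/", "/*"]);

// Split on common shell separators, then whitespace; strip simple quotes and a leading sudo.
// Naive on purpose: no subshells, variables or escapes (best effort, not a security boundary).
function segments(command: string): string[][] {
  return command
    .split(/&&|\|\||[;|\n]/)
    .map((seg) => seg.trim().split(/\s+/).filter(Boolean).map((t) => t.replace(/^['"]|['"]$/g, "")))
    .map((tokens) => (tokens[0] === "sudo" ? tokens.slice(1) : tokens))
    .filter((tokens) => tokens.length > 0);
}

// True if a short-flag cluster (e.g. "-rf") contains one of `short`, or `long` appears.
function hasFlag(tokens: string[], short: string[], long: string): boolean {
  return tokens.some((t) => t === long || (/^-[A-Za-z]+$/.test(t) && short.some((c) => t.includes(c))));
}

// Returns a reason string if one command segment is catastrophic, else null.
function checkSegment(tokens: string[]): string | null {
  const cmd = tokens[0] ?? "";
  const rest = tokens.slice(1);
  const hitsRoot = rest.some((t) => ROOT_TARGETS.has(t));
  if (cmd === "rm" && hitsRoot && hasFlag(rest, ["r", "R"], "--recursive")) return "recursive rm of /";
  if (cmd === "chmod" && hitsRoot && hasFlag(rest, ["R"], "--recursive")) return "recursive chmod of /";
  if (/^mkfs(\.\w+)?$/.test(cmd) && rest.some((t) => RAW_DEVICE.test(t))) return "mkfs on a raw device";
  if (cmd === "dd" && rest.some((t) => t.startsWith("of=") && RAW_DEVICE.test(t.slice(3))))
    return "dd onto a raw device";
  return null;
}

// Hypothetical stand-in for basicSafetyVerifier: inspects only bash commands.
const basicSafetyVerifier = {
  async verify(tool: string, args: Record<string, unknown>): Promise<Verdict> {
    const command = args.command;
    if (tool !== "bash" || typeof command !== "string") return { kind: "allow" };
    if (FORK_BOMB.test(command)) return { kind: "block", feedback: "refusing fork bomb" };
    for (const tokens of segments(command)) {
      const reason = checkSegment(tokens);
      if (reason) return { kind: "block", feedback: `refusing ${reason}: ${command}` };
    }
    return { kind: "allow" };
  },
};
```

Anything the rules do not recognize is allowed, which keeps false positives low. It also means obfuscated variants such as `$(echo rm) -rf /` slip through. That is consistent with treating this as a safety net, not a boundary.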
Requirements
- Default OFF → true no-op (no behavior change when the flag is unset).
- Gate must never throw or hang (fail-open on verifier error/timeout).
- Extensive unit + adversarial + real e2e tests (no mocked tool calls).
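The first two requirements (default OFF, never throw or hang) can be sketched as a flag check plus a fail-open timeout wrapper. The following are all assumptions, not taken from the implementation: which flag values count as ON, the timeout value, and the names `criticGateEnabled` and `verifyFailOpen`.

```typescript
// Hypothetical verdict shape (same assumption as the gate sketch).
type Verdict = { kind: "allow" } | { kind: "block"; feedback: string };

// Assumption: the flag is ON only for "1" or "true"; unset or anything else is OFF.
function criticGateEnabled(env: Record<string, string | undefined>): boolean {
  const v = env.ALTIMATE_CRITIC_GATE?.trim().toLowerCase();
  return v === "1" || v === "true";
}

// Runs a verifier but never throws and never waits longer than timeoutMs:
// errors (sync or async) and timeouts both resolve to "allow" (fail open).
async function verifyFailOpen(run: () => Promise<Verdict>, timeoutMs: number): Promise<Verdict> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<Verdict>((resolve) => {
    timer = setTimeout(() => resolve({ kind: "allow" }), timeoutMs);
  });
  try {
    return await Promise.race([run(), timeout]);
  } catch {
    return { kind: "allow" };
  } finally {
    // Don't leave the timer pending once the verifier has answered.
    clearTimeout(timer);
  }
}
```

In a Node process the flag check would be called as `criticGateEnabled(process.env)`. `Promise.race` also subscribes to a verifier promise that rejects after the timeout has already won, so a late failure does not surface as an unhandled rejection.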
Split out of #857 (inference-stack PR #858), where the gate was previously an unwired no-op flag.