fix(bash-policy): allow pwd and ls / ls <path> to cut top-deny pattern#798
Merged
Conversation
…ttern ## Dashboard signal (May 8-15, 2026) - `wizard cli: error encountered` with `error category=Bash Policy`: 285 events / 14 days, peak ~80/day. Dominant failure mode. - Deny-reason breakdown: - 121 (43%) `disallowed pipe` - 105 (37%) `not in allowlist` <-- this PR addresses this slice - 29 (10%) `multiple pipes` - 20 (7%) `dangerous operators` - `wizard cli: bash deny circuit breaker tripped` last-command samples (7d): 5 of 10 are bare `ls "<path>"`, plus `find` variants. The circuit breaker halts the run after 5 consecutive denies, which then feeds the post-agent leak (78% completed -> outro reached). ## The fix Add a narrowly bounded read-only inspection allowlist (`pwd`, `ls`, `ls <single-path>`) to `wizardCanUseTool`. Placed AFTER every existing deny rule so: - shell metacharacters (`; \` $ ( )`) -- already denied above - pipes / `&` background -- already denied above - multi-pipe / redirect-to-file -- already denied above The new check only allows the exact shapes: pwd ls ls /bare/path ls "/quoted/multi-word path" ls '/single-quoted/path' `find` is deliberately excluded (`-exec` is arbitrary code exec). `cd` is deliberately excluded (stateful builtin enables chained `cd && <cmd>` bypass attempts). `ls` flags are rejected to keep the attack surface zero. ## Expected lift `not in allowlist` is 37% of Bash Policy denies (~30/day at peak). Eliminating the `ls`/`pwd` slice (estimated ~50-70% of that bucket from the circuit-breaker last-command samples) should cut total Bash Policy denies by ~15-25%, which proportionally reduces circuit- breaker trips (currently ~1/day terminating a run). At ~27 runs/day peak, recovering even half those terminated runs is a ~2% activation lift -- on top of the silent benefit of fewer wasted agent turns on deny-retry loops. ## Verification - `pnpm tsc --noEmit`: clean - `pnpm lint`: clean - `pnpm test`: 4308 / 4308 pass (288 files) - 17 new regression tests covering the allow path and every adjacent attack vector (quote injection, chained `&&`, redirect, `$()`, backticks, second positional, flags, etc.) - `src/utils/wizard-abort.ts`: untouched - `src/ui/tui/`: untouched ## Deferred (top-3 ROI follow-ups) 1. Pipe-to-tail/head extension to cover `ls`/`pwd` (would catch the remaining `ls | head` / `find | head -50` samples). Risk: medium, needs careful redirect normalization tests. 2. Post-agent outro funnel investigation (`agent completed` -> `outro reached` = 78% conversion, 22% leak). Needs UX track buy-in; blocked by design-kit redesign. 3. Active-phase stall investigation (90/7d, all in `active` phase after assistant/system messages). Likely upstream Anthropic/Vertex network, not a wizard bug. Needs network-trace tooling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Dashboard signal (May 8 → May 15, 2026)
From the wizard's self-reported Amplitude project (
Amplitude 2.0, appId 187520):wizard cli: error encounteredwitherror category=Bash Policywizard cli: bash deny circuit breaker trippedlast-command samples (7d)ls "<path>"; remaining are `find ...agent completed→outro reached)activephase, last-message-type assistant (49) / system (32)The circuit breaker fires after 5 consecutive Bash denies and terminates the run. So even though only ~6/week trips fire, each one kills an active wizard run and feeds the post-agent leak.
What this PR does
Adds a narrowly bounded read-only inspection allowlist to
wizardCanUseToolinsrc/lib/agent/tool-policy.ts:Placed after every existing deny rule (dangerous operators, pipes,
&, redirects, multi-pipes, command substitution, chaining) so the existing safety net stays in front. The new helperisReadOnlyInspectionCommandalso re-rejects the same metacharacters as defense in depth.Deliberately excluded (each documented in code):
find—-exec/-execdiris arbitrary code executioncd— stateful shell builtin; enablescd && <cmd>bypass attemptscat/grep— already covered byRead/Greptools (and their.envdeny path)lsflags — every flag widens the surface; kept at zeroThe denial message at the bottom of the bash hook now also lists
pwd/lsso future runs learn the allow path.Expected lift
not in allowlistis 37% of Bash Policy denies (~30/day at peak).last commandsamples, ~50–70% of that bucket isls/pwd.Test plan
pnpm tsc --noEmit— cleanpnpm lint— cleanpnpm test— 4308 / 4308 pass (288 files)src/lib/__tests__/agent-interface.test.tscovering happy path (barepwd, barels,ls /bare/path, double-quoted and single-quoted multi-word paths) and attack vectors (flags, two positionals, pipe-to-head, command substitution, backticks,&&chaining, redirect-to-file, quote-injection-with-semicolon,find,cd,pwd -P, empty-string quoted arg).src/utils/wizard-abort.ts— untouchedsrc/ui/tui/— untouched (design-kit track owns UX changes)Deferred — top-3 ROI follow-ups
tail/headto coverls/pwd— would catch the remainingls /Users/foo 2>/dev/null | head -30samples (medium effort, needs careful normalization tests).agent completed→outro reached= 78%; 22% leak). Likely needs UX track buy-in — blocked behind design-kit redesign.activephase after assistant/system messages). Likely upstream Anthropic/Vertex network behaviour, not a wizard bug — needs network-trace tooling to confirm.🤖 Generated with Claude Code
Note
Medium Risk
Modifies the wizard’s Bash execution allowlist, which is security-sensitive, but the new allow path is narrowly constrained (
pwdandlswith at most one path argument) and is exercised by comprehensive regression tests.Overview
Reduces unnecessary wizard run failures by allowing a strictly bounded set of read-only Bash inspection commands:
pwd,ls, andls <single-path>(supports quoted multi-word paths, no flags). The Bash policy continues to reject dangerous operators, pipes, redirects, chaining, substitutions, etc., and updates the deny message to document the new allowed commands.Adds a focused regression test suite covering the allowed
pwd/lsshapes and common bypass attempts (flags, multiple args, pipes, substitutions, redirects,&&, quote injection, and explicitly denyingfind/cd).Reviewed by Cursor Bugbot for commit 2e9e611. Bugbot is set up for automated code reviews on this repo. Configure here.