Skip to content

fix(bash-policy): allow pwd and ls / ls <path> to cut top-deny pattern#798

Merged
kelsonpw merged 1 commit into
mainfrom
fix/bash-policy-readonly-allowlist
May 15, 2026
Merged

fix(bash-policy): allow pwd and ls / ls <path> to cut top-deny pattern#798
kelsonpw merged 1 commit into
mainfrom
fix/bash-policy-readonly-allowlist

Conversation

@kelsonpw
Copy link
Copy Markdown
Member

@kelsonpw kelsonpw commented May 15, 2026

Dashboard signal (May 8 → May 15, 2026)

From the wizard's self-reported Amplitude project (Amplitude 2.0, appId 187520):

Metric Value
wizard cli: error encountered with error category=Bash Policy 285 events / 14 days (peak ~80/day, 80% of failures)
Deny-reason breakdown 43% disallowed-pipe / 37% not-in-allowlist / 10% multi-pipe / 7% dangerous-ops
wizard cli: bash deny circuit breaker tripped last-command samples (7d) 5 of 10 are bare ls "<path>"; remaining are `find ...
Post-agent funnel (agent completedoutro reached) 78% conversion (22% leak — distinct, follow-up item)
Active-phase stalls 90 / 7d, all in active phase, last-message-type assistant (49) / system (32)

The circuit breaker fires after 5 consecutive Bash denies and terminates the run. So even though only ~6/week trips fire, each one kills an active wizard run and feeds the post-agent leak.

What this PR does

Adds a narrowly bounded read-only inspection allowlist to wizardCanUseTool in src/lib/agent/tool-policy.ts:

pwd
ls
ls <bare-path>
ls "<quoted multi-word path>"
ls '<single-quoted path>'

Placed after every existing deny rule (dangerous operators, pipes, &, redirects, multi-pipes, command substitution, chaining) so the existing safety net stays in front. The new helper isReadOnlyInspectionCommand also re-rejects the same metacharacters as defense in depth.

Deliberately excluded (each documented in code):

  • find-exec/-execdir is arbitrary code execution
  • cd — stateful shell builtin; enables cd && <cmd> bypass attempts
  • cat / grep — already covered by Read / Grep tools (and their .env deny path)
  • ls flags — every flag widens the surface; kept at zero

The denial message at the bottom of the bash hook now also lists pwd / ls so future runs learn the allow path.

Expected lift

  • not in allowlist is 37% of Bash Policy denies (~30/day at peak).
  • From the circuit-breaker last command samples, ~50–70% of that bucket is ls/pwd.
  • Expected reduction in total Bash Policy denies: ~15–25%.
  • Proportional reduction in circuit-breaker trips → recovering ~half of the ~1/day terminated runs → ~2% activation lift at ~27 runs/day peak, on top of fewer wasted agent turns / faster runs for everyone else.

Test plan

  • pnpm tsc --noEmit — clean
  • pnpm lint — clean
  • pnpm test4308 / 4308 pass (288 files)
  • 17 new regression tests in src/lib/__tests__/agent-interface.test.ts covering happy path (bare pwd, bare ls, ls /bare/path, double-quoted and single-quoted multi-word paths) and attack vectors (flags, two positionals, pipe-to-head, command substitution, backticks, && chaining, redirect-to-file, quote-injection-with-semicolon, find, cd, pwd -P, empty-string quoted arg).
  • src/utils/wizard-abort.ts — untouched
  • src/ui/tui/ — untouched (design-kit track owns UX changes)

Deferred — top-3 ROI follow-ups

  1. Extend pipe-to-tail/head to cover ls/pwd — would catch the remaining ls /Users/foo 2>/dev/null | head -30 samples (medium effort, needs careful normalization tests).
  2. Post-agent outro funnel investigation (agent completedoutro reached = 78%; 22% leak). Likely needs UX track buy-in — blocked behind design-kit redesign.
  3. Active-phase stall investigation (90 stalls / 7d, all in active phase after assistant/system messages). Likely upstream Anthropic/Vertex network behaviour, not a wizard bug — needs network-trace tooling to confirm.

🤖 Generated with Claude Code


Note

Medium Risk
Modifies the wizard’s Bash execution allowlist, which is security-sensitive, but the new allow path is narrowly constrained (pwd and ls with at most one path argument) and is exercised by comprehensive regression tests.

Overview
Reduces unnecessary wizard run failures by allowing a strictly bounded set of read-only Bash inspection commands: pwd, ls, and ls <single-path> (supports quoted multi-word paths, no flags). The Bash policy continues to reject dangerous operators, pipes, redirects, chaining, substitutions, etc., and updates the deny message to document the new allowed commands.

Adds a focused regression test suite covering the allowed pwd/ls shapes and common bypass attempts (flags, multiple args, pipes, substitutions, redirects, &&, quote injection, and explicitly denying find/cd).

Reviewed by Cursor Bugbot for commit 2e9e611. Bugbot is set up for automated code reviews on this repo. Configure here.

…ttern

## Dashboard signal (May 8-15, 2026)

- `wizard cli: error encountered` with `error category=Bash Policy`:
  285 events / 14 days, peak ~80/day. Dominant failure mode.
- Deny-reason breakdown:
  - 121 (43%) `disallowed pipe`
  - 105 (37%) `not in allowlist`     <-- this PR addresses this slice
  -  29 (10%) `multiple pipes`
  -  20  (7%) `dangerous operators`
- `wizard cli: bash deny circuit breaker tripped` last-command samples
  (7d): 5 of 10 are bare `ls "<path>"`, plus `find` variants. The
  circuit breaker halts the run after 5 consecutive denies, which then
  feeds the post-agent leak (78% completed -> outro reached).

## The fix

Add a narrowly bounded read-only inspection allowlist (`pwd`, `ls`,
`ls <single-path>`) to `wizardCanUseTool`. Placed AFTER every existing
deny rule so:

- shell metacharacters (`; \` $ ( )`)  -- already denied above
- pipes / `&` background                -- already denied above
- multi-pipe / redirect-to-file         -- already denied above

The new check only allows the exact shapes:

  pwd
  ls
  ls /bare/path
  ls "/quoted/multi-word path"
  ls '/single-quoted/path'

`find` is deliberately excluded (`-exec` is arbitrary code exec).
`cd` is deliberately excluded (stateful builtin enables chained
`cd && <cmd>` bypass attempts). `ls` flags are rejected to keep the
attack surface zero.

## Expected lift

`not in allowlist` is 37% of Bash Policy denies (~30/day at peak).
Eliminating the `ls`/`pwd` slice (estimated ~50-70% of that bucket
from the circuit-breaker last-command samples) should cut total Bash
Policy denies by ~15-25%, which proportionally reduces circuit-
breaker trips (currently ~1/day terminating a run). At ~27 runs/day
peak, recovering even half those terminated runs is a ~2% activation
lift -- on top of the silent benefit of fewer wasted agent turns on
deny-retry loops.

## Verification

- `pnpm tsc --noEmit`: clean
- `pnpm lint`: clean
- `pnpm test`: 4308 / 4308 pass (288 files)
- 17 new regression tests covering the allow path and every adjacent
  attack vector (quote injection, chained `&&`, redirect, `$()`,
  backticks, second positional, flags, etc.)
- `src/utils/wizard-abort.ts`: untouched
- `src/ui/tui/`: untouched

## Deferred (top-3 ROI follow-ups)

1. Pipe-to-tail/head extension to cover `ls`/`pwd` (would catch the
   remaining `ls | head` / `find | head -50` samples). Risk: medium,
   needs careful redirect normalization tests.
2. Post-agent outro funnel investigation (`agent completed` ->
   `outro reached` = 78% conversion, 22% leak). Needs UX track buy-in;
   blocked by design-kit redesign.
3. Active-phase stall investigation (90/7d, all in `active` phase
   after assistant/system messages). Likely upstream Anthropic/Vertex
   network, not a wizard bug. Needs network-trace tooling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kelsonpw kelsonpw requested a review from a team as a code owner May 15, 2026 22:26
@kelsonpw kelsonpw merged commit 4e0ae9f into main May 15, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant