perf(agent): tighten commandments + framework prompts for scale and quality#799
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Commandment falsely claims
--force-with-leaseis scanner-blocked- Removed
/--force-with-lease`` from the commandment's blocked list since the safety scanner intentionally allows it via a negative lookahead.
- Removed
- ✅ Fixed: Test misses three safety-scanner rules it claims to cover
- Added the three missing scanner shapes (
git checkout .,git restore,git clean -f) and split the test arrays intoscannerShapesandallowlistShapeswith an accurate comment.
- Added the three missing scanner shapes (
Or push these changes by commenting:
@cursor push 7da65a036c
Preview (7da65a036c)
diff --git a/src/lib/__tests__/commandments.test.ts b/src/lib/__tests__/commandments.test.ts
--- a/src/lib/__tests__/commandments.test.ts
+++ b/src/lib/__tests__/commandments.test.ts
@@ -615,18 +615,20 @@
it('pre-empts destructive bash commands by name', () => {
// The agent should learn about these from the prompt, not by tripping
- // safety-scanner.ts and burning a retry cycle. Each pattern matches a
- // rule in `src/lib/safety-scanner.ts`.
- const blockedShapes = [
+ // safety-scanner.ts and burning a retry cycle. Each pattern in
+ // `scannerShapes` matches a rule in `src/lib/safety-scanner.ts`;
+ // `allowlistShapes` are blocked by the bash allowlist, not the scanner.
+ const scannerShapes = [
'rm -rf',
'git reset --hard',
'git push --force',
+ 'git checkout .',
+ 'git restore',
+ 'git clean -f',
'curl ... | sh',
- 'install -g',
- 'publish',
- 'sudo',
];
- for (const shape of blockedShapes) {
+ const allowlistShapes = ['install -g', 'publish', 'sudo'];
+ for (const shape of [...scannerShapes, ...allowlistShapes]) {
expect(
text,
`commandments should pre-empt "${shape}" so the agent never burns a retry discovering it's blocked.`,
diff --git a/src/lib/commandments.ts b/src/lib/commandments.ts
--- a/src/lib/commandments.ts
+++ b/src/lib/commandments.ts
@@ -71,7 +71,7 @@
// Motivated by Bash Policy denies at ~80/day peak. These commands are
// hard-blocked by safety-scanner.ts; pre-empting saves a retry cycle.
- 'NEVER attempt these destructive bash commands — they are pre-blocked by the safety scanner and no rephrasing changes the outcome: `rm -rf` (any form), `git reset --hard`, `git push --force` / `--force-with-lease`, `git checkout .` / broad `git restore`, `git clean -f`, `curl ... | sh` / `wget ... | bash` (any pipe-to-shell), `npm install -g` / `pnpm add -g` / `yarn global add`, `npm publish` / `pnpm publish` / `yarn publish`, `sudo` (anything). The wizard never needs these. If a workflow seems to require one, the workflow itself is wrong — note it in the setup report and proceed without.',
+ 'NEVER attempt these destructive bash commands — they are pre-blocked by the safety scanner and no rephrasing changes the outcome: `rm -rf` (any form), `git reset --hard`, `git push --force`, `git checkout .` / broad `git restore`, `git clean -f`, `curl ... | sh` / `wget ... | bash` (any pipe-to-shell), `npm install -g` / `pnpm add -g` / `yarn global add`, `npm publish` / `pnpm publish` / `yarn publish`, `sudo` (anything). The wizard never needs these. If a workflow seems to require one, the workflow itself is wrong — note it in the setup report and proceed without.',
'When a wizard tool returns a structured error payload (`{"success": false, "error": ..., "guidance": ..., "suggestedTool": ..., "suggestedArgs": ..., "context": ...}`), READ the `guidance` field and follow it. If `suggestedTool` / `suggestedArgs` are present, call THAT tool with THOSE args next — do NOT retry the failing tool with the same args. The same shape comes back for PreToolUse denials (Bash policy, denied paths, denied event-plan / dashboard writes). Treating structured errors as recovery instructions is the difference between a 1-turn fix and a 5-turn loop that trips the consecutive-deny circuit breaker.',You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit 957305b. Configure here.
|
|
||
| // Motivated by Bash Policy denies at ~80/day peak. These commands are | ||
| // hard-blocked by safety-scanner.ts; pre-empting saves a retry cycle. | ||
| 'NEVER attempt these destructive bash commands — they are pre-blocked by the safety scanner and no rephrasing changes the outcome: `rm -rf` (any form), `git reset --hard`, `git push --force` / `--force-with-lease`, `git checkout .` / broad `git restore`, `git clean -f`, `curl ... | sh` / `wget ... | bash` (any pipe-to-shell), `npm install -g` / `pnpm add -g` / `yarn global add`, `npm publish` / `pnpm publish` / `yarn publish`, `sudo` (anything). The wizard never needs these. If a workflow seems to require one, the workflow itself is wrong — note it in the setup report and proceed without.', |
There was a problem hiding this comment.
Commandment falsely claims --force-with-lease is scanner-blocked
Medium Severity
The destructive-bash commandment lists `git push --force` / `--force-with-lease` as "pre-blocked by the safety scanner and no rephrasing changes the outcome." However, safety-scanner.ts explicitly allows --force-with-lease via a (?!-with-lease) negative lookahead, and the scanner's own deny message suggests it as a safer alternative. This creates a bidirectional contradiction: the prompt claims something is hard-blocked that is in fact intentionally permitted, and if the agent ever encounters the scanner's suggestion to use --force-with-lease, the prompt has already told it that's also impossible. The / separator in the listing unambiguously reads as "both variants are blocked," matching the pattern used elsewhere in the same sentence.
Reviewed by Cursor Bugbot for commit 957305b. Configure here.
| text, | ||
| `commandments should pre-empt "${shape}" so the agent never burns a retry discovering it's blocked.`, | ||
| ).toContain(shape); | ||
| } |
There was a problem hiding this comment.
Test misses three safety-scanner rules it claims to cover
Low Severity
The test comment says "Each pattern matches a rule in src/lib/safety-scanner.ts" but the blockedShapes array omits three actual safety-scanner rules — git checkout (broad), git restore (broad), and git clean -f — while including three shapes (install -g, publish, sudo) that have no corresponding safety-scanner rule (they're blocked by a separate bash allowlist). Future edits removing those scanner-matched shapes from the commandments won't be caught by this test.
Reviewed by Cursor Bugbot for commit 957305b. Configure here.
…uality Three load-bearing rules added to the wizard's always-on commandments, plus a contradiction fix in the supplement skill that was causing agents to attempt dashboard work that the universal rule forbids. Motivating dashboard signals: 1. Strategy retry cap (3-approach ceiling) Closes the ~5pt completion→activation gap. Agents finish nominally but have looped on the same broken approach. Caps strategic retry at three attempts per goal and forces escalation into the setup report's "Known limitations" section instead of silent looping. 2. Destructive bash pre-emption Bash Policy denies peaked ~80/day. The safety scanner already blocks `rm -rf`, `git reset --hard`, `git push --force`, `curl … | sh`, `install -g`, `publish`, and `sudo` — but the prompt never said so, so agents discovered the deny rule by hitting it. Pre-emption saves ~one full retry cycle per occurrence. 3. Monorepo scope clamp Large monorepos saw cross-package edits the user never asked for. The rule restricts default work to the install-dir subtree, and forces a `wizard_feedback` confirmation when the install dir is itself a workspace root with ambiguous intent. Also (deduplication / contradiction fix): 4. Supplement skill no longer instructs agents to create dashboards The `post-instrumentation-events-and-dashboard.md` reference still said "create 4–6 charts and a dashboard via the Amplitude MCP, then call `record_dashboard`" — directly contradicting the universal commandment forbidding chart/dashboard tools in this run (DEFER _DASHBOARD_PLAN PR 4). Reconciled to match. 5. Setup Report requirements now demand a structured Files Changed table with +/- line counts and an Events table with file + line metadata, raising receipt quality. Token delta (assembled commandments, per-turn cached): Universal: 4972 → 5405 tokens (+433, +8.7%) Browser: 6160 → 6593 tokens (+433, +7.0%) Each added rule trades ~150 cached tokens for one avoided retry/loop cycle (1500–5000 tokens). Net positive at expected occurrence rates. Regression test: `scale + safety guardrails commandments` in `src/lib/__tests__/commandments.test.ts` pins all three new sentinels (strategy cap, destructive-bash list, monorepo scope language). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
957305b to
c4f6e6f
Compare
|
Could not push Autofix changes. The PR branch may have changed since the Autofix ran, or the Autofix commit may no longer exist. |



Summary
Three load-bearing rules added to the wizard's always-on commandments, plus a contradiction fix in the supplement skill that was causing agents to attempt dashboard work the universal rule forbids.
Changes shipped (ranked by ROI)
1. Strategy retry cap (3-approach ceiling)
Motivating signal: ~5pt completion→activation drop. Agents finish nominally but have looped on the same broken approach. The existing "retry budget 1, after 2 failures STOP" rule covers tactical retries; this caps strategic retry at 3 distinct approaches per goal and forces escalation into the setup report's "Known limitations" section.
2. Destructive bash pre-emption
Motivating signal: Bash Policy denies at ~80/day peak.
src/lib/safety-scanner.tsalready blocksrm -rf,git reset --hard,git push --force,curl … | sh,install -g,publish, andsudo— but the prompt never said so. Agents discovered the deny rule by hitting it, paying a full retry cycle each time. Pre-emption names every blocked shape upfront.3. Monorepo scope clamp
Motivating signal: Large-monorepo failure mode where 1-event runs turned into 30-file blast-radius PRs. The rule restricts default work to the install-dir subtree and forces a
wizard_feedbackconfirmation hop when the install dir is itself a workspace root with ambiguous intent (single package vs whole workspace).4. Supplement skill contradiction fix (deduplication)
post-instrumentation-events-and-dashboard.mdstill said "create 4–6 charts and a dashboard via the Amplitude MCP, then callrecord_dashboard" — directly contradicting the universal commandment forbidding chart/dashboard tools in this run (DEFER_DASHBOARD_PLAN PR 4). Reconciled.5. Setup Report receipt quality
setup-report-requirements.mdnow demands a structured Files Changed table with +/- line counts and an Events table with file + line metadata, lifting the receipts ledger above narrative prose.Token delta
Assembled commandments (per-turn, prompt-cached):
Each added rule trades ~150 cached tokens for one avoided retry/loop cycle (1500–5000 uncached tokens per loop). Net positive at expected occurrence rates — the Bash Policy denies alone (~80/day × ~3000 tokens/avoidance) dwarf the per-turn overhead.
Regression test
scale + safety guardrails commandmentsinsrc/lib/__tests__/commandments.test.tspins:Known limitationsescalation sentinelsafety-scanner.ts)wizard_feedbackescalation hop3 new tests, 38 total in the file (was 35). All 4294 tests pass on the branch.
Top 3 deferred improvements
src/frameworks/generic/generic-wizard-agent.ts(11KB) inline prompt. Heavy overlap with both commandments and the browser-sdk-init-defaults supplement reference. Would save ~1KB on the generic-fallback path but needs a careful pass to preserve the CSP / Netlify-redirect guidance specific to static sites.wizard-tools.tstool descriptions. Several descriptions duplicate phrasing the commandments already established (e.g.check_env_keysvs the universal "never use Bash to verify env vars" rule). Light gain (~100 tokens), high test coverage churn.Test plan
pnpm tsc --noEmitcleanpnpm lintcleanpnpm test(4294 tests, all passing)src/utils/wizard-abort.tsuntouchedsrc/ui/tui/untouched/tmp/prompt-auditwill be removed after PR open--agentmode dry-run on a sample Next.js repoBash Policy deniesrate drops in production telemetry over the week after merge🤖 Generated with Claude Code
Note
Medium Risk
Medium risk because it changes the wizard’s core prompt/guardrails and adds strict test sentinels, which can alter agent behavior across all runs and cause brittle test failures on copy edits.
Overview
Tightens the wizard’s universal commandment prompt with three new guardrails: a 3-approach strategy retry cap that forces escalation to a setup report Known limitations section, explicit pre-emption of destructive bash command shapes, and a monorepo scope clamp that defaults work to the install-directory subtree and requires
wizard_feedbackbefore cross-workspace edits.Updates the prompt supplement to clarify
.amplitude/events.jsonownership/shape (optionally adding a per-eventfilepointer) and to defer all chart/dashboard work to the separateamplitude-wizard dashboardflow, including removing dashboard link expectations from the setup report.Strengthens setup report requirements by mandating receipts-style tables (files changed with +/- line counts; per-
track()call event table with file+line+properties), and adds regression tests that pin the new guardrail language to prevent drift.Reviewed by Cursor Bugbot for commit c4f6e6f. Bugbot is set up for automated code reviews on this repo. Configure here.