chore(agents): trim 6 largest agent prompts, 2925 → 2525 lines (−13.7%) #178
Merged
Second-order token-economy optimization stacking on PR #175 (which reduced spawn FREQUENCY via gate-moment routing). This PR reduces per-spawn SIZE by trimming the genuine bloat from the 6 largest agent prompts.

Per-file deltas:
- incident-commander 218 → 126 (−92, −42%)
- test-engineer 189 → 113 (−76, −40%)
- stock-detail-auditor 177 → 120 (−57, −32%)
- docs-reviewer 180 → 125 (−55, −31%)
- release-captain 211 → 145 (−66, −31%)
- security-reviewer 185 → 131 (−54, −29%)

Total across the 15-agent set: 2925 → 2525 lines (−400, −13.7%).

What got cut (genuine bloat, NOT substance):
- "Read these first" prose lists enumerating obvious files → compressed to one sentence pointing at CLAUDE.md / SKILL.md / AGENTS.md anchors
- incident-commander's Step 6 post-mortem template (full schema) → defer to the `9arm-post-mortem` skill, keep only the skeleton-emit instruction
- test-engineer's "Project test conventions (memorize)" section duplicating AGENTS.md §Testing → "see AGENTS.md §Testing"
- release-captain's "Recent releases" subsection (stale, lives in PHASE_STATUS.md) → removed
- release-captain's Step 6 user-checklist (long prose) → one-line summary pointing at the `release-tag` skill
- security-reviewer's Section A-H verbose intros → compressed per-Section header + grep command + criteria
- docs-reviewer's Step 2 "Substance check (per doc)" 6 prose blocks → single matrix table
- stock-detail-auditor's verbose Step 1 recon prose + escalation-path narrative → kept the matrix, removed wrapper text

What's PRESERVED in every trimmed agent:
- YAML frontmatter (name, description, tools, model), verbatim; no description shortening (the description is the auto-routing trigger and must stay rich for matcher quality)
- Workflow steps with concrete commands
- Hard constraints / "What you do NOT do" section
- Output format template (compressed but complete)
- Escalation tables to other agents
- Tools list (no tool removed from any agent)

Companion artifacts in the same session:
- Issue #176: STZ market_cap = null on the 2026-05-14 cron (XBRL fact extraction missing `shares_outstanding`, related to issue #10)
- Issue #177: 15 tickers with |mos_pct| > 500% on the 2026-05-14 cron (fair-price ensemble producing extreme estimates on growth/goodwill-heavy stocks: APP, AXON, CASY, CIEN, DD, DDOG, GE, HWM, ...)

Both surfaced by the deterministic prefilter in `stock-detail-auditor` during the PR #175 dry-run. Filed as separate issues; this PR intentionally does NOT include compute / valuation fixes and focuses purely on the agent-prompt-size lever.

Doc lockstep: CLAUDE.md §Phase status gains a "Trim agent prompts in flight (this PR)" entry. AGENTS.md §Claude-Code-specific tooling gains a paragraph on the keep-it-tight principle for future agent additions (trim target when an agent grows past ~150 lines).

No compute / schema / scoring / valuation / frontend code change.
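The keep-it-tight principle and the "What's PRESERVED" list can be checked mechanically. The sketch below is a hypothetical lint, not part of this PR: it assumes each agent prompt opens with `---`-delimited YAML frontmatter, and flags a missing `name`/`description`/`tools`/`model` key, a `## What you do NOT do` heading that is absent or duplicated, and a prompt longer than the ~150-line trim target. The function name `lint_agent_prompt` and the `.claude/agents/*.md` walk are illustrative.

```python
import re
from pathlib import Path

REQUIRED_KEYS = ("name", "description", "tools", "model")
NOT_DO_HEADING = "## What you do NOT do"
TRIM_TARGET_LINES = 150  # the "~150 lines" keep-it-tight threshold


def lint_agent_prompt(text: str) -> list:
    """Return the problems found in one agent prompt; [] means it passes."""
    # Assumes the frontmatter sits between the first two '---' lines.
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return ["missing YAML frontmatter"]
    keys = {
        line.split(":", 1)[0].strip()
        for line in match.group(1).splitlines()
        if ":" in line
    }
    problems = [f"frontmatter missing `{k}:`" for k in REQUIRED_KEYS if k not in keys]
    # The hard-constraints section must survive a trim exactly once.
    headings = sum(1 for line in text.splitlines() if line.strip() == NOT_DO_HEADING)
    if headings != 1:
        problems.append(f"expected 1 '{NOT_DO_HEADING}' heading, found {headings}")
    n_lines = len(text.splitlines())
    if n_lines > TRIM_TARGET_LINES:
        problems.append(f"{n_lines} lines > ~{TRIM_TARGET_LINES}: trim boilerplate")
    return problems


if __name__ == "__main__":
    for path in sorted(Path(".claude/agents").glob("*.md")):
        if path.name == "README.md":
            continue  # the directory's TOC + coordination flows, not an agent prompt
        for problem in lint_agent_prompt(path.read_text(encoding="utf-8")):
            print(f"{path.name}: {problem}")
```

A check like this only catches structural regressions; whether the trimmed prose still carries enough substance is what the dry-run items in the test plan are for.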
dackclup added a commit that referenced this pull request on May 23, 2026:
…caps to drain Sonnet-only pool (#219)

* chore(agents): reset sonnet sub-agent thoroughness: lift artificial caps to drain Sonnet-only pool

User observation: the "Weekly · Sonnet only" pool on the Max plan sits near 2% utilization while "Weekly · all models" moves normally, meaning sonnet sub-agents are under-used.

Root cause: artificial work-bounding caps I added incidentally during the PR #178 trim pass ("≤ 20 tickers", "terse", "do not exceed N file Reads") were treating sub-agent effort as a cost to minimize, when it's actually the intended use of the separately billed Sonnet-only pool.

Caps lifted:
- `.claude/agents/stock-detail-auditor.md` Step 3: removed the 20-ticker hard cap. The agent now walks every prefilter-flagged ticker, dedups multi-rule hits, and may fetch 1-2 adjacent peers when a multi-ticker pattern is suspected. Added a "DO NOT skip flagged tickers to keep the report short" hard constraint to codify the new principle. Frontmatter `description:` rewritten to remove the "≤ 20 flagged tickers" wording (otherwise kept the project's auto-routing cue intact).
- `.claude/agents/quantrank-reviewer.md` Output format: removed the "Reply terse" instruction. The agent now lists every PASS / FAIL / WARN finding while walking Sections A-H.
- `.claude/agents/README.md` Flow 2 release ladder: release-captain's stock-detail-auditor lane no longer says "≤ 20 LLM verdicts"; it says "thorough LLM verdicts for every flagged ticker". Roster row description updated to match.
Policy update:
- `CLAUDE.md` §Auto-routing policy §Spawn discipline: two new bullets:
  (a) "Don't gatekeep sub-agent effort", explaining the Max-plan dual-pool topology (sonnet sub-agents drain Weekly · Sonnet only; opus + main session drain Weekly · all models) and why bounding sub-agent output wastes paid budget
  (b) "Prefer delegation to sub-agents over inline main-session work": when both options exist, route work through sonnet sub-agents so main-session tokens aren't spent doing what a thorough sonnet agent can do for free against a separate pool
- `AGENTS.md` §Claude-Code-specific tooling: rewrote the "prompts are kept tight" paragraph to make explicit that the trim target is BOILERPLATE (read-these-first lists, material duplicated from the canonical docs), NOT investigation depth or output length. Hard prompt-size constraint ≠ hard work-size cap.

Model assignments unchanged: 4 opus by design (incident-commander + release-captain + methodology-scientist + quantrank-reviewer) + 11 sonnet. A temporary swap of quantrank-reviewer + methodology-scientist to sonnet was tested and reverted in the same PR; it was the wrong interpretation of user intent. The fix is to stop capping work, not to demote models.

Companion follow-up (not in this PR): a per-session usage report post-merge to confirm the Sonnet-only pool actually moves more after this lands. If not, re-investigate whether spawn frequency needs to increase (separate PR).

No compute / schema / scoring / valuation / frontend code change.

* chore(agents): also lift spawn frequency for sonnet sub-agents on non-trivial edits

Stacks on the depth-only fix already in this PR. User follow-up: loosening per-spawn caps alone isn't enough; if sonnet agents don't spawn often enough, the Sonnet-only pool stays idle anyway.
Adds six new edit-trigger rows to CLAUDE.md §Auto-routing policy that fire sonnet agents immediately on a non-trivial edit to their domain (replacing the lean "Edits alone do NOT auto-spawn" rule from PR #175):

| Edit | Spawns (sonnet) |
|---|---|
| Schema triple file | schema-sentinel |
| compute/scoring or compute/valuation | defense-layer-auditor |
| frontend/components or frontend/app | frontend-design-reviewer |
| .github/workflows or new dep or new env-var | security-reviewer |
| Prod code without same-PR test | test-engineer |
| Any of 7 top-level docs | docs-reviewer |

"Non-trivial" is spelled out as: > 5 added lines OR touches non-comment code OR adds/removes a public symbol. Comment / whitespace / single-line fixes do not trigger.

Opus agents (incident-commander · release-captain · methodology-scientist · quantrank-reviewer) keep the rare-fire policy: they bill against the "Weekly · all models" pool, so firing them more often does not help drain the underutilized Sonnet-only pool.

The "ready to push" gate stays as a safety-net re-batch: the opus reviewer + phase-coordinator fire fresh; sonnet agents skip via the existing 10-min dedup window if they already ran on the same diff during the edit-trigger pass. So the worst-case spawn count per PR rises ~2-3× vs the PR #175 baseline, but every extra spawn drains the paid-for but currently idle Sonnet-only pool.

AGENTS.md §Claude-Code-specific tooling: added a paragraph describing the spawn-frequency discipline so cross-tool readers (Copilot / Cursor / Devin) understand the dual-pool topology. The §Phase status entry in CLAUDE.md was updated to describe both lifts (per-spawn cap AND spawn frequency) as one two-part change.

No compute / schema / scoring / valuation / frontend code change.

---------

Co-authored-by: Claude <noreply@anthropic.com>
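The edit-trigger rules in the commit above can be sketched as code. This is a minimal illustration under stated assumptions, not the project's routing: the policy itself is the CLAUDE.md §Auto-routing policy text. `Edit`, `route`, `DedupWindow` and the diff ids are hypothetical names; only the table rows that name concrete directories are encoded; and the "non-trivial" check applies one reading of the definition, in which the comment/whitespace and single-line exclusions override the three OR'd conditions.

```python
from dataclasses import dataclass
from typing import Optional

# Opus agents keep the rare-fire policy and bill against "Weekly · all models";
# every other agent is assumed to be sonnet ("Weekly · Sonnet only").
OPUS_AGENTS = {"incident-commander", "release-captain",
               "methodology-scientist", "quantrank-reviewer"}

# Only the trigger rows that name concrete directories. The schema triple,
# new dep / env-var, "prod code without test" and 7-docs rows need
# project-specific detection and are left out of this sketch.
PREFIX_TRIGGERS = {
    "compute/scoring/": "defense-layer-auditor",
    "compute/valuation/": "defense-layer-auditor",
    "frontend/components/": "frontend-design-reviewer",
    "frontend/app/": "frontend-design-reviewer",
    ".github/workflows/": "security-reviewer",
}
DEDUP_WINDOW_S = 10 * 60  # the existing 10-minute dedup window


def pool_for(agent: str) -> str:
    """Weekly pool an agent's spawn bills against."""
    return "Weekly · all models" if agent in OPUS_AGENTS else "Weekly · Sonnet only"


@dataclass
class Edit:  # hypothetical summary of one file's diff
    path: str
    added_lines: int
    touches_code: bool           # any non-comment, non-whitespace line changed
    changes_public_symbol: bool  # adds or removes a public symbol


def is_non_trivial(edit: Edit) -> bool:
    qualifies = edit.added_lines > 5 or edit.touches_code or edit.changes_public_symbol
    # Our reading: exclusions win, so comment/whitespace-only edits and
    # single-line fixes never trigger even if they match a condition above.
    excluded = not edit.touches_code or edit.added_lines <= 1
    return qualifies and not excluded


def route(edit: Edit) -> Optional[str]:
    """Return the sonnet agent an edit should spawn, or None."""
    if not is_non_trivial(edit):
        return None
    for prefix, agent in PREFIX_TRIGGERS.items():
        if edit.path.startswith(prefix):
            return agent
    return None


class DedupWindow:
    """Skip re-spawning the same agent on the same diff within the window."""

    def __init__(self, window_s: float = DEDUP_WINDOW_S):
        self.window_s = window_s
        self._last_spawn = {}  # (agent, diff_id) -> spawn time in seconds

    def should_spawn(self, agent: str, diff_id: str, now: float) -> bool:
        last = self._last_spawn.get((agent, diff_id))
        if last is not None and now - last < self.window_s:
            return False
        self._last_spawn[(agent, diff_id)] = now
        return True


edit = Edit("compute/valuation/ensemble.py", added_lines=12,
            touches_code=True, changes_public_symbol=False)
agent = route(edit)  # "defense-layer-auditor", billed to "Weekly · Sonnet only"
dedup = DedupWindow()
print(dedup.should_spawn(agent, "diff-1", now=0.0))    # True: edit-trigger pass
print(dedup.should_spawn(agent, "diff-1", now=300.0))  # False: push-gate re-batch skips
```

Keying the window on `(agent, diff_id)` rather than on the agent alone means a new diff always re-spawns, which matches the "already ran on the same diff" wording.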
Why
Second-order token-economy optimization stacking on #175 (which reduced spawn FREQUENCY via gate-moment routing). This PR reduces per-spawn SIZE by trimming bloat from the 6 largest agent prompts.
The two PRs are complementary:
What got cut
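| File | Before | After | Δ lines |
|---|---|---|---|
| `incident-commander.md` | 218 | 126 | −92 (−42%) |
| `test-engineer.md` | 189 | 113 | −76 (−40%) |
| `stock-detail-auditor.md` | 177 | 120 | −57 (−32%) |
| `docs-reviewer.md` | 180 | 125 | −55 (−31%) |
| `release-captain.md` | 211 | 145 | −66 (−31%) |
| `security-reviewer.md` | 185 | 131 | −54 (−29%) |
| Whole `.claude/agents/` directory | 2925 | 2525 | −400 (−13.7%) |

What was cut (genuine bloat, NOT substance)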
- incident-commander's Step 6 post-mortem template (full schema duplicated from the `9arm-post-mortem` skill): keep only the skeleton-emit instruction; defer the full template to the skill
- test-engineer's "Project test conventions (memorize)" section duplicating AGENTS.md §Testing: replaced with "see AGENTS.md §Testing"
- release-captain's "Recent releases" subsection (stale; lives in PHASE_STATUS.md): removed
- release-captain's Step 6 user-checklist (long prose): one-line summary pointing at the `release-tag` skill
- security-reviewer's Section A-H verbose intros: compressed to per-Section header + grep command + criteria bullets
- docs-reviewer's Step 2 6-prose-block "Substance check (per doc)": collapsed to a single matrix table
- stock-detail-auditor's verbose Step 1 recon prose + escalation-path narrative: kept the matrix, removed wrapper text

What's PRESERVED in every trimmed agent
- YAML frontmatter (`name`, `description`, `tools`, `model`), verbatim: the description is the auto-routing trigger; shortening it would degrade matcher quality

Companion issues filed (from the auditor's dry-run)
While building the `stock-detail-auditor` in PR #175, its deterministic prefilter surfaced real data bugs on the 2026-05-14 cron. They are filed as separate issues so they can be picked up independently:

- #176: STZ `market_cap: null`. XBRL fact extraction is missing `shares_outstanding`, `eps_basic`, `eps_diluted`, `research_and_development`. Related to issue #10 (Phase-3 fundamentals: `shares_outstanding` ingested wrong for ~12 tickers), but STZ is a different failure mode: "all None" vs "wrong value". Pickup ladder: `edgar-debugger` → patch `compute/ingest/fundamentals.py` + tests.
- #177: 15 tickers with `|mos_pct| > 500%`: APP / AXON / CASY / CIEN / DD / DDOG / GE / HWM / ... All show 4-6 of 6 valuation methods firing `extreme_*_estimate` warnings simultaneously on the same ticker. Methodology issue (fair-price ensemble producing wildly disagreeing estimates on growth/goodwill-heavy stocks), not input corruption. Pickup ladder: `methodology-scientist` → review `compute/valuation/ensemble.py` median-exclusion logic.

This PR intentionally does NOT include the compute / valuation fixes; it focuses purely on the agent-prompt-size lever.
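The two failure modes above can be expressed as deterministic checks. This is a hedged sketch, not the auditor's actual prefilter: the `TickerRow` fields (`market_cap`, `mos_pct` assumed to be stored in percent, a list of warning codes) are hypothetical, and the "4 or more `extreme_*_estimate` warnings" rule is an extra heuristic modeled on the #177 observation rather than a documented prefilter rule.

```python
from dataclasses import dataclass, field
from typing import Optional

MOS_LIMIT_PCT = 500.0      # the |mos_pct| > 500% condition behind issue #177
EXTREME_WARNING_FLOOR = 4  # hypothetical: "4-6 of 6 methods" firing together


@dataclass
class TickerRow:
    """Hypothetical per-ticker slice of one cron snapshot."""
    ticker: str
    market_cap: Optional[float]
    mos_pct: Optional[float]  # assumed: margin of safety, in percent
    warnings: list = field(default_factory=list)  # e.g. "extreme_<method>_estimate"


def prefilter(rows):
    """Return {ticker: [reasons]} for rows that trip a deterministic check."""
    flagged = {}
    for row in rows:
        reasons = []
        if row.market_cap is None:
            reasons.append("market_cap is null")
        if row.mos_pct is not None and abs(row.mos_pct) > MOS_LIMIT_PCT:
            reasons.append(f"|mos_pct| = {abs(row.mos_pct):.0f}% > {MOS_LIMIT_PCT:.0f}%")
        extreme = [w for w in row.warnings
                   if w.startswith("extreme_") and w.endswith("_estimate")]
        if len(extreme) >= EXTREME_WARNING_FLOOR:
            reasons.append(f"{len(extreme)} extreme_*_estimate warnings")
        if reasons:
            flagged[row.ticker] = reasons
    return flagged


# Made-up rows for illustration only; not real cron data.
rows = [
    TickerRow("AAA", market_cap=None, mos_pct=12.0),
    TickerRow("BBB", market_cap=2.0e10, mos_pct=-640.0),
]
for ticker, reasons in prefilter(rows).items():
    print(ticker, reasons)
```

Keeping the prefilter deterministic means the LLM verdict step only spends effort on tickers a rule already flagged.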
Doc lockstep (§Conventions)
Test plan
- Frontmatter intact in all 6 (`name`/`description`/`tools`/`model` lines present)
- `## What you do NOT do` section preserved in all 6 (grep returns 1 occurrence each)
- No `description:` line shortened (those are auto-routing triggers and must stay rich for matcher quality)
- Re-run `stock-detail-auditor` on the next cron and verify it still produces the same SCHEMA/CONSISTENCY/RULE_16/KNOWN_ISSUE structured output as the pre-trim version did during the #175 dry-run (no behavior regression)
- `quantrank-reviewer` still walks the same Sections A-H (it wasn't trimmed in this PR; the gate behavior should be identical to the #175 baseline)

Estimated token savings
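At a typical English line of ~14 tokens, the 400-line cut is ≈ 5600 tokens of system-prompt context summed over the 6 trimmed agents. A single spawn of one trimmed agent saves only that agent's share (54 to 92 lines, roughly 760 to 1,290 tokens); the full ≈ 5600 is realized when all 6 fire in a cycle.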
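The arithmetic can be reproduced from the per-file counts in the description. This sketch uses the PR's own ~14 tokens-per-line figure (a rough average, not a tokenizer measurement); `tokens_saved` is an illustrative helper that charges each spawn of a trimmed agent its own line delta, so agents this PR did not trim, such as `quantrank-reviewer`, contribute nothing.

```python
TOKENS_PER_LINE = 14  # the PR's rough figure for a typical English line

# (before, after) line counts per trimmed prompt, from the PR description.
LINE_COUNTS = {
    "incident-commander": (218, 126),
    "test-engineer": (189, 113),
    "stock-detail-auditor": (177, 120),
    "docs-reviewer": (180, 125),
    "release-captain": (211, 145),
    "security-reviewer": (185, 131),
}
DIRECTORY_BEFORE = 2925  # whole .claude/agents/ directory before the trim

lines_removed = {a: before - after for a, (before, after) in LINE_COUNTS.items()}
total_removed = sum(lines_removed.values())
directory_after = DIRECTORY_BEFORE - total_removed


def tokens_saved(spawned):
    """Estimated prompt tokens saved across the given spawns (repeats count)."""
    return sum(lines_removed.get(agent, 0) * TOKENS_PER_LINE for agent in spawned)


print(total_removed, directory_after, f"{100 * total_removed / DIRECTORY_BEFORE:.1f}%")  # 400 2525 13.7%
print({a: n * TOKENS_PER_LINE for a, n in lines_removed.items()})
print(tokens_saved(LINE_COUNTS))  # each trimmed agent once: 400 * 14 = 5600
```

Passing a real spawn log (for example, `["test-engineer", "test-engineer", "security-reviewer"]`) gives a per-cycle estimate that reflects which agents actually fired.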
For a typical PR cycle (post #175):
Real savings depend on which agents fire, but the trimmed 6 are exactly the ones that fire most often (incident-commander on incidents, test-engineer on new code, security-reviewer pre-push, etc.).
Out of scope
- `README.md` (303 lines): it's the directory's TOC + 6 coordination flows; its lines pay back as a single read-once reference for future agent additions.
- `description:` lines: those are the auto-routing triggers. Shorter description = worse matcher quality. Out of scope by design.

Reverse-action plan
If a trimmed agent shows a behavior regression in real use, the full original prompt is available via `git show <pre-trim-sha>:.claude/agents/<agent>.md` and can be restored selectively (just the part that mattered, not the whole pre-trim file).

Generated by Claude Code