feat(agents): lean auto-routing policy + 15th agent stock-detail-auditor #175
Merged
Two coupled changes to the agent layer that address the token-cost issue surfaced after the 14-agent enterprise rollout (PR #168).

=== Change A: §Auto-routing policy rewrite (gate-moment biased) ===

The original 17-row table fired most cues on EDIT events ("Parallel with the edit"). On a multi-file edit (schemas.py + compute/scoring/ + frontend/components/) that meant 3-5 parallel agent spawns per diff, so token cost compounded fast.

The rewrite:
- Adds a "lean-by-design" intro paragraph explaining the gate-moment discipline.
- Splits cues into two implicit tiers in one table:
  (i) signal-driven (auto-spawn immediately: test failure, cron perf regression, Dependabot, prod failure, scheduled audit, rare methodology event): KEPT as-is
  (ii) gate-moment (spawn at "ready to push" / explicit ask): REWORDED. The "ready to push" cue now lists conditional batch-mates (schema-sentinel / defense-layer-auditor / frontend-design-reviewer / docs-reviewer / security-reviewer / test-engineer) that fire as part of ONE parallel batch instead of N separate spawns per edit.
- Adds explicit "Edits alone do NOT auto-spawn" note above the table; the schema-triple hook covers per-edit reminders.
- Spawn discipline rewritten: "Default model = sonnet" with a short list of opus exceptions; "Parallel at gate moments, not on every edit"; per-session disable phrase ("spawn only on explicit ask this session").

=== Change B: 15th agent, stock-detail-auditor (Tier 1, sonnet) ===

New data-correctness reviewer for the per-stock JSON the frontend renders at /stock/[ticker]. Covers what quantrank-reviewer deliberately won't: output correctness of cron-generated JSON.

Two-step protocol that bounds token cost:
1. Deterministic prefilter (no LLM in the loop):
   - SCHEMA violations: composite_score / pillar_scores out of [0, 100]; current_price ≤ 0; market_cap ≤ 0; fair_price.median ≤ 0 or > $10K; rank out of [1, universe_size]
   - CONSISTENCY bugs: |market_cap − price × shares| / market_cap > 5% (issue #10 territory); revenue < 0; FCF ≠ OCF − capex ±$1M; |eps_diluted| > 500 (XBRL unit parse); |mos_pct| > 500%
   - RULE 16 violations: entered_top5 = True AND risk_flags non-empty (annotate-and-veto-Top-N invariant from SKILL.md)
   - KNOWN_ISSUE overlap: #7 Sloan-Financials, #10 shares_outstanding, #11 value_trap noise
2. LLM-judgment review, capped at ≤ 20 tickers:
   - real_outlier vs broken_data verdict per ticker
   - Upstream-cause hypothesis for broken_data (fundamentals.py / prices.py / filing_text.py / universe source)

Dry-run on current cron output (rankings 2026-05-14, 503 tickers):
- 1 SCHEMA violation: STZ market_cap = None
- 15 CONSISTENCY bugs: |mos_pct| > 500% across APP / AXON / CASY / CIEN / DD / DDOG / GE / HWM / ... (likely the fair-price ensemble producing extreme valuations)
- 0 RULE_16 violations
- 0 KNOWN_ISSUE overlaps at the 5% mcap-vs-shares threshold

Hard constraints in the agent prompt:
- DO NOT modify frontend/public/data/*.json (CI-job-only)
- DO NOT propose threshold recalibration (methodology layer's job)
- DO NOT validate underlying formulas (Altman Z weights, Beneish M coefficients: methodology-scientist's slot)
- DO NOT spawn other agents (escalate via the escalation table)
- Hard cap at 20 stock file Reads in Step 3

=== Doc lockstep ===

- CLAUDE.md §Layout: 14 → 15 agents (Tier 1 Core 4 → 5)
- CLAUDE.md §Auto-routing policy: full rewrite per Change A
- CLAUDE.md §Phase status: "Lean auto-routing + stock-detail-auditor in flight (this PR)" entry
- AGENTS.md project-structure tree: 14 → 15 agents
- AGENTS.md §Claude-Code-specific tooling: new paragraph on the gate-moment policy + stock-detail-auditor mention
- .claude/agents/README.md: roster table 14 → 15 (Tier 1 grows); Flow 2 (release ladder) gains stock-detail-auditor between schema-sentinel and security-reviewer

No compute / schema / scoring / valuation / frontend code change.
dackclup added a commit that referenced this pull request on May 21, 2026

…7%) (#178)

Second-order token-economy optimization stacking on PR #175 (which reduced spawn FREQUENCY via gate-moment routing). This PR reduces per-spawn SIZE by trimming the genuine bloat from the 6 largest agent prompts.

Per-file deltas (lines):
- incident-commander 218 → 126 (−92, −42%)
- test-engineer 189 → 113 (−76, −40%)
- stock-detail-auditor 177 → 120 (−57, −32%)
- docs-reviewer 180 → 125 (−55, −31%)
- release-captain 211 → 145 (−66, −31%)
- security-reviewer 185 → 131 (−54, −29%)

Total across 15-agent set: 2925 → 2525 lines (−400, −13.7%).

What got cut (genuine bloat, NOT substance):
- "Read these first" prose lists enumerating obvious files → compressed to one sentence pointing at CLAUDE.md / SKILL.md / AGENTS.md anchors
- incident-commander's Step 6 post-mortem template (full schema) → defer to `9arm-post-mortem` skill, keep only the skeleton-emit instruction
- test-engineer's "Project test conventions (memorize)" section duplicating AGENTS.md §Testing → "see AGENTS.md §Testing"
- release-captain's "Recent releases" subsection (stale, lives in PHASE_STATUS.md) → removed
- release-captain's Step 6 user-checklist (long prose) → one-line summary pointing at `release-tag` skill
- security-reviewer's Section A-H verbose intros → compressed per-Section header + grep command + criteria
- docs-reviewer's Step 2 "Substance check (per doc)" 6 prose blocks → single matrix table
- stock-detail-auditor's verbose Step 1 recon prose + escalation-path narrative → kept the matrix, removed wrapper text

What's PRESERVED in every trimmed agent:
- YAML frontmatter (name, description, tools, model): verbatim; no description shortening (the description is the auto-routing trigger and must stay rich for matcher quality)
- Workflow steps with concrete commands
- Hard constraints / "What you do NOT do" section
- Output format template (compressed but complete)
- Escalation tables to other agents
- Tools list (no tool removed from any agent)

Companion artifacts in same session:
- Issue #176: STZ market_cap = null on 2026-05-14 cron (XBRL fact extraction missing `shares_outstanding`, related to issue #10)
- Issue #177: 15 tickers |mos_pct| > 500% on 2026-05-14 cron (fair-price ensemble producing extreme estimates on growth/goodwill-heavy stocks: APP, AXON, CASY, CIEN, DD, DDOG, GE, HWM, ...)

Both surfaced by the deterministic prefilter in `stock-detail-auditor` during PR #175 dry-run. Filed as separate issues; this PR intentionally does NOT include compute / valuation fixes and focuses purely on the agent-prompt-size lever.

Doc lockstep: CLAUDE.md §Phase status gains "Trim agent prompts in flight (this PR)" entry. AGENTS.md §Claude-Code-specific tooling gains a paragraph on the keep-it-tight principle for future agent additions (trim target when an agent grows past ~150 lines).

No compute / schema / scoring / valuation / frontend code change.

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request on May 21, 2026

…) (#181)

New annotate-only defense flag closing the visibility gap surfaced by the stock-detail-auditor dry-run on PR #175. STZ on the 2026-05-14 cron shipped with market_cap=null and risk_flags=[] because shares_outstanding failed XBRL extraction despite revenue + balance sheet extracting cleanly.

The new flag share_count_extraction_missing fires when `shares_outstanding is None AND revenue > 0 AND total_assets > 0`, a narrow guard distinguishing partial XBRL extraction from "entire extraction broken" (issue #15 territory). Annotate-only per portable-annotate-before-veto: the existing data_quality_input_corruption veto keeps its shares_outstanding=None silence contract (issue #18 / test_D3) so the two pathways stay coherent. Asymmetry tests lock the None-vs-zero behavior since shares_outstanding=0 is a legitimate edge (not extraction failure). STZ is rank 308 so no Top-5 impact either way; promotion to veto deferred to the Q3 2026-08-19 quarterly cohort audit.

Schema 0.9.5-phase4h.5 → 0.9.6-phase4h.6 for the new diagnostic Metadata.share_count_extraction_missing_count: int | None (Rule 18 observability shipped in the same PR as the flag emission). Defense layer headline 28 → 29 emitted boolean flags. Tests 1031 → 1040 (+9).

The deeper XBRL-manifest fix (extend _FUNDAMENTALS_REQUIRED_ATTRS with share-class-scoped fact names + cover-page fallback) is a follow-up needing SEC live access.

https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA

Co-authored-by: Claude <noreply@anthropic.com>
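The flag condition quoted in the commit message above can be read as a small predicate. The following is a minimal sketch only: the function signature is hypothetical, and treating a missing `revenue` or `total_assets` as "do not fire" is an assumption, not something the commit message states.

```python
from typing import Optional


def share_count_extraction_missing(
    shares_outstanding: Optional[float],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Annotate-only flag: shares missing while other fundamentals extracted.

    Hypothetical signature. Mirrors the quoted condition
    `shares_outstanding is None AND revenue > 0 AND total_assets > 0`.
    """
    # shares_outstanding == 0 is treated as a legitimate value, not a failure,
    # so only None triggers the flag (the None-vs-zero asymmetry).
    if shares_outstanding is not None:
        return False
    # Assumption: a missing revenue / total_assets means the whole extraction
    # is broken, which is a different pathway, so the flag stays off.
    if revenue is None or total_assets is None:
        return False
    return revenue > 0 and total_assets > 0
```

Under this reading, an STZ-like record (shares missing, revenue and total assets positive) is flagged, while a record with `shares_outstanding=0` is not.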
dackclup added a commit that referenced this pull request on May 23, 2026

…caps to drain Sonnet-only pool (#219)

* chore(agents): reset sonnet sub-agent thoroughness, lift artificial caps to drain Sonnet-only pool

User observation: "Weekly · Sonnet only" pool on the Max plan sits near 2% utilization while "Weekly · all models" moves normally, meaning sonnet sub-agents are under-used.

Root cause: artificial work-bounding caps I added incidentally during PR #178 trim pass ("≤ 20 tickers", "terse", "do not exceed N file Reads") were treating sub-agent effort as a cost to minimize, when it's actually the intended use of the separately-billed Sonnet-only pool.

Caps lifted:
- `.claude/agents/stock-detail-auditor.md` Step 3: removed 20-ticker hard cap. Agent now walks every prefilter-flagged ticker, dedups multi-rule hits, and may fetch 1-2 adjacent peers when a multi-ticker pattern is suspected. Added "DO NOT skip flagged tickers to keep the report short" hard constraint to codify the new principle. Frontmatter `description:` rewritten to remove the "≤ 20 flagged tickers" wording (kept the project's auto-routing cue intact otherwise).
- `.claude/agents/quantrank-reviewer.md` Output format: removed "Reply terse" instruction. Agent now lists every PASS / FAIL / WARN finding while walking Sections A-H.
- `.claude/agents/README.md` Flow 2 release ladder: release-captain's stock-detail-auditor lane no longer says "≤ 20 LLM verdicts"; says "thorough LLM verdicts for every flagged ticker". Roster row description updated to match.

Policy update:
- `CLAUDE.md` §Auto-routing policy §Spawn discipline: two new bullets:
  (a) "Don't gatekeep sub-agent effort" explaining the Max-plan dual-pool topology (sonnet sub-agents drain Weekly · Sonnet only; opus + main session drain Weekly · all models) and why bounding sub-agent output wastes paid budget
  (b) "Prefer delegation to sub-agents over inline main-session work": when both options exist, route work through sonnet sub-agents so main-session tokens don't get spent doing what a thorough sonnet agent can do for free against a separate pool
- `AGENTS.md` §Claude-Code-specific tooling: rewrote the "prompts are kept tight" paragraph to make explicit that the trim target is BOILERPLATE (read-these-first lists, duplicated material from the canonical docs), NOT investigation depth or output length. Hard prompt-size constraint ≠ hard work-size cap.

Model assignments unchanged: 4 opus by design (incident-commander + release-captain + methodology-scientist + quantrank-reviewer) + 11 sonnet. Tested-and-reverted in same PR a temporary swap of quantrank-reviewer + methodology-scientist to sonnet, which was a wrong interpretation of user intent. The fix is to stop capping work, not to demote models.

Companion follow-up (not in this PR): per-session usage report post-merge that confirms Sonnet-only pool actually moves more after this lands. If not, re-investigate whether spawn frequency needs to increase (separate PR).

No compute / schema / scoring / valuation / frontend code change.

* chore(agents): also lift spawn frequency for sonnet sub-agents on non-trivial edits

Stacks on the depth-only fix already in this PR. User followup: loosening per-spawn caps alone isn't enough. If sonnet agents don't spawn often enough, the Sonnet-only pool stays idle anyway.

Adds six new edit-trigger rows to CLAUDE.md §Auto-routing policy that fire sonnet agents immediately on non-trivial edit to their domain (replacing the lean "Edits alone do NOT auto-spawn" rule from PR #175):

| Edit | Spawns (sonnet) |
|---|---|
| Schema triple file | schema-sentinel |
| compute/scoring or compute/valuation | defense-layer-auditor |
| frontend/components or frontend/app | frontend-design-reviewer |
| .github/workflows or new dep or new env-var | security-reviewer |
| Prod code without same-PR test | test-engineer |
| Any of 7 top-level docs | docs-reviewer |

"Non-trivial" definition spelled out: > 5 added lines OR touches non-comment code OR adds/removes a public symbol. Comment / whitespace / single-line fixes do not trigger.

Opus agents (incident-commander · release-captain · methodology-scientist · quantrank-reviewer) keep the rare-fire policy. They bill against the "Weekly · all models" pool, so firing them more often does not help drain the underutilized Sonnet-only pool.

The "ready to push" gate stays as a safety-net re-batch: opus reviewer + phase-coordinator fire fresh; sonnet agents skip via the existing 10-min dedup window if they already ran on the same diff during the edit-trigger pass. So worst-case spawn count per PR rises ~2-3× vs PR #175 baseline, but every extra spawn drains the paid-for-but-currently-idle Sonnet-only pool.

AGENTS.md §Claude-Code-specific tooling: added paragraph describing the spawn-frequency discipline so cross-tool readers (Copilot / Cursor / Devin) understand the dual-pool topology.

§Phase status entry in CLAUDE.md updated to describe both lifts (per-spawn cap AND spawn frequency) as one two-part change.

No compute / schema / scoring / valuation / frontend code change.

---------

Co-authored-by: Claude <noreply@anthropic.com>
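The 10-minute dedup window mentioned above (sonnet agents skip the gate re-batch if they already ran on the same diff) could work roughly as follows. This is a sketch under assumptions: the class name, the `(agent, diff_hash)` key, and the in-memory storage are all hypothetical, since the commit message does not describe how the window is implemented.

```python
import time
from typing import Dict, Optional, Tuple


class DedupWindow:
    """Hypothetical per-(agent, diff) dedup window; default is 10 minutes."""

    def __init__(self, window_seconds: float = 600.0) -> None:
        self.window_seconds = window_seconds
        # Maps (agent name, diff hash) to the time of the last accepted spawn.
        self._last_run: Dict[Tuple[str, str], float] = {}

    def should_spawn(
        self, agent: str, diff_hash: str, now: Optional[float] = None
    ) -> bool:
        """Return True and record the spawn unless one ran inside the window."""
        now = time.monotonic() if now is None else now
        key = (agent, diff_hash)
        last = self._last_run.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # same agent already ran on this diff recently
        self._last_run[key] = now
        return True
```

Keying on the diff hash rather than the branch means a new edit (new hash) is never suppressed; only an exact repeat within the window is skipped.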
Why
After the 14-agent enterprise rollout (PR #168), the auto-routing table fired most cues on EDIT events with "Parallel by default". On a multi-file edit (schemas.py + compute/scoring/ + frontend/components/) that meant 3-5 parallel agent spawns per diff — token cost compounded. This PR keeps all 15 agents, but changes WHEN they fire.
Change A: §Auto-routing policy rewrite (gate-moment biased)
The original 17-row table is rewritten as follows:
- Signal-driven cues are kept as-is: test failure in tests/test_ingest/, EDGAR 429/403, cron warm-cache > 10 min, p95 > 20s, Dependabot alerts, production cron failure / site down, workflow_dispatch lands green, quarterly cohort audit date, new defense flag proposed, threshold/weight constant change. Signal-driven cues are rare, so spawning on signal is cheap.
- The "ready to push" gate fires one parallel batch with conditional batch-mates:
  - schema-sentinel if the schema triple was touched on the branch
  - defense-layer-auditor if compute/scoring/ or compute/valuation/ was touched
  - frontend-design-reviewer if frontend/components/ or frontend/app/ was touched
  - docs-reviewer if any of the 7 docs was modified
  - security-reviewer if .github/workflows/ or an env-var or a new dep was touched
  - test-engineer if production code was added without a test
- stock-detail-auditor cues: "ตรวจ data หุ้น" ("check stock data") / pre-release / post-cron.
- model: opus override on quantrank-reviewer: diff > 200 lines on compute/scoring/ OR explicit "full review". Opus is opt-in with user authorization.
Change B: 15th agent stock-detail-auditor (Tier 1, sonnet)
New data-correctness reviewer for the per-stock JSON the frontend renders at /stock/[ticker]. It covers what quantrank-reviewer deliberately won't: output correctness of cron-generated JSON.
Two-step protocol that bounds token cost
Step 2: Deterministic prefilter (no LLM):
- SCHEMA violations: composite_score or any non-null pillar_scores.* outside [0, 100]; current_price ≤ 0 when has_history; market_cap ≤ 0 or None; fair_price.median ≤ 0 or > 10000; rank outside [1, universe_size]
- CONSISTENCY bugs: |market_cap − current_price × shares_outstanding| / market_cap > 5% (issue #10 territory); revenue < 0; FCF ≠ OCF − capex ± $1M; |eps_diluted| > 500 (XBRL unit parse); |fair_price.mos_pct| > 500%
- RULE 16 violations: entered_top5 == True AND risk_flags non-empty (annotate-and-veto-Top-N invariant from SKILL.md)
- KNOWN_ISSUE overlap: data_quality_input_corruption flag (#10/#18); Financials + sloan_accruals_top_decile (#7); value_trap_risk (#11 noise candidate)
Step 3: LLM-judgment review (cap ≤ 20 tickers): per ticker, classify as real_outlier (data plausible, flag informative) vs broken_data (upstream mis-parse), and point at the likely upstream cause for broken_data (fundamentals.py XBRL / prices.py yfinance / filing_text.py 10-K / sector classification).
Dry-run on current cron output (rankings 2026-05-14, 503 tickers)
The deterministic prefilter was run on real data, with zero LLM tokens spent in this pass:
- 1 SCHEMA violation: STZ has market_cap: None, which should not happen post-cron
- 15 CONSISTENCY bugs: |mos_pct| > 500% across APP, AXON, CASY, CIEN, DD, DDOG, GE, HWM, ... (likely the fair-price ensemble producing extreme valuations)
- 0 RULE_16 violations
- 0 KNOWN_ISSUE overlaps at the 5% mcap-vs-shares threshold
Real findings on real data: the prefilter pays back its cost before the LLM ever runs.
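The Step 2 rule families can be sketched as a pure function over one per-stock record. This is a minimal illustration, not the agent's actual prefilter: the flat key layout, the hit labels, and the assumption that `mos_pct` is stored in percent units are all hypothetical, and only a subset of the listed rules is shown.

```python
from typing import Any


def prefilter(stock: dict[str, Any], universe_size: int) -> list[str]:
    """Return rule hits for one per-stock record (key layout is assumed)."""
    hits: list[str] = []

    # SCHEMA: composite and non-null pillar scores must sit in [0, 100].
    scores = [stock.get("composite_score")]
    scores += list((stock.get("pillar_scores") or {}).values())
    if any(s is not None and not 0 <= s <= 100 for s in scores):
        hits.append("SCHEMA:score_range")

    # SCHEMA: market_cap must be present and positive.
    market_cap = stock.get("market_cap")
    if market_cap is None or market_cap <= 0:
        hits.append("SCHEMA:market_cap")

    # SCHEMA: rank must fall inside [1, universe_size].
    rank = stock.get("rank")
    if rank is not None and not 1 <= rank <= universe_size:
        hits.append("SCHEMA:rank")

    # SCHEMA: fair_price.median must be in (0, 10000].
    fair_price = stock.get("fair_price") or {}
    median = fair_price.get("median")
    if median is not None and (median <= 0 or median > 10_000):
        hits.append("SCHEMA:fair_price_median")

    # CONSISTENCY: market cap vs price x shares, 5% relative tolerance.
    price = stock.get("current_price")
    shares = stock.get("shares_outstanding")
    if (market_cap is not None and market_cap > 0
            and price is not None and shares is not None):
        if abs(market_cap - price * shares) / market_cap > 0.05:
            hits.append("CONSISTENCY:market_cap")

    # CONSISTENCY: extreme margin of safety (assumed to be in percent units).
    mos_pct = fair_price.get("mos_pct")
    if mos_pct is not None and abs(mos_pct) > 500:
        hits.append("CONSISTENCY:mos_pct")

    # RULE 16: a Top-5 entrant must not carry risk flags.
    if stock.get("entered_top5") is True and stock.get("risk_flags"):
        hits.append("RULE_16:top5_with_risk_flags")

    return hits
```

Because the function is deterministic and needs no model call, it can run over all 503 records first, and only the records with a non-empty hit list need to reach the LLM-judgment step.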
Hard constraints (in the agent prompt)
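- DO NOT modify frontend/public/data/*.json (CI-job-only per AGENTS.md §Boundaries)
- DO NOT propose threshold recalibration (methodology layer's job)
- DO NOT validate underlying formulas (Altman Z weights, Beneish M coefficients: methodology-scientist's slot)
- DO NOT spawn other agents (escalate via the escalation table)
- Hard cap at 20 stock file Reads in Step 3
Coordination patterns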
Added to Flow 2 (release ladder) in .claude/agents/README.md: stock-detail-auditor runs in parallel alongside schema-sentinel / defense-layer-auditor when release-captain orchestrates a release.
Doc lockstep (§Conventions)
- CLAUDE.md: §Layout 14 → 15 (Tier 1 Core 4 → 5); §Auto-routing policy full rewrite; §Phase status "Lean auto-routing + stock-detail-auditor in flight (this PR)" entry
- AGENTS.md: project-structure tree 14 → 15; §Claude-Code-specific tooling subsection adds the gate-moment-policy + stock-detail-auditor paragraph
- .claude/agents/README.md: current-set heading 14 → 15; Tier 1 Core gains stock-detail-auditor row; Flow 2 release ladder gains the new agent between schema-sentinel and security-reviewer
Test plan
- Agent frontmatter (name / description / model: sonnet / tools: Read, Bash, Grep, Glob)
- Prefilter on frontend/public/data/stocks/*.json produces meaningful findings (16 hits on real data, no false positives in dry-run)
- Auto-routing table: -3 rows, because three "on edit" cues were merged into the "ready to push" conditional batch-mates
- stock-detail-auditor fires
- workflow_dispatch green spawns defense-layer-auditor + stock-detail-auditor in parallel (per the updated post-cron cue)
Token-economy impact (qualitative)
Before this PR, a typical multi-file edit (e.g., compute/scoring/risk_overlay.py + schemas.py + types.ts) triggered:
- schema-sentinel (sonnet) spawn on schema-triple edit
- quantrank-reviewer (opus) spawn after edit-set stabilizes
- defense-layer-auditor (sonnet) spawn if output committed
- docs-reviewer (sonnet) spawn on any doc touched
- methodology-scientist (opus) on weight change
After this PR, the same edit triggers:
- schema-reminder.sh (zero LLM cost) on schema-triple edit
- One parallel batch at "ready to push": quantrank-reviewer + phase-coordinator + schema-sentinel (since triple touched) + defense-layer-auditor (since scoring touched) + docs-reviewer (since docs touched) + methodology-scientist Mode B (since weight changed)
Expected savings on a typical 5-edit-iteration PR cycle: roughly a 75-80% reduction in agent spawns (from ~25 spawns to ~6 batched spawns; 19/25 = 76%).
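The conditional batch at the "ready to push" gate can be sketched as a lookup from touched paths to batch-mates. This is an illustrative sketch only: the function name and rule table are hypothetical, the schema-triple and docs file sets are passed in because this description does not enumerate them, and the env-var / new-dep, missing-test, and weight-change conditions are left out because they cannot be decided from paths alone.

```python
from typing import Iterable, List, Set

# Hypothetical path-prefix rules taken from the conditional batch-mates list.
PREFIX_RULES = {
    "defense-layer-auditor": ("compute/scoring/", "compute/valuation/"),
    "frontend-design-reviewer": ("frontend/components/", "frontend/app/"),
    "security-reviewer": (".github/workflows/",),
}

# Agents that fire at every "ready to push" gate in the example above.
ALWAYS_AT_GATE = ("quantrank-reviewer", "phase-coordinator")


def gate_batch(
    touched: Iterable[str], schema_triple: Set[str], docs: Set[str]
) -> List[str]:
    """Return the single parallel batch for one "ready to push" gate."""
    paths = list(touched)
    batch = list(ALWAYS_AT_GATE)
    if any(p in schema_triple for p in paths):
        batch.append("schema-sentinel")
    for agent, prefixes in PREFIX_RULES.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if any(p.startswith(prefixes) for p in paths):
            batch.append(agent)
    if any(p in docs for p in paths):
        batch.append("docs-reviewer")
    return batch
```

The point of the design is that the batch is computed once per gate from the whole branch diff, so five edit iterations on the same files still produce one batch rather than five rounds of spawns.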
Out of scope
- incident-commander, release-captain, methodology-scientist keep opus because they orchestrate or do cross-domain reasoning. quantrank-reviewer moves toward sonnet-by-default with explicit opus opt-in for large diffs.
- methodology-scientist slot, not this PR.
Supersedes
Generated by Claude Code