feat(low-token): two-tier model routing + Phase 1/5 agent trims #61
Merged
Two structural levers added to the low-token preset to push the savings ceiling from ~30% (lead-only swap) toward ~50-60% without requiring an SDK refactor.

1. LOW_TOKEN_HAIKU_AGENTS: frozenset of agents routed to Haiku 4.5 in low-token mode (code-reviewer, test-engineer, oracle-qa). These are pattern-matching tasks (convention checks, pytest scaffolding, eval execution + result extraction) where Haiku is SWE-bench-competitive (73.3%) and ~3x cheaper than Sonnet. The lead is instructed via _prompt_low_token_overrides() to spawn these three with model="claude-haiku-4-5" and all other sub-agents (data-engineer, model-builder, xai-agent, domain-evaluator, ml-engineer, custom agents) with model="claude-sonnet-4-6". Two-tier routing replaces the previous single-tier "everyone Sonnet" instruction.

2. LOW_TOKEN_PHASE_DROPS: per-phase agent skip dict consumed by _agents_for_phase. Phase 1 drops code-reviewer, test-engineer, domain-evaluator (reviews/tests are deferred to the Gate 5 final pass; only data-engineer runs). Phase 5 drops xai-agent, domain-evaluator (the lead writes a single-shot analysis summary instead of a dedicated explainability + domain-validation pass). Phase 1 was ~45% of the first bench's cost ($3.47 of $7.75), so the Phase-1 trim is the biggest single contributor.

Both constants live in src/zo/_orchestrator_phases.py near AGENT_PHASE_MAP for routing-config locality. Custom agents (not in AGENT_PHASE_MAP) are NOT affected by phase drops; they remain available across all phases.

Tests: existing TestAgentsForPhaseLowToken updated (research-scout test moved to Phase 3 since code-reviewer is now dropped from Phase 1); +6 new tests covering Phase-1 drops (code-reviewer, test-engineer, domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator), default-mode-keeps-all sanity checks, custom-agent passthrough. +2 tests for two-tier prompt routing in TestLowTokenOrchestrator. Test count 725 → 735 (+10).
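The two constants and the way a phase filter might consume them can be sketched as follows. This is a minimal illustration, not the code in src/zo/_orchestrator_phases.py: the contents of AGENT_PHASE_MAP, the use of integer phase keys, and the helper names agents_for_phase and model_for_agent are assumptions made for the example. Only the agent names, the drop lists, and the two model ids come from the commit message.

```python
from __future__ import annotations

# Hypothetical phase map, for illustration only. The real AGENT_PHASE_MAP
# is not shown in this PR; phase keys are assumed to be integers.
AGENT_PHASE_MAP: dict[str, frozenset[int]] = {
    "data-engineer": frozenset({1}),
    "code-reviewer": frozenset({1, 3, 5}),
    "test-engineer": frozenset({1, 3}),
    "domain-evaluator": frozenset({1, 5}),
    "model-builder": frozenset({3}),
    "oracle-qa": frozenset({4}),
    "xai-agent": frozenset({5}),
}

# Agent names and drop lists as stated in the commit message.
LOW_TOKEN_HAIKU_AGENTS = frozenset({"code-reviewer", "test-engineer", "oracle-qa"})
LOW_TOKEN_PHASE_DROPS: dict[int, frozenset[str]] = {
    1: frozenset({"code-reviewer", "test-engineer", "domain-evaluator"}),
    5: frozenset({"xai-agent", "domain-evaluator"}),
}

HAIKU_MODEL = "claude-haiku-4-5"
SONNET_MODEL = "claude-sonnet-4-6"


def agents_for_phase(phase, custom_agents=(), low_token=False) -> list[str]:
    """Return mapped agents for a phase, minus low-token drops, plus customs."""
    mapped = sorted(name for name, phases in AGENT_PHASE_MAP.items() if phase in phases)
    if low_token:
        drops = LOW_TOKEN_PHASE_DROPS.get(phase, frozenset())
        mapped = [name for name in mapped if name not in drops]
    # Custom agents are appended after filtering, so drops never touch them.
    return mapped + list(custom_agents)


def model_for_agent(name, low_token=False) -> str | None:
    """Pick the low-token tier for an agent; None means 'leave the default'."""
    if not low_token:
        return None
    return HAIKU_MODEL if name in LOW_TOKEN_HAIKU_AGENTS else SONNET_MODEL
```

Under these assumptions, agents_for_phase(1, low_token=True) leaves only data-engineer, while the same call in default mode keeps all four Phase-1 agents. Keeping the drops in a separate dict rather than editing the phase map means default mode is untouched when the preset is off.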
Cascade docs updated: docs/reference/cost-benchmark.mdx "What would push savings higher" section restructured into "Shipped post-first-bench (target: ~50-60%)" + "Architectural, not yet shipped (target: ~70-80%)" so users can see what's live vs. roadmap. docs/concepts/low-token-mode.mdx "What the preset flips" table extended with sub-agent routing, Phase 1, Phase 5 rows; docs/reference/low-token-preset.mdx preset block + knob reference updated; README --low-token paragraph extended. Note callout in preset reference flags that a second bench is needed to confirm the 50-60% target.

Quality gates: pytest 735/735 + 7 skipped, ruff src/zo/ clean, validate-docs 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
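The two-tier prompt routing described in this commit message, and the kind of assertion its new tests make, could look roughly like the sketch below. The function prompt_low_token_overrides here is a hypothetical stand-in for _prompt_low_token_overrides(); the real prompt wording is not shown in this PR, so the section title and sentence phrasing are assumptions. Only the agent names and model ids are taken from the source.

```python
LOW_TOKEN_HAIKU_AGENTS = frozenset({"code-reviewer", "test-engineer", "oracle-qa"})


def prompt_low_token_overrides(low_token: bool) -> str:
    """Hypothetical stand-in: render routing instructions for the lead."""
    if not low_token:
        # Default mode emits no routing section at all.
        return ""
    haiku_list = ", ".join(sorted(LOW_TOKEN_HAIKU_AGENTS))
    lines = [
        "Low-token sub-agent routing",  # assumed section title
        f'- Haiku tier: spawn {haiku_list} with model="claude-haiku-4-5".',
        '- Sonnet tier: spawn all other sub-agents, including custom agents, '
        'with model="claude-sonnet-4-6".',
    ]
    return "\n".join(lines)


def check_two_tier_routing() -> bool:
    """Mirror of the described checks: model id plus every Haiku agent name."""
    text = prompt_low_token_overrides(True)
    has_model = "claude-haiku-4-5" in text
    has_agents = all(name in text for name in LOW_TOKEN_HAIKU_AGENTS)
    off_is_empty = prompt_low_token_overrides(False) == ""
    return has_model and has_agents and off_is_empty
```

Deriving the agent list from the frozenset rather than hard-coding names in the prompt keeps the prompt text and the routing constant from drifting apart.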
SamPlvs added a commit that referenced this pull request on Apr 30, 2026: feat(low-token): two-tier model routing + Phase 1/5 agent trims
Summary
Two structural levers added to the low-token preset to push the savings ceiling from ~30% (lead-only swap, measured in PR #60) toward ~50-60% without requiring an SDK refactor.
1. Sub-agent right-sizing — Haiku for pattern-matching agents
LOW_TOKEN_HAIKU_AGENTS (new frozenset in src/zo/_orchestrator_phases.py) routes three agents to Haiku 4.5 in low-token mode:

- code-reviewer: convention checks, style review
- test-engineer: pytest scaffolding, fixture writing
- oracle-qa: eval execution, metric extraction, result.md writing

Haiku is SWE-bench-competitive (73.3%) and ~3× cheaper than Sonnet. The lead's _prompt_low_token_overrides() now instructs two-tier routing: Haiku for these three; Sonnet for reasoning agents (data-engineer, model-builder, xai-agent, domain-evaluator, ml-engineer, custom agents).

2. Per-phase agent trims: drop non-essential reviewers in heaviest phases
LOW_TOKEN_PHASE_DROPS (new per-phase dict) skips agents in two phases:

- Phase 1 drops code-reviewer, test-engineer, domain-evaluator. Only data-engineer runs. Reviews/tests are deferred to the Gate 5 final pass. Phase 1 was ~45% of the first bench's cost ($3.47 of $7.75), which makes the Phase-1 trim the biggest single contributor.
- Phase 5 drops xai-agent, domain-evaluator. The lead writes a single-shot analysis summary instead of a dedicated explainability + domain-validation pass.

Custom agents (not in AGENT_PHASE_MAP) are not affected by drops; they remain available across all phases.

Why these levers and not others
The ~30% ceiling measured in PR #60 was structural: sub-agents were already on Sonnet via .md frontmatter, so the lead-only swap was the entire saving. To break past that without an SDK refactor, the only paths are (a) cheaper-per-token within Sonnet/Haiku/Opus tradeoffs, and (b) fewer/shorter agent spawns. This PR delivers both.

The path past ~50-60% requires moving from the claude CLI subprocess to the direct Anthropic SDK (prompt caching, Batch API, Files API), a multi-week effort deferred to v1.1.

Test plan
- TestAgentsForPhaseLowToken: 6 new tests covering Phase-1 drops (code-reviewer, test-engineer, domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator), default-mode-keeps-all sanity, custom-agent passthrough
- TestLowTokenOrchestrator: 2 new tests for two-tier prompt routing: test_low_token_two_tier_routing_haiku_for_pattern_agents (asserts Haiku tier section + each Haiku-eligible agent name + claude-haiku-4-5 model id) and test_low_token_off_omits_haiku_routing (regression guard)
- Existing test updated (test_low_token_drops_research_scout → moved to Phase 3 since code-reviewer is now phase-dropped from Phase 1)
- ruff src/zo/ clean
- validate-docs.sh 10/10 (1 pre-existing test-count warning)

Cascade docs
- docs/reference/cost-benchmark.mdx
- docs/concepts/low-token-mode.mdx
- docs/reference/low-token-preset.mdx
- README.md: --low-token paragraph extended with the two new levers + new ceiling target
- memory/zo-platform/DECISION_LOG.md
- memory/zo-platform/STATE.md

Out of scope (separate work)
- cost-benchmark.mdx final tracking row + replace "needs second bench" caveats with measured numbers.

🤖 Generated with Claude Code