feat(low-token): two-tier model routing + Phase 1/5 agent trims by SamPlvs · Pull Request #61 · SamPlvs/zero-operators

SamPlvs · 2026-04-27T16:31:12Z

Rebased onto main post-#60-merge. Single commit, clean diff against main.

Summary

Two structural levers added to the low-token preset to push the savings ceiling from ~30% (lead-only swap, measured in PR #60) toward ~50-60% without requiring an SDK refactor.

1. Sub-agent right-sizing — Haiku for pattern-matching agents

LOW_TOKEN_HAIKU_AGENTS (new frozenset in src/zo/_orchestrator_phases.py) routes three agents to Haiku 4.5 in low-token mode:

code-reviewer — convention checks, style review
test-engineer — pytest scaffolding, fixture writing
oracle-qa — eval execution, metric extraction, result.md writing

Haiku is SWE-bench-competitive (73.3%) and ~3× cheaper than Sonnet. The lead's _prompt_low_token_overrides() now instructs two-tier routing: Haiku for these three; Sonnet for reasoning agents (data-engineer, model-builder, xai-agent, domain-evaluator, ml-engineer, custom agents).
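A minimal sketch of the two-tier decision rule, for reviewers who want it in code form. In this PR the routing is expressed as prompt text from _prompt_low_token_overrides(), not as a function; `model_for_agent` and the two `*_MODEL` constant names are hypothetical. The frozenset name and the model ids are taken from the commit message.

```python
from typing import Optional

# From this PR: agents routed to Haiku in low-token mode.
LOW_TOKEN_HAIKU_AGENTS = frozenset({"code-reviewer", "test-engineer", "oracle-qa"})

# Model ids as quoted in the commit message; constant names are hypothetical.
HAIKU_MODEL = "claude-haiku-4-5"
SONNET_MODEL = "claude-sonnet-4-6"


def model_for_agent(agent_name: str, low_token: bool) -> Optional[str]:
    """Hypothetical helper: model id the lead would request for a sub-agent.

    Returns None outside low-token mode, meaning "no override".
    """
    if not low_token:
        return None
    if agent_name in LOW_TOKEN_HAIKU_AGENTS:
        return HAIKU_MODEL
    # Everything else, including custom agents, stays on Sonnet.
    return SONNET_MODEL
```

Under these assumptions, `model_for_agent("oracle-qa", low_token=True)` yields the Haiku id and `model_for_agent("data-engineer", low_token=True)` yields the Sonnet id.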

2. Per-phase agent trims — drop non-essential reviewers in heaviest phases

LOW_TOKEN_PHASE_DROPS (new per-phase dict) skips agents in two phases:

Phase 1: drop code-reviewer, test-engineer, domain-evaluator. Just data-engineer runs. Reviews/tests deferred to Gate 5 final pass. Phase 1 was ~45% of the first bench's cost ($3.47 of $7.75) — biggest single contributor.
Phase 5: drop xai-agent, domain-evaluator. Lead writes a single-shot analysis summary instead of dedicated explainability + domain-validation pass.
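The "~45%" figure for Phase 1 can be checked directly from the two dollar amounts quoted above. The variable names are illustrative only.

```python
# Dollar figures quoted in this PR for the first bench.
phase1_cost = 3.47  # USD spent in Phase 1
total_cost = 7.75   # USD spent across the whole bench

phase1_share = phase1_cost / total_cost
print(f"Phase 1 share of bench cost: {phase1_share:.1%}")  # prints 44.8%
```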

Custom agents (not in AGENT_PHASE_MAP) are not affected by drops — they remain available across all phases.
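A rough sketch of how the drops and the custom-agent passthrough could fit together. LOW_TOKEN_PHASE_DROPS mirrors the drops listed above; the AGENT_PHASE_MAP contents, the function name `agents_for_phase`, and its signature are assumptions for illustration, so the real _agents_for_phase may look different.

```python
# Drops as described in this PR (phase number -> agents skipped in low-token mode).
LOW_TOKEN_PHASE_DROPS = {
    1: frozenset({"code-reviewer", "test-engineer", "domain-evaluator"}),
    5: frozenset({"xai-agent", "domain-evaluator"}),
}

# Hypothetical roster (agent -> phases it normally runs in); not the real map.
AGENT_PHASE_MAP = {
    "data-engineer": {1},
    "code-reviewer": {1},
    "test-engineer": {1},
    "domain-evaluator": {1, 5},
    "xai-agent": {5},
}


def agents_for_phase(phase, available_agents, low_token=False):
    """Hypothetical filter: which of the available agents run in this phase."""
    dropped = LOW_TOKEN_PHASE_DROPS.get(phase, frozenset()) if low_token else frozenset()
    selected = []
    for name in available_agents:
        phases = AGENT_PHASE_MAP.get(name)
        if phases is None:
            # Custom agent (not in AGENT_PHASE_MAP): always available, never dropped.
            selected.append(name)
        elif phase in phases and name not in dropped:
            selected.append(name)
    return selected
```

With this roster, Phase 1 in low-token mode keeps only data-engineer plus any custom agents, and default mode keeps every mapped agent.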

Why these levers and not others

The ~30% ceiling measured in PR #60 was structural — sub-agents were already on Sonnet via .md frontmatter, so the lead-only swap was the entire saving. To break past that without an SDK refactor, the only paths are (a) cheaper-per-token within Sonnet/Haiku/Opus tradeoffs, and (b) fewer/shorter agent spawns. This PR delivers both.

The path past ~50-60% requires moving from claude CLI subprocess to direct Anthropic SDK (prompt caching, Batch API, Files API) — multi-week effort, deferred to v1.1.

Test plan

+10 new tests:
- TestAgentsForPhaseLowToken: 6 new tests covering Phase-1 drops (code-reviewer, test-engineer, domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator), default-mode-keeps-all sanity, custom-agent passthrough
- TestLowTokenOrchestrator: 2 new tests for two-tier prompt routing — test_low_token_two_tier_routing_haiku_for_pattern_agents (asserts Haiku tier section + each Haiku-eligible agent name + claude-haiku-4-5 model id) + test_low_token_off_omits_haiku_routing (regression guard)
- 1 existing test updated (test_low_token_drops_research_scout → moved to Phase 3 since code-reviewer is now phase-dropped from Phase 1)
- Test count 725 → 735 + 7 skipped
- ruff src/zo/ clean
- validate-docs.sh 10/10 (1 pre-existing test-count warning)
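For readers without the diff open, the two prompt-routing tests named above could look roughly like this. The test names come from this PR; `build_overrides` is a hypothetical stand-in for _prompt_low_token_overrides(), and its prompt wording is invented for the sketch.

```python
LOW_TOKEN_HAIKU_AGENTS = frozenset({"code-reviewer", "test-engineer", "oracle-qa"})


def build_overrides(low_token: bool) -> str:
    """Hypothetical stand-in for _prompt_low_token_overrides(); wording is invented."""
    if not low_token:
        return ""
    haiku_names = ", ".join(sorted(LOW_TOKEN_HAIKU_AGENTS))
    return (
        f'Haiku tier: spawn {haiku_names} with model="claude-haiku-4-5".\n'
        'Sonnet tier: spawn all other sub-agents with model="claude-sonnet-4-6".'
    )


def test_low_token_two_tier_routing_haiku_for_pattern_agents():
    prompt = build_overrides(low_token=True)
    assert "Haiku tier" in prompt
    for name in LOW_TOKEN_HAIKU_AGENTS:
        assert name in prompt
    assert "claude-haiku-4-5" in prompt


def test_low_token_off_omits_haiku_routing():
    # Regression guard: default mode must not mention Haiku routing.
    prompt = build_overrides(low_token=False)
    assert "claude-haiku-4-5" not in prompt
    assert "Haiku tier" not in prompt
```

Both functions follow pytest's `test_` naming convention, so pytest would collect them without extra configuration.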

Cascade docs

| File | Update |
| --- | --- |
| `docs/reference/cost-benchmark.mdx` | "What would push savings higher" restructured into "Shipped post-first-bench (target: ~50-60%)" + "Architectural — not yet shipped (target: ~70-80%)"; users can see what's live vs. roadmap |
| `docs/concepts/low-token-mode.mdx` | "What the preset flips" table extended with sub-agent routing, Phase 1, Phase 5 rows; "Measured savings" updated with new ceiling target |
| `docs/reference/low-token-preset.mdx` | Preset code block now includes the two new constants; knob reference table extended; top Note callout flags second bench needed |
| `README.md` | `--low-token` paragraph extended with the two new levers + new ceiling target |
| `memory/zo-platform/DECISION_LOG.md` | New entry 2026-04-27T17:00:00Z documenting the design + alternatives considered |
| `memory/zo-platform/STATE.md` | Hand-off updated; PR B/C status reflected |

Out of scope (separate work)

Second measured bench to confirm the ~50-60% target. ~$5-8 to run against MNIST. Would update cost-benchmark.mdx final tracking row + replace "needs second bench" caveats with measured numbers.
SDK refactor for prompt caching + Batch API + Files API (target ~70-80%+). Multi-week effort.
STATE.md staleness in delivery repo (orchestrator doesn't auto-write phase transitions). Same family as PR #59 (fix(orchestrator): hard-enforce ZOTrainingCallback contract) but a distinct code path.

🤖 Generated with Claude Code

mintlify · 2026-04-27T16:31:19Z

Preview deployment for your docs. Learn more about Mintlify Previews.

Project	Status	Preview	Updated (UTC)
personal-6078e1c9	🟢 Ready	View Preview	Apr 27, 2026, 4:32 PM


cloudflare-workers-and-pages · 2026-04-27T16:31:47Z

Deploying zero-operators with Cloudflare Pages

Latest commit:	`43dd15f`
Status:	✅ Deploy successful!
Preview URL:	https://7d56e4ca.zero-operators.pages.dev
Branch Preview URL:	https://claude-low-token-quick-wins.zero-operators.pages.dev


Two structural levers added to the low-token preset to push the savings ceiling from ~30% (lead-only swap) toward ~50-60% without requiring an SDK refactor.

1. LOW_TOKEN_HAIKU_AGENTS: frozenset of agents routed to Haiku 4.5 in low-token mode: code-reviewer, test-engineer, oracle-qa. Pattern-matching tasks (convention checks, pytest scaffolding, eval execution + result extraction) where Haiku is SWE-bench-competitive (73.3%) and ~3x cheaper than Sonnet. The lead is instructed via _prompt_low_token_overrides() to spawn these three with model="claude-haiku-4-5" and all other sub-agents (data-engineer, model-builder, xai-agent, domain-evaluator, ml-engineer, custom agents) with model="claude-sonnet-4-6". Two-tier routing replaces the previous single-tier "everyone Sonnet" instruction.

2. LOW_TOKEN_PHASE_DROPS: per-phase agent skip dict consumed by _agents_for_phase. Phase 1 drops code-reviewer, test-engineer, domain-evaluator (defer reviews/tests to Gate 5 final pass; just data-engineer runs). Phase 5 drops xai-agent, domain-evaluator (lead writes a single-shot analysis summary instead of dedicated explainability + domain-validation pass). Phase 1 was ~45% of the first bench's cost ($3.47 of $7.75), so the Phase-1 trim is the biggest single contributor.

Both constants live in src/zo/_orchestrator_phases.py near AGENT_PHASE_MAP for routing-config locality. Custom agents (not in AGENT_PHASE_MAP) are NOT affected by phase drops; they remain available across all phases.

Tests: existing TestAgentsForPhaseLowToken updated (research-scout test moved to Phase 3 since code-reviewer is now dropped from Phase 1); +6 new tests covering Phase-1 drops (code-reviewer, test-engineer, domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator), default-mode-keeps-all sanity checks, custom-agent passthrough. +2 tests for two-tier prompt routing in TestLowTokenOrchestrator. Test count 725 → 735 (+10).

Cascade docs updated: docs/reference/cost-benchmark.mdx "What would push savings higher" section restructured into "Shipped post-first-bench (target: ~50-60%)" + "Architectural — not yet shipped (target: ~70-80%)" so users can see what's live vs. roadmap. docs/concepts/low-token-mode.mdx "What the preset flips" table extended with sub-agent routing, Phase 1, Phase 5 rows; docs/reference/low-token-preset.mdx preset block + knob reference updated; README --low-token paragraph extended. Note callout in preset reference flags that a second bench is needed to confirm the 50-60% target.

Quality gates: pytest 735/735 + 7 skipped, ruff src/zo/ clean, validate-docs 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(low-token): two-tier model routing + Phase 1/5 agent trims

mintlify Bot deployed to staging - docs April 27, 2026 16:32 View deployment

SamPlvs force-pushed the claude/low-token-quick-wins branch from 4fda2d3 to 43dd15f Compare April 27, 2026 16:45

mintlify Bot deployed to staging - docs April 27, 2026 16:45 View deployment

SamPlvs merged commit e453dc4 into main Apr 27, 2026
2 checks passed

SamPlvs deleted the claude/low-token-quick-wins branch April 27, 2026 16:51

SamPlvs added a commit that referenced this pull request Apr 30, 2026

Merge pull request #61 from SamPlvs/claude/low-token-quick-wins

4fd8d3e

feat(low-token): two-tier model routing + Phase 1/5 agent trims
