Skip to content

feat(low-token): two-tier model routing + Phase 1/5 agent trims#61

Merged
SamPlvs merged 1 commit into
mainfrom
claude/low-token-quick-wins
Apr 27, 2026
Merged

feat(low-token): two-tier model routing + Phase 1/5 agent trims#61
SamPlvs merged 1 commit into
mainfrom
claude/low-token-quick-wins

Conversation

@SamPlvs
Copy link
Copy Markdown
Owner

@SamPlvs SamPlvs commented Apr 27, 2026

Rebased onto main post-#60-merge. Single commit, clean diff against main.

Summary

Two structural levers added to the low-token preset to push the savings ceiling from ~30% (lead-only swap, measured in PR #60) toward ~50-60% without requiring an SDK refactor.

1. Sub-agent right-sizing — Haiku for pattern-matching agents

LOW_TOKEN_HAIKU_AGENTS (new frozenset in src/zo/_orchestrator_phases.py) routes three agents to Haiku 4.5 in low-token mode:

  • code-reviewer — convention checks, style review
  • test-engineer — pytest scaffolding, fixture writing
  • oracle-qa — eval execution, metric extraction, result.md writing

Haiku is SWE-bench-competitive (73.3%) and ~3× cheaper than Sonnet. The lead's _prompt_low_token_overrides() now instructs two-tier routing: Haiku for these three; Sonnet for reasoning agents (data-engineer, model-builder, xai-agent, domain-evaluator, ml-engineer, customs).

2. Per-phase agent trims — drop non-essential reviewers in heaviest phases

LOW_TOKEN_PHASE_DROPS (new per-phase dict) skips agents in two phases:

  • Phase 1: drop code-reviewer, test-engineer, domain-evaluator. Just data-engineer runs. Reviews/tests deferred to Gate 5 final pass. Phase 1 was ~45% of the first bench's cost ($3.47 of $7.75) — biggest single contributor.
  • Phase 5: drop xai-agent, domain-evaluator. Lead writes a single-shot analysis summary instead of dedicated explainability + domain-validation pass.

Custom agents (not in AGENT_PHASE_MAP) are not affected by drops — they remain available across all phases.

Why these levers and not others

The ~30% ceiling measured in PR #60 was structural — sub-agents were already on Sonnet via .md frontmatter, so the lead-only swap was the entire saving. To break past that without an SDK refactor, the only paths are (a) cheaper-per-token within Sonnet/Haiku/Opus tradeoffs, and (b) fewer/shorter agent spawns. This PR delivers both.

The path past ~50-60% requires moving from claude CLI subprocess to direct Anthropic SDK (prompt caching, Batch API, Files API) — multi-week effort, deferred to v1.1.

Test plan

  • +10 new tests:
    • TestAgentsForPhaseLowToken: 6 new tests covering Phase-1 drops (code-reviewer, test-engineer, domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator), default-mode-keeps-all sanity, custom-agent passthrough
    • TestLowTokenOrchestrator: 2 new tests for two-tier prompt routing — test_low_token_two_tier_routing_haiku_for_pattern_agents (asserts Haiku tier section + each Haiku-eligible agent name + claude-haiku-4-5 model id) + test_low_token_off_omits_haiku_routing (regression guard)
    • 1 existing test updated (test_low_token_drops_research_scout → moved to Phase 3 since code-reviewer is now phase-dropped from Phase 1)
  • Test count 725 → 735 + 7 skipped
  • ruff src/zo/ clean
  • validate-docs.sh 10/10 (1 pre-existing test-count warning)

Cascade docs

File Update
docs/reference/cost-benchmark.mdx "What would push savings higher" restructured into "Shipped post-first-bench (target: ~50-60%)" + "Architectural — not yet shipped (target: ~70-80%)" — users can see what's live vs. roadmap
docs/concepts/low-token-mode.mdx "What the preset flips" table extended with sub-agent routing, Phase 1, Phase 5 rows; "Measured savings" updated with new ceiling target
docs/reference/low-token-preset.mdx Preset code block now includes the two new constants; knob reference table extended; top Note callout flags second bench needed
README.md --low-token paragraph extended with the two new levers + new ceiling target
memory/zo-platform/DECISION_LOG.md New entry 2026-04-27T17:00:00Z documenting the design + alternatives considered
memory/zo-platform/STATE.md Hand-off updated; PR B/C status reflected

Out of scope (separate work)

  • Second measured bench to confirm the ~50-60% target. ~$5-8 to run against MNIST. Would update cost-benchmark.mdx final tracking row + replace "needs second bench" caveats with measured numbers.
  • SDK refactor for prompt caching + Batch API + Files API (target ~70-80%+). Multi-week effort.
  • STATE.md staleness in delivery repo (orchestrator doesn't auto-write phase transitions). Same family as PR fix(orchestrator): hard-enforce ZOTrainingCallback contract #59 but distinct code path.

🤖 Generated with Claude Code

@mintlify
Copy link
Copy Markdown

mintlify Bot commented Apr 27, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
personal-6078e1c9 🟢 Ready View Preview Apr 27, 2026, 4:32 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented Apr 27, 2026

Deploying zero-operators with  Cloudflare Pages  Cloudflare Pages

Latest commit: 43dd15f
Status: ✅  Deploy successful!
Preview URL: https://7d56e4ca.zero-operators.pages.dev
Branch Preview URL: https://claude-low-token-quick-wins.zero-operators.pages.dev

View logs

Two structural levers added to the low-token preset to push the
savings ceiling from ~30% (lead-only swap) toward ~50-60% without
requiring an SDK refactor.

1. LOW_TOKEN_HAIKU_AGENTS — frozenset of agents routed to Haiku 4.5
   in low-token mode: code-reviewer, test-engineer, oracle-qa.
   Pattern-matching tasks (convention checks, pytest scaffolding,
   eval execution + result extraction) where Haiku is SWE-bench-
   competitive (73.3%) and ~3x cheaper than Sonnet. The lead is
   instructed via _prompt_low_token_overrides() to spawn these three
   with model="claude-haiku-4-5" and all other sub-agents
   (data-engineer, model-builder, xai-agent, domain-evaluator,
   ml-engineer, custom agents) with model="claude-sonnet-4-6".
   Two-tier routing replaces the previous single-tier "everyone
   Sonnet" instruction.

2. LOW_TOKEN_PHASE_DROPS — per-phase agent skip dict consumed by
   _agents_for_phase. Phase 1 drops code-reviewer, test-engineer,
   domain-evaluator (defer reviews/tests to Gate 5 final pass — just
   data-engineer runs). Phase 5 drops xai-agent, domain-evaluator
   (lead writes a single-shot analysis summary instead of dedicated
   explainability + domain-validation pass). Phase 1 was ~45% of the
   first bench's cost ($3.47 of $7.75), so the Phase-1 trim is the
   biggest single contributor.

Both constants live in src/zo/_orchestrator_phases.py near
AGENT_PHASE_MAP for routing-config locality. Custom agents (not in
AGENT_PHASE_MAP) are NOT affected by phase drops — they remain
available across all phases.

Tests: existing TestAgentsForPhaseLowToken updated (research-scout
test moved to Phase 3 since code-reviewer is now dropped from Phase
1); +6 new tests covering Phase-1 drops (code-reviewer, test-engineer,
domain-evaluator), Phase-5 drops (xai-agent, domain-evaluator),
default-mode-keeps-all sanity checks, custom-agent passthrough. +2
tests for two-tier prompt routing in TestLowTokenOrchestrator. Test
count 725 → 735 (+10).

Cascade docs updated: docs/reference/cost-benchmark.mdx "What would
push savings higher" section restructured into "Shipped post-first-
bench (target: ~50-60%)" + "Architectural — not yet shipped (target:
~70-80%)" so users can see what's live vs. roadmap.
docs/concepts/low-token-mode.mdx "What the preset flips" table
extended with sub-agent routing, Phase 1, Phase 5 rows;
docs/reference/low-token-preset.mdx preset block + knob reference
updated; README --low-token paragraph extended. Note callout in
preset reference flags that a second bench is needed to confirm the
50-60% target.

Quality gates: pytest 735/735 + 7 skipped, ruff src/zo/ clean,
validate-docs 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@SamPlvs SamPlvs force-pushed the claude/low-token-quick-wins branch from 4fda2d3 to 43dd15f Compare April 27, 2026 16:45
@SamPlvs SamPlvs merged commit e453dc4 into main Apr 27, 2026
2 checks passed
@SamPlvs SamPlvs deleted the claude/low-token-quick-wins branch April 27, 2026 16:51
SamPlvs added a commit that referenced this pull request Apr 30, 2026
feat(low-token): two-tier model routing + Phase 1/5 agent trims
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant