docs: downshift model recommendations toward cheaper tiers #1
Rewrites the model-tier rubrics in CLAUDE.md §2 and triage.md to reflect Haiku 4.5's expanded capabilities. Tasks that can plausibly run on a cheaper tier now do — most notably the triage default flips from Sonnet to Haiku, and several Sonnet-tier examples (test writing from a tight spec, small-diff code review, mechanical API migrations) move to Haiku. Sonnet's "agentic loops" example is qualified to "non-trivial reasoning per step" so rubric-driven loops stay on Haiku. Opus narrows "reading a large unfamiliar codebase" to the case without graphify. The "When in doubt, go one tier cheaper" meta-principle (line 209) and the Task vs. Agent terminology drift on line 210 are intentionally unchanged.
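As a rough illustration of the downshifted rubric, the sketch below encodes a few of the examples named above as a lookup table, with the "when in doubt, go one tier cheaper" rule as the fallback. The task labels, the `Tier` enum, and the `recommend` helper are all hypothetical names chosen for this sketch; the actual rubric lives as prose in CLAUDE.md §2 and triage.md, not as code.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Model tiers ordered from cheapest to most expensive."""
    HAIKU = 0
    SONNET = 1
    OPUS = 2


# Hypothetical task labels paraphrasing examples from the PR description.
RUBRIC = {
    "triage": Tier.HAIKU,
    "test_from_tight_spec": Tier.HAIKU,
    "small_diff_code_review": Tier.HAIKU,
    "mechanical_api_migration": Tier.HAIKU,
    "rubric_driven_loop": Tier.HAIKU,
    "reasoning_heavy_agentic_loop": Tier.SONNET,
    "large_unfamiliar_codebase_without_graphify": Tier.OPUS,
}


def recommend(task: str, guess: Tier = Tier.SONNET) -> Tier:
    """Return the rubric tier for a listed task.

    For an unlisted task (the "in doubt" case), go one tier cheaper
    than the caller's guess, never dropping below Haiku.
    """
    if task in RUBRIC:
        return RUBRIC[task]
    return Tier(max(guess - 1, Tier.HAIKU))
```

With this sketch, `recommend("triage")` yields `Tier.HAIKU`, and an unlisted task guessed at Opus comes back as `Tier.SONNET`.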
Caution: Review failed. The pull request is closed.
Walkthrough
Documentation updates refining model-tier guidance and CI watcher agent behavior.
Sequence Diagram(s)
sequenceDiagram
participant CI as CI (GitHub Actions)
participant Watcher as CI-watch agent
participant Model as Model tier selector
participant Sonnet as Sonnet agent
participant Repo as Repository / PR
CI->>Watcher: push / workflow run
Watcher->>Model: evaluate task complexity & rubric
alt Mechanical / rubric-driven
Model->>Watcher: choose Haiku
Watcher->>Repo: run mechanical triage/fixes (Haiku)
else Requires non-trivial reasoning
Model->>Watcher: choose Haiku then escalate
Watcher->>Sonnet: respawn specific fix/verification step (Sonnet)
Sonnet->>Repo: handle exploratory or multi-file reasoning
end
Estimated code review effort: 3 (Moderate), ~20 minutes
Pre-merge checks: 5 passed
Adds explicit `model: haiku` hints (with re-spawn-on-Sonnet escalation notes) at the three background-agent spawn sites in git-workflow.md: ci-watch, cr-watch, and merge-watch. Their core work is polling, log/triage, and routine deploy verification — Haiku per the rubric just updated in CLAUDE.md §2. The escalation notes preserve the meta-rule on line 209: re-spawn on a stronger tier only when a specific step genuinely needs it. This locks the choice in at the spawn site rather than relying on the spawning agent to reconsult CLAUDE.md every time.
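The spawn-site pattern described above might be sketched as follows. This is only an illustration: `SpawnSpec`, `default_specs`, and `escalate` are hypothetical names, and the real Agent tool's options are not shown in this PR beyond `model: haiku` and `run_in_background: true`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpawnSpec:
    """Hypothetical description of one background-agent spawn."""
    name: str
    model: str = "haiku"            # default tier pinned at the spawn site
    run_in_background: bool = True


# The three watcher spawn sites named in git-workflow.md.
WATCHERS = ("ci-watch", "cr-watch", "merge-watch")


def default_specs() -> list[SpawnSpec]:
    """Every watcher starts on Haiku."""
    return [SpawnSpec(name=w) for w in WATCHERS]


def escalate(spec: SpawnSpec, step: str) -> SpawnSpec:
    """Build a spec that re-spawns one specific step on Sonnet.

    The original watcher spec is left untouched, so only the step
    that genuinely needs more reasoning runs on the stronger tier.
    """
    return replace(spec, name=f"{spec.name}-{step}", model="sonnet")
```

Pinning the default in the spec keeps the tier choice local to the spawn site, while `escalate` makes the Sonnet re-spawn an explicit per-step decision.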
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@git-workflow.md`:
- Line 67: The watcher name template "ci-watch-<short-sha>-<workflow-slug>" can
collide across runs; update the agent spawn logic that creates the Agent tool
(where you set run_in_background: true and model: haiku and call SendMessage) to
include the unique <run-id> in the name (e.g.,
"ci-watch-<run-id>-<short-sha>-<workflow-slug>") so each Agent instance is
globally unique and addressable via SendMessage; keep the rest of the Agent
options (run_in_background, model: haiku) unchanged and ensure any code that
looks up agents by name uses the new template.
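A minimal sketch of the suggested naming scheme, assuming a 7-character short SHA and a simple lowercase slug. `slugify` and `watcher_name` are hypothetical helpers written for this example; they are not part of git-workflow.md or the Agent tool.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def watcher_name(run_id: int, sha: str, workflow: str) -> str:
    """Build "ci-watch-<run-id>-<short-sha>-<workflow-slug>".

    Including the run id keeps names distinct when the same commit
    triggers the same workflow more than once.
    """
    return f"ci-watch-{run_id}-{sha[:7]}-{slugify(workflow)}"
```

For example, `watcher_name(123, "abcdef1234567", "Build & Test")` gives `ci-watch-123-abcdef1-build-test`, and a re-run of the same commit under a different run id gets a different name.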
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary
- Rewrites the model-tier rubrics in `CLAUDE.md` §2 and `triage.md` to push tasks toward cheaper tiers wherever the cheaper tier is plausibly capable, per the existing "go one tier cheaper when in doubt" principle (line 209).
- Haiku: `(default for most delegations)` plus 6 example tasks downshifted from Sonnet (test from tight spec, small-diff code review, mechanical API migration, rubric-driven labelling, etc.). Sonnet narrows to multi-file judgement, function impl with 1–2 design choices, multi-file/non-trivial-logic review, clear-target refactors, and reasoning-heavy agentic loops. Opus narrows "reading a large unfamiliar codebase" to the no-graphify case.
- Intentionally unchanged: the `Task` vs. `Agent` terminology drift on line 210 (flagged for separate cleanup).
Test plan
- `git diff --stat origin/main` shows exactly two files changed, two hunks total
Summary by CodeRabbit