
docs: task complexity scoring + model routing proposal (post-mortem + rubric) #102

Merged
neoneye merged 2 commits into PlanExeOrg:main from VoynichLabs:docs/complexity-routing-proposal
Feb 27, 2026

@82deutschmark
Contributor

Summary

Post-mortem of Simon's 26 Feb 2026 refactor, plus a proposal for adding task complexity scoring and model routing to PlanExe.

What's in this PR (docs-only, no code changes)

71-model-routing-postmortem-simon-26feb2026.md
Analysis of Simon's day (64 commits, 108 files, ~15K lines). Corrected cost estimate: ~$18 total if everything ran on Opus. Identifies where Haiku or Minimax could have handled execution after an Opus planning pass.

72-complexity-assessment-*.md (3 files)
Three complexity assessments of the same 8 task clusters, one from each model perspective (Sonnet 4.6, Haiku 4.5, Minimax M2.5). Each was written independently, to show how different model tiers assess the same work.

73-task-complexity-routing.md
The core proposal: PlanExe should score each task it generates on a 4-dimension Likert rubric (file size, semantic complexity, ambiguity, context dependency) and route tasks to the appropriate model tier for execution.
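
A minimal sketch of what scoring and routing could look like, for illustration only. The four dimension names come from the proposal; everything else is an assumption made here: the 1-5 scale, the unweighted sum, the `light_max` and `mid_max` thresholds, and the `Tier`, `ComplexityScore`, and `route` names. None of this is PlanExe's actual API, and the real rubric lives in 73-task-complexity-routing.md.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical model tiers; the mapping to real models is an assumption."""
    LIGHT = "light"  # e.g. a Haiku- or Minimax-class model
    MID = "mid"      # e.g. a Sonnet-class model
    HEAVY = "heavy"  # e.g. an Opus-class model


@dataclass(frozen=True)
class ComplexityScore:
    """The four rubric dimensions, each assumed to be a 1-5 Likert rating."""
    file_size: int
    semantic_complexity: int
    ambiguity: int
    context_dependency: int

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-5 range.
        for name, value in vars(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be in 1..5, got {value}")

    def total(self) -> int:
        # Unweighted sum, so the total ranges from 4 to 20.
        return (
            self.file_size
            + self.semantic_complexity
            + self.ambiguity
            + self.context_dependency
        )


def route(score: ComplexityScore, light_max: int = 8, mid_max: int = 14) -> Tier:
    """Map a total score to a tier using illustrative, uncalibrated thresholds."""
    total = score.total()
    if total <= light_max:
        return Tier.LIGHT
    if total <= mid_max:
        return Tier.MID
    return Tier.HEAVY
```

With these placeholder thresholds, a task rated 1 on every dimension would route to the light tier and one rated 5 on every dimension to the heavy tier. The thresholds are exactly what the calibration feedback requested below is meant to inform.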

For Simon's Review

The three complexity assessments need your calibration. Questions embedded in each doc:

  1. Which model's scores most closely matched the actual difficulty you experienced?
  2. Were there tasks we all scored too high or too low?
  3. Would you have trusted Haiku for the security hardening work given an explicit spec?

This is the first real calibration dataset for the routing rubric. Your feedback drives the next iteration.
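
One possible way to answer the first question quantitatively, sketched under assumptions: if Simon rates each task cluster on the same scale the assessments used, each model's scores can be compared to his ratings by mean absolute error. The function names and the dictionary layout (cluster name to score) are hypothetical and do not reflect how the assessment docs are structured.

```python
def mean_absolute_error(scores: dict[str, float], reference: dict[str, float]) -> float:
    """Average absolute gap between one assessor's scores and the reference ratings.

    Only task clusters present in both mappings are compared.
    """
    shared = scores.keys() & reference.keys()
    if not shared:
        raise ValueError("no task clusters in common")
    return sum(abs(scores[k] - reference[k]) for k in shared) / len(shared)


def rank_assessors(
    assessments: dict[str, dict[str, float]],
    reference: dict[str, float],
) -> list[tuple[str, float]]:
    """Return (assessor, error) pairs sorted from closest match to furthest."""
    errors = [
        (name, mean_absolute_error(scores, reference))
        for name, scores in assessments.items()
    ]
    return sorted(errors, key=lambda pair: pair[1])
```

The assessor listed first would be the one whose scores most closely matched the difficulty Simon experienced. Clusters where every assessor errs in the same direction would correspond to the second question.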

Authors

Larry (Sonnet 4.6) — primary author
Egon (Minimax M2.5) — Minimax perspective
Bubba (Haiku 4.5) — Haiku perspective
Authorized by: Mark Barney

Larry the Laptop Lobster added 2 commits February 26, 2026 23:21
- Post-mortem of Simon's 26 Feb 2026 refactor with realistic cost estimates (~$18)
- Three independent complexity assessments (Sonnet/Haiku/Minimax perspectives)
- Proposal 73: task complexity scoring + model routing for PlanExe
- For Simon's calibration review
@neoneye merged commit ed8a14d into PlanExeOrg:main on Feb 27, 2026
3 checks passed
@neoneye deleted the docs/complexity-routing-proposal branch on February 27, 2026 10:26