feat(sizing): complexity-scale-based derivation for all Claude sessions by CoreyRDean · Pull Request #101 · CoreyRDean/clauck

CoreyRDean · 2026-04-18T13:17:32Z

Summary

Makes cost a first-class transparent policy per `INTENT.md §3` non-negotiable #4 and §4 architectural property "Cost transparency". Introduces a single shared formula (`lib/sizing.py`) that maps a declared complexity scale (0.0–1.0) to `(model, effort, max_turns, max_budget_usd)`. Used by:

`clauck doctor` — classifier emits only `task_complexity_scale`; Python does the dollar arithmetic. Kills the ``"Reached maximum budget ($0.25)"`` failure mode structurally (two prior prompt-only fixes could not converge — classifiers anchor on literal dollars in prompts under load).
Scheduled jobs — `scheduler.py` resolves via the same helper at tick time. Jobs declare `complexity: ` in frontmatter; the scheduler derives the four sizing fields. Legacy jobs with explicit params continue to work unchanged.
Marketplace jobs — all 11 converted to `complexity:` declarations. `module-demo` retains explicit overrides for pedagogical minimums (teaches the override pattern).
Semantic interpreter (`clauck `) — job-creation guidance updated to emit `complexity:` in new jobs.

What changed

New module: `lib/sizing.py`

`SCALE_PARAMS` — banded lookup (0.10: haiku/medium → 1.0: opus/high), rates calibrated against Anthropic 4.x API list prices with a midpoint-average context-growth multiplier.
`compute_sizing(scale, ctx_tokens, config)` — derives all four fields, clamps budget to config min/max, explains the arithmetic in a human-readable string.
`resolve_params(frontmatter, ctx_tokens, config)` — handles the frontmatter contract: complexity → derived, explicit field → override (per-field granularity), neither → legacy defaults.
`load_doctor_config` / `save_doctor_config` with atomic writes.
`apply_auto_skew_on_budget_hit` / `apply_auto_skew_decay` — self-balancing scale offset.

Doctor changes

New classifier prompt (`_build_doctor_interpreter_prompt`) — emits `{interpretation, task_complexity_scale}` only. Scale anchors at 0.05, 0.20, 0.35, 0.55, 0.75, 0.95 with concrete task shapes. No literal dollar values in the prompt anywhere.
Stage-2 now uses derived params + prints the sizing breakdown before running:
```
→ Check delegate-scout job scheduling and Slack DM history [--dry --safe]
→ scale=0.35 → haiku/high, 14 turns, base $0.39 × 1.30 headroom = $0.50
```
Auto-skew on budget hit (`scale_skew += auto_skew_increment`, capped), decay on success (`scale_skew -= increment/2`, floored). Runs silently; notifies stderr on adjustment.
Interpreter runs with `--effort low` (new).

New CLI surfaces

`clauck config doctor [key [value]]` — view or set any doctor sizing knob.
`clauck size [--ctx N] [--model M]` — inspect what the formula derives without firing a session. Shows breakdown, config context, and the frontmatter snippet to pin.
`clauck inspect ` now shows resolved params with per-field provenance (`derived` / `override` / `default`).
`clauck validate` accepts `complexity` (float in [0.0, 1.0]), warns on total absence of sizing declaration.

Scheduler

Flat, module-anchor, and module-stage job builds all route through `sizing.resolve_params` — zero drift between the three construction sites.
Same YAML parser fix as the CLI (inline `#` comment stripping respecting quotes).

YAML parser (both scheduler and CLI)

New `_strip_inline_comment` helper that respects single/double quotes. Needed to parse `complexity: 0.15 # chain-test`-style lines, a natural convention for authors who want to annotate their scale choice inline.

Documentation

CLAUDE.md: new Cost policy section; frontmatter schema extended.
INTENT.md: §4 now lists four architectural properties (adds Cost transparency); §7 Cost row updated.
README.md: Cost section reframed around transparency.
skill/clauck/SKILL.md: new Cost & sizing reference section with scale anchors, knob table, auto-skew semantics.

Backward compatibility

Jobs without `complexity:` continue to work exactly as before (explicit fields → override path; no fields → legacy defaults).
Manifest schema unchanged.
No existing job requires edits.
`clauck config` still accepts the dotted-key form (`clauck config set doctor.scale_skew 0.1`) — the new `config doctor` form is additive.

Known-failure regression checks

Invocation	Pre-fix	Post-fix derivation
`"is my delegate-scout job operating as expected?"`	haiku $0.25, budget-exceeded	scale≈0.20 → haiku/high, 8 turns, $0.22 — within budget
`"intent-miner silent 4h … delegate-scout … 5-min cron"`	haiku $0.25, truncated; then sonnet $0.25, truncated	scale≈0.55 → sonnet/high, 25 turns, $2.71
`"is the plist loaded"` (trivial)	haiku small, fine	scale≈0.05 → haiku/medium, 4 turns, $0.08 (no regression)

Test plan

`bash -n install.sh && bash -n uninstall.sh` clean
`ast.parse` clean for `lib/{scheduler.py,dag-runner.py,clauck,sizing.py}`
`json.load(marketplace/index.json)` clean
`bash install.sh --dry-run --yes` completes
Smoke: `scheduler.discover_jobs()` with current `~/.clauck/` returns sensible sizing for every installed job (21 jobs); none regressed
Marketplace sizing sanity check: each of 11 jobs derives expected tier
After merge + reinstall: rerun the originally failing invocations and confirm no `Reached maximum budget` truncation
Sanity check: `clauck size 0.35` prints breakdown; `clauck config doctor` shows defaults; `clauck inspect morning-brief` shows `[derived]` provenance
Sanity check: trivial `clauck doctor "is the plist loaded"` still routes cheap

… CLI Introduces lib/sizing.py as the single source of truth for converting a task complexity scale (0.0–1.0) into (model, effort, max_turns, max_budget_usd). Used by both the scheduler (at tick time) and the CLI (for doctor sizing, `clauck size`, `clauck config doctor`, and `clauck inspect` resolved-param display). Why Prior doctor sizing lived inside the classifier prompt and asked the classifier to emit dollar values. Classifiers under load anchor on literal dollars in the prompt rather than computing from rates — a structural failure that produced repeat `Reached maximum budget ($0.25)` truncations across two prior attempts at prompt-only fixes. Moving the arithmetic out of the prompt eliminates that class of failure. What this commit does - lib/sizing.py: banded SCALE_PARAMS lookup (haiku/sonnet/opus tiers) + context-growth-aware compute_sizing() + resolve_params() with per-field override semantics + load/save of the `doctor` config block (min/max/headroom/scale_skew/auto-skew knobs) + atomic config writes. Rates calibrated against Anthropic 4.x API list prices with a midpoint-average context-growth multiplier. - lib/clauck: new doctor interpreter prompt emits {interpretation, task_complexity_scale} only — no model, turns, or dollars. Python derives those via sizing.compute_sizing. Doctor prints the full sizing breakdown before stage-2 for transparency. Adds auto-skew: on "Reached maximum budget" bumps config.scale_skew by auto_skew_increment (capped at auto_skew_cap); on clean runs decays by increment/2 (floored at 0) — self-balancing tuner. Interpreter runs with --effort low for cheapest classification. New subcommands: `clauck config doctor [key [val]]` (view/edit doctor config) and `clauck size <scale> [--ctx N]` (inspect what the formula derives). Updates semantic interpreter and job-creation prompts to prefer `complexity:` in new job frontmatter. cmd_validate recognizes `complexity`, validates [0.0, 1.0] range, and warns when no sizing is declared. cmd_inspect shows resolved params with per-field provenance (derived / override / default). YAML subset parser now strips inline `#` comments respecting quotes. - lib/scheduler.py: discover_jobs routes the three job-dict builds (flat, module anchor, module stage) through sizing.resolve_params so complexity-based frontmatter derives the four sizing fields at tick time, and legacy jobs continue to work unchanged via the per-field override path. YAML parser gets the same inline-comment fix to stay consistent with the CLI parser. - install.sh / uninstall.sh: place and remove lib/sizing.py alongside the other runtime Python files in ~/.clauck/. Backward compatibility - Jobs without `complexity:` continue to work. resolve_params falls back to explicit `max_turns` / `max_budget_usd` / `effort` / `model` fields, then to LEGACY_DEFAULTS (50 turns, $2.00, high effort). - Jobs with both `complexity:` and an explicit field use the explicit value for THAT field only. Override granularity is per-field so a job can pin max_turns while leaving everything else derived. - The manifest schema is unchanged — only the values flowing through it shift from "raw from frontmatter" to "resolved by sizing.py".

Replaces explicit max_turns/max_budget_usd/effort/model with a single `complexity:` field in each marketplace job. The scheduler derives all four sizing fields at fire time via lib/sizing.py. Scale assignments: - 0.05: dynamic-context-demo, module-demo (trivial/pedagogical) - 0.10: event-monitor, git-commit-nudge (single-trigger check) - 0.15: daily-verify, downloads-triage, workspace-cleanup (focused single-source) - 0.20: github-issue-watcher, github-pr-digest (multi-query digest) - 0.25: inbox-zero-assist (scan + synthesis) - 0.35: morning-brief (3-source synthesis) module-demo retains explicit max_turns: 1 / max_budget_usd: 0.02 as overrides — pedagogical minimums below min_budget_usd floor. This demonstrates the override pattern as a teaching example. All jobs pass through `clauck validate` with the new complexity recognition, and `sizing.resolve_params` derives sensible defaults for each scale. No behavioral change to firing logic; budgets shift to the values the formula picks (generally comparable to or slightly more generous than the hand-tuned values they replace).

- CLAUDE.md: new "Cost policy" section covering the scale formula, per-field override semantics, legacy compat, auto-skew, and the single-source-of-truth rule (lib/sizing.py). Frontmatter schema extended to document `complexity:`. - INTENT.md: §4 promoted from three architectural properties to four — "Cost transparency" added. Cost as first-class (non-negotiable #4) explained HOW via this property: single formula, single config surface, inspectable from CLI by humans and agents, self-correcting on truncation. §7 Cost policy row updated to describe the declare-and-derive pattern instead of raw "declare + log." - README.md: Cost section leads with the transparency pitch and points at `clauck size <scale>` + `clauck config doctor` for introspection. - skill/clauck/SKILL.md: new "Cost & sizing" reference section before the frontmatter schema. Table of scale anchors, config knobs, auto-skew semantics, legacy compat. Frontmatter schema entry extended to document `complexity:` with override semantics.

B1 — auto-skew false-positive detection Replaced substring search on stdout+stderr ("Reached maximum budget") with structured-field detection on the claude CLI envelope (subtype == "error_max_budget_usd" OR matching entries in envelope.errors). The old detection would false-positive on any diagnostic report that mentioned budget truncation in its output (very common once users start debugging this feature). Also short-circuits the skew bump when the current run's budget was already clamped by max_budget_usd — bumping skew cannot help if the config ceiling is the binding constraint; the code now prints a direct pointer to `clauck config doctor max_budget_usd <value>` instead. B2 — max_budget_usd default disagreement + silent opus clamping - DEFAULT_DOCTOR_CONFIG["max_budget_usd"] raised from $10 to $25 so the high-end opus bands (scale 0.85–0.95) are not silently clamped by default. Scale 1.00 still clamps at $25 as a genuine circuit breaker. - compute_sizing now reads all fallbacks from DEFAULT_DOCTOR_CONFIG keys instead of literal magic numbers at the call site (previously max_budget_usd fallback was 5.00 in code but 10.00 in defaults — the two paths now agree by construction). - cmd_size prints a visible "⚠ budget clamped" warning when the raw derivation exceeds max_budget_usd, pointing at the config command to raise it. Satisfies INTENT.md §4 "set a budget the user cannot audit violates this property" — clamps are now surfaced, not silent. B3 — missing `effort: low` band removed from expressible space Added a bottom band: (ceiling=0.07, haiku, low, 3 turns, $0.012/turn). Re-scaled the marketplace jobs whose original `effort: low` was getting upgraded to medium/high under the previous table: event-monitor, git-commit-nudge → complexity 0.05 (low tier, preserves original effort: low exactly); daily-verify, downloads-triage, workspace-cleanup → complexity 0.10 (medium tier, closer match than the prior 0.15). Band ceilings throughout SCALE_PARAMS shifted slightly to accommodate the new entry; scale-anchor tables updated in doctor interpreter prompt, SKILL.md, and CLAUDE.md. B4 — INTENT.md stale "three" + version not bumped - §8 "When checking for drift" line updated: "three architectural properties" → "four". - Version header bumped v1 → v2 per §8 ("Each amendment bumps it"). B5 — semantic interpreter contract-drift acknowledgment The natural-language semantic interpreter (clauck <anything>) still emits exec_model/exec_effort/exec_max_turns/exec_max_budget_usd in its routing JSON — a known gap vs the new §4 Cost transparency property. Refactoring that path touches more consumers than doctor did; tracked for follow-up rather than extended into this PR. INTENT.md §4 now explicitly names the scope of v2 compliance (doctor + scheduler + CLI surfaces + marketplace jobs, all fully compliant) and flags the semantic path as a known gap that must be closed before v3. _build_interpreter_prompt gets a docstring pointing at the same gap so any future maintainer sees it locally, not only in INTENT.md. N1 — scale_skew cross-contamination into scheduled jobs (high-value non-blocker) scale_skew is a doctor-scoped tuner: it auto-bumps when doctor hits budget, reflecting doctor task shapes. Previously every scheduled job's sizing read the same skew, so a user whose doctor had been auto-bumped to +0.20 would see every cron-fired job also run hotter despite never having truncated. Scheduler now loads its sizing config via a new _load_scheduler_sizing_config() that forces scale_skew=0 while keeping the other knobs (min/max/headroom/growth). Also lifted the config load out of _resolve_sizing into discover_jobs so the full tick shares one disk read (~30k reads/day saved at ~20 jobs × 60s ticks).

…541)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1b9296e23

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-18T13:36:16Z

+                fm_for_sizing = _parse_fm_block(fm_match.group(1))
+        except OSError:
+            pass
+    resolved = sizing.resolve_params(fm_for_sizing, 0, sizing.load_doctor_config())


Align inspect sizing with scheduler skew rules

cmd_inspect resolves complexity with sizing.load_doctor_config(), which includes the current doctor.scale_skew, but scheduled jobs are resolved with scale_skew forcibly set to 0.0 in scheduler._load_scheduler_sizing_config(). When doctor auto-skew has been bumped, clauck inspect <job> will show inflated model/turn/budget values that do not match what the scheduler will actually run, which undermines the new “resolved params” output and can lead users to make incorrect tuning decisions.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-18T13:36:16Z

+                findings.append(("error", f"complexity must be in [0.0, 1.0] (got {c})"))
+            else:
+                has_complexity = True
+                sized = sizing.compute_sizing(c, 0, sizing.load_doctor_config())


Use scheduler-equivalent config in complexity validation

The complexity preview in _validate_job_file calls sizing.compute_sizing(..., sizing.load_doctor_config()), so validation output is skewed by doctor.scale_skew, while scheduler execution intentionally ignores that skew (scale_skew = 0.0). This means clauck validate can report a derived tier/budget that is different from runtime behavior for the same job, producing misleading validation results right where users expect authoritative sizing feedback.

Useful? React with 👍 / 👎.

…102) ## Summary Empirical hard rule: **haiku is not a viable choice when the user's full MCP surface is loaded.** Tool descriptions across a typical MCP surface total ~150k tokens — combined with system prompts and tool results, this regularly exceeds haiku's effective working context and sends sessions into a compaction loop that burns budget without progress. Follows directly from the conversation around PR #101: we added context-*cost* tracking (dollars per turn) but not context-*fit* guardrails (does everything load before compaction). This PR adds one minimal, explicit fit rule. A more sophisticated window-fit layer (MODEL_CONTEXT_WINDOW table + per-model thresholds + measure-once MCP estimate) is tracked as a follow-up if this rule proves insufficient in practice. ## What changed - \`lib/sizing.py\` — new \`_promote_for_mcp\` helper + \`strict_mcp\` kwarg on \`compute_sizing\`. When \`strict_mcp=False\` (default), any haiku derivation bumps to the nearest sonnet band matching current effort (low→medium, medium→medium, high→high). Turns stay unchanged — we need a bigger **model**, not more steps. The promotion surfaces in the returned dict (\`mcp_promoted: bool\`) and in the \`explanation\` string. - \`resolve_params\` — reads \`strict_mcp_config\` from frontmatter, passes through to \`compute_sizing\`. - \`cmd_doctor\` — passes \`strict_mcp=False\` explicitly. Doctor's stage-2 does not pass \`--strict-mcp-config\` because the diagnostic agent needs MCP for \`--fix\` actions (post Slack escalations, query GitHub, etc.). So any haiku-tier scale auto-promotes. Eliminates the compaction-loop budget-exceeded failure mode that sparked PR #101 follow-up discussion. - \`cmd_size\` — new \`--strict-mcp\` flag for preview, plus a \`strict_mcp:\` line and \`↑ haiku auto-promoted\` note in output. - CLAUDE.md + SKILL.md — docs for the rule and how to opt out. ## Sanity check | Frontmatter | Scale | Before this PR | After this PR | |---|---|---|---| | \`complexity: 0.20\` (MCP loaded, default) | 0.20 | haiku/high/8t/$0.22 | sonnet/high/8t/$0.77 | | \`complexity: 0.20, strict_mcp_config: true\` | 0.20 | haiku/high/8t/$0.22 | haiku/high/8t/$0.22 (unchanged) | | Doctor \"is plist loaded\" | 0.05 | haiku/low/3t/$0.05 | sonnet/medium/3t/$0.22 | | \`complexity: 0.55\` (MCP loaded) | 0.55 | sonnet/medium/18t/$1.46 | sonnet/medium/18t/$1.46 (no promotion needed) | ## Test plan - [x] \`ast.parse\` clean for all modules - [x] ruff check passes (E9,F selection) - [x] Smoke: resolve_params on fm with / without \`strict_mcp_config\` derives expected model - [x] Smoke: compute_sizing with strict_mcp=False promotes haiku bands; with True keeps them - [ ] After merge + reinstall: \`clauck size 0.20\` defaults to sonnet derivation; \`--strict-mcp\` shows haiku - [ ] Doctor invocation that previously truncated on haiku now runs on sonnet and completes

CoreyRDean added 3 commits April 18, 2026 08:16

CoreyRDean added the enhancement New feature or request label Apr 18, 2026

CoreyRDean added 2 commits April 18, 2026 08:31

chore: remove accidental worktree gitlink

f1b9296

CoreyRDean marked this pull request as ready for review April 18, 2026 13:31

fix(ci): remove unused datetime imports + bare f-strings (ruff F401/F…

25c4d34

…541)

CoreyRDean merged commit c5d9ac0 into main Apr 18, 2026
2 checks passed

CoreyRDean deleted the feat/complexity-scale-sizing branch April 18, 2026 13:34

chatgpt-codex-connector Bot reviewed Apr 18, 2026

View reviewed changes

CoreyRDean mentioned this pull request Apr 18, 2026

fix(sizing): auto-promote haiku to sonnet when MCP surface is loaded #102

Merged

6 tasks

CoreyRDean mentioned this pull request Apr 20, 2026

Migrate semantic interpreter to task_complexity_scale (INTENT §4 v3 compliance) #135

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sizing): complexity-scale-based derivation for all Claude sessions#101

feat(sizing): complexity-scale-based derivation for all Claude sessions#101
CoreyRDean merged 6 commits into
mainfrom
feat/complexity-scale-sizing

CoreyRDean commented Apr 18, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

CoreyRDean commented Apr 18, 2026

Summary

What changed

New module: `lib/sizing.py`

Doctor changes

New CLI surfaces

Scheduler

YAML parser (both scheduler and CLI)

Documentation

Backward compatibility

Known-failure regression checks

Test plan

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant