
[ghcp-handoff] Auto-select model + effort based on handoff scope #24

@bminier

Description

Problem

The pre-flight confirmation now lets users pick model and effort interactively (#3 follow-up, landed on `ghcp-g1-streaming`), but the defaults are still Copilot's baselines for every handoff regardless of scope. A 1-stream trivial scaffold and a 4-stream intricate refactor both default to `claude-sonnet-4.5` + `medium`.

Why it matters

  • Wasted spend on simple handoffs that don't need `medium` (let alone `high`).
  • Under-powered runs on ambiguous refactors that would benefit from `high` but no one thought to flip it.
  • The sibling `/codex` skill already has per-mode effort defaults (review: `high`, consult: `medium`) — ghcp-handoff's one-size-fits-all feels like a regression.

Approach

Two levels of increasing ambition; implement incrementally:

Level A: default-by-scope (a couple of hours)

Heuristic based on collected inputs from guided mode:

| Streams | Boundaries | NOT-in-scope globs | Default effort |
|---------|------------|--------------------|----------------|
| 1       | ≤2         | ≤1                 | `low` |
| 2-3     | ≤5         | ≤3                 | `medium` (Copilot default; no flag) |
| 4+      | any        | any                | `high` |

Additionally, if judgment-heavy language is detected in the reasons, default to `high` regardless of size.

Model default stays `claude-sonnet-4.5` for all tiers. Pre-flight summary shows "Effort: medium (inferred from 2 streams)" so the reasoning is visible; user can still override via the loop.
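A minimal sketch of what the Level A mapping could look like. `HandoffScope`, its field names, and the keyword list are illustrative assumptions, not the skill's actual data model:

```python
# Sketch of the Level A default-by-scope heuristic.
# HandoffScope and JUDGMENT_KEYWORDS are invented for illustration;
# the real guided-mode inputs may be shaped differently.
from dataclasses import dataclass, field

JUDGMENT_KEYWORDS = {"design", "decide", "refactor", "tradeoff", "ambiguous"}

@dataclass
class HandoffScope:
    streams: int
    boundaries: int
    not_in_scope_globs: int
    reasons: list = field(default_factory=list)  # free-text scope reasons

def default_effort(scope: HandoffScope) -> str:
    """Map collected guided-mode inputs to a default effort tier."""
    text = " ".join(scope.reasons).lower()
    if any(kw in text for kw in JUDGMENT_KEYWORDS):
        return "high"  # judgment-heavy language overrides the size tiers
    if scope.streams >= 4:
        return "high"
    if scope.streams == 1 and scope.boundaries <= 2 and scope.not_in_scope_globs <= 1:
        return "low"
    return "medium"  # Copilot default; emit no flag
```

Scopes that fall between the `low` and `high` tiers (e.g., 1 stream with 4 boundaries) deliberately fall through to `medium`, keeping the override loop as the escape hatch.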

Level B: auto-derive from task summary + deliverables (more work)

Add a scorer that reads the collected Q1 task summary, the per-stream deliverables, and the boundaries, then scores the handoff on dimensions like:

  • Mechanical vs. judgment-heavy (keyword signals: "rename", "scaffold", "wire" vs. "design", "decide", "refactor")
  • Scope breadth (file count estimates, cross-module references)
  • Novelty (new tech stack entries vs. familiar)

Map the score to `low`/`medium`/`high`. Show the scoring rationale in the pre-flight summary so users understand the pick — and can fix it quickly when the heuristic gets it wrong.
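The keyword dimension could be sketched along these lines; the word lists, weights, and thresholds below are invented for illustration and would need tuning against real handoffs:

```python
# Illustrative Level B scorer (keyword signals only).
# MECHANICAL / JUDGMENT lists and the weights are assumptions,
# not taken from any shipped implementation.
MECHANICAL = {"rename", "scaffold", "wire", "move", "copy"}
JUDGMENT = {"design", "decide", "refactor", "architecture", "tradeoff"}

def score_handoff(task_summary: str, deliverables: list[str]) -> str:
    """Score a handoff's text and map it to low/medium/high effort."""
    text = " ".join([task_summary, *deliverables]).lower()
    words = set(text.replace(",", " ").split())
    score = 0
    score += 2 * len(words & JUDGMENT)      # judgment signals push effort up
    score -= 1 * len(words & MECHANICAL)    # mechanical signals pull it down
    score += max(0, len(deliverables) - 3)  # breadth: many deliverables adds weight
    if score >= 3:
        return "high"
    if score <= -1:
        return "low"
    return "medium"
```

Whatever the final shape, the score breakdown (not just the tier) is what belongs in the pre-flight summary, so a wrong pick is obvious and cheap to override.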

Level A is probably 80% of the value; B is worth it only if A's heuristic turns out too coarse in practice.

Model selection

Less urgent. Copilot currently has three models (claude-sonnet-4.5 / claude-sonnet-4 / gpt-5) and the default is already the strongest. Revisit when:

  • Copilot adds meaningful cheaper tiers (claude-haiku equivalent)
  • BYOK makes per-task model selection meaningful (e.g., routing to an internal provider)

For now, scope-based model selection isn't high-leverage.

Review reference

Feature review conversation: "Do we have any method of choosing the copilot model and effort?" — fix #1 (interactive override) shipped on branch `ghcp-g1-streaming`; this issue tracks the smart-defaults follow-up.

Metadata

Assignees: No one assigned
Labels: `enhancement` (New feature or request), `ghcp-handoff` (Relates to /ghcp-handoff skill)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests