
feat(provider): add GitHub Copilot auto model selection #20136

Draft
fastdrumr wants to merge 12 commits into anomalyco:dev from fastdrumr:feat/copilot-auto-model-selection

Conversation


@fastdrumr fastdrumr commented Mar 30, 2026

Issue for this PR

Closes #10093

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds an "Auto (Best for task)" model to the GitHub Copilot provider that automatically selects the best available model for each request using GPT-5-mini (free on Copilot) as a semantic complexity classifier.

How it works:

  1. User selects "Auto (Best for task)" from the Copilot model list
  2. Before each LLM call, the auto model sends the user prompt to GPT-5-mini asking for a 1-5 complexity rating
  3. The rating maps to model tiers: 1-2 = fast (mini/haiku/flash), 3 = standard (sonnet/gpt-4.1), 4-5 = reasoning (opus/codex)
  4. The best available model from the matching tier is selected
  5. The TUI message header shows the resolved model name (e.g., "Build · gpt-4.1 · 3s" instead of "Build · auto")
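
The steps above can be sketched roughly as follows; the function names are illustrative, not the PR's actual identifiers:

```typescript
type Tier = "fast" | "standard" | "reasoning";

// Step 3: map the classifier's 1-5 complexity rating onto model tiers.
function tierForRating(rating: number): Tier {
  if (rating <= 2) return "fast";      // mini / haiku / flash
  if (rating === 3) return "standard"; // sonnet / gpt-4.1
  return "reasoning";                  // opus / codex
}

// The classifier reply may contain extra text, so extract the first digit
// 1-5 and fall back to 3 (standard tier) on anything unparseable.
function parseRating(reply: string): number {
  const match = reply.match(/[1-5]/);
  return match ? Number(match[0]) : 3;
}

// Steps 2-4, including the sub-20-char shortcut from the design notes.
// `classify` stands in for the GPT-5-mini round trip.
function routePrompt(prompt: string, classify: (p: string) => string): Tier {
  if (prompt.trim().length < 20) return "fast";
  return tierForRating(parseRating(classify(prompt)));
}
```

Picking the concrete model within the returned tier is then a lookup against the tier lists built from provider metadata.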

Why GPT-5-mini? We explored regex heuristics, claw-llm-router (15-dimension keyword scoring), fine-tuning bert-tiny on 113k labeled examples, RouteLLM's pre-trained BERT (278M params), and local LLMs (Qwen2.5-0.5B). GPT-5-mini outperformed all of them on coding prompts because it has the world knowledge to distinguish semantic complexity (e.g., "implement a linked list" vs. "implement a lock-free concurrent hash map"). It is free on Copilot, requires no model files or extra dependencies, and adds 1-3s of latency per request.

Design decisions:

  • Model tiers built dynamically from provider metadata (capabilities, output limits) — no hardcoded model lists, works as new models are added
  • Only the last user message is classified, not the full conversation history
  • Prompts under 20 chars skip classification entirely (fast tier)
  • Classification errors fall back to standard tier
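
A minimal sketch of the dynamic tier construction, assuming hypothetical metadata fields (a `reasoning` capability flag and an `outputLimit`) rather than the provider's actual types:

```typescript
interface ModelInfo {
  id: string;
  reasoning: boolean;  // capability flag from provider metadata
  outputLimit: number; // max output tokens
}

type Tiers = { fast: string[]; standard: string[]; reasoning: string[] };

// Bucket models by metadata instead of a hardcoded list, so newly added
// models land in a tier automatically. Thresholds are illustrative.
function buildTiers(models: ModelInfo[]): Tiers {
  const tiers: Tiers = { fast: [], standard: [], reasoning: [] };
  for (const m of models) {
    // segment match so "gemini" does not count as "mini"
    if (/(^|[^a-z])mini|haiku|flash/.test(m.id)) tiers.fast.push(m.id);
    else if (m.reasoning || m.outputLimit >= 64_000) tiers.reasoning.push(m.id);
    else tiers.standard.push(m.id);
  }
  return tiers;
}
```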

Note: The Copilot server-side routing API (/models/session) used by VS Code is restricted to first-party OAuth clients. A chat.model plugin hook (see the #10093 discussion) would allow this to live in an external plugin instead of a core change.

Files changed:

  • packages/opencode/src/provider/sdk/copilot/auto-model.ts (new) — CopilotAutoLanguageModel with GPT-5-mini classification and tiered routing
  • packages/opencode/src/provider/provider.ts — intercepts modelID === "auto" in copilot custom loader; sorts "auto" first
  • packages/opencode/src/plugin/copilot.ts — injects the "auto" model entry during auth loading
  • packages/opencode/src/provider/sdk/copilot/index.ts — re-exports
  • packages/opencode/src/session/processor.ts — updates message modelID from "auto" to resolved model name

How did you verify your code works?

Tested locally with bun dev and opencode run --model github-copilot/auto --print-logs:

  • "hello" (6 chars) → skipped classification → fast → gpt-5.1-codex-mini
  • "implement a LRU cache" → GPT-5-mini rated 3 → standard → gpt-4.1
  • "design a distributed consensus algorithm similar to Raft" → GPT-5-mini rated 5 → reasoning → gpt-5.2-codex
  • "write a function to add two numbers" → GPT-5-mini rated 1 → fast → gpt-5.1-codex-mini
  • "debug why distributed cache returns stale data under high concurrency" → GPT-5-mini rated 4 → reasoning → gpt-5.2-codex
  • TUI message header updates to show resolved model name
  • Typecheck passes (bun turbo typecheck)

Screenshots / recordings

N/A — the only UI change is the message header showing the resolved model name instead of "auto"

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@fastdrumr fastdrumr force-pushed the feat/copilot-auto-model-selection branch 6 times, most recently from 11e0baf to 0285d04 on March 31, 2026 00:22
@fastdrumr fastdrumr marked this pull request as draft March 31, 2026 00:31
fastdrumr and others added 6 commits March 31, 2026 10:38
Implements Copilot's server-side model routing protocol so users can
select "Auto (Best for task)" and get the optimal model chosen per-turn
with a 10% billing discount.

Protocol (reverse-engineered from VS Code Copilot Chat extension):
- POST /models/session — creates an auto-mode session
- POST /models/session/intent — per-turn routing via Copilot-Session-Token header
- Session token forwarded to chat/responses requests for discount tracking

Includes fallback logic for image requests, same-prompt reuse, session
expiry refresh, and provider-sticky model selection.

Closes anomalyco#10093

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
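
For reference, the protocol above might be reconstructed roughly like this. The endpoint paths and the Copilot-Session-Token header come from the commit message; the host and body shapes are assumptions, and this API rejects third-party OAuth clients, so this is illustrative only:

```typescript
const BASE = "https://api.githubcopilot.com"; // assumed host

interface HttpRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// POST /models/session — create an auto-mode session.
function createSessionRequest(oauthToken: string): HttpRequest {
  return {
    method: "POST",
    url: `${BASE}/models/session`,
    headers: { Authorization: `Bearer ${oauthToken}` },
  };
}

// POST /models/session/intent — per-turn routing; the session token rides in
// the Copilot-Session-Token header and is later forwarded to chat/responses
// requests for billing-discount tracking.
function routeIntentRequest(sessionToken: string, prompt: string): HttpRequest {
  return {
    method: "POST",
    url: `${BASE}/models/session/intent`,
    headers: { "Copilot-Session-Token": sessionToken },
    body: JSON.stringify({ prompt }),
  };
}
```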
The Copilot server-side routing API (/models/session) is restricted
to first-party OAuth clients (VS Code). Implement client-side model
selection using prompt complexity heuristics instead:
- reasoning prompts → Opus/GPT-5.4 tier
- standard coding tasks → Sonnet/Codex tier
- short/simple prompts → Mini/Haiku/Flash tier

Tested: "say hello" → gpt-5-mini, "explain security..." → claude-opus-4.6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The capabilities type in the copilot plugin auth context doesn't
include the interleaved field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove family, release_date, variants, and interleaved fields that
don't exist in the v1 SDK Model type used by the plugin system.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the assistant message's modelID from "auto" to the actual
resolved model (e.g., "gpt-5-mini") via the response-metadata stream
event. The TUI message header now shows which model was actually used.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds an "Auto (Best for task)" model to the GitHub Copilot provider
that uses GPT-5-mini (free on Copilot) to classify prompt complexity
and route to the best available model.

Classification: GPT-5-mini rates each prompt 1-5, mapping to tiers:
  1-2 → fast (mini/haiku/flash)
  3   → standard (sonnet/gpt-4.1)
  4-5 → reasoning (opus/codex)

Model tiers are built dynamically from provider metadata (capabilities,
output limits) — no hardcoded model lists. Prompts under 20 chars
skip classification and go straight to fast.

The TUI message header shows the resolved model name after each
response (e.g., "Build · gpt-4.1 · 3s" instead of "Build · auto").

Closes anomalyco#10093

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fastdrumr fastdrumr force-pushed the feat/copilot-auto-model-selection branch from 412de65 to 8b1792d on March 31, 2026 09:39
fastdrumr and others added 6 commits March 31, 2026 11:11
Codex models may not be available on all Copilot subscription tiers.
Sort non-codex models first within each tier to maximize compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of sorting purely by output limits (which put GPT-5.4 above
Opus for reasoning), use explicit preference lists per tier:
- Reasoning: Opus > GPT-5.4 > GPT-5.x > Sonnet
- Standard: Sonnet > GPT-4.1 > GPT-4o
- Fast: GPT-5-mini > GPT-5.4-mini > Haiku > Flash

Non-codex models still preferred within equal preference rank.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
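
The preference lists above might be encoded roughly like this (the ranking helper and substring matching are illustrative):

```typescript
// Per-tier preference lists from the commit message; more-specific substrings
// come first so "gpt-5.4" wins over the generic "gpt-5" match.
const REASONING_PREFS = ["opus", "gpt-5.4", "gpt-5", "sonnet"];
const STANDARD_PREFS = ["sonnet", "gpt-4.1", "gpt-4o"];
const FAST_PREFS = ["gpt-5-mini", "gpt-5.4-mini", "haiku", "flash"];

function rank(id: string, prefs: string[]): number {
  const i = prefs.findIndex((p) => id.includes(p));
  return i === -1 ? prefs.length : i; // unknown models sort last
}

// Sort by preference rank; at equal rank, prefer non-codex models, since
// codex variants may be missing on some subscription tiers.
function sortTier(models: string[], prefs: string[]): string[] {
  return [...models].sort((a, b) => {
    const d = rank(a, prefs) - rank(b, prefs);
    return d !== 0 ? d : Number(a.includes("codex")) - Number(b.includes("codex"));
  });
}
```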
If a model returns "not supported" or similar error, mark it as
unavailable and try the next candidate in the tier. Unavailable
models are remembered for the session so subsequent requests skip
them immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
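
A synchronous sketch of that fallback (names are hypothetical; the real implementation wraps async stream calls):

```typescript
const unavailable = new Set<string>(); // remembered for the whole session

function callWithFallback(
  candidates: string[],
  call: (model: string) => string,
): string {
  for (const model of candidates) {
    if (unavailable.has(model)) continue; // skip known-bad models immediately
    try {
      return call(model);
    } catch (err) {
      if (String(err).includes("not supported")) {
        unavailable.add(model); // remember and try the next candidate
        continue;
      }
      throw err; // unrelated errors still surface to the caller
    }
  }
  throw new Error("no available model in tier");
}
```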
Reasoning (3x credits): Opus, GPT-5.x, Gemini Pro
Standard (1x credits): Sonnet, GPT-4.1, GPT-4o, Grok
Fast (0.25x credits): Mini, Haiku, Flash

Fixes gemini matching "mini" in fast tier, and moves Sonnet to
standard where it belongs cost-wise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
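
The "gemini matching mini" bug comes from bare substring checks: `"gemini-2.5-pro".includes("mini")` is true. A segment-boundary match avoids it (the regex is illustrative):

```typescript
// Match "mini" only as its own segment, so "gemini" no longer lands a
// Gemini Pro model in the 0.25x fast tier. Haiku and Flash stay fast.
function isFastTierId(id: string): boolean {
  return /(^|[^a-z])mini([^a-z]|$)|haiku|flash/.test(id.toLowerCase());
}
```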
Tests cover tier classification, model preferences, fallback on
unavailable models, resolved model ID tracking, and gemini
classification edge cases. All tests use mocked classifier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set OPENCODE_AUTO_DRY_RUN=1 to use a local keyword heuristic instead
of GPT-5-mini for classification. Useful for testing the routing
logic without spending Copilot requests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
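
A hypothetical shape of the dry-run switch (the env var name is from the commit; the keyword lists are illustrative, not the PR's actual heuristic):

```typescript
// Local stand-in for the GPT-5-mini classifier, gated by the env var, so the
// routing logic can be exercised without spending Copilot requests.
function heuristicRating(prompt: string): number {
  const p = prompt.toLowerCase();
  if (/distributed|consensus|lock-free|concurren/.test(p)) return 5;
  if (/implement|refactor|debug|design/.test(p)) return 3;
  return 1;
}

function classifyComplexity(prompt: string): number {
  if (process.env.OPENCODE_AUTO_DRY_RUN === "1") return heuristicRating(prompt);
  // Otherwise: ask GPT-5-mini for a 1-5 rating (network call omitted here).
  throw new Error("live classification not shown in this sketch");
}
```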


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Expose GitHub Copilot "Auto" option in model selector
