feat(provider): add GitHub Copilot auto model selection#20136
Draft
fastdrumr wants to merge 12 commits into anomalyco:dev from
Conversation
fastdrumr force-pushed from 11e0baf to 0285d04
Implements Copilot's server-side model routing protocol so users can select "Auto (Best for task)" and get the optimal model chosen per turn, with a 10% billing discount.

Protocol (reverse-engineered from the VS Code Copilot Chat extension):
- POST /models/session — creates an auto-mode session
- POST /models/session/intent — per-turn routing via the Copilot-Session-Token header
- The session token is forwarded to chat/responses requests for discount tracking

Includes fallback logic for image requests, same-prompt reuse, session-expiry refresh, and provider-sticky model selection.

Closes anomalyco#10093

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
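For reference, a minimal sketch of the flow those two endpoints imply, written with plain fetch. The base URL and the request/response field names (sessionToken, prompt, model) are assumptions; only the endpoint paths and the Copilot-Session-Token header come from the commit message:

```ts
// Sketch of the reverse-engineered routing flow (base URL and field names assumed).
const BASE = "https://api.githubcopilot.com"

// Create an auto-mode session; assumed to return a session token.
async function createAutoSession(bearer: string): Promise<string> {
  const res = await fetch(`${BASE}/models/session`, {
    method: "POST",
    headers: { Authorization: `Bearer ${bearer}` },
  })
  const body = await res.json()
  return body.sessionToken // assumed field name
}

// Per-turn routing: send the prompt with the session token and read back
// the server's model choice. The same token is later forwarded on the
// chat/responses request so the 10% discount is tracked.
async function routeTurn(bearer: string, token: string, prompt: string) {
  const res = await fetch(`${BASE}/models/session/intent`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${bearer}`,
      "Content-Type": "application/json",
      "Copilot-Session-Token": token,
    },
    body: JSON.stringify({ prompt }), // assumed request shape
  })
  const body = await res.json()
  return body.model as string // assumed: the server-chosen model ID
}
```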
The Copilot server-side routing API (/models/session) is restricted to first-party OAuth clients (VS Code). Implement client-side model selection using prompt-complexity heuristics instead:
- reasoning prompts → Opus/GPT-5.4 tier
- standard coding tasks → Sonnet/Codex tier
- short/simple prompts → Mini/Haiku/Flash tier

Tested: "say hello" → gpt-5-mini, "explain security..." → claude-opus-4.6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The capabilities type in the copilot plugin auth context doesn't include the interleaved field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove family, release_date, variants, and interleaved fields that don't exist in the v1 SDK Model type used by the plugin system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update the assistant message's modelID from "auto" to the actual resolved model (e.g., "gpt-5-mini") via the response-metadata stream event. The TUI message header now shows which model was actually used. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
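A rough sketch of what handling that event could look like, assuming the AI SDK v1 stream-part type; applyResolvedModel is a hypothetical helper, and how the session processor actually wires this up isn't shown here:

```ts
import type { LanguageModelV1StreamPart } from "@ai-sdk/provider"

// Hypothetical handler: when the auto model emits response-metadata with
// the concrete model it routed to, overwrite the message's "auto" modelID
// so the TUI header can show the real model.
function applyResolvedModel(
  part: LanguageModelV1StreamPart,
  message: { modelID: string },
) {
  if (part.type === "response-metadata" && part.modelId) {
    message.modelID = part.modelId // "auto" -> e.g. "gpt-5-mini"
  }
}
```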
Adds an "Auto (Best for task)" model to the GitHub Copilot provider that uses GPT-5-mini (free on Copilot) to classify prompt complexity and route to the best available model. Classification: GPT-5-mini rates each prompt 1-5, mapping to tiers: 1-2 → fast (mini/haiku/flash) 3 → standard (sonnet/gpt-4.1) 4-5 → reasoning (opus/codex) Model tiers are built dynamically from provider metadata (capabilities, output limits) — no hardcoded model lists. Prompts under 20 chars skip classification and go straight to fast. The TUI message header shows the resolved model name after each response (e.g., "Build · gpt-4.1 · 3s" instead of "Build · auto"). Closes anomalyco#10093 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fastdrumr force-pushed from 412de65 to 8b1792d
Codex models may not be available on all Copilot subscription tiers. Sort non-codex models first within each tier to maximize compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of sorting purely by output limits (which put GPT-5.4 above Opus for reasoning), use explicit preference lists per tier:
- Reasoning: Opus > GPT-5.4 > GPT-5.x > Sonnet
- Standard: Sonnet > GPT-4.1 > GPT-4o
- Fast: GPT-5-mini > GPT-5.4-mini > Haiku > Flash

Non-codex models are still preferred within equal preference rank.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
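A sketch of that ordering as a sort, with the non-codex preference from the previous commit as the tiebreak; the substring patterns used to match model IDs are assumptions:

```ts
type Tier = "reasoning" | "standard" | "fast"

// Explicit per-tier preference lists; earlier index = preferred.
const PREFERENCE: Record<Tier, string[]> = {
  reasoning: ["opus", "gpt-5.4", "gpt-5", "sonnet"],
  standard: ["sonnet", "gpt-4.1", "gpt-4o"],
  fast: ["gpt-5-mini", "gpt-5.4-mini", "haiku", "flash"],
}

function rank(tier: Tier, modelId: string): number {
  const i = PREFERENCE[tier].findIndex((p) => modelId.includes(p))
  return i === -1 ? PREFERENCE[tier].length : i // unknown models sort last
}

function sortCandidates(tier: Tier, models: string[]): string[] {
  return [...models].sort((a, b) => {
    const byRank = rank(tier, a) - rank(tier, b)
    if (byRank !== 0) return byRank
    // Equal rank: prefer non-codex, since codex models may be missing
    // on some subscription tiers.
    return Number(a.includes("codex")) - Number(b.includes("codex"))
  })
}
```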
If a model returns a "not supported" or similar error, mark it as unavailable and try the next candidate in the tier. Unavailable models are remembered for the session, so subsequent requests skip them immediately. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
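A sketch of that loop; isUnsupportedError is a hypothetical predicate standing in for however the real code recognizes "not supported"-style responses:

```ts
// Models that failed once are remembered for the rest of the session.
const unavailable = new Set<string>()

// Hypothetical check; the real matching logic isn't shown in the PR.
function isUnsupportedError(err: unknown): boolean {
  return err instanceof Error && /not supported/i.test(err.message)
}

async function requestWithFallback<T>(
  candidates: string[],
  send: (modelId: string) => Promise<T>,
): Promise<T> {
  for (const modelId of candidates) {
    if (unavailable.has(modelId)) continue // skip remembered failures
    try {
      return await send(modelId)
    } catch (err) {
      if (!isUnsupportedError(err)) throw err
      unavailable.add(modelId) // fall through to the next candidate
    }
  }
  throw new Error("no available model in tier")
}
```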
- Reasoning (3x credits): Opus, GPT-5.x, Gemini Pro
- Standard (1x credits): Sonnet, GPT-4.1, GPT-4o, Grok
- Fast (0.25x credits): Mini, Haiku, Flash

Fixes Gemini matching "mini" in the fast tier, and moves Sonnet to standard, where it belongs cost-wise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
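The Gemini bug deserves a concrete illustration: a bare substring test puts Gemini in the fast tier because "gemini" contains "mini". One possible fix, assuming the tiers match on model ID substrings (the actual pattern isn't shown in the PR), is to require a non-letter before the token:

```ts
// The naive bug: "gemini-pro".includes("mini") === true, which misfiled
// Gemini Pro into the 0.25x fast tier.
function isFastTier(modelId: string): boolean {
  // Require "mini" to start a token: string start or a non-letter before it.
  return (
    /(^|[^a-z])mini/.test(modelId) ||
    modelId.includes("haiku") ||
    modelId.includes("flash")
  )
}

console.log(isFastTier("gpt-5-mini")) // true: "mini" follows a hyphen
console.log(isFastTier("gemini-pro")) // false: "mini" sits inside "gemini"
```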
Tests cover tier classification, model preferences, fallback on unavailable models, resolved-model-ID tracking, and Gemini classification edge cases. All tests use a mocked classifier. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Set OPENCODE_AUTO_DRY_RUN=1 to use a local keyword heuristic instead of GPT-5-mini for classification. Useful for testing the routing logic without spending Copilot requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
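A sketch of what that switch might look like; the keyword buckets and classifyWithGpt5Mini are hypothetical, and only the OPENCODE_AUTO_DRY_RUN=1 flag and the heuristic-instead-of-LLM behavior come from the commit message:

```ts
// Hypothetical stand-in for the real GPT-5-mini classification request.
declare function classifyWithGpt5Mini(prompt: string): Promise<number>

// Assumed keyword buckets; the commit only says "local keyword heuristic".
function classifyDryRun(prompt: string): number {
  const p = prompt.toLowerCase()
  if (/(prove|architect|security|concurren|distributed)/.test(p)) return 5
  if (/(implement|refactor|debug|test)/.test(p)) return 3
  return 1
}

async function classify(prompt: string): Promise<number> {
  if (process.env.OPENCODE_AUTO_DRY_RUN === "1") {
    return classifyDryRun(prompt) // no Copilot request spent
  }
  return classifyWithGpt5Mini(prompt)
}
```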
Issue for this PR
Closes #10093
Type of change
What does this PR do?
Adds an "Auto (Best for task)" model to the GitHub Copilot provider that automatically selects the best available model for each request using GPT-5-mini (free on Copilot) as a semantic complexity classifier.
How it works:
1. GPT-5-mini rates each prompt's complexity from 1 to 5.
2. Scores 1-2 route to the fast tier (mini/haiku/flash), 3 to standard (sonnet/gpt-4.1), and 4-5 to reasoning (opus/codex).
3. Tiers are built dynamically from provider metadata, with explicit per-tier preference lists; prompts under 20 characters skip classification and go straight to fast.
4. If a model turns out to be unavailable, the next candidate in the tier is tried, and the failure is remembered for the rest of the session.
Why GPT-5-mini? We explored regex heuristics, claw-llm-router (15-dimension keyword scoring), fine-tuning bert-tiny on 113k labeled examples, RouteLLM's pre-trained BERT (278M params), and local LLMs (Qwen2.5-0.5B). GPT-5-mini outperformed all of them on coding prompts because it has the world knowledge to distinguish semantic complexity (e.g., "implement a linked list" vs. "implement a lock-free concurrent hash map"). It is free on Copilot, requires no model files or dependencies, and adds only 1-3s of latency per request.
Design decisions:
Note: The Copilot server-side routing API (/models/session) used by VS Code is restricted to first-party OAuth clients. A `chat.model` plugin hook (see #10093 discussion) would allow this to be an external plugin instead of a core change.

Files changed:
- packages/opencode/src/provider/sdk/copilot/auto-model.ts (new) — CopilotAutoLanguageModel with GPT-5-mini classification and tiered routing
- packages/opencode/src/provider/provider.ts — intercepts modelID === "auto" in the copilot custom loader; sorts "auto" first
- packages/opencode/src/plugin/copilot.ts — injects the "auto" model entry during auth loading
- packages/opencode/src/provider/sdk/copilot/index.ts — re-exports
- packages/opencode/src/session/processor.ts — updates the message modelID from "auto" to the resolved model name

How did you verify your code works?
Tested locally with `bun dev` and `opencode run --model github-copilot/auto --print-logs`.

Screenshots / recordings
N/A — the only UI change is the message header showing the resolved model name instead of "auto"
Checklist