Important
Before using, please be aware of the following:
- Claude Code configuration: When using Claude Code, configure the model ID as `claude-opus-4-6` or `claude-opus-4.6`, without the `[1m]` suffix. Exceeding GitHub Copilot's context window limit by a wide margin may get your account banned. For an example Claude `settings.json`, see Manual Configuration with `settings.json`.
- Recommended for opencode: Prefer the opencode OAuth app. It matches opencode's built-in GitHub Copilot provider and avoids Terms of Service risk:
  npx @jeffreycao/copilot-api@latest --oauth-app=opencode start
- Built-in `codex` provider: Run `npx @jeffreycao/copilot-api@latest auth login --provider codex` once, and the gateway will persist and refresh Codex OAuth credentials automatically.
- Disable multi agent when using codex: If you use codex via GitHub Copilot, disable multi agent. Copilot currently charges codex traffic based on whether the last message has the user role, and that billing logic has not been adjusted.
- Note: See GitHub Copilot Security Notice for the warning removed from the README header.
A reverse-engineered GitHub Copilot integration that also works as a small AI gateway. Besides Copilot, it can route the built-in codex provider and configured third-party providers such as DashScope behind OpenAI- and Anthropic-compatible APIs, so tools like Claude Code can use one local endpoint.
On the GitHub Copilot path, the gateway prefers Copilot's native Anthropic-style Messages API when available, preserving more Claude-native behavior for tool-heavy workflows.
- OpenAI and Anthropic compatibility: Serve `/v1/responses`, `/v1/chat/completions`, `/v1/models`, `/v1/embeddings`, and `/v1/messages` from one local gateway.
- One gateway for Copilot, `codex`, and external providers: Route GitHub Copilot, the built-in `codex` provider, and configured third-party providers behind the same endpoint.
- Agent-friendly Claude handling on Copilot: Prefer native `/v1/messages` when available, preserve Claude-style tool flows, support Anthropic beta features, and keep subagent/session markers intact.
- Claude Code and OpenCode integration: Works with Claude Code and OpenCode, including direct Anthropic-compatible usage through `@ai-sdk/anthropic`.
- Flexible auth and deployment options: Supports interactive login or direct tokens, individual/business/enterprise plans, GitHub Enterprise, opencode OAuth, and custom data directories.
- Local control and visibility: Includes a usage dashboard, rate limiting, manual approval, and optional token visibility for debugging.
- Multi-provider routing: Expose provider-specific `/:provider/...` routes or use `model: "provider/model"` on the top-level API.
- Better token and context management: Supports exact Claude token counting and configurable GPT context compaction for long-running conversations.
- Bun (>= 1.2.x)
- Node.js if you plan to run the published CLI with `npx`
- GitHub account with Copilot subscription (individual, business, or enterprise)
To install dependencies, run:
bun install
To start the server directly from source:
bun run start start
You can run the project directly using npx:
npx @jeffreycao/copilot-api@latest start
With options:
npx @jeffreycao/copilot-api@latest start --port 8080
For authentication only:
npx @jeffreycao/copilot-api@latest auth
Important
Token usage storage uses Node's built-in `node:sqlite` module when running with npx. It is enabled on Node.js >= 22.13.0. On Node.js < 22.13.0, the CLI still starts, but token usage storage is disabled.
If you want token usage storage without upgrading Node.js, run the published CLI with Bun instead: `bunx --bun @jeffreycao/copilot-api@latest start`.
If you prefer a GUI, this repository also includes an Electron desktop app in `desktop/`. It supports GitHub Copilot sign-in or manual token entry, can start and stop the local proxy with one click, and shows the local endpoint, auth header, available models, usage, and logs in the app.
The settings screen also exposes OAuth App, API Home, Enterprise URL, verbose logging, and minimize-to-tray. Desktop packages are published in GitHub Releases:
https://github.com/caozhiyuan/copilot-api/releases
Download the installer for your platform, sign in inside the app, choose a port, start the server, then point your client at the local endpoint shown in the app. Packaged desktop builds use the bundled Electron runtime, so normal desktop usage does not require installing Node.js separately. Token usage history is enabled when that bundled runtime supports SQLite.
The desktop app's Advanced Config page reads and writes model mappings through GET/POST /admin/config/model-mappings. It uses auth.adminApiKey instead of the regular auth.apiKeys, and the app reads that key directly from config.json after the server has generated it on startup.
Main dashboard, token usage breakdown in the bundled Electron app:
Build the image:
docker build -t copilot-api .
Run the container with a bind mount so auth data survives restarts:
mkdir -p ./copilot-data
docker run -p 4141:4141 -v $(pwd)/copilot-data:/root/.local/share/copilot-api copilot-api
This stores GitHub auth data in `./copilot-data` on the host, mapped to `/root/.local/share/copilot-api` in the container.
Or pass a GitHub token directly:
docker run -p 4141:4141 -e GH_TOKEN=your_github_token_here copilot-api
Copilot API now uses a subcommand structure with these main commands:
- `start`: Start the Copilot API server. This command also handles authentication if needed.
- `auth`: Run the GitHub authentication flow without starting the server. This is typically used to generate a token for the `--github-token` option, especially in non-interactive environments.
- `check-usage`: Show your current GitHub Copilot usage and quota information directly in the terminal (no server required).
- `debug`: Display diagnostic information including version, runtime details, file paths, and authentication status. Useful for troubleshooting and support.
The following options can be used with any subcommand. When passing them before the subcommand, use the --key=value form:
| Option | Description | Default | Alias |
|---|---|---|---|
| --api-home | Path to the API home directory (sets COPILOT_API_HOME) | none | none |
| --oauth-app | OAuth app identifier (sets COPILOT_API_OAUTH_APP) | none | none |
| --enterprise-url | Enterprise URL for GitHub (sets COPILOT_API_ENTERPRISE_URL) | none | none |
The following command line options are available for the start command:
| Option | Description | Default | Alias |
|---|---|---|---|
| --port | Port to listen on | 4141 | -p |
| --verbose | Enable verbose logging | false | -v |
| --account-type | Account type to use (individual, business, enterprise) | individual | -a |
| --manual | Enable manual request approval | false | none |
| --rate-limit | Rate limit in seconds between requests | none | -r |
| --wait | Wait instead of error when rate limit is hit | false | -w |
| --github-token | Provide GitHub token directly (must be generated using the auth subcommand) | none | -g |
| --claude-code | Generate a command to launch Claude Code with Copilot API config | false | -c |
| --show-token | Show GitHub and Copilot tokens on fetch and refresh | false | none |
| --proxy-env | Initialize proxy from environment variables | false | none |
The following options are available for the auth command:
| Option | Description | Default | Alias |
|---|---|---|---|
| --verbose | Enable verbose logging | false | -v |
| --show-token | Show GitHub token on auth | false | none |
The following options are available for the debug command:
| Option | Description | Default | Alias |
|---|---|---|---|
| --json | Output debug info as JSON | false | none |
- Location: `~/.local/share/copilot-api/config.json` (Linux/macOS) or `%USERPROFILE%\.local\share\copilot-api\config.json` (Windows).
- Default shape:
  {
    "auth": {
      "apiKeys": [],
      "adminApiKey": "<auto-generated-on-startup>"
    },
    "providers": {
      "custom": {
        "type": "anthropic",
        "enabled": true,
        "baseUrl": "your-base-url",
        "apiKey": "sk-your-provider-key",
        "authType": "x-api-key",
        "adjustInputTokens": false,
        "models": {
          "kimi-k2.5": { "temperature": 1, "topP": 0.95 }
        }
      },
      "dashscope": {
        "type": "openai-compatible",
        "enabled": true,
        "baseUrl": "https://dashscope.aliyuncs.com/compatible-mode",
        "apiKey": "sk-your-dashscope-key",
        "models": {
          "qwen3.6-plus": {
            "temperature": 1,
            "topP": 0.95,
            "topK": 20,
            "extraBody": { "preserve_thinking": true },
            "contextCache": true
          },
          "glm-5.1": {
            "temperature": 0.7,
            "topP": 0.95,
            "contextCache": true,
            "extraBody": { "preserve_thinking": true }
          }
        }
      }
    },
    "modelMappings": {},
    "extraPrompts": {
      "gpt-5-mini": "<built-in exploration prompt>",
      "gpt-5.3-codex": "<built-in commentary prompt>",
      "gpt-5.4-mini": "<built-in commentary prompt>",
      "gpt-5.4": "<built-in commentary prompt>"
    },
    "smallModel": "gpt-5-mini",
    "responsesApiContextManagementModels": [],
    "modelReasoningEfforts": {
      "gpt-5-mini": "low",
      "gpt-5.3-codex": "xhigh",
      "gpt-5.4-mini": "xhigh",
      "gpt-5.4": "xhigh"
    },
    "useMessagesApi": true,
    "useResponsesApiWebSocket": true,
    "useResponsesApiWebSearch": true
  }
- auth.apiKeys: API keys used for request authentication on non-admin routes. Supports multiple keys for rotation. Requests can authenticate with either `x-api-key: <key>` or `Authorization: Bearer <key>`. If empty or omitted, authentication for non-admin routes is disabled.
- auth.adminApiKey: Single admin key used only for `/admin/*` routes. If missing, the server generates a random key at startup and writes it back to `config.json`. Requests use the same `x-api-key` or `Authorization: Bearer` headers, but regular `auth.apiKeys` never grant access to `/admin/*`.
- modelMappings: Exact `sourceModel -> targetModel` rewrites for top-level `POST /v1/messages` and `POST /v1/messages/count_tokens` requests. Omit it or leave it as `{}` to disable rewrites. Both the source and target must be non-empty strings. Targets can be regular model IDs or `provider/model` aliases such as `dashscope/qwen3.6-plus`, and the rewrite happens before provider alias parsing. The admin endpoints `GET/POST /admin/config/model-mappings` read and update only this field.
- extraPrompts: Map of `model -> prompt` appended to the first system prompt when translating Anthropic-style requests to Copilot. Use this to inject guardrails or guidance per model. Missing default entries are auto-added without overwriting your custom prompts. The built-in prompts for `gpt-5.3-codex` and `gpt-5.4` enable phase-aware commentary, which lets the model emit a short user-facing progress update before tools or deeper reasoning.
- providers: Global upstream provider map. Each provider key (for example `dashscope`) becomes a route prefix (`/dashscope/v1/messages`). Supports `type: "anthropic"`, `type: "openai-compatible"`, and `type: "openai-responses"`. Top-level clients can also use `model: "dashscope/model-id"` with `/v1/messages`, `/v1/messages/count_tokens`, and `/v1/responses`; the gateway strips the `dashscope/` prefix before forwarding upstream. `GET /v1/models` does not aggregate provider models; use `GET /dashscope/v1/models` for provider model lists.
  - `enabled` defaults to `true` if omitted.
  - `baseUrl` should be the provider API base URL without the final endpoint. For Anthropic providers, omit `/v1/messages`; for OpenAI-compatible providers, omit `/v1/chat/completions`; for OpenAI Responses providers, omit `/v1/responses`.
  - `apiKey` is used as the upstream credential value and is required for regular providers.
  - `authType` (optional): Controls how `apiKey` is sent upstream. Supports `x-api-key` and `authorization` for regular providers. Anthropic providers default to `x-api-key`; OpenAI-compatible and OpenAI Responses providers default to `authorization`. When set to `authorization`, the proxy sends `Authorization: Bearer <apiKey>`. `oauth2` is reserved for the built-in `codex` provider and is written automatically by `auth login --provider codex`.
  - `adjustInputTokens` (optional): When `true`, the proxy adjusts `input_tokens` in the usage response by subtracting `cache_read_input_tokens` and `cache_creation_input_tokens`.
  - `models` (optional): Per-model configuration map. Each key is a model ID (matching the model name in requests), and the value is:
    - `temperature` (optional): Default temperature value used when the request does not specify one.
    - `topP` (optional): Default top_p value used when the request does not specify one.
    - `topK` (optional): Default top_k value used when the request does not specify one.
    - `extraBody` (optional): Dynamic fields merged into the upstream request body for that model. Request body fields with the same name take precedence. OpenAI-compatible providers can use this for fields such as `enable_thinking`, `preserve_thinking`, and `reasoning_effort`. `thinking_budget` is a special OpenAI-compatible provider override: when configured in `extraBody`, it is forced after Anthropic `thinking.budget_tokens` translation and overrides the request-derived budget.
    - `contextCache` (optional): Defaults to `true` for OpenAI-compatible providers. This enables Alibaba Cloud Model Studio/DashScope explicit context cache by injecting `cache_control: { "type": "ephemeral" }` on up to 4 content blocks using the Context Cache format. The cache breakpoint strategy matches opencode's main provider flow: the first 2 system messages plus the last 2 non-system messages. Marked string content is converted to text content part arrays for `system`/`user`/`assistant`/`tool` messages; existing array content is marked on the last part. Set this to `false` when the model already supports implicit caching, or when the upstream does not accept this explicit-cache extension field.
    - `supportPdf` (optional): Controls whether the model supports PDF/document content. Defaults to `false`; unsupported PDFs are converted to a text notice. Set it to `true` to send PDF/document blocks as OpenAI Chat Completions file parts.
    - `toolContentSupportType` (optional): Tool result content capabilities for that model, as an array of `array`, `image`, and `pdf`. Provider routes default to string-only tool content when omitted. If `supportPdf` is `true` but this list does not include `pdf`, file parts in tool results are moved to user role messages. This provider default does not change the Copilot main flow, which continues to support array + image and not PDF.
- smallModel: Fallback model used for tool-less warmup messages (e.g., Claude Code probe requests); defaults to gpt-5-mini.
- responsesApiContextManagementModels: List of GPT model IDs that should receive Responses API `context_management` compaction instructions. This defaults to `[]`, so you need to opt in explicitly. A good starting point is `["gpt-5-mini", "gpt-5.3-codex", "gpt-5.4-mini", "gpt-5.4"]`. When enabled, the request includes `context_management` in the body and keeps only the latest compaction carrier on follow-up turns. The actual compaction is handled server-side and appears to begin when usage approaches roughly 90% of the model's `maxPromptTokens`, which makes it especially useful for long-running tasks. In practice, the effective `compact_threshold` also appears to be fixed on the server side, so changing it in this project does not currently alter compaction behavior. At the moment, this optimization is intended for GPT-family models only.
- modelReasoningEfforts: Per-model `reasoning.effort` sent to the Copilot Responses API. Allowed values are `none`, `minimal`, `low`, `medium`, `high`, and `xhigh`. If a model isn't listed, `high` is used by default.
- useMessagesApi: When `true`, Claude-family models that support Copilot's native `/v1/messages` endpoint will use the Messages API; otherwise they fall back to `/chat/completions`. Set to `false` to disable Messages API routing and always use `/chat/completions`. Defaults to `true`.
- useResponsesApiWebSocket: When `true`, Responses API requests use Copilot's websocket transport for models that advertise `ws:/responses`; models that only advertise `/responses` continue to use HTTP. Set to `false` to disable websocket routing and use HTTP `/responses` whenever the selected model supports it. Defaults to `true`.
- useResponsesApiWebSearch: When `true`, the server keeps Responses API tools with `type: "web_search"` and forwards them upstream. Set to `false` to strip those tools from `/responses` payloads. Defaults to `true`.
- claudeTokenMultiplier: Multiplier applied to the fallback GPT-tokenizer estimate for Claude `/v1/messages/count_tokens` requests. Defaults to `1.15`. Increase it if your client is still compacting too late. This setting is only used when the proxy is estimating Claude tokens locally; if `anthropicApiKey` is configured and Anthropic token counting succeeds, the exact Anthropic count is returned instead.
- anthropicApiKey: Anthropic API key used to forward Claude `/v1/messages/count_tokens` requests to Anthropic's real token counting endpoint, which returns exact counts instead of GPT tokenizer estimates. Can also be set via the `ANTHROPIC_API_KEY` environment variable. If not set, or if the upstream call fails, token counting falls back to local GPT tokenizer estimation controlled by `claudeTokenMultiplier`.
Edit this file to customize prompts or swap in your own fast model. Restart the server (or rerun the command) after changes so the cached config is refreshed.
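The ordering described for `modelMappings` (exact rewrite first, provider alias parsing second) can be sketched as a small pure function. This is a minimal illustration, not the gateway's actual code: `resolveModel`, `ConfigSketch`, and `ResolvedModel` are hypothetical names, and treating only configured, enabled providers as alias prefixes is an assumption.

```typescript
// Hypothetical types and helper for illustration; not the gateway's real code.
interface ProviderSketch {
  enabled?: boolean; // documented default: true when omitted
}

interface ConfigSketch {
  modelMappings?: Record<string, string>;
  providers?: Record<string, ProviderSketch>;
}

interface ResolvedModel {
  provider: string | null; // null means the default Copilot path
  model: string; // model ID forwarded upstream, prefix stripped
}

function resolveModel(requested: string, config: ConfigSketch): ResolvedModel {
  // Step 1: exact sourceModel -> targetModel rewrite (no pattern matching).
  const mappings: Record<string, string> = config.modelMappings ?? {};
  const target = mappings[requested];
  const mapped =
    typeof target === "string" && target.length > 0 ? target : requested;

  // Step 2: provider alias parsing runs on the rewritten model ID.
  const slash = mapped.indexOf("/");
  if (slash > 0) {
    const prefix = mapped.slice(0, slash);
    const providers: Record<string, ProviderSketch> = config.providers ?? {};
    const known = Object.prototype.hasOwnProperty.call(providers, prefix);
    // Assumption: only configured, enabled providers count as alias prefixes.
    if (known && providers[prefix]?.enabled !== false) {
      return { provider: prefix, model: mapped.slice(slash + 1) };
    }
  }
  return { provider: null, model: mapped };
}
```

Under these assumptions, a mapping whose target is `dashscope/qwen3.6-plus` resolves to provider `dashscope` with model `qwen3.6-plus`, while an unmapped ID such as `gpt-5.4` stays on the Copilot path unchanged.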
- Protected non-admin routes: All routes except `/`, `/usage-viewer`, and `/usage-viewer/` require authentication when `auth.apiKeys` is configured and non-empty.
- Admin routes: All `/admin/*` routes require `auth.adminApiKey`. If it is missing, the server generates one at startup and persists it to `config.json` before serving requests.
- Allowed auth headers:
  - `x-api-key: <your_key>`
  - `Authorization: Bearer <your_key>`
- CORS preflight: `OPTIONS` requests are always allowed.
- When no regular keys are configured: Non-admin routes continue to allow requests. This does not apply to `/admin/*`, which only accepts `auth.adminApiKey`.
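The rules above can be read as a single decision function. The following is a rough sketch under stated assumptions, not the server's implementation: `isAuthorized`, `extractKey`, and the request shape are hypothetical, header names are assumed to be lower-cased already, and checking `x-api-key` before `Authorization` is an arbitrary choice.

```typescript
// Hypothetical shapes for illustration only.
interface AuthConfigSketch {
  apiKeys?: string[];
  adminApiKey: string;
}

interface RequestSketch {
  method: string;
  path: string;
  headers: Record<string, string>; // assumed to use lower-cased names
}

// Routes the README lists as exempt from regular API-key checks.
const PUBLIC_PATHS = new Set(["/", "/usage-viewer", "/usage-viewer/"]);

function extractKey(headers: Record<string, string>): string | undefined {
  const direct = headers["x-api-key"];
  if (direct) return direct;
  const bearer = headers["authorization"];
  if (bearer && bearer.startsWith("Bearer ")) {
    return bearer.slice("Bearer ".length);
  }
  return undefined;
}

function isAuthorized(req: RequestSketch, auth: AuthConfigSketch): boolean {
  // CORS preflight is always allowed.
  if (req.method.toUpperCase() === "OPTIONS") return true;

  const key = extractKey(req.headers);

  // Admin routes accept only the admin key, never a regular key.
  if (req.path.startsWith("/admin/")) {
    return key !== undefined && key === auth.adminApiKey;
  }

  if (PUBLIC_PATHS.has(req.path)) return true;

  // Empty or omitted apiKeys disables auth for non-admin routes.
  const keys: string[] = auth.apiKeys ?? [];
  if (keys.length === 0) return true;
  return key !== undefined && keys.includes(key);
}
```

The sketch makes the asymmetry explicit: an empty `auth.apiKeys` opens non-admin routes, but nothing opens `/admin/*` except the admin key.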
Example request for a regular protected route:
curl http://localhost:4141/v1/models \
  -H "x-api-key: your_api_key"
Example request for an admin route:
curl http://localhost:4141/admin/config/model-mappings \
  -H "x-api-key: your_admin_api_key"
The server exposes several endpoints to interact with the Copilot API. It provides OpenAI-compatible endpoints and now also includes support for Anthropic-compatible endpoints, allowing for greater flexibility with different tools and services.
These endpoints mimic the OpenAI API structure.
| Endpoint | Method | Description |
|---|---|---|
| `POST /v1/responses` | POST | OpenAI's most advanced interface for generating model responses. Supports `provider/model` aliases for `openai-responses` providers. |
| `POST /v1/chat/completions` | POST | Creates a model response for the given chat conversation. |
| `GET /v1/models` | GET | Lists the currently available models. |
| `POST /v1/embeddings` | POST | Creates an embedding vector representing the input text. |
These endpoints are designed to be compatible with the Anthropic Messages API.
| Endpoint | Method | Description |
|---|---|---|
| `POST /v1/messages` | POST | Creates a model response for a given conversation. Supports `provider/model` aliases for configured providers. |
| `POST /v1/messages/count_tokens` | POST | Calculates the number of tokens for a given set of messages. Supports `provider/model` aliases for configured providers. |
| `POST /:provider/v1/messages` | POST | Proxies Anthropic Messages requests to the configured Anthropic, OpenAI-compatible, or OpenAI Responses provider. |
| `GET /:provider/v1/models` | GET | Proxies model listing requests to the configured provider. |
| `POST /:provider/v1/messages/count_tokens` | POST | Calculates tokens locally for provider route requests. |
New endpoints for monitoring your Copilot usage and quotas.
| Endpoint | Method | Description |
|---|---|---|
| `GET /usage` | GET | Get detailed Copilot usage statistics and quota information. |
| `GET /token` | GET | Get the current Copilot token being used by the API. |
These endpoints are reserved for local administrative actions and only accept auth.adminApiKey.
| Endpoint | Method | Description |
|---|---|---|
| `GET /admin/config/model-mappings` | GET | Returns the current `config.json` path and the active `modelMappings` map. |
| `POST /admin/config/model-mappings` | POST | Updates only the `modelMappings` field in `config.json` and returns it. |
Common npx commands:
# Start the gateway
npx @jeffreycao/copilot-api@latest start
# Start on a custom port with verbose logging
npx @jeffreycao/copilot-api@latest start --port 8080 --verbose
# Run the auth flow
npx @jeffreycao/copilot-api@latest auth login
# Check Copilot usage without starting the server
npx @jeffreycao/copilot-api@latest check-usage
# Print debug information as JSON
npx @jeffreycao/copilot-api@latest debug --json
# Run the published CLI with Bun instead of Node.js
bunx --bun @jeffreycao/copilot-api@latest start
This AI gateway can be used to power Claude Code, an experimental conversational AI assistant for developers from Anthropic.
There are two ways to configure Claude Code to use this AI gateway:
To get started, run the start command with the --claude-code flag:
npx @jeffreycao/copilot-api@latest start --claude-code
You will be prompted to select a primary model and a "small, fast" model for background tasks. After selecting the models, a command will be copied to your clipboard. This command sets the necessary environment variables for Claude Code to use the gateway.
Paste and run this command in a new terminal to launch Claude Code.
Alternatively, you can configure Claude Code by creating a .claude/settings.json file in your project's root directory. This file should contain the environment variables needed by Claude Code. This way you don't need to run the interactive setup every time.
Here is an example .claude/settings.json file:
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:4141",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "gpt-5.4",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "gpt-5.4",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "gpt-5-mini",
    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0",
    "CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION": "false",
    "CLAUDE_CODE_DISABLE_TERMINAL_TITLE": "true",
    "CLAUDE_CODE_ENABLE_AWAY_SUMMARY": "0",
    "CLAUDE_PLUGIN_ENABLE_QUESTION_RULES": "true"
  },
  "permissions": {
    "deny": [
      "WebSearch",
      "mcp__ide__executeCode"
    ]
  }
}
- Replace `ANTHROPIC_MODEL`, `ANTHROPIC_DEFAULT_OPUS_MODEL`, `ANTHROPIC_DEFAULT_SONNET_MODEL`, and `ANTHROPIC_DEFAULT_HAIKU_MODEL` according to your needs. After configuration, install the Claude Code plugins described in Plugin Integrations. If you configure a Claude model, set all model variables to the same value so behavior stays consistent with the GitHub Copilot Claude agent.
- Setting `CLAUDE_CODE_ATTRIBUTION_HEADER` to `0` prevents Claude Code from adding billing and version information to system prompts, which avoids prompt cache invalidation.
- Turning off `CLAUDE_CODE_ENABLE_PROMPT_SUGGESTION` and `CLAUDE_CODE_ENABLE_AWAY_SUMMARY` prevents quota from being consumed unnecessarily.
- `permissions.deny` includes `WebSearch` because the GitHub Copilot API does not support native web search (some GPT models support it, but this project has not adapted it). Install the `mcp_server_fetch` MCP tool or another search tool as an alternative.
- If you use a non-Claude model, do not enable `ENABLE_TOOL_SEARCH`. If you use a Claude model, you can enable it. Claude Code currently uses the client tool search mode, in which loading deferred tools requires an additional request each time.
You can find more options here: Claude Code settings
You can also read more about IDE integration here: Add Claude Code to your IDE
For GPT Responses models such as gpt-5.4+, this AI gateway can expose Responses tool_search through a small MCP bridge. The same bridge can be used by Claude Code and opencode, as long as the client loads MCP servers and sends Anthropic Messages traffic through this gateway.
Do not set Claude Code's native ENABLE_TOOL_SEARCH for GPT models. That flag enables Claude Code's own client-side tool search mode, and it may stop forwarding deferred tool definitions. This gateway needs the full tool definitions so it can keep the small always-loaded tool set eager and translate every other tool into Responses deferred namespaces.
If you install tool-search@copilot-api-marketplace, Claude Code receives this MCP bridge automatically and you can skip the manual Claude Code MCP setup below.
Add the tool search bridge to the MCP config used by Claude Code:
{
  "mcpServers": {
    "tool_search": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@jeffreycao/copilot-api@latest", "mcp"]
    }
  }
}
Add the tool search bridge to the MCP config used by opencode:
{
  "mcp": {
    "tool_search": {
      "type": "local",
      "command": ["npx", "-y", "@jeffreycao/copilot-api@latest", "mcp"]
    }
  }
}
For local development, use `bun` as the command and `["run", "./src/main.ts", "mcp"]` as the args.
Internally, the gateway now configures OpenAI Responses tool_search in client-executed mode. Deferred tools are still exposed as searchable namespaces, but the model is explicitly asked to return the exact deferred tool names it wants to load next.
The bridge uses direct tool selection, not query search. Its tool input is names, a comma-separated list of exact deferred tool names, for example TaskList,TaskGet,mcp__fetch__fetch.
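To make the `names` input format concrete, here is a minimal sketch of how a comma-separated list of exact tool names could be split. `parseDeferredToolNames` is a hypothetical helper, and trimming whitespace, skipping empty entries, and dropping duplicates are assumptions; the bridge's real parsing may differ.

```typescript
// Hypothetical helper: splits the bridge's `names` input into exact tool names.
function parseDeferredToolNames(names: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of names.split(",")) {
    const name = raw.trim(); // assumption: surrounding whitespace is ignored
    if (name.length === 0 || seen.has(name)) continue; // skip blanks and repeats
    seen.add(name);
    result.push(name);
  }
  return result;
}

// The README's example input yields three exact names, in order.
const requested = parseDeferredToolNames("TaskList,TaskGet,mcp__fetch__fetch");
console.log(requested);
```

Because selection is by exact name rather than by query, a misspelled entry would simply fail to match any deferred tool instead of returning a near match.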
OpenCode already has a direct GitHub Copilot provider. Use this section when you want OpenCode to point at this AI gateway through @ai-sdk/anthropic and reuse the agent behaviors described earlier in this README.
Start the AI gateway with the OpenCode OAuth app:
npx @jeffreycao/copilot-api@latest --oauth-app=opencode start
Then point OpenCode at the gateway with `@ai-sdk/anthropic`.
Example ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "model": "local/gpt-5.4",
  "small_model": "local/gpt-5-mini",
  "agent": {
    "build": {
      "model": "local/gpt-5.4"
    },
    "plan": {
      "model": "local/gpt-5.4"
    },
    "explore": {
      "model": "local/gpt-5-mini"
    }
  },
  "provider": {
    "local": {
      "npm": "@ai-sdk/anthropic",
      "name": "Copilot API Proxy",
      "options": {
        "baseURL": "http://localhost:4141/v1",
        "apiKey": "dummy"
      },
      "models": {
        "gpt-5.4": {
          "name": "gpt-5.4",
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          },
          "limit": {
            "context": 272000,
            "output": 128000
          }
        },
        "gpt-5-mini": {
          "name": "gpt-5-mini",
          "limit": {
            "context": 128000,
            "output": 64000
          }
        },
        "claude-sonnet-4.6": {
          "id": "claude-sonnet-4.6",
          "name": "claude-sonnet-4.6",
          "modalities": {
            "input": ["text", "image"],
            "output": ["text"]
          },
          "limit": {
            "context": 128000,
            "output": 32000
          },
          "options": {
            "thinking": {
              "type": "enabled",
              "budgetTokens": 31999
            }
          }
        }
      }
    }
  }
}
Why these fields matter:
- `npm: "@ai-sdk/anthropic"` is the important part. OpenCode will speak Anthropic Messages semantics to this AI gateway instead of flattening everything into OpenAI Chat Completions.
- `options.baseURL` should be `http://localhost:4141/v1`; the Anthropic SDK will append `/messages`, `/models`, and `/messages/count_tokens` automatically.
- `model`, `small_model`, and `agent.*.model` let you keep `gpt-5.4` for build/plan work while routing exploration and background work to `gpt-5-mini`.
- If you enable `auth.apiKeys` in this AI gateway, replace `dummy` with a real key. Otherwise any placeholder value is fine.
Plugin integrations are available for Claude Code and opencode.
The Claude Code integration is packaged as two plugins:
- `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, so the gateway can infer `x-initiator: agent`.
- `tool-search` registers the `tool_search` MCP bridge used for GPT Responses deferred tool loading.
- Marketplace catalog in this repository: `.claude-plugin/marketplace.json`
- Plugin sources in this repository: `plugin/claude/agent-inject`, `plugin/claude/tool-search`
Add the marketplace remotely:
/plugin marketplace add https://github.com/caozhiyuan/copilot-api.git
Install the plugins from the marketplace:
/plugin install agent-inject@copilot-api-marketplace
/plugin install tool-search@copilot-api-marketplace
After installation, `agent-inject` injects `__SUBAGENT_MARKER__...` on `SubagentStart`, and the gateway uses it to infer `x-initiator: agent`.
The `agent-inject` plugin also registers a `UserPromptSubmit` hook that returns `{"continue": true}`, and it can inject `SessionStart` reminder rules through environment variables:
- `CLAUDE_PLUGIN_ENABLE_QUESTION_RULES=1` enables the two reminders about using the `question` tool automatically for Claude Code. Alternatively, you can add the same reminders manually in `CLAUDE.md`; see CLAUDE.md or AGENTS.md Recommended Content.
- `CLAUDE_PLUGIN_ENABLE_NO_BACKGROUND_AGENTS_RULE=1` enables the `run_in_background: true` avoidance reminder for agent hooks.
The tool-search plugin bundles the same MCP bridge described in GPT Tool Search, so Claude Code users do not need to add the tool_search server manually when they install that plugin.
The subagent marker producer is packaged as an opencode plugin located at plugin/opencode/subagent-marker.js.
Installation:
Copy the plugin file to your opencode plugins directory:
# Clone or download this repository, then copy the plugin
cp plugin/opencode/subagent-marker.js ~/.config/opencode/plugins/
Or manually create the file at `~/.config/opencode/plugins/subagent-marker.js` with the plugin content.
Features:
- Tracks sub-sessions created by subagents
- Automatically prepends a marker system reminder (`__SUBAGENT_MARKER__...`) to subagent chat messages
- Sets the `x-session-id` header for session tracking
- Enables the gateway to infer `x-initiator: agent` for subagent-originated requests
The plugin hooks into session.created, session.deleted, chat.message, and chat.headers events to provide seamless subagent marker functionality.
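How a marker can drive the `x-initiator` decision is sketched below. This is a simplified guess at the idea, not the gateway's logic: `inferInitiator` and `MessageSketch` are hypothetical, only the `__SUBAGENT_MARKER__` prefix is taken from this README (the rest of the marker format is not shown here), and falling back to `"user"` when no marker is present is an assumption.

```typescript
// Hypothetical message shape; real requests carry richer content blocks.
interface MessageSketch {
  role: "system" | "user" | "assistant";
  text: string;
}

// Only the prefix is documented; the marker's payload is not assumed here.
const SUBAGENT_MARKER_PREFIX = "__SUBAGENT_MARKER__";

function inferInitiator(messages: MessageSketch[]): "agent" | "user" {
  // A marker anywhere in the conversation flags the request as
  // subagent-originated in this sketch.
  const marked = messages.some((m) => m.text.includes(SUBAGENT_MARKER_PREFIX));
  // Assumption: unmarked traffic is attributed to the user.
  return marked ? "agent" : "user";
}
```

The point of the marker is that the client, which knows a request came from a subagent, can tell the gateway so without any protocol extension beyond message text.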
After starting the server, a URL to the Copilot Usage Dashboard will be displayed in your console. This dashboard is a web interface for monitoring your API usage.
- Start the server. For example, using npx:
  npx @jeffreycao/copilot-api@latest start
- The server will output a URL to the usage viewer. Copy and paste this URL into your browser. It will look something like this:
  http://localhost:4141/usage-viewer?endpoint=http://localhost:4141/usage
  - If you use the `start.bat` script on Windows, this page will open automatically.
The dashboard provides a user-friendly interface to view your Copilot usage data:
Token usage history requires Bun or Node.js >= 22.13.0. On Node.js < 22.13.0, the server runs normally but token usage storage is disabled.
- API Endpoint URL: The dashboard is pre-configured to fetch data from your local server endpoint via the URL query parameter. You can change this URL to point to any other compatible API endpoint.
- Fetch Data: Click the "Fetch" button to load or refresh the usage data. The dashboard will automatically fetch data on load.
- Usage Quotas: View a summary of your usage quotas for different services like Chat and Completions, displayed with progress bars for a quick overview.
- Detailed Information: See the full JSON response from the API for a detailed breakdown of all available usage statistics.
- URL-based Configuration: You can also specify the API endpoint directly in the URL using a query parameter. This is useful for bookmarks or sharing links. For example:
http://localhost:4141/usage-viewer?endpoint=http://your-api-server/usage
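If you generate dashboard links programmatically, a small helper along these lines may be useful. `buildUsageViewerUrl` is a hypothetical name, and percent-encoding the `endpoint` value is a defensive choice on my part: the unencoded form shown above typically works in browsers, but encoding matters once the endpoint URL carries its own query string.

```typescript
// Hypothetical helper: builds a usage-viewer link for a given usage endpoint.
function buildUsageViewerUrl(gatewayBase: string, usageEndpoint: string): string {
  const base = gatewayBase.replace(/\/+$/, ""); // drop trailing slashes
  // Encode the endpoint so any "?" or "&" inside it stays part of the value.
  return `${base}/usage-viewer?endpoint=${encodeURIComponent(usageEndpoint)}`;
}

const viewerLink = buildUsageViewerUrl(
  "http://localhost:4141/",
  "http://localhost:4141/usage",
);
console.log(viewerLink);
```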
The project can be run from source in several ways:
bun run dev start
bun run start start
- To avoid hitting GitHub Copilot's rate limits, you can use the following flags:
  - `--manual`: Enables manual approval for each request, giving you full control over when requests are sent.
  - `--rate-limit <seconds>`: Enforces a minimum time interval between requests. For example, `copilot-api start --rate-limit 30` ensures there is at least a 30-second gap between requests.
  - `--wait`: Use this with `--rate-limit`. It makes the server wait for the cooldown period to end instead of rejecting the request with an error. This is useful for clients that don't automatically retry on rate limit errors.
- If you have a GitHub business or enterprise plan account with Copilot, use the `--account-type` flag (e.g., `--account-type business`). See the official documentation for more details.
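The interaction between `--rate-limit` and `--wait` can be sketched as a pure decision function. This is an illustration under assumptions, not the server's code: `decideRateLimit` and `RateLimitDecision` are hypothetical names, and measuring the interval from the previous request's start time is assumed.

```typescript
// Hypothetical decision type for illustration.
type RateLimitDecision =
  | { action: "proceed" }
  | { action: "wait"; delayMs: number }
  | { action: "reject"; retryAfterMs: number };

function decideRateLimit(
  nowMs: number,
  lastRequestMs: number | undefined, // undefined: no earlier request
  rateLimitSeconds: number | undefined, // undefined: --rate-limit not set
  wait: boolean, // true when --wait is set
): RateLimitDecision {
  if (rateLimitSeconds === undefined || lastRequestMs === undefined) {
    return { action: "proceed" };
  }
  // Time still left in the cooldown window, if any.
  const remainingMs = lastRequestMs + rateLimitSeconds * 1000 - nowMs;
  if (remainingMs <= 0) return { action: "proceed" };
  return wait
    ? { action: "wait", delayMs: remainingMs }
    : { action: "reject", retryAfterMs: remainingMs };
}
```

With `--rate-limit 30`, a request arriving 10 seconds after the previous one has 20 seconds of cooldown left: it is rejected by default and delayed by 20 seconds when `--wait` is set.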
To add these reminders manually, include the following in CLAUDE.md for Claude Code, or AGENTS.md for opencode/codex:
- Prohibited from directly asking questions to users, MUST use question tool.
- Once you can confirm that the task is complete, MUST use question tool to make user confirm. The user may respond with feedback if they are not satisfied with the result, which you can use to make improvements and try again. After trying again, MUST use question tool to make user confirm again.


