Claudia Router is a lightweight local Anthropic-compatible proxy that routes Claude-style /v1/messages requests to OpenAI-compatible model providers.
It is useful for experimenting with Claude-style coding workflows while using backends such as NVIDIA NIM, OpenRouter, LM Studio, Groq, DeepSeek, or local models.
- Accepts Anthropic-style `/v1/messages` requests
- Converts them to OpenAI-compatible `/chat/completions` requests (see the sketch after this list)
- Routes requests based on model aliases
- Returns Anthropic-style responses
- Converts Anthropic tool definitions/results to OpenAI-compatible function tools
- Logs backend, model, latency, and errors
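
At its core this is a request-shape translation. A minimal sketch with simplified, illustrative types (not the router's actual internals):

```typescript
// Illustrative types only; the real router handles more fields.
type AnthropicMessage = { role: "user" | "assistant"; content: string };
type AnthropicRequest = {
  model: string;
  max_tokens: number;
  system?: string;
  messages: AnthropicMessage[];
};

// Map an Anthropic /v1/messages body onto an OpenAI /chat/completions body:
// the system prompt becomes a leading system message, and the Claude-style
// alias is replaced by the provider model that routing selected.
function toChatCompletions(req: AnthropicRequest, providerModel: string) {
  const messages = req.system
    ? [{ role: "system", content: req.system }, ...req.messages]
    : req.messages;
  return { model: providerModel, max_tokens: req.max_tokens, messages };
}
```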
Not yet supported:

- Streaming
- Full Claude Code compatibility guarantee
- Prompt caching
- Vision inputs
- Multimodal content
Prerequisites:

- Node.js 18 or newer
- Claude Code CLI installed as `claude`
- NVIDIA API key with access to NVIDIA hosted endpoints
```bash
git clone https://github.com/ChunkyMonkey11/Claudia-Router.git
cd Claudia-Router
npm install
npm run setup
```

Edit `.env` and set:

```
NVIDIA_API_KEY=your_nvidia_key_here
```

Start the router:

```bash
npm run dev
```

Optional: install the Claude wrapper command locally:

```bash
npm link
```

Test the server:

```bash
curl http://localhost:8082/health
```

Send a Claude-style request:
```bash
curl http://localhost:8082/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Write a TypeScript function that reverses a string."
      }
    ]
  }'
```

For Claude Code, keep the router running and start Claude in another terminal:
```bash
npm run claude:fast
```

After `npm link`, you can use the wrapper from any coding project:

```bash
cd /path/to/your/project
claudia-claude
```

The wrapper runs the normal `claude` CLI with `ANTHROPIC_BASE_URL`, dummy auth, and the default router model already set. Pass normal Claude Code arguments as usual:

```bash
claudia-claude --model claude-3-5-sonnet-glm
```

The fast script and default wrapper route `claude-3-5-sonnet-latest` to NVIDIA `stepfun-ai/step-3.5-flash`. Use `npm run claude:glm` for the slower GLM quality profile, or `npm run claude:smoke` to test routing with the smallest configured model.
V1 is configured to use NVIDIA NIM by default.
- Put your NVIDIA API key in `.env`:

  ```
  NVIDIA_API_KEY=your_nvidia_key_here
  CLAUDIA_CONFIG=./config.json
  LOG_LEVEL=info
  ```

- Keep `defaultBackend` set to `nvidia` in `config.json`.
- Use a mapped Claude-style model alias such as `claude-3-5-sonnet-latest`, or send any model name and Claudia Router will use the NVIDIA backend's default model.
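
For example, a request with an unmapped model name still routes to the NVIDIA backend's default model (the model name below is just a placeholder):

```bash
curl http://localhost:8082/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{"model": "some-unmapped-model", "max_tokens": 64, "messages": [{"role": "user", "content": "Hello"}]}'
```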
Example `config.json`:

```json
{
  "port": 8082,
  "defaultBackend": "nvidia",
  "backends": {
    "nvidia": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKeyEnv": "NVIDIA_API_KEY",
      "defaultModel": "stepfun-ai/step-3.5-flash"
    },
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKeyEnv": "OPENROUTER_API_KEY",
      "defaultModel": "qwen/qwen-2.5-coder-32b-instruct"
    },
    "local": {
      "baseUrl": "http://localhost:1234/v1",
      "apiKeyEnv": "LOCAL_API_KEY",
      "defaultModel": "local-model"
    }
  },
  "modelProfiles": {
    "claude-3-5-sonnet-latest": {
      "backend": "nvidia",
      "providerModel": "stepfun-ai/step-3.5-flash",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "notes": "Fast NVIDIA coding profile",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-opus-4-1": {
      "backend": "nvidia",
      "providerModel": "z-ai/glm4.7",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "extraBody": {
        "chat_template_kwargs": {
          "enable_thinking": true,
          "clear_thinking": false
        }
      },
      "notes": "Higher-quality GLM coding profile; slower because thinking is enabled",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-5-sonnet-glm": {
      "backend": "nvidia",
      "providerModel": "z-ai/glm4.7",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "extraBody": {
        "chat_template_kwargs": {
          "enable_thinking": true,
          "clear_thinking": false
        }
      },
      "notes": "Explicit GLM 4.7 profile for harder coding tasks",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-5-sonnet-qwen": {
      "backend": "nvidia",
      "providerModel": "qwen/qwen3.5-122b-a10b",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "notes": "Qwen fallback NVIDIA coding profile",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-haiku-latest": {
      "backend": "nvidia",
      "providerModel": "nvidia/nemotron-mini-4b-instruct",
      "retryAttempts": 1,
      "retryBaseDelayMs": 250,
      "notes": "Smoke-test/free-small NVIDIA profile",
      "capabilities": {
        "toolCalls": false,
        "coding": false
      }
    }
  },
  "modelMap": {
    "legacy-claude-3-5-sonnet-latest": {
      "backend": "nvidia",
      "model": "stepfun-ai/step-3.5-flash"
    }
  }
}
```

`modelProfiles` is the preferred model routing config. Each key is an incoming Claude-style model alias; `backend` selects a configured backend, and `providerModel` is the model sent to that provider. If both `modelProfiles` and the legacy `modelMap` contain the same alias, `modelProfiles` wins. `modelMap` remains supported for simple `{ "backend", "model" }` mappings.
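
The resolution order can be sketched like this (simplified, illustrative types; not the router's actual code):

```typescript
// Illustrative alias resolution; shapes are simplified from config.json.
type Profile = { backend: string; providerModel: string };
type Config = {
  defaultBackend: string;
  backends: Record<string, { defaultModel: string }>;
  modelProfiles?: Record<string, Profile>;
  modelMap?: Record<string, { backend: string; model: string }>;
};

function resolveModel(config: Config, alias: string): Profile {
  // modelProfiles wins over the legacy modelMap for the same alias.
  const profile = config.modelProfiles?.[alias];
  if (profile) return profile;
  const legacy = config.modelMap?.[alias];
  if (legacy) return { backend: legacy.backend, providerModel: legacy.model };
  // Unmapped names fall through to the default backend's default model.
  const backend = config.defaultBackend;
  return { backend, providerModel: config.backends[backend].defaultModel };
}
```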
`retryAttempts` is the total number of provider attempts for a profile, including the first request. `retryBaseDelayMs` is the base delay for exponential backoff between retryable provider responses. `extraBody` is merged into the provider chat-completions JSON body for model-specific options such as NVIDIA `chat_template_kwargs`. `capabilities` is advisory metadata for humans and future routing policy; it does not force provider behavior.
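
A sketch of those retry semantics, assuming a simplified provider call and treating 429 and 5xx responses as retryable (the error classification here is illustrative):

```typescript
// Illustrative retry loop: retryAttempts counts every attempt, including the first.
async function callProvider(
  send: () => Promise<Response>,
  retryAttempts: number,
  retryBaseDelayMs: number
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retryAttempts; attempt++) {
    try {
      const res = await send();
      // Assume 429 and 5xx are retryable; everything else returns immediately.
      if (res.status !== 429 && res.status < 500) return res;
      lastError = new Error(`provider returned ${res.status}`);
    } catch (err) {
      lastError = err;
    }
    if (attempt < retryAttempts) {
      // Exponential backoff: the base delay doubles on each subsequent retry.
      const delayMs = retryBaseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```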
The default config includes the NVIDIA free/shared endpoint profiles that have worked well with Claude Code-style loops:
| Router alias | NVIDIA model | Intended use |
|---|---|---|
| `claude-3-5-sonnet-latest` | `stepfun-ai/step-3.5-flash` | Fast default for day-to-day coding/tool loops |
| `claude-opus-4-1` | `z-ai/glm4.7` | Slower quality profile with GLM thinking enabled |
| `claude-3-5-sonnet-glm` | `z-ai/glm4.7` | Explicit GLM 4.7 alias for harder coding tasks |
| `claude-3-5-sonnet-qwen` | `qwen/qwen3.5-122b-a10b` | Alternate/fallback coding model |
| `claude-3-haiku-latest` | `nvidia/nemotron-mini-4b-instruct` | Smoke test for auth, routing, retries, and simple requests |
Use the fast default:

```bash
npm run claude:fast
```

Or from any project after `npm link`:

```bash
claudia-claude
```

Switch to GLM for a harder task:

```bash
npm run claude:glm
```

Or:

```bash
claudia-claude --model claude-3-5-sonnet-glm
```

Smoke-test the router:

```bash
npm run claude:smoke
```

A basic `/v1/messages` request body looks like this:

```json
{
  "model": "claude-3-5-sonnet-latest",
  "max_tokens": 1024,
  "temperature": 0.2,
  "system": "You are a helpful coding assistant.",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function for binary search."
    }
  ]
}
```

Text content blocks are also supported:
```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Fix this code."
    }
  ]
}
```

Non-text blocks are ignored safely in V1.
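
That V1 behavior amounts to flattening content blocks to plain text, roughly like this (an illustrative sketch, not the router's actual code):

```typescript
// Illustrative: collapse Anthropic content blocks to a single text string.
type ContentBlock = { type: string; text?: string };

function flattenContent(content: string | ContentBlock[]): string {
  if (typeof content === "string") return content;
  return content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text)
    .join("\n"); // non-text blocks (images, etc.) are silently dropped
}
```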
Claudia Router converts Anthropic tool-use shapes to OpenAI-compatible function calling:
- Request `tools` become OpenAI `tools` with `type: "function"`
- Request `tool_choice` becomes OpenAI `tool_choice`
- Assistant `tool_use` blocks become OpenAI assistant `tool_calls`
- User `tool_result` blocks become OpenAI `tool` messages
- Provider `tool_calls` become Anthropic `tool_use` response blocks
This is the compatibility layer Claude Code needs before it can execute local tools such as file reads and writes. Actual tool-call quality depends on the selected backend model and whether that provider supports OpenAI-style function calling.
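
A minimal sketch of the tool-definition half of that mapping (simplified types; a real converter also handles `tool_choice`, `tool_use`, and `tool_result`):

```typescript
// Illustrative conversion of one Anthropic tool definition to an OpenAI function tool.
type AnthropicTool = {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
};

function toOpenAITool(tool: AnthropicTool) {
  return {
    type: "function" as const,
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema, // Anthropic's input_schema is already JSON Schema
    },
  };
}
```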
Start the router:

```bash
npm run dev
```

Launch Claude Code against the local router:

```bash
ANTHROPIC_BASE_URL=http://localhost:8082 \
ANTHROPIC_AUTH_TOKEN=dummy \
ANTHROPIC_MODEL=claude-3-5-sonnet-latest \
ANTHROPIC_DEFAULT_SONNET_MODEL=claude-3-5-sonnet-latest \
ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-sonnet-latest \
claude --model claude-3-5-sonnet-latest
```

Then ask Claude Code to do a safe file operation:
```
Create hello.md with a one-line greeting.
```
If the file is created, the gateway can route a complete Claude Code tool loop for that operation. If Claude Code only prints a fake `Write(...)` call, inspect the router and provider logs to confirm whether the backend returned structured OpenAI `tool_calls`.
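
You can also probe the backend directly by sending a tool-bearing request to the router and checking for a `tool_use` block in the response. The `write_file` tool below is a hypothetical example, not one of Claude Code's real tools:

```bash
curl http://localhost:8082/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 256,
    "tools": [{
      "name": "write_file",
      "description": "Write text content to a file",
      "input_schema": {
        "type": "object",
        "properties": {
          "path": { "type": "string" },
          "content": { "type": "string" }
        },
        "required": ["path", "content"]
      }
    }],
    "messages": [{ "role": "user", "content": "Create hello.md with a one-line greeting." }]
  }'
```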
Any provider with an OpenAI-compatible `/chat/completions` endpoint should work. The example config includes:
- NVIDIA NIM
- OpenRouter
- LM Studio or another local OpenAI-compatible server
Groq, DeepSeek, Ollama OpenAI-compatible mode, and similar providers can be added by creating another backend entry.
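
For example, a Groq entry might look like the following (illustrative; confirm the base URL and model name against Groq's current docs, and set `GROQ_API_KEY` in `.env`):

```json
"groq": {
  "baseUrl": "https://api.groq.com/openai/v1",
  "apiKeyEnv": "GROQ_API_KEY",
  "defaultModel": "llama-3.1-8b-instant"
}
```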
Claudia Router logs one line per request with request ID, backend, source model, target model, latency, and status.
Prompts are not logged by default.
```bash
npm run typecheck
npm test
npm run build
```

Before tagging a release:

- Run `npm run typecheck`, `npm test`, and `npm run build` (chained in the sketch after this list).
- Verify `/health` returns `200`.
- Verify a plain `/v1/messages` request returns an Anthropic-style message.
- Verify Claude Code can create and edit a small file through the router.
- Rotate any provider keys that were ever committed, pasted into logs, or shared during testing.
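
Those checks can be chained in one pass, for example (a sketch reusing the commands above; it assumes the router is already running on port 8082):

```bash
npm run typecheck && npm test && npm run build
curl -fsS http://localhost:8082/health
curl -fsS http://localhost:8082/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{"model": "claude-3-haiku-latest", "max_tokens": 32, "messages": [{"role": "user", "content": "ping"}]}'
```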
V1 limitations:

- Text-only
- Non-streaming only
- No vision
- No guarantee of perfect Claude Code compatibility
- Provider quality depends on the selected model
Roadmap:

- Streaming support
- Broader Claude Code compatibility tests
- Request replay logs
- Cost estimation
- Provider fallback
- Simple web dashboard
- Policy layer for blocking risky actions
- Prompt redaction
- Team config
- Claude Code-specific compatibility tests