Claudia Router

Claudia Router is a lightweight, local, Anthropic-compatible proxy that routes Claude-style /v1/messages requests to OpenAI-compatible model providers.

It is useful for experimenting with Claude-style coding workflows while using backends such as NVIDIA NIM, OpenRouter, LM Studio, Groq, DeepSeek, or local models.

What it does

  • Accepts Anthropic-style /v1/messages requests
  • Converts them to OpenAI-compatible /chat/completions
  • Routes requests based on model aliases
  • Returns Anthropic-style responses
  • Converts Anthropic tool definitions/results to OpenAI-compatible function tools
  • Logs backend, model, latency, and errors
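The core request translation above can be sketched as follows. This is an illustrative example, not the router's actual source; the field names follow the public Anthropic and OpenAI request shapes, and `toOpenAI` is a hypothetical helper.

```typescript
// Sketch: convert an Anthropic-style /v1/messages body into an
// OpenAI-compatible /chat/completions body.
interface AnthropicRequest {
  model: string;
  max_tokens: number;
  temperature?: number;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

interface OpenAIRequest {
  model: string;
  max_tokens: number;
  temperature?: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

function toOpenAI(req: AnthropicRequest, providerModel: string): OpenAIRequest {
  const messages: OpenAIRequest["messages"] = [];
  // Anthropic carries the system prompt as a top-level field;
  // OpenAI expects it as the first message.
  if (req.system) messages.push({ role: "system", content: req.system });
  messages.push(...req.messages);
  return {
    model: providerModel, // the Claude-style alias already resolved to a provider model
    max_tokens: req.max_tokens,
    temperature: req.temperature,
    messages,
  };
}
```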

What it does not do yet

  • Streaming
  • Full Claude Code compatibility guarantee
  • Prompt caching
  • Vision inputs
  • Multimodal content

Quick Start

Prerequisites:

  • Node.js 18 or newer
  • Claude Code CLI installed as claude
  • NVIDIA API key with access to NVIDIA hosted endpoints

Install:

git clone https://github.com/ChunkyMonkey11/Claudia-Router.git
cd Claudia-Router
npm install
npm run setup

Edit .env and set:

NVIDIA_API_KEY=your_nvidia_key_here

Start the router:

npm run dev

Optional: install the Claude wrapper command locally:

npm link

Test the server:

curl http://localhost:8082/health

Send a Claude-style request:

curl http://localhost:8082/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: dummy" \
  -d '{
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 512,
    "messages": [
      {
        "role": "user",
        "content": "Write a TypeScript function that reverses a string."
      }
    ]
  }'

For Claude Code, keep the router running and start Claude in another terminal:

npm run claude:fast

After npm link, you can use the wrapper from any coding project:

cd /path/to/your/project
claudia-claude

The wrapper runs the normal claude CLI with ANTHROPIC_BASE_URL, dummy auth, and the default router model already set. Pass normal Claude Code arguments as usual:

claudia-claude --model claude-3-5-sonnet-glm

The fast script and default wrapper route claude-3-5-sonnet-latest to NVIDIA stepfun-ai/step-3.5-flash. Use npm run claude:glm for the slower GLM quality profile, or npm run claude:smoke to test routing with the smallest configured model.

NVIDIA Setup

V1 is configured to use NVIDIA NIM by default.

  1. Put your NVIDIA API key in .env:

NVIDIA_API_KEY=your_nvidia_key_here
CLAUDIA_CONFIG=./config.json
LOG_LEVEL=info

  2. Keep defaultBackend set to nvidia in config.json.

  3. Use a mapped Claude-style model alias such as claude-3-5-sonnet-latest, or send any model name and Claudia Router will use the NVIDIA backend default model.

Config Example

{
  "port": 8082,
  "defaultBackend": "nvidia",
  "backends": {
    "nvidia": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKeyEnv": "NVIDIA_API_KEY",
      "defaultModel": "stepfun-ai/step-3.5-flash"
    },
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKeyEnv": "OPENROUTER_API_KEY",
      "defaultModel": "qwen/qwen-2.5-coder-32b-instruct"
    },
    "local": {
      "baseUrl": "http://localhost:1234/v1",
      "apiKeyEnv": "LOCAL_API_KEY",
      "defaultModel": "local-model"
    }
  },
  "modelProfiles": {
    "claude-3-5-sonnet-latest": {
      "backend": "nvidia",
      "providerModel": "stepfun-ai/step-3.5-flash",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "notes": "Fast NVIDIA coding profile",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-opus-4-1": {
      "backend": "nvidia",
      "providerModel": "z-ai/glm4.7",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "extraBody": {
        "chat_template_kwargs": {
          "enable_thinking": true,
          "clear_thinking": false
        }
      },
      "notes": "Higher-quality GLM coding profile; slower because thinking is enabled",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-5-sonnet-glm": {
      "backend": "nvidia",
      "providerModel": "z-ai/glm4.7",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "extraBody": {
        "chat_template_kwargs": {
          "enable_thinking": true,
          "clear_thinking": false
        }
      },
      "notes": "Explicit GLM 4.7 profile for harder coding tasks",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-5-sonnet-qwen": {
      "backend": "nvidia",
      "providerModel": "qwen/qwen3.5-122b-a10b",
      "retryAttempts": 3,
      "retryBaseDelayMs": 500,
      "notes": "Qwen fallback NVIDIA coding profile",
      "capabilities": {
        "toolCalls": true,
        "coding": true
      }
    },
    "claude-3-haiku-latest": {
      "backend": "nvidia",
      "providerModel": "nvidia/nemotron-mini-4b-instruct",
      "retryAttempts": 1,
      "retryBaseDelayMs": 250,
      "notes": "Smoke-test/free-small NVIDIA profile",
      "capabilities": {
        "toolCalls": false,
        "coding": false
      }
    }
  },
  "modelMap": {
    "legacy-claude-3-5-sonnet-latest": {
      "backend": "nvidia",
      "model": "stepfun-ai/step-3.5-flash"
    }
  }
}

modelProfiles is the preferred model routing config. Each key is the incoming Claude-style model alias; backend selects a configured backend, and providerModel is the model sent to that provider. If both modelProfiles and legacy modelMap contain the same alias, modelProfiles wins. modelMap remains supported for simple { "backend", "model" } mappings.
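The lookup order described above (modelProfiles first, then legacy modelMap, then the default backend's default model) can be sketched like this. Names such as `resolveRoute` and `RouterConfig` are illustrative, not the repo's real API.

```typescript
// Sketch of alias resolution precedence: modelProfiles > modelMap > defaults.
interface Route { backend: string; model: string }

interface RouterConfig {
  defaultBackend: string;
  backends: Record<string, { defaultModel: string }>;
  modelProfiles?: Record<string, { backend: string; providerModel: string }>;
  modelMap?: Record<string, { backend: string; model: string }>;
}

function resolveRoute(alias: string, cfg: RouterConfig): Route {
  // Preferred: a modelProfiles entry for the incoming alias.
  const profile = cfg.modelProfiles?.[alias];
  if (profile) return { backend: profile.backend, model: profile.providerModel };
  // Legacy: a simple modelMap entry, checked second.
  const mapped = cfg.modelMap?.[alias];
  if (mapped) return { backend: mapped.backend, model: mapped.model };
  // Unknown aliases fall through to the default backend's default model.
  const backend = cfg.defaultBackend;
  return { backend, model: cfg.backends[backend].defaultModel };
}
```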

retryAttempts is the total number of provider attempts for that profile, including the first request. retryBaseDelayMs is the exponential backoff base delay between retryable provider responses. extraBody is merged into the provider chat-completions JSON body for model-specific options such as NVIDIA chat_template_kwargs. capabilities is advisory metadata for humans and future routing policy; it does not force provider behavior.
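The retry timing above can be sketched as follows. The formula is an assumption based on the description of retryBaseDelayMs as an exponential backoff base; the router's actual schedule may differ (for example, it may add jitter).

```typescript
// Sketch: exponential backoff delay derived from retryBaseDelayMs.
// retryIndex 1 is the wait before the second overall attempt.
function backoffDelayMs(retryIndex: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (retryIndex - 1);
}
```

With retryAttempts: 3 and retryBaseDelayMs: 500, this schedule would wait 500 ms before the second attempt and 1000 ms before the third.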

NVIDIA Model Set

The default config includes the NVIDIA free/shared endpoint profiles that have worked well with Claude Code-style loops:

Router alias              | NVIDIA model                     | Intended use
--------------------------|----------------------------------|------------------------------------------------------------
claude-3-5-sonnet-latest  | stepfun-ai/step-3.5-flash        | Fast default for day-to-day coding/tool loops
claude-opus-4-1           | z-ai/glm4.7                      | Slower quality profile with GLM thinking enabled
claude-3-5-sonnet-glm     | z-ai/glm4.7                      | Explicit GLM 4.7 alias for harder coding tasks
claude-3-5-sonnet-qwen    | qwen/qwen3.5-122b-a10b           | Alternate/fallback coding model
claude-3-haiku-latest     | nvidia/nemotron-mini-4b-instruct | Smoke test for auth, routing, retries, and simple requests

Use the fast default:

npm run claude:fast

Or from any project after npm link:

claudia-claude

Switch to GLM for a harder task:

npm run claude:glm

Or:

claudia-claude --model claude-3-5-sonnet-glm

Smoke-test the router:

npm run claude:smoke

Claude-Style Request Shape

{
  "model": "claude-3-5-sonnet-latest",
  "max_tokens": 1024,
  "temperature": 0.2,
  "system": "You are a helpful coding assistant.",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function for binary search."
    }
  ]
}

Text content blocks are also supported:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Fix this code."
    }
  ]
}

Non-text content blocks are safely ignored in V1.
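The content handling above can be sketched like this: string content passes through, text blocks are concatenated, and non-text blocks are dropped. `flattenContent` is a hypothetical helper name, not the router's actual function.

```typescript
// Sketch: flatten Anthropic message content into plain text for V1.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: string; [key: string]: unknown };

function flattenContent(content: string | ContentBlock[]): string {
  // Plain-string content is already in the final shape.
  if (typeof content === "string") return content;
  return content
    .filter((block) => block.type === "text") // non-text blocks are dropped
    .map((block) => (block as { type: "text"; text: string }).text)
    .join("\n");
}
```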

Tool Routing

Claudia Router converts Anthropic tool-use shapes to OpenAI-compatible function calling:

  • Request tools become OpenAI tools with type: "function"
  • Request tool_choice becomes OpenAI tool_choice
  • Assistant tool_use blocks become OpenAI assistant tool_calls
  • User tool_result blocks become OpenAI tool messages
  • Provider tool_calls become Anthropic tool_use response blocks

This is the compatibility layer Claude Code needs before it can execute local tools such as file reads and writes. Actual tool-call quality depends on the selected backend model and whether that provider supports OpenAI-style function calling.
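One direction of the translation above can be sketched as follows: an Anthropic assistant tool_use block becomes an OpenAI tool_calls entry. The shapes follow the public Anthropic and OpenAI APIs; `toToolCall` is an illustrative helper, not the router's internal code.

```typescript
// Sketch: convert an Anthropic tool_use block to an OpenAI tool call.
interface AnthropicToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface OpenAIToolCall {
  id: string;
  type: "function";
  // OpenAI carries tool arguments as a JSON-encoded string,
  // while Anthropic uses a structured input object.
  function: { name: string; arguments: string };
}

function toToolCall(block: AnthropicToolUse): OpenAIToolCall {
  return {
    id: block.id,
    type: "function",
    function: { name: block.name, arguments: JSON.stringify(block.input) },
  };
}
```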

Claude Code Gateway Test

Start the router:

npm run dev

Launch Claude Code against the local router:

ANTHROPIC_BASE_URL=http://localhost:8082 \
ANTHROPIC_AUTH_TOKEN=dummy \
ANTHROPIC_MODEL=claude-3-5-sonnet-latest \
ANTHROPIC_DEFAULT_SONNET_MODEL=claude-3-5-sonnet-latest \
ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-3-5-sonnet-latest \
claude --model claude-3-5-sonnet-latest

Then ask Claude Code to do a safe file operation:

Create hello.md with a one-line greeting.

If the file is created, the gateway can route a complete Claude Code tool loop for that operation. If Claude Code only prints a fake Write(...) call, inspect the router and provider logs to confirm whether the backend returned structured OpenAI tool_calls.

Supported Providers

Any provider with an OpenAI-compatible /chat/completions endpoint should work. The example config includes:

  • NVIDIA NIM
  • OpenRouter
  • LM Studio or another local OpenAI-compatible server

Groq, DeepSeek, Ollama's OpenAI-compatible mode, and similar providers can be added by creating another backend entry.

Logging

Claudia Router logs one line per request with request ID, backend, source model, target model, latency, and status.

Prompts are not logged by default.

Development Checks

npm run typecheck
npm test
npm run build

Release Checklist

Before tagging a release:

  1. Run npm run typecheck, npm test, and npm run build.
  2. Verify /health returns 200.
  3. Verify a plain /v1/messages request returns an Anthropic-style message.
  4. Verify Claude Code can create and edit a small file through the router.
  5. Rotate any provider keys that were ever committed, pasted into logs, or shared during testing.

Limitations

  • Text-only
  • Non-streaming only
  • No vision
  • No guarantee of full Claude Code compatibility
  • Provider quality depends on selected model

Roadmap

  • Streaming support
  • Broader Claude Code compatibility tests
  • Request replay logs
  • Cost estimation
  • Provider fallback
  • Simple web dashboard
  • Policy layer for blocking risky actions
  • Prompt redaction
  • Team config
