
MultiRouter AI

OpenAI-Compatible AI Gateway with Multi-Account Load Balancing

Route chat completions across multiple AI providers and accounts with automatic failover. Works with Cursor, Windsurf, Claude Code, OpenCode, Continue.dev, Aider, and any OpenAI-compatible client.


Getting Started · IDE Setup · API Reference · Configuration


Why MultiRouter AI?

Most AI providers offer generous free tiers — Groq, Cerebras, and Gemini each give you thousands of free API calls. The problem? Once you hit the limit on one key, your IDE stops working.

MultiRouter AI solves this by exposing an OpenAI-compatible API that routes requests across every provider instance you configure. Stack multiple free-tier accounts, and when one runs out, the next picks up automatically.

  • OpenAI API Compatible — Drop-in replacement for any tool that speaks the OpenAI protocol (/v1/chat/completions, /v1/models).
  • Multi-Account Stacking — Add 5 Groq free-tier keys and exhaust them one by one. When all 5 are depleted, the gateway fails over to Cerebras, then Gemini, then OpenAI.
  • Automatic Failover — Rate-limited providers are temporarily disabled and re-enabled after a configurable cooldown.
  • Works with Every IDE — Cursor, Windsurf, Claude Code, OpenCode, Continue.dev, Aider, Open WebUI — anything that can point to a custom OpenAI base URL.
  • Custom Endpoints — Supports Ollama, LM Studio, vLLM, and any OpenAI-compatible API via the openai-compatible provider type.

Table of Contents

  • Why MultiRouter AI?
  • Supported Providers
  • Quick Start
  • IDE Setup
  • API Reference
  • Configuration
  • Routing Strategies
  • How It Works
  • Tech Stack
  • License

Supported Providers

Provider                    Type                Accessed Via        Streaming
Groq                        groq                OpenAI-compatible   Yes
Cerebras                    cerebras            OpenAI-compatible   Yes
OpenAI                      openai              OpenAI-compatible   Yes
OpenRouter                  openrouter          OpenAI-compatible   Yes
Google Gemini               gemini              Native SDK          Yes
Ollama / LM Studio / vLLM   openai-compatible   OpenAI-compatible   Yes

Any provider that implements the OpenAI chat completions API can be added using the openai-compatible type with a custom base_url.


Quick Start

Prerequisites

  • Node.js 20+
  • pnpm

1. Clone & install

git clone https://github.com/Mykle23/MultiRouter-AI.git
cd MultiRouter-AI
pnpm install

2. Configure

cp .env.example .env
cp providers.yaml.example providers.yaml

Edit providers.yaml and paste your API keys directly:

providers:
  - id: groq-1
    type: groq
    api_key: gsk_your_key_here      # ← paste your real key
    models:
      - llama-3.3-70b-versatile
    priority: 1

Instances without a valid API key are automatically skipped. Optionally edit .env to set a gateway auth token, port, or rate limit.

3. Start the server

pnpm dev      # Development (hot reload + pretty logs)
pnpm start    # Production

4. Test it

# OpenAI-compatible endpoint (streaming)
curl -N http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'

# List available models
curl http://localhost:3000/v1/models

(back to top)


IDE Setup

MultiRouter AI is a drop-in OpenAI proxy. Point your IDE to http://localhost:3000/v1 and it works.

Cursor

  1. Open Settings → Models → OpenAI API Key
  2. Set Base URL to http://localhost:3000/v1
  3. Set API Key to the value of your API_KEY env var (or any string if auth is disabled)
  4. Add your desired models (e.g. llama-3.3-70b-versatile, gemini-2.5-flash, multirouter-auto)

Windsurf

  1. Open Settings → AI Provider
  2. Select OpenAI Compatible
  3. Set Base URL to http://localhost:3000/v1
  4. Set the API key and model name

Claude Code / OpenCode / Continue.dev / Aider

All support custom OpenAI base URLs. Set:

OPENAI_BASE_URL=http://localhost:3000/v1
OPENAI_API_KEY=your-api-key
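
The same variables work for scripts that use the official openai npm package. A minimal sketch (the model name is illustrative; use whatever your providers.yaml serves):

import OpenAI from "openai";

// Point the official SDK at the gateway instead of api.openai.com.
const client = new OpenAI({
  baseURL: process.env.OPENAI_BASE_URL,   // http://localhost:3000/v1
  apiKey: process.env.OPENAI_API_KEY,     // gateway API_KEY, or any string if auth is disabled
});

const reply = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(reply.choices[0].message.content);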

Special Models

Model Name            Behaviour
multirouter-auto      Round-robin across ALL providers — maximises free-tier usage
Any real model name   Exhaust strategy — tries each provider that has this model in priority order

(back to top)


API Reference

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Supports both streaming and non-streaming responses.

Request Body — identical to the OpenAI API:

{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_completion_tokens": 4096
}

Streaming response — Server-Sent Events in OpenAI format:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"llama-3.3-70b-versatile","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
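
From code, the same stream can be consumed with the openai SDK instead of parsing SSE by hand — a minimal, self-contained sketch (model name illustrative):

import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://localhost:3000/v1", apiKey: "your-api-key" });

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  stream: true,
  messages: [{ role: "user", content: "Hello!" }],
});

// Each chunk corresponds to one `data:` line above; deltas carry the incremental text.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}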

GET /v1/models

Returns all available models in OpenAI format. Includes the virtual multirouter-auto model.

{
  "object": "list",
  "data": [
    { "id": "multirouter-auto", "object": "model", "created": 1700000000, "owned_by": "multirouter" },
    { "id": "llama-3.3-70b-versatile", "object": "model", "created": 1700000000, "owned_by": "groq" },
    { "id": "gemini-2.5-flash", "object": "model", "created": 1700000000, "owned_by": "gemini" }
  ]
}
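
Fetching the same list from code with plain fetch (sketch; the type annotation mirrors the JSON above):

const res = await fetch("http://localhost:3000/v1/models");
const { data } = (await res.json()) as { data: { id: string; owned_by: string }[] };

// The virtual multirouter-auto model appears alongside real provider models.
for (const m of data) {
  console.log(m.id, "·", m.owned_by);
}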

GET /health

Returns server status with all provider instances, their status, and models.
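
A quick probe from code (the exact JSON shape is whatever the server reports):

const health = await fetch("http://localhost:3000/health");
console.log(await health.json());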

(back to top)


Configuration

providers.yaml

This is the heart of MultiRouter AI. It defines your provider instances and routing strategy.

routing:
  default_strategy: exhaust        # exhaust | round-robin
  retry_after_seconds: 300         # Re-enable failed providers after 5 min

providers:
  # Free tiers first — they'll be tried top-to-bottom
  - id: groq-1
    type: groq
    api_key: gsk_your_first_key
    models:
      - llama-3.3-70b-versatile
      - llama-3.1-8b-instant

  - id: groq-2
    type: groq
    api_key: gsk_your_second_key
    models:
      - llama-3.3-70b-versatile

  # Paid providers last — only used when free tiers are exhausted
  - id: openai-1
    type: openai
    api_key: sk-your_openai_key
    models:
      - gpt-4o
      - gpt-4o-mini

  # Custom local endpoint
  - id: ollama
    type: openai-compatible
    base_url: http://localhost:11434/v1
    api_key: ollama
    models:
      - llama3.2

Security: providers.yaml contains secrets and is git-ignored. Commit only providers.yaml.example (no real keys).

Order matters: Providers are tried top-to-bottom. Put free tiers first and paid providers last.

Key concepts:

Field      Description
id         Unique identifier for this instance
type       Provider type: openai, groq, cerebras, openrouter, gemini, openai-compatible
api_key    API key — paste directly, or use ${ENV_VAR} for Docker/CI
base_url   Custom endpoint URL (required for openai-compatible, optional for others)
models     List of models this instance can serve

.env

Server-level settings only. Provider keys go in providers.yaml.

Variable         Default       Description
PORT             3000          Server port
NODE_ENV         development   development / production
LOG_LEVEL        info          debug, info, warn, error
API_KEY          (empty)       Bearer token for gateway auth — leave empty to disable
RATE_LIMIT_MAX   100           Max requests per minute per IP — 0 to disable

Tip: For Docker or CI, you can reference env vars in providers.yaml with api_key: ${SECRET_NAME} instead of pasting keys directly.
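
For the curious, ${ENV_VAR} interpolation of this kind is typically a single regex pass over the raw YAML before parsing. A sketch of the general technique (not necessarily this project's exact code), assuming the yaml package:

import { readFileSync } from "node:fs";
import { parse } from "yaml";

// Substitute ${NAME} placeholders from the environment, then parse.
const raw = readFileSync("providers.yaml", "utf8");
const interpolated = raw.replace(/\$\{(\w+)\}/g, (_, name: string) => process.env[name] ?? "");
const config = parse(interpolated);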

(back to top)


Routing Strategies

Exhaust (default)

Stick with one provider instance until it's rate-limited, then fail over to the next in order.

Request 1-50:    groq-1  ✓
Request 51:      groq-1  → 429 Rate Limit
                 groq-2  ✓  ← automatic failover
Request 51-100:  groq-2  ✓
Request 101:     groq-2  → 429 Rate Limit
                 groq-3  ✓  ← automatic failover
...
All exhausted:   → 429 "All providers for model exhausted"
After 5 min:     groq-1  recovers → back to first

Best for: Maximising free tiers. Each account's quota is fully used before moving to the next.
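
In code terms, exhaust is an ordered scan with a cooldown check. A minimal sketch, assuming each instance tracks a disabledUntil timestamp (the Instance shape is hypothetical, not this project's actual types):

const RETRY_AFTER_SECONDS = 300;

interface Instance {
  id: string;
  disabledUntil: number; // epoch ms; 0 when active
  send: (req: unknown) => Promise<Response>;
}

// Try instances in YAML order, skipping those still cooling down.
async function exhaust(instances: Instance[], req: unknown): Promise<Response> {
  for (const inst of instances) {
    if (Date.now() < inst.disabledUntil) continue; // still rate-limited
    const res = await inst.send(req);
    if ([429, 402, 503].includes(res.status)) {
      inst.disabledUntil = Date.now() + RETRY_AFTER_SECONDS * 1000; // start cooldown
      continue; // fail over to the next instance
    }
    return res;
  }
  throw new Error("All providers for model exhausted"); // surfaced to the client as 429
}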

Round-Robin

Distribute requests evenly across all active provider instances.

Request 1: groq-1     (llama-3.3-70b)
Request 2: cerebras-1 (llama-3.3-70b)
Request 3: gemini-1   (gemini-2.5-flash)
Request 4: groq-1     (wraps around)

Best for: Load distribution when you don't mind mixing models. Use the multirouter-auto model name to activate round-robin from any IDE.
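
Round-robin reduces to a wrapping cursor over the currently active instances (same hypothetical Instance shape as the exhaust sketch above):

let cursor = 0;

// Pick the next active instance, wrapping around the list.
function nextInstance(instances: Instance[]): Instance {
  const active = instances.filter((i) => Date.now() >= i.disabledUntil);
  if (active.length === 0) throw new Error("All providers exhausted");
  return active[cursor++ % active.length];
}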

(back to top)


How It Works

                      ┌───────────────────┐
                      │   IDE / Client    │
                      │ Cursor, Windsurf  │
                      │ Claude Code, etc. │
                      └─────────┬─────────┘
                                │  POST /v1/chat/completions
                                │  { model, messages, stream }
                                ▼
                      ┌───────────────────┐
                      │  MultiRouter AI   │
                      │                   │
                      │  Auth → Rate Lim  │
                      │  → Route Select   │
                      └─────────┬─────────┘
                                │
               ┌────────────────┼────────────────┐
               │                │                │
        "multirouter-auto"   Real model      Model not
        (round-robin)        (exhaust)         found
               │                │                │
               ▼                ▼                ▼
          Next active     Instances in         404 error
          instance        YAML order
               │                │
               │         ┌──────┼──────┐
               │         ▼      ▼      ▼
               │      groq-1  groq-2  groq-3
               │      (1st)   (2nd)   (3rd)
               │         │
               │    Try groq-1 → 429? → Try groq-2 → OK!
               │                              │
               └──────────────┬───────────────┘
                              │  SSE stream (OpenAI format)
                              ▼
                      ┌───────────────────┐
                      │   IDE / Client    │
                      └───────────────────┘
  1. Request arrives at /v1/chat/completions with a model name.
  2. Routing: multirouter-auto triggers round-robin; any real model name triggers exhaust strategy.
  3. Exhaust: Instances for the requested model are tried in the order they appear in providers.yaml.
  4. Failover: If a provider returns 429/402/503, it's marked as rate-limited and the next instance is tried.
  5. Recovery: After retry_after_seconds, failed instances are automatically re-enabled.
  6. Response: Streamed back in OpenAI SSE format (data: {…}\n\n + data: [DONE]).
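
The SSE framing in step 6 is plain text written over a kept-alive response. A sketch of the write side in Express (hypothetical helper names, not this project's actual code):

import type { Response } from "express";

// Emit one OpenAI-format chunk as a `data:` line.
function writeChunk(res: Response, chunk: object): void {
  res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}

// Emit the terminator and close the stream.
function endStream(res: Response): void {
  res.write("data: [DONE]\n\n");
  res.end();
}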

(back to top)


Tech Stack

Category        Technology
Runtime         Node.js 20+
Language        TypeScript 5.9 (strict mode)
Framework       Express 5
Configuration   YAML with ${ENV_VAR} interpolation
Logging         Pino + pino-http
Security        Helmet, express-rate-limit
Provider SDKs   openai (for all OpenAI-compatible), @google/generative-ai (for Gemini)
Dev Tools       ESLint 9, tsx (hot reload)

License

Distributed under the MIT License. See LICENSE for details.

