Muxa is a self-hosted proxy that presents Anthropic- and OpenAI-compatible APIs so IDE/CLI tooling (Claude Code, Cursor, Codex, Continue, Copilot, etc.) can run against your choice of provider—cloud or local—without touching client settings.
- One URL for many providers – Point every client at `http://localhost:8081`, then change providers (OpenRouter, Azure, Databricks, Ollama, etc.) centrally via `.env`.
- Auto routing & fallback – Send simple prompts to a local Ollama model, heavy workloads to a cloud model, and fall back automatically when a provider fails.
- Token optimization – Prompt cache, semantic cache, memory injection, and headroom compression operate server-side so all clients enjoy reduced latency/cost.
- Observability + policy controls – Built-in load shedding, circuit breaker, structured logs, and Prometheus endpoints give production visibility; policy guards enforce workspace/host/git/test rules.
- Advanced providers – Some tools only speak OpenAI/Anthropic; Muxa converts that traffic to providers they don’t natively support (OpenRouter, Ollama, MLX).
- Easy rollouts – Update `.env` once; every IDE routed through Muxa immediately uses the new provider/policy set.
```bash
# Install globally
npm install -g @thelogicatelier/muxa

# Create a .env with your provider key (see docs for all options)
muxa   # proxy listens on http://localhost:8081
```

Or run instantly without installing:

```bash
npx @thelogicatelier/muxa
```

To run from source:

```bash
git clone https://github.com/achatt89/muxa.git
cd muxa
npm install
cp .env.example .env   # fill in OPENAI_API_KEY, OPENROUTER_API_KEY, etc.
npm start              # proxy listens on http://localhost:8081
```

The example below uses OpenAI, but you can substitute any supported provider (Anthropic, OpenRouter, Ollama, Databricks, Azure, etc.):
```bash
docker build -t muxa .
docker run --rm -p 8081:8081 \
  -e MUXA_PRIMARY_PROVIDER=openai \
  -e OPENAI_API_KEY=sk-your-key \
  muxa:latest
```

Or install via Homebrew:

```bash
brew tap thelogicatelier/muxa https://github.com/achatt89/muxa.git
brew install muxa
muxa --help
```

| Mode | Description |
|---|---|
| `single` | All requests go to `MUXA_PRIMARY_PROVIDER`. Use this when you only want one provider. |
| `hybrid` | Muxa evaluates request "complexity" (prompt length, tool use, etc.) and routes high-cost calls to `MUXA_FALLBACK_PROVIDER`. If the primary fails, the fallback is also used. |
Example `.env`:

```env
MUXA_ROUTING_STRATEGY=hybrid
MUXA_PRIMARY_PROVIDER=openrouter
MUXA_FALLBACK_PROVIDER=anthropic
OPENROUTER_API_KEY=sk-or-...
ANTHROPIC_API_KEY=sk-ant-...
```

Use `single` mode if you don't need fallback.
Enable caches and memory to cut token usage:
```env
MUXA_PROMPT_CACHE_ENABLED=true
MUXA_PROMPT_CACHE_TTL_MS=120000
MUXA_SEMANTIC_CACHE_ENABLED=true
MUXA_SEMANTIC_CACHE_THRESHOLD=0.9
MUXA_MEMORY_ENABLED=true
MUXA_MEMORY_TOPK=3
MUXA_HEADROOM_ENABLED=true
MUXA_HEADROOM_MODE=optimize
```

- Prompt cache returns repeated prompts instantly.
- Semantic cache reuses answers for similar prompts (requires an embeddings provider; see docs/embeddings.md).
- Memory store injects the top-K memories into each request.
- Headroom exposes `/metrics/compression` and `/headroom/*` to track savings.
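To make the exact-match layer concrete, here is a minimal sketch of a TTL-bounded prompt cache in the spirit of `MUXA_PROMPT_CACHE_ENABLED` and `MUXA_PROMPT_CACHE_TTL_MS` (Muxa's internal implementation may differ):

```python
import time

class PromptCache:
    """Exact-match prompt cache: a fresh entry short-circuits the upstream call."""

    def __init__(self, ttl_ms: int = 120_000):
        self.ttl = ttl_ms / 1000        # store TTL in seconds
        self.store = {}

    def get(self, prompt: str):
        hit = self.store.get(prompt)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]               # fresh: return cached response
        self.store.pop(prompt, None)    # expired or missing: evict
        return None

    def put(self, prompt: str, response: str):
        self.store[prompt] = (time.monotonic(), response)

cache = PromptCache(ttl_ms=120_000)
cache.put("hello", "Hi there!")
print(cache.get("hello"))  # hit within the TTL window
```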
Variable descriptions:

| Variable | Purpose |
|---|---|
| `MUXA_PROMPT_CACHE_ENABLED` | Enable the exact-match cache; repeated prompts return instantly. |
| `MUXA_PROMPT_CACHE_TTL_MS` | Time-to-live (milliseconds) for prompt cache entries. |
| `MUXA_SEMANTIC_CACHE_ENABLED` | Enable the semantic (embeddings-based) cache. Set to true once embeddings are configured. |
| `MUXA_SEMANTIC_CACHE_THRESHOLD` | Cosine similarity threshold (0-1) for semantic cache hits. |
| `MUXA_MEMORY_ENABLED` | Enable long-term memory extraction/storage. |
| `MUXA_MEMORY_TOPK` | Number of memories injected into each request when relevant. |
| `MUXA_HEADROOM_ENABLED` | Enable the headroom sidecar/compression pipeline. |
| `MUXA_HEADROOM_MODE` | `audit` (record metrics only) or `optimize` (mutate/compress history). |
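The `MUXA_SEMANTIC_CACHE_THRESHOLD` check boils down to comparing cosine similarity of embeddings against a cutoff. A minimal sketch (real embedding vectors come from your configured embeddings provider):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_hit(query_vec, cached_vec, threshold=0.9):
    """Reuse a cached answer only when the embeddings are close enough."""
    return cosine(query_vec, cached_vec) >= threshold
```

Raising the threshold toward 1.0 makes the cache stricter (fewer but safer hits); lowering it trades accuracy for more reuse.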
- Start Muxa (npm or Docker as shown above).
- Point clients at Muxa:
| Client | Configuration |
|---|---|
| Claude Code CLI | ANTHROPIC_BASE_URL=http://localhost:8081 ANTHROPIC_API_KEY=sk-muxa claude "Prompt" (or export those vars once before running). |
| Cursor IDE | Settings → Features → Models → Base URL http://localhost:8081/v1, API key sk-muxa, select the model configured in .env. For @Codebase, enable embeddings (see docs/embeddings.md). |
| OpenAI Codex CLI | codex -c model_provider='"muxa"' -c model='"gpt-5.2-codex"' -c 'model_providers.muxa={name="Muxa Proxy",base_url="http://localhost:8081/v1",wire_api="responses",api_key="sk-muxa"}' (Muxa maps this to your local primary provider model). |
| GitHub Copilot | export GITHUB_COPILOT_PROXY_URL=http://localhost:8081/v1, export GITHUB_COPILOT_PROXY_KEY=dummy, restart the editor (Works for VS Code / JetBrains). |
| Cline / Continue / ClawdBot / other OpenAI-compatible tools | Set their custom OpenAI endpoint to http://localhost:8081/v1 with API key sk-muxa and use your desired model name. |
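Beyond IDEs, any OpenAI-compatible SDK or script can be pointed at Muxa the same way. A stdlib-only sketch of what such a request looks like (the model name is illustrative; Muxa maps it to your configured provider, and the request is only sent when a proxy is actually running):

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "gpt-4o-mini",
                  base_url: str = "http://localhost:8081/v1"):
    """Build an OpenAI-style chat completion request aimed at Muxa."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer sk-muxa",
                 "Content-Type": "application/json"},
    )

req = build_request("Explain this stack trace")
# urllib.request.urlopen(req) would send it through the running proxy
```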
When you launch IDEs via Spotlight or the Dock, macOS ignores shell exports. Persist API keys for GUI apps (VS Code, Cursor, Claude CLI, etc.) with launchctl:
```bash
# Persist environment variables for GUI-launched apps.
# No need to provide a real key: a dummy value keeps the OpenAI/Anthropic
# clients happy, and the actual keys are read from the .env file.
launchctl setenv OPENAI_API_KEY sk-muxa
launchctl setenv ANTHROPIC_API_KEY sk-muxa
launchctl setenv MUXA_BASE_URL http://localhost:8081

# Inspect current values
launchctl getenv OPENAI_API_KEY
launchctl getenv ANTHROPIC_API_KEY

# Remove when rotating credentials
launchctl unsetenv OPENAI_API_KEY
launchctl unsetenv ANTHROPIC_API_KEY
```

After setting values, quit and reopen the IDE so it inherits the updated environment.
- `/dashboard` – lightweight HTML dashboard showing health, metrics, routing, compression, and headroom status (auto-refreshing)
- `/health`, `/health/live`, `/health/ready` – readiness probes
- `/routing/stats`, `/debug/session`, `/v1/agents/*` – routing + agent diagnostics
- `/metrics`, `/metrics/prometheus`, `/metrics/compression`, `/metrics/semantic-cache` – Prometheus-ready metrics plus headroom/semantic cache stats
- `/health/headroom`, `/headroom/status`, `/headroom/restart`, `/headroom/logs` – headroom lifecycle
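Prometheus-style endpoints emit the standard text exposition format, so scraping them with plain Python is straightforward. A tiny parser sketch (the metric names below are made up for illustration, not Muxa's actual metric names):

```python
def parse_prometheus(text: str) -> dict:
    """Parse simple counter/gauge lines from Prometheus text exposition format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank and HELP/TYPE comment lines
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP muxa_cache_hits_total Prompt cache hits
muxa_cache_hits_total 42
muxa_requests_total 120
"""
print(parse_prometheus(sample))
```

In practice you would point a real Prometheus scraper at the endpoint; this is just to show the payload shape.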
Muxa’s proxy lives between every IDE and the upstream provider, so optimization happens once and benefits everything:
- Caching chain – Prompt cache handles exact repeat prompts instantly; semantic cache (with embeddings) reuses near-duplicates; the memory store injects curated snippets for long-running projects.
- Headroom compression – Enable
MUXA_HEADROOM_ENABLED=trueto shrink chat histories and reduce token spend while keeping context intact—inspect savings via/metrics/compression. - Hybrid routing – Set
MUXA_ROUTING_STRATEGY=hybridwith a fallback provider to auto-route tool-heavy or long prompts to a premium model while keeping cheap models for short requests. Model aliases/fallbacks map IDE-only IDs (e.g.,gpt-5.3-codex) to real upstream SKUs without touching editor config. - Instrumentation loop –
/api/tokens/stats,/metrics/semantic-cache,/routing/stats, and the dashboard show exactly when caches hit or fallback routing triggers so you can prove savings to the team.
Best results happen when you warm caches (replay tests or common prompts), keep session_id/user identifiers consistent, and leave the proxy running for a day or two so semantic cache + headroom gather enough data. See docs/cost-optimization.md for the full playbook, configuration examples, timelines, and troubleshooting tips.
For a deep dive into how hybrid routing scores requests and auto-switches between providers, see docs/routing.md.
See docs/embeddings.md for Ollama, llama.cpp, OpenRouter, and OpenAI embeddings configuration. Example (Ollama):
```bash
ollama pull nomic-embed-text
export MUXA_SEMANTIC_CACHE_ENABLED=true
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_EMBEDDINGS_MODEL=nomic-embed-text
npm start
```

To run the test suite:

```bash
npm test   # 90+ suites covering API/provider/integration/perf
node scripts/endpoint-parity-preflight.js
```

See docker-compose.example.yml for a sample proxy + Ollama stack.
Detailed GitBook-style docs live under docs/:
- docs/README.md — table of contents
- docs/cost-optimization.md — cost optimization playbook
- docs/setup.md — installation/config cheat sheet
- docs/providers.md — provider-specific notes
- docs/observability.md — endpoints + dashboard
- docs/embeddings.md — embeddings/@Codebase setup
- docs/routing.md — hybrid routing & cost optimization
Built by The Logic Atelier