Featherweight LLM Proxy for Windows
9 MB binary. 6 MB RAM. Native system tray. Zero config.
Proxy any Anthropic API request to Ollama or OpenAI-compatible backends.
You want to use Claude Desktop, Claude Code, or any Anthropic-compatible client with your own LLM backend — Ollama, OpenCode, or any OpenAI-compatible API. You need a simple proxy that translates the Anthropic Messages API into formats your backend understands, with a GUI to manage it without touching config files or terminals.
Prism is a single 9 MB executable that runs in your system tray, uses barely 6 MB of RAM, and just works. No Python, no Docker, no Node.js, no 500 MB of dependencies.
- Featherweight — 9 MB binary, 6 MB RAM. Compiles to a single static executable with no runtime dependencies
- System Tray GUI — Native Windows tray icon to start/stop the proxy, switch providers, set API keys, and edit model config
- Multi-Provider — Switch between Ollama Cloud, OpenCode Go, or any custom OpenAI-compatible endpoint in one click
- Run on Startup — Lives in your system tray, starts with Windows, always ready
- Full Anthropic API — Translates the Anthropic Messages API (including streaming, tool use, thinking blocks, and images) to Ollama `/api/chat` or OpenAI `/v1/chat/completions`
- SSE Streaming — Real-time Server-Sent Events streaming for both Ollama and OpenAI backends
- Model Remapping — Automatically redirect hardcoded model names (like `claude-3-5-haiku`) to your preferred model via a simple JSON config
- Zero Config — Sensible defaults out of the box. Configure only what you need
| | Prism | LiteLLM | Bifrost | Portkey | OpenRouter |
|---|---|---|---|---|---|
| Binary size | 9 MB | ~200 MB+ (Python) | ~30 MB | Cloud only | Cloud only |
| RAM usage | ~6 MB | ~300 MB+ | ~50 MB | N/A | N/A |
| Language | Go | Python | Go | Node.js | — |
| Desktop GUI | Native system tray | CLI only | CLI only | Web dashboard | Web dashboard |
| Self-hosted | Yes | Yes | Yes | Optional | No |
| Windows native | Yes (.exe, tray) | Yes (Python) | Yes | No | No |
| Run on startup | Tray icon, zero config | Manual setup | Manual setup | N/A | N/A |
| Provider switching | One-click in tray | Edit config file | Edit config file | Web UI | Web UI |
| Model remapping | JSON config + tray editor | Model aliases | Config file | Guardrails | Auto-routing |
| Anthropic → Ollama | Built-in | Via config | Via config | No | No |
| Anthropic → OpenAI | Built-in | Via config | Via config | Via config | No |
| Setup time | Double-click | pip install + config | Build + config | Sign up + config | Sign up |
| Use case | Personal desktop proxy | Team/production gateway | Production gateway | Production observability | Quick multi-model access |
Prism is not an enterprise gateway. It's a personal desktop tool for developers who want their LLM proxy to be as invisible as possible — sitting in the tray, sipping RAM, and staying out of the way.
If you need production features like rate limiting, observability, fallback routing, or multi-user access, reach for LiteLLM, Bifrost, or Portkey. If you just want to point Claude Desktop at your Ollama server and forget about it — that's Prism.
```
go build -ldflags="-H windowsgui" -o prism.exe .
```

Double-click `prism.exe` — it appears in your system tray and starts the proxy automatically.
Or run as a console server:
```
prism.exe --serve
```

The proxy listens on `http://127.0.0.1:11434` by default.
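To confirm the proxy is reachable, you can hand it a single Anthropic-style request. A minimal sketch in Go — it assumes the `/v1/messages` route shown in the architecture diagram further down and a model name from the examples in this README; adjust both to your setup:

```go
// A quick sanity check: send one Anthropic-style Messages request to the
// local proxy and print the raw response.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{
		"model": "glm-5.1:cloud",
		"max_tokens": 64,
		"messages": [{"role": "user", "content": "Say hello in one word."}]
	}`)

	req, err := http.NewRequest("POST", "http://127.0.0.1:11434/v1/messages", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-api-key", "prism") // placeholder key, mirroring the Claude Desktop example below

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```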
| Menu Item | What It Does |
|---|---|
| Start / Stop / Restart | Control the proxy server |
| Provider | Switch between Ollama Cloud, OpenCode Go, or Custom |
| Edit Model Config | Open model_remapping.json in Notepad — proxy restarts on save |
| Open Folder | Open the config directory in Explorer |
| Show Logs | Live tail of proxy logs in a terminal |
| Set API Key | Set the API key for the active provider |
| Quit | Stop proxy and exit |
- Start Prism and pick your provider from the tray menu
- Open your Claude Desktop config (`developer_settings.json` or the config editor) and set:

```json
{
"inferenceProvider": "gateway",
"inferenceGatewayBaseUrl": "http://127.0.0.1:11434",
"inferenceGatewayApiKey": "prism",
"inferenceModels": [
{ "name": "glm-5.1:cloud" },
{ "name": "deepseek-v4-flash:cloud", "supports1m": true }
]
}
```

- Done. Prism translates Anthropic Messages API calls to your backend format automatically.
AI clients like Claude Code hardcode model names for subagents (e.g. claude-3-5-haiku). When you're proxying through a non-Anthropic backend, those requests fail. Prism solves this with a simple JSON config.
Click Edit Model Config in the tray to open `%APPDATA%\prism\model_remapping.json`:

```json
{
"default_model": "glm-5.1:cloud",
"known_models": [
"glm-5.1:cloud",
"deepseek-v4-flash:cloud",
"deepseek-v4-flash"
],
"aliases": {
"claude-3-5-haiku": "deepseek-v4-flash:cloud",
"claude-3-5-haiku-20241022": "deepseek-v4-flash:cloud"
}
}
```

| Priority | Rule | Example |
|---|---|---|
| 1 | Exact alias | claude-3-5-haiku → deepseek-v4-flash:cloud |
| 2 | Known model | glm-5.1:cloud passes through unchanged |
| 3 | Default fallback | Any unknown model → glm-5.1:cloud |
After saving the file, the proxy restarts automatically to apply changes.
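If you want to predict what a given request will resolve to, the lookup is easy to model. This is only a sketch of the behaviour described in the table above, with hypothetical type and field names — the real logic lives in `config.go`:

```go
// Illustrative model-remapping resolution: exact alias first, then
// known-model pass-through, then the default fallback.
package main

import "fmt"

type ModelRemapping struct {
	DefaultModel string            `json:"default_model"`
	KnownModels  []string          `json:"known_models"`
	Aliases      map[string]string `json:"aliases"`
}

func (m ModelRemapping) Resolve(requested string) string {
	if target, ok := m.Aliases[requested]; ok {
		return target // 1. exact alias
	}
	for _, known := range m.KnownModels {
		if known == requested {
			return requested // 2. known model passes through unchanged
		}
	}
	return m.DefaultModel // 3. anything else falls back to the default
}

func main() {
	m := ModelRemapping{
		DefaultModel: "glm-5.1:cloud",
		KnownModels:  []string{"glm-5.1:cloud", "deepseek-v4-flash:cloud", "deepseek-v4-flash"},
		Aliases: map[string]string{
			"claude-3-5-haiku": "deepseek-v4-flash:cloud",
		},
	}
	fmt.Println(m.Resolve("claude-3-5-haiku"))   // deepseek-v4-flash:cloud
	fmt.Println(m.Resolve("glm-5.1:cloud"))      // glm-5.1:cloud
	fmt.Println(m.Resolve("some-unknown-model")) // glm-5.1:cloud
}
```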
The provider configuration looks like this:

```json
{
"active_provider": "ollama_cloud",
"ollama_cloud": {
"name": "Ollama Cloud",
"base_url": "https://ollama.com",
"api_key": ""
},
"opencode_go": {
"name": "OpenCode Go",
"base_url": "https://opencode.ai/zen/go",
"api_key": ""
},
"custom": {
"name": "Custom",
"base_url": "",
"api_key": ""
}
}
```

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_PROXY_PORT` | `11434` | Port to listen on |
| `OLLAMA_PROXY_HOST` | `127.0.0.1` | Host to bind to |
| `OLLAMA_API_KEY` | — | Fallback API key for Ollama Cloud |
| `OPENCODE_GO_API_KEY` | — | Fallback API key for OpenCode Go |
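For clarity, this is how the two listen-address variables combine with their defaults — a sketch, not the actual startup code in `main.go`, and the helper name is hypothetical:

```go
// Resolve the proxy listen address from the environment, falling back to
// the documented defaults when a variable is unset.
package main

import (
	"fmt"
	"os"
)

func getenvOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	host := getenvOr("OLLAMA_PROXY_HOST", "127.0.0.1")
	port := getenvOr("OLLAMA_PROXY_PORT", "11434")
	fmt.Printf("proxy listens on %s:%s\n", host, port)
}
```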
```
┌─────────────────┐                      ┌─────────┐                     ┌──────────────┐
│ Claude Desktop  │    Anthropic API     │  Prism  │   Ollama / OpenAI   │   Your LLM   │
│ Claude Code     │ ───────────────────► │  Proxy  │ ──────────────────► │   Backend    │
│ Any client      │    /v1/messages      │ (9 MB)  │                     │              │
└─────────────────┘                      └─────────┘                     └──────────────┘
```
Prism translates the Anthropic Messages API format into:
- Ollama — `POST /api/chat` with Ollama's chat format
- OpenAI-compatible — `POST /v1/chat/completions` with OpenAI's chat format
Both streaming and non-streaming are supported. Tool use, thinking blocks, and images are translated accordingly.
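As a rough picture of the OpenAI-compatible path, here is how a text-only Anthropic request maps onto an OpenAI chat request. This is a simplified sketch with illustrative types, not the translation in `openai.go` and `models.go`, which also handles content blocks, tools, thinking, and images:

```go
// Map the text-only parts of an Anthropic Messages request onto an
// OpenAI /v1/chat/completions request: the system prompt becomes the first
// chat message, and user/assistant turns carry over unchanged.
package main

import (
	"encoding/json"
	"fmt"
)

type anthropicMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type anthropicRequest struct {
	Model     string             `json:"model"`
	MaxTokens int                `json:"max_tokens"`
	System    string             `json:"system,omitempty"`
	Messages  []anthropicMessage `json:"messages"`
	Stream    bool               `json:"stream,omitempty"`
}

type openAIMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type openAIRequest struct {
	Model     string          `json:"model"`
	MaxTokens int             `json:"max_tokens,omitempty"`
	Messages  []openAIMessage `json:"messages"`
	Stream    bool            `json:"stream,omitempty"`
}

func toOpenAI(in anthropicRequest) openAIRequest {
	out := openAIRequest{Model: in.Model, MaxTokens: in.MaxTokens, Stream: in.Stream}
	if in.System != "" {
		out.Messages = append(out.Messages, openAIMessage{Role: "system", Content: in.System})
	}
	for _, m := range in.Messages {
		out.Messages = append(out.Messages, openAIMessage{Role: m.Role, Content: m.Content})
	}
	return out
}

func main() {
	req := anthropicRequest{
		Model:     "deepseek-v4-flash:cloud",
		MaxTokens: 64,
		System:    "You are terse.",
		Messages:  []anthropicMessage{{Role: "user", Content: "Hello"}},
	}
	b, _ := json.MarshalIndent(toOpenAI(req), "", "  ")
	fmt.Println(string(b))
}
```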
```
├── main.go               Entry point, CLI flags, HTTP server
├── proxy.go              Core proxy logic, Anthropic → Ollama translation
├── openai.go             Anthropic → OpenAI translation (non-streaming)
├── streaming.go          SSE streaming for Ollama backend
├── openai_streaming.go   SSE streaming for OpenAI backend
├── config.go             Config loading, model remapping, provider management
├── models.go             Request/response types for all API formats
├── tray.go               Windows system tray UI and process management
├── icon.ico              Application icon
└── go.mod                Go module definition
```
MIT