RelayCode

Run Claude Code on GPT-5.5. Or DeepSeek. Or anything OpenAI-compatible.
Single-binary proxy. Upstream prompt caching. Up to ~98% input token reuse in one observed long-session workload

Why this exists · Install · Quickstart · How it works · Configuration · Providers · Security · FAQ · Troubleshooting

RelayCode sits between Claude Code and model backends. It accepts Anthropic Messages API requests at /v1/messages, routes each request by incoming Claude model name, then streams Anthropic-shaped SSE back to the client.

Supported upstream protocols:

OpenAI Chat Completions (openai_chat, POST /v1/chat/completions)
OpenAI Responses (openai_responses, POST /v1/responses)
Native Anthropic Messages passthrough (anthropic_messages, POST /v1/messages)

Common use: keep Claude Code UX while routing Opus/Sonnet/Haiku requests to OpenAI-compatible backends, DeepSeek-style chat endpoints, or Anthropic native routes.

Why this exists

Claude Code normally sends conversation history every request. Long agentic sessions get expensive fast: each tool cycle adds more transcript, and later requests can replay tens of thousands of input tokens.

RelayCode was built to test a different shape: keep Claude Code as the local agent UI, but route model calls through a Responses-style backend with upstream prompt caching.

During development, RelayCode itself was built while running Claude Code through a GPT-5.5 Responses route. Total model spend for that end-to-end build and compatibility pass was roughly under $150.

Observed token/cost behavior from that run:

Scenario	Observed result
First request in a session	~20k-30k input tokens
Later cached requests	usually ~1k-2k input tokens
Claude Code tool-compatibility test	~40 requests for about $2-$3
Metrics tracked	request count, cost, input/output tokens, cache read/write

Real RelayCode log from one session (abbreviated, showing the first full replay and later upstream cache hits via cached_tokens; when experimental previous_response_id is enabled, cache reuse appears as session_chain):

responses: full_replay provider=custom_provider_responses model=gpt-5.5 reason="codex-compatible http replay" prompt_cache=miss cached_tokens=0 input_tokens=24871 output_tokens=147 stop_reason=end_turn resp=resp_...
responses: full_replay provider=custom_provider_responses model=gpt-5.5 reason="codex-compatible http replay" prompt_cache=hit cached_tokens=24576 input_tokens=24994 output_tokens=43 stop_reason=tool_use resp=resp_...
responses: full_replay provider=custom_provider_responses model=gpt-5.5 reason="codex-compatible http replay" prompt_cache=hit cached_tokens=24576 input_tokens=25059 output_tokens=24 stop_reason=tool_use resp=resp_...
responses: full_replay provider=custom_provider_responses model=gpt-5.5 reason="codex-compatible http replay" prompt_cache=hit cached_tokens=26624 input_tokens=26946 output_tokens=69 stop_reason=end_turn resp=resp_...
responses: session_chain provider=custom_provider_responses model=gpt-5.5 prev=resp_... tail_messages=1 total_messages=12 cached_tokens=0 input_tokens=1119 output_tokens=30 stop_reason=end_turn resp=resp_...

Those numbers are workload/provider dependent, but the pattern is the point: once the stable prefix lands in upstream cache, later Claude Code turns stop paying full-history cost every request.

Screenshot — token/cost per request (click to expand)

Highlights

Single Go binary. No third-party Go dependencies.
Model-aware routing. Case-insensitive substring match on incoming Claude model names, plus required "*" fallback.
Streaming translation. Emits Anthropic SSE lifecycle with text, thinking, tool use, tool input deltas, stop reasons, and token counts where available.
Image blocks. Claude Code base64 image blocks translate to OpenAI Chat image_url parts and Responses input_image parts.
Responses cache keying. openai_responses sets prompt_cache_key from Claude Code metadata.user_id.session_id when present.
Claude Code fast paths. Optional local shortcuts for quota probe, command prefix detection, title generation, suggestion mode, and filepath extraction.
Local web server tools. Optional local handling for forced Anthropic web_search / web_fetch requests.
Debug stats. /debug/stats exposes in-memory session cache counters.

Install

Prebuilt binary (recommended)

Grab the latest release from GitHub Releases. Archives ship the relaycode binary, relaycode.example.yaml, README.md, and LICENSE. Per-archive sha256 files are published alongside the assets.

Linux / macOS:

VERSION=1.5.0
curl -L -o relaycode.tar.gz \
  "https://github.com/5nYqnHvk/RelayCode/releases/download/${VERSION}/relaycode-${VERSION}-linux-amd64.tar.gz"
tar -xzf relaycode.tar.gz
./relaycode
# If relaycode.yaml is missing, RelayCode writes one from its embedded example.
# Interactive terminals can choose continue/exit; non-interactive runs exit after writing.

Windows: download the matching *.zip, unzip, then run relaycode.exe.

Build from source

go build -o relaycode ./cmd/relaycode

Quickstart

go build -o relaycode ./cmd/relaycode
./relaycode
# If relaycode.yaml is missing, RelayCode writes one from its embedded example.
# Interactive terminals can choose continue/exit; non-interactive runs exit after writing.

Edit relaycode.yaml, set provider keys, then run:

export OPENAI_API_KEY=sk-...
./relaycode -config relaycode.yaml

Point Claude Code at RelayCode:

export ANTHROPIC_BASE_URL=http://127.0.0.1:8080
export ANTHROPIC_API_KEY=freecc   # match server.network.auth_token when configured
export CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1
claude

If server.network.auth_token is empty, RelayCode does not require client auth. Set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 or Claude Code's /model picker will not load RelayCode's custom model list. RelayCode also supports live config reload on SIGHUP; host/port and Responses session-store path changes still require a restart.

Health check:

curl http://127.0.0.1:8080/health

How it works

┌──────────────┐   Anthropic        ┌──────────────┐   OpenAI/Anthropic   ┌──────────────┐
│ Claude Code  │ ─── /v1/messages ─▶│  RelayCode   │── chat/responses ───▶│  upstream    │
│   client     │ ◀─── SSE stream ───│              │◀── SSE stream ──────│  provider    │
└──────────────┘                    └──────────────┘                      └──────────────┘

Per request, RelayCode:

Decodes Anthropic Messages request body.
Runs enabled Claude Code fast-path optimizations when request shape matches.
Resolves route from routes[] using incoming model.
Handles forced local web_search / web_fetch when enabled.
Builds provider-specific upstream request.
Streams upstream SSE back as Anthropic SSE.
Updates in-memory Responses session stats when usage data exists.

Repository layout

cmd/relaycode/                  entrypoint, config flag, signal shutdown
internal/anthropic/             Anthropic request/content types and helpers
internal/config/                stdlib-only YAML subset loader
internal/optim/                 Claude Code fast-path response shortcuts
internal/provider/              adapter interfaces, HTTP/SSE helpers
internal/provider/anthropic/    native Anthropic Messages passthrough adapter
internal/provider/chat/         OpenAI Chat Completions adapter
internal/provider/responses/    OpenAI Responses adapter
internal/router/                model route resolver
internal/server/                HTTP ingress, auth, /health, /debug/stats
internal/session/               in-memory Responses cache/stat store
internal/sse/                   Anthropic SSE writer/builder
internal/streamparse/           thinking/tool-call text parsers
internal/webtools/              local web_search/web_fetch implementation

Configuration

relaycode.example.yaml:

server:
  network:
    host: 127.0.0.1
    port: 8080
    auth_token: ""   # when non-empty, clients must send matching x-api-key / Authorization
  web_tools:
    # Local Anthropic web_search/web_fetch handler. Disabled by default because it
    # performs outbound HTTP from the proxy. Runs only when tool_choice forces it.
    enable: false
    allowed_schemes: http,https
    allow_private_networks: false
  claude_code:
    # Claude Code fast-path optimizations. Disable individually for debugging.
    fast_prefix_detection: true
    enable_network_probe_mock: true
    enable_title_generation_skip: true
    enable_suggestion_mode_skip: true
    enable_filepath_extraction_mock: true
  logging:
    log_request_snapshots: false   # safe shape-only request logs; no raw prompt text
    compact_tool_results: false    # compact long tool output before replaying upstream
  updates:
    enable_notification: false # check GitHub latest release tag on startup
    # check_url: https://api.github.com/repos/5nYqnHvk/RelayCode/releases/latest
    # check_timeout_seconds: 3
  responses:
    session_store_path: "" # optional durable Responses session/cache metadata JSON

routes:
  - match: "claude/opus-4-7"
    provider: openai_responses
    model: gpt-5.5
  - match: "claude/sonnet-4-6"
    provider: openai_responses
    model: gpt-5.4
  - match: "claude/haiku-4-5"
    provider: openai_responses
    model: gpt-5.4
  - match: "claude/test"
    provider: openai_responses
    model: gpt-5.4
  - match: "*"
    provider: openai_responses
    model: gpt-5.4

tool_validation:
  unknown_tools: drop
  invalid_known_tools: warn
  malformed_arguments: repair

providers:
  openai_responses:
    kind: openai_responses                 # POST /v1/responses
    endpoint:
      base_url: https://api.openai.com/v1
      api_key: "${OPENAI_API_KEY}"
      # Codex auth.json; overrides api_key when set.
      # codex_auth_path: /home/you/.codex/auth.json
    http:
      # Upstream request timeout in seconds.
      # timeout_seconds: 300
      # HTTP proxy URL.
      # proxy: "${HTTPS_PROXY}"
      # Max retry count for upstream requests.
      # max_retries: 2
      # Max parallel upstream requests.
      # max_concurrency: 4
    experimental:
      # Use non-Codex response chaining instead of replaying the full prefix.
      # previous_response_id: false
      # Pass server tools upstream without translation.
      # passthrough_server_tools: true
    responses:
      # Use native Responses custom tools instead of function-style tools.
      # custom_tool_mode: native
      # Group mcp__server__tool entries as Responses namespace tools.
      # namespace_tools: false
      # Upstream service tier.
      # service_tier: priority
      # Upstream reasoning summary mode.
      # reasoning_summary: none
      # Enable parallel tool calls upstream.
      # parallel_tool_calls: false

  openai_chat:
    kind: openai_chat                      # POST /v1/chat/completions
    endpoint:
      base_url: https://api.openai.com/v1
      api_key: "${OPENAI_API_KEY}"

  anthropic_native:
    kind: anthropic_messages               # POST /v1/messages, raw Anthropic SSE passthrough
    endpoint:
      base_url: https://api.anthropic.com/v1
      api_key: "${ANTHROPIC_API_KEY}"

  deepseek_chat:
    kind: openai_chat
    endpoint:
      base_url: https://api.deepseek.com/v1
      api_key: "${DEEPSEEK_API_KEY}"

Config rules:

${VAR} values expand from process environment at startup.
YAML parser supports simple nested maps, lists of maps, and scalar values. No anchors, flow style, or multiline strings.
routes[].match is case-insensitive substring match against incoming Claude model name. First match wins.
Claude Code's model picker only keeps virtual model ids that start with claude or anthropic, so route ids should use one of those prefixes.
Set CLAUDE_CODE_ENABLE_GATEWAY_MODEL_DISCOVERY=1 or Claude Code skips gateway model discovery and /model shows only built-in models.
RelayCode reloads config live on SIGHUP, but server.network.host, server.network.port, and server.responses.session_store_path still require a restart.
Fallback route with match: "*" is required.
providers.<name>.kind must be openai_chat, openai_responses, or anthropic_messages.
Provider adapters are created lazily on first routed request. Missing API key only fails when that provider is used.
auth_token, when non-empty, accepts either x-api-key: <token>, Authorization: Bearer <token>, or raw Authorization: <token>.

Providers

`openai_responses`

Translates Anthropic messages to OpenAI Responses input[] items.

Behavior:

Sends model, input, stream: true, and instructions from Anthropic system text.
Maps max_tokens to max_output_tokens.
Forwards top_p, tools, and function-call outputs.
Omits temperature because current Responses targets used by RelayCode reject or ignore it inconsistently.
Always sends tool_choice, parallel_tool_calls: false, and store: false.
Sets prompt_cache_key from Claude Code session id when available.
Adds include: ["reasoning.encrypted_content"] when reasoning is requested.
Maps Anthropic tool_choice: {"type":"any"} to OpenAI required.
Drops replayed raw Anthropic thinking blocks because Responses API does not accept them.
Maps Claude Code base64 image blocks to Responses input_image parts.

Optional knobs:

codex_auth_path: reads local Codex auth.json. auth_mode: apikey uses OPENAI_API_KEY; auth_mode: chatgpt / chatgptAuthTokens uses tokens.access_token, forwards ChatGPT-Account-ID from token metadata, and forwards X-OpenAI-Fedramp for FedRAMP accounts. ChatGPT auth with the default OpenAI base URL uses Codex's https://chatgpt.com/backend-api/codex base URL. When set, codex_auth_path takes precedence over api_key, and api_key may be omitted.
experimental.previous_response_id: enables HTTP previous_response_id chaining for backends that support it. Default stays off for Codex-style full replay with prompt_cache_key.
experimental.passthrough_server_tools: passes Anthropic server tool declarations upstream instead of stripping unsupported server-tool entries. Keep off unless upstream provider understands those tool shapes.
responses.custom_tool_mode: function: sends schema-less Anthropic custom tools as normal Responses function tools with an input string argument for gateways that reject Responses custom tool declarations. Default native keeps OpenAI/Codex-style custom tools.
responses.namespace_tools: true: groups MCP-style tool names like mcp__calendar__create_event into Responses namespace declarations and maps namespace-qualified function calls back to Claude Code's full tool name. Default false keeps flat function tools for stricter gateways.

`openai_chat`

Translates Anthropic messages to OpenAI Chat Completions messages[].

Behavior:

Sends system text as role: system message.
Converts regular client tools to OpenAI function tools.
Streams chat text, reasoning content, and tool-call arguments back to Anthropic SSE.
Sanitizes tool parameter property named type to avoid provider schema bugs, then restores argument key on streamed tool input.
Rejects user image blocks.

`anthropic_messages`

Passes Anthropic request through to upstream /v1/messages with model replaced by routed upstream model.

Behavior:

Sends x-api-key and anthropic-version: 2023-06-01.
Forces stream: true.
Adds max_tokens: 4096 when missing or zero.
Pipes upstream Anthropic SSE through with minor policy transforms.

Tool compatibility

Claude Code feature	Status	Notes
Client tools (`Bash`, `Read`, `Write`, `Edit`, etc.)	Works	RelayCode relays function-style tool calls/results.
Custom function tools	Works	Converted to provider function tools.
Tool argument streaming	Works	Mapped to Anthropic `input_json_delta`.
Thinking/reasoning deltas	Works	Chat reasoning and Responses reasoning events map to `thinking_delta`.
Local `web_search` / `web_fetch`	Optional	Requires `server.web_tools.enable: true` and forced Anthropic server tool choice.
Provider-side server tools	Experimental	Use `experimental.passthrough_server_tools` only with compatible upstreams.
Images	Works	Claude Code base64 image blocks map to Chat `image_url` and Responses `input_image`.
MCP/server-tool replay blocks	Degraded by default	Preserves model-visible history as text unless passthrough is enabled.
Responses namespace tools	Optional	`responses.namespace_tools: true` groups `mcp__server__tool` names into Codex-style namespace declarations.
Chained custom tool results	Works	Stored `call_id` metadata lets `previous_response_id` tails emit `custom_tool_call_output`.

Claude Code tool probe on 2026-05-14 verified safe, reversible client tools through RelayCode: agent dispatch, shell foreground/background tasks, file read/write/edit, notebook edit, task list/update/output/stop, cron create/list/delete, monitor events, web fetch/search, and plan-mode entry/exit. Worktree and dynamic loop wakeups were not exercised because they require explicit workflow context. PushNotification is a known caveat from that probe: the adapter dropped the tool call due schema validation.

Observability

Stats endpoint:

curl -sS http://127.0.0.1:8080/debug/stats \
  -H "x-api-key: freecc" | jq .

Response shape:

{
  "counters": {
    "hits": 0,
    "misses": 0,
    "forced_replays": 0,
    "expired_invalid": 0,
    "input_tokens": 0,
    "output_tokens": 0
  },
  "sessions": [
    {
      "provider": "openai_responses",
      "upstream_model": "gpt-5.5",
      "message_count": 3,
      "response_id": "resp_...",
      "last_used": "2026-05-11T05:32:50Z",
      "input_tokens": 1119,
      "output_tokens": 30
    }
  ]
}

Debug logging:

RELAYCODE_DEBUG_REQUEST=1: logs raw incoming /v1/messages JSON. Use only locally; this can include prompt text.
server.logging.log_request_snapshots: true or RELAYCODE_LOG_REQUEST_SNAPSHOTS=1: logs scrubbed request shape snapshots without raw prompt text.
server.logging.compact_tool_results: true: sends compacted long tool/Bash outputs to OpenAI-compatible upstreams while keeping short outputs unchanged. Raw capture files remain full when RELAYCODE_CAPTURE_DIR is enabled.
server.updates.enable_notification: true: checks the latest GitHub Release tag once at startup and logs when a newer release exists. Source builds use version dev and skip update checks; release builds get their tag injected at build time.
server.updates.check_url and server.updates.check_timeout_seconds: override the release endpoint and timeout (defaults: GitHub latest release API, 3s).
RELAYCODE_CAPTURE_DIR=/tmp/relaycode-capture: writes one directory per request with incoming_anthropic.json, per-call upstream/*/request.json, and split SSE frames under upstream/*/events/ and downstream_events/. Use only with throwaway prompts; request and tool content are captured for fixture generation.

Limitations

Session store is in memory by default. Set server.responses.session_store_path to persist Responses session/cache metadata JSON and tool-call metadata across restarts.
Responses cache reuse relies on upstream prompt caching via prompt_cache_key. Optional HTTP previous_response_id chaining is experimental; WebSocket continuation is not implemented.
OpenAI image support expects Claude Code base64 image blocks; remote image URLs are not fetched by RelayCode.
Local web tools run only for forced Anthropic web server tool requests.
Retry applies to transport errors, HTTP 429/5xx before a stream is accepted, and early Responses stream failures before any content is emitted. Later mid-stream provider failures are returned as Anthropic SSE errors.

Security

Bind to localhost by default. server.network.host: 127.0.0.1 in relaycode.example.yaml. Bind to a public interface only when needed.
Set server.network.auth_token before exposing RelayCode beyond localhost. Without it, any local process can reach the proxy.
No TLS termination. RelayCode serves plain HTTP. Terminate TLS at a reverse proxy (Caddy, nginx, Cloudflare Tunnel) when not on localhost.
No prompt logging by default. RELAYCODE_DEBUG_REQUEST=1 prints raw request bodies; use only locally for debugging. server.logging.log_request_snapshots prints shape-only snapshots without raw prompt text. server.logging.compact_tool_results can reduce long Bash/tool replay sent upstream. RELAYCODE_CAPTURE_DIR writes raw request/tool content for local fixture capture only.
No outbound update checks by default. server.updates.enable_notification must be set explicitly before RelayCode calls the GitHub Release API.
Provider keys via env. Prefer ${OPENAI_API_KEY} / ${DEEPSEEK_API_KEY} over pasting keys into relaycode.yaml.
Session store is in memory by default. Set server.responses.session_store_path if you want Responses session/cache and tool-call metadata on disk.

FAQ

Why route through Responses instead of Chat Completions? Responses accepts prompt_cache_key, so multi-turn Claude Code sessions reuse the shared prefix upstream. Chat Completions works but has no session-level cache handle.

Can I keep using Anthropic directly? Yes. Configure an anthropic_messages provider and route whichever model substring you want through it. RelayCode just forwards the request.

How do I mix providers? Put multiple entries in routes[]. First match (case-insensitive substring on incoming model) wins. Example: claude/opus → OpenAI Responses, claude/haiku → DeepSeek chat, * → fallback.

Does it support image / vision? Yes. Claude Code base64 image blocks map to OpenAI Chat image_url parts and Responses input_image parts. Native Anthropic routes pass image blocks through.

Will Claude Code know it's being proxied? No. RelayCode speaks the Anthropic Messages API; Claude Code treats it as a normal Anthropic endpoint.

Troubleshooting

prompt_cache=miss every turn. Client may not be sending metadata.user_id.session_id, or upstream may not be honoring prompt_cache_key. Without a session id, RelayCode falls back to instructions/tools fingerprint when available.
401/403 from upstream. API key is missing or wrong. Check the env var referenced in relaycode.yaml.
429 from upstream. Set providers.<name>.http.max_retries and http.max_concurrency to smooth spikes.
Long requests time out. Bump providers.<name>.http.timeout_seconds.
Image blocks. OpenAI adapters now accept Claude Code base64 image blocks. Use native Anthropic only if you want direct passthrough.
Forced web_search / web_fetch returns 400. Enable server.web_tools.enable: true. Default is off because the proxy makes outbound HTTP.

Development

go test ./...
go vet ./...
go build -o relaycode ./cmd/relaycode
./relaycode -config relaycode.yaml

No external Go dependencies. Tests use standard go test.

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
.github/workflows		.github/workflows
Img		Img
cmd/relaycode		cmd/relaycode
internal		internal
.gitignore		.gitignore
COMPATIBILITY_BACKLOG.md		COMPATIBILITY_BACKLOG.md
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
relaycode.example.yaml		relaycode.example.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RelayCode

Why this exists

Highlights

Install

Prebuilt binary (recommended)

Build from source

Quickstart

How it works

Repository layout

Configuration

Providers

`openai_responses`

`openai_chat`

`anthropic_messages`

Tool compatibility

Observability

Limitations

Security

FAQ

Troubleshooting

Development

License

About

Uh oh!

Releases 8

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RelayCode

Why this exists

Highlights

Install

Prebuilt binary (recommended)

Build from source

Quickstart

How it works

Repository layout

Configuration

Providers

openai_responses

openai_chat

anthropic_messages

Tool compatibility

Observability

Limitations

Security

FAQ

Troubleshooting

Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 8

Contributors

Uh oh!

Languages

`openai_responses`

`openai_chat`

`anthropic_messages`