Skip to content

OnlyTerp/UltraCode-Shim

Repository files navigation

UltraCode-Shim — run Claude Code's UltraCode mode on any model you already pay for

CI License: MIT Python 3.8+ deps: stdlib only platforms

Use Claude Code's UltraCode mode (xhigh effort + the Workflow/deep-reasoning harness) with any model you already pay for — pick it live from the /model menu.

One icon. Open Claude Code, type /model, and choose any backend you've set up — all running with the full UltraCode harness. Your normal Claude Code install is left untouched.

The example config ships ready-to-use entries for GPT‑5.5 (Codex login), MiniMax‑M3, MiMo v2.5 Pro, DeepSeek V4 Pro/Flash, Step Flash, Ollama Cloud, OpenCode Go, OpenRouter, and local models — keep the ones you have a plan for, delete the rest. (Cursor's Composer needs the cursor-agent CLI and isn't HTTP-based — see docs/ADD_A_MODEL.md.)

One icon, every model · stdlib-only proxy · tools translated both ways · your Claude stays untouched

How it works

Claude Code's /model menu points at a loopback proxy that adds the UltraCode envelope and routes each pick to the backend you already pay for

How is this possible? At the API level, "UltraCode" is just effort=xhigh + adaptive thinking + a big max_tokens + one system reminder — there is no secret model. The proxy adds that envelope to every request, so any backend gets the UltraCode treatment. Full breakdown (with the reverse‑engineering evidence) in docs/HOW_IT_WORKS.md.

Built for long, dynamic workflows ✨

UltraCode shines on long, autonomous runs — deep reasoning, multi-step Workflows, multi-agent fan-out. The catch with any "route to a third-party backend" shim is that those backends occasionally hiccup, and on a 40-minute agent run a single unhandled hiccup can wedge the whole session. We hardened the proxy against the three failure modes we actually hit in production, so it keeps going instead of stalling:

  • 🔁 Empty turns auto-retry. A backend that returns a turn with no text and no tool call (a transient blip, or a budget-exhausted reasoning turn at high effort) is transparently re-issued. It buffers only until the first real token, so a normal turn adds zero latency and output is never duplicated — and it never retries after real output or a fatal error.
  • ⏱️ A stalled stream can't freeze the run. If a GPT‑5.5/codex stream opens and then goes silent mid-turn, a bounded idle timeout turns the stall into a quick retry instead of a ~10-minute hang — so one stuck sub-agent no longer freezes an entire multi-agent / dynamic-workflow run.
  • 🛠️ Rejecting a tool call just works. Declining (or skipping) a tool mid-run no longer 400s strict backends like DeepSeek — the proxy repairs the tool-call sequence and synthesizes a stub reply for anything you didn't answer, including partial parallel calls. (#3)

All three are tunable via env vars and locked down by the offline self-test in CI. Details and knobs: docs/HOW_IT_WORKS.md → Reliability.

Demo

There's a ready-to-run scenario in examples/demo/ — a buggy little Game of Life. Launch UltraCode there, pick any model, enable auto mode, and paste the prompt: it fixes the bug, adds an animated color renderer + starting patterns, and runs its own self-test, ending on a glider crawling across the screen.

Verified live against real backends: GPT‑5.5 (Codex login) and Cursor Composer, plus an offline self-test that runs in CI on Linux/Windows × Python 3.8/3.12.

What you need

  • Claude Code CLI with UltraCode access (npm i -g @anthropic-ai/claude-code).
  • Python 3.8+ (standard library only — there is nothing to pip install).
  • At least one backend credential, e.g. an API key (MiMo / OpenRouter / OpenAI / a local server) and/or a codex login for GPT‑5.5. You only set up the ones you have.

Tested on Windows 11 (no WSL needed). macOS/Linux/WSL work too via bin/ultracode.

Quick start

Three steps: get the code and run the doctor, copy config.example.json and pick your models, then launch and type /model

Windows

git clone https://github.com/OnlyTerp/UltraCode-Shim.git
cd UltraCode-Shim

# 1. Sanity-check your machine and config (safe to run anytime)
python scripts\doctor.py

# 2. Tell it which models you want (see "Configure your models" below)
#    Copy config.example.json to config.json, keep the models you have,
#    and put your keys in it (config.json is gitignored).
copy config.example.json config.json

# 3. Create Desktop icons (one for UltraCode, one for normal Claude Code)
.\windows\Install-DesktopIcons.ps1

# 4. Double-click "UltraCode (All Models)" — then type /model and pick a backend.

macOS / Linux / WSL

Run python3 scripts/doctor.py then ./bin/ultracode. (The launchers copy config.example.jsonconfig.json for you on first run if you skip step 2.)

Configure your models

Everything is in one file: config.json (copied from config.example.json). It has two sections you edit:

  • models — what shows up in the /model menu. Every id must start with claude or anthropic (Claude Code filters the rest out).
  • routes — where each of those ids actually goes. The route key must match the model id.

Example — MiMo and an OpenRouter model:

{
  "models": [
    { "id": "claude-mimo",       "display_name": "MiMo v2.5 Pro" },
    { "id": "claude-openrouter", "display_name": "Llama 3.3 70B (OpenRouter)" }
  ],
  "routes": {
    "claude-mimo": {
      "type": "openai_compat",
      "upstream": "https://token-plan-sgp.xiaomimimo.com/v1",
      "model": "mimo-v2.5-pro",
      "auth": "Bearer ${MIMO_API_KEY}"
    },
    "claude-openrouter": {
      "type": "openai_compat",
      "upstream": "https://openrouter.ai/api/v1",
      "model": "meta-llama/llama-3.3-70b-instruct",
      "auth": "Bearer ${OPENROUTER_API_KEY}"
    }
  }
}

Put your key right in config.json (it's gitignored) or use ${ENV_VAR} and export it — or drop keys into a gitignored ultracode.env the launchers load.

Route types:

type Use for Needs
(omit) Real Claude or any Anthropic-compatible endpoint nothing, or auth/upstream
openai_compat MiMo, DeepSeek, OpenRouter, OpenAI, Ollama, local llama.cpp — anything that speaks OpenAI Chat Completions (tools included) an API key
codex_oauth GPT‑5.5 via a ChatGPT/Codex login (no API key) codex login once
cursor_agent Cursor Composer (experimental) cursor-agent login

Reasoning models (MiniMax‑M3, etc.): an openai_compat route can carry a "body": { ... } dict of extra params merged into every request. MiniMax‑M3 needs "body": { "reasoning_split": true } so its <think> chain‑of‑thought is returned separately instead of leaking into the visible answer — the shipped example already sets this. See docs/ADD_A_MODEL.md.

Full walkthrough: docs/ADD_A_MODEL.md.

Is my normal Claude Code safe?

Yes. The UltraCode launcher only sets environment variables for the launched process and uses a session-scoped --settings file. It never edits your global Claude config or credentials. The installer also gives you a "Claude Code (Normal)" icon, so you can always start the plain version. Remove everything with windows\Uninstall.ps1.

Telling your AI assistant to set this up

This repo is built so you can hand it to an assistant. Point it at AGENTS.md — that's a step-by-step runbook (install → configure → test → troubleshoot) written for an AI to follow.

Docs

Doc What
AGENTS.md Runbook for an AI assistant to install/configure/test
docs/SETUP.md Human setup guide (Windows + macOS/Linux)
docs/HOW_IT_WORKS.md The mechanism + reverse-engineering evidence
docs/ADD_A_MODEL.md Add any backend to the /model menu
docs/TROUBLESHOOTING.md Symptom → cause → fix

License

MIT — see LICENSE. This is an unofficial, community project; it is not affiliated with Anthropic, OpenAI, or any model provider. You are responsible for complying with the terms of whatever accounts you route through it.

About

Give Claude Code's ultracode mode to ANY model you already pay for. A tiny local proxy + one config.json. Point your AI at AGENTS.md and it sets itself up.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors