✨ New Features
- feat(oauth): import accounts from CLIProxyAPI — Settings → CLIProxyAPI now has an "Import accounts" button that reads the OAuth accounts CLIProxyAPI has already saved in `~/.cli-proxy-api/` and imports them as OmniRoute connections, so you don't have to log in to every account individually. CLIProxyAPI's unified auth-file format is parsed by its `type` discriminator, and the supported account types (Gemini, Codex, Claude/Anthropic, Antigravity, Qwen, Kimi) are upserted; unknown types are skipped. The preview never exposes tokens to the client. (thanks @powellnorma)
- feat(routing): opt-in setting to echo the requested alias/combo name in the response `model` field — Settings → Routing now has an "Echo requested model name in responses" toggle (default off). When enabled, the response `model` field (non-streaming and every streamed SSE chunk) reports the alias or combo name the client requested instead of the upstream model name. Strict clients such as Claude Desktop, which reject with a 401 any response whose `model` does not match the request, therefore work with aliases and combos. (thanks @thaiphuong1202)
- feat(providers): expand the `openai` and `gemini` direct registries with first-class variants already known elsewhere — the `openai` provider entry now exposes `gpt-4.1-mini`, `gpt-4.1-nano`, `o3-mini`, and `o4-mini` (the latter two carry `REASONING_UNSUPPORTED`, like `o3`), and the `gemini` entry now exposes `gemini-2.0-flash-lite` and `gemini-3-flash-lite-preview`. These models were already first-class throughout sibling subsystems (cost estimator, task fitness, free-model catalog, multiple aggregator registries) but were missing from the direct `openai`/`gemini` namespaces. Embedding/TTS/image-gen models stay in their dedicated registries (`embeddingRegistry.ts`, `audioRegistry.ts`, `imageRegistry.ts`); legacy ids that OmniRoute curated out (o1, gpt-4-turbo, …) are not restored. (thanks @East-rayyy)
- feat(translator): OpenAI SSE → Gemini SSE conversion for `/v1beta/models/{model}:streamGenerateContent` — the `@google/genai` SDK (Gemini CLI) always calls `:streamGenerateContent?alt=sse` for chat and expects Gemini SSE chunks (no `[DONE]` sentinel; the stream just closes). The v1beta route was forwarding OpenAI SSE from `handleChat` unchanged, so the SDK crashed on the OpenAI `[DONE]` line with `SyntaxError: Unexpected token 'D', "[DONE]" is not valid JSON`. A new `transformOpenAISSEToGeminiSSE()` (in `open-sse/translator/response/openai-to-gemini-sse.ts`) rewrites each OpenAI delta into `candidates[].content.parts[]`, maps `finish_reason` → `finishReason` (STOP / MAX_TOKENS / SAFETY), attaches `usageMetadata` + `modelVersion` on the final chunk, and surfaces `reasoning_content` as `{ thought: true }` parts for thinking models. The non-streaming `:generateContent` action gets a sibling `convertOpenAIResponseToGemini()` for the JSON path. Streaming intent is now keyed off the URL action suffix (the canonical Gemini convention) rather than the non-standard `generationConfig.stream` body field. (thanks @SteelMorgan)
- feat(compression): unified compression configuration panel (Phase 1) — `/dashboard/context/settings` is now the single source of truth for compression: a master toggle plus per-engine on/off and level controls, with the dispatch pipeline derived from a stored `engines` map on `CompressionConfig`. A gate (`enginesExplicit`) ensures the new map drives dispatch only when an `engines` row was actually saved from the panel, so legacy/backfilled installs (the seeded default combo from migrations 042/043) keep their existing `defaultMode` behavior unchanged. The default-combo and per-engine routes are shimmed (HTTP 410). (#4432 — thanks @diegosouzapw)
- feat(mcp): register the web-session pool observability tools — the `poolTools` MCP tool set (web-session pool stats/health) was defined but never wired into `createMcpServer()`, so it was dead code. It is now registered in `server.ts` with `withScopeEnforcement` against the typed `read:health` / `write:resilience` scopes (no enum inflation), giving MCP clients visibility into the pooled web-session lifecycle. (#4399, #3368 — thanks @diegosouzapw)
- feat(providers): stronger no-auth and web-cookie provider validation (`AUTH_007`) — provider connection validation now handles no-auth and web-cookie providers explicitly: instead of returning a generic "Provider validation not supported", these providers report a precise `AUTH_007` status, so the dashboard surfaces actionable validation feedback for cookie/no-auth flows. (#4023 — thanks @oyi77)
- feat(combo): per-combo `stickyRoundRobinLimit` override on the combos page — the round-robin sticky-affinity limit can now be set per combo from the combos page UI, overriding the global default, so each combo can tighten (or loosen) how many consecutive requests stick to the same round-robin member independently of the others. (#4472 — thanks @adivekar-utexas)
- feat(usage): quota fetch for `kimi-coding-apikey` — usage/quota tracking now supports the `kimi-coding-apikey` provider, so its remaining quota is fetched and surfaced like the other quota-aware providers. (#4435 — thanks @janeza2)
- feat(cluster): opt-in memory + Bifrost cluster profiles — adds opt-in cluster profiles that wire the memory subsystem and the Bifrost Go sidecar into a clustered deployment (follow-up to #3932). (#4433 — thanks @KooshaPari)
- feat(models): opt-in low-noise `/v1/models` catalog mode — a new opt-in mode trims the `/v1/models` response to a quieter, lower-noise catalog for clients that choke on or don't need the full provider/model list. (#4427 — thanks @Rahulsharma0810)
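The CLIProxyAPI import entry describes parsing auth files by a `type` discriminator, skipping unknown types, and keeping tokens out of the preview. The sketch below shows that pattern under stated assumptions: the `type` strings, the `email` label field, and the `previewAuthFiles` helper are all hypothetical and do not describe CLIProxyAPI's documented file format or OmniRoute's importer.

```typescript
// Hypothetical discriminator values: CLIProxyAPI's real `type` strings may differ.
const SUPPORTED_TYPES = ["gemini", "codex", "claude", "antigravity", "qwen", "kimi"] as const;
type SupportedType = (typeof SUPPORTED_TYPES)[number];

// What the client-side preview receives: no token fields at all.
interface AccountPreview {
  type: SupportedType;
  label: string;
}

function isSupportedType(value: unknown): value is SupportedType {
  return typeof value === "string" && (SUPPORTED_TYPES as readonly string[]).includes(value);
}

// Parses raw auth-file contents; malformed JSON and unknown types are skipped, not fatal.
function previewAuthFiles(rawFiles: string[]): { accounts: AccountPreview[]; skipped: number } {
  const accounts: AccountPreview[] = [];
  let skipped = 0;
  for (const raw of rawFiles) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      skipped++;
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) {
      skipped++;
      continue;
    }
    const record = parsed as Record<string, unknown>;
    const type = record["type"];
    if (!isSupportedType(type)) {
      skipped++; // unknown account type: skip it rather than failing the whole import
      continue;
    }
    // Using `email` as the display label is an assumption; token fields are never copied.
    const email = record["email"];
    const label = typeof email === "string" ? email : type;
    accounts.push({ type, label });
  }
  return { accounts, skipped };
}
```

Copying only an allow-listed set of non-secret fields into the preview, instead of deleting token fields from a copy, means a new secret field added to the file format cannot leak by accident.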
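The echo-model toggle in the routing entry rewrites the response `model` on both the JSON and the streamed paths. A minimal sketch, assuming OpenAI-style payloads and SSE lines of the form `data: {...}`; `echoModelInJson` and `echoModelInSseLine` are hypothetical names, not OmniRoute's implementation.

```typescript
// Assumes OpenAI-style payloads that carry a top-level `model` string.
type ChatPayload = { model?: unknown; [key: string]: unknown };

// Non-streaming path: replace `model` on the parsed response body when the toggle is on.
function echoModelInJson(body: ChatPayload, requestedModel: string, enabled: boolean): ChatPayload {
  if (!enabled || typeof body.model !== "string") return body;
  return { ...body, model: requestedModel };
}

// Streaming path: rewrite one SSE line. Non-data lines and the `[DONE]` sentinel pass through.
function echoModelInSseLine(line: string, requestedModel: string, enabled: boolean): string {
  const prefix = "data: "; // assumes a single space after the colon
  if (!enabled || !line.startsWith(prefix)) return line;
  const payload = line.slice(prefix.length);
  if (payload.trim() === "[DONE]") return line;
  try {
    const chunk = JSON.parse(payload) as ChatPayload;
    return prefix + JSON.stringify(echoModelInJson(chunk, requestedModel, true));
  } catch {
    return line; // malformed chunk: forward it unchanged rather than dropping it
  }
}
```

Applying the rewrite to every chunk matters because a strict client may validate `model` on any chunk, not only the first one.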
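The per-chunk mapping performed by the OpenAI SSE → Gemini SSE translator can be approximated as below. This is a simplified sketch rather than the real `transformOpenAISSEToGeminiSSE()`. The types are trimmed down, `openAIChunkToGemini` and `toGeminiSseLine` are hypothetical, and the `finish_reason` table (`stop` → STOP, `length` → MAX_TOKENS, `content_filter` → SAFETY, anything else → STOP) is an assumed reading of the three values named in the entry.

```typescript
// Simplified shapes; the real OmniRoute types are richer.
interface OpenAIDelta { content?: string; reasoning_content?: string }
interface OpenAIChunk {
  model?: string;
  choices: { delta: OpenAIDelta; finish_reason?: string | null }[];
  usage?: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}
interface GeminiPart { text: string; thought?: boolean }
interface GeminiChunk {
  candidates: { content: { role: "model"; parts: GeminiPart[] }; finishReason?: string }[];
  usageMetadata?: { promptTokenCount: number; candidatesTokenCount: number; totalTokenCount: number };
  modelVersion?: string;
}

// Assumed mapping; unlisted finish reasons fall back to STOP in this sketch.
const FINISH_MAP: Record<string, string> = { stop: "STOP", length: "MAX_TOKENS", content_filter: "SAFETY" };

function openAIChunkToGemini(chunk: OpenAIChunk): GeminiChunk {
  const choice = chunk.choices[0];
  const parts: GeminiPart[] = [];
  // Reasoning text becomes a `thought` part so thinking output stays distinguishable.
  if (choice?.delta.reasoning_content) parts.push({ text: choice.delta.reasoning_content, thought: true });
  if (choice?.delta.content) parts.push({ text: choice.delta.content });

  const candidate: GeminiChunk["candidates"][number] = { content: { role: "model", parts } };
  if (choice?.finish_reason) candidate.finishReason = FINISH_MAP[choice.finish_reason] ?? "STOP";

  const out: GeminiChunk = { candidates: [candidate] };
  // Usage and model version are attached only when the chunk carries usage (the final chunk).
  if (chunk.usage) {
    out.usageMetadata = {
      promptTokenCount: chunk.usage.prompt_tokens,
      candidatesTokenCount: chunk.usage.completion_tokens,
      totalTokenCount: chunk.usage.total_tokens,
    };
    out.modelVersion = chunk.model;
  }
  return out;
}

// Gemini SSE has no [DONE] sentinel: emit data events and simply end the stream.
function toGeminiSseLine(chunk: OpenAIChunk): string {
  return `data: ${JSON.stringify(openAIChunkToGemini(chunk))}\n\n`;
}
```

In a streaming loop, the caller would skip the upstream `data: [DONE]` line instead of translating it. That skip is the part that stops the `@google/genai` SDK from hitting the JSON parse error quoted above.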
🐛 Fixed
- fix(embeddings): forward output dimensions to Gemini for consistent embedding dims. (thanks @nguyenha935)
- fix(translator): sanitize Read tool args from non-Anthropic models to prevent retry loops. (thanks @GodrezJr2)
- fix(usage): reuse Gemini CLI project ID for quota checks (avoid re-discovery). (thanks @Delcado19)
- fix(dashboard): surface manual config CTA when Claude CLI detection fails (remote deployments). (thanks @anuragg-saxenaa)
- fix(executors): granular `reasoning_effort` handling for Claude models on GitHub Copilot. (thanks @baslr)
- fix(translator): strip Claude `output_config` before MiniMax (rejected upstream). (thanks @hiepau1231)
- fix(translator): OpenAI audio input now reaches Gemini/Antigravity instead of being silently dropped — `input_audio`/`audio` content parts on the OpenAI→Gemini path matched no handler in `convertOpenAIContentToParts` and were discarded without an error. They are now mapped to a Gemini `inlineData` part with an `audio/<format>` MIME type (wav, mp3, …). (thanks @mugnimaestra)
- fix(combo): model lockout now honors a long upstream quota reset instead of retrying within minutes — when a combo target returned a quota error carrying an explicit long reset (e.g. Antigravity's `Resets in 160h27m24s`, or a `Retry-After` header), the per-model lockout was capped at the short base cooldown (~minutes) and discarded the parsed reset, so the exhausted model kept being retried far too early. The lockout now applies the parsed reset when it exceeds the base cooldown, and the Antigravity error-message parser also matches the plural `Resets in …` phrasing. (thanks @Ansh7473)
- fix(antigravity): Claude models no longer 400 with `Unknown name "output_config"` — Anthropic/Claude-Code-only fields (`output_config`, legacy `output_format`) leaked into the Google Cloud Code request envelope via its top-level field passthrough, and Google rejects unknown envelope fields with `400 Invalid JSON payload received. Unknown name "output_config"`, which broke every Claude model served through Antigravity in IDEs. Those fields are now dropped before the envelope is built. (thanks @Duongkhanhtool)
- fix(combo): round-robin members fail over faster under concurrency saturation via a configurable queue depth — when a round-robin combo member was saturated, requests sat in the per-model semaphore's unbounded queue and only failed over to the next member after the full `queueTimeoutMs` (default 30s) elapsed, so a burst of agentic requests deep-queued on one hot member instead of spilling over to healthy ones. The per-model semaphore now accepts a bounded queue depth and emits `SEMAPHORE_QUEUE_FULL` once the queue is full (the round-robin loop already cascades on that code), so a low configured depth fails over immediately. A new `queueDepth` combo-config knob (global default / provider override / per-combo; default 20 for backward compatibility; 0 = never queue, fail over immediately) is exposed in Settings → Combo Defaults. (#3872 — thanks @KooshaPari)
- fix(pricing): align Claude Code (`cc`) pricing with current Anthropic per-MTok rates — the `cc` provider block in the default pricing table had stale numbers across every Claude 4.x family entry. Most visibly, `claude-opus-4-5-20251101` was billed at the deprecated Opus 4.1 rate (input $15 / output $75), and `claude-haiku-4-5-20251001` was billed at half the current Haiku 4.5 rate. The `cached` (cache hit) and `cache_creation` (5-minute cache write) multipliers were also off across Opus 4.6/4.7/4.8, Sonnet 4.5/4.6, Haiku 4.5, and Fable 5. All eight entries now match the rates Anthropic publishes (input; 5m cache write at 1.25x input; cache hit at 0.1x input; output; reasoning billed at the output rate), so cost accounting on the dashboard and in per-request usage events no longer under- or over-reports Claude Code spend. (thanks @chulanpro5)
- fix(executors): sanitize Anthropic-shape content parts before GitHub Copilot `/chat/completions` — Claude models on GitHub Copilot driven from clients like Cursor IDE (e.g. `gh/claude-sonnet-4.6`) failed with `Provider returned error: type has to be either 'image_url' or 'text' (reset after 30s)` because the client's Anthropic-shape content parts (`tool_use`, `tool_result`, `thinking`) were passed through untouched, and the Copilot chat-completions endpoint only accepts `text`/`image_url`. `GithubExecutor.transformRequest` now serializes any unsupported part type as `text` (preserving the model's context), drops empty parts, and collapses the content to `null` when an assistant message's only content was `tool_calls` (the `tool_calls` themselves ride alongside untouched). Codex-family models still route through `/responses` unchanged. (thanks @cngznNN)
- fix(sse): refactor stall detection to reduce false positives on slow but progressing streams. (thanks @zakirkun)
- fix(executors): synthesize `x-opencode-request` for custom-named OpenCode providers — the OpenCode CLI only emits the `x-opencode-*` header set when the provider id starts with `opencode`. A custom-named provider (e.g. `omniroute`) instead sends `x-session-affinity` / `x-session-id` (mapped to `x-opencode-session` since #4022) but no request-correlation id, so `x-opencode-request` was silently dropped. `OpencodeExecutor` now synthesizes a fresh `x-opencode-request` on that session-affinity fallback path, so custom-named providers are not disadvantaged on the opencode.ai upstream. `x-opencode-client` / `x-opencode-project` are intentionally not fabricated (there is no valid client source, and an invented value risks upstream rejection) and remain forward-only; `DefaultExecutor` is untouched. (#4465 — thanks @pizzav-xyz)
- fix(compression): RTK now compresses Anthropic-shape `tool_result` blocks — `applyRtkCompression` only compressed OpenAI-shape tool results (`role:"tool"`); Anthropic-shape tool results (`tool_result` content blocks inside a `role:"user"` message) were skipped, so coding agents speaking the Anthropic Messages format got zero RTK savings even though RTK's command-aware filters (e.g. `git-status`) would have compressed the output. RTK now treats a message containing a `tool_result` block as eligible (gated by `applyToToolResults`), captures Anthropic `tool_use` blocks for command resolution, and compresses each block's inner text (a string or a nested text-block array) while preserving `type` + `tool_use_id` exactly, matching what `caveman`/`aggressive` already did. (#4468 — thanks @diegosouzapw)
- fix(dashboard): request-log auto-refresh no longer dies from a "ghost" load-more on first page load — the request-log viewer's infinite-scroll `IntersectionObserver` uses a 200px `rootMargin`, so its sentinel was already intersecting on mount whenever the first page didn't fill the scroll container. That fired a `loadMore()` with no user interaction and grew the window past `PAGE_SIZE`. Because auto-refresh only polls while on the first page (`limit <= pageSize`), it stayed permanently paused (only a manual filter change re-armed it). The observer now grows the window only after a genuine user scroll (a new pure `shouldTriggerInfiniteScroll` guard), and a filter change re-arms the guard, so the default first-page view resumes its ~10s auto-refresh. (#4269 — thanks @tjengbudi)
- fix(sse): large `/v1/chat/completions` requests no longer crash the server with a Node heap OOM — the chat request body was parsed multiple times along the route (route guard, injection guard, handler), buffering very large payloads several times and pushing concurrent agentic traffic into an out-of-memory crash. The body is now parsed once at the route guard and threaded through, so each request is buffered only once. (#4380 — thanks @NakHalal)
- fix(guardrails): tighten the `system_prompt_leak` heuristic to stop false positives on agent traffic — the leak detector flagged normal agent/tool conversations as prompt-leak attempts; it now requires an additional qualifier before flagging, so legitimate agent traffic is no longer blocked. (#4041 — thanks @KooshaPari)
- fix(translator): drop orphan tool results on the Claude→OpenAI request path — a `tool_result` with no preceding matching `tool_use` (an orphan) produced upstream 500/502 errors for Command Code / Custom OpenAI clients on ≥3.8.26. Orphan tool results are now filtered out before the request is sent. (#4385 — thanks @adityapnusantara)
- fix(providers): register API-key validators for Firecrawl and Jina Reader — both providers returned "Provider validation not supported" when validating their API key; they now have proper validators registered in `SEARCH_VALIDATOR_CONFIGS`. (#4401 — thanks @ponkcore)
- fix(providers): generic web-cookie validator must not shadow per-provider validators — a follow-up to the `AUTH_007` validation work (#4023): the generic web-cookie validator was matching before more specific per-provider validators, so provider-specific validation was skipped. Validator resolution now prefers the per-provider validator. (#4467 — thanks @diegosouzapw)
- fix(translator): inject a placeholder message when the Responses API `input[]` is empty — a `POST /v1/responses` with `input: []` translated to `messages: []`, which every upstream Chat Completions provider rejects (surfaced as a confusing 406); a single placeholder user message is now injected, mirroring the existing empty-string handling. (#4393 — thanks @diegosouzapw)
- fix(providers): serve the api.airforce live `/models` catalog instead of the stale seed — the api.airforce provider listed a stale hard-coded seed; it now serves the upstream live `/models` catalog. (#4395 — thanks @diegosouzapw)
- fix(cli): non-interactive-safe prompts + `context` alias — the CLI's `confirm()`/prompt helpers no longer hang in non-interactive (piped/CI) contexts, a singular `context` alias is accepted alongside `contexts`, and the contexts workflow is now documented. (#4439, #4397 — thanks @diegosouzapw)
- fix(cli): `omniroute update` no longer reports a stale "latest" version from npm's cache — `getLatestVersion()` ran `npm view omniroute version` without `--prefer-online`, so npm could serve a cached value from its HTTP cache and tell users on an older build (e.g. 3.8.30) that they were already "running the latest version" even after a newer one (3.8.31) was published. The version check now passes `--prefer-online` to force npm to revalidate against the registry. (#4376 — thanks @akbardwi)
- fix(sse): `web_search_20250305` no longer 400s on MiniMax's Anthropic-compatible endpoint — PR #2960 added a Claude→Claude bypass that forwards Anthropic's typed server tool `web_search_20250305` untouched, assuming the Claude-format upstream implements Anthropic server tools. MiniMax's `/anthropic` endpoint does not, so `claude → minimax` requests carrying that tool got `HTTP 400 "invalid params, function name or parameters is empty (2013)"`. `supportsNativeWebSearchFallbackBypass` now consults the (already-plumbed) `provider` and excludes providers known not to implement server tools (currently `minimax`) from the bypass, so the built-in web-search tool is converted to the `omniroute_web_search` function fallback, which MiniMax accepts as a normal function tool. (#4481 — thanks @shafqatevo)
- fix(command-code): pass `reasoning`/`thinking` fields through to upstream params — Command Code requests carrying `reasoning`/`thinking` controls had those fields dropped before the upstream call, so reasoning-effort and extended-thinking settings were silently ignored; they are now forwarded to the upstream params. (#4473 — thanks @adivekar-utexas)
- fix(usage): keep Kiro overage-enabled accounts routable after base quota hits zero — a Kiro account with overage enabled was excluded from routing once its base quota reached zero, even though overage billing should keep it serving; such accounts now stay routable past base-quota exhaustion. (#4469 — thanks @heaven321357 / @CleanDev-Fix)
- fix(providers): model-aware `supportsRedactedThinking` for mixed-format providers — the redacted-thinking capability was resolved per provider rather than per model, so a mixed-format provider (some models support redacted thinking, others don't) got the wrong answer for some models; the check is now model-aware. (#4479 — thanks @TF0rd)
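The audio fix adds a branch to the OpenAI→Gemini content-part converter. A minimal sketch, assuming the OpenAI `input_audio` part shape (`{ input_audio: { data, format } }` with base64 `data`) and Gemini's `inlineData: { mimeType, data }` part. The `audio` variant is assumed to carry the same payload under an `audio` key, the `wav` default is an assumption, and `audioPartToGemini` is a hypothetical helper rather than the real `convertOpenAIContentToParts`.

```typescript
type OpenAIAudioPart =
  | { type: "input_audio"; input_audio: { data: string; format: string } }
  | { type: "audio"; audio: { data: string; format: string } }; // assumed shape

interface GeminiInlineDataPart {
  inlineData: { mimeType: string; data: string };
}

// Maps one OpenAI audio part to a Gemini inlineData part, or null when there is nothing to send.
function audioPartToGemini(part: OpenAIAudioPart): GeminiInlineDataPart | null {
  const payload = part.type === "input_audio" ? part.input_audio : part.audio;
  if (!payload?.data) return null;
  // "wav" -> "audio/wav", "mp3" -> "audio/mp3"; the "wav" fallback is an assumption.
  const format = (payload.format || "wav").toLowerCase();
  return { inlineData: { mimeType: `audio/${format}`, data: payload.data } };
}
```

The base64 string is passed through as-is, since both formats carry audio as base64 text and no re-encoding is needed.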
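The lockout rule in the combo fix is to apply the parsed upstream reset only when it is longer than the base cooldown. A minimal sketch, assuming reset text like the Antigravity example (`Resets in 160h27m24s`) and a `Retry-After` value in delta-seconds. The HTTP-date form of `Retry-After` is not handled, and `parseResetMs`, `retryAfterMs`, and `lockoutMs` are hypothetical names.

```typescript
// Parses text such as "Resets in 160h27m24s" (or the singular "Reset in 5m") into milliseconds.
function parseResetMs(message: string): number | null {
  const match = /Resets? in\s+((?:\d+h)?(?:\d+m)?(?:\d+s)?)/i.exec(message);
  if (!match || !match[1]) return null;
  const units = /^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$/.exec(match[1]);
  if (!units) return null;
  const hours = Number(units[1] ?? 0);
  const minutes = Number(units[2] ?? 0);
  const seconds = Number(units[3] ?? 0);
  const ms = ((hours * 60 + minutes) * 60 + seconds) * 1000;
  return ms > 0 ? ms : null;
}

// Retry-After in delta-seconds only; the HTTP-date form is ignored in this sketch.
function retryAfterMs(header: string | null): number | null {
  if (header === null || !/^\d+$/.test(header.trim())) return null;
  return Number(header.trim()) * 1000;
}

// Keep the base cooldown unless the upstream reset is longer.
function lockoutMs(baseCooldownMs: number, upstreamResetMs: number | null): number {
  return upstreamResetMs !== null && upstreamResetMs > baseCooldownMs ? upstreamResetMs : baseCooldownMs;
}
```

For the Antigravity example, 160h27m24s works out to ((160·60 + 27)·60 + 24)·1000 = 577,644,000 ms, about 6.7 days. Before the fix, a cooldown of a few minutes was used instead.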
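The round-robin fix relies on a semaphore whose wait queue is bounded and which rejects with `SEMAPHORE_QUEUE_FULL` once the queue is full. The sketch below is an illustrative reimplementation, not OmniRoute's per-model semaphore. Only the error code and the meaning of `queueDepth` (0 = never queue) come from the entry; the `BoundedSemaphore` class is hypothetical, and the `queueTimeoutMs` timer is left out for brevity.

```typescript
// Error carrying the code named in the release note; the class itself is illustrative.
class SemaphoreQueueFullError extends Error {
  readonly code = "SEMAPHORE_QUEUE_FULL";
}

class BoundedSemaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxConcurrent: number;
  private readonly queueDepth: number;

  constructor(maxConcurrent: number, queueDepth: number) {
    this.maxConcurrent = maxConcurrent;
    this.queueDepth = queueDepth; // 0 = never queue: reject as soon as all slots are busy
  }

  // Resolves when a slot is free; rejects at once if the wait queue is already full.
  acquire(): Promise<void> {
    if (this.active < this.maxConcurrent) {
      this.active++;
      return Promise.resolve();
    }
    if (this.waiters.length >= this.queueDepth) {
      return Promise.reject(new SemaphoreQueueFullError("semaphore queue is full"));
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  // Hands the slot straight to the next waiter, or frees it if nobody is waiting.
  release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.active--;
  }
}
```

A round-robin loop that catches an error whose `code` is `SEMAPHORE_QUEUE_FULL` and moves straight on to the next member gets the immediate failover described above, instead of waiting out the queue timeout.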
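The pricing entry's multipliers (5-minute cache write at 1.25x the input rate, cache hit at 0.1x the input rate, reasoning billed as output) translate into a per-request cost as follows. A small sketch with hypothetical field names and illustrative rates. It does not reproduce OmniRoute's pricing table or any published Anthropic number.

```typescript
interface TokenUsage {
  input: number; // uncached input tokens
  cacheWrite: number; // 5-minute cache-write tokens
  cacheRead: number; // cache-hit tokens
  output: number; // output tokens, reasoning included
}

// Rates are USD per million tokens; both cache rates are derived from the input rate.
function requestCostUsd(usage: TokenUsage, inputPerMTok: number, outputPerMTok: number): number {
  const cacheWritePerMTok = inputPerMTok * 1.25;
  const cacheReadPerMTok = inputPerMTok * 0.1;
  const perToken = (ratePerMTok: number) => ratePerMTok / 1_000_000;
  return (
    usage.input * perToken(inputPerMTok) +
    usage.cacheWrite * perToken(cacheWritePerMTok) +
    usage.cacheRead * perToken(cacheReadPerMTok) +
    usage.output * perToken(outputPerMTok)
  );
}
```

With an illustrative $4/MTok input rate and $20/MTok output rate, one million tokens in each bucket costs 4 + 5 + 0.4 + 20 = $29.40. This shows how a wrong cache multiplier skews spend even when the base input rate is right.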
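The Copilot sanitization rules in the executor fix are: keep `text`/`image_url`, serialize other part types to text, drop empty parts, and collapse to `null` when only `tool_calls` remain. They can be expressed as a pure function. The message types here are simplified, `sanitizeCopilotContent` is hypothetical, and the `[type] {json}` serialization is an assumed format; exactly how `GithubExecutor.transformRequest` renders each part may differ.

```typescript
type Part = { type: string; text?: string; [key: string]: unknown };
interface Message {
  role: string;
  content: string | Part[] | null;
  tool_calls?: unknown[];
}

function sanitizeCopilotContent(msg: Message): Message {
  if (!Array.isArray(msg.content)) return msg;
  const parts: Part[] = [];
  for (const part of msg.content) {
    if (part.type === "image_url") {
      parts.push(part);
    } else if (part.type === "text") {
      if (part.text) parts.push(part); // drop empty text parts
    } else {
      // tool_use / tool_result / thinking etc.: keep the context as plain text (assumed format)
      const { type, ...rest } = part;
      parts.push({ type: "text", text: `[${type}] ${JSON.stringify(rest)}` });
    }
  }
  // An assistant turn whose only content was tool calls gets null content; tool_calls stay as-is.
  if (parts.length === 0 && msg.role === "assistant" && msg.tool_calls?.length) {
    return { ...msg, content: null };
  }
  return { ...msg, content: parts };
}
```

Serializing instead of dropping unsupported parts keeps tool output visible to the model, which is why the entry describes the change as preserving the model's context.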
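Dropping orphan tool results comes down to remembering which `tool_use` ids have appeared so far. A minimal sketch over Anthropic-shape messages, assuming simplified block types and a hypothetical `dropOrphanToolResults` helper. Skipping messages that end up empty is an extra assumption, and the real translator on the Claude→OpenAI path may handle that case differently.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: unknown };
interface AnthropicMessage {
  role: "user" | "assistant";
  content: string | Block[];
}

// Removes tool_result blocks whose tool_use_id has no earlier matching tool_use.
function dropOrphanToolResults(messages: AnthropicMessage[]): AnthropicMessage[] {
  const seen = new Set<string>();
  const out: AnthropicMessage[] = [];
  for (const msg of messages) {
    if (typeof msg.content === "string") {
      out.push(msg);
      continue;
    }
    const kept = msg.content.filter((block) => {
      if (block.type === "tool_use") seen.add(block.id);
      return block.type !== "tool_result" || seen.has(block.tool_use_id);
    });
    // Assumption: a message emptied by filtering is skipped rather than sent with no content.
    if (kept.length > 0) out.push({ ...msg, content: kept });
  }
  return out;
}
```

A single forward pass is enough because a valid `tool_result` must come after the `tool_use` it answers.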
🔒 Security
- fix(sse): use a crypto-secure RNG for combo/deck load-balancing selection — random combo/deck member selection used a non-cryptographic PRNG, flagged by CodeQL (`#665`); it now uses a crypto-secure RNG. (#4457 — thanks @diegosouzapw)
- fix(sse): unbiased `crypto.randomInt` for combo selection (follow-up to #4457) — the initial crypto-secure conversion used modulo reduction over the secure bytes, which introduces a small modulo bias; selection now uses `crypto.randomInt` (rejection sampling) for a uniform, unbiased distribution across combo/deck members. (#4462 — thanks @diegosouzapw)
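The difference between the two security fixes is easiest to see side by side. `crypto.randomInt(max)` is the Node.js API the entry names. The modulo version is an illustrative reconstruction of the biased pattern, not the code that was removed, and both function names are hypothetical.

```typescript
import { randomBytes, randomInt } from "node:crypto";

// Biased: 256 byte values rarely divide evenly by the member count, so low indices win slightly more often.
function pickBiased<T>(members: readonly T[]): T {
  return members[randomBytes(1)[0] % members.length];
}

// Unbiased: randomInt returns a uniform integer in [0, max) via rejection sampling.
function pickUniform<T>(members: readonly T[]): T {
  if (members.length === 0) throw new Error("no members to pick from");
  return members[randomInt(members.length)];
}
```

With three members, 256 = 3 × 85 + 1, so the biased version picks index 0 with probability 86/256 and each of the others with 85/256. Rejection sampling removes that skew by discarding the leftover values and drawing again.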
📝 Maintenance
- refactor(chatCore): extract `resolveChatCoreRequestSetup` (the first setup-phase slice) toward modularizing the chatCore god-file. (#4392 — thanks @diegosouzapw)
- refactor(chatCore): extract the Codex service-tier resolvers into a pure `chatCore/serviceTier.ts` leaf (continues the god-file split). (#4477, #3501 — thanks @diegosouzapw)
- perf(dashboard): lazy-load the usage analytics charts so the dashboard's initial bundle/paint is lighter (charts hydrate on demand). (#4466 — thanks @KooshaPari)
- perf(kiro): cut request-completion hot-path CPU and cap the DB-lock event-loop block so Kiro request completion does not stall the event loop under load. (#4459 — thanks @artickc)
- fix(catalog): restore-green — add OpenAI `gpt-4.1-mini`/`gpt-4.1-nano` + `o3-mini`/`o4-mini` pricing rows to keep the static-parity gate green after the registry expansion (#4394), plus the web-cookie validator shadowing fix. (#4447 — thanks @diegosouzapw)
- chore(quality): reconcile file-size + complexity baselines after the `/review-prs` round, and the `server.ts` file-size baseline after the pool-tools registration (#3368). (#4461, #4423 — thanks @diegosouzapw)
- docs(remote-mode): add a copy-paste end-to-end verification example. (#4430 — thanks @diegosouzapw)
- docs: add operational documentation (usage/quota, database, open-sse architecture, monitoring). (#3455 — thanks @oyi77)
What's Changed
- Release v3.8.32 by @diegosouzapw in #4418
Full Changelog: v3.8.31...v3.8.32