v3.8.36
✨ New Features
Quota-Share system
- feat(quota): introduce a dedicated `quota-share` combo strategy (Fase 3 #9) — Deficit Round Robin scheduling with per-model in-flight gating (P2C), automatic DB migration that promotes existing `qtSd/*` combos, and per-policy gating so invalid allocations cannot bleed `allow` to unintended connections. (#4939, #4901)
- feat(quota): multi-window usage buckets, per-(key,model) caps, and session stickiness — connections now track consumption across 5 h, 7 d, and per-model windows; `quota_allocation_model_caps` enforces per-key/model limits; session stickiness preserves prompt-cache integrity across turns. (#4928, #4927, #4929)
- feat(quota): headroom strategy + proactive saturation — new `headroom` combo strategy selects connections by available quota margin; universal proactive saturation via upstream token-usage response headers; real Claude quota saturation sourced from `/api/oauth/usage`. (#4908, #4907, #4885)
- feat(quota): concurrency control + cooldown-wait (Fase 2.1) — `max_concurrent` is enforced at dispatch time; quota-share combos queue concurrent requests with a short cooldown-wait and re-dispatch on slot availability (Variant A); a cron heal proactively restores connections after their window resets. (#4965, #4970, #4967, #4900)
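The Deficit Round Robin scheduling named in the quota-share entry can be illustrated generically. The sketch below is textbook DRR over per-connection queues, not OmniRoute's implementation; the class name, the `quantum` weights, and the request `cost` units are all assumptions made for the example.

```typescript
// Textbook Deficit Round Robin (illustrative; every name here is hypothetical).
interface Pending<T> {
  item: T;
  cost: number; // e.g. an estimated token cost for the request
}

interface Flow<T> {
  id: string;
  quantum: number; // credit added each round; a larger quantum means a larger share
  deficit: number; // unused credit carried over while the flow stays backlogged
  queue: Pending<T>[];
}

class DeficitRoundRobin<T> {
  private flows = new Map<string, Flow<T>>();

  addFlow(id: string, quantum: number): void {
    this.flows.set(id, { id, quantum, deficit: 0, queue: [] });
  }

  enqueue(flowId: string, item: T, cost: number): void {
    const flow = this.flows.get(flowId);
    if (!flow) throw new Error(`unknown flow: ${flowId}`);
    flow.queue.push({ item, cost });
  }

  // Visit every backlogged flow once and return the items dispatched this round.
  runRound(): Array<{ flowId: string; item: T }> {
    const dispatched: Array<{ flowId: string; item: T }> = [];
    for (const flow of this.flows.values()) {
      if (flow.queue.length === 0) continue;
      flow.deficit += flow.quantum;
      while (flow.queue.length > 0 && flow.queue[0].cost <= flow.deficit) {
        const next = flow.queue.shift()!;
        flow.deficit -= next.cost;
        dispatched.push({ flowId: flow.id, item: next.item });
      }
      // An emptied flow forfeits leftover credit so it cannot hoard it.
      if (flow.queue.length === 0) flow.deficit = 0;
    }
    return dispatched;
  }
}
```

With two flows whose quanta are 2 and 1 and unit-cost requests, each round dispatches two items from the first flow for every one from the second, which is the proportional sharing DRR is meant to provide.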
Combo routing
- feat(combo): task-aware routing strategy — routes requests to the best-fit connection based on task-type metadata, enabling per-task provider specialization within a combo. (#4945)
- feat(combo): Fusion strategy (16th strategy) — fan out to a configurable panel of models in parallel, then synthesize results through a judge model. (#4652)
- feat(combos): add an editable per-combo `description` field. The routing-combo form now has a Description input, persisted in the combo `data` blob via `/api/combos` (POST/PUT) and round-tripped through GET — no new DB column. (#5005)
- feat(routing): honor the `X-Route-Model` request header to override `body.model`, enabling per-request model switching without modifying the request body. (#4863 — thanks @costaeder)
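The `X-Route-Model` override can be pictured as a small precedence rule. This is a minimal sketch under stated assumptions: the helper name and the header-map type are hypothetical, and ignoring a blank header value is a guess, not documented behavior.

```typescript
// Hypothetical helper: choose the effective model for one request.
type HeaderMap = Record<string, string | undefined>;

function resolveRequestedModel(
  headers: HeaderMap,
  body: { model?: string },
): string | undefined {
  // HTTP header names are case-insensitive, so compare lowercased keys.
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "x-route-model") {
      const trimmed = value?.trim();
      if (trimmed) return trimmed; // assumption: a blank header is ignored
    }
  }
  // No usable header: fall back to the model named in the request body.
  return body.model;
}
```

A client could then keep its request body fixed and switch models per call by changing only the header.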
Providers & models
- feat(providers): update volcengine-ark model list, adding DeepSeek-V4-Flash and DeepSeek-V4-Pro. (#4905 — thanks @kenlin8827)
- feat(provider): add CodeBuddy CN (`copilot.tencent.com`) — full OAuth + executor + model catalog stack. (#4664)
- feat(opencode-go): advertise `glm-5.2` and `kimi-k2.7-code` to align with official Go endpoints. (#4711)
- feat(sse): add Google Flow video-generation provider. (#4769)
- feat(api/v1): include alias-backed models in the `/v1/models` listing. (#4630)
Proxy pool
- feat(proxy-pool): Cloudflare Workers proxy deployer + pool integration — deploy Cloudflare Workers relays directly from the dashboard and register them in the proxy pool. (#4640)
- feat(proxy-pool): Deno Deploy relays + group action buttons — deploy Deno Deploy relay workers and manage proxy groups with new bulk-action controls. (#4643)
Compression & infrastructure
- feat(compression): Kiro/CodeWhisperer tool-result compression engine — dedicated compressor for Kiro/CodeWhisperer tool outputs integrated into the streaming pipeline. (#4635)
- feat(endpoint): per-endpoint custom system prompt injection. A toggle + text field in the Endpoint settings card lets users inject a custom system prompt into every model request, applied via the existing system-prompt engine. Stored in settings DB. (#5022 — thanks @whale9820)
- feat(live-ws): allow non-loopback clients via the `LIVE_WS_ALLOWED_HOSTS` env var, enabling multi-host setups to access the live WebSocket API. (#4877 — thanks @KooshaPari)
- feat(db): track API endpoint dimension on `usage_history` for per-endpoint cost and usage analytics. (#4676)
🔧 Bug Fixes
Translator
- fix(translator): regroup parallel tool results to be adjacent to their originating assistant turn, fixing tool-message ordering for providers that require strict interleaving. (#4882)
- fix(translator): preserve literal empty-string tool arguments in OpenAI-to-Claude streaming — they were previously dropped, causing tool calls to arrive with missing parameters. (#4959)
- fix(translator): normalize tools to Anthropic-native shape for non-Anthropic providers, ensuring tool definitions pass validation regardless of the format at the call site. (#4650)
- fix(translator): provider thinking compatibility — correct thinking-block serialization for DeepSeek and Gemini providers. (#4946)
- fix(translator): emit `</think>` close marker for Anthropic thinking blocks, fixing truncated reasoning output in streamed responses. (#4633)
- fix(translator): normalize `developer` role to `system` for OpenAI-format providers. (#4625)
- fix(translator): strip top-level `client_metadata` on the OpenAI passthrough path (port from 9router#1157). (#4624)
- fix(translator): replay `reasoning_content` on plain Xiaomi MiMo turns (port from 9router#1321). (#4639)
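As an illustration of the `developer` to `system` role normalization listed above, a minimal sketch follows. The message type and function name are hypothetical, and the real translator presumably handles more fields than this.

```typescript
// Hypothetical message shape; only the role matters for this sketch.
interface ChatMessage {
  role: string;
  content: unknown;
}

// Return a copy of the messages with every `developer` role rewritten to `system`.
function normalizeDeveloperRole(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) =>
    m.role === "developer" ? { ...m, role: "system" } : m,
  );
}
```

The input array is left untouched, so callers that still need the original roles can keep using it.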
Copilot / GitHub executor
- fix(copilot): never route Gemini/Claude model variants to the `/responses` endpoint — these models require the chat-completions path only. (#4627)
- fix(github): route Copilot Codex models to `/responses` (port from 9router#102). (#4626)
- fix(copilot,antigravity): cap `maxOutputTokens` at 16384 to stop "Invalid Argument" 400 errors on high-token requests. (#4636)
- fix(codex): drop non-standard `codex.*` streaming events that break `responses.stream` consumers. (#4715 — thanks @jeffer1312)
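The `maxOutputTokens` cap amounts to a clamp. The sketch below takes the 16384 limit from the entry above; the function name and the choice to leave an unset value unset are assumptions.

```typescript
const MAX_OUTPUT_TOKENS_CAP = 16384; // limit stated in the release note

// Clamp a requested output-token budget to the cap.
function capMaxOutputTokens(requested: number | undefined): number | undefined {
  if (requested === undefined) return undefined; // assumption: unset stays unset
  return Math.min(requested, MAX_OUTPUT_TOKENS_CAP);
}
```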
Claude / Anthropic
- fix(claude): omit `adaptive_thinking` and `output_config.effort` for Haiku model variants, which reject those parameters. (#4661)
- fix(claude): skip `mcp__` tool-name cloaking and guard against missing `connectionId` to prevent crashes on Claude-native MCP tool calls. (#4861 — thanks @costaeder)
- fix(claude-oauth): respect `429` backoff headers on the Claude OAuth usage endpoint to reduce polling spam during quota checks. (#4655)
Routing & SSE
- fix(sse): fail over on `400` responses that carry rate-limit text in the body, not just on canonical `429` status codes. (#4986)
- fix(sse): honor per-account proxy and fingerprint-rotation settings in the opencode executor. (#4989)
- fix(sse): soft-penalize exhausted providers in auto-combo scoring instead of hard-excluding them, improving fallback resilience. (#4990)
- fix(sse): drop the CCP pin when the pinned provider is durably unhealthy, with anti-flap logic to prevent oscillation. (#4864 — thanks @costaeder)
- fix(combo): fetch models dynamically from custom provider endpoints instead of relying on a static list. (#4860)
- fix(combo): propagate the selected connection ID to fallback error responses so model lockout applies to the correct connection rather than the wrong fallback target. (#4809 — thanks @Chewji9875)
- fix(sse): skip third-party tool-name cloaking for Anthropic-native server tools to prevent naming conflicts. (#4808 — thanks @NomenAK)
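The failover change for `400` responses that carry rate-limit text can be sketched as a status-plus-body check. The patterns and the function name below are assumptions; the phrases the router actually matches are not given in these notes, and a real failover policy would likely cover more statuses than the two modeled here.

```typescript
// Assumed phrases that suggest a disguised rate limit; purely illustrative.
const RATE_LIMIT_HINTS: RegExp[] = [
  /rate.?limit/i,
  /too many requests/i,
  /quota exceeded/i,
];

// Decide whether a response should count as rate-limited for failover purposes.
function looksRateLimited(status: number, bodyText: string): boolean {
  if (status === 429) return true; // canonical rate-limit status
  if (status === 400) {
    // Some upstreams report throttling as a 400 with explanatory text.
    return RATE_LIMIT_HINTS.some((re) => re.test(bodyText));
  }
  return false;
}
```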
Quota
- fix(quota): quota-exclusive `qtSd/*` connections now appear in `/v1/models` listings; EPSILON-threshold check no longer falsely blocks under-budget allocations. (#4830)
- fix(quota): migration 107 correctly activates the `quota-share` strategy on existing `qtSd/*` combos. (#4962)
API / responses
- fix(api): parse the `/v1/responses` request body once instead of 3–4 times on the hot path, reducing per-request overhead. (#4958)
- fix(api): evict stale in-memory rate-limit windows to stop a slow heap leak on long-running instances. (#4957)
- fix(api): require authentication on the compression `run-telemetry` endpoint; document `OMNIROUTE_EVAL_CREDENTIALS` env var. (#4796)
- fix(api): stop `GET /api/system/env/repair` returning HTTP `500` on packaged installs (it broke the onboarding wizard). `createRequire(import.meta.url)` ran at module top-level; once webpack bundles the route into the standalone build, `import.meta.url` is frozen to the build-machine path and `createRequire` throws during evaluation, so the whole route failed to load. `createRequire` is now resolved lazily inside the guarded `better-sqlite3` block, root-dir resolution falls back to `process.cwd()`, and the route passes an explicit `rootDir`. (#5028)
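The rate-limit window leak is a common pattern: a per-key map that only grows. The sketch below shows one way to evict stale entries, assuming a sliding window of timestamps per key; the class and method names are hypothetical and the actual data structure may differ.

```typescript
// Hypothetical sliding-window store with an explicit eviction sweep.
class RateLimitWindows {
  private hits = new Map<string, number[]>();
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record a hit and return how many hits fall inside the current window.
  record(key: string, now: number): number {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    recent.push(now);
    this.hits.set(key, recent);
    return recent.length;
  }

  // Without this sweep, keys that stop sending traffic stay in the map forever.
  evictStale(now: number): number {
    const cutoff = now - this.windowMs;
    let removed = 0;
    for (const [key, stamps] of this.hits) {
      const last = stamps[stamps.length - 1];
      if (last === undefined || last <= cutoff) {
        this.hits.delete(key);
        removed++;
      }
    }
    return removed;
  }

  get size(): number {
    return this.hits.size;
  }
}
```

Calling `evictStale` on a timer, or opportunistically every N requests, keeps the map bounded by the number of recently active keys rather than by every key ever seen.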
Dashboard
- fix(dashboard): show custom provider given-name instead of internal id across dashboard pages — cache, combo health, compression analytics, cost overview, health/autopilot, provider stats, route explainability, provider utilization, runtime. Adds shared `resolveProviderName` resolver and `useProviderNodeMap` hook. (#4603)
- fix(dashboard): on OAuth providers (e.g. GLM Coding), "Test all models" with auto-hide-failed now switches the model list to the "visible" filter after the run, so just-hidden failed models actually disappear on-screen — parity with the passthrough-provider path (#3610). Previously they were hidden in the DB but stayed visible under the "All" filter, so it looked like nothing was hidden. (#4887)
- fix(dashboard): restore the home-page provider topology card that was hidden by a default state change in #4596. (#4963)
- fix(dashboard): proxy-pool success gating, sync-timestamp persistence, and opt-in Redis backend. (#4988)
- fix(dashboard): show custom vision models in the LLM selector dropdown. (#4653)
Providers
- fix(pollinations): stop forcing `jsonMode` on every request. Pollinations treats `jsonMode=true` as "the model MUST return JSON" and rejects (HTTP 400 "messages must contain the word 'json'") any normal chat request whose messages don't mention "json", so all non-JSON chat was broken. `jsonMode` is now only enabled when the caller actually requests JSON output (`response_format.type` of `json_object` or `json_schema`). (#3981)
- fix(antigravity): default `safetySettings` to all-OFF for parity with the native Gemini paths. The Antigravity (Google Cloud Code) request builder set `safetySettings: undefined`, which `JSON.stringify` drops — so no safety settings reached Google and its server-side defaults false-flagged benign technical prompts as `prohibited_content` (HTTP 200 + blocked body, which combo failover treats as terminal). Now honors a caller-supplied value and otherwise defaults to `DEFAULT_SAFETY_SETTINGS`, matching the claude-to-gemini / openai-to-gemini paths. (#5003)
- fix(antigravity): exclude the standard Gemini rate-limit message from quota-exhaustion keyword matching to prevent false-positive saturation flags. (#4810 — thanks @Chewji9875)
- fix(chatgpt-web): map the advertised `gpt-5.5`, `gpt-5.5-pro`, `gpt-5.4-pro` and `gpt-5.2-pro` catalog ids to their dash-form ChatGPT backend slugs. They were missing from `MODEL_MAP`, so the executor sent the dot-form id verbatim; the ChatGPT backend silently ignored it and served the default Plus model instead of the requested one. Adds a drift guard asserting no advertised dot-form id reaches the backend verbatim. (#4665)
- fix(gemini): preserve the `pattern` field in the Antigravity tool schema sanitizer to avoid stripping valid regex constraints from tool definitions. (#4651)
- fix(opencode): preserve DeepSeek reasoning content in streamed responses. (#4631)
- fix(perplexity): validate API keys via the `/v1/models` endpoint instead of issuing a full chat request. (#4654)
- fix(qoder): exchange PAT for `jt-*` job token before initiating Cosy chat, fixing auth failures after the Qoder credential format change. (#4884)
- fix(executors): strip parameters unsupported by the target provider/model to prevent `400 Invalid parameter` errors on strict endpoints. (#4658)
- fix(executors): preserve literal `reasoning_effort: "max"` for Ollama Cloud instead of normalizing to `xhigh`. Ollama Cloud accepts `high|medium|low|max|none` and rejects `xhigh` (`invalid reasoning value: 'xhigh'`); OpenRouter DeepSeek `max→xhigh` normalization is unaffected. (#4993 — thanks @Thinkscape)
- fix(headroom): translate openai-responses input through OpenAI for external compression. `adaptBodyForCompression` now serialises `function_call_output` items whose `output` field is a JSON object (not a string) so compression engines can process the content — previously those items were excluded from compression because `hasTextContent()` returned false for object values. (#5023 — thanks @anki1kr)
- fix(proxy): fan out direct dispatcher streams to all registered proxy endpoints. (#4803 — thanks @makcimbx)
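The Pollinations `jsonMode` rule described above reduces to one predicate on the caller's `response_format`. This is a sketch only: the request type and function name are hypothetical, while the two accepted `type` values come from the entry itself.

```typescript
// Minimal slice of an OpenAI-style chat request; other fields are omitted.
interface ChatRequest {
  response_format?: { type?: string };
}

// Enable jsonMode only when the caller explicitly asked for JSON output.
function shouldEnableJsonMode(req: ChatRequest): boolean {
  const type = req.response_format?.type;
  return type === "json_object" || type === "json_schema";
}
```

A request with no `response_format`, or with `type: "text"`, would therefore be sent without `jsonMode` and no longer trip the upstream "must contain the word 'json'" check.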
Compression
- fix(compression): eliminate ReDoS in the `math_inline` preservation regex — the previous pattern could catastrophically backtrack on untrusted input. (#4838)
- fix(compression): stop RTK over-truncating file-read tool results — RTK now respects the full content length for file-read outputs. (#4987)
Build / CLI / infrastructure
- fix(build): drop `@omniroute/open-sse` from `optimizePackageImports` to fix the Next.js build OOM crash. (#4968)
- fix(cli): SIGKILL the systray child PID before closing the IPC channel to prevent macOS NSStatusItem orphan processes. (#4732)
- fix(cli): bump `better-sqlite3` runtime pin to 12.10.1 for Node 26 compatibility. (#4685)
- fix(cli): harden the systray2 tray runtime (port from 9router#1080). (#4628)
- fix(cli-tools): tolerate JSONC (comments and trailing commas) in tool settings files. (#4659)
- fix(install): make the `transformers` dependency optional so CUDA-host installs that lack Python bindings succeed. (#4807 — thanks @megamen32)
- fix(db): correct storage tuning settings to prevent WAL runaway on high-write workloads. (#4834 — thanks @rdself)
- fix(image): prevent compatible nodes from shadowing provider aliases in the image routing table. (#4656)
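Tolerating JSONC in tool settings files means accepting comments and trailing commas before handing the text to `JSON.parse`. The following is a minimal sketch of that idea, not the parser the CLI tools actually use; both function names are hypothetical.

```typescript
// Remove // and /* */ comments while leaving string contents untouched.
function stripComments(src: string): string {
  let out = "";
  let i = 0;
  let inString = false;
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (inString) {
      out += ch;
      if (ch === "\\") { out += next ?? ""; i += 2; continue; } // keep escapes intact
      if (ch === '"') inString = false;
      i++;
    } else if (ch === '"') {
      inString = true; out += ch; i++;
    } else if (ch === "/" && next === "/") {
      while (i < src.length && src[i] !== "\n") i++; // skip to end of line
    } else if (ch === "/" && next === "*") {
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
    } else {
      out += ch; i++;
    }
  }
  return out;
}

// Drop commas that are followed only by whitespace and a closing } or ].
function parseJsonc(src: string): unknown {
  const text = stripComments(src);
  let out = "";
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      out += ch;
      if (ch === "\\") { out += text[i + 1] ?? ""; i++; }
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    if (ch === ",") {
      let j = i + 1;
      while (j < text.length && /\s/.test(text[j])) j++;
      if (text[j] === "}" || text[j] === "]") continue; // trailing comma: skip it
    }
    out += ch;
  }
  return JSON.parse(out);
}
```

Tracking string state in both passes is what keeps a value such as `"x//y"` or `",}"` from being mangled, which a plain regex replace would not guarantee.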
Plugin
- fix(plugin): opencode `auth.json` dual-key fallback for the auto-prefix migration. The config hook now looks up both the prefixed (`opencode-omniroute`) and bare (`omniroute`) keys, so users who authenticated before the `opencode-` prefix landed no longer need to re-auth. (#5027 — thanks @herjarsa)
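The dual-key lookup can be sketched as a simple fallback. The two key names come from the entry above; the shape of an `auth.json` entry, the function name, and the choice to prefer the prefixed key are assumptions.

```typescript
const PREFIXED_KEY = "opencode-omniroute";
const BARE_KEY = "omniroute";

// The entry shape is not described in these notes, so it is left opaque here.
type AuthFile = Record<string, unknown>;

// Prefer the prefixed key; fall back to the pre-migration bare key.
function findOmnirouteAuth(auth: AuthFile): unknown {
  return auth[PREFIXED_KEY] ?? auth[BARE_KEY];
}
```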
🔒 Security
- fix(security): block SSRF allowlist bypass via `x-relay-path` header manipulation on Deno/Vercel relays. (#4899)
- fix(security): pin image-fetch DNS resolution to prevent SSRF DNS-rebinding attacks (GHSA-cmhj-wh2f-9cgx). (#4634)
- fix(security): do not trust the loopback socket as local-only when the server is behind a reverse proxy, closing a potential auth bypass path. (#4632)
- fix(security): validate the Kiro region parameter to prevent SSRF via crafted region strings (GHSA-6mwv-4mrm-5p3m). (#4629)
- fix(copilot): replace `execSync` shell interpolation with `execFile` in the `runOmniRouteCli` tool to prevent command injection. The user-supplied command is now split into an argv array and passed to `execFile` (no shell), so shell metacharacters are treated as literal text; error output is routed through `sanitizeErrorMessage()`. (#5024 — thanks @hamsa0x7)
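The `execSync` to `execFile` change follows a standard Node.js hardening pattern, sketched below. The helper names, the whitespace-only tokenizer, and the `binary` parameter are assumptions for illustration; only `execFile` from `node:child_process` is the real API.

```typescript
import { execFile } from "node:child_process";

// Assumption: plain whitespace splitting; the real tokenizer may differ.
function toArgv(command: string): string[] {
  return command.trim().split(/\s+/).filter((part) => part.length > 0);
}

// `binary` is a placeholder for whatever executable the tool wraps.
function runCli(binary: string, command: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // execFile spawns the binary directly, with no shell in between, so
    // characters such as ; | && $() reach the program as literal argument text.
    execFile(binary, toArgv(command), (error, stdout) => {
      if (error) reject(error);
      else resolve(stdout);
    });
  });
}
```

For example, `toArgv("list --json; rm -rf /")` yields the five arguments `list`, `--json;`, `rm`, `-rf`, `/`, none of which a shell ever interprets.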
📝 Maintenance
God-file decomposition (continued, #3501)
- refactor(chatCore): extracted 12 focused helpers from `chatCore.ts` covering the streaming pipeline (`assembleStreamingPipeline`), cache-store logic (`storeStreamingSemanticCacheResponse`, `storeSemanticCacheResponse`), response headers (`assembleStreamingResponseHeaders`, `buildNonStreamingResponseHeaders`), JSON→SSE bridge (`maybeConvertJsonBodyToSse`), guardrail context (`buildPostCallGuardrailContext`), usage buffer (`applyClientUsageBuffer`), plugin hook (`runPluginOnRequestHook`), analytics (`writeCompressionAnalytics`, `emitOutputStyleTelemetry`), and compression predicates/settings (`resolveCompressionSettings`, et al.). (#4811–#4837)
- refactor(sse/db/api): continued decomposition of `services/usage.ts` (extracted quota-core, scalar/format helpers, Antigravity/GLM/MiniMax usage families), `db/core.ts` (schema-column reconciliation, snake↔camel column-mapping), `db/apiKeys.ts` (row-parsers, model-permission matching), and `validation.ts` (URL/headers/transport leaf layer, web-cookie/Meta-AI validators, enterprise-cloud + probe, audio/speech/apikey, search/embedding/rerank, and OpenAI/Anthropic format validators). (#4921–#4956)
- refactor(pricing/providers): decomposed `pricing.ts` into shared tiers + partitioned `DEFAULT_PRICING` modules, and split the `providers.ts` catalog into semantic data modules organized by provider family. (#4917, #4918)
- refactor(open-sse): extract `safeParseJSON` utility and dedup `tryParseJSON` call sites; extract and dedup the fallback `tool_call` ID generation helper. (#4735, #4736)
Quality & CI
- chore(quality): release base-red reconciliation + ratchet rebaselines — file-size, env-doc, and catalog baseline updates across multiple gates. (#4630, #4879, #4886, #4915, #4961, #4973)
- ci(quality): shift heavy validation gates to the PR→release merge fast-path to accelerate the release cycle. (#4857)
- fix(ci): include `coverage/lcov.info` in the coverage-report artifact so SonarQube can consume it. (#4670)
- fix(test): validate Anthropic-compatible connections via `POST /v1/messages` for accurate connectivity testing. (#4657)
Docs
- docs(resilience): document Quota-Share Concurrency Control — `max_concurrent` enforcement, serialization behavior, and cooldown-wait semantics. (#4980)
- docs(perf): add per-endpoint p50/p95/p99 latency and cost budgets reference. (#4867 — thanks @KooshaPari)
- docs(ops): add canonical incident response runbook. (#4868 — thanks @KooshaPari)
- docs(ops): document the release-green family — `green-prs`, `check:release-green`, `babysit`, and nightly gate workflows. (#4679)
- docs(agentbridge): document Electron `NODE_EXTRA_CA_CERTS`, real model IDs, and identity caveat for agent bridge integrations. (#4718)
- docs: clarify Kiro provides ~50 credits/month per account, not unlimited. (#4690)
Misc
- chore(claude,codex): bump pinned CLI identities — Claude `2.1.158 → 2.1.187`, Codex `0.132.0 → 0.142.0`. (#4883)
- chore(dashboard): rename Qoder display label from "Qoder AI" to "Qoder". (#4733)
What's Changed
- Release v3.8.36 by @diegosouzapw in #4854
- fix(release): green v3.8.36 release CI (pack-artifact + resilience keys) by @diegosouzapw in #5029
- fix(db): replace Math.random with crypto.randomUUID for ID generation by @hamsa0x7 in #5026
- fix(build): exclude .claude worktrees from tsconfig scope (stops local next build OOM) by @diegosouzapw in #5031
Full Changelog: v3.8.35...v3.8.36