Skip to content

v3.8.36

Choose a tag to compare

@diegosouzapw diegosouzapw released this 25 Jun 20:48
· 1 commit to main since this release
b5c789a

✨ New Features

Quota-Share system

  • feat(quota): introduce a dedicated quota-share combo strategy (Fase 3 #9) — Deficit Round Robin scheduling with per-model in-flight gating (P2C), automatic DB migration that promotes existing qtSd/* combos, and per-policy gating so invalid allocations cannot bleed allow to unintended connections. (#4939, #4901)
  • feat(quota): multi-window usage buckets, per-(key,model) caps, and session stickiness — connections now track consumption across 5 h, 7 d, and per-model windows; quota_allocation_model_caps enforces per-key/model limits; session stickiness preserves prompt-cache integrity across turns. (#4928, #4927, #4929)
  • feat(quota): headroom strategy + proactive saturation — new headroom combo strategy selects connections by available quota margin; universal proactive saturation via upstream token-usage response headers; real Claude quota saturation sourced from /api/oauth/usage. (#4908, #4907, #4885)
  • feat(quota): concurrency control + cooldown-wait (Fase 2.1) — max_concurrent is enforced at dispatch time; quota-share combos queue concurrent requests with a short cooldown-wait and re-dispatch on slot availability (Variant A); a cron heal proactively restores connections after their window resets. (#4965, #4970, #4967, #4900)

Combo routing

  • feat(combo): task-aware routing strategy — routes requests to the best-fit connection based on task-type metadata, enabling per-task provider specialization within a combo. (#4945)
  • feat(combo): Fusion strategy (16th strategy) — fan out to a configurable panel of models in parallel, then synthesize results through a judge model. (#4652)
  • feat(combos): add an editable per-combo description field. The routing-combo form now has a Description input, persisted in the combo data blob via /api/combos (POST/PUT) and round-tripped through GET — no new DB column. (#5005)
  • feat(routing): honor X-Route-Model request header to override body.model, enabling per-request model switching without modifying the request body. (#4863 — thanks @costaeder)

Providers & models

  • feat(providers): update volcengine-ark model list, adding DeepSeek-V4-Flash and DeepSeek-V4-Pro. (#4905 — thanks @kenlin8827)
  • feat(provider): add CodeBuddy CN (copilot.tencent.com) — full OAuth + executor + model catalog stack. (#4664)
  • feat(opencode-go): advertise glm-5.2 and kimi-k2.7-code to align with official Go endpoints. (#4711)
  • feat(sse): add Google Flow video-generation provider. (#4769)
  • feat(api/v1): include alias-backed models in the /v1/models listing. (#4630)

Proxy pool

  • feat(proxy-pool): Cloudflare Workers proxy deployer + pool integration — deploy Cloudflare Workers relays directly from the dashboard and register them in the proxy pool. (#4640)
  • feat(proxy-pool): Deno Deploy relays + group action buttons — deploy Deno Deploy relay workers and manage proxy groups with new bulk-action controls. (#4643)

Compression & infrastructure

  • feat(compression): Kiro/CodeWhisperer tool-result compression engine — dedicated compressor for Kiro/CodeWhisperer tool outputs integrated into the streaming pipeline. (#4635)
  • feat(endpoint): per-endpoint custom system prompt injection. A toggle + text field in the Endpoint settings card lets users inject a custom system prompt into every model request, applied via the existing system-prompt engine. Stored in settings DB. (#5022 — thanks @whale9820)
  • feat(live-ws): allow non-loopback clients via LIVE_WS_ALLOWED_HOSTS env var, enabling multi-host setups to access the live WebSocket API. (#4877 — thanks @KooshaPari)
  • feat(db): track API endpoint dimension on usage_history for per-endpoint cost and usage analytics. (#4676)

🔧 Bug Fixes

Translator

  • fix(translator): regroup parallel tool results to be adjacent to their originating assistant turn, fixing tool-message ordering for providers that require strict interleaving. (#4882)
  • fix(translator): preserve literal empty-string tool arguments in OpenAI-to-Claude streaming — they were previously dropped, causing tool calls to arrive with missing parameters. (#4959)
  • fix(translator): normalize tools to Anthropic-native shape for non-Anthropic providers, ensuring tool definitions pass validation regardless of the format at the call site. (#4650)
  • fix(translator): provider thinking compatibility — correct thinking-block serialization for DeepSeek and Gemini providers. (#4946)
  • fix(translator): emit </think> close marker for Anthropic thinking blocks, fixing truncated reasoning output in streamed responses. (#4633)
  • fix(translator): normalize developer role to system for OpenAI-format providers. (#4625)
  • fix(translator): strip top-level client_metadata on the OpenAI passthrough path (port from 9router#1157). (#4624)
  • fix(translator): replay reasoning_content on plain Xiaomi MiMo turns (port from 9router#1321). (#4639)

Copilot / GitHub executor

  • fix(copilot): never route Gemini/Claude model variants to the /responses endpoint — these models require the chat-completions path only. (#4627)
  • fix(github): route Copilot Codex models to /responses (port from 9router#102). (#4626)
  • fix(copilot,antigravity): cap maxOutputTokens at 16384 to stop "Invalid Argument" 400 errors on high-token requests. (#4636)
  • fix(codex): drop non-standard codex.* streaming events that break responses.stream consumers. (#4715 — thanks @jeffer1312)

Claude / Anthropic

  • fix(claude): omit adaptive_thinking and output_config.effort for Haiku model variants, which reject those parameters. (#4661)
  • fix(claude): skip mcp__ tool-name cloaking and guard against missing connectionId to prevent crashes on Claude-native MCP tool calls. (#4861 — thanks @costaeder)
  • fix(claude-oauth): respect 429 backoff headers on the Claude OAuth usage endpoint to reduce polling spam during quota checks. (#4655)

Routing & SSE

  • fix(sse): fail over on 400 responses that carry rate-limit text in the body, not just on canonical 429 status codes. (#4986)
  • fix(sse): honor per-account proxy and fingerprint-rotation settings in the opencode executor. (#4989)
  • fix(sse): soft-penalize exhausted providers in auto-combo scoring instead of hard-excluding them, improving fallback resilience. (#4990)
  • fix(sse): drop the CCP pin when the pinned provider is durably unhealthy, with anti-flap logic to prevent oscillation. (#4864 — thanks @costaeder)
  • fix(combo): fetch models dynamically from custom provider endpoints instead of relying on a static list. (#4860)
  • fix(combo): propagate the selected connection ID to fallback error responses so model lockout applies to the correct connection rather than the wrong fallback target. (#4809 — thanks @Chewji9875)
  • fix(sse): skip third-party tool-name cloaking for Anthropic-native server tools to prevent naming conflicts. (#4808 — thanks @NomenAK)

Quota

  • fix(quota): quota-exclusive qtSd/* connections now appear in /v1/models listings; EPSILON-threshold check no longer falsely blocks under-budget allocations. (#4830)
  • fix(quota): migration 107 correctly activates the quota-share strategy on existing qtSd/* combos. (#4962)

API / responses

  • fix(api): parse the /v1/responses request body once instead of 3–4 times on the hot path, reducing per-request overhead. (#4958)
  • fix(api): evict stale in-memory rate-limit windows to stop a slow heap leak on long-running instances. (#4957)
  • fix(api): require authentication on the compression run-telemetry endpoint; document OMNIROUTE_EVAL_CREDENTIALS env var. (#4796)
  • fix(api): stop GET /api/system/env/repair returning HTTP 500 on packaged installs (it broke the onboarding wizard). createRequire(import.meta.url) ran at module top-level; once webpack bundles the route into the standalone build, import.meta.url is frozen to the build-machine path and createRequire throws during evaluation, so the whole route failed to load. createRequire is now resolved lazily inside the guarded better-sqlite3 block, root-dir resolution falls back to process.cwd(), and the route passes an explicit rootDir. (#5028)

Dashboard

  • fix(dashboard): show custom provider given-name instead of internal id across dashboard pages — cache, combo health, compression analytics, cost overview, health/autopilot, provider stats, route explainability, provider utilization, runtime. Adds shared resolveProviderName resolver and useProviderNodeMap hook. (#4603)
  • fix(dashboard): on OAuth providers (e.g. GLM Coding), "Test all models" with auto-hide-failed now switches the model list to the "visible" filter after the run, so just-hidden failed models actually disappear on-screen — parity with the passthrough-provider path (#3610). Previously they were hidden in the DB but stayed visible under the "All" filter, so it looked like nothing was hidden. (#4887)
  • fix(dashboard): restore the home-page provider topology card that was hidden by a default state change in #4596. (#4963)
  • fix(dashboard): proxy-pool success gating, sync-timestamp persistence, and opt-in Redis backend. (#4988)
  • fix(dashboard): show custom vision models in the LLM selector dropdown. (#4653)

Providers

  • fix(pollinations): stop forcing jsonMode on every request. Pollinations treats jsonMode=true as "the model MUST return JSON" and rejects (HTTP 400 "messages must contain the word 'json'") any normal chat request whose messages don't mention "json", so all non-JSON chat was broken. jsonMode is now only enabled when the caller actually requests JSON output (response_format.type of json_object or json_schema). (#3981)
  • fix(antigravity): default safetySettings to all-OFF for parity with the native Gemini paths. The Antigravity (Google Cloud Code) request builder set safetySettings: undefined, which JSON.stringify drops — so no safety settings reached Google and its server-side defaults false-flagged benign technical prompts as prohibited_content (HTTP 200 + blocked body, which combo failover treats as terminal). Now honors a caller-supplied value and otherwise defaults to DEFAULT_SAFETY_SETTINGS, matching the claude-to-gemini / openai-to-gemini paths. (#5003)
  • fix(antigravity): exclude the standard Gemini rate-limit message from quota-exhaustion keyword matching to prevent false-positive saturation flags. (#4810 — thanks @Chewji9875)
  • fix(chatgpt-web): map the advertised gpt-5.5, gpt-5.5-pro, gpt-5.4-pro and gpt-5.2-pro catalog ids to their dash-form ChatGPT backend slugs. They were missing from MODEL_MAP, so the executor sent the dot-form id verbatim, which the ChatGPT backend silently ignored and served the default Plus model instead of the requested one. Adds a drift guard asserting no advertised dot-form id reaches the backend verbatim. (#4665)
  • fix(gemini): preserve the pattern field in the Antigravity tool schema sanitizer to avoid stripping valid regex constraints from tool definitions. (#4651)
  • fix(opencode): preserve DeepSeek reasoning content in streamed responses. (#4631)
  • fix(perplexity): validate API keys via the /v1/models endpoint instead of issuing a full chat request. (#4654)
  • fix(qoder): exchange PAT for jt-* job token before initiating Cosy chat, fixing auth failures after the Qoder credential format change. (#4884)
  • fix(executors): strip parameters unsupported by the target provider/model to prevent 400 Invalid parameter errors on strict endpoints. (#4658)
  • fix(executors): preserve literal reasoning_effort: "max" for Ollama Cloud instead of normalizing to xhigh. Ollama Cloud accepts high|medium|low|max|none and rejects xhigh (invalid reasoning value: 'xhigh'); OpenRouter DeepSeek max→xhigh normalization is unaffected. (#4993 — thanks @Thinkscape)
  • fix(headroom): translate openai-responses input through OpenAI for external compression. adaptBodyForCompression now serialises function_call_output items whose output field is a JSON object (not a string) so compression engines can process the content — previously those items were excluded from compression because hasTextContent() returned false for object values. (#5023 — thanks @anki1kr)
  • fix(proxy): fan out direct dispatcher streams to all registered proxy endpoints. (#4803 — thanks @makcimbx)

Compression

  • fix(compression): eliminate ReDoS in the math_inline preservation regex — the previous pattern could catastrophically backtrack on untrusted input. (#4838)
  • fix(compression): stop RTK over-truncating file-read tool results — RTK now respects the full content length for file-read outputs. (#4987)

Build / CLI / infrastructure

  • fix(build): drop @omniroute/open-sse from optimizePackageImports to fix the Next.js build OOM crash. (#4968)
  • fix(cli): SIGKILL the systray child PID before closing the IPC channel to prevent macOS NSStatusItem orphan processes. (#4732)
  • fix(cli): bump better-sqlite3 runtime pin to 12.10.1 for Node 26 compatibility. (#4685)
  • fix(cli): harden the systray2 tray runtime (port from 9router#1080). (#4628)
  • fix(cli-tools): tolerate JSONC (comments and trailing commas) in tool settings files. (#4659)
  • fix(install): make the transformers dependency optional so CUDA-host installs that lack Python bindings succeed. (#4807 — thanks @megamen32)
  • fix(db): correct storage tuning settings to prevent WAL runaway on high-write workloads. (#4834 — thanks @rdself)
  • fix(image): prevent compatible nodes from shadowing provider aliases in the image routing table. (#4656)

Plugin

  • fix(plugin): opencode auth.json dual-key fallback for the auto-prefix migration. The config hook now looks up both the prefixed (opencode-omniroute) and bare (omniroute) keys, so users who authenticated before the opencode- prefix landed no longer need to re-auth. (#5027 — thanks @herjarsa)

🔒 Security

  • fix(security): block SSRF allowlist bypass via x-relay-path header manipulation on Deno/Vercel relays. (#4899)
  • fix(security): pin image-fetch DNS resolution to prevent SSRF DNS-rebinding attacks (GHSA-cmhj-wh2f-9cgx). (#4634)
  • fix(security): do not trust the loopback socket as local-only when the server is behind a reverse proxy, closing a potential auth bypass path. (#4632)
  • fix(security): validate the Kiro region parameter to prevent SSRF via crafted region strings (GHSA-6mwv-4mrm-5p3m). (#4629)
  • fix(copilot): replace execSync shell interpolation with execFile in the runOmniRouteCli tool to prevent command injection. The user-supplied command is now split into an argv array and passed to execFile (no shell), so shell metacharacters are treated as literal text; error output is routed through sanitizeErrorMessage(). (#5024 — thanks @hamsa0x7)

📝 Maintenance

God-file decomposition (continued, #3501)

  • refactor(chatCore): extracted 12 focused helpers from chatCore.ts covering the streaming pipeline (assembleStreamingPipeline), cache-store logic (storeStreamingSemanticCacheResponse, storeSemanticCacheResponse), response headers (assembleStreamingResponseHeaders, buildNonStreamingResponseHeaders), JSON→SSE bridge (maybeConvertJsonBodyToSse), guardrail context (buildPostCallGuardrailContext), usage buffer (applyClientUsageBuffer), plugin hook (runPluginOnRequestHook), analytics (writeCompressionAnalytics, emitOutputStyleTelemetry), and compression predicates/settings (resolveCompressionSettings, et al.). (#4811#4837)
  • refactor(sse/db/api): continued decomposition of services/usage.ts (extracted quota-core, scalar/format helpers, Antigravity/GLM/MiniMax usage families), db/core.ts (schema-column reconciliation, snake↔camel column-mapping), db/apiKeys.ts (row-parsers, model-permission matching), and validation.ts (URL/headers/transport leaf layer, web-cookie/Meta-AI validators, enterprise-cloud + probe, audio/speech/apikey, search/embedding/rerank, and OpenAI/Anthropic format validators). (#4921#4956)
  • refactor(pricing/providers): decomposed pricing.ts into shared tiers + partitioned DEFAULT_PRICING modules, and split the providers.ts catalog into semantic data modules organized by provider family. (#4917, #4918)
  • refactor(open-sse): extract safeParseJSON utility and dedup tryParseJSON call sites; extract and dedup the fallback tool_call ID generation helper. (#4735, #4736)

Quality & CI

  • chore(quality): release base-red reconciliation + ratchet rebaselines — file-size, env-doc, and catalog baseline updates across multiple gates. (#4630, #4879, #4886, #4915, #4961, #4973)
  • ci(quality): shift heavy validation gates to the PR→release merge fast-path to accelerate the release cycle. (#4857)
  • fix(ci): include coverage/lcov.info in the coverage-report artifact so SonarQube can consume it. (#4670)
  • fix(test): validate Anthropic-compatible connections via POST /v1/messages for accurate connectivity testing. (#4657)

Docs

  • docs(resilience): document Quota-Share Concurrency Control — max_concurrent enforcement, serialization behavior, and cooldown-wait semantics. (#4980)
  • docs(perf): add per-endpoint p50/p95/p99 latency and cost budgets reference. (#4867 — thanks @KooshaPari)
  • docs(ops): add canonical incident response runbook. (#4868 — thanks @KooshaPari)
  • docs(ops): document the release-green family — green-prs, check:release-green, babysit, and nightly gate workflows. (#4679)
  • docs(agentbridge): document Electron NODE_EXTRA_CA_CERTS, real model IDs, and identity caveat for agent bridge integrations. (#4718)
  • docs: clarify Kiro provides ~50 credits/month per account, not unlimited. (#4690)

Misc

  • chore(claude,codex): bump pinned CLI identities — Claude 2.1.158 → 2.1.187, Codex 0.132.0 → 0.142.0. (#4883)
  • chore(dashboard): rename Qoder display label from "Qoder AI" to "Qoder". (#4733)

What's Changed

  • Release v3.8.36 by @diegosouzapw in #4854
  • fix(release): green v3.8.36 release CI (pack-artifact + resilience keys) by @diegosouzapw in #5029
  • fix(db): replace Math.random with crypto.randomUUID for ID generation by @hamsa0x7 in #5026
  • fix(build): exclude .claude worktrees from tsconfig scope (stops local next build OOM) by @diegosouzapw in #5031

Full Changelog: v3.8.35...v3.8.36