✨ New Features
feat(sidebar): colored menu icons — sidebar menu icons now render with a per-item accent color: curated colors for known items (SIDEBAR_ICON_ACCENTS) plus a deterministic hash-based fallback (getSidebarIconAccent) so every item gets a stable, distinct color across sessions. (#3812 — thanks @rafacpti23)
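A minimal sketch of the "curated map plus deterministic hash fallback" idea described above. The palette, the curated entry, and the function names (accentFor, fnv1a) are hypothetical; the real SIDEBAR_ICON_ACCENTS values and the hash used by getSidebarIconAccent are not shown in these notes.

```typescript
// Hypothetical palette and curated map, for illustration only.
const PALETTE: readonly string[] = [
  "#ef4444", "#f59e0b", "#10b981", "#3b82f6", "#8b5cf6", "#ec4899",
];
const CURATED = new Map<string, string>([["dashboard", "#3b82f6"]]);

// FNV-1a (32-bit): a deterministic string hash, so the same item id
// always maps to the same palette slot, in every session.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Curated color wins; anything else falls back to the hashed palette slot.
function accentFor(itemId: string): string {
  return CURATED.get(itemId) ?? PALETTE[fnv1a(itemId) % PALETTE.length];
}
```

Because the fallback depends only on the item id, no state has to be stored to keep colors stable.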
feat(providers): add Factory (factory.ai) as a subscription gateway provider — factory (Factory Droids' hosted gateway) is now a first-class routing provider on the OpenAI-compatible https://api.factory.ai/v1 endpoint with Bearer apikey auth; the key is supplied from the Dashboard connection (not env). (#5065 — thanks @KooshaPari)
feat(providers): add Grok Build (xAI) provider with OAuth import-token flow — grok-cli (alias gc) routes through Grok's CLI chat proxy; users paste their ~/.grok/auth.json (or the JWT), with automatic refresh_token rotation. The public xAI client_id is embedded via resolvePublicCred("grok_id") (Hard Rule #11), never a literal. (#5020 — thanks @fulorgnas)
feat(dashboard): click-to-edit model alias in the provider page — click an alias to edit it inline (Enter/blur saves, Escape cancels), instead of only being able to delete and re-add it. (#5119 — thanks @waguriagentic)
feat(providers): allow local/private provider URLs by default (Allow Local Provider URLs flag) — adding/validating an OpenAI-compatible provider on a loopback/LAN address (e.g. http://127.0.0.1:3264/api) was rejected by the SSRF guard with "Blocked private or local provider URL", even though OmniRoute is local-first. A new OMNIROUTE_ALLOW_LOCAL_PROVIDER_URLS feature flag (default ON, toggle in Settings → Feature Flags) now scopes the provider-validation guard to allow local/private hosts while still blocking cloud-metadata endpoints (169.254.169.254, metadata.google.internal). Disable it to restore strict public-only blocking. Webhook/remote-image SSRF defaults are unchanged. (#5066, thanks @daniij)
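The scoping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not OmniRoute's guard: the function names are hypothetical, only IPv4 literals and localhost are classified as private, and the allowLocal parameter stands in for the OMNIROUTE_ALLOW_LOCAL_PROVIDER_URLS flag.

```typescript
// Cloud-metadata hosts named in the release note; always blocked.
const METADATA_HOSTS = new Set(["169.254.169.254", "metadata.google.internal"]);

// Hypothetical helper: loopback, RFC 1918 and link-local IPv4, plus localhost.
function isPrivateOrLocalHost(host: string): boolean {
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 127 ||
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// allowLocal mirrors the feature flag (assumed wiring).
function isProviderUrlAllowed(rawUrl: string, allowLocal: boolean): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (METADATA_HOSTS.has(host)) return false; // blocked regardless of the flag
  if (isPrivateOrLocalHost(host)) return allowLocal;
  return true;
}
```

The metadata check runs before the private-host check so that enabling the flag can never open 169.254.169.254.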
feat(blackbox): refresh provider model catalog with latest models. (thanks @ptkelanatechsolutions)
kiro: inline <thinking> stream splitter — when <thinking_mode>enabled</thinking_mode> is present, assistantResponseEvent content is now split into separate delta.content / delta.reasoning_content SSE chunks (new open-sse/executors/kiroThinking.ts module wired into KiroExecutor.transformEventStreamToSSE).
feat(cursor): parse Cursor Composer DeepSeek-style inline tool calls — Composer cu/composer-2.5* models embed tool invocations in their visible text using <|tool▁calls▁begin|>…<|tool▁calls▁end|> markers instead of structured protobuf frames; a new streaming parser (composerToolCalls.ts) intercepts these in both streaming and non-streaming paths, suppresses the markers from the client-visible content, and emits proper OpenAI tool_calls deltas so downstream clients handle them natively. (thanks @noestelar)
feat(proxy): support auth-less host:port batch import and surface proxy-test failures. (thanks @dimaslanjaka)
feat(video): Alibaba DashScope video provider (wan2.7-t2v) — adds the alibaba video provider (DashScope async task → poll → MP4) wired through the standard apikey credential path, so text-to-video requests can route to Alibaba's wan2.7-t2v model. (thanks @josevictorferreira)
feat(cc): per-connection "summarized thinking display" toggle for Claude-Code-compatible providers — exposes a connection-level toggle that drives the existing Copilot summarized-thinking marker, so operators can opt a CC-compatible connection into summarized reasoning display from the UI (schema + request defaults + provider modals, with i18n). (thanks @rdself)
feat(compression): compression playground in the studio (Play + Compare tabs) — /dashboard/compression/studio gains a synthetic playground: paste text → per-engine lanes (each deterministic engine run alone via /api/compression/preview) plus a combined waterfall ordered by stackPriority, and a free A/B Compare grid with on-demand, USD-capped fidelity verdicts (/api/compression/compare + compare/verify). The preview route now uses the real cl100k tokenizer, returns engineBreakdown, and accepts an ordered pipeline[]; new compare / compare/verify / retrieve routes; the live WS feed moved to /dashboard/compression/live. Management-only. (#5080)
feat(dashboard): expose Fusion judgeModel + fusionTuning in the combo editor — the Fusion strategy editor now surfaces the judge model (synthesizes the panel answers; defaults to the first panel model) plus the quorum-grace tuning fields (minPanel, stragglerGraceMs, panelHardTimeoutMs) that open-sse/services/fusion.ts already reads. Schema-validated + bounded; empty tuning is never persisted. (#5074)
feat(compression): opt-in per-step fidelity gate for the stacked pipeline — each compression step can now be guarded by a pure fidelity checker (4 invariants, fail-open) so a lossy engine that would degrade the prompt past a threshold is rejected and its lane skipped instead of silently shipping. Configurable via fidelityGate (advanced thresholds intentionally API-omitted), with a per-lane rejection breakdown surfaced in the studio playground toggle. (#5143)
feat(compression): fuzzy near-duplicate dedup (session-dedup 2nd pass) — the session-dedup engine gains a second fuzzy pass that collapses near-duplicate (not just byte-identical) segments, with a playground toggle to compare on/off. (#5143)
feat(quota): opt-in Codex/Claude auto-ping keepalive — an opt-in background keepalive can periodically ping Codex/Claude connections to keep their session/quota state warm, reducing cold-start failures on the first real request. (#5102)
feat(ops): SRE playbooks + ops helper scripts — salvaged from a closed stale PR; adds operator runbooks and ops helper scripts. (#5138 — thanks @KooshaPari / @diegosouzapw)
feat(mcp): web-session robustness — cookie dedup + browser-pool observability — the MCP web-session path now de-duplicates cookies when (re)hydrating a session (avoiding conflicting duplicate Cookie headers) and exposes browser-pool observability (pool size / in-use / acquisition metrics) for the headless web providers. (#5121, builds on #3368)
feat(compression): Ionizer engine — lossy JSON-array sampling reversible via CCR — a new compression engine that down-samples large JSON arrays to a representative subset and records a Compact Change Representation (CCR) so the omitted rows can be reconstructed, trading exactness for a large token reduction on tabular/array-heavy payloads. (#5148)
🔧 Bug Fixes
fix(proxy): make the SOCKS5 handshake timeout operator-tunable (SOCKS_HANDSHAKE_TIMEOUT_MS) — under high concurrency against a single residential gateway host, the SOCKS5 connect handshake could exceed the hardcoded 10s even though the proxy was reachable, surfacing as a false [Proxy Fast-Fail] Proxy unreachable (the pool size is already tunable via OMNIROUTE_PROXY_DISPATCHER_CONNECTIONS). The handshake timeout now reads SOCKS_HANDSHAKE_TIMEOUT_MS (default unchanged at 10000, capped at 120000) so a concurrency-heavy deployment can raise it without a code change. Mitigation for #5109 (the full concurrency-100 collapse still needs the reporter's live load-test confirmation). (#5109)
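For illustration, reading a tunable timeout with a default and a cap might look like the sketch below. The default (10000) and cap (120000) come from the note above; the function name, the env parameter, and the fallback to the default on invalid or non-positive input are assumptions.

```typescript
const DEFAULT_HANDSHAKE_TIMEOUT_MS = 10_000; // default stated in the release note
const MAX_HANDSHAKE_TIMEOUT_MS = 120_000; // cap stated in the release note

// Hypothetical helper: takes the environment as a plain record so it can be
// exercised without touching process.env.
function resolveHandshakeTimeoutMs(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env["SOCKS_HANDSHAKE_TIMEOUT_MS"] ?? "", 10);
  // Assumption: unset, non-numeric, or non-positive values keep the default.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_HANDSHAKE_TIMEOUT_MS;
  return Math.min(parsed, MAX_HANDSHAKE_TIMEOUT_MS);
}
```

A deployment that sets SOCKS_HANDSHAKE_TIMEOUT_MS=30000 would then get a 30-second handshake window, while a value above the cap is clamped to 120000.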
fix(api): resolve GET /v1/models/{id} case-insensitively — clients that normalise the model id (e.g. OpenCode requesting minimax/minimax-m3 for the canonical catalog entry minimax/MiniMax-M3) missed the single-model lookup, which is case-sensitive, and fell back to advertising context_length: 0. findModelById now prefers an exact-case match and falls back to a case-insensitive match, so the real entry (and its context window) is returned regardless of casing. (#5082)
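The lookup order described above (exact case first, then case-insensitive) can be sketched as below. The ModelEntry shape and the function name are assumptions for illustration; the real findModelById signature is not shown in these notes.

```typescript
// Hypothetical catalog entry shape.
interface ModelEntry {
  id: string;
  context_length: number;
}

function findModelByIdSketch(
  catalog: readonly ModelEntry[],
  id: string,
): ModelEntry | undefined {
  // 1. Prefer an exact-case match so distinct ids that differ only by case
  //    still resolve to the entry the client literally asked for.
  const exact = catalog.find((m) => m.id === id);
  if (exact) return exact;
  // 2. Otherwise fall back to a case-insensitive comparison.
  const lowered = id.toLowerCase();
  return catalog.find((m) => m.id.toLowerCase() === lowered);
}
```

With a catalog containing minimax/MiniMax-M3, a request for minimax/minimax-m3 now resolves to the canonical entry instead of missing it.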
fix(services): embed WS proxy honours LIVE_WS_HOST; reject empty messages early — two headless/Docker deployment fixes (#5110). The embed WebSocket proxy (:20131) only read EMBED_WS_PROXY_HOST, so behind a reverse proxy/tunnel it stayed bound to 127.0.0.1 even with LIVE_WS_HOST=0.0.0.0 set and the Live dashboard showed "WebSocket disconnected"; it now falls back to LIVE_WS_HOST (default still loopback). Separately, a request with an explicitly empty messages: [] array was forwarded upstream and bounced back as a confusing raw 400/502; handleChat now rejects it up front with a clear messages: at least one message is required (Responses-API input requests are unaffected). (#5110)
fix(proxy): repair one-click Deno & Cloudflare relay deployments — the /api/settings/proxy/test endpoint only recognized the vercel relay type, so testing a deployed Deno or Cloudflare relay returned proxy.type must be http, https, or socks5 and never reached the relay; it now routes all relay types through isRelayType(). On installs with STORAGE_ENCRYPTION_KEY the relay-auth token is read via extractRelayAuth (encrypted relayAuthEnc form), fixing the silent 401 that left publicIp null. The Cloudflare Worker upload now sends the script part as application/javascript (the API rejects application/javascript+module; ES-module semantics come from main_module), and the proxy-registry schema accepts the deno/cloudflare types + deno-relay/cloudflare-relay sources so editing a deployed relay no longer 400s. (#5128)
fix(kiro): retire claude-sonnet-4.5 from the Kiro catalog + pin the exact Kiro 400 error — claude-sonnet-4.5 left the Kiro free-tier lineup (current active models: Opus 4.8/4.7/4.6, Sonnet 4.6, Haiku 4.5), so it is removed from the Kiro registry entry and the free-model catalog. A regression test now pins Kiro's verbatim [400] Invalid model. Please select a different model to continue. to the isModelUnavailableError model-unavailable classification. A 400 on every model (including current ones) points to a server-side Kiro tier/region gate, not an OmniRoute catalog bug. (#5140, closes #4484)
fix(dashboard): preserve every rendered field when loading/saving Resilience settings — ResilienceTab renders comboCooldownWait and quotaShareConcurrencyLimit, but both the initial-load and save paths rewrote component state without those fields, so after a successful /api/resilience response the cards received undefined and the page fell back to the generic "failed to load" state. A shared toResilienceResponse() mapper now keeps all rendered fields, and PATCH /api/resilience returns quotaShareConcurrencyLimit to match GET and the UI contract. (#5139 — thanks @rdself)
fix(quota): hydrate the in-memory quota cache from snapshots + scope auto-combo candidates — after a restart the quota cache was empty, so a known-exhausted connection looked healthy until re-queried; isAccountQuotaExhausted now lazily hydrates from persisted quota_snapshots. Auto-combo candidate expansion is also scoped to the connections each combo target actually allows, instead of pulling in every connection for the provider. (#5015 — thanks @JxnLexn)
fix(resilience): harden quota cutoff, Gemini audio MIME, and model-lockout cooldown — stored quota hard-cutoff values are no longer coerced to enabled=true from arbitrary strings; Gemini audio input parts have their MIME type validated/normalized before forwarding; and model lockout now honours the configured maxCooldownMs ceiling. (#5093 — thanks @KooshaPari)
fix(streaming): harden long OpenAI-compatible SSE streams — a late pipeline-wind-down error can no longer overwrite an already-recorded successful stream (streamCompletionRecorded guard), client disconnects finalize as 499 client_disconnected instead of poisoning provider/account failure state, JSON bodies that are actually SSE (wrong application/json content-type) are sniffed and re-streamed, and reasoning fields (reasoning/reasoning_content + OpenRouter/Gemini encrypted reasoning_details) are preserved through the JSON-as-SSE fallback. (#5124 — thanks @rdself)
fix(usage): dedupe request-usage logging and debounce stats events — saveRequestUsage now guards against duplicate inserts (natural key: timestamp + provider + model + connection + api-key + token counts), back-fills a missing endpoint, and only emits usageRecorded when a row was actually inserted; stats update/pending event bursts are collapsed into a single debounced notification to reduce churn. (#4940 — thanks @nguyenxvotanminh3)
fix(sse): convert the native Gemini request body to OpenAI format in the Antigravity MITM handler — contents / systemInstruction / generationConfig / thinkingConfig are now translated to OpenAI chat-completions format before forwarding to /v1/chat/completions, so thinking-capable models (e.g. ag/claude-opus-4-6-thinking) no longer fail with provider-side 400 "invalid argument" errors. (#4845 — thanks @anuragg-saxenaa)
fix(db): translate the two pt-BR SQLite driver-fallback log lines to English — [DB] Pré-inicializando sql.js WASM… and [DB] Drivers síncronos indisponíveis… were the only non-English server log strings, mixing languages in the logs. Now [DB] Pre-initializing sql.js WASM (synchronous drivers unavailable)… / [DB] Synchronous drivers unavailable — falling back to sql.js (WASM), guarded by a test that scans the driver path for accented log strings. (#5103)
fix(diagnostics): non-streaming Claude responses no longer false-502 as empty_choices — the v3.8.37 malformed-200 detector (#4942) only understood OpenAI choices and Responses-API output shapes, so a /v1/messages response that stays in Claude shape ({type:"message", content:[…]}) fell through to empty_choices → 502 (cascading to "All models failed" in a combo). Most visibly, an extended-thinking turn whose buffered body is a single empty thinking block with a valid signature (Claude Code's non-streaming Bash classifier) 502'd on every call. detectMalformedNonStream now understands the Claude shape: text/tool_use blocks and thinking blocks carrying a signature count as valid output, while a genuinely empty content:[] is still flagged. (#5108, thanks @insoln)
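As a rough sketch of the rule stated above (text and tool_use blocks count, thinking blocks count only when they carry a signature, and an empty content array is still flagged), assuming a hypothetical helper name and ignoring everything else detectMalformedNonStream does:

```typescript
// Returns true when a Claude-shaped non-streaming body carries usable output.
function hasUsableClaudeContent(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const { type, content } = body as { type?: unknown; content?: unknown };
  if (type !== "message" || !Array.isArray(content)) return false;
  return content.some((block: unknown) => {
    if (typeof block !== "object" || block === null) return false;
    const b = block as { type?: unknown; signature?: unknown };
    // text / tool_use blocks count as valid output.
    if (b.type === "text" || b.type === "tool_use") return true;
    // A thinking block counts only when it carries a non-empty signature,
    // even if the thinking text itself is empty.
    return (
      b.type === "thinking" &&
      typeof b.signature === "string" &&
      b.signature.length > 0
    );
  });
}
```

Under this rule a body holding a single empty thinking block with a valid signature passes, which is the case the release note calls out.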
fix(combo): empty-content 502 now fails over within the same request instead of exhausting the provider — a leg that answers HTTP 200 with no usable completion is rewritten to 502 "Provider returned empty content", but the combo exhaustion classifier treated that synthetic 502 as a connection-level failure (#1731v2) and marked the whole provider/connection exhausted, skipping every remaining same-provider leg in that request. The connection is actually healthy (it just returned an empty body), so empty-content 502s are now classified as model-level transient failures: the request advances to the next leg and the rest of that provider's legs stay eligible. Genuine gateway 502s still trip connection exhaustion. (#5085, thanks @andrea-kingautomation)
fix(dashboard): surface the detailed credential-validation error instead of a bare "invalid" badge — the inline "Check" in the Add-Connection modal discarded the error message returned by /api/providers/validate and showed only an invalid badge. For web providers (claude-web / chatgpt-web) the real cause is often an environment error the backend already reports (e.g. TLS impersonation client failed to start: EACCES … mkdir tls-client-node/bin), so users were left guessing. The modal now renders the full reason next to the badge. (#5088, thanks @tkhs101)
fix(executors): strip client_metadata from forwarded body for Cerebras and Mistral — Cerebras returns 400 (wrong_api_format) and Mistral returns 422 (extra_forbidden) when the passthrough body carries client_metadata (an OpenAI Codex / Claude CLI field with no equivalent on these upstreams). The default executor now drops it for these two providers before sending downstream; other providers (notably openai/codex) keep it. (thanks @saurabh321gupta)
fix(codebuddy): only send reasoning params when the client requests reasoning. (thanks @anki1kr)
fix(sse): keep streaming for forceStream providers when a JSON client requests it. Providers marked forceStream:true reject stream:false upstream (HTTP 400); resolveStreamFlag now guards against this so stream-only providers keep streaming even when the client sends Accept: application/json or stream:false. (thanks @anki1kr)
fix(sse): prevent non-JSON SSE lines and duplicate [DONE] from breaking clients. (thanks @qianze0628)
fix(sse): dedupe case-variant Anthropic headers in the executor buildHeaders path — Node/undici's fetch merges anthropic-version and Anthropic-Version into a single "v, v" value that the Anthropic API rejects, so both case variants are now collapsed to one canonical lowercase header (same for anthropic-beta). (thanks @Delcado19)
oauth(kiro): support Kiro IDC (organization) token import — when the ~/.aws/sso/cache token carries a clientIdHash, auto-import now reads the linked client registration file to obtain clientId/clientSecret, probes the Kiro IDE profile.json for profileArn (ARN region normalized to us-east-1 for the runtime gateway), and refreshes via the regional AWS OIDC endpoint instead of the social path; the import schema and modal forward these credentials so manual imports also work for IDC tokens. (thanks @enjoyer-hub)
fix(translator): preserve client cache_control breakpoints when routing Claude-format requests (e.g. Claude Code) to Alibaba DashScope's OpenAI-compatible providers (alibaba / alibaba-cn). The Claude→OpenAI translation previously stripped the markers from the system and message text blocks, so DashScope's explicit caching never engaged and every request was a cache miss. Cache hints now survive when preservation is requested for caching-capable OpenAI-format providers. (thanks @sacrtap)
fix(tts): resolve Gemini TTS models from catalog and add gemini-3.1-flash-tts-preview as the new default Vertex TTS model. (thanks @nguyenha935)
fix(sse): don't cool down a healthy connection on a self-inflicted upstream timeout (504) — when OmniRoute's own deadline elapses (surfaced as TimeoutError/BodyTimeoutError → 504), the connection is no longer disabled/failed-over, so a slow-but-healthy provider isn't penalised for our timeout. Genuine upstream 5xx/429 still trigger cooldown; antigravity keeps its own policy. (thanks @costaeder)
fix(translator): forward image tool_result blocks as image_url instead of stringifying base64. (thanks @alican532)
fix(sse): robust Anthropic /v1/messages streaming — real ping keepalive + client-disconnect guard — slow first tokens on reasoning models could trip strict clients' idle-read watchdog; the route now keeps the stream warm with a real event: ping (Anthropic clients ignore SSE comments) from the very first frame, and a client disconnect (AbortError / controller-closed) no longer counts as a provider failure (no failover/cooldown). (thanks @costaeder)
fix: preserve model hidden flags (isHidden) across model sync — replaceCustomModels pruned the compat-override list to the new custom-model ids, silently wiping the isHidden flag of eye-hidden SYNCED models on every periodic sync / import (all hidden models turned back on). The redundant cleanup is removed (per-model removal already handles its own compat cleanup), so eye-hidden models stay hidden across re-sync. (#5086 — thanks @herjarsa)
fix(models): derive model-discovery config from the registry modelsUrl — providers absent from the hardcoded PROVIDER_MODELS_CONFIG but carrying a registry modelsUrl (e.g. MiniMax) now get an auto-derived Bearer /v1/models discovery config, so "discover models" works instead of returning nothing. (thanks @herjarsa)
fix(compression): resolve worker + rule/filter assets via runtime anchors (standalone bundle) — the LLMLingua worker and the RTK rule/filter loaders relied on fileURLToPath(import.meta.url), which the standalone bundle freezes to the build-machine path, so the worker never spawned and rule/filter packs failed to resolve. They now anchor on process.cwd()/argv[1] (with pathToFileURL for the worker URL). (thanks @fulorgnas)
fix(api): sanitize error responses on seven management routes (Rule #12 hardening) — cli-tools/backups, cli-tools/guide-settings/[toolId], logs/export, models/catalog, providers/test-batch, settings/import-json and usage/proxy-logs no longer return raw error.message; they wrap caught errors in sanitizeErrorMessage(...), and the routes are removed from the check-error-helper allowlist. (thanks @JxnLexn)
fix(sse): keep output_text-only Responses bodies from being dropped/false-502'd — some upstreams return a shorthand Responses body whose answer is only in output_text with an empty output[]. sanitizeResponsesApiResponse discarded the text, so the response then tripped the malformed-200 guard. The sanitizer now synthesizes an output[] message item from a non-empty output_text (complements the Claude-native fix in #5108; both stem from #4942).
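A minimal sketch of the synthesis step, assuming a hypothetical function name and a simplified body type. The synthesized item follows the general shape of a Responses-API assistant message item; the exact fields sanitizeResponsesApiResponse emits are not shown in these notes.

```typescript
// Simplified view of a Responses-API body; other fields pass through untouched.
interface ResponsesBody {
  output?: unknown[];
  output_text?: unknown;
  [key: string]: unknown;
}

function withSynthesizedOutput(body: ResponsesBody): ResponsesBody {
  const hasOutput = Array.isArray(body.output) && body.output.length > 0;
  const text = typeof body.output_text === "string" ? body.output_text : "";
  // Nothing to do when output[] already has items or there is no shorthand text.
  if (hasOutput || text.length === 0) return body;
  return {
    ...body,
    output: [
      {
        type: "message",
        role: "assistant",
        status: "completed",
        content: [{ type: "output_text", text, annotations: [] }],
      },
    ],
  };
}
```

The input object is never mutated; a shorthand body gets a copy with one message item, and any other body is returned as-is.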
fix(executors): preserve a lone caller-supplied Anthropic-Version header casing — the case-variant dedupe (#4846) unconditionally rewrote Anthropic-Version/Anthropic-Beta to lowercase even when only one variant was present, clobbering the caller's header. Dedupe now runs only when both case variants coexist (the actual undici-merge collision it was meant to fix).
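The "dedupe only on a real collision" rule can be sketched as follows. The function name is hypothetical, the canonical argument is assumed to be lowercase already, and which variant's value wins on a collision (here, the last one seen) is an assumption.

```typescript
// Collapse case variants of one header into a single lowercase key,
// but only when more than one variant is actually present.
function dedupeHeaderCase(
  headers: Record<string, string>,
  canonical: string, // assumed lowercase, e.g. "anthropic-version"
): Record<string, string> {
  const variants = Object.keys(headers).filter(
    (key) => key.toLowerCase() === canonical,
  );
  // Zero or one variant: leave the caller's casing untouched.
  if (variants.length < 2) return headers;

  const out: Record<string, string> = { ...headers };
  let value = "";
  for (const key of variants) {
    value = headers[key] ?? value; // assumption: last variant's value wins
    delete out[key];
  }
  out[canonical] = value;
  return out;
}
```

A lone Anthropic-Version header therefore passes through with its original casing, while a pair of variants is reduced to one lowercase key before fetch can merge them into "v, v".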
fix(responses): default text.format to { type: "text" } for openai-compatible responses providers — some Responses-compatible upstreams (e.g. LM Studio) reject a text object missing text.format with a 400 missing_required_parameter; the default executor now fills the Responses-API default before forwarding (guarded to openai-compatible-*responses*, never overwriting an existing format). (thanks @StevanusPangau)
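A short sketch of the defaulting step, assuming a hypothetical function name. The provider guard (openai-compatible-*responses*) is left out, and leaving a body without any text object untouched is an assumption.

```typescript
// Fill text.format with the Responses-API default when a text object is
// present but has no format; never overwrite an existing format.
function withDefaultTextFormat(body: Record<string, unknown>): Record<string, unknown> {
  const text = body["text"];
  if (typeof text !== "object" || text === null || Array.isArray(text)) {
    return body; // no text object: nothing to default (assumption)
  }
  if ("format" in text) return body; // caller already chose a format
  return { ...body, text: { ...text, format: { type: "text" } } };
}
```

For example, a body with text: { verbosity: "low" } is forwarded as text: { verbosity: "low", format: { type: "text" } }, while a body that already sets text.format is returned unchanged.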
fix(translator): stop stripping client-provided reasoning_content for reasoning-replay providers — the #4849 agentic-context strip (which drops reasoning_content from tool-call assistant turns to avoid O(n²) token growth) ran unconditionally, so replay providers (DeepSeek V4, Kimi K2, Qwen-Thinking, etc.) lost the client's reasoning and the reasoning-replay cache then overwrote it with a stale cached value (and such upstreams 400 without the original reasoning). The strip now skips reasoning-replay targets while non-reasoning providers keep the O(n²) protection. (#5122)
fix(providers): add MiniMax M3 & Nemotron 3 Ultra to the Cline catalog — the two models were missing from Cline's provider catalog and could not be selected; both are now registered. (#5136, closes #3321)
fix(dashboard): key model-visibility toggle on the canonical providerId — the per-model visibility toggle keyed off a display id, so toggling a model on one provider alias could mis-target another; it now keys on the canonical providerId. (#5091 — thanks @Theadd)
fix(diagnostics): recognize the Claude API format in detectMalformedNonStream — salvaged null-guard so a Claude-shaped non-streaming body is no longer misclassified. (#5141 — thanks @herjarsa / @diegosouzapw)
fix(logging): track the final connection IDs in failover logs — failover log lines now record the connection that actually served (or last failed) the request, instead of only the first attempt. (#5016 — thanks @JxnLexn)
fix(sse): ignore disconnect races during in-band stream error handling — a client disconnect that races with in-band upstream error handling no longer surfaces as a spurious provider failure. (#5007 — thanks @JxnLexn)
fix(dashboard): surface the server error on handleToggleCombo failure — a failed combo toggle now shows the backend error instead of silently no-op'ing. (#5138 — thanks @KooshaPari / @diegosouzapw)
fix(quota): track provider quota reset windows + enrich the Codex playground — observed quota reset windows are tracked and surfaced, and the Codex playground gains the enriched quota metadata. (#5141 — thanks @Witroch4 / @diegosouzapw)
fix(sidebar): drop the orphan settings accent color — removed a dangling accent-color entry that broke typecheck:core. (#5142)
fix(sse): preserve non-stream reasoning fields for compatible clients — non-streaming responses now keep the upstream reasoning fields (reasoning / reasoning_content and OpenRouter/Gemini reasoning_details) instead of stripping them in responseSanitizer, so clients that render reasoning on buffered responses no longer lose it. (#5155 — thanks @rdself)
fix(i18n): add missing English UI labels — fills in untranslated English strings that were surfacing as raw keys in the dashboard. (#5153 — thanks @rdself)
🔒 Security
fix(security): exact-host Anthropic baseUrl check — the Anthropic base-URL guard used a substring match that a crafted host could partially satisfy; it now requires an exact host match (resolves CodeQL js/incomplete-url-substring-sanitization alert #674). (#5130)
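To illustrate why a substring check is insufficient and what an exact-host check looks like, here is a minimal sketch. The function names are hypothetical, and api.anthropic.com is assumed as the expected host; the host the actual guard compares against is not stated in these notes.

```typescript
const EXPECTED_HOST = "api.anthropic.com"; // assumed for illustration

// Weak: any URL that merely contains the expected host string passes.
function looksLikeAnthropicSubstring(rawUrl: string): boolean {
  return rawUrl.indexOf(EXPECTED_HOST) !== -1;
}

// Strict: parse the URL and compare the hostname exactly.
function isAnthropicBaseUrl(rawUrl: string): boolean {
  try {
    return new URL(rawUrl).hostname.toLowerCase() === EXPECTED_HOST;
  } catch {
    return false; // unparseable input is never trusted
  }
}
```

A crafted URL such as https://api.anthropic.com.evil.example/v1 satisfies the substring check but fails the exact-host check, because its parsed hostname is api.anthropic.com.evil.example.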
📝 Maintenance
refactor(store): remove dead legacy store modules — salvaged cleanup of unused legacy store code. (#5138 — thanks @JxnLexn / @diegosouzapw)
test(combo): deterministic routing-decision matrix for all 17 strategies — a deterministic E2E matrix pins the routing decision of every combo strategy. (#5146)
chore: baseline reconciliations (complexity / file-size / cognitive), golden-snapshot + apikey-count alignment for new providers, orphan-test relocation, release base-red repairs, CHANGELOG i18n mirror sync, and an actions/cache 5→6 bump. (#5145, #5144, #5125, #5126, #5120, #5117, #5112)
test: gated live smoke for combo strategies (in-process + VPS HTTP) and refreshed release expectations to match current code. (#5151, #5150 — thanks @KooshaPari / @diegosouzapw)