Release v3.8.30 · diegosouzapw/OmniRoute

[3.8.30] — 2026-06-20

✨ New Features

feat(dashboard): category (media serviceKind) filter on the providers page — /dashboard/providers gains a media-category filter row (Image / Video / Music / Text→Speech / Speech→Text / Embedding) that composes with the existing search, free-only and "show configured only" filters. Membership is derived from the backend media registries (a provider that serves a kind is surfaced even if it never declared serviceKinds), keeping the UI in lockstep with the backend. (#4240)
feat(combo): per-step account allowlist — scope a round-robin/weighted step to a subset of a provider's connections — a combo model step can now carry a first-class account allowlist so a round-robin (or weighted) strategy is scoped to a chosen subset of a provider's connections (e.g. only foo1+foo2 out of foo1..foo4) without hand-pinning one step per account. Empty = the whole active pool (unchanged). When a step both has an allowlist and is tag-routed, the two intersect (most-restrictive wins); a single pinned account still takes precedence. The combo builder's Precision step editor gains an optional "Restrict to accounts" picker. (#3266)
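
The step-scoping rule described in the allowlist entry above can be sketched as follows. This is a minimal TypeScript sketch, not OmniRoute's combo code. ProviderConnection, ComboStep, and eligibleConnections are hypothetical names. The sketch also assumes a pinned account must still be active. It applies the documented order: a single pinned account wins, an empty allowlist means the whole active pool, and an allowlist on a tag-routed step intersects with the tag filter.

```typescript
// Hypothetical shapes: OmniRoute's real combo/connection types are not shown here.
interface ProviderConnection {
  id: string;
  active: boolean;
  tags: string[];
}

interface ComboStep {
  pinnedAccountId?: string; // a single pinned account
  accountAllowlist?: string[]; // empty or undefined = whole active pool
  routingTag?: string; // set when the step is tag-routed
}

// Returns the connections a round-robin/weighted step may rotate across.
function eligibleConnections(step: ComboStep, pool: ProviderConnection[]): ProviderConnection[] {
  const active = pool.filter((c) => c.active);

  // A single pinned account takes precedence over allowlist and tags
  // (this sketch assumes the pinned account must still be active).
  if (step.pinnedAccountId !== undefined) {
    return active.filter((c) => c.id === step.pinnedAccountId);
  }

  let candidates = active;

  // Allowlist: an empty list leaves the whole active pool (unchanged behavior).
  if (step.accountAllowlist && step.accountAllowlist.length > 0) {
    const allowed = new Set(step.accountAllowlist);
    candidates = candidates.filter((c) => allowed.has(c.id));
  }

  // Tag routing intersects with the allowlist: the most restrictive set wins.
  const tag = step.routingTag;
  if (tag !== undefined) {
    candidates = candidates.filter((c) => c.tags.includes(tag));
  }
  return candidates;
}
```

For example, with a pool of foo1..foo4, eligibleConnections({ accountAllowlist: ["foo1", "foo2"] }, pool) keeps only foo1 and foo2 for rotation.
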
feat(providers): add OpenAdapter, dit.ai and TokenRouter as OpenAI-compatible providers — three community-requested OpenAI-compatible aggregators now register as standard named OpenAI-style providers with live /v1/models discovery (the zenmux pattern), falling back to a seeded catalog when the upstream list is unavailable: OpenAdapter (https://api.openadapter.in/v1, free tier, 70+ open-source models — #4239), dit.ai (https://api.dit.ai/v1, dynamic-pricing router/gateway — #4155), and TokenRouter (https://api.tokenrouter.com/v1, free MiniMax model — #3841, thanks @FerLuisxd). No custom executor/translator — default OpenAI passthrough.
feat(api): x-omniroute-no-memory request header — per-request opt-out of memory/skills injection — clients that manage their own context (e.g. their own RAG/memory) can send x-omniroute-no-memory: true (mirroring the existing x-omniroute-no-cache convention) to stop the gateway from injecting up to memorySettings.maxTokens (~2k) tokens of memory and skills context into that chat request, avoiding the token/cost inflation this otherwise adds to every call. Without the header, behavior is unchanged. (PRD-2026-06-19-no-memory-header)
feat(dashboard): MITM tool card lists the exact hosts-file entries to add manually — the CLI-tools MITM card's "How it works" section now lists the full set of 127.0.0.1 <host> lines for the selected tool (sourced from the canonical MITM target registry) instead of a single example domain. Users on locked-down machines — where the automatic, sudo-gated hosts-file edit isn't available — can now copy every required entry by hand. (thanks @mrcyclo)
feat(cli): omniroute launch-codex + setup-codex — run/configure the Codex CLI against OmniRoute — a launcher and setup command that point the Codex CLI at an OmniRoute endpoint (remote-mode aware). (#4270)
feat(cli): Claude Code launcher + setup — remote mode + profiles — omniroute launch/setup for Claude Code with remote-mode support and named connection profiles. (#4274)
feat(cli): OpenCode setup — OpenAI-compatible provider + remote-aware plugin — setup-opencode registers OmniRoute as an OpenAI-compatible provider for OpenCode and installs a remote-aware plugin. (#4277)
feat(cli): one-command setup for popular AI coding tools — new setup-* commands that configure each tool to talk to OmniRoute: Cline (#4280), Kilo Code (#4284), Continue (#4289), Cursor (#4291), Roo Code (#4292), Crush (#4298), Goose (#4300), Qwen Code (#4301), Aider (#4302) and the Gemini CLI (native /v1beta) (#4303).
feat(providers): provider model sweep — live discovery, refreshed catalogs, dead-provider cleanup — a broad sweep that enables live /v1/models discovery for more OpenAI-style providers (the zenmux pattern), refreshes the seeded catalogs with current models, and marks dead providers deprecated. (#4324)
feat(mitm): translate Antigravity cloudcode end-to-end (Gap B) — the MITM decrypt path now translates Antigravity cloudcode traffic end-to-end. (#4299)
feat(keys): per-key USD usage quota controls — an API key can now carry a USD spend quota that caps its usage once the threshold is reached. (#4327 — thanks @Witroch4)

🔧 Changed

change(memory): memory is now OFF by default — DEFAULT_MEMORY_SETTINGS.enabled now defaults to false. Enabling memory injects up to ~2,000 tokens of retrieved context into every chat request (and that context is billed), which was a surprising default for new installs and for clients with their own context. Memory is now an explicit opt-in: installs that already enabled it keep it on; installs that never configured it default to off. The Settings → Memory panel now shows a token-cost warning when memory is enabled. (PRD-2026-06-19-no-memory-header)

🐛 Fixed

fix(compliance): startup cleanup honors the dashboard data-retention setting instead of always trimming to 7 days — on every restart, cleanupExpiredLogs() (run at startup) read retention only from the CALL_LOG_RETENTION_DAYS / APP_LOG_RETENTION_DAYS env vars, which default to 7 days when unset, and trimmed usage_history (the Usage Analysis data) before the dashboard-based runAutoCleanup() — which respects the configured retention — ever ran. So a dashboard "Data Retention" of 90 days was silently overridden and the Usage Analysis page only ever showed the last 7 days after a restart. Retention now follows the precedence explicit env var → dashboard DB setting → 7-day default, per table (usage_history→usageHistory, call_logs/proxy_logs/request_detail_logs→callLogs, mcp_tool_audit→mcpAudit); an operator who sets the env var still wins, and non-DB deployments still fall back to it. (#4354 — thanks @akbardwi)
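
The retention precedence from the entry above (explicit env var, then dashboard DB setting, then the 7-day default) and its table-to-setting mapping can be sketched in TypeScript. resolveRetentionDays and TABLE_TO_SETTING are hypothetical names. The raw env value is passed in as a parameter, because the note does not say which of CALL_LOG_RETENTION_DAYS / APP_LOG_RETENTION_DAYS governs each table.

```typescript
type RetentionKey = "usageHistory" | "callLogs" | "mcpAudit";

// Table → dashboard setting key, as listed in the release note.
const TABLE_TO_SETTING: Record<string, RetentionKey> = {
  usage_history: "usageHistory",
  call_logs: "callLogs",
  proxy_logs: "callLogs",
  request_detail_logs: "callLogs",
  mcp_tool_audit: "mcpAudit",
};

const DEFAULT_RETENTION_DAYS = 7;

// Accepts a positive integer (or a string holding one); anything else is "unset".
function parseDays(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isInteger(n) && n > 0 ? n : undefined;
}

// Precedence: explicit env var → dashboard DB setting → 7-day default.
// envValue is the raw env string (undefined when unset); dbSettings is
// undefined on non-DB deployments, which therefore fall back to env/default.
function resolveRetentionDays(
  table: string,
  envValue: string | undefined,
  dbSettings?: Partial<Record<RetentionKey, number>>,
): number {
  const fromEnv = parseDays(envValue);
  if (fromEnv !== undefined) return fromEnv; // an operator-set env var still wins

  const key = TABLE_TO_SETTING[table];
  const fromDb = key !== undefined ? parseDays(dbSettings?.[key]) : undefined;
  if (fromDb !== undefined) return fromDb;

  return DEFAULT_RETENTION_DAYS;
}
```

With a dashboard retention of 90 days and no env var, resolveRetentionDays("usage_history", undefined, { usageHistory: 90 }) yields 90 instead of the old hard 7.
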
fix(providers): bailian-coding-plan static fallback catalog matches the registry (10 models) — the provider-model sweep (#4324) added four current Model Studio coding-plan models (qwen3.7-plus, qwen3-coder-plus, qwen3-coder-next, glm-4.7) to the bailian-coding-plan registry entry but missed the static fallback mirror in staticModels.ts, which still listed only the older six. The static catalog (served when live discovery is unavailable) therefore diverged from the registry, and the existing static↔registry parity test went red on the release branch (only surfacing when test-impact analysis happened to select it). The static mirror now carries all ten models in registry order, restoring parity. (#4324)
fix(executors): ArenaLLM accepts LMArena's split Supabase SSR auth cookie — LMArena migrated to @supabase/ssr chunked auth cookies: the single arena-auth-prod-v1 cookie is now empty and the real session is split across arena-auth-prod-v1.0, arena-auth-prod-v1.1, … (ascending). A user who pasted the (now-empty) single cookie therefore sent an empty session and upstream rejected it as "invalid cookie". The LMArena executor now reconstructs the single cookie from its chunks — reading .0, .1, … in ascending numeric order until one is missing and concatenating their raw values (@supabase/ssr's combineChunks rule: plain join(""), no base64-decode, no JSON-parse, the base64- prefix kept verbatim) — while preserving the rest of the pasted jar. A non-empty single cookie is still forwarded unchanged (back-compat). The credential UX now instructs pasting the full Cookie header and tracks the .0/.1 storage keys. (#4271 — thanks @caussao)
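
The chunk-recombination rule from the LMArena entry above can be sketched in TypeScript. The rule: read .0, .1, … in ascending order until one is missing, then join the raw values. combineChunkedCookie and parseCookieHeader are hypothetical helpers, not the executor's code. The sketch also assumes the rebuilt jar drops the empty base cookie and its chunk cookies while keeping every other cookie.

```typescript
// Cookie name from the release note; everything else is a hypothetical sketch.
const ARENA_AUTH_COOKIE = "arena-auth-prod-v1";

// Splits a pasted Cookie header into ordered [name, value] pairs.
function parseCookieHeader(header: string): Array<[string, string]> {
  return header
    .split(";")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): [string, string] => {
      const eq = part.indexOf("="); // first "=" only: values may contain "="
      return eq === -1 ? [part, ""] : [part.slice(0, eq), part.slice(eq + 1)];
    });
}

function combineChunkedCookie(header: string, base: string = ARENA_AUTH_COOKIE): string {
  const pairs = parseCookieHeader(header);
  const jar = new Map(pairs);

  // Back-compat: a non-empty single cookie is forwarded unchanged.
  if ((jar.get(base) ?? "") !== "") return header;

  // Read base.0, base.1, ... in ascending order until the first missing index,
  // joining raw values: no base64 decode, no JSON parse, "base64-" kept as-is.
  const chunks: string[] = [];
  for (let i = 0; jar.has(`${base}.${i}`); i++) {
    chunks.push(jar.get(`${base}.${i}`) ?? "");
  }
  if (chunks.length === 0) return header; // nothing to reconstruct

  // Keep the rest of the jar; drop the empty base cookie and the numbered chunks.
  const isChunk = (name: string): boolean =>
    name.startsWith(`${base}.`) && /^\d+$/.test(name.slice(base.length + 1));
  const rest = pairs.filter(([name]) => name !== base && !isChunk(name));
  return [[base, chunks.join("")], ...rest].map(([name, value]) => `${name}=${value}`).join("; ");
}
```

Chunks are looked up by index, not by their position in the pasted header, so a jar that lists .1 before .0 still reconstructs in the right order.
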
fix(compression): preserve the cacheable prefix for automatic-cache providers — OpenAI / Codex (and Azure-OpenAI) use automatic prefix caching: the upstream caches the longest matching prefix of a request (system prompt + earliest messages) without any explicit cache_control markers in the body. The cache-aware compression guard only protected that prefix when the request carried explicit cache_control, so for automatic-cache providers the guard was skipped; with compression enabled and preserveSystemPrompt: false (or a prefix-compressing mode like aggressive/ultra), it rewrote the system prompt / earliest messages, guaranteeing a cache miss and higher token spend through OmniRoute than going direct. The guard now treats a caching provider as sufficient on its own (isCachingProvider alone, independent of cache_control) to leave the system prompt uncompressed and downgrade prefix-compressing modes, and OpenAI/Codex/Azure are now recognized as caching providers. Compression is still off by default; this only affects operators who enabled it with prefix preservation turned off. (#3955)
fix(executors): DuckDuckGo AI Chat uses duckduckgo.com (fixes 400) — the DuckDuckGo AI Chat executor fetched status/chat and set Origin/Referer against https://duck.ai while still sending Sec-Fetch-Site: same-origin, so the request's same-origin triplet (host + Origin + Referer) was inconsistent and the backend rejected it with HTTP 400. All current DDG reverse-engineering references — and the provider registry's own baseUrl — use https://duckduckgo.com; the executor now uses it consistently for the status URL, chat URL, Origin, and Referer (the same-origin header is now coherent). The x-fe-version scrape regex also required a 40-hex tail but the real served token has a 20-hex tail (e.g. serp_20250401_100419_ET-19d438eb199b2bf7c300), so it silently fell back to a hardcoded default; the pattern is relaxed to a bounded {20,40} tail (still ReDoS-safe). This addresses the DuckDuckGo half of the report; the separate Chipotle/chipotle upstream breakage is tracked independently. (#4037 — thanks @daniij)
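
The relaxed x-fe-version scrape from the entry above can be illustrated with a hypothetical TypeScript pattern. The executor's actual regex is not published in the note. The date/time digit counts and the two-letter "ET" segment are inferred from the single example token. The point is the bounded {20,40} hex tail: it accepts the real 20-hex token that a fixed 40-hex tail rejects. Every quantifier is bounded and none are nested, which keeps matching linear (the ReDoS-safe property).

```typescript
// Hypothetical pattern inferred from the example token in the release note.
const FE_VERSION_RE = /serp_\d{8}_\d{6}_[A-Z]{2}-[0-9a-f]{20,40}/;

// Old-style strict tail, shown only for comparison.
const STRICT_40_HEX_RE = /serp_\d{8}_\d{6}_[A-Z]{2}-[0-9a-f]{40}/;

// Returns the scraped token, or the caller's hardcoded default when none matches.
function extractFeVersion(html: string, fallback: string): string {
  return html.match(FE_VERSION_RE)?.[0] ?? fallback;
}
```
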
fix(security): bound the prompt-injection scan to the first 16 KB (hot-path perf) — the prompt-injection guard joined every message/system string into one buffer and ran several regexes over the whole thing on every chat request, with no size cap — so a 300 KB body (pasted code, RAG context) meant O(body) CPU scanning on the hot path, a self-inflicted latency/GC source under concurrency. Both detection call sites (detectInjection in inputSanitizer.ts and the custom-pattern scan in promptInjection.ts) now slice the joined text to the first 16 KB (MAX_INJECTION_SCAN_BYTES) before the regex loop. Injection directives sit near the top of a prompt, so the generous cap preserves real detection while scanning only a bounded prefix; the existing 10 MB body-size cap (which protects ingestion) is unchanged. (#3932 — thanks @KooshaPari)
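
The bounded scan from the entry above can be sketched in TypeScript. The sketch joins the text once and caps it before any regex runs, so per-request CPU stays proportional to the cap rather than to the body. MAX_INJECTION_SCAN_BYTES is the name from the note. scanForInjection and the two patterns are hypothetical stand-ins, not the real detectInjection. The sketch also slices by UTF-16 code units, which only approximates a byte cap.

```typescript
// Name from the release note; String.prototype.slice counts UTF-16 code units,
// so this is an approximation of a 16 KB byte cap.
const MAX_INJECTION_SCAN_BYTES = 16 * 1024;

// Illustrative patterns only; the real guard's regex list is not shown.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now in developer mode/i,
];

function scanForInjection(texts: string[]): boolean {
  // Join once, then cap before the regex loop: every pattern below
  // sees at most MAX_INJECTION_SCAN_BYTES characters, however large the body.
  const scanned = texts.join("\n").slice(0, MAX_INJECTION_SCAN_BYTES);
  return INJECTION_PATTERNS.some((re) => re.test(scanned));
}
```

The trade-off is deliberate: a directive buried after 16 KB of pasted context goes unscanned. The note argues this is acceptable because injection directives sit near the top of a prompt.
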
fix(sse): retry direct-connection socket failures on a fresh socket (fewer 502 bursts) — the default direct-connection undici dispatcher pools keep-alive sockets for up to 4 s, but some edges (e.g. nvidia, opencode-zen) silently close idle keep-alive sockets within that window, so the next request reusing a pooled socket fails with UND_ERR_SOCKET ("other side closed") — in bursts. proxyFetch already retried once on such transient errors, but the retry reused the same pooled dispatcher and could grab another stale socket, then fell through to native fetch (which also pools) → the job sat in the rate-limit queue until the 30 s timeout → 502 + circuit-breaker open. The retry now uses a dedicated no-keep-alive / no-pipelining dispatcher so it opens a brand-new socket that can't be a dead pooled one; the first attempt still uses the pooled dispatcher (healthy keep-alive reuse is preserved). Complements the v3.8.29 diagnostics (describeFetchCause, #4281). (#4252 — thanks @klimadev)
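
The retry shape from the entry above can be sketched in TypeScript. The first attempt uses the pooled dispatcher, and a single retry on a stale-socket error uses a fresh, non-pooled connection. withFreshSocketRetry and isStaleSocketError are hypothetical helpers, and the dispatcher choice is injected through a boolean so the logic runs without a network. With undici, the fresh path could use an Agent created with pipelining: 0, which undici documents as disabling keep-alive.

```typescript
// Hypothetical helper: the flag tells the caller which dispatcher to use.
type Attempt<T> = (useFreshSocket: boolean) => Promise<T>;

// undici reports "other side closed" as UND_ERR_SOCKET, either on the error
// itself or (behind fetch's generic TypeError) on err.cause.
function isStaleSocketError(err: unknown): boolean {
  const e = err as { code?: unknown; cause?: { code?: unknown } } | null | undefined;
  return e?.code === "UND_ERR_SOCKET" || e?.cause?.code === "UND_ERR_SOCKET";
}

async function withFreshSocketRetry<T>(attempt: Attempt<T>): Promise<T> {
  try {
    // First attempt: pooled dispatcher, so healthy keep-alive reuse is preserved.
    return await attempt(false);
  } catch (err) {
    if (!isStaleSocketError(err)) throw err;
    // One retry on a brand-new socket, which cannot be a dead pooled one.
    return await attempt(true);
  }
}
```

A caller might pass (fresh) => fetch(url, { dispatcher: fresh ? noKeepAliveAgent : pooledAgent }), using undici's dispatcher fetch option. Both agent names are hypothetical.
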
fix(sse): combo now stops at the first body-specific 400 instead of trying every target — the #2101 guard that detects a body-specific 400 (context overflow / malformed / model-access-denied, e.g. "model is not supported when using Codex with a ChatGPT account") logged "stopping combo" but executed a bare break, which only exited the inner retry loop; executeTarget then returned null and the outer target loop treated that as "this target produced nothing" and advanced to the next model. A combo of N targets that all reject the same request body therefore marched through all N (the report shows a 143-model Codex combo iterating every target), wasting upstream calls and per-attempt work. The guard now surfaces the 400 via the { ok, response } contract (mirroring the 499 client-disconnect path) so the combo resolves and stops immediately. (#4279)
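
The control-flow bug in the entry above can be sketched in TypeScript. A bare break inside the retry loop falls through to "return null", and the outer loop reads that as "try the next target". Returning the 400 through an { ok, response } result stops the combo instead. executeTarget and runCombo here are simplified hypothetical versions. The BODY_SPECIFIC_400 patterns and MAX_RETRIES are illustrative, not the #2101 guard's actual rules.

```typescript
interface UpstreamResponse { status: number; body: string; }
interface TargetResult { ok: boolean; response: UpstreamResponse; }
type CallUpstream = (model: string) => Promise<UpstreamResponse>;

// Illustrative classifier; the real guard's matching rules are not shown.
const BODY_SPECIFIC_400 = [/context (length|window)/i, /malformed/i, /not supported/i];
const MAX_RETRIES = 2; // illustrative

function isBodySpecific400(res: UpstreamResponse): boolean {
  return res.status === 400 && BODY_SPECIFIC_400.some((re) => re.test(res.body));
}

async function executeTarget(model: string, call: CallUpstream): Promise<TargetResult | null> {
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const res = await call(model);
    if (res.status < 400) return { ok: true, response: res };
    // A bare `break` here would fall through to `return null` and let the
    // outer loop advance; surfacing the 400 instead stops the combo.
    if (isBodySpecific400(res)) return { ok: false, response: res };
  }
  return null; // retries exhausted: the caller may try the next target
}

async function runCombo(targets: string[], call: CallUpstream): Promise<TargetResult | null> {
  for (const model of targets) {
    const result = await executeTarget(model, call);
    if (result !== null) return result; // success or a terminal body-specific 400
  }
  return null;
}
```
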
fix(sse): non-streaming combo over a Responses-API target no longer returns empty content — a Responses-API target (codex/cx) streams from upstream even on stream:false, and its terminal response.completed snapshot can carry a non-empty output that lacks the assistant message item (e.g. only a reasoning item) while the streamed output_text deltas had reconstructed the full message. The SSE→JSON aggregator preferred the terminal output wholesale, dropping the reconstructed text → HTTP 200 with empty content (hit notably via n8n, which defaults to stream:false). The aggregator now falls back to the reconstructed delta output when the terminal output has no message item but the reconstruction does; the terminal snapshot still wins whenever it already carries the message. (#3948)
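
The aggregator's choice from the entry above can be sketched in TypeScript. OutputItem is a simplified stand-in for Responses-API output items, and chooseFinalOutput is a hypothetical name. The terminal response.completed output wins when it carries a message item, or when the reconstruction has none either. Otherwise the output rebuilt from output_text deltas is used.

```typescript
// Simplified Responses-API output item; real items carry more fields.
interface OutputItem {
  type: string; // e.g. "message" or "reasoning"
  content?: Array<{ type: string; text: string }>;
}

const hasMessage = (items: OutputItem[]): boolean => items.some((item) => item.type === "message");

function chooseFinalOutput(terminal: OutputItem[] | undefined, reconstructed: OutputItem[]): OutputItem[] {
  if (terminal && terminal.length > 0) {
    // The terminal snapshot still wins whenever it already carries the message
    // (or when the reconstruction has no message to offer either).
    if (hasMessage(terminal) || !hasMessage(reconstructed)) return terminal;
  }
  // e.g. terminal holds only a reasoning item: use the delta-reconstructed output.
  return reconstructed;
}
```
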
fix(executors): preserve tool-name casing on native Claude OAuth (read no longer leaks back as Read) — native Claude OAuth traffic runs through an anti-fingerprint tool-name cloak that renames a tool literally named read to Read on the wire and records the reverse alias on a non-enumerable _toolNameMap, which the response side uses to restore the client's original casing. Since v3.8.27 the executor returned a JSON-round-tripped copy of the body as transformedBody, and that round-trip dropped the non-enumerable map — so the restore saw an empty map and the cloaked Read streamed verbatim to the client, corrupting the tool name. The executor now re-attaches the cloak map onto the serialized body (mirroring the Antigravity executor), so tool-name casing round-trips correctly. (#4307 — thanks @dev-cj)
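
The root cause in the entry above is a general JavaScript behavior: JSON.stringify skips non-enumerable properties, so a JSON round-trip silently drops them. The TypeScript sketch below shows the loss and the re-attach fix. _toolNameMap is the property named in the note. Storing it as a Map, and the helper names, are assumptions.

```typescript
// Wire name -> client's original name (a Map is an assumption for this sketch).
type ToolNameMap = Map<string, string>;

function attachToolNameMap(body: object, map: ToolNameMap): void {
  // Non-enumerable: invisible to JSON.stringify and Object.keys, so it never
  // reaches the upstream wire body.
  Object.defineProperty(body, "_toolNameMap", {
    value: map,
    enumerable: false,
    writable: true,
    configurable: true,
  });
}

function serializeForUpstream<T extends object>(body: T): T {
  const copy = JSON.parse(JSON.stringify(body)) as T; // drops non-enumerable props
  const map = (body as { _toolNameMap?: ToolNameMap })._toolNameMap;
  // Re-attach so the response side can still restore the client's casing.
  if (map) attachToolNameMap(copy, map);
  return copy;
}
```
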
fix(api): cache-HIT X-OmniRoute-Response-Cost now reports the incremental cost (≈0), not the original — on a semantic-cache HIT the gateway serves the stored response without an upstream call, but X-OmniRoute-Response-Cost was reporting the original call's full cost (recomputed from the cached usage). A consumer summing response-cost for billing was therefore charging for responses that cost ≈$0 to serve (and stale entries could inflate the figure further). Cache hits now report X-OmniRoute-Response-Cost: 0.0000000000 (the real incremental cost), and the avoided cost is surfaced in a new X-OmniRoute-Cost-Saved header for cache analytics, mirroring the existing tokens_saved concept. The MISS path is unchanged. (PRD-2026-06-19-cache-hit-cost-reporting)
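
The HIT/MISS header split from the entry above can be sketched in TypeScript. The header names come from the note. costHeaders is a hypothetical helper. Formatting both values with 10 decimals mirrors the example "0.0000000000" and is otherwise an assumption.

```typescript
// Header names from the release note; 10-decimal formatting is an assumption.
function costHeaders(cacheHit: boolean, computedCostUsd: number): Record<string, string> {
  if (!cacheHit) {
    // MISS: the real upstream cost, as before.
    return { "X-OmniRoute-Response-Cost": computedCostUsd.toFixed(10) };
  }
  // HIT: no upstream call was made, so the incremental cost is zero and the
  // cost that would otherwise have been paid is reported separately as savings.
  return {
    "X-OmniRoute-Response-Cost": (0).toFixed(10),
    "X-OmniRoute-Cost-Saved": computedCostUsd.toFixed(10),
  };
}
```

A billing consumer can keep summing X-OmniRoute-Response-Cost unchanged. Cache analytics read X-OmniRoute-Cost-Saved separately.
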
fix(models): imported vision-capable models keep their vision capability — after importing a provider key, vision-capable models (e.g. OpenRouter models whose architecture declares image input, and other synced providers) were listed as text-only in /v1/models and the dashboard — even though image requests actually worked. Synced model records never captured the vision flag, and the catalog's OpenRouter live-enrichment (which derives vision from architecture.input_modalities) is skipped once a provider has synced models. Discovery now captures supportsVision at sync time (from architecture.input_modalities, the string architecture.modality, or a top-level input_modalities), mirroring the existing supportsThinking capture, and the catalog surfaces capabilities.vision for synced models. (#4264 — thanks @FerLuisxd)
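
The sync-time vision check from the entry above can be sketched in TypeScript. It looks in the three places the note lists. DiscoveredModel and detectSupportsVision are hypothetical names. The "text+image->text" shape for the string modality follows OpenRouter's convention and is an assumption here. Only the input side of "->" counts, so an image-output model is not flagged as vision.

```typescript
// Field names are the ones the note lists; the record shape is an assumption.
interface DiscoveredModel {
  id: string;
  architecture?: { input_modalities?: string[]; modality?: string };
  input_modalities?: string[];
}

function detectSupportsVision(model: DiscoveredModel): boolean {
  // 1. architecture.input_modalities, e.g. ["text", "image"]
  if (model.architecture?.input_modalities?.includes("image")) return true;

  // 2. string architecture.modality, e.g. "text+image->text": only the input
  //    side (left of "->") counts, so "text->image" is not vision input.
  const modality = model.architecture?.modality;
  if (typeof modality === "string") {
    const inputSide = modality.split("->")[0] ?? "";
    if (inputSide.split("+").includes("image")) return true;
  }

  // 3. top-level input_modalities
  return model.input_modalities?.includes("image") ?? false;
}
```
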
fix(providers): Cloudflare Workers AI model discovery shows model names, not UUIDs — importing a Cloudflare Workers AI key listed models with internal UUID identifiers (e.g. 429b9e8b-d99e-…) instead of their usable slugs (@cf/meta/llama-3.1-8b-instruct). Cloudflare's /ai/models/search returns { id: "<uuid>", name: "@cf/…" }, and discovery was passing the raw objects through — so the UUID id became the callable model id. The cloudflare-ai discovery now maps each result's name → id, surfacing the real @cf/… model ids. (#4259 — thanks @FerLuisxd)
fix(translator): clamp Responses API call_id to 64 characters — the OpenAI Responses API rejects call_id values longer than 64 characters with a 400. Long upstream tool-call ids (some clients emit ids well over the limit) are now clamped deterministically on both the function_call item and its matching function_call_output, so the pair stays matched through the orphaned-output filter and the request is accepted. (thanks @anuragg-saxenaa, @ngapngap)
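
One way to clamp deterministically, as the entry above requires, is sketched below in TypeScript. It keeps a readable prefix and appends a short SHA-256 digest of the full id. The note does not say which scheme OmniRoute uses, so clampCallId and clampCallIds are hypothetical. Because the same pure function is applied to both items, each function_call / function_call_output pair stays matched.

```typescript
import { createHash } from "node:crypto";

const MAX_CALL_ID_LENGTH = 64; // limit stated in the release note

// Hypothetical scheme: prefix + "_" + 16 hex chars of SHA-256(full id). The same
// input always yields the same output; distinct long ids are very unlikely to collide.
function clampCallId(callId: string): string {
  if (callId.length <= MAX_CALL_ID_LENGTH) return callId;
  const digest = createHash("sha256").update(callId).digest("hex").slice(0, 16);
  return `${callId.slice(0, MAX_CALL_ID_LENGTH - digest.length - 1)}_${digest}`;
}

interface ResponsesItem {
  type: string; // e.g. "function_call" or "function_call_output"
  call_id?: string;
}

// Clamps every call_id in the input list; items without one pass through.
function clampCallIds(items: ResponsesItem[]): ResponsesItem[] {
  return items.map((item) =>
    item.call_id === undefined ? item : { ...item, call_id: clampCallId(item.call_id) },
  );
}
```
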
fix(oauth): GitHub Copilot token refresh now sends the public client_id — the github provider config never carried a clientId, so GitHub OAuth refresh_token exchanges either omitted client_id or sent the literal string undefined (and a bogus client_secret=undefined), which GitHub rejects — leaving a Copilot connection stuck once its short-lived token expired and the long-lived refresh path was needed. The provider now resolves its public device-flow client_id from the embedded public credential and omits client_secret entirely (GitHub's Copilot app is a public client with no secret). (thanks @baslr)
fix(translator): a tool property named pattern survives Gemini/Antigravity schema sanitization — the Gemini schema sanitizer strips JSON-Schema constraint keywords Gemini rejects (pattern, minLength, …) at every nesting level, but it also deleted any tool property literally named one of those keywords. glob/grep tools declare a property called pattern, so on ag/* (Antigravity) backends that argument (and its required entry) was silently dropped, breaking the tools. Keyword stripping is now position-aware: it only removes constraint keywords at the schema-node level and never against the user-defined names inside a properties map. A genuine string-level pattern constraint is still stripped. (thanks @youthanh)
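
Position-aware stripping, as described in the entry above, can be sketched in TypeScript. Keys of a properties map are treated as user-defined names and kept, and only their sub-schemas are sanitized. A keyword is removed only where it appears as a schema-node key. sanitizeSchema is a hypothetical name, and the keyword set is an illustrative subset of the real list.

```typescript
// Illustrative subset; the real sanitizer strips more keywords ("pattern, minLength, …").
const UNSUPPORTED_KEYWORDS = new Set(["pattern", "minLength"]);

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function sanitizeSchema(node: Json): Json {
  if (Array.isArray(node)) return node.map(sanitizeSchema);
  if (node === null || typeof node !== "object") return node;

  const out: { [key: string]: Json } = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === "properties" && value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Keys here are property NAMES, never keywords: keep every name
      // (including one called "pattern") and sanitize only its sub-schema.
      const props: { [key: string]: Json } = {};
      for (const [name, sub] of Object.entries(value)) props[name] = sanitizeSchema(sub);
      out[key] = props;
    } else if (UNSUPPORTED_KEYWORDS.has(key)) {
      continue; // a real constraint keyword at schema-node level: stripped
    } else {
      out[key] = sanitizeSchema(value);
    }
  }
  return out;
}
```

The required array is untouched because its entries are plain strings, so "pattern" stays required alongside the property it names.
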
fix(translator): MCP namespace tools flatten to individual functions on the Responses→Chat path — when a Codex CLI client routes a Responses-API request to a non-Codex backend (e.g. kr/claude-opus-4.7), each MCP server is declared as a namespace tool ({ type:"namespace", name, tools:[…] }). The Responses→Chat translator had no namespace branch, so the whole group collapsed into a single empty-schema function named mcp__<server>__ and every MCP call returned unsupported call: mcp__<server>__, breaking all MCP-based workflows (context7, codegraph, custom MCPs) for that combination. The translator now expands a namespace into one Chat function per sub-tool (preserving each sub-tool's name and parameters); an empty namespace yields no tools instead of a broken placeholder. The native Codex passthrough path was already correct. (thanks @V13t4nh)
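
The namespace expansion from the entry above can be sketched in TypeScript. A namespace tool becomes one Chat function per sub-tool, an empty namespace contributes nothing, and plain function tools map one-to-one. The tool shapes are simplified and translateTools is a hypothetical name. Sub-tool names are kept exactly as given, because the note does not say whether a server prefix is added.

```typescript
// Simplified Responses-API tool shapes; real tools may carry more fields.
interface ResponsesFunctionTool { type: "function"; name: string; description?: string; parameters?: object; }
interface ResponsesNamespaceTool { type: "namespace"; name: string; tools: ResponsesFunctionTool[]; }
type ResponsesTool = ResponsesFunctionTool | ResponsesNamespaceTool;

interface ChatTool {
  type: "function";
  function: { name: string; description?: string; parameters: object };
}

function toChatFunction(tool: ResponsesFunctionTool): ChatTool {
  return {
    type: "function",
    function: {
      name: tool.name, // sub-tool name preserved as given
      description: tool.description,
      parameters: tool.parameters ?? { type: "object", properties: {} },
    },
  };
}

function translateTools(tools: ResponsesTool[]): ChatTool[] {
  // A namespace expands to one function per sub-tool (an empty namespace adds
  // nothing); a plain function tool maps one-to-one.
  return tools.flatMap((tool) =>
    tool.type === "namespace" ? tool.tools.map(toChatFunction) : [toChatFunction(tool)],
  );
}
```
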
fix(cli): the active remote-context credential wins over an ambient OMNIROUTE_API_KEY — when a remote context is selected, its scoped access token now takes precedence over an OMNIROUTE_API_KEY present in the environment, so the connected remote is targeted as expected. (#4364)
fix(cli): wire the contexts command into the CLI program — the omniroute contexts command (list/switch saved remote contexts) was implemented but never registered, so it was unreachable; it is now wired into the CLI program. (#4369)
fix(mitm): mask bare Bearer <token> header values in the Traffic Inspector — the inspector now redacts bare Authorization: Bearer … values so tokens don't leak into captured traffic. (#4358)
fix(pricing): price the gpt-5.x-pro OpenAI models + align the opencode-go discovery test — adds pricing for the gpt-5.x-pro models so cost telemetry reports a real cost instead of zero. (#4355)
fix(sse): release the reader and cancel the stream on abort/error (no more Undici pool socket leak) — on abort or a mid-stream error the response reader is released and the stream cancelled, preventing leaked pooled sockets that degraded later requests. (#4309 — thanks @Ardem2025)
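
The cleanup pattern from the entry above can be sketched in TypeScript with standard Web Streams. If the stream does not end naturally (abort or error), the reader is cancelled so the underlying source, and with it the pooled socket, is freed. The reader lock is always released. pumpStream is a hypothetical name, not OmniRoute's SSE code.

```typescript
// Generic Web Streams sketch; the real SSE pipeline and its names are not shown.
async function pumpStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
  signal: AbortSignal,
): Promise<void> {
  const reader = body.getReader();
  // Cancelling also resolves a pending read() with { done: true }.
  const onAbort = () => {
    reader.cancel(signal.reason).catch(() => {});
  };
  signal.addEventListener("abort", onAbort, { once: true });
  let finished = false;
  try {
    while (!signal.aborted) {
      const { done, value } = await reader.read();
      if (done) {
        finished = true;
        break;
      }
      onChunk(value);
    }
  } finally {
    signal.removeEventListener("abort", onAbort);
    // Abort or error before the natural end: cancel so the source is freed.
    if (!finished) await reader.cancel().catch(() => {});
    reader.releaseLock();
  }
}
```
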
fix(kiro): emit an early role-only start chunk to release the stream-readiness gate — Kiro streams now send an initial role-only chunk so the stream-readiness gate releases promptly instead of stalling. (#4311 — thanks @artickc)
fix(dashboard): the proxy modal stops pre-filling new scopes with an unrelated proxy — adding a new scope assignment no longer inherits a previously-selected proxy's configuration. (#4312)
fix(open-sse): inner-ai stops silently rerouting unmatched models to models[0] — an unmatched model id is no longer silently served by the first available model; the lookup now returns null and the request is handled explicitly. (#4310)
fix(pollinations): handle auth-required premium models (claude, gemini, midjourney) — premium Pollinations models that require authentication are now handled correctly instead of failing. (#4266 — thanks @oyi77)
fix(codex): isolate the Spark quota scope — Codex Spark usage is tracked under its own quota scope so it no longer bleeds into other Codex quotas. (#4293 — thanks @xz-dev)
fix(dashboard): improve the API "try it" functionality — fixes the request path used by the dashboard's API "try it" panel. (#4296 — thanks @edrickrenan)
fix: polyfill crypto.randomUUID for non-secure contexts — restores UUID generation when the dashboard is served over a non-secure (plain-HTTP) origin where crypto.randomUUID is unavailable. (#4287 — thanks @pizzav-xyz)
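
A standard way to build this polyfill is shown below in TypeScript. It generates an RFC 4122 version-4 UUID from crypto.getRandomValues, which browsers expose even in non-secure contexts (crypto.randomUUID is secure-context only). This is a generic sketch, not the patch from #4287. randomUUIDFallback and installRandomUUIDPolyfill are hypothetical names.

```typescript
// Minimal Web Crypto surface this sketch relies on.
type CryptoLike = { getRandomValues(array: Uint8Array): Uint8Array; randomUUID?: () => string };

function randomUUIDFallback(): string {
  const webCrypto = (globalThis as unknown as { crypto?: CryptoLike }).crypto;
  if (!webCrypto) throw new Error("Web Crypto API unavailable");
  const bytes = new Uint8Array(16);
  webCrypto.getRandomValues(bytes);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant (binary 10xx)
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Installs the fallback only where the native function is missing.
function installRandomUUIDPolyfill(): void {
  const webCrypto = (globalThis as unknown as { crypto?: CryptoLike }).crypto;
  if (webCrypto && typeof webCrypto.randomUUID !== "function") {
    webCrypto.randomUUID = randomUUIDFallback;
  }
}
```
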
fix(proxy): allow concurrent proxy dispatcher streams — the proxy dispatcher no longer serializes streams, so concurrent requests through a proxied connection run in parallel. (#4288 — thanks @wilsonicdev)
fix(build): co-locate llmlingua SLM optionals into dist/node_modules (postinstall) — the optional llmlingua SLM packages are co-located into the standalone build so the compression worker can actually spawn in production. (#4286)
fix(mitm): surface AgentBridge traffic in the Traffic Inspector (D4 ingest) — AgentBridge requests now appear in the Traffic Inspector. (#4285)
fix(sse): surface undici err.cause on dispatcher failure — dispatcher failures now flatten the cause chain (and AggregateErrors) into the error detail for diagnosability. (#4281)
fix(cli): harden launch/launch-codex with free-claude-code patterns — the launchers adopt the hardened launch patterns ported from free-claude-code. (#4278)
fix(compression): end-to-end audit — fixes across the whole compression flow — a sweep of the compression pipeline fixing ultra/aggressive/lossless edge cases, accessibility-anchor handling, language detection, and mode decoupling. (#4323)

🧪 Tests

test: align two tests left red by merged PRs — re-aligns the db-rules classification count (#4335) and the LMArena split-cookie metadata test (#4271) after concurrent merges. (#4346)
test(ci): reconcile the release/v3.8.30 baseline + test drift — reconciles quality baselines and drifted tests accumulated on the release branch. (#4276)

📝 Maintenance

refactor(combo): ComboContext + extract phaseComboSetup (god-file split, phase 1) — begins decomposing the combo god-file by extracting combo setup into a context object, without touching dispatch/semaphore logic. (#4326)
feat(quality): cap test-file size — anti-reinflation Layer 1 — freezes the existing god-tests and caps new test files at 800 lines to stop re-inflation. (#4273)
feat(quality): seed per-module mutationScore floors + a blocking aggregation ratchet (T3) — adds per-module mutation-score floors with a blocking aggregate gate. (#4305)
feat(quality): make the a11y gate real (@axe-core/playwright in nightly) — wires the previously-phantom accessibility gate into the nightly run with real baselines. (#4321)
feat(quality): unblock R1 — test-redundancy measurement via disableBail — enables the test-redundancy measurement that was previously blocked by fail-fast. (#4322)
fix(quality): the complexity gate now covers bin/ + electron/, and tracked-artifacts runs in pre-commit — extends the complexity gate's scope and moves the tracked-artifacts check into the pre-commit hook. (#4318)
fix(quality): restore release/v3.8.30 green — 3 latent reds from concurrent merges — fixes three latent test reds surfaced by concurrent merges into the release branch. (#4335)
fix(combo): keep phaseComboSetup under the complexity ceiling — extracts a helper so the new combo setup phase stays under the complexity gate. (#4338)
ci(mutation): split over-budget batches by range/pair so every batch fits the job cap — re-splits the mutation batches so each fits the CI job budget. (#4272)
chore(ci): align the electron audit gate to the root advisory policy — the electron-workspace audit gate now follows the same advisory policy as the root. (#4275)
chore(quality): reconcile the complexity/quality baselines across concurrent-merge drift — rolls up the cycle's baseline reconciliations driven by concurrent merges into the release branch. (#4330, #4336, #4370)
docs: ban AI-generation footers in commits/PRs/CHANGELOG (Hard Rule #16) — codifies the prohibition on AI-generation footers and bot co-author trailers. (#4328)
docs(design): add the OmniRoute design system and visual identity specification — adds the design-system / visual-identity specification document. (thanks @diegosouzapw)

🔒 Security

fix(sse): harden the DuckDuckGo lite scraper sanitization (CodeQL) — closes four HIGH CodeQL alerts in the no-key web-search scraper: decodeEntities now resolves &amp; last, so a double-escaped entity (e.g. &amp;lt;) survives as the literal text &lt; instead of being double-unescaped to < (js/double-escaping); stripTags decodes entities first, then strips tags in a loop to a fixpoint and drops any trailing unclosed <…, so entity-encoded markup like &lt;script&gt; can never reach the LLM/client as a live tag (js/incomplete-multi-character-sanitization); and the host checks in the search tests use new URL().hostname equality instead of substring .includes (js/incomplete-url-substring-sanitization). (#4356)
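
The three sanitization rules from the security entry above can be sketched in TypeScript: decode &amp; last, strip tags to a fixpoint after decoding, and compare hostnames exactly. The function names echo the note, but the bodies are illustrative and not the scraper's code. The entity table is a small assumed subset.

```typescript
// Illustrative entity table; the real scraper may decode more entities.
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last: "&amp;lt;" becomes the literal "&lt;", not "<"
}

function stripTags(html: string): string {
  // Decode first so entity-encoded markup (&lt;script&gt;) turns into real
  // tags that the loop below can see and remove.
  let text = decodeEntities(html);
  let previous: string;
  do {
    previous = text;
    text = text.replace(/<[^<>]*>/g, "");
  } while (text !== previous); // fixpoint: "<scr<b>ipt>" cannot reassemble a tag
  // Drop any trailing unclosed "<..." fragment.
  return text.replace(/<[^>]*$/, "");
}

// Exact hostname equality instead of substring .includes().
function isDuckDuckGoHost(url: string): boolean {
  return new URL(url).hostname === "duckduckgo.com";
}
```
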

🔧 Dependencies

fix(deps): bump undici to 7.28.0 and dompurify to 3.4.11 (security) — addresses the undici SOCKS5-TLS / cache advisories and the dompurify advisory. (#4306)
chore(deps): bump actions/checkout from 4 to 7 — CI checkout-action update. (#4297)

🙌 Contributors

Thanks to everyone whose work landed in v3.8.30 (external contributors first, maintainer last):

Contributor	PRs / Issues
@oyi77	#4266
@pizzav-xyz	#4287
@wilsonicdev	#4288
@xz-dev	#4293
@edrickrenan	#4296
@Ardem2025	#4309
@artickc	#4311
@Witroch4	#4327
@FerLuisxd	#4264, #4259, #3841
@caussao	#4271
@daniij	#4037
@KooshaPari	#3932
@dev-cj	#4307
@klimadev	#4252
@akbardwi	#4354
@mrcyclo	#4325
@anuragg-saxenaa	#4317
@ngapngap	#4317
@baslr	#4320
@youthanh	#4339
@V13t4nh	#4340
@diegosouzapw	maintainer

What's Changed

fix(deps): bump undici to 7.28.0 and dompurify to 3.4.11 (security) by @diegosouzapw in #4304
Release v3.8.30 by @diegosouzapw in #4267

Full Changelog: v3.8.29...v3.8.30
