[3.8.30] — 2026-06-20
✨ New Features
- feat(dashboard): category (media serviceKind) filter on the providers page — `/dashboard/providers` gains a media-category filter row (Image / Video / Music / Text→Speech / Speech→Text / Embedding) that composes with the existing search, free-only and "show configured only" filters. Membership is derived from the backend media registries (a provider that serves a kind is surfaced even if it never declared `serviceKinds`), keeping the UI in lockstep with the backend. (#4240)
- feat(combo): per-step account allowlist — scope a round-robin/weighted step to a subset of a provider's connections — a combo model step can now carry a first-class account allowlist so a round-robin (or weighted) strategy is scoped to a chosen subset of a provider's connections (e.g. only `foo1+foo2` out of `foo1..foo4`) without hand-pinning one step per account. Empty = the whole active pool (unchanged). When a step both has an allowlist and is tag-routed, the two intersect (most-restrictive wins); a single pinned account still takes precedence. The combo builder's Precision step editor gains an optional "Restrict to accounts" picker. (#3266)
- feat(providers): add OpenAdapter, dit.ai and TokenRouter as OpenAI-compatible providers — three community-requested OpenAI-compatible aggregators now register as standard named OpenAI-style providers with live `/v1/models` discovery (the zenmux pattern), falling back to a seeded catalog when the upstream list is unavailable: OpenAdapter (https://api.openadapter.in/v1, free tier, 70+ open-source models — #4239), dit.ai (https://api.dit.ai/v1, dynamic-pricing router/gateway — #4155), and TokenRouter (https://api.tokenrouter.com/v1, free MiniMax model — #3841, thanks @FerLuisxd). No custom executor/translator — default OpenAI passthrough.
- feat(api): `x-omniroute-no-memory` request header — per-request opt-out of memory/skills injection — clients that manage their own context (e.g. their own RAG/memory) can send `x-omniroute-no-memory: true` (mirrors the existing `x-omniroute-no-cache` convention) to skip the gateway injecting up to `memorySettings.maxTokens` (~2k) tokens of memory and skills context into that chat request — avoiding the token/cost inflation it otherwise adds on every call. Absent the header, behavior is unchanged. (PRD-2026-06-19-no-memory-header)
- feat(dashboard): MITM tool card lists the exact hosts-file entries to add manually — the CLI-tools MITM card's "How it works" section now lists the full set of `127.0.0.1 <host>` lines for the selected tool (sourced from the canonical MITM target registry) instead of a single example domain. Users on locked-down machines — where the automatic, sudo-gated hosts-file edit isn't available — can now copy every required entry by hand. (thanks @mrcyclo)
- feat(cli): `omniroute launch-codex` + `setup-codex` — run/configure the Codex CLI against OmniRoute — a launcher and setup command that point the Codex CLI at an OmniRoute endpoint (remote-mode aware). (#4270)
- feat(cli): Claude Code launcher + setup — remote mode + profiles — `omniroute launch/setup` for Claude Code with remote-mode support and named connection profiles. (#4274)
- feat(cli): OpenCode setup — OpenAI-compatible provider + remote-aware plugin — `setup-opencode` registers OmniRoute as an OpenAI-compatible provider for OpenCode and installs a remote-aware plugin. (#4277)
- feat(cli): one-command setup for popular AI coding tools — new `setup-*` commands that configure each tool to talk to OmniRoute: Cline (#4280), Kilo Code (#4284), Continue (#4289), Cursor (#4291), Roo Code (#4292), Crush (#4298), Goose (#4300), Qwen Code (#4301), Aider (#4302) and the Gemini CLI (native `/v1beta`) (#4303).
- feat(providers): provider model sweep — live discovery, refreshed catalogs, dead-provider cleanup — a broad sweep that enables live `/v1/models` discovery for more OpenAI-style providers (the zenmux pattern), refreshes the seeded catalogs with current models, and marks dead providers `deprecated`. (#4324)
- feat(mitm): translate Antigravity cloudcode end-to-end (Gap B) — the MITM decrypt path now translates Antigravity `cloudcode` traffic end-to-end. (#4299)
- feat(keys): per-key USD usage quota controls — an API key can now carry a USD spend quota that caps its usage once the threshold is reached. (#4327 — thanks @Witroch4)
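The allowlist rules in the feat(combo) entry above (empty means the whole active pool, allowlist and tag routing intersect, a pinned account wins) can be sketched as follows. This is a minimal illustration only: the `Connection` and `ComboStep` shapes, the field names and `eligibleConnections` are hypothetical and are not taken from the OmniRoute codebase.

```typescript
// Hypothetical shapes: the changelog does not show OmniRoute's real types.
interface Connection {
  id: string;
  tags: string[];
  active: boolean;
}

interface ComboStep {
  pinnedAccount?: string;      // assumed field: a single pinned account
  accountAllowlist?: string[]; // assumed field: empty/missing = whole active pool
  routeTag?: string;           // assumed field: tag routing for this step
}

// Returns the connections a round-robin or weighted strategy may pick from.
function eligibleConnections(step: ComboStep, pool: Connection[]): Connection[] {
  const active = pool.filter((c) => c.active);

  // A single pinned account takes precedence over every other filter.
  // (Restricting the pin to active connections is an assumption.)
  if (step.pinnedAccount !== undefined) {
    return active.filter((c) => c.id === step.pinnedAccount);
  }

  let candidates = active;

  // A non-empty allowlist narrows the pool; an empty one leaves it unchanged.
  if (step.accountAllowlist && step.accountAllowlist.length > 0) {
    const allowed = new Set(step.accountAllowlist);
    candidates = candidates.filter((c) => allowed.has(c.id));
  }

  // Tag routing intersects with the allowlist (most-restrictive wins).
  if (step.routeTag !== undefined) {
    const tag = step.routeTag;
    candidates = candidates.filter((c) => c.tags.includes(tag));
  }

  return candidates;
}

const pool: Connection[] = [
  { id: "foo1", tags: ["eu"], active: true },
  { id: "foo2", tags: ["us"], active: true },
  { id: "foo3", tags: ["eu"], active: true },
  { id: "foo4", tags: ["eu"], active: false },
];

const scoped = eligibleConnections({ accountAllowlist: ["foo1", "foo2"] }, pool);
console.log(scoped.map((c) => c.id));
```

Applying the filters in sequence gives the intersection without a separate set operation, and the early return keeps the pinned-account precedence explicit.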
🔧 Changed
- change(memory): memory is now OFF by default — `DEFAULT_MEMORY_SETTINGS.enabled` now defaults to `false`. Enabling memory injects up to ~2,000 tokens of retrieved context into every chat request (and that context is billed), which was a surprising default for new installs and for clients with their own context. Memory is now an explicit opt-in: installs that already enabled it keep it on; installs that never configured it default to off. The Settings → Memory panel now shows a token-cost warning when memory is enabled. (PRD-2026-06-19-no-memory-header)
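For installs that keep memory enabled, a client can still opt out per request with the `x-omniroute-no-memory: true` header. The sketch below only builds the request; the header name and value come from this changelog, while the `/v1/chat/completions` path, the Bearer authorization scheme, the base URL, the model id and the `buildChatRequest` helper are assumptions made for illustration.

```typescript
// Plain description of an HTTP request, so the sketch needs no runtime types.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds a chat request and optionally opts out of
// memory/skills injection for this one call.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  payload: unknown,
  skipMemory: boolean,
): ChatRequest {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    authorization: `Bearer ${apiKey}`, // assumed auth scheme
  };
  if (skipMemory) {
    // Header name and value as documented in this release.
    headers["x-omniroute-no-memory"] = "true";
  }
  return {
    // Assumed OpenAI-style path; trailing slashes on the base URL are trimmed.
    url: `${baseUrl.replace(/\/+$/, "")}/v1/chat/completions`,
    method: "POST",
    headers,
    body: JSON.stringify(payload),
  };
}

const request = buildChatRequest(
  "http://localhost:3000", // placeholder endpoint
  "sk-example",            // placeholder key
  { model: "example-model", messages: [{ role: "user", content: "hi" }] },
  true,
);
console.log(request.headers);
```

The resulting `url`, `method`, `headers` and `body` can be handed to any HTTP client. Leaving `skipMemory` false sends no extra header, which matches the documented "absent the header, behavior is unchanged".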
🐛 Fixed
- fix(compliance): startup cleanup honors the dashboard data-retention setting instead of always trimming to 7 days — on every restart, `cleanupExpiredLogs()` (run at startup) read retention only from the `CALL_LOG_RETENTION_DAYS` / `APP_LOG_RETENTION_DAYS` env vars, which default to 7 days when unset, and trimmed `usage_history` (the Usage Analysis data) before the dashboard-based `runAutoCleanup()` — which respects the configured retention — ever ran. So a dashboard "Data Retention" of 90 days was silently overridden and the Usage Analysis page only ever showed the last 7 days after a restart. Retention now follows the precedence explicit env var → dashboard DB setting → 7-day default, per table (`usage_history` → `usageHistory`, `call_logs`/`proxy_logs`/`request_detail_logs` → `callLogs`, `mcp_tool_audit` → `mcpAudit`); an operator who sets the env var still wins, and non-DB deployments still fall back to it. (#4354 — thanks @akbardwi)
- fix(providers): bailian-coding-plan static fallback catalog matches the registry (10 models) — the provider-model sweep (#4324) added four current Model Studio coding-plan models (`qwen3.7-plus`, `qwen3-coder-plus`, `qwen3-coder-next`, `glm-4.7`) to the `bailian-coding-plan` registry entry but missed the static fallback mirror in `staticModels.ts`, which still listed only the older six. The static catalog (served when live discovery is unavailable) therefore diverged from the registry, and the existing static↔registry parity test went red on the release branch (only surfacing when test-impact analysis happened to select it). The static mirror now carries all ten models in registry order, restoring parity. (#4324)
- fix(executors): ArenaLLM accepts LMArena's split Supabase SSR auth cookie — LMArena migrated to `@supabase/ssr` chunked auth cookies: the single `arena-auth-prod-v1` cookie is now empty and the real session is split across `arena-auth-prod-v1.0`, `arena-auth-prod-v1.1`, … (ascending). A user who pasted the (now-empty) single cookie therefore sent an empty session and upstream rejected it as "invalid cookie". The LMArena executor now reconstructs the single cookie from its chunks — reading `.0`, `.1`, … in ascending numeric order until one is missing and concatenating their raw values (`@supabase/ssr`'s `combineChunks` rule: plain `join("")`, no base64-decode, no JSON-parse, the `base64-` prefix kept verbatim) — while preserving the rest of the pasted jar. A non-empty single cookie is still forwarded unchanged (back-compat). The credential UX now instructs pasting the full Cookie header and tracks the `.0`/`.1` storage keys. (#4271 — thanks @caussao)
- fix(compression): preserve the cacheable prefix for automatic-cache providers — OpenAI / Codex (and Azure-OpenAI) use automatic prefix caching: the upstream caches the longest matching prefix of a request (system prompt + earliest messages) without any explicit `cache_control` markers in the body. The cache-aware compression guard only protected that prefix when the request carried explicit `cache_control`, so for automatic-cache providers the guard was skipped — and with compression enabled and `preserveSystemPrompt: false` (or a prefix-compressing mode like `aggressive`/`ultra`) it rewrote the system prompt / earliest messages, guaranteeing a cache miss and higher token spend through OmniRoute than going direct. The guard now treats a caching provider as sufficient on its own (`isCachingProvider` alone, independent of `cache_control`) to skip the system prompt and downgrade prefix-compressing modes, and OpenAI/Codex/Azure are now recognized as caching providers. Compression is still off by default — this only affects operators who enabled it with prefix preservation turned off. (#3955)
- fix(executors): DuckDuckGo AI Chat uses duckduckgo.com (fixes 400) — the DuckDuckGo AI Chat executor fetched status/chat and set `Origin`/`Referer` against `https://duck.ai` while still sending `Sec-Fetch-Site: same-origin`, so the request's same-origin triplet (host + Origin + Referer) was inconsistent and the backend rejected it with HTTP 400. All current DDG reverse-engineering references — and the provider registry's own `baseUrl` — use `https://duckduckgo.com`; the executor now uses it consistently for the status URL, chat URL, `Origin`, and `Referer` (the same-origin header is now coherent). The `x-fe-version` scrape regex also required a 40-hex tail but the real served token has a 20-hex tail (e.g. `serp_20250401_100419_ET-19d438eb199b2bf7c300`), so it silently fell back to a hardcoded default; the pattern is relaxed to a bounded `{20,40}` tail (still ReDoS-safe). This addresses the DuckDuckGo half of the report; the separate Chipotle/`chipotle` upstream breakage is tracked independently. (#4037 — thanks @daniij)
- fix(security): bound the prompt-injection scan to the first 16 KB (hot-path perf) — the prompt-injection guard joined every message/system string into one buffer and ran several regexes over the whole thing on every chat request, with no size cap — so a 300 KB body (pasted code, RAG context) meant O(body) CPU scanning on the hot path, a self-inflicted latency/GC source under concurrency. Both detection call sites (`detectInjection` in `inputSanitizer.ts` and the custom-pattern scan in `promptInjection.ts`) now slice the joined text to the first 16 KB (`MAX_INJECTION_SCAN_BYTES`) before the regex loop. Injection directives sit near the top of a prompt, so the generous cap preserves real detection while scanning only a bounded prefix; the existing 10 MB body-size cap (which protects ingestion) is unchanged. (#3932 — thanks @KooshaPari)
- fix(sse): retry direct-connection socket failures on a fresh socket (fewer `502` bursts) — the default direct-connection undici dispatcher pools keep-alive sockets for up to 4 s, but some edges (e.g. `nvidia`, `opencode-zen`) silently close idle keep-alive sockets within that window, so the next request reusing a pooled socket fails with `UND_ERR_SOCKET` ("other side closed") — in bursts. `proxyFetch` already retried once on such transient errors, but the retry reused the same pooled dispatcher and could grab another stale socket, then fell through to native fetch (which also pools) → the job sat in the rate-limit queue until the 30 s timeout → `502` + circuit-breaker open. The retry now uses a dedicated no-keep-alive / no-pipelining dispatcher so it opens a brand-new socket that can't be a dead pooled one; the first attempt still uses the pooled dispatcher (healthy keep-alive reuse is preserved). Complements the v3.8.29 diagnostics (`describeFetchCause`, #4281). (#4252 — thanks @klimadev)
- fix(sse): combo now stops at the first body-specific 400 instead of trying every target — the #2101 guard that detects a body-specific 400 (context overflow / malformed / model-access-denied, e.g. "model is not supported when using Codex with a ChatGPT account") logged "stopping combo" but executed a bare `break`, which only exited the inner retry loop; `executeTarget` then returned `null` and the outer target loop treated that as "this target produced nothing" and advanced to the next model. A combo of N targets that all reject the same request body therefore marched through all N (the report shows a 143-model Codex combo iterating every target), wasting upstream calls and per-attempt work. The guard now surfaces the 400 via the `{ ok, response }` contract (mirroring the 499 client-disconnect path) so the combo resolves and stops immediately. (#4279)
- fix(sse): non-streaming combo over a Responses-API target no longer returns empty content — a Responses-API target (codex/`cx`) streams from upstream even on `stream:false`, and its terminal `response.completed` snapshot can carry a non-empty `output` that lacks the assistant message item (e.g. only a `reasoning` item) while the streamed `output_text` deltas had reconstructed the full message. The SSE→JSON aggregator preferred the terminal `output` wholesale, dropping the reconstructed text → HTTP 200 with empty content (hit notably via n8n, which defaults to `stream:false`). The aggregator now falls back to the reconstructed delta output when the terminal output has no message item but the reconstruction does; the terminal snapshot still wins whenever it already carries the message. (#3948)
- fix(executors): preserve tool-name casing on native Claude OAuth (`read` no longer leaks back as `Read`) — native Claude OAuth traffic runs through an anti-fingerprint tool-name cloak that renames a tool literally named `read` to `Read` on the wire and records the reverse alias on a non-enumerable `_toolNameMap`, which the response side uses to restore the client's original casing. Since v3.8.27 the executor returned a JSON-round-tripped copy of the body as `transformedBody`, and that round-trip dropped the non-enumerable map — so the restore saw an empty map and the cloaked `Read` streamed verbatim to the client, corrupting the tool name. The executor now re-attaches the cloak map onto the serialized body (mirroring the Antigravity executor), so tool-name casing round-trips correctly. (#4307 — thanks @dev-cj)
- fix(api): cache-HIT `X-OmniRoute-Response-Cost` now reports the incremental cost (≈0), not the original — on a semantic-cache HIT the gateway serves the stored response without an upstream call, but `X-OmniRoute-Response-Cost` was reporting the original call's full cost (recomputed from the cached `usage`). A consumer summing `response-cost` for billing was therefore charging for responses that cost ≈$0 to serve (and stale entries could inflate it). Cache hits now bill `X-OmniRoute-Response-Cost: 0.0000000000` (the real incremental cost), and the avoided cost is surfaced in a new `X-OmniRoute-Cost-Saved` header for cache analytics — mirroring the existing `tokens_saved` concept. The MISS path is unchanged. (PRD-2026-06-19-cache-hit-cost-reporting)
- fix(models): imported vision-capable models keep their vision capability — after importing a provider key, vision-capable models (e.g. OpenRouter models whose `architecture` declares image input, and other synced providers) were listed as text-only in `/v1/models` and the dashboard — even though image requests actually worked. Synced model records never captured the vision flag, and the catalog's OpenRouter live-enrichment (which derives vision from `architecture.input_modalities`) is skipped once a provider has synced models. Discovery now captures `supportsVision` at sync time (from `architecture.input_modalities`, the string `architecture.modality`, or a top-level `input_modalities`), mirroring the existing `supportsThinking` capture, and the catalog surfaces `capabilities.vision` for synced models. (#4264 — thanks @FerLuisxd)
- fix(providers): Cloudflare Workers AI model discovery shows model names, not UUIDs — importing a Cloudflare Workers AI key listed models with internal UUID identifiers (e.g. `429b9e8b-d99e-…`) instead of their usable slugs (`@cf/meta/llama-3.1-8b-instruct`). Cloudflare's `/ai/models/search` returns `{ id: "<uuid>", name: "@cf/…" }`, and discovery was passing the raw objects through — so the UUID `id` became the callable model id. The `cloudflare-ai` discovery now maps each result's `name` → id, surfacing the real `@cf/…` model ids. (#4259 — thanks @FerLuisxd)
- fix(translator): clamp Responses API `call_id` to 64 characters — the OpenAI Responses API rejects `call_id` values longer than 64 characters with a 400. Long upstream tool-call ids (some clients emit ids well over the limit) are now clamped deterministically on both the `function_call` item and its matching `function_call_output`, so the pair stays matched through the orphaned-output filter and the request is accepted. (thanks @anuragg-saxenaa, @ngapngap)
- fix(oauth): GitHub Copilot token refresh now sends the public client_id — the `github` provider config never carried a `clientId`, so GitHub OAuth `refresh_token` exchanges either omitted `client_id` or sent the literal string `undefined` (and a bogus `client_secret=undefined`), which GitHub rejects — leaving a Copilot connection stuck once its short-lived token expired and the long-lived refresh path was needed. The provider now resolves its public device-flow `client_id` from the embedded public credential and omits `client_secret` entirely (GitHub's Copilot app is a public client with no secret). (thanks @baslr)
- fix(translator): a tool property named `pattern` survives Gemini/Antigravity schema sanitization — the Gemini schema sanitizer strips JSON-Schema constraint keywords Gemini rejects (`pattern`, `minLength`, …) at every nesting level, but it also deleted any tool property literally named one of those keywords. glob/grep tools declare a property called `pattern`, so on `ag/*` (Antigravity) backends that argument (and its `required` entry) was silently dropped, breaking the tools. Keyword stripping is now position-aware: it only removes constraint keywords at the schema-node level and never against the user-defined names inside a `properties` map. A genuine string-level `pattern` constraint is still stripped. (thanks @youthanh)
- fix(translator): MCP `namespace` tools flatten to individual functions on the Responses→Chat path — when a Codex CLI client routes a Responses-API request to a non-Codex backend (e.g. `kr/claude-opus-4.7`), each MCP server is declared as a `namespace` tool (`{ type:"namespace", name, tools:[…] }`). The Responses→Chat translator had no `namespace` branch, so the whole group collapsed into a single empty-schema function named `mcp__<server>__` and every MCP call returned `unsupported call: mcp__<server>__`, breaking all MCP-based workflows (context7, codegraph, custom MCPs) for that combination. The translator now expands a namespace into one Chat function per sub-tool (preserving each sub-tool's name and parameters); an empty namespace yields no tools instead of a broken placeholder. The native Codex passthrough path was already correct. (thanks @V13t4nh)
- fix(cli): the active remote-context credential wins over an ambient `OMNIROUTE_API_KEY` — when a remote context is selected, its scoped access token now takes precedence over an `OMNIROUTE_API_KEY` present in the environment, so the connected remote is targeted as expected. (#4364)
- fix(cli): wire the `contexts` command into the CLI program — the `omniroute contexts` command (list/switch saved remote contexts) was implemented but never registered, so it was unreachable; it is now wired into the CLI program. (#4369)
- fix(mitm): mask bare `Bearer <token>` header values in the Traffic Inspector — the inspector now redacts bare `Authorization: Bearer …` values so tokens don't leak into captured traffic. (#4358)
- fix(pricing): price the `gpt-5.x-pro` OpenAI models + align the opencode-go discovery test — adds pricing for the gpt-5.x-pro models so cost telemetry reports a real cost instead of zero. (#4355)
- fix(sse): release the reader and cancel the stream on abort/error (no more Undici pool socket leak) — on abort or a mid-stream error the response reader is released and the stream cancelled, preventing leaked pooled sockets that degraded later requests. (#4309 — thanks @Ardem2025)
- fix(kiro): emit an early role-only start chunk to release the stream-readiness gate — Kiro streams now send an initial role-only chunk so the stream-readiness gate releases promptly instead of stalling. (#4311 — thanks @artickc)
- fix(dashboard): the proxy modal stops pre-filling new scopes with an unrelated proxy — adding a new scope assignment no longer inherits a previously-selected proxy's configuration. (#4312)
- fix(open-sse): inner-ai stops silently rerouting unmatched models to `models[0]` — an unmatched model id is no longer silently served by the first available model; the lookup now returns null and the request is handled explicitly. (#4310)
- fix(pollinations): handle auth-required premium models (claude, gemini, midjourney) — premium Pollinations models that require authentication are now handled correctly instead of failing. (#4266 — thanks @oyi77)
- fix(codex): isolate the Spark quota scope — Codex Spark usage is tracked under its own quota scope so it no longer bleeds into other Codex quotas. (#4293 — thanks @xz-dev)
- fix(dashboard): improve the API "try it" functionality — fixes the request path used by the dashboard's API "try it" panel. (#4296 — thanks @edrickrenan)
- fix: polyfill `crypto.randomUUID` for non-secure contexts — restores UUID generation when the dashboard is served over a non-secure (plain-HTTP) origin where `crypto.randomUUID` is unavailable. (#4287 — thanks @pizzav-xyz)
- fix(proxy): allow concurrent proxy dispatcher streams — the proxy dispatcher no longer serializes streams, so concurrent requests through a proxied connection run in parallel. (#4288 — thanks @wilsonicdev)
- fix(build): co-locate llmlingua SLM optionals into `dist/node_modules` (postinstall) — the optional llmlingua SLM packages are co-located into the standalone build so the compression worker can actually spawn in production. (#4286)
- fix(mitm): surface AgentBridge traffic in the Traffic Inspector (D4 ingest) — AgentBridge requests now appear in the Traffic Inspector. (#4285)
- fix(sse): surface undici `err.cause` on dispatcher failure — dispatcher failures now flatten the cause chain (and `AggregateError`s) into the error detail for diagnosability. (#4281)
- fix(cli): harden `launch`/`launch-codex` with free-claude-code patterns — the launchers adopt the hardened launch patterns ported from free-claude-code. (#4278)
- fix(compression): end-to-end audit — fixes across the whole compression flow — a sweep of the compression pipeline fixing ultra/aggressive/lossless edge cases, accessibility-anchor handling, language detection, and mode decoupling. (#4323)
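The chunked-cookie reconstruction described in the LMArena fix above can be sketched as follows. It follows the rule as stated in that entry (a non-empty single cookie wins, otherwise `.0`, `.1`, … are read in ascending order until one is missing and joined raw). The function names and the simple `Cookie` header parser are hypothetical and are not the executor's actual code; treating an empty chunk as the end of the sequence is an assumption.

```typescript
// Parse a pasted Cookie header ("a=1; b=2") into a name → raw value map.
function parseCookieHeader(header: string): Map<string, string> {
  const jar = new Map<string, string>();
  for (const pair of header.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue; // skip fragments without a value
    // Split on the first "=" only, so "=" inside the value is kept.
    jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }
  return jar;
}

// Rebuild a possibly chunked cookie. Values are concatenated verbatim:
// no base64 decoding and no JSON parsing, so any "base64-" prefix survives.
function combineChunkedCookie(
  jar: Map<string, string>,
  name: string,
): string | undefined {
  const single = jar.get(name);
  if (single) return single; // non-empty single cookie: forward unchanged

  const parts: string[] = [];
  for (let i = 0; ; i++) {
    const chunk = jar.get(`${name}.${i}`);
    if (!chunk) break; // stop at the first missing (or empty) chunk
    parts.push(chunk);
  }
  return parts.length > 0 ? parts.join("") : undefined;
}

const pasted =
  "arena-auth-prod-v1=; arena-auth-prod-v1.0=base64-abc; arena-auth-prod-v1.1=def==; theme=dark";
const session = combineChunkedCookie(parseCookieHeader(pasted), "arena-auth-prod-v1");
console.log(session);
```

Counting upward and stopping at the first gap, instead of collecting every key that matches the prefix, keeps the chunks in numeric order and ignores stray higher-numbered leftovers.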
🧪 Tests
- test: align two tests left red by merged PRs — re-aligns the db-rules classification count (#4335) and the LMArena split-cookie metadata test (#4271) after concurrent merges. (#4346)
- test(ci): reconcile the release/v3.8.30 baseline + test drift — reconciles quality baselines and drifted tests accumulated on the release branch. (#4276)
📝 Maintenance
- refactor(combo): `ComboContext` + extract `phaseComboSetup` (god-file split, phase 1) — begins decomposing the combo god-file by extracting combo setup into a context object, without touching dispatch/semaphore logic. (#4326)
- feat(quality): cap test-file size — anti-reinflation Layer 1 — freezes the existing god-tests and caps new test files at 800 lines to stop re-inflation. (#4273)
- feat(quality): seed per-module mutationScore floors + a blocking aggregation ratchet (T3) — adds per-module mutation-score floors with a blocking aggregate gate. (#4305)
- feat(quality): make the a11y gate real (`@axe-core/playwright` in nightly) — wires the previously-phantom accessibility gate into the nightly run with real baselines. (#4321)
- feat(quality): unblock R1 — test-redundancy measurement via `disableBail` — enables the test-redundancy measurement that was previously blocked by fail-fast. (#4322)
- fix(quality): the complexity gate now covers `bin/` + `electron/`, and tracked-artifacts runs in pre-commit — extends the complexity gate's scope and moves the tracked-artifacts check into the pre-commit hook. (#4318)
- fix(quality): restore release/v3.8.30 green — 3 latent reds from concurrent merges — fixes three latent test reds surfaced by concurrent merges into the release branch. (#4335)
- fix(combo): keep `phaseComboSetup` under the complexity ceiling — extracts a helper so the new combo setup phase stays under the complexity gate. (#4338)
- ci(mutation): split over-budget batches by range/pair so every batch fits the job cap — re-splits the mutation batches so each fits the CI job budget. (#4272)
- chore(ci): align the electron audit gate to the root advisory policy — the electron-workspace audit gate now follows the same advisory policy as the root. (#4275)
- chore(quality): reconcile the complexity/quality baselines across concurrent-merge drift — rolls up the cycle's baseline reconciliations driven by concurrent merges into the release branch. (#4330, #4336, #4370)
- docs: ban AI-generation footers in commits/PRs/CHANGELOG (Hard Rule #16) — codifies the prohibition on AI-generation footers and bot co-author trailers. (#4328)
- docs(design): add the OmniRoute design system and visual identity specification — adds the design-system / visual-identity specification document. (thanks @diegosouzapw)
🔒 Security
- fix(sse): harden the DuckDuckGo lite scraper sanitization (CodeQL) — closes four HIGH CodeQL alerts in the no-key web-search scraper: `decodeEntities` now resolves `&amp;` last so an already-escaped entity (e.g. `&amp;lt;`) survives as literal text instead of being double-unescaped (`js/double-escaping`); `stripTags` decodes entities first, then strips tags in a loop to a fixpoint and drops any trailing unclosed `<…`, so entity-encoded markup like `&lt;script&gt;` can never reach the LLM/client as a live tag (`js/incomplete-multi-character-sanitization`); and the host checks in the search tests use `new URL().hostname` equality instead of substring `.includes` (`js/incomplete-url-substring-sanitization`). (#4356)
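The ordering and fixpoint ideas in the sanitization fix above can be illustrated with a small sketch. The function names, the entity table and the tag regex are illustrative assumptions and are not the scraper's actual implementation.

```typescript
// Decode a few common entities, resolving the ampersand entity LAST so an
// already-escaped entity is unescaped only one level.
function decodeEntitiesSketch(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // must stay last to avoid double-unescaping
}

// Decode first, then strip tags repeatedly until nothing changes, and
// finally drop a trailing unclosed "<..." fragment.
function stripTagsSketch(html: string): string {
  let text = decodeEntitiesSketch(html);
  let previous: string;
  do {
    previous = text;
    text = text.replace(/<[^>]*>/g, "");
  } while (text !== previous);
  return text.replace(/<[^>]*$/, "");
}

console.log(decodeEntitiesSketch("&amp;lt;b&amp;gt;"));
console.log(stripTagsSketch("&lt;script&gt;x&lt;/script&gt;"));
console.log(stripTagsSketch("ok <img src="));
```

If the ampersand entity were decoded first, the doubly escaped input in the first call would collapse into a live `<b>` tag. Decoding before stripping matters for the same reason: entity-encoded markup becomes visible to the tag stripper instead of slipping past it.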
🔧 Dependencies
- fix(deps): bump undici to 7.28.0 and dompurify to 3.4.11 (security) — addresses the undici SOCKS5-TLS / cache advisories and the dompurify advisory. (#4306)
- chore(deps): bump actions/checkout from 4 to 7 — CI checkout-action update. (#4297)
🙌 Contributors
Thanks to everyone whose work landed in v3.8.30 (external contributors first, maintainer last):
| Contributor | PRs / Issues |
|---|---|
| @oyi77 | #4266 |
| @pizzav-xyz | #4287 |
| @wilsonicdev | #4288 |
| @xz-dev | #4293 |
| @edrickrenan | #4296 |
| @Ardem2025 | #4309 |
| @artickc | #4311 |
| @Witroch4 | #4327 |
| @FerLuisxd | #4264, #4259, #3841 |
| @caussao | #4271 |
| @daniij | #4037 |
| @KooshaPari | #3932 |
| @dev-cj | #4307 |
| @klimadev | #4252 |
| @akbardwi | #4354 |
| @mrcyclo | #4325 |
| @anuragg-saxenaa | #4317 |
| @ngapngap | #4317 |
| @baslr | #4320 |
| @youthanh | #4339 |
| @V13t4nh | #4340 |
| @diegosouzapw | maintainer |
What's Changed
- fix(deps): bump undici to 7.28.0 and dompurify to 3.4.11 (security) by @diegosouzapw in #4304
- Release v3.8.30 by @diegosouzapw in #4267
Full Changelog: v3.8.29...v3.8.30