v3.8.44

@diegosouzapw released this 04 Jul 16:26
1bda6c1

✨ New Features

  • feat(resilience): throttle upstream quota fetches on the per-request preflight path (#6009) — a new global min-interval gate (open-sse/services/quotaFetchThrottle.ts) spaces the actual network calls made by the Codex quota fetcher so that many accounts on one IP no longer fetch quota in the same second (which, per router-for-me/CLIProxyAPI#2385, can get a Codex OAuth token revoked). Complements the existing bulk-sync spacing (PROVIDER_LIMITS_SYNC_SPACING_MS) which already serialized the periodic provider-limits sync — this covers the concurrent combo/preflight path it didn't. Cache hits are never delayed; fail-open (only ever awaits a timer). Configurable via OMNIROUTE_QUOTA_FETCH_MIN_INTERVAL_MS (default 250ms, clamped 0..5000; 0 disables). Regression guard: tests/unit/quota-fetch-throttle-6009.test.ts (5). (thanks @powellnorma)
  • feat(autoCombo): add per-request Auto-Combo controls via two headers (#6024 / #6025 / #6023) — X-OmniRoute-Mode steers an auto combo's scoring for a single request (friendly presets fast/balanced/quality/cheap/reliable/offline or a raw mode-pack name; balanced forces the default weights), and X-OmniRoute-Budget sets a hard per-request USD cost ceiling. Both override the combo's stored config only for the request that carries them; unknown/garbage values are ignored so the saved config is preserved. The resolvers are pure (open-sse/services/autoCombo/requestControls.ts) and feed the engine's existing config.modePack / config.budgetCap inputs — no engine changes. Regression guard: tests/unit/auto-combo-request-controls-6024.test.ts (5). (thanks @chirag127)
  • feat(providers): add the Kenari OpenAI-compatible gateway (BYOK). Regression guard: tests/unit/kenari.test.ts. (thanks @doedja)
  • feat(models): add claude-sonnet-5 to the Antigravity model catalog (alias mapping in antigravityModelAliases.ts) (#6103). Regression guard: tests/unit/antigravity-model-aliases.test.ts. (thanks @anki1kr)
  • feat(api): add /v1/ocr endpoint (Mistral OCR), an OCR provider category, and Mistral moderation support. (#5950) (thanks @waguriagentic)
  • Discovery tool (Phase 2): add the discoveryResults DB module (CRUD over the discovery_results table, migration 074) and wire the opt-in provider-discovery service to persist and read findings through it (persistDiscoveryResult, getDiscoveryResults, getDiscoveryResultById, markVerified, deleteDiscoveryResult) with (provider, method, endpoint) upsert de-duplication. Adds the /api/discovery/* HTTP surface — GET /results, GET|DELETE /results/:id, POST /scan, POST /verify/:id — under strict loopback-only authorization (/api/discovery/ is in LOCAL_ONLY_API_PREFIXES and is NOT manage-scope-bypassable, so the scan route's outbound probes can never be reached from a tunnel/remote origin). Adds a dashboard UI tab (Tools → Discovery, /dashboard/discovery) to run scans and review, verify, or delete findings. The service stays opt-in / default-off. (#5939)
  • feat(api): expose a read-only provider plugin manifest at GET /api/v1/provider-plugin-manifest for sidecar/relay discovery. (#6001) (thanks @KooshaPari)
  • feat(sidecar): advertise the provider manifest URL to Bifrost/CLIProxyAPI via the X-OmniRoute-Provider-Manifest-Url header (OMNIROUTE_PROVIDER_MANIFEST_URL). (#6007) (thanks @KooshaPari)
  • feat(autoCombo): add a latency/speed-optimized routing mode (shared rankBySpeed scoring core) plus the omniroute_pick_fastest_model MCP tool. (#6011) (thanks @KooshaPari)
  • feat(providers): refresh The Old LLM (Free) model catalog (#5181) — seed the current free /api/chatgpt tier (GPT-5/5.1/5.2/5.3/5.4, o3/o4-mini, Gemini 3 Pro / 2.5 Pro / 2.0 Flash / 1.5 Flash, Claude 4.6 Opus/Sonnet & 4.5 Haiku, GPT-4o, Grok 4, DeepSeek V3/R1, Sonar Pro) while keeping the legacy alias IDs for saved-preference compatibility. Also fixes a latent routing bug: mapModel() now passes known upstream IDs through unchanged, so Gemini/o-series/Grok/DeepSeek/Sonar models no longer silently collapse onto GPT_5_4. Regression guard: tests/unit/theoldllm-model-refresh-5181.test.ts. (thanks @WslzGmzs)
  • feat(resilience): surface Codex banked reset credits per connected account (#5199) — the Codex quota parsers (buildCodexUsageQuotas, parseCodexUsageResponse) now additively read rate_limit_reset_credits.available_count (+ optional rate_limit_reached_type) from the /wham/usage payload OmniRoute already fetches, and the provider-limits dashboard renders a "Banked Reset Credits" row when a positive count is present. Display-only and fail-open — the field is eligibility-gated, so accounts without it are unaffected (parsers never throw on absent/garbage shapes); redemption (an unofficial mutating endpoint) is intentionally out of scope. Regression guard: tests/unit/codex-banked-reset-credits-5199.test.ts (8). (thanks @ofekbetzalel)
  • feat(providers): add sign-up geo-restriction notices for SenseNova and StepFun (#5462) — the provider add-form now warns that SenseNova's console appears to require a Chinese (+86) phone number with no documented international path, and that StepFun's default endpoint is its China platform while a global StepFun Open Platform (platform.stepfun.ai, operated by Sparkling AI Pte. Ltd., Singapore) with email/Google/Discord login exists for international users. Informational notice only — neither provider is disabled. Regression guard: tests/unit/regional-provider-cn-notices-5462.test.ts. (thanks @chirag127)
  • feat(usage): add on-demand period-scoped usage-data reset (Settings → System Storage) with a purge API and time-window selector. (#5831)
  • feat(claude-code): add an opt-in auto-permission classifier compat mode (off/auto/always) for Claude Code, toggleable from the CLI Code settings. (#5810)
  • feat(providers): add optional client-identity header profiles for compatible nodes — preset User-Agent/fingerprint headers (e.g. matching a known CLI) merged into the existing customHeaders field. (#5812)
  • feat(build): add a backend-only fast build mode (scripts/build/build-next-isolated.mjs + backendOnlyPages.mjs) that skips compiling the dashboard frontend pages, cutting local/CI build time for backend-only changes. (#6119 — thanks @artickc)
  • feat(minimax): extract MiniMax M3's raw <think>...</think> leakage into reasoning_content on the 8 OpenAI-format provider tiers, leaving the Claude-format minimax/minimax-cn tiers untouched (they already report reasoning correctly). (#6073 — thanks @KooshaPari)
  • feat(services): promote Bifrost (@maximhq/bifrost — Go AI-gateway) from an env-only relay sidecar to a first-class embedded/supervised service, matching the existing cliproxy/9router model — installer, bootstrap SERVICES[] entry, migration 113 DB seed, 7 lifecycle API routes under /api/services/bifrost/ (loopback-only), a dashboard tab, and relay auto-wiring that defaults BIFROST_BASE_URL to the supervised port when running. Implements item #2 of #5670; the broader RouterBackend contract (items #1, #3-#5) stays out of scope. (#5817, part of #5670)
  • feat(services): add Mux (coder/mux — local agent-orchestration daemon) as a fourth-tier embedded service on the existing ServiceSupervisor framework — npm-based installer, bootstrap.ts registration, migration 113 DB seed, 7 lifecycle API routes under /api/services/mux/ (loopback-only, defense-in-depth bind to 127.0.0.1), and a dashboard tab reusing the shared service-management components. (#6034)
  • feat(xai): surface Grok/xAI usage on the quota dashboard via local usageHistory aggregation (getXaiUsage) — since xAI exposes no per-account quota API, this sums tokens routed to the connection from usage_history and reports them as a cumulative, uncapped quota, mirroring the existing Xiaomi MiMo self-track pattern. (#5806)
  • feat(minimax): extract MiniMax M3's raw <think>...</think> tags into a separate reasoning_content field on the 8 provider tiers that register M3 with format:"openai" (trae, huggingchat, bazaarlink, ollama-cloud, opencode, cline, opencode-zen, codebuddy-cn) — previously the thinking text leaked directly into content. Reuses the existing extractThinkingFromContent primitive, extending its allowlist with a minimax-m3-only pattern; the two direct minimax/minimax-cn tiers are untouched since they already surface reasoning natively over Anthropic's Messages format. (Inspired by 9router#2231.) (#6050 — thanks @KooshaPari)
  • feat(i18n): auto-detect the browser language on first visit — a pure detectBrowserLocale() matcher (exact match, zh-HK/zh-MO folded to zh-TW, language-prefix match, else null) plus a client-only LocaleAutoDetect component mounted once in the root layout. When no locale cookie is set yet, it reads navigator.languages, computes a match against the supported locales, and persists it via the same cookie/localStorage writer LanguageSelector already used (extracted to shared/lib/persistLocale.ts). (Inspired by 9router#1324.) (#5979)
  • feat(cli-tools): add CodeWhale — the actively-maintained successor to DeepSeek TUI (same author, renamed project) — as a dual dashboard entry alongside the existing "deepseek-tui" catalog entry, so existing DeepSeek TUI users keep a working card while new users are steered to CodeWhale. New /api/cli-tools/codewhale-settings route writes ~/.codewhale/config.toml and keeps the legacy ~/.deepseek/config.toml in sync. (Inspired by 9router#1761.) (#5996)
  • feat(server): support reverse-proxy basePath deployment via a new opt-in OMNIROUTE_BASE_PATH env var (empty by default), using Next.js's native basePath support so a deployment behind a reverse-proxy subpath (e.g. https://host/omniroute/) works without manual header stripping; the two hardcoded auth-redirect targets in src/server/authz/pipeline.ts now prefix with request.nextUrl.basePath. Default empty basePath is a no-op for existing root-path deployments. (Inspired by 9router#1810.) (#5992)
  • feat(providers): add SumoPod (ai.sumopod.com) and X5Lab (api.x5lab.dev) OpenAI-compatible BYOK aggregator gateways, wired via the default executor with bearer API-key auth; both use passthroughModels with a live /v1/models fetcher instead of a hardcoded catalog. Regression guard: tests/unit/sumopod-x5lab-provider.test.ts. (Inspired by 9router#1288.) (#5963)
  • feat(providers): add Charm Hyper (hyper.charm.land) as a new OpenAI-compatible, bearer-auth API-key gateway provider with a free tier (100 monthly Hypercredits); models resolve via passthrough (modelsUrl + live /v1/models) since the catalog isn't publicly documented. (Inspired by 9router#2006.) (#5961)
  • feat(providers): add Nube.sh (ai.nube.sh) as a new BYOK OpenAI-compatible gateway (LiteLLM proxy), Bearer/API-key auth. Its live model catalog is only reachable with a valid key, so no model IDs are hardcoded — it uses passthroughModels + modelsUrl for live enumeration. (Inspired by 9router#2294.) (#5936 — thanks @whale9820)
  • feat(providers): add b.ai (api.b.ai) as a new OpenAI-compatible BYOK provider, distinct from the existing thebai/theb.ai provider, using passthrough model discovery with no hardcoded model list. (Inspired by 9router#963.) (#5969)
  • feat(providers): add Qiniu (七牛云) AI inference gateway as a BYOK API-key provider — proxies many upstream models (DeepSeek V3/V4, Claude, Kimi, and more) behind a single key, shipping with an empty static seed and relying on passthroughModels + the live /v1/models catalog instead of a stale hardcoded model id. Regression guard: tests/unit/qiniu-provider.test.ts. (Inspired by 9router#911.) (#5966)
  • feat(providers): port ModelScope (Alibaba 魔搭) as a new API-key, OpenAI-compatible provider — verified against ModelScope's own docs that the real production domain is api-inference.modelscope.cn (.cn, not the upstream PR's .ai) and shipped passthroughModels: true with an empty seed + modelsUrl instead of the upstream PR's static 5-model snapshot, since the open-model catalog moves fast. (Ported from 9router#1764.) (#5965 — thanks @tn5052)
  • feat(providers): add Augment (Auggie CLI) as a new local, no-auth provider that spawns the user's local auggie CLI and pipes a flattened prompt via stdin, wrapping stdout as an OpenAI-compatible SSE stream or single JSON body. Auth is delegated to auggie login outside OmniRoute (synthetic noAuth: true connection, no DB row required); "Test Connection" spawns auggie --version. Hardened against the untrusted-input spawn sink: no shell: true on Windows (argv passed straight to the OS loader, no metacharacter interpretation), and model is validated against the registry allowlist before spawn (rejecting unknown or --prefixed values) with a trailing -- end-of-options marker. (Inspired by 9router#1200.) (#5972 — thanks @chamdanilukman)
  • feat(providers): add NVIDIA NIM image generation — a dedicated nvidia-nim image format/handler (separate host, ai.api.nvidia.com/v1/genai/<model>, native NIM body shape) for the 4 FLUX models (flux.1-dev, flux.1-schnell, flux.1-kontext-dev, flux.2-klein-4b), shaping each model's per-model request body (dimension/mode validation, required input image + aspect ratio, optional edit image) and normalizing the NIM response's varying shapes into the OpenAI {created, data} shape. (Inspired by 9router#1195.) (#5971)
  • feat(oauth): import a Codex connection from a raw ChatGPT access token — OmniRoute's only Codex import path previously required both access_token and refresh_token, leaving no path for a user with only a bare ChatGPT website access token. createProviderConnection gains an explicit access_token auth-type branch (intentionally never deduped), a new POST /api/oauth/codex/import-token route (Zod-validated), and OAuthModal's manual-paste path now detects an eyJ-prefixed pasted token and posts it to the new endpoint, mirroring the existing grok-cli raw-token flow. The executor's refreshCredentials() already degrades safely to null without a refresh token, forcing re-auth on expiry. (Inspired by 9router#1290.) (#5995 — thanks @ryanngit)
  • feat(dashboard): add a tool-source diagnostics settings toggle — a new Settings → Advanced card lets operators flip the existing logToolSources flag from the UI instead of editing the DB row directly; logToolSources is added to the .strict() /api/settings Zod PATCH schema (previously rejected). (Inspired by 9router#1825.) (#5978 — thanks @DuyPrX)
  • feat(dashboard): collapse and sort provider quota rows by remaining percentage — the expanded quota list is sorted highest-remaining-first and collapsed to the first 3 rows by default, with a "Show N more"/"Show less" toggle when a connection reports more than 3 quotas, keeping at-risk quotas visible above a long list of healthy ones. Sort/slice logic extracted into pure, directly-unit-tested helpers (sortQuotasByRemaining, getVisibleQuotas). (Inspired by 9router#1919.) (#5977)
  • feat(dashboard): suggest HuggingFace Hub media models — a new GET /api/v1/providers/suggested-models route proxies the public HF Hub models search API (Zod-validated, no token exposed client-side) and ImageExampleCard merges the results into the model picker as a selectable chip row for the huggingface provider; also adds a dedicated huggingface-image format/handler for HF's raw-image-bytes response. (Inspired by 9router#1633.) (#5990)
  • feat(cli-tools): add a Crush entry to the dashboard CLI-Tools catalog plus a new /api/cli-tools/crush-settings route (GET/POST/DELETE) — OmniRoute already shipped a crush CLI setup command (bin/cli/commands/setup-crush.mjs) but the dashboard catalog had no matching entry; the new route writes to the same canonical ~/.config/crush/crush.json path so the dashboard and CLI command agree. (Inspired by 9router#1233.) (#5970)
  • feat(providers): extend Vercel AI Gateway (vercel-ai-gateway/vag) beyond chat-only to support embeddings and image generation — the gateway's OpenAI-compatible /v1 API also exposes /embeddings and /images/generations, so entries were added to EMBEDDING_PROVIDERS (embeddingRegistry.ts) and IMAGE_PROVIDERS (imageRegistry.ts) modeled on the existing openai entries. (#5968 — thanks @tantai-newnol)
  • feat(api-keys): add per-key device/connection tracking — a SHA-256 fingerprint of IP + User-Agent, with a 30-minute TTL and per-key/global caps, tracks distinct client devices seen with each API key (in-memory only, raw IP never stored). A new GET /api/keys/[id]/devices route exposes masked device details, and the API Keys dashboard tab gets a "Devices" count badge alongside the existing Sessions badge. This is a new granularity distinct from the existing maxSessions cap, which limits concurrent sticky-routing sessions rather than tracking device identity. (#5998 — thanks @mugni-rukita)
  • feat(proxy): add Webshare (proxy.webshare.io) as a fourth source in the free-proxy provider framework alongside 1proxy, Proxifly, and IPLocate. WebshareProvider paginates the account's /api/v2/proxy/list/ endpoint, upserts proxies into the shared free_proxies table, and tombstones proxies the account no longer lists while never touching rows already promoted into the live proxy pool. Unlike the other sources, Webshare is a paid per-account list, gated on FREE_PROXY_WEBSHARE_API_KEY. (#5993 — thanks @ricatix)
  • feat(antigravity): support custom Google Cloud project ID settings from the connection edit modal (Antigravity family). (#5905 — thanks @nickwizard)
  • feat(dashboard): add a wildcard-CORS runtime warning banner (Settings → Authorization) when CORS_ALLOW_ALL/* origins are in effect, plus a new docs/security/CORS.md security guide covering the risk and safer alternatives. (#5602, #5759)
  • feat(api): add a /v1/audio/translations endpoint (Whisper-style audio translation), a new audioTranslation handler, and translation providers wired into audioRegistry. Regression guard: tests/unit/audio-translations-route.test.ts (8, incl. no-stack-leak). (#5809)
  • feat(providers): allow a custom icon URL for compatible provider nodes (migration 113 + nodes.ts + Zod schema + API routes + catalog + ProviderIcon UI). Regression guards: 14 backend + 5 frontend(vitest) + 24 page-utils tests. (#5815)
  • feat(xai): register a dedicated XaiExecutor with reasoning-effort suffix parsing. Regression guard: tests/unit/executors/xai-executor.test.ts (6). (#5800)
  • feat(webfetch): support self-hosted FireCrawl instances via FIRECRAWL_BASE_URL/FIRECRAWL_TIMEOUT_MS. Regression guard: tests/unit/executors/firecrawl-fetch.test.ts (4). (#5793)
  • feat(providers): add ClinePass as a first-class API-key (BYOK) provider — Cline's paid gateway (cline-pass/* models, plain Bearer key), distinct from the existing OAuth cline provider. Regression guard: 16 clinepass tests. (#5942 — thanks @adentdk)
  • feat(relay): gate Bifrost auto-routing by the provider plugin manifest — only manifest-eligible providers reach the sidecar; ineligible/unknown providers fall back to the existing TS routing path with explicit reasons. Regression guards: 4 provider-plugin-manifest + 11 relay-routing-backend tests. (#5870 — thanks @KooshaPari)
  • feat(providers): wire Claude Sonnet 5 end-to-end across the model pipeline — registries, modelSpecs, pricing (×3), cost, Sonnet-family fallback, 1M-context, and static models. (#5833 — thanks @ggiak)
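
The browser-language matcher from the feat(i18n) entry above (exact match, zh-HK/zh-MO folded to zh-TW, language-prefix match, else null) could look roughly like the sketch below. Only the name detectBrowserLocale and the four rules come from the release note. The two-argument signature, the case-insensitive comparison, and the choice to run all three rules for one browser language before moving to the next are assumptions made for illustration, not the code shipped in OmniRoute.

```typescript
// Assumed fold table: Hong Kong and Macau variants map to Traditional Chinese,
// as stated in the release note.
const FOLDED_LOCALES = new Map<string, string>([
  ["zh-hk", "zh-TW"],
  ["zh-mo", "zh-TW"],
]);

// Hypothetical signature: browser languages in preference order (for example
// navigator.languages) plus the locales the app supports.
function detectBrowserLocale(
  browserLanguages: readonly string[],
  supportedLocales: readonly string[],
): string | null {
  // Index supported locales by lowercase tag so matching ignores case.
  const byLower = new Map<string, string>();
  for (const locale of supportedLocales) {
    byLower.set(locale.toLowerCase(), locale);
  }

  for (const raw of browserLanguages) {
    const tag = raw.trim().toLowerCase();
    if (tag === "") continue;

    // Rule 1: exact match.
    const exact = byLower.get(tag);
    if (exact !== undefined) return exact;

    // Rule 2: folded regional variants (zh-HK / zh-MO -> zh-TW).
    const folded = FOLDED_LOCALES.get(tag);
    if (folded !== undefined) {
      const foldedHit = byLower.get(folded.toLowerCase());
      if (foldedHit !== undefined) return foldedHit;
    }

    // Rule 3: language-prefix match ("fr-FR" -> "fr", or "pt" -> "pt-BR").
    const prefix = tag.split("-")[0] ?? tag;
    const prefixHit =
      byLower.get(prefix) ??
      supportedLocales.find((l) => l.toLowerCase().startsWith(prefix + "-"));
    if (prefixHit !== undefined) return prefixHit;
  }

  // Rule 4: nothing matched.
  return null;
}
```

In a client component this would be called with navigator.languages as the first argument. Keeping the matcher pure (no DOM access, no cookie writes) is what lets it be unit-tested without a browser. The fold step runs before the prefix step so that zh-HK does not land on whichever zh-* locale happens to be listed first.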

🔧 Bug Fixes

  • dashboard (/dashboard/system/proxy 500 on every render): ProxyRegistryManager called useProxyBatchOperations(load) before the const load = useCallback(...) declaration in the component body, so every server render threw a TDZ ReferenceError: Cannot access 'load' before initialization and the whole proxy page 500'd (#5918 regression, caught by the release-PR e2e smoke — the PR→release fast-gates never render pages). The hook block now sits after the load declaration. Regression guard: tests/unit/ui/ProxyRegistryManager-tdz-render.test.tsx (SSR renderToString — the exact crash mode).

  • server (TRACE/TRACK/CONNECT returned raw 500 on every route): methods that undici/fetch cannot represent blew up inside Next's middleware adapter (TypeError: 'TRACE' HTTP method is unsupported.) as an unhandled 500 (caught by the release-PR dast-smoke Schemathesis negative tests on the new /api/keys/{id}/devices endpoint). The raw HTTP method guard now answers a clean 405 + Allow header for these methods on any path, before Next sees the request. Regression guard: tests/unit/dast-method-not-allowed.test.ts (new case).

  • i18n (auto-detect refreshed every first visit): LocaleAutoDetect (#5979) called router.refresh() on every cookie-less first visit — even when the detected browser locale was exactly the one the server had just rendered — re-navigating the page mid-interaction (flaky e2e "execution context destroyed" + a visible flash for every new visitor). It now refreshes only when the detected locale differs from the server-rendered <html lang>. Regression guard: tests/unit/ui/LocaleAutoDetect-refresh.test.tsx.

  • models (oc/ alias must reach the no-auth OpenCode provider): restore the #2901 routing contract after the #5918 transitive-alias change made the registered no-auth opencode provider unreachable by any prefix (oc/ chained through the manual opencode → opencode-zen slug override and misrouted its combo entries). resolveProviderAlias now stops the alias chain as soon as a hop lands on a registered provider id, while keeping #5918's transitivity across alias-only hops and its loop/depth guards. Regression guards: tests/unit/combo-builder-opencode-prefix.test.ts, tests/unit/provider-alias-transitive-5918.test.ts.

  • providers (Auggie executor EPIPE crash): a fast-exiting auggie CLI (e.g. binary present but immediately failing) delivered EPIPE asynchronously as an 'error' event on the child's stdin stream — which a plain try/catch around stdin.write() cannot catch — crashing the request instead of surfacing the sanitized CLI error. Both spawn sites now attach a stdin 'error' handler so the child's own exit/close handlers report the failure. Regression guard: tests/unit/auggie-executor.test.ts (deterministic 3/3 locally).

  • dashboard (CoolingConnectionsPanel broke next build): the cooling-connections panel from #6061 imported Card from a shadcn-style path that does not exist in this repo (@/components/ui/card) and pulled the server DB barrel (@/lib/localDb) into a client component — next build failed to compile on the release branch. The panel now renders with repo-native markup and reads formatResetCountdown from the new client-safe src/shared/utils/formatting.ts. Regression guards: tests/unit/format-reset-countdown.test.ts, tests/unit/ui/CoolingConnectionsPanel.test.tsx. (#6155)

  • oauth (Zed "Unknown provider" crash): adding Zed from the providers dashboard threw an unhandled OAuth GET error: Unknown provider: zed (500) (#6041). Zed is a keychain-import-only provider — it's listed in the OAuth catalog so the UI shows it, but has no OAuth handler, so the generic /api/oauth/[provider]/[action] route hit getProvider("zed") and crashed. The route now recognizes keychain-import-only providers and returns a clear 400 pointing users at the Import button (for both GET and POST OAuth actions), instead of a 500. Regression guard: tests/unit/oauth-keychain-import-only-6041.test.ts. (thanks @imblowsnow)

  • fix(providers): disable the unsupported thinking param for minimax-m2.7 on NVIDIA NIM (the upstream rejects it) (#6102). Regression guard: tests/unit/nvidia-minimax-thinking-strip.test.ts. (thanks @anki1kr)

  • fix(mitm): add an in-process guard so concurrent MITM server starts no longer race — a second start while one is already in flight is short-circuited instead of double-binding the listener (#6107). Regression guard: tests/unit/mitm-start-guard.test.ts. (thanks @anki1kr)

  • translator (Responses → Chat Completions): strip the Responses-API-only truncation field before forwarding a /v1/responses request to a non-OpenAI Chat Completions upstream (#6109). Strict upstreams (e.g. NVIDIA NIM) rejected it with HTTP 400 Unsupported parameter(s): truncation, breaking Codex-style clients routed to those providers. client_metadata, background, and safety_identifier were already stripped — truncation was the remaining gap. Regression guard: tests/unit/responses-strip-truncation-2311.test.ts. (thanks @TuanNguyen0708)

  • combo (prefer known context capacity over unknown): when a combo filters out at least one target for exceeding a known context limit, the router now prefers the remaining known-compatible targets over targets whose context metadata is simply unknown, instead of letting unknown-metadata targets be the only survivors. If no known-compatible context target remains, context-only candidates fall back to the normal strategy order. Regression guard: tests/unit/combo-context-window-filter.test.ts. (#6088 — thanks @Thinkscape)

  • models (GLM-5.2 context normalization): stop treating every hosted GLM-5.2 provider alias as the native 1M-context model. Native/bare GLM-5.2 and verified OpenCode / ZenMux routes keep their 1,000,000-token context, while hosted-provider aliases now respect the caps declared in their provider metadata instead of inheriting the native max. Regression guards: tests/unit/model-capabilities-registry.test.ts, tests/unit/models-catalog-route.test.ts. (#6091 — thanks @Thinkscape)

  • providers (Gemini Web): refresh the Gemini Web cookie handling and model catalog so live Gemini Web sessions keep authenticating and routing to current models. Regression guard: tests/unit/gemini-web.test.ts. (#6095 — thanks @backryun)

  • providers (Perplexity Web): refresh the Perplexity Web model catalog to the current set (GPT-5.4/5.5, Claude Sonnet 5.0 / Opus 4.8, GLM-5.2, Kimi K2.6, Nemotron 3 Ultra) and update the internal mode / model_preference mappings and thinking variants so requests resolve to live upstream models. Regression guard: tests/unit/perplexity-web.test.ts. (#6106 — thanks @backryun)

  • dashboard ("Update now" → Internal Server Error): clicking Update now on the dashboard home could crash the page with a blank "Internal Server Error" screen (Minified React error #31). The handler POSTs the loopback-only /api/system/version auto-update endpoint and, on a non-OK JSON response (e.g. a 403 when the dashboard is reached through a reverse proxy / non-loopback origin), passed the raw error envelope object { error: { code, message, correlation_id } } straight to notify.error(), which rendered the object as a React child and threw #31. The update-error path now funnels the body through extractApiErrorMessage() (the same safe extractor added in #5340), so a readable string always reaches the toast. Regression guard: tests/unit/ui/home-update-error-render-5991.test.ts. (#5991)

  • fix(onboarding): route the provider-details link in the onboarding wizard by the node's stable id instead of the composite provider slug, which could point at the wrong provider details page for multi-account/fingerprint nodes. Regression guard: tests/unit/onboarding-wizard-details-link-6145.test.ts. (#6145 — thanks @chirag127)

  • fix(cli): give setup-claude a fallback profile generator mirroring setup-codex, so profile generation no longer silently no-ops when the primary generator path is unavailable. Regression guard: tests/unit/cli/setup-claude.test.ts (new cases). (#6138 — thanks @derhornspieler)

  • fix(glm): suppress a leaked </think> close marker in the GLM Anthropic transport, which was surfacing the raw reasoning-close tag in visible response content instead of being consumed as part of the thinking-block framing. Regression guard: tests/unit/glm-think-close-marker-leak.test.ts. (#6133 — thanks @dhaern)

  • fix(provider-limits): close a TOCTOU race in quota-recovery clearing by moving the check-then-clear to a CAS (compare-and-swap) primitive in src/lib/db/providers.ts, so two concurrent recovery paths can no longer both observe stale state and double-clear/re-lock a connection. Regression guard: tests/unit/provider-limits-recovery.test.ts. (#6139 — thanks @janeza2)

  • fix(provider-limits): clear transient rate-limit state (rateLimitedUntil, lastError, backoffLevel) as soon as quota recovers, instead of leaving stale rate-limit fields behind that could keep a now-healthy connection looking unavailable. Regression guard: tests/unit/provider-limits-recovery.test.ts. (#6128 — thanks @janeza2)

  • combos (OpenCode/MiMo fingerprint accounts): expand fingerprint-scoped OpenCode/MiMo accounts into their full per-fingerprint set in the combo builder, which previously showed only the first matching account entry and hid the rest from combo target selection. Regression guard: tests/unit/combo-builder-fingerprint-expansion.test.ts. (#6092, closes #6087 — thanks @anki1kr)

  • fix(auth): persist quota-preflight account lockouts until the reset window elapses, instead of losing the lockout on process restart and letting a still-quota-exhausted account be selected again immediately. Regression guards: tests/unit/sse-auth.test.ts, tests/unit/opencode-quota-fetcher.test.ts, tests/unit/usage-service-hardening.test.ts. (#6090 — thanks @Thinkscape)

  • combo (fingerprint-based provider expansion): expand fingerprint-based providers into per-fingerprint combo targets (open-sse/services/combo/fingerprintExpansion.ts) so a combo referencing a fingerprint-scoped provider fans out to every matching fingerprint account instead of collapsing onto one. Regression guards: tests/unit/combo-fingerprint-expansion.test.ts, tests/integration/fingerprint-expansion.test.ts. (#6082 — thanks @pizzav-xyz)

  • fix (safety-net redirect reqId crash): fix a reqId ReferenceError thrown inside the safety-net combo redirect path in src/sse/handlers/chat.ts, remove dead code in src/domain/quotaCache.ts, and rename the stray root DESING.md to DESIGN.md. Regression guard: tests/unit/chat-safetynet-reqid-6097.test.ts. (#6097 — thanks @fix2015)

  • fix(compression): send a patch-only body to PUT /api/settings/compression from CompressionHub, instead of round-tripping the full settings object and risking clobbering fields changed elsewhere between load and save. Regression guard: tests/unit/ui/CompressionHub-patch-only.test.tsx. (#6077, closes #6039 — thanks @anki1kr)

  • fix(codex): use access_token.exp instead of id_token.exp when computing expiresAt on Codex auth import, since the id_token can expire far sooner than the actual access token, causing imported connections to be treated as expired while still usable. Regression guard: tests/unit/codex-auth-import-expiry.test.ts. (#6084, closes #6075 — thanks @anki1kr)

  • fix(security): persist the IP allow/block-list configuration (it was resetting to Disabled and clearing configured IPs on every restart/update) and actually enforce it in the authz pipeline (src/server/authz/pipeline.ts), where it was previously validated but never applied. Regression guards: tests/unit/ip-filter-persistence-6131.test.ts, tests/unit/authz/ip-filter-enforcement-6131.test.ts, tests/unit/ip-filter.test.ts. (closes #6131, #6132)

  • fix (Claude tool_result adjacency): reattach an OpenAI-shaped tool_result to sit directly adjacent to its originating tool_use before translating to Claude's message format (open-sse/translator/request/openai-to-claude/toolResultAdjacency.ts), since Claude's API rejects/mishandles a tool result separated from its tool call by intervening messages. Regression guard: tests/unit/translator-openai-to-claude.test.ts (new cases). (#6035 — thanks @KooshaPari)

  • fix(config): externalize ws/bufferutil/utf-8-validate in next.config.mjs so the copilot-m365-web executor's WebSocket masking path works at runtime — chat requests through it were silently timing out because the bundler was inlining ws instead of leaving it as a real Node dependency. Regression guard: tests/unit/next-config.test.ts. (#6130, closes #6062 — thanks @anki1kr, whose #6098 fix it re-lands)

  • fix(registry): update grok-cli model context lengths to match the actual Grok CLI /context capacities — grok-build 128k→256k, grok-composer-2.5-fast 128k→200k — so context-aware routing stops filtering these models out for exceeding a stale, too-low limit. Registry-only. (#5913 — thanks @Chewji9875)

  • fix(providers): strip an orphan tool_result (one with no preceding tool_use) on the Antigravity MITM path before translating to OpenAI format, since an unpaired tool result upstream caused request failures. Regression guard: tests/unit/antigravity-orphan-toolresult-6026.test.ts. (closes #6026, #6115)

  • fix(providers): emulate OpenAI-style tool_calls in the GitLab Duo executor (new open-sse/executors/gitlabResponses.ts). Duo requests previously had no tool-call emulation, which broke tool-using clients routed to GitLab Duo. Regression guard: tests/unit/gitlab-duo-toolcalls-6051.test.ts. (closes #6051, #6111)

  • fix(429 / accountFallback): persist the per-account 429 cooldown cascade across the request boundary and classify OpenCode's "Monthly usage limit. Resets in N days." message as a connection-scoped quota exhaustion with an N-day cooldown (instead of a ~5s transient retry), so an exhausted account stops being re-selected until its window resets. (#6061 — thanks @KooshaPari / @anki1kr, whose superseded #6086 carried the same day-parser approach)

  • combo (sibling-model fallback on per-model-quota 500s): when a combo held multiple models from the same provider (e.g. two Gemini models) and the first returned a server 500, the router retried the same locked model and surfaced a 429 "cooling down" instead of trying the sibling — markConnectionLevelExhaustion was wrongly tripped by a model-level 500 for per-model-quota providers (gemini, github, passthrough, compatible), and the retry loop didn't check isModelLocked before re-hitting the same model. Both gaps are fixed; the combo now falls through to the untried sibling model. Regression guard: tests/unit/combo/combo-target-exhaustion.test.ts (21 cases). (#5976 — thanks @hartmark)

  • providers (Cline non-streaming envelope): Cline can return OpenAI-compatible chat completions wrapped as { success, data: { choices, usage, ... } }; the non-streaming path checked the top-level body for empty content before unwrapping, so a valid wrapped response could be misclassified as malformed/empty. The envelope is now unwrapped immediately after provider-envelope handling, before empty-content detection, usage extraction, and translation. Regression guard: tests/unit/cline-response-envelope.test.ts. (#6046 — thanks @KooshaPari)

  • providers (kimi-web, qwen-web): align the kimi-web model catalog and request-scenario selection with www.kimi.com's live GetAvailableModels response, and stop aliasing qwen3-coder-plus on qwen-web now that it is present as its own model in the live Qwen web catalog. (#5915 — thanks @janeza2)

  • translator (Antigravity/Gemini tool schemas): strip multipleOf from function-declaration parameters before forwarding to Antigravity/Gemini — it is not part of the Gemini OpenAPI 3.0 schema subset accepted upstream and triggered a hard 400 ("Unknown name multipleOf"). Added to GEMINI_UNSUPPORTED_SCHEMA_KEYS so it is stripped at every schema level; minimum/maximum are unaffected since Gemini accepts them. (Ported from 9router#2309, reported by @abil0321.) (#6052)

  • translator (Kiro system prompt leak): Kiro/CodeWhisperer has no system role, so system messages were normalized into a bare user turn — the full Claude Code system prompt then appeared as raw user text, polluting model context. System-origin content is now wrapped in <system-reminder> tags before merging into the Kiro user message; real user turns are unaffected. (Ported from 9router#2306, reported by @VitzS7.) (#6053)

  • fix(codex): convert Chat Completions json_schema response_format → Responses API text.format on the Codex path, and preserve an existing text.format through verbosity normalization. Regression guards: 48 translator-openai-responses-req + 8 codex-verbosity tests. (#5933 — thanks @yusufrahadika)

  • fix(thinking): only inject the redacted_thinking replay block when tool_use is present and thinking is enabled, avoiding a fabricated replay block on plain (non-tool) turns. (#5945, #5953)

  • fix(resilience): honor active codex session affinity over per-request reset-aware re-scoring, so an in-flight session sticks to its pinned account instead of being re-scored away mid-conversation. New src/sse/services/sessionAffinityPin.ts module. Regression guard: tests/unit/codex-session-affinity-reset-aware-5903.test.ts. (#5903, #5943)

  • fix(resilience): compute per-window is_exhausted and honor the quota-exhaustion preflight for priority combos, so a combo no longer keeps routing to a target whose current window is already exhausted. New open-sse/services/combo/quotaExhaustionCutoff.ts. Regression guard: tests/unit/combo-priority-quota-exhaustion-cutoff-5923.test.ts. (#5923, #5941)

  • fix(providers): strip a /v1 suffix from the base URL unconditionally in both models-discovery paths, avoiding a doubled /v1/v1/models fetch error (e.g. Api Airforce). Regression guard: tests/unit/airforce-v1-double-prefix-5899.test.ts. (#5899, #5920 — thanks @anki1kr)

  • fix(api): relax provider-scoped chat completion validation on /api/providers/[provider]/chat/completions. Regression guard: tests/unit/provider-scoped-chat-completions-validation.test.ts. (#5907 — thanks @nickwizard)

  • fix(providers): validate v0 Platform (Vercel) API keys via the /chats endpoint instead of a probe that rejected valid keys. Regression guard: tests/unit/provider-validation-specialty.test.ts. (#5954 — thanks @vittoroliveira-dev)

  • fix(mcp): auto-recover stale streamable HTTP MCP sessions on initialize instead of failing the reconnect. Regression guard: tests/unit/mcp-session-sweep.test.ts. (#5957 — thanks @Chewji9875)

  • fix(translator): enforce strict Anthropic content-block compliance when converting an antigravity → openai request. Regression guard: tests/unit/translator-antigravity-to-openai.test.ts (9). (#5935)

  • fix(sse): strip ANSI/VT100 escape codes from gemini-cli stream frames using a ReDoS-safe pattern. Regression guard: tests/unit/gemini-cli-ansi-sanitization.test.ts (5). (#5934 — thanks @anki1kr)

  • fix(discovery): resolve a doubled /v1 discovery path and a REDIRECT_BLOCKED probe-loop abort in the model-discovery route. Regression guard: tests/unit/provider-models-route.test.ts. (#5904 — thanks @hamsa0x7)

  • fix(providers): Perplexity Web now emits real tool_calls in streaming mode — previously only non-streaming requests (hasTools && !stream) converted <tool>{...}</tool> text into OpenAI tool_calls; streaming requests (the default for agentic coding clients) got the raw <tool> text as plain delta.content and never emitted a tool_calls SSE delta. Now mirrors the chatgpt-web toolMode helpers (buildToolModeResponse()/toolCompletionToSseStream(), extended with a caller-supplied idSeed so tool-call ids stay provider-specific), buffering the completion and emitting a terminal SSE replay carrying delta.tool_calls + finish_reason: tool_calls regardless of the caller's stream flag. (#5927, #5937)

  • providers (openai-family model inference no longer hijacks cataloged models): resolveModelByProviderInference() had an unconditional /^gpt-/i heuristic that hijacked any model id starting with gpt-/o1/o3 into provider openai, even when the id is cataloged under other providers — breaking bare (non-combo) requests for open-weight models like gpt-oss-120b (served by fireworks/cerebras/scaleway/byteplus/sambanova/heroku), which don't exist on openai's catalog, producing a 404 with no fallback. The heuristic is now gated on providers.length === 0 so it only fires for genuinely uncataloged openai-family ids. Regression guard: tests/unit/gptoss-provider-inference-5852.test.ts. (#5852, #5938)

  • fix(providers): deepseek-web reliability — auto-refresh the session on 401/403, refresh the v2.0.0 client headers, and fix the token-kind bulk import path. Regression guards: tests/unit/deepseek-web-autorefresh-401-response.test.ts, tests/unit/bulk-web-session-import.test.ts. (#5988 — thanks @backryun)

  • fix(api): guard the shared frontend API client (handleResponse in src/shared/utils/api.ts) against non-JSON error responses — it previously called response.json() unconditionally and read data.error directly, throwing an unrelated parse error (or undefined) instead of a useful message when an upstream/proxy returned a non-JSON error body. Now routes through parseResponseBody/getErrorMessage to build a safe message regardless of body shape. Regression guard: tests/unit/shared-api-utils.test.ts. (#5973)

  • fix(embeddings): forward the connection-level proxy configuration to embedding requests — src/lib/embeddings/service.ts previously ignored a connection's configured proxy when making embedding calls, so proxy-only network setups leaked embedding traffic outside the proxy. Regression guard: tests/unit/embeddings-proxy-forwarding.test.ts. (#5975)

  • fix(resilience): parse Retry-After from a 429's JSON body for cooldown calculation, not just the HTTP header — a new retryAfterJson.ts helper extracts a retry-after hint from common JSON error-body shapes and accountFallback.ts's cooldown path now prefers it when the header is absent. Regression guard: tests/unit/account-fallback-retry-after-json.test.ts. (Includes #6013's retry-after-json extraction.) (#5974 — thanks @KooshaPari)
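
The Retry-After entry above (#5974) can be illustrated with a minimal sketch. It is not the code in retryAfterJson.ts: the function names and the JSON field names probed (retry_after, retryAfter, and the same two under error) are assumptions chosen for illustration, and only the delta-seconds form of the header is handled. The order follows the note: the header wins, and the JSON body is consulted only when the header is absent or unusable.

```typescript
// Sketch only: the JSON field names probed here (retry_after, retryAfter,
// error.retry_after, error.retryAfter) are assumptions for illustration.
type HeaderMap = Record<string, string | undefined>;

/** Convert a seconds value (number or numeric string) to milliseconds, or null. */
function secondsToMs(value: unknown): number | null {
  const text = typeof value === "string" ? value.trim() : value;
  if (text === "") return null; // Number("") would be 0, which is not a real hint
  const seconds = typeof text === "string" ? Number(text) : text;
  return typeof seconds === "number" && Number.isFinite(seconds) && seconds >= 0
    ? seconds * 1000
    : null;
}

/** Look for a retry hint in an already-parsed JSON error body. */
function retryAfterFromJson(body: unknown): number | null {
  if (typeof body !== "object" || body === null) return null;
  const root = body as Record<string, unknown>;
  const error: Record<string, unknown> =
    typeof root.error === "object" && root.error !== null
      ? (root.error as Record<string, unknown>)
      : {};
  const candidates = [root.retry_after, root.retryAfter, error.retry_after, error.retryAfter];
  for (const value of candidates) {
    const ms = secondsToMs(value);
    if (ms !== null) return ms;
  }
  return null;
}

/** Header first; JSON body only when the header is absent or unusable. */
function cooldownMs(headers: HeaderMap, body: unknown, fallbackMs: number): number {
  return secondsToMs(headers["retry-after"]) ?? retryAfterFromJson(body) ?? fallbackMs;
}

// A 429 with no header but a body hint of 30 seconds yields a 30000 ms cooldown.
const exampleCooldown = cooldownMs({}, { error: { retry_after: 30 } }, 5000);
```

An HTTP-date Retry-After value is not parsed by this sketch; it falls through to the body hint or the fallback.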

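The multipleOf fix for Antigravity/Gemini tool schemas (#6052) strips the keyword "at every schema level". A rough sketch of such a recursive strip follows. The set below lists only multipleOf because that is the only key these notes name; the function name is hypothetical. The sketch leaves keys under properties alone, since those are user-chosen field names rather than schema keywords.

```typescript
// Assumption: only "multipleOf" is listed; the real GEMINI_UNSUPPORTED_SCHEMA_KEYS
// may hold further keys that these notes do not enumerate.
const UNSUPPORTED_KEYS = new Set<string>(["multipleOf"]);

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Return a copy of the schema with unsupported keywords removed at every level. */
function stripUnsupportedKeys(schema: unknown): unknown {
  if (Array.isArray(schema)) return schema.map((item) => stripUnsupportedKeys(item));
  if (!isPlainObject(schema)) return schema;

  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(schema)) {
    if (UNSUPPORTED_KEYS.has(key)) continue; // drop the keyword itself
    if (key === "properties" && isPlainObject(value)) {
      // Keys here are field names, so keep every name and only clean the sub-schemas.
      const props: Record<string, unknown> = {};
      for (const [name, sub] of Object.entries(value)) {
        props[name] = stripUnsupportedKeys(sub);
      }
      out[key] = props;
    } else {
      out[key] = stripUnsupportedKeys(value);
    }
  }
  return out;
}

// minimum and maximum survive; multipleOf is removed from nested levels too.
const cleanedSchema = stripUnsupportedKeys({
  type: "object",
  properties: {
    price: { type: "number", multipleOf: 0.01, minimum: 0 },
    tags: { type: "array", items: { type: "integer", multipleOf: 2, maximum: 10 } },
  },
});
```

The function returns a new object instead of mutating its input, so the original tool definition can still be sent unchanged to providers that accept multipleOf.
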
📝 Maintenance

  • release close (release-PR one-pass CI sweep): restore Zod validation on the provider-scoped chat route with a .passthrough() schema that keeps #5907's relaxed semantics (t06 route-validation gate); point /api/keys/{id}/devices' 401 response at the management error envelope in docs/openapi.yaml (Schemathesis schema-conformance); rebaseline i18nUiCoverage.pct 77.5→76.8 (~1352 new en.json UI keys from the cycle await the async translation workflow — same shape as the v3.8.39 rebaseline); dismiss 2 CodeQL js/incomplete-url-substring-sanitization false positives on unit-test asserts (v3.8.35 precedent).

  • release close (Phase 0 pre-flight): align cycle-stale tests with merged behavior: provider count 166→167 (Kenari #6104), Linux-regenerated translate-path golden (+kenari), and OpenCode quota scope provider→connection (#6061). Also absorb cycle ratchet drift (file-size caps for oauth/[provider]/[action]/route.ts 960, providerLimits.ts 998, chat.ts 1662, auth.ts 2426, with #6158 tracked to restore the oauth-route freeze). The test-masking gate gains a narrowly-scoped _deletedWithReplacement allowlist section (deletion is exempt ONLY when the declared replacement test file exists in HEAD; used for targetExhaustion.test.ts → tests/unit/combo/combo-target-exhaustion.test.ts, which has MORE coverage: 21 cases/52 asserts vs 13/37), plus 5 new gate unit tests and reduction-allowlist entries for the verified-legitimate #5958/#6088/#5816 assert migrations.

  • test (deflake setup-claude): tests/unit/cli/setup-claude.test.ts failed ~50% of runs with Unable to deserialize cloned data due to invalid or unsupported version at file teardown (all subtests passed), randomly reddening Unit Tests fast-path (2/2) / Fast Quality Gates across the PR→release queue. Root cause: node --test streams each file's report to the parent as V8-serialized frames on fd 1 (stdout), and the CLI helper under test (syncClaudeProfilesFromModels) prints progress via console.log — that stdout output interleaved with the serialized frames and corrupted the stream. The test now silences the stdout-writing console methods for the file's duration (no assertion inspects stdout), making it deterministic (15/15 green locally). (#5959) (#6021)

  • API validation: add a validatedJsonBody(request, schema) helper in src/shared/validation/helpers.ts that fuses JSON body parsing and Zod validation into a single call, returning either the type-narrowed data or a ready-to-return 400 NextResponse with the standard error envelope. Salvaged from the closed refactor PR #5075 (Tier 1 portable helper) with a focused 6-case regression test. Co-authored-by: KooshaPari <KooshaPari@users.noreply.github.com>

  • repo (Windows case-conflict cleanup): remove the stale root DESIGN.md, which case-conflicted with design.md and broke checkouts/clones on case-insensitive Windows filesystems. (#6140 — thanks @backryun)

  • i18n(zh-CN): translate the CHANGELOG entries and section headings, adopting zh-CN as a fully translated locale alongside the existing supporting docs. (#6043 — thanks @studyzy)

  • docs (env-doc-sync base-red): document BIFROST_PORT in .env.example / docs/reference/ENVIRONMENT.md — the Bifrost embedded-service merge referenced process.env.BIFROST_PORT (default 8080) without documenting it, so check:env-doc-sync failed on the release tip and reddened Fast Quality Gates for every open PR→release. Docs-only (8d7e3e28f).

  • test (CI-runner-independent translate-path golden): normalize OS/arch-derived request headers (X-Stainless-Os/X-Stainless-Arch, (OS;arch) User-Agent segments, and Antigravity's os.platform()-derived platform substring) in the provider translate-path golden snapshot, so the test no longer depends on the OS/arch of the CI runner that generated it — a Mac-literal Antigravity UA was failing on Linux CI. Regression guard: tests/unit/provider-translate-path-golden.test.ts. (#6076 — thanks @KooshaPari)

  • release-green base-reds (#5695 regex + file-size rebaseline): tests/unit/ui/quick-start-api-keys-link-5695.test.ts now tolerates Prettier splitting a multi-line <Link href=...> so the step1Desc regex matches the /dashboard/api-manager link instead of skipping to step2's single-line /dashboard/providers link (test was brittle, not the code). Also rebaselines 5 files that grew via already-merged release-tip PRs in config/quality/file-size-baseline.json (ApiManagerPageClient 3017→3058, OAuthModal 969→989, cliRuntime 1090→1100, webProvidersA 805→809, deepseek-web.test 1081→1092), with shrink tracked in #3501. (#6093)

  • release close (LEDGER-4 base-red): the cline-pass provider's minimax-m3 registry entry was missing supportsVision, breaking the LEDGER-4 registry-consistency test (every minimax-m3 entry must set supportsVision to match lite.ts — the model is multimodal). Flagged it to match every other minimax-m3 entry (trae, bazaarlink, cline, ollama-cloud, ...). (#6003)

  • release close (stryker tap.testFiles drift): additional release-green cleanup clearing the qoder registry's minimax-m3 supportsVision LEDGER-4 base-red and stryker.conf.json's tap.testFiles drift. (#6012)

  • install (pnpm 11+ support): pnpm 11 introduced ERR_PNPM_IGNORED_BUILDS for native addon packages — without explicit allowBuilds approval, packages silently skip their build scripts and OmniRoute fails to start with missing native modules. Sets allowBuilds=true for all 13 native addon packages in pnpm-workspace.yaml (@parcel/watcher, @swc/core, better-sqlite3, core-js, esbuild, keytar, koffi, libxmljs2, onnxruntime-node, protobufjs, sharp, tls-client-node, unrs-resolver) and migrates onlyBuiltDependencies from the deprecated package.json field to a new pnpm.json. (commit 39349da — thanks @chirag127)

  • refactor (Block J hot-path decomposition): extract pure leaves with no behavior change from the executor, translator, combo, and SSE hot paths — orphaned executor tests moved to top-level so a runner collects them, and handleComboChat's auto-strategy/target-timeout regions split into named helpers. (#6063, #6049, #6036, #6030, #6020, #6018, #6017, #6016, #6015, #6014, #6008, #6006, #6000, #5999, #5994, #5967, #5962, #5960, #5947, #5949, #5940, #5932)

  • chore (quality/CI housekeeping): rebaseline residual ESLint/cognitive-complexity/file-size drift accumulated over the v3.8.44 cycle, move orphaned executor tests to a top-level location so a runner actually collects them, harden the release pipeline with a test-masking pre-flight gate plus contributors/uncovered helpers, and make the pr-evidence FAIL output tell the author to push (a body edit alone does not re-run the gate). (#5926, #5944, #5952, #6027, #5928, plus a #5975-collateral test hardening pinning a seeded connection to direct egress in route-edge-coverage)

  • docs (housekeeping): normalize mixed-language documentation content, restore the OpenAPI coverage ratchet by documenting 9 newly-added routes, record Hard Rule #22 (cross-session safety — git stash + in-flight PR bans), and document the compression-engine's upstream sync policy for the RTK/Caveman engines. (#6105, #5955, #5948, plus docs-only commit 926b08a)
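
The translate-path golden entry above (#6076) makes a snapshot independent of the CI runner by normalizing OS/arch-derived headers. Below is a minimal sketch of that idea, assuming headers are held in a plain string map. The function name and the placeholder tokens are hypothetical, and Antigravity's platform substring is not covered because its exact format is not given in these notes.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical placeholder tokens; the real golden snapshot may use others.
const OS_PLACEHOLDER = "<os>";
const ARCH_PLACEHOLDER = "<arch>";

/** Replace runner-specific OS/arch values so snapshots match on any machine. */
function normalizePlatformHeaders(headers: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    if (lower === "x-stainless-os") {
      out[name] = OS_PLACEHOLDER;
    } else if (lower === "x-stainless-arch") {
      out[name] = ARCH_PLACEHOLDER;
    } else if (lower === "user-agent") {
      // Rewrite "(OS; arch)" segments, e.g. "(MacOS; arm64)" or "(Linux; x64)".
      out[name] = value.replace(
        /\([^();]+;\s*[^();]+\)/g,
        `(${OS_PLACEHOLDER}; ${ARCH_PLACEHOLDER})`,
      );
    } else {
      out[name] = value; // everything else is compared verbatim
    }
  }
  return out;
}

// Headers captured on a Mac and on Linux normalize to the same snapshot input.
const fromMac = normalizePlatformHeaders({
  "X-Stainless-Os": "MacOS",
  "X-Stainless-Arch": "arm64",
  "User-Agent": "sdk/1.2.3 (MacOS; arm64) node/22",
});
const fromLinux = normalizePlatformHeaders({
  "X-Stainless-Os": "Linux",
  "X-Stainless-Arch": "x64",
  "User-Agent": "sdk/1.2.3 (Linux; x64) node/22",
});
```

Normalizing at comparison time keeps the request-building code untouched, so production requests still carry the real platform values.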

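The setup-claude deflake above (#6021) silences the stdout-writing console methods for the duration of a test file. One way to sketch that technique is a helper that swaps chosen methods for no-ops and hands back a restore function. The helper name and the MethodBag type are hypothetical, and a stand-in object is used instead of the global console so the example runs anywhere.

```typescript
type Method = (...args: unknown[]) => void;
type MethodBag = Record<string, Method>;

/** Replace the named methods with no-ops; the returned function restores them. */
function silenceMethods(target: MethodBag, names: string[]): () => void {
  const saved = new Map<string, Method>();
  for (const name of names) {
    saved.set(name, target[name]);
    target[name] = () => {}; // swallow output while silenced
  }
  return () => {
    saved.forEach((original, name) => {
      target[name] = original;
    });
  };
}

// Stand-in for console: "log" plays the stdout writer, "error" the stderr writer.
const captured: string[] = [];
const fakeConsole: MethodBag = {
  log: (...args) => { captured.push("OUT " + args.join(" ")); },
  error: (...args) => { captured.push("ERR " + args.join(" ")); },
};

const restore = silenceMethods(fakeConsole, ["log"]);
fakeConsole.log("progress 1/3");      // swallowed while silenced
fakeConsole.error("still reported");  // untouched, so failures stay visible
restore();
fakeConsole.log("after restore");     // original behavior is back
```

In a real test file the target would be the global console and the call would sit in before/after hooks. Leaving the stderr methods alone keeps genuine error output available for debugging.
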
🙌 Contributors

Thanks to everyone whose work landed in v3.8.44:

Contributor PRs / Issues
@adentdk #5942
@anki1kr #5899, #5920, #5934, #6039, #6061, #6062, #6075, #6077, #6084, #6086, #6087, #6092, #6098, #6130
@artickc #6119
@backryun #5988, #6095, #6106, #6140
@chamdanilukman #5972
@Chewji9875 #5913, #5957
@chirag127 #6145
@derhornspieler #6138
@dhaern #6133
@doedja direct commit / report
@DuyPrX #5978
@fix2015 #6097
@ggiak #5833
@hamsa0x7 #5904
@hartmark #5976
@imblowsnow direct commit / report
@janeza2 #5915, #6128, #6139
@KooshaPari #5870, #5974, #6035, #6046, #6050, #6061, #6073, #6076, #6086
@mugni-rukita #5998
@nickwizard #5905, #5907
@ofekbetzalel direct commit / report
@pizzav-xyz #6082
@powellnorma direct commit / report
@ricatix #5993
@ryanngit #5995
@studyzy #6043
@tantai-newnol #5968
@Thinkscape #6088, #6090, #6091
@tn5052 #5965
@TuanNguyen0708 direct commit / report
@vittoroliveira-dev #5954
@waguriagentic direct commit / report
@whale9820 #5936
@WslzGmzs direct commit / report
@yusufrahadika #5933
@diegosouzapw maintainer

What's Changed

  • test(security): fix CodeQL #689 — Kimi Web URL host substring sanitization (main) by @diegosouzapw in #6048
  • fix(config): externalize 'ws' to fix copilot-m365-web chat timeout by @anki1kr in #6098
  • chore(deps): bump github/codeql-action/init from 4.36.2 to 4.36.3 by @dependabot[bot] in #6123
  • chore(deps): bump actions/cache from 6.0.0 to 6.1.0 by @dependabot[bot] in #6124
  • chore(deps): bump github/codeql-action/analyze from 4.36.2 to 4.36.3 by @dependabot[bot] in #6125
  • Release v3.8.44 by @diegosouzapw in #5925

Full Changelog: v3.8.43...v3.8.44