feat(routing): smart free-tier router with quota tracking + 429 failover #51
Merged
Introduces the building blocks for the smart free-tier router:

- freeTierConstants.ts: single source of truth for known free-tier RPD/RPM/TPM limits per provider, plus quality-rank ordering (Cerebras > Groq > Gemini > OpenRouter > Mistral > Cloudflare).
- freeTierQuotaService.ts: in-process, storage-backed counter that rolls over per UTC day (RPD) and per minute (RPM/TPM), persists a versioned JSON blob under cortexide.freeTier.quotaState, and exposes recordCall / markExhausted / onQuotaChange.
- freeTierLadder.ts: pure function that, given configured providers + privacy state + quota snapshots, returns a quality-ordered candidate list, skipping providers that are exhausted or out of quota.

All three live in common/ with no DOM/Node/Electron dependencies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
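The per-UTC-day and per-minute rollover described for freeTierQuotaService.ts can be sketched as fixed windows keyed by ISO timestamp prefixes. This is a minimal sketch, not the PR's implementation: QuotaWindow, windowKeys(), this standalone recordCall() and serializeQuotaState() are hypothetical, and persistence under cortexide.freeTier.quotaState is reduced to a JSON.stringify of a versioned object.

```typescript
// Hypothetical per-provider counters; the PR's persisted format is not shown.
interface QuotaWindow {
  dayKey: string;             // UTC day, e.g. "2026-05-25"
  minuteKey: string;          // UTC minute, e.g. "2026-05-25T23:59"
  requestsToday: number;      // counts against RPD
  requestsThisMinute: number; // counts against RPM
  tokensThisMinute: number;   // counts against TPM
}

// Fixed-window keys; toISOString() always renders in UTC.
function windowKeys(nowMs: number): { dayKey: string; minuteKey: string } {
  const iso = new Date(nowMs).toISOString(); // "YYYY-MM-DDTHH:MM:SS.sssZ"
  return { dayKey: iso.slice(0, 10), minuteKey: iso.slice(0, 16) };
}

// Record one call, zeroing any counter whose window has rolled over.
function recordCall(prev: QuotaWindow | undefined, estTokens: number, nowMs: number): QuotaWindow {
  const { dayKey, minuteKey } = windowKeys(nowMs);
  const today = prev !== undefined && prev.dayKey === dayKey ? prev : undefined;
  const thisMinute = prev !== undefined && prev.minuteKey === minuteKey ? prev : undefined;
  return {
    dayKey,
    minuteKey,
    requestsToday: (today?.requestsToday ?? 0) + 1,
    requestsThisMinute: (thisMinute?.requestsThisMinute ?? 0) + 1,
    tokensThisMinute: (thisMinute?.tokensThisMinute ?? 0) + estTokens,
  };
}

// Versioned blob, so a future schema change can discard or migrate old state.
function serializeQuotaState(state: Record<string, QuotaWindow>): string {
  return JSON.stringify({ version: 1, providers: state });
}
```

Fixed windows are cheap to persist, but they can admit up to twice the RPM limit across a minute boundary, so local counts are an estimate and the 429 path remains the backstop.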
…lRouter
- cortexideSettingsTypes: add RoutingPolicy union ('auto-cheapest' |
'free-tier' | 'local-only' | 'byok-paid') and a routingPolicy field
on GlobalSettings, defaulting to 'auto-cheapest' so existing users
see no behaviour change.
- modelRouter: inject IFreeTierQuotaService, branch on routingPolicy
early in route():
* 'free-tier' -> consult freeTierLadder before scoring; if the
ladder is empty, fall through to the existing
scoring path so the user is never stranded.
* 'local-only' -> hard-stop on cloud, route locally or abstain.
The agent loop (chatThreadService) is untouched.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
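The early policy branch in route() can be sketched as below. This is an illustration under assumptions, not the modelRouter code: Candidate, RouteDecision and RouteInputs are hypothetical, and the ladder and existing scoring path are injected as plain functions (rather than through IFreeTierQuotaService) so the sketch stays self-contained.

```typescript
// RoutingPolicy values as listed in this PR.
type RoutingPolicy = 'auto-cheapest' | 'free-tier' | 'local-only' | 'byok-paid';

// Hypothetical, simplified model descriptor.
interface Candidate { provider: string; model: string; isLocal: boolean; }

type RouteDecision =
  | { kind: 'route'; target: Candidate; reason: string }
  | { kind: 'abstain'; reason: string };

interface RouteInputs {
  policy?: RoutingPolicy;                               // optional, like routingPolicy? in GlobalSettings
  configured: Candidate[];
  freeTierLadder: () => Candidate[];                    // stands in for buildFreeTierLadder(...)
  score: (cands: Candidate[]) => Candidate | undefined; // stands in for the existing scoring path
}

function route({ policy = 'auto-cheapest', configured, freeTierLadder, score }: RouteInputs): RouteDecision {
  if (policy === 'local-only') {
    // Hard stop on cloud: only local models are eligible, otherwise abstain.
    const local = score(configured.filter(c => c.isLocal));
    return local
      ? { kind: 'route', target: local, reason: 'local-only policy' }
      : { kind: 'abstain', reason: 'local-only policy, but no local model is configured' };
  }
  if (policy === 'free-tier') {
    const ladder = freeTierLadder();
    if (ladder.length > 0) {
      return { kind: 'route', target: ladder[0], reason: 'top of free-tier ladder' };
    }
    // Empty ladder: fall through to scoring so the user is never stranded.
  }
  // 'auto-cheapest', 'byok-paid' (no paid-only filter yet) and the free-tier fallback.
  const best = score(configured);
  return best
    ? { kind: 'route', target: best, reason: 'scoring path' }
    : { kind: 'abstain', reason: 'no usable model' };
}
```

Defaulting an absent policy to 'auto-cheapest' is what keeps existing users on the unchanged scoring path.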
Wrap onFinalMessage and onError before the IPC channel registers them, so that every call attributable to a free-tier provider is accounted for:

- on success -> calls freeTierQuotaService.recordCall(providerId, modelName, estTokens), where estTokens is a chars/4 proxy until the provider SDKs surface real usage counts.
- on 429 -> calls markExhausted(providerId, retryAt). We sniff the rate-limit shape via isRateLimitError() (status code, or message text including '429' / 'rate limit' / 'resource_exhausted' / 'quota') and try to extract a real retry-at from headers or message text, falling back to a conservative 60s default in the service.

Non-free-tier providers (Anthropic, OpenAI, Azure, etc.) pass through unchanged, with no wrapping overhead.

The hook lives in common/ rather than electron-main so the quota service stays inside common/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
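The detection, retry-at extraction and callback wrapping above can be sketched as follows. It is a minimal sketch under assumptions: LLMErrorLike, QuotaSink, Callbacks, extractRetryAt() and wrapForFreeTier() are hypothetical, header keys are assumed lower-cased, and the free-text pattern ("try again in 1.5s") is a guess at provider phrasing. Returning undefined leaves the 60s default to the service, as the commit describes.

```typescript
// Hypothetical error shape; real SDK errors differ per provider.
interface LLMErrorLike {
  status?: number;
  message?: string;
  headers?: Record<string, string | undefined>; // assumed lower-cased keys
}

// Signals listed in the commit: status code, or '429' / 'rate limit' / 'resource_exhausted' / 'quota' in the text.
const RATE_LIMIT_PATTERNS: RegExp[] = [/\b429\b/, /rate.?limit/i, /resource_exhausted/i, /quota/i];

function isRateLimitError(err: LLMErrorLike): boolean {
  if (err.status === 429) return true;
  const msg = err.message ?? '';
  return RATE_LIMIT_PATTERNS.some(re => re.test(msg));
}

// Best-effort absolute retry time (epoch ms), or undefined so the service applies its default.
function extractRetryAt(err: LLMErrorLike, nowMs: number): number | undefined {
  // HTTP Retry-After is either delay-seconds or an HTTP-date.
  const header = err.headers?.['retry-after'];
  if (header) {
    const secs = Number(header);
    if (Number.isFinite(secs) && secs >= 0) return nowMs + secs * 1000;
    const date = Date.parse(header);
    if (!Number.isNaN(date)) return date;
  }
  // Free-text hint such as "try again in 1.5s"; the phrasing is an assumption.
  const m = /(?:retry|try again)\D{0,20}?(\d+(?:\.\d+)?)\s*s/i.exec(err.message ?? '');
  return m ? nowMs + parseFloat(m[1]) * 1000 : undefined;
}

// Hypothetical subset of IFreeTierQuotaService, using the method names from the commit.
interface QuotaSink {
  recordCall(providerId: string, modelName: string, estTokens: number): void;
  markExhausted(providerId: string, retryAtMs: number | undefined): void;
}

// Hypothetical callback pair; the real message params carry more fields.
interface Callbacks {
  onFinalMessage: (p: { fullText: string }) => void;
  onError: (e: LLMErrorLike) => void;
}

function wrapForFreeTier(cb: Callbacks, providerId: string, modelName: string,
                         quota: QuotaSink, now: () => number = () => Date.now()): Callbacks {
  return {
    onFinalMessage: p => {
      quota.recordCall(providerId, modelName, Math.ceil(p.fullText.length / 4)); // chars/4 token proxy
      cb.onFinalMessage(p);
    },
    onError: e => {
      if (isRateLimitError(e)) quota.markExhausted(providerId, extractRetryAt(e, now()));
      cb.onError(e);
    },
  };
}
```

The original callbacks always run after the bookkeeping, so wrapping never swallows a result or an error.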
…cy UI
- cortexideStatusBar: new IStatusbarEntry that hides itself when no
free-tier providers are configured, otherwise shows the highest-
quality provider's most-constrained metric (X/Y RPD or X/Y RPM, or
'exhausted'). Tooltip lists every configured free-tier provider's
status. Listens to IFreeTierQuotaService.onQuotaChange for real-time
updates plus a 15s slow tick for window rollovers.
- i18nService: add routing.* translation keys for the policy
dropdown and the status bar widget so every user-visible string
goes through t().
- Settings.tsx: 'Routing policy' select with the four policy options
inserted above the YOLO Mode section, persisting via the existing
setGlobalSetting('routingPolicy', ...) pathway.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
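The status-bar text rule above (highest-quality provider, most-constrained metric, or 'exhausted') can be sketched as a pure formatter. This is a hedged sketch: ProviderQuotaView, statusText() and tooltipText() are hypothetical, X is assumed to be the remaining count (the test plan describes a decrementing RPD), and using $(pulse) for the normal state is an assumption based on the icons named in this PR.

```typescript
// Hypothetical view model; field names are illustrative, not the PR's types.
interface ProviderQuotaView {
  id: string;
  qualityRank: number; // higher = better
  exhausted: boolean;
  rpdRemaining: number; rpdLimit: number;
  rpmRemaining: number; rpmLimit: number;
}

// Fraction of a limit still available; a non-positive limit counts as unconstrained.
const frac = (remaining: number, limit: number): number => (limit > 0 ? remaining / limit : 1);

// Status bar text, or undefined to hide the entry when no free-tier provider is configured.
function statusText(providers: ProviderQuotaView[]): string | undefined {
  if (providers.length === 0) return undefined;
  const top = providers.reduce((a, b) => (b.qualityRank > a.qualityRank ? b : a));
  if (top.exhausted) return `$(warning) ${top.id}: exhausted`;
  // Show whichever metric has the smaller share left.
  return frac(top.rpdRemaining, top.rpdLimit) <= frac(top.rpmRemaining, top.rpmLimit)
    ? `$(pulse) ${top.id}: ${top.rpdRemaining}/${top.rpdLimit} RPD`
    : `$(pulse) ${top.id}: ${top.rpmRemaining}/${top.rpmLimit} RPM`;
}

// Tooltip: one line per configured free-tier provider, best first.
function tooltipText(providers: ProviderQuotaView[]): string {
  return [...providers]
    .sort((a, b) => b.qualityRank - a.qualityRank)
    .map(p => p.exhausted
      ? `${p.id}: exhausted`
      : `${p.id}: ${p.rpdRemaining}/${p.rpdLimit} RPD, ${p.rpmRemaining}/${p.rpmLimit} RPM`)
    .join('\n');
}
```

Keeping the formatter pure means onQuotaChange and the 15s tick can both just recompute the text; the tick covers window rollovers, which emit no event of their own.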
Covers the three contract guarantees stated in the PR spec plus three defensive cases:

1. privacy gate engaged -> ladder is empty regardless of inputs
2. exhausted (429) provider is skipped, next quality tier wins
3. when all providers have quota, highest qualityRank wins
4. zero remaining RPD removes a provider entirely
5. non-free-tier providers (anthropic, openAI, etc.) are ignored
6. empty configured list -> empty ladder

Tests are pure: they exercise the buildFreeTierLadder() function directly with hand-constructed FreeTierRemaining snapshots, no service mocking required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements P0 #2 from AGENTIC_REDESIGN_RESEARCH_2026-05-25.md (section 5, Phase 1).

What it does
When a user has multiple free-tier API keys configured, CortexIDE today asks them to pick a provider manually and silently fails on 429. This PR:
- Adds a routing-policy setting (Auto / Free-tier / Local-only / BYOK paid).
- When the policy is Free-tier, the router consults a quality-ranked ladder and falls through to the existing scoring path if no free-tier provider is usable.
- Hooks sendLLMMessageService so successful calls increment quota counters and 429s mark the provider exhausted with a retry-at timestamp parsed from headers/message text.

Quota constants (baked into freeTierConstants.ts)

Provider quality order: Cerebras > Groq > Gemini Flash > OpenRouter free > Mistral > Cloudflare.
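One way to keep the quality order and the numeric qualityRank from drifting apart is to derive the ranks from a single ordered list. This is a sketch, not the contents of freeTierConstants.ts: FREE_TIER_QUALITY_ORDER, QUALITY_RANK and isFreeTierId() are hypothetical names, the lower-case ids are illustrative, and the RPD/RPM/TPM limits themselves are deliberately omitted.

```typescript
// Quality order from the PR description, best first; ids are illustrative.
const FREE_TIER_QUALITY_ORDER = ['cerebras', 'groq', 'gemini', 'openrouter', 'mistral', 'cloudflare'] as const;
type FreeTierId = (typeof FREE_TIER_QUALITY_ORDER)[number];

// Higher rank = better, so the ladder can sort by qualityRank descending.
const QUALITY_RANK = {} as Record<FreeTierId, number>;
FREE_TIER_QUALITY_ORDER.forEach((id, i) => {
  QUALITY_RANK[id] = FREE_TIER_QUALITY_ORDER.length - i;
});

// Type guard for ids arriving from untyped sources such as persisted state.
function isFreeTierId(x: string): x is FreeTierId {
  return (FREE_TIER_QUALITY_ORDER as readonly string[]).indexOf(x) !== -1;
}
```

Reordering providers then means editing one array, and every rank follows.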
The ladder
buildFreeTierLadder({ configuredModels, quotas, privacyMode }):

1. If privacyMode is true → return [] (caller falls back to local).
2. Skip providers marked exhausted (429 within reset window).
3. Skip providers with rpd <= 0 or rpm <= 0 remaining.
4. Sort by qualityRank desc.

Pure function, testable in isolation, no service deps.
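The four ladder steps can be sketched as a pure function. This sketch reuses the destructured parameter names from the signature above, but the FreeTierRemaining, LadderInput and LadderRung shapes are assumptions (the PR's types are not shown), and a provider is treated as non-free-tier simply by being absent from the quotas map.

```typescript
// Hypothetical snapshot shape; the PR's FreeTierRemaining fields are not shown.
interface FreeTierRemaining {
  providerId: string;
  qualityRank: number; // higher = better
  exhausted: boolean;  // 429 seen and its retry-at not yet reached
  rpd: number;         // requests remaining today
  rpm: number;         // requests remaining this minute
}

interface LadderInput {
  configuredModels: { providerId: string; model: string }[];
  quotas: Map<string, FreeTierRemaining>; // non-free-tier providers are absent
  privacyMode: boolean;
}

interface LadderRung { providerId: string; model: string; qualityRank: number; }

function buildFreeTierLadder({ configuredModels, quotas, privacyMode }: LadderInput): LadderRung[] {
  if (privacyMode) return []; // caller falls back to local
  const rungs: LadderRung[] = [];
  for (const m of configuredModels) {
    const q = quotas.get(m.providerId);
    if (!q) continue;                       // not a free-tier provider
    if (q.exhausted) continue;              // 429 within reset window
    if (q.rpd <= 0 || q.rpm <= 0) continue; // out of quota
    rungs.push({ providerId: m.providerId, model: m.model, qualityRank: q.qualityRank });
  }
  // Array.prototype.sort is stable since ES2019, so equal ranks keep configuration order.
  return rungs.sort((a, b) => b.qualityRank - a.qualityRank);
}
```

Because the function only reads its arguments, each of the six test cases is just a different hand-built quotas map.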
New + modified files
New
- common/routing/freeTierConstants.ts: quota table + per-model overrides + freeTierIdOfProviderName()
- common/routing/freeTierQuotaService.ts: IFreeTierQuotaService + storage-backed impl
- common/routing/freeTierLadder.ts: pure buildFreeTierLadder() + pickTopFromLadder()
- test/common/freeTierLadder.test.ts: 6 unit tests covering privacy gate, exhaustion failover, quality ranking, zero quota, non-free-tier filtering, empty input

Modified
- common/modelRouter.ts: branch on routingPolicy; call ladder for free-tier; hard-stop on cloud for local-only
- common/cortexideSettingsTypes.ts: RoutingPolicy union + routingPolicy? global setting (defaults to auto-cheapest so existing users see no behaviour change)
- common/sendLLMMessageService.ts: wrap onFinalMessage/onError to call recordCall/markExhausted
- common/i18n/i18nService.ts: routing.* translation keys
- browser/cortexideStatusBar.ts: new free-tier quota widget with tooltip
- browser/react/src/settings/Settings.tsx: routing-policy <select> above YOLO Mode section

Test plan
- npm run compile-check-ts-native clean
- npm run valid-layers-check clean for new files (pre-existing IMainProcessService violations in common/sendLLMMessageService.ts etc. are unchanged)
- npm run buildreact clean
- npm run test-node -- --run out/vs/workbench/contrib/cortexide/test/common/freeTierLadder.test.js → 6/6 passing
- With the policy set to Free-tier, send 5 chats → status bar widget shows decrementing RPD
- Groq exhausted → $(warning) groq: exhausted, next chat auto-routes to Gemini
- With the policy set to Local-only and no local models configured → abstain decision with clear reasoning

Explicitly NOT done
- cerebras appears in the quota table with cortexProviderName: null, but adding the actual provider plumbing to modelCapabilities.ts / cortexideSettingsTypes.ts / sendLLMMessage.impl.ts is out of scope per the spec. The ladder is wired so it will pick Cerebras up the day that provider is added.
- chatThreadService.ts is not touched. Hooks are at sendLLMMessageService.ts and modelRouter.ts only.
- recordCall uses output.length / 4 as a TPM proxy. Real counts would require provider-specific accounting in sendLLMMessage.impl.ts (future work).
- The status bar widget uses stock codicons ($(pulse), $(warning)) and the default tooltip. A designer pass would improve its at-a-glance scannability.
- byok-paid currently falls through to the standard scoring path; the "paid only" filter is left as future work, since the heuristic for "paid" needs more thought (provider type? cost > 0?).
- The localFirstAI + routingPolicy interaction is not resolved. Both settings exist independently; treating one as authoritative is a UX decision for a follow-up.

🤖 Generated with Claude Code