feat(routing): smart free-tier router with quota tracking + 429 failover #51

Merged
Pterjudin merged 5 commits into main from feat/free-tier-router-2026-05-25
May 25, 2026

@Pterjudin

Implements P0 #2 from AGENTIC_REDESIGN_RESEARCH_2026-05-25.md (section 5, Phase 1).

What it does

When a user has multiple free-tier API keys configured, CortexIDE today asks them to pick a provider manually and silently fails on 429. This PR:

  1. Tracks per-provider RPD / RPM / TPM in a storage-backed quota service.
  2. Adds a routing-policy setting (Auto / Free-tier / Local-only / BYOK paid).
  3. When the policy is Free-tier, the router consults a quality-ranked ladder and falls through to the existing scoring path if no free-tier provider is usable.
  4. Hooks sendLLMMessageService so successful calls increment quota counters and 429s mark the provider exhausted with a retry-at timestamp parsed from headers/message text.
  5. Adds a status-bar widget that hides when no free-tier provider is configured, otherwise shows the top provider's remaining quota with a multi-line tooltip listing every provider.
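
The 429 handling in step 4 can be sketched roughly as below. This is a minimal illustration, not the PR's code: the `LLMErrorLike` shape, the `parseRetryAt` name, and the message regex are assumptions. The marker strings and the 60s fallback come from the commit message, though the sketch applies the fallback inside the parser, whereas the PR applies it in the quota service.

```typescript
// Hypothetical error shape; the real error object in sendLLMMessageService may differ.
interface LLMErrorLike {
  status?: number;
  message?: string;
  headers?: Record<string, string>;
}

// Marker strings listed in the commit message for isRateLimitError().
const RATE_LIMIT_MARKERS = ['429', 'rate limit', 'resource_exhausted', 'quota'];
// Conservative default when no retry hint can be extracted.
const DEFAULT_RETRY_MS = 60_000;

export function isRateLimitError(err: LLMErrorLike): boolean {
  if (err.status === 429) return true;
  const msg = (err.message ?? '').toLowerCase();
  // Heuristic: substring matching can produce false positives.
  return RATE_LIMIT_MARKERS.some(marker => msg.includes(marker));
}

// Returns an epoch-millisecond timestamp after which the provider may be retried.
export function parseRetryAt(err: LLMErrorLike, now: number): number {
  // 1. Retry-After header: either delta-seconds or an HTTP-date.
  const headers = err.headers ?? {};
  const key = Object.keys(headers).find(k => k.toLowerCase() === 'retry-after');
  const raw = key !== undefined ? headers[key] : undefined;
  if (raw !== undefined && raw.trim() !== '') {
    const secs = Number(raw);
    if (Number.isFinite(secs) && secs >= 0) return now + secs * 1000;
    const date = Date.parse(raw);
    if (!Number.isNaN(date) && date > now) return date;
  }
  // 2. Message text such as "retry in 12.5s" or "try again in 7 seconds" (assumed wording).
  const match = /(?:retry|try again) in (\d+(?:\.\d+)?)\s*s/i.exec(err.message ?? '');
  if (match) return now + Math.ceil(parseFloat(match[1]) * 1000);
  // 3. Nothing usable: fall back to the default window.
  return now + DEFAULT_RETRY_MS;
}
```

An `onError` wrapper would call `isRateLimitError(err)` first and, only when it returns true, pass `parseRetryAt(err, Date.now())` to `markExhausted`.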

Quota constants (baked into freeTierConstants.ts)

| Provider | RPD | RPM | TPM | Notes |
|---|---|---|---|---|
| Cerebras | | 30 | | 1M tokens/day; 8K ctx cap |
| Groq | 1000 | 30 | 6000 | |
| Gemini 2.5 Flash-Lite | 1000 | 15 | | No card |
| Gemini 2.5 Flash | 250 | 10 | | per-model override |
| Gemini 2.5 Pro | 100 | 5 | | per-model override |
| Mistral Experiment | | 2 | | 1B tokens/month |
| OpenRouter free | 50 | 20 | | 1000 RPD with $10 top-up |
| Cloudflare Workers AI | | | | 10,000 Neurons/day |

Provider quality order: Cerebras > Groq > Gemini Flash > OpenRouter free > Mistral > Cloudflare.

The ladder

buildFreeTierLadder({ configuredModels, quotas, privacyMode }):

  1. If privacyMode is true → return [] (caller falls back to local).
  2. Drop providers not on the free-tier table.
  3. Drop providers marked exhausted (429 within reset window).
  4. Drop providers with rpd <= 0 or rpm <= 0 remaining.
  5. Sort the remainder by qualityRank desc.

Pure function, testable in isolation, no service deps.
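
The five steps above can be sketched as follows. Treat this as an illustration under assumptions rather than the contents of freeTierLadder.ts: the `FreeTierRemaining` fields, the provider id strings, the numeric rank values, and the choice to treat a missing snapshot or a `null` limit as "not constrained" are all hypothetical. Only the step order and the quality ordering come from the description.

```typescript
// Hypothetical snapshot shape; field names are assumptions.
interface FreeTierRemaining {
  rpd: number | null;   // null means no limit is tracked for this window
  rpm: number | null;
  exhausted: boolean;   // true while inside a 429 reset window
}
interface ConfiguredModel { providerId: string; modelName: string; }
interface LadderEntry extends ConfiguredModel { qualityRank: number; }

// Higher is better. The order follows the PR; ids and numbers are illustrative.
const QUALITY_RANK = new Map<string, number>([
  ['cerebras', 6], ['groq', 5], ['gemini', 4],
  ['openRouter', 3], ['mistral', 2], ['cloudflare', 1],
]);

export function buildFreeTierLadder(args: {
  configuredModels: readonly ConfiguredModel[];
  quotas: ReadonlyMap<string, FreeTierRemaining>;
  privacyMode: boolean;
}): LadderEntry[] {
  // Step 1: privacy gate. The caller falls back to local models.
  if (args.privacyMode) return [];

  const ladder: LadderEntry[] = [];
  for (const model of args.configuredModels) {
    // Step 2: skip providers that are not on the free-tier table.
    const qualityRank = QUALITY_RANK.get(model.providerId);
    if (qualityRank === undefined) continue;

    // Assumption: a provider with no snapshot yet is treated as having full quota.
    const quota = args.quotas.get(model.providerId);
    if (quota !== undefined) {
      // Step 3: skip providers marked exhausted.
      if (quota.exhausted) continue;
      // Step 4: skip providers with no remaining RPD or RPM.
      if (quota.rpd !== null && quota.rpd <= 0) continue;
      if (quota.rpm !== null && quota.rpm <= 0) continue;
    }
    ladder.push({ ...model, qualityRank });
  }
  // Step 5: best quality first.
  return ladder.sort((a, b) => b.qualityRank - a.qualityRank);
}

export function pickTopFromLadder(ladder: readonly LadderEntry[]): LadderEntry | undefined {
  return ladder[0];
}
```

Because the function takes plain data and returns plain data, a test can build the `quotas` map by hand, which matches the "no service mocking required" note in the test commit.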

New + modified files

New

  • common/routing/freeTierConstants.ts — quota table + per-model overrides + freeTierIdOfProviderName()
  • common/routing/freeTierQuotaService.ts - IFreeTierQuotaService + storage-backed impl
  • common/routing/freeTierLadder.ts — pure buildFreeTierLadder() + pickTopFromLadder()
  • test/common/freeTierLadder.test.ts — 6 unit tests covering privacy gate, exhaustion failover, quality ranking, zero quota, non-free-tier filtering, empty input

Modified

  • common/modelRouter.ts — branch on routingPolicy; call ladder for free-tier; hard-stop on cloud for local-only
  • common/cortexideSettingsTypes.ts - RoutingPolicy union + routingPolicy? global setting (defaults to auto-cheapest so existing users see no behaviour change)
  • common/sendLLMMessageService.ts — wrap onFinalMessage / onError to call recordCall / markExhausted
  • common/i18n/i18nService.ts - routing.* translation keys
  • browser/cortexideStatusBar.ts — new free-tier quota widget with tooltip
  • browser/react/src/settings/Settings.tsx — routing-policy <select> above YOLO Mode section
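
The policy branch in modelRouter.ts might look roughly like the sketch below. The `RoutingPolicy` union and the fall-through behaviour are taken from the description; the `Decision` and `RouteInputs` types and the `scoreAll` callback are hypothetical stand-ins for the router's real types and its existing scoring path.

```typescript
type RoutingPolicy = 'auto-cheapest' | 'free-tier' | 'local-only' | 'byok-paid';

// Hypothetical decision type; the real router's result shape may differ.
type Decision =
  | { kind: 'route'; providerId: string; reason: string }
  | { kind: 'abstain'; reason: string };

interface RouteInputs {
  policy: RoutingPolicy | undefined;     // optional setting, may be unset
  freeTierLadder: readonly string[];     // provider ids, best first
  localProviders: readonly string[];     // configured local providers
  scoreAll: () => Decision;              // stand-in for the existing scoring path
}

export function route(inputs: RouteInputs): Decision {
  // An unset policy behaves like 'auto-cheapest', so existing users see no change.
  const policy: RoutingPolicy = inputs.policy ?? 'auto-cheapest';

  switch (policy) {
    case 'free-tier': {
      const best = inputs.freeTierLadder[0];
      if (best !== undefined) {
        return { kind: 'route', providerId: best, reason: 'free-tier ladder' };
      }
      // Empty ladder: fall through to scoring so the user is never stranded.
      return inputs.scoreAll();
    }
    case 'local-only': {
      // Hard stop on cloud: route locally or abstain with a clear reason.
      const local = inputs.localProviders[0];
      return local !== undefined
        ? { kind: 'route', providerId: local, reason: 'local-only policy' }
        : { kind: 'abstain', reason: 'local-only policy is set but no local model is configured' };
    }
    case 'byok-paid':      // no paid-only filter yet, same as the default path
    case 'auto-cheapest':
    default:
      return inputs.scoreAll();
  }
}
```

Keeping the branch at the top of `route()` means the existing scoring code is reached unchanged for every policy except `local-only`.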

Test plan

  • npm run compile-check-ts-native clean
  • npm run valid-layers-check clean for new files (pre-existing IMainProcessService violations in common/sendLLMMessageService.ts etc. are unchanged)
  • npm run buildreact clean
  • npm run test-node -- --run out/vs/workbench/contrib/cortexide/test/common/freeTierLadder.test.js → 6/6 passing
  • Manual: configure Groq + Gemini free keys, set policy to Free-tier, send 5 chats → status bar widget shows decrementing RPD
  • Manual: trigger a 429 on Groq → status bar flips to $(warning) groq: exhausted, next chat auto-routes to Gemini
  • Manual: toggle privacy mode → ladder empties, falls back to local
  • Manual: kill the IDE, restart → quota counters persist (storage round-trip)
  • Manual: set policy to Local-only with no local models configured → abstain decision with clear reasoning

Explicitly NOT done

  • No Cerebras provider added. cerebras appears in the quota table with cortexProviderName: null, but adding the actual provider plumbing to modelCapabilities.ts / cortexideSettingsTypes.ts / sendLLMMessage.impl.ts is out of scope per the spec. The ladder is wired so it will pick Cerebras up the day that provider is added.
  • Agent loop unchanged. chatThreadService.ts is not touched. Hooks are at sendLLMMessageService.ts and modelRouter.ts only.
  • No new npm dependencies. Everything uses existing VS Code platform services.
  • Token counting is approximate. Until provider SDKs surface real usage counts, recordCall uses output.length / 4 as a TPM proxy. Real counts would require provider-specific accounting in sendLLMMessage.impl.ts (future work).
  • Status-bar widget styling is functional, not polished. Uses VS Code Codicons ($(pulse), $(warning)) and the default tooltip. A designer pass would improve the at-a-glance scannability.
  • No BYOK-paid ladder logic. Setting the policy to byok-paid currently falls through to the standard scoring path - the "paid only" filter is left as future work since the heuristic for "paid" needs more thought (provider type? cost > 0?).
  • localFirstAI + routingPolicy interaction. Both settings exist independently; treating one as authoritative is a UX decision for a follow-up.
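
To make the approximate token counting concrete, here is a minimal sketch of a counter update that uses the chars/4 proxy and rolls over per UTC day and per minute. The `Counter` shape, the fixed calendar-minute window, and rounding up with `Math.ceil` are assumptions; the PR only states `output.length / 4` and the per-day and per-minute rollover.

```typescript
// Hypothetical per-provider counter state; the persisted blob in the PR may differ.
interface Counter {
  dayKey: string;      // UTC date, e.g. "2026-05-25"
  minuteKey: number;   // minutes since the Unix epoch
  rpdUsed: number;
  rpmUsed: number;
  tpmUsed: number;
}

// chars/4 proxy from the PR; rounding up is an assumption of this sketch.
export function estimateTokens(output: string): number {
  return Math.ceil(output.length / 4);
}

// Pure update: returns the next counter state after one successful call at `now` (epoch ms).
export function recordCall(prev: Counter | undefined, output: string, now: number): Counter {
  const dayKey = new Date(now).toISOString().slice(0, 10); // toISOString is always UTC
  const minuteKey = Math.floor(now / 60_000);
  const base: Counter = prev ?? { dayKey, minuteKey, rpdUsed: 0, rpmUsed: 0, tpmUsed: 0 };

  // A changed key means the window rolled over, so that window's counters restart.
  const sameDay = base.dayKey === dayKey;
  const sameMinute = base.minuteKey === minuteKey;

  return {
    dayKey,
    minuteKey,
    rpdUsed: (sameDay ? base.rpdUsed : 0) + 1,
    rpmUsed: (sameMinute ? base.rpmUsed : 0) + 1,
    tpmUsed: (sameMinute ? base.tpmUsed : 0) + estimateTokens(output),
  };
}
```

The proxy counts only output characters, so it undercounts providers that bill prompt tokens against TPM; real usage numbers from the provider response would replace `estimateTokens` without changing the rollover logic.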

🤖 Generated with Claude Code

Tajudeen and others added 5 commits May 25, 2026 04:45
Introduces the building blocks for the smart free-tier router:

- freeTierConstants.ts: single source of truth for known free-tier
  RPD/RPM/TPM limits per provider, plus quality-rank ordering
  (Cerebras > Groq > Gemini > OpenRouter > Mistral > Cloudflare).
- freeTierQuotaService.ts: in-process, storage-backed counter that
  rolls over per UTC day (RPD) and per minute (RPM/TPM), persists a
  versioned JSON blob under cortexide.freeTier.quotaState, and exposes
  recordCall / markExhausted / onQuotaChange.
- freeTierLadder.ts: pure function that, given configured providers +
  privacy state + quota snapshots, returns a quality-ordered candidate
  list - skipping providers that are exhausted or out of quota.

All three live in common/ with no DOM/Node/Electron dependencies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lRouter

- cortexideSettingsTypes: add RoutingPolicy union ('auto-cheapest' |
  'free-tier' | 'local-only' | 'byok-paid') and a routingPolicy field
  on GlobalSettings, defaulting to 'auto-cheapest' so existing users
  see no behaviour change.
- modelRouter: inject IFreeTierQuotaService, branch on routingPolicy
  early in route():
    * 'free-tier'  -> consult freeTierLadder before scoring; if the
                      ladder is empty, fall through to the existing
                      scoring path so the user is never stranded.
    * 'local-only' -> hard-stop on cloud, route locally or abstain.
  The agent loop (chatThreadService) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap onFinalMessage and onError before the IPC channel registers them
so every call attributable to a free-tier provider:

- on success -> calls freeTierQuotaService.recordCall(providerId,
  modelName, estTokens) where estTokens is a chars/4 proxy until the
  provider SDKs surface real usage counts.
- on 429    -> calls markExhausted(providerId, retryAt). We sniff the
  rate-limit shape via isRateLimitError() (status code, message text
  including '429' / 'rate limit' / 'resource_exhausted' / 'quota')
  and try to extract a real retry-at from headers or message text,
  falling back to a conservative 60s default in the service.

Non-free-tier providers (Anthropic, OpenAI, Azure, etc.) pass through
unchanged - no wrapping overhead. The hook lives in common/ rather
than electron-main so the quota service stays inside common/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cy UI

- cortexideStatusBar: new IStatusbarEntry that hides itself when no
  free-tier providers are configured, otherwise shows the highest-
  quality provider's most-constrained metric (X/Y RPD or X/Y RPM, or
  'exhausted'). Tooltip lists every configured free-tier provider's
  status. Listens to IFreeTierQuotaService.onQuotaChange for real-time
  updates plus a 15s slow tick for window rollovers.
- i18nService: add routing.* translation keys for the policy
  dropdown and the status bar widget so every user-visible string
  goes through t().
- Settings.tsx: 'Routing policy' select with the four policy options
  inserted above the YOLO Mode section, persisting via the existing
  setGlobalSetting('routingPolicy', ...) pathway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the three contract guarantees stated in the PR spec plus three
defensive cases:

1. privacy gate engaged -> ladder is empty regardless of inputs
2. exhausted (429) provider is skipped, next quality tier wins
3. when all providers have quota, highest qualityRank wins
4. zero remaining RPD removes a provider entirely
5. non-free-tier providers (anthropic, openAI, etc.) are ignored
6. empty configured list -> empty ladder

Tests are pure - they exercise the buildFreeTierLadder() function
directly with hand-constructed FreeTierRemaining snapshots, no
service mocking required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pterjudin marked this pull request as ready for review May 25, 2026 03:51
Pterjudin merged commit 0fc4015 into main May 25, 2026
12 of 23 checks passed