refactor(presets): max ultra-cheap free tiers + correct rate limits #300
Merged
Destynova2 merged 1 commit into main on Apr 27, 2026
Verified-against-source-docs rewrite (April 2026) of the ultra-cheap
preset to maximize free-tier coverage and correct earlier wrong figures.
Corrected rate limits (the figures in the previous version were wrong):
Groq : 30 RPM, **1 000 RPD** (not 14 400, the old number),
6K TPM general (15K for Gemma 2 9B). No training.
Cerebras : 30 RPM, 1M tokens/day, 60-100K TPM,
**8 192 token context cap on free tier**.
No training.
OpenRouter: 50 RPD base / 1 000 RPD after lifetime $10 deposit,
20 RPM on `:free` variants.
Sources verified 2026-04-27:
https://console.groq.com/docs/rate-limits
https://inference-docs.cerebras.ai/support/rate-limits
https://openrouter.ai/docs/api/reference/limits
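As a rough illustration, the corrected figures above can be held in a small lookup table. This is a minimal sketch, not the preset's actual schema: the `FreeTierLimits` class, its field names, and the helper functions are hypothetical, and the Cerebras TPM uses the lower bound of the stated 60-100K range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FreeTierLimits:
    """Hypothetical container for the free-tier figures quoted above."""
    rpm: int                       # requests per minute
    rpd: Optional[int]             # requests per day (None: not stated above)
    tokens_per_day: Optional[int]  # daily token budget (None: not stated above)
    tpm: Optional[int]             # tokens per minute (None: not stated above)
    context_cap: Optional[int]     # free-tier context cap (None: not stated above)


LIMITS = {
    # 6K TPM is the general figure; Gemma 2 9B gets 15K.
    "groq": FreeTierLimits(rpm=30, rpd=1_000, tokens_per_day=None,
                           tpm=6_000, context_cap=None),
    # TPM is quoted as 60-100K; the lower bound is used here.
    "cerebras": FreeTierLimits(rpm=30, rpd=None, tokens_per_day=1_000_000,
                               tpm=60_000, context_cap=8_192),
    # 50 RPD is the base figure; see openrouter_rpd() for the deposit case.
    "openrouter": FreeTierLimits(rpm=20, rpd=50, tokens_per_day=None,
                                 tpm=None, context_cap=None),
}


def openrouter_rpd(has_lifetime_deposit: bool) -> int:
    """Daily request allowance on OpenRouter, before or after the $10 deposit."""
    return 1_000 if has_lifetime_deposit else 50


def fits_context(provider: str, input_tokens: int) -> bool:
    """True when the prompt fits the provider's free-tier context cap, if any."""
    cap = LIMITS[provider].context_cap
    return cap is None or input_tokens <= cap
```

With these figures, `fits_context("cerebras", 9_000)` is false while the same prompt passes for Groq, which is the case the `medium` tier guards against.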
Routing changes vs previous ultra-cheap:
- Trivial / background now hit **Cerebras Llama 3.1 8B free** as
priority 1 (1M tok/day, fastest inference). Cerebras's 8K input
cap is naturally accommodated since trivial tier matches
max_tokens_below = 500 and short-prompt background tasks fit too.
- Default now leads with **Groq Llama 3.3 70B free** (was
gpt-oss-120b which has tighter free-tier RPM). Llama 3.3 70B
fits the 30 RPM / 6K TPM general limit and offers 131K native
context — fits Claude Code's typical agentic prompts.
- Default p2 added **OpenRouter `deepseek-chat-v3:free`** (third
free tier). Lift from 50 to 1 000 RPD requires one-time $10
OpenRouter deposit.
- Think p1 = **OpenRouter `deepseek-r1:free`** (free reasoning).
Falls back to paid R1 only when free quota saturates.
- Added explicit `medium` tier on `min_input_tokens > 7000` to
exclude Cerebras (8K context cap) — keeps big-prompt requests
off Cerebras even if they happen to hit a `default-model`
fallback that would otherwise route there.
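The routing changes above can be sketched as a tier picker plus a priority chain. This is only an illustration under stated assumptions: the function names, the model identifier strings, the `wants_reasoning` flag, the order in which the guards are evaluated, and the contents of the `medium` chain are all hypothetical. Only the thresholds (500 and 7000) and the priority order within default and think come from the description.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical identifiers; the priority order mirrors the bullets above.
CHAINS: Dict[str, List[str]] = {
    "trivial": ["cerebras/llama-3.1-8b"],
    "default": ["groq/llama-3.3-70b", "openrouter/deepseek-chat-v3:free"],
    # Assumption: medium reuses the default chain, which contains no Cerebras.
    "medium": ["groq/llama-3.3-70b", "openrouter/deepseek-chat-v3:free"],
    "think": ["openrouter/deepseek-r1:free", "paid/deepseek-r1"],
}


def pick_tier(input_tokens: int, max_tokens: int,
              wants_reasoning: bool = False) -> str:
    """Choose a tier; the medium guard runs before the trivial match."""
    if wants_reasoning:
        return "think"
    if input_tokens > 7000:   # min_input_tokens > 7000: keep off Cerebras
        return "medium"
    if max_tokens < 500:      # max_tokens_below = 500
        return "trivial"
    return "default"


def first_available(tier: str,
                    is_available: Callable[[str], bool]) -> Optional[str]:
    """Walk the tier's chain in priority order; None if everything is saturated."""
    for model in CHAINS[tier]:
        if is_available(model):
            return model
    return None
```

For example, `pick_tier(9000, 100)` returns `"medium"` even though the small `max_tokens` would otherwise match the trivial tier, and `first_available("think", ...)` falls through to the paid R1 entry only when the free variant is reported unavailable.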
Provider list expanded:
- **Cerebras** added as required free tier (cloud.cerebras.ai)
- **Gemini AI Studio** added but **DISABLED by default**. Free
tier exists (Flash 250 RPD, Pro 100 RPD, 1M ctx) but Google
trains on free-tier inputs/outputs. User must opt in
consciously by setting enabled = true.
Required + recommended accounts now documented at the top of the
file with rationale for each.
Cost estimates updated honestly:
Solo light (~1M tok/day) : €0/month
Solo moderate (~3M tok/day) : €0-2/month (was 1-3)
Solo heavy (~8M tok/day) : €3-10/month (was 5-15)
The 30-50% reduction in moderate/heavy estimates comes from
adding Cerebras's 1M tok/day for trivial+background and
OpenRouter :free models in default/think paths.
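The stated reduction can be checked against the old and new ranges with a few lines of arithmetic. This is a sketch only; the function names are hypothetical, and comparing range upper bounds and midpoints is one assumed way of reading "30-50%".

```python
from typing import Tuple

Range = Tuple[float, float]  # (low, high) in EUR per month

# (old range, new range), taken from the estimates above.
ESTIMATES = {
    "moderate": ((1.0, 3.0), (0.0, 2.0)),
    "heavy": ((5.0, 15.0), (3.0, 10.0)),
}


def reduction(old: float, new: float) -> float:
    """Fractional drop from old to new."""
    return (old - new) / old


def upper_bound_reduction(old: Range, new: Range) -> float:
    """Reduction measured on the top of each range."""
    return reduction(old[1], new[1])


def midpoint_reduction(old: Range, new: Range) -> float:
    """Reduction measured on the midpoint of each range."""
    return reduction(sum(old) / 2, sum(new) / 2)
```

On these figures the upper bounds drop by about 33% for both profiles, and the midpoints drop by 50% (moderate) and 35% (heavy), which is consistent with the 30-50% range quoted above.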
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
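Verified-against-source-docs rewrite of `ultra-cheap` to maximize free-tier coverage and correct rate limit figures from previous versions.
Corrected rate limits (sources verified 2026-04-27)
Groq : 30 RPM, 1 000 RPD, 6K TPM general (15K for Gemma 2 9B)
Cerebras : 30 RPM, 1M tokens/day, 60-100K TPM, 8 192 token context cap on free tier
OpenRouter: 50 RPD base / 1 000 RPD after lifetime $10 deposit, 20 RPM on `:free` variants
Routing changes
- Trivial / background p1: Cerebras Llama 3.1 8B free
- Default p1: Groq Llama 3.3 70B free
- Default p2: OpenRouter `deepseek-chat-v3:free`
- Think p1: OpenRouter `deepseek-r1:free`
New tiers
- `medium`: matches on `min_input_tokens > 7000` and excludes Cerebras
The `medium` tier is critical: Cerebras's 8K context cap on the free tier means any Claude Code request with >7K input would fail there. The tier guard keeps it off Cerebras even if `default-model` mappings would otherwise route there.
Documented accounts (config header)
Each account is justified with a rationale + URL in the file's preamble.
Updated cost estimates (honest)
Solo light (~1M tok/day) : €0/month
Solo moderate (~3M tok/day) : €0-2/month (was 1-3)
Solo heavy (~8M tok/day) : €3-10/month (was 5-15)
The 30-50% reduction in moderate/heavy comes from stacking Cerebras's 1M tok/day on top of Groq + OpenRouter :free models in the default/think chains.
Test plan
- `grob preset info ultra-cheap`: parses, 6 providers (4 enabled, 2 opt-in), 5 models, all router slots wired
- `grob preset apply ultra-cheap --reload` then trivial request hits Cerebras
🤖 Generated with Claude Code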