
refactor(presets): max ultra-cheap free tiers + correct rate limits#300

Merged
Destynova2 merged 1 commit into main from refactor/ultra-cheap-max-free-tiers
Apr 27, 2026
Conversation

@Destynova2
Contributor

Summary

A verified-against-source-docs rewrite of the ultra-cheap preset that maximizes free-tier coverage and corrects the rate-limit figures carried over from previous versions.

Corrected rate limits (sources verified 2026-04-27)

| Provider | Was (wrong) | Now (verified) | Source |
| --- | --- | --- | --- |
| Groq free | 14 400 RPD | 1 000 RPD, 30 RPM, 6K TPM (15K for Gemma 2 9B) | console.groq.com/docs/rate-limits |
| Cerebras free | not listed | 30 RPM, 1M tok/day, 8 192-token context cap on free tier | inference-docs.cerebras.ai/support/rate-limits |
| OpenRouter free | not listed | 50 RPD base / 1 000 RPD after $10 lifetime deposit, 20 RPM on `:free` | openrouter.ai/docs/api/reference/limits |
| Gemini free | not listed | Flash 250 RPD / Pro 100 RPD; trains on free-tier data | ai.google.dev/gemini-api/docs/rate-limits |

Routing changes

| Slot | Previous primary | New primary | Rationale |
| --- | --- | --- | --- |
| trivial | Groq gpt-oss-20b free | Cerebras Llama 3.1 8B free | 1M tok/day vs Groq's 1K RPD; fastest inference; 8K cap fits trivial |
| background | Groq gpt-oss-20b free | Cerebras Llama 3.1 8B free | same: capacity king for short prompts |
| default | Groq gpt-oss-120b free | Groq Llama 3.3 70B free | 131K ctx; fits 6K TPM better than gpt-oss-120b's tight RPM |
| default p2 | DeepSeek V4 Flash paid | OR deepseek-chat-v3:free | third free tier before any paid call |
| think p1 | DeepSeek R1 paid | OR deepseek-r1:free | free reasoning; paid R1 stays p2 |
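The slot chains above all follow the same pattern: exhaust free tiers in priority order before the first paid call. A minimal sketch of that fallback logic, assuming a quota-exhaustion signal per provider (names like `QuotaExhausted` and the chain entries are illustrative, not grob's actual API):

```python
class QuotaExhausted(Exception):
    """Raised when a provider's free-tier quota (RPD/RPM/TPM) is hit."""

def route(chain, prompt, call):
    """Try each (provider, model) in priority order; fall through on quota errors."""
    last_err = None
    for provider, model in chain:
        try:
            return call(provider, model, prompt)
        except QuotaExhausted as err:
            last_err = err  # this tier saturated; try the next entry
    raise RuntimeError("all providers in chain exhausted") from last_err

# The post-PR 'default' slot shape: two free tiers before any paid call.
DEFAULT_CHAIN = [
    ("groq", "llama-3.3-70b"),                # p1: free, 30 RPM / 6K TPM
    ("openrouter", "deepseek-chat-v3:free"),  # p2: free, 50-1000 RPD
    ("deepseek", "deepseek-chat"),            # p3: first paid fallback
]
```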

New tiers

```toml
# Long input >100K → only providers with that capacity
[[tiers]]
name = "complex"
providers = ["xai", "openrouter"]
[tiers.match]
min_input_tokens = 100000

# Medium input >7K → exclude Cerebras (8K ctx cap)
[[tiers]]
name = "medium"
providers = ["groq", "openrouter", "xai"]
[tiers.match]
min_input_tokens = 7000
```

The medium tier is critical: Cerebras's 8K context cap on the free tier means any Claude Code request with >7K input would fail there. The tier guard keeps it off Cerebras even if default-model mappings would otherwise route there.
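The guard described above can be sketched as follows. Field names mirror the TOML (`[tiers.match].min_input_tokens`); the matching order and the default chain are assumptions about how grob evaluates tiers, not its actual implementation:

```python
# Tier table mirroring the TOML fragments above.
TIERS = [
    {"name": "complex", "providers": ["xai", "openrouter"], "min_input_tokens": 100_000},
    {"name": "medium", "providers": ["groq", "openrouter", "xai"], "min_input_tokens": 7_000},
]

def eligible_providers(input_tokens, default=("cerebras", "groq", "openrouter")):
    """Return the provider list of the most restrictive matching tier,
    falling back to the default chain when no tier matches."""
    for tier in sorted(TIERS, key=lambda t: -t["min_input_tokens"]):
        if input_tokens >= tier["min_input_tokens"]:
            return tier["providers"]
    return list(default)
```

A short prompt stays eligible for Cerebras, while anything past the 7K threshold is routed to providers without the 8K context cap.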

Documented accounts (config header)

```
REQUIRED                : GROQ_API_KEY (free)
STRONGLY RECOMMENDED    : CEREBRAS_API_KEY (free), OPENROUTER_API_KEY (paid, $10 lifetime → 1K RPD)
RECOMMENDED for search  : XAI_API_KEY (paid only, ultra-cheap)
OPTIONAL opt-in         : GEMINI_API_KEY (free but Google trains on data)
OPTIONAL price floor    : NEBIUS_API_KEY (paid Llama 3.1 8B at $0.02/$0.06)
```

Each account is justified with a rationale + URL in the file's preamble.

Updated cost estimates (honest)

| Profile | Previous estimate | New estimate |
| --- | --- | --- |
| Solo light (~1M tok/day) | €0 | €0 |
| Solo moderate (~3M tok/day) | €1-3 | €0-2 |
| Solo heavy (~8M tok/day) | €5-15 | €3-10 |

The 30-50% reduction in moderate/heavy comes from stacking Cerebras's 1M tok/day on top of Groq + OpenRouter :free models in the default/think chains.
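A back-of-envelope sketch of that stacking claim, assuming only tokens beyond the combined free capacity are billed. The ~2M tok/day absorbed by the Groq + OpenRouter `:free` chains and the blended €0.05/Mtok paid rate are illustrative assumptions, not figures from the provider docs:

```python
def est_monthly_cost(daily_tokens, free_capacity, paid_price_per_mtok, days=30):
    """Bill only the daily overflow beyond stacked free tiers."""
    overflow = max(0, daily_tokens - free_capacity)
    return overflow / 1e6 * paid_price_per_mtok * days

# Stacked free capacity: Cerebras 1M tok/day (from its docs) plus an
# assumed ~2M tok/day absorbed by Groq + OpenRouter :free chains.
free = 1_000_000 + 2_000_000
heavy = est_monthly_cost(8_000_000, free, paid_price_per_mtok=0.05)  # ~€7.5/month
light = est_monthly_cost(1_000_000, free, paid_price_per_mtok=0.05)  # €0
```

Under these assumptions the heavy profile lands inside the €3-10 band and the light profile stays at €0, consistent with the table above.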

Test plan

- `grob preset info ultra-cheap` — parses; 6 providers (4 enabled, 2 opt-in), 5 models, all router slots wired
- No hardcoded version strings (CI guard)
- Tier guard prevents Cerebras from receiving >7K input prompts
- Account requirements explicitly documented at file head
- Docs lint passes in CI
- Smoke after merge: `grob preset apply ultra-cheap --reload`, then a trivial request hits Cerebras

🤖 Generated with Claude Code

Verified-against-source-docs rewrite (April 2026) of the ultra-cheap
preset to maximize free-tier coverage and correct earlier wrong figures.

Corrected rate limits (was wrong in previous version):
  Groq      : 30 RPM, **1 000 RPD** (NOT 14 400 — old number),
              6K TPM general (15K for Gemma 2 9B). No training.
  Cerebras  : 30 RPM, 1M tokens/day, 60-100K TPM,
              **8 192 token context cap on free tier**.
              No training.
  OpenRouter: 50 RPD base / 1 000 RPD after lifetime $10 deposit,
              20 RPM on `:free` variants.

Sources verified 2026-04-27:
  https://console.groq.com/docs/rate-limits
  https://inference-docs.cerebras.ai/support/rate-limits
  https://openrouter.ai/docs/api/reference/limits

Routing changes vs previous ultra-cheap:

- Trivial / background now hit **Cerebras Llama 3.1 8B free** as
  priority 1 (1M tok/day, fastest inference). Cerebras's 8K input
  cap is naturally accommodated since trivial tier matches
  max_tokens_below = 500 and short-prompt background tasks fit too.
- Default now leads with **Groq Llama 3.3 70B free** (was
  gpt-oss-120b which has tighter free-tier RPM). Llama 3.3 70B
  fits the 30 RPM / 6K TPM general limit and offers 131K native
  context — fits Claude Code's typical agentic prompts.
- Default p2 added **OpenRouter `deepseek-chat-v3:free`** (third
  free tier). Lift from 50 to 1 000 RPD requires one-time $10
  OpenRouter deposit.
- Think p1 = **OpenRouter `deepseek-r1:free`** (free reasoning).
  Falls back to paid R1 only when free quota saturates.
- Added explicit `medium` tier on `min_input_tokens > 7000` to
  exclude Cerebras (8K context cap) — keeps big-prompt requests
  off Cerebras even if they happen to hit a `default-model`
  fallback that would otherwise route there.

Provider list expanded:
- **Cerebras** added as required free tier (cloud.cerebras.ai)
- **Gemini AI Studio** added but **DISABLED by default**. Free
  tier exists (Flash 250 RPD, Pro 100 RPD, 1M ctx) but Google
  trains on free-tier inputs/outputs. User must opt in
  consciously by setting enabled = true.

Required + recommended accounts now documented at the top of the
file with rationale for each.

Cost estimates updated honestly:
  Solo light (~1M tok/day)        : €0/month
  Solo moderate (~3M tok/day)     : €0-2/month (was 1-3)
  Solo heavy (~8M tok/day)        : €3-10/month (was 5-15)

The 30-50% reduction in moderate/heavy estimates comes from
adding Cerebras's 1M tok/day for trivial+background and
OpenRouter :free models in default/think paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Destynova2 Destynova2 enabled auto-merge April 27, 2026 20:17
@Destynova2 Destynova2 merged commit d24cda1 into main Apr 27, 2026
27 of 28 checks passed
@Destynova2 Destynova2 deleted the refactor/ultra-cheap-max-free-tiers branch April 27, 2026 20:18