
refactor(presets): max ultra-cheap free tiers + correct rate limits#300

Merged
Destynova2 merged 1 commit into main from refactor/ultra-cheap-max-free-tiers
Apr 27, 2026
Conversation

@Destynova2
Contributor

Summary

A verified-against-source-docs rewrite of the ultra-cheap preset that maximizes free-tier coverage and corrects the rate-limit figures carried over from previous versions.

Corrected rate limits (sources verified 2026-04-27)

| Provider | Was (wrong) | Now (verified) | Source |
| --- | --- | --- | --- |
| Groq free | 14 400 RPD | 1 000 RPD, 30 RPM, 6K TPM (15K for Gemma 2 9B) | console.groq.com/docs/rate-limits |
| Cerebras free | not listed | 30 RPM, 1M tok/day, 8 192-token context cap on free tier | inference-docs.cerebras.ai/support/rate-limits |
| OpenRouter free | not listed | 50 RPD base / 1 000 RPD after $10 lifetime deposit, 20 RPM on `:free` | openrouter.ai/docs/api/reference/limits |
| Gemini free | not listed | Flash 250 RPD / Pro 100 RPD; trains on free-tier data | ai.google.dev/gemini-api/docs/rate-limits |

Routing changes

| Slot | Previous primary | New primary | Rationale |
| --- | --- | --- | --- |
| trivial | Groq gpt-oss-20b free | Cerebras Llama 3.1 8B free | 1M tok/day vs Groq's 1K RPD; fastest inference; 8K cap fits trivial |
| background | Groq gpt-oss-20b free | Cerebras Llama 3.1 8B free | same: capacity king for short prompts |
| default | Groq gpt-oss-120b free | Groq Llama 3.3 70B free | 131K ctx; fits 6K TPM better than gpt-oss-120b's tight RPM |
| default p2 | DeepSeek V4 Flash paid | OR deepseek-chat-v3:free | third free tier before any paid call |
| think p1 | DeepSeek R1 paid | OR deepseek-r1:free | free reasoning; paid R1 stays p2 |
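The slot chains above all follow the same pattern: exhaust free tiers in priority order before the first paid call. A minimal sketch of that fallback logic, assuming a quota-exhaustion signal per provider (names like `QuotaExhausted` and the chain entries are illustrative, not grob's actual API):

```python
class QuotaExhausted(Exception):
    """Raised when a provider's free-tier quota (RPD/RPM/TPM) is hit."""

def route(chain, prompt, call):
    """Try each (provider, model) in priority order; fall through on quota errors."""
    last_err = None
    for provider, model in chain:
        try:
            return call(provider, model, prompt)
        except QuotaExhausted as err:
            last_err = err  # this tier saturated; try the next entry
    raise RuntimeError("all providers in chain exhausted") from last_err

# The post-PR 'default' slot shape: two free tiers before any paid call.
DEFAULT_CHAIN = [
    ("groq", "llama-3.3-70b"),                # p1: free, 30 RPM / 6K TPM
    ("openrouter", "deepseek-chat-v3:free"),  # p2: free, 50-1000 RPD
    ("deepseek", "deepseek-chat"),            # p3: first paid fallback
]
```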

New tiers

```toml
# Long input >100K → only providers with that capacity
[[tiers]]
name = "complex"
providers = ["xai", "openrouter"]
[tiers.match]
min_input_tokens = 100000

# Medium input >7K → exclude Cerebras (8K ctx cap)
[[tiers]]
name = "medium"
providers = ["groq", "openrouter", "xai"]
[tiers.match]
min_input_tokens = 7000
```

The medium tier is critical: Cerebras's 8K context cap on the free tier means any Claude Code request with >7K input would fail there. The tier guard keeps it off Cerebras even if default-model mappings would otherwise route there.
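The guard described above can be sketched as follows. Field names mirror the TOML (`[tiers.match].min_input_tokens`); the matching order and the default chain are assumptions about how grob evaluates tiers, not its actual implementation:

```python
# Tier table mirroring the TOML fragments above.
TIERS = [
    {"name": "complex", "providers": ["xai", "openrouter"], "min_input_tokens": 100_000},
    {"name": "medium", "providers": ["groq", "openrouter", "xai"], "min_input_tokens": 7_000},
]

def eligible_providers(input_tokens, default=("cerebras", "groq", "openrouter")):
    """Return the provider list of the most restrictive matching tier,
    falling back to the default chain when no tier matches."""
    for tier in sorted(TIERS, key=lambda t: -t["min_input_tokens"]):
        if input_tokens >= tier["min_input_tokens"]:
            return tier["providers"]
    return list(default)
```

A short prompt stays eligible for Cerebras, while anything past the 7K threshold is routed to providers without the 8K context cap.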

Documented accounts (config header)

```
REQUIRED                : GROQ_API_KEY (free)
STRONGLY RECOMMENDED    : CEREBRAS_API_KEY (free), OPENROUTER_API_KEY (paid, $10 lifetime → 1K RPD)
RECOMMENDED for search  : XAI_API_KEY (paid only, ultra-cheap)
OPTIONAL opt-in         : GEMINI_API_KEY (free but Google trains on data)
OPTIONAL price floor    : NEBIUS_API_KEY (paid Llama 3.1 8B at $0.02/$0.06)
```

Each account is justified with a rationale + URL in the file's preamble.

Updated cost estimates (honest)

| Profile | Previous estimate | New estimate |
| --- | --- | --- |
| Solo light (~1M tok/day) | €0 | €0 |
| Solo moderate (~3M tok/day) | €1-3 | €0-2 |
| Solo heavy (~8M tok/day) | €5-15 | €3-10 |

The 30-50% reduction in moderate/heavy comes from stacking Cerebras's 1M tok/day on top of Groq + OpenRouter :free models in the default/think chains.
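A back-of-envelope sketch of that stacking claim, assuming only tokens beyond the combined free capacity are billed. The ~2M tok/day absorbed by the Groq + OpenRouter `:free` chains and the blended €0.05/Mtok paid rate are illustrative assumptions, not figures from the provider docs:

```python
def est_monthly_cost(daily_tokens, free_capacity, paid_price_per_mtok, days=30):
    """Bill only the daily overflow beyond stacked free tiers."""
    overflow = max(0, daily_tokens - free_capacity)
    return overflow / 1e6 * paid_price_per_mtok * days

# Stacked free capacity: Cerebras 1M tok/day (from its docs) plus an
# assumed ~2M tok/day absorbed by Groq + OpenRouter :free chains.
free = 1_000_000 + 2_000_000
heavy = est_monthly_cost(8_000_000, free, paid_price_per_mtok=0.05)  # ~€7.5/month
light = est_monthly_cost(1_000_000, free, paid_price_per_mtok=0.05)  # €0
```

Under these assumptions the heavy profile lands inside the €3-10 band and the light profile stays at €0, consistent with the table above.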

Test plan

- `grob preset info ultra-cheap` — parses; 6 providers (4 enabled, 2 opt-in), 5 models, all router slots wired
- No hardcoded version strings (CI guard)
- Tier guard prevents Cerebras from receiving >7K input prompts
- Account requirements explicitly documented at file head
- Docs lint passes in CI
- Smoke after merge: `grob preset apply ultra-cheap --reload`, then a trivial request hits Cerebras

🤖 Generated with Claude Code

Verified-against-source-docs rewrite (April 2026) of the ultra-cheap
preset to maximize free-tier coverage and correct earlier wrong figures.

Corrected rate limits (was wrong in previous version):
  Groq      : 30 RPM, **1 000 RPD** (NOT 14 400 — old number),
              6K TPM general (15K for Gemma 2 9B). No training.
  Cerebras  : 30 RPM, 1M tokens/day, 60-100K TPM,
              **8 192 token context cap on free tier**.
              No training.
  OpenRouter: 50 RPD base / 1 000 RPD after lifetime $10 deposit,
              20 RPM on `:free` variants.

Sources verified 2026-04-27:
  https://console.groq.com/docs/rate-limits
  https://inference-docs.cerebras.ai/support/rate-limits
  https://openrouter.ai/docs/api/reference/limits

Routing changes vs previous ultra-cheap:

- Trivial / background now hit **Cerebras Llama 3.1 8B free** as
  priority 1 (1M tok/day, fastest inference). Cerebras's 8K input
  cap is naturally accommodated since trivial tier matches
  max_tokens_below = 500 and short-prompt background tasks fit too.
- Default now leads with **Groq Llama 3.3 70B free** (was
  gpt-oss-120b which has tighter free-tier RPM). Llama 3.3 70B
  fits the 30 RPM / 6K TPM general limit and offers 131K native
  context — fits Claude Code's typical agentic prompts.
- Default p2 added **OpenRouter `deepseek-chat-v3:free`** (third
  free tier). Lift from 50 to 1 000 RPD requires one-time $10
  OpenRouter deposit.
- Think p1 = **OpenRouter `deepseek-r1:free`** (free reasoning).
  Falls back to paid R1 only when free quota saturates.
- Added explicit `medium` tier on `min_input_tokens > 7000` to
  exclude Cerebras (8K context cap) — keeps big-prompt requests
  off Cerebras even if they happen to hit a `default-model`
  fallback that would otherwise route there.

Provider list expanded:
- **Cerebras** added as required free tier (cloud.cerebras.ai)
- **Gemini AI Studio** added but **DISABLED by default**. Free
  tier exists (Flash 250 RPD, Pro 100 RPD, 1M ctx) but Google
  trains on free-tier inputs/outputs. User must opt in
  consciously by setting enabled = true.

Required + recommended accounts now documented at the top of the
file with rationale for each.

Cost estimates updated honestly:
  Solo light (~1M tok/day)        : €0/month
  Solo moderate (~3M tok/day)     : €0-2/month (was 1-3)
  Solo heavy (~8M tok/day)        : €3-10/month (was 5-15)

The 30-50% reduction in moderate/heavy estimates comes from
adding Cerebras's 1M tok/day for trivial+background and
OpenRouter :free models in default/think paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Destynova2 Destynova2 enabled auto-merge April 27, 2026 20:17
@Destynova2 Destynova2 merged commit d24cda1 into main Apr 27, 2026
27 of 28 checks passed
@Destynova2 Destynova2 deleted the refactor/ultra-cheap-max-free-tiers branch April 27, 2026 20:18