
fix(pricing,profile): add 2026 models + revert cache values to code defaults#274

Merged
Destynova2 merged 1 commit into main from fix/pricing-and-cache-defaults on Apr 25, 2026
Conversation

@Destynova2
Contributor

Summary

Two unrelated cleanups (small enough to bundle):

1. Pricing — add 6 model entries actually in use today (April 2026)

Without these entries, the spend tracker either returned the wrong cost (fuzzy-matching to an older model version) or None (no cost recorded at all):

| Model | Input $/M | Output $/M | Source |
| --- | --- | --- | --- |
| MiniMax-M2.5 | 0.30 | 1.20 | MiniMax news, Feb 2026 |
| MiniMax-M2.5-Lightning | 0.30 | 2.40 | 2x throughput, 2x output cost |
| glm-4.7-flash | 0.00 | 0.00 | Z.ai docs (free tier) |
| llama-3.3-70b-versatile | 0.59 | 0.79 | Groq versatile endpoint |
| mercury-coder-small | 0.25 | 1.25 | Inception API |
| gemini-2.5-pro | 1.25 | 5.00 | Google AI Studio |

Both MiniMax-M2.5 and lowercase minimax-m2.5 are listed (mirrors the existing M2 entry style).
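For anyone skimming rather than reading the diff, a minimal sketch of what these entries amount to, assuming a plain lookup-table design. `ModelPricing`, `pricing_for`, and `cost_usd` are illustrative names, not the actual API in src/pricing.rs:

```rust
/// Illustrative sketch only: struct and function names are assumptions,
/// not the repo's actual pricing module.
pub struct ModelPricing {
    pub input_per_mtok: f64,  // USD per million input tokens
    pub output_per_mtok: f64, // USD per million output tokens
}

pub fn pricing_for(model: &str) -> Option<ModelPricing> {
    let (input, output) = match model {
        "MiniMax-M2.5" | "minimax-m2.5" => (0.30, 1.20),
        "MiniMax-M2.5-Lightning" => (0.30, 2.40),
        "glm-4.7-flash" => (0.0, 0.0), // Z.ai free tier
        "llama-3.3-70b-versatile" => (0.59, 0.79),
        "mercury-coder-small" => (0.25, 1.25),
        "gemini-2.5-pro" => (1.25, 5.00),
        _ => return None, // unknown model: record no cost rather than guess
    };
    Some(ModelPricing {
        input_per_mtok: input,
        output_per_mtok: output,
    })
}

/// Cost in USD for a single request.
pub fn cost_usd(p: &ModelPricing, input_toks: u64, output_toks: u64) -> f64 {
    (input_toks as f64 * p.input_per_mtok
        + output_toks as f64 * p.output_per_mtok)
        / 1_000_000.0
}
```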

2. Profile cache values — revert to code defaults

PR #273 bumped these without empirical data. Reverting:

| Key | PR #273 | This PR |
| --- | --- | --- |
| max_capacity | 10000 | 2000 (default) |
| ttl_secs | 14400 | 3600 (default) |
| simhash_threshold | 4 | 3 (default) |
| max_entry_bytes | 4 MiB | 4 MiB (kept) |

The profile comment now reflects reality: the cache hits primarily on temperature=0 work (background tasks, summaries, MCP tools/list, tests). For normal user conversations the hit rate is ~0% because the cache key includes the full message history. Operators should measure grob_cache_hits_total / total via Prometheus before bumping.
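To make the revert concrete, here is roughly what the defaults look like as a config struct. Field names are assumptions for illustration, not the actual profile schema; the values are the code defaults this PR restores:

```rust
/// Illustrative only: field names are assumptions; the values are the
/// code defaults this PR reverts to.
pub struct CacheConfig {
    pub max_capacity: u64,      // max cached entries
    pub ttl_secs: u64,          // entry lifetime in seconds
    pub simhash_threshold: u32, // max Hamming distance for a fuzzy hit
    pub max_entry_bytes: usize, // per-entry size cap
}

impl Default for CacheConfig {
    fn default() -> Self {
        Self {
            max_capacity: 2_000,
            ttl_secs: 3_600,
            simhash_threshold: 3,
            max_entry_bytes: 4 * 1024 * 1024, // 4 MiB
        }
    }
}
```

The shorter TTL also bounds how long a stale completion can survive after a provider updates a model behind the same name.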

Test plan

- cargo test --lib pricing::tests — 7 new tests pass (a sketch of one such test follows this list)
- cargo clippy --lib --tests -- -D warnings — clean
- All prek pre-commit and pre-push hooks green
- CI (auto-merge enabled below)
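As referenced above, a hypothetical shape for one of the new pricing tests, reusing the illustrative `pricing_for`/`cost_usd` sketch from earlier (not the repo's actual test code):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn minimax_m2_5_pricing() {
        let p = pricing_for("MiniMax-M2.5").expect("entry must exist");
        // 1M input + 1M output tokens should cost $0.30 + $1.20 = $1.50.
        assert!((cost_usd(&p, 1_000_000, 1_000_000) - 1.50).abs() < 1e-9);
    }

    #[test]
    fn unknown_model_records_no_cost() {
        // No fuzzy fallback: an unknown model must yield None.
        assert!(pricing_for("not-a-real-model").is_none());
    }
}
```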

fix(pricing,profile): add 2026 models + revert cache values to code defaults

Pricing additions (6 models silently fuzzy-matched to older versions or
returned None — spend tracking was lying for half the full-opti-v5 profile):

- MiniMax-M2.5            $0.30/M in, $1.20/M out  (released 2026-02-12)
- MiniMax-M2.5-Lightning  $0.30/M in, $2.40/M out  (2x throughput variant)
- glm-4.7-flash           free tier                (Z.ai, released 2026-01-19)
- llama-3.3-70b-versatile $0.59/M in, $0.79/M out  (Groq endpoint)
- mercury-coder-small     $0.25/M in, $1.25/M out  (Inception diffusion LLM)
- gemini-2.5-pro          $1.25/M in, $5.00/M out  (Google)

Profile cache revert: PR #273 bumped values without empirical data:
- max_capacity      10000 → 2000  (default; 5x less RAM, no measured benefit)
- ttl_secs          14400 → 3600  (default; avoids staleness after provider updates)
- simhash_threshold 4 → 3         (default; better precision)
- max_entry_bytes   stays at 4 MiB

The cache hits primarily on temperature=0 work (background tasks,
summaries, MCP tools/list, tests). For normal user conversations it
is ~0% because the key includes the full message history. The profile
comment reflects this.

7 unit tests added in src/pricing.rs (one per new model, plus one for cost calculation).
Destynova2 merged commit b010ded into main on Apr 25, 2026
42 checks passed
Destynova2 deleted the fix/pricing-and-cache-defaults branch on April 25, 2026 at 22:34