
fix(pricing,profile): add 2026 models + revert cache values to code defaults#274

Merged
Destynova2 merged 1 commit into main from fix/pricing-and-cache-defaults on Apr 25, 2026
Conversation

@Destynova2
Contributor

Summary

Two unrelated cleanups (small enough to bundle):

1. Pricing — add 6 model entries actually in use today (April 2026)

Without these entries, the spend tracker either returned the wrong cost (fuzzy-matching to an older model version) or None (no cost recorded at all):

| Model | Input $/M | Output $/M | Source |
| --- | --- | --- | --- |
| MiniMax-M2.5 | 0.30 | 1.20 | MiniMax news, Feb 2026 |
| MiniMax-M2.5-Lightning | 0.30 | 2.40 | 2x throughput, 2x output cost |
| glm-4.7-flash | 0.00 | 0.00 | Z.ai docs (free tier) |
| llama-3.3-70b-versatile | 0.59 | 0.79 | Groq versatile endpoint |
| mercury-coder-small | 0.25 | 1.25 | Inception API |
| gemini-2.5-pro | 1.25 | 5.00 | Google AI Studio |

Both MiniMax-M2.5 and lowercase minimax-m2.5 are listed (mirrors the existing M2 entry style).
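For anyone skimming rather than reading the diff, a minimal sketch of what these entries amount to, assuming a plain lookup-table design. `ModelPricing`, `pricing_for`, and `cost_usd` are illustrative names, not the actual API in src/pricing.rs:

```rust
/// Illustrative sketch only: struct and function names are assumptions,
/// not the repo's actual pricing module.
pub struct ModelPricing {
    pub input_per_mtok: f64,  // USD per million input tokens
    pub output_per_mtok: f64, // USD per million output tokens
}

pub fn pricing_for(model: &str) -> Option<ModelPricing> {
    let (input, output) = match model {
        "MiniMax-M2.5" | "minimax-m2.5" => (0.30, 1.20),
        "MiniMax-M2.5-Lightning" => (0.30, 2.40),
        "glm-4.7-flash" => (0.0, 0.0), // Z.ai free tier
        "llama-3.3-70b-versatile" => (0.59, 0.79),
        "mercury-coder-small" => (0.25, 1.25),
        "gemini-2.5-pro" => (1.25, 5.00),
        _ => return None, // unknown model: record no cost rather than guess
    };
    Some(ModelPricing {
        input_per_mtok: input,
        output_per_mtok: output,
    })
}

/// Cost in USD for a single request.
pub fn cost_usd(p: &ModelPricing, input_toks: u64, output_toks: u64) -> f64 {
    (input_toks as f64 * p.input_per_mtok
        + output_toks as f64 * p.output_per_mtok)
        / 1_000_000.0
}
```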

2. Profile cache values — revert to code defaults

PR #273 bumped these without empirical data. Reverting:

| Key | PR #273 | This PR |
| --- | --- | --- |
| max_capacity | 10000 | 2000 (default) |
| ttl_secs | 14400 | 3600 (default) |
| simhash_threshold | 4 | 3 (default) |
| max_entry_bytes | 4 MiB | 4 MiB (kept) |

The profile comment now reflects reality: the cache hits primarily on temperature=0 work (background tasks, summaries, MCP tools/list, tests). For normal user conversations the hit rate is ~0% because the cache key includes the full message history. Operators should measure grob_cache_hits_total / total via Prometheus before bumping.
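To make the revert concrete, here is roughly what the defaults look like as a config struct. Field names are assumptions for illustration, not the actual profile schema; the values are the code defaults this PR restores:

```rust
/// Illustrative only: field names are assumptions; the values are the
/// code defaults this PR reverts to.
pub struct CacheConfig {
    pub max_capacity: u64,      // max cached entries
    pub ttl_secs: u64,          // entry lifetime in seconds
    pub simhash_threshold: u32, // max Hamming distance for a fuzzy hit
    pub max_entry_bytes: usize, // per-entry size cap
}

impl Default for CacheConfig {
    fn default() -> Self {
        Self {
            max_capacity: 2_000,
            ttl_secs: 3_600,
            simhash_threshold: 3,
            max_entry_bytes: 4 * 1024 * 1024, // 4 MiB
        }
    }
}
```

The shorter TTL also bounds how long a stale completion can survive after a provider updates a model behind the same name.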

Test plan

- cargo test --lib pricing::tests — 7 new tests pass (a sketch of one such test follows this list)
- cargo clippy --lib --tests -- -D warnings — clean
- All prek pre-commit and pre-push hooks green
- CI (auto-merge enabled below)
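As referenced above, a hypothetical shape for one of the new pricing tests, reusing the illustrative `pricing_for`/`cost_usd` sketch from earlier (not the repo's actual test code):

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn minimax_m2_5_pricing() {
        let p = pricing_for("MiniMax-M2.5").expect("entry must exist");
        // 1M input + 1M output tokens should cost $0.30 + $1.20 = $1.50.
        assert!((cost_usd(&p, 1_000_000, 1_000_000) - 1.50).abs() < 1e-9);
    }

    #[test]
    fn unknown_model_records_no_cost() {
        // No fuzzy fallback: an unknown model must yield None.
        assert!(pricing_for("not-a-real-model").is_none());
    }
}
```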

fix(pricing,profile): add 2026 models + revert cache values to code defaults

Pricing additions (6 models silently fuzzy-matched to older versions or
returned None — spend tracking was lying for half the full-opti-v5 profile):

- MiniMax-M2.5            $0.30/M in, $1.20/M out  (released 2026-02-12)
- MiniMax-M2.5-Lightning  $0.30/M in, $2.40/M out  (2x throughput variant)
- glm-4.7-flash           free tier                (Z.ai, released 2026-01-19)
- llama-3.3-70b-versatile $0.59/M in, $0.79/M out  (Groq endpoint)
- mercury-coder-small     $0.25/M in, $1.25/M out  (Inception diffusion LLM)
- gemini-2.5-pro          $1.25/M in, $5.00/M out  (Google)

Profile cache revert: PR #273 bumped values without empirical data:
- max_capacity      10000 → 2000  (default; 5x less RAM, no measured benefit)
- ttl_secs          14400 → 3600  (default; avoids staleness after provider updates)
- simhash_threshold 4 → 3         (default; better precision)
- max_entry_bytes   stays at 4 MiB

The cache hits primarily on temperature=0 work (background tasks,
summaries, MCP tools/list, tests). For normal user conversations it
is ~0% because the key includes the full message history. The profile
comment reflects this.

7 unit tests added in src/pricing.rs (one per new model, plus one for cost calculation).
Destynova2 merged commit b010ded into main on Apr 25, 2026
42 checks passed
Destynova2 deleted the fix/pricing-and-cache-defaults branch on April 25, 2026 at 22:34