Summary
llm-gateway now does route-class-aware provider/model selection and adds durable response/prefix caching with TTL. We need llm-providers to expose stronger provider-native model/caching capabilities so gateway policy can stay thin and avoid hardcoded provider quirks.
Why this matters
The current gateway integration has to hardcode Groq/Cerebras behavior to get good results (model picks by route class, tool stripping on cheap routes). That logic belongs primarily in llm-providers so that all clients benefit and policy remains centralized.
Requested enhancements in llm-providers
- Route/workload-aware model defaults
- Add a first-class API for model selection by workload class (e.g. summary, planning, code_draft, long_context, tool_loop) instead of a single generic default.
- Keep provider-specific best models configurable without per-consumer hardcoding.
- Provider-native prompt caching normalization
- Ensure supported providers expose prompt-cache controls and consistently report cache usage.
- Normalize usage fields across providers to include (when available): cachedInputTokens, cacheReadInputTokens, cacheWriteInputTokens.
- Model capability metadata surface
- Expose capability metadata per model (tools, streaming, context window, reasoning profile, structured-output reliability) so router layers can make policy decisions without static registries.
- Stable override contract
- Provide a documented override path (env/config/hook) for consumers to set provider+workload model preferences without forking selection logic.
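To make the workload-default and override requests concrete, here is a minimal TypeScript sketch of one possible shape. Everything in it is hypothetical: `WorkloadClass`, `ModelOverrides`, `resolveModel`, and the placeholder model IDs are illustrative names, not an existing llm-providers API. It assumes a simple precedence of consumer override, then provider workload default, then provider fallback.

```typescript
// Hypothetical sketch: none of these names exist in llm-providers today.
type WorkloadClass =
  | "summary"
  | "planning"
  | "code_draft"
  | "long_context"
  | "tool_loop";

// Consumer-supplied preferences, keyed by provider and then workload class.
type ModelOverrides = Record<string, Partial<Record<WorkloadClass, string>>>;

interface ProviderDefaults {
  fallback: string; // used when no workload-specific default exists
  byWorkload: Partial<Record<WorkloadClass, string>>;
}

// Placeholder model IDs for illustration only, not real recommendations.
const builtInDefaults: Record<string, ProviderDefaults> = {
  groq: {
    fallback: "groq-general-model",
    byWorkload: { summary: "groq-small-model", tool_loop: "groq-tool-model" },
  },
  cerebras: {
    fallback: "cerebras-general-model",
    byWorkload: { long_context: "cerebras-long-model" },
  },
};

// Precedence: consumer override, then provider workload default, then fallback.
function resolveModel(
  provider: string,
  workload: WorkloadClass,
  overrides: ModelOverrides = {},
): string {
  const defaults = builtInDefaults[provider];
  if (defaults === undefined) {
    throw new Error(`Unknown provider: ${provider}`);
  }
  return (
    overrides[provider]?.[workload] ??
    defaults.byWorkload[workload] ??
    defaults.fallback
  );
}
```

With something like this, the gateway would call `resolveModel("groq", "summary")` per route class and pass its own preferences as `overrides` (loaded from env or config) rather than maintaining a separate per-provider model map.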
Gateway-side context (already implemented)
- Per-route provider model overrides for Groq/Cerebras.
- Response cache with TTL and max-entries eviction for safe routes.
- Cache-hit/usage telemetry integration.
Acceptance criteria
- A gateway consumer can request a provider's default model by workload intent and get a stable, provider-tuned result.
- Cache usage fields are normalized enough that gateway telemetry does not need provider-specific parsing.
- Consumers no longer need hardcoded per-provider model maps for basic routing quality.
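As a rough illustration of the cache-usage criterion, the sketch below normalizes two raw usage shapes into the three requested fields. The raw shapes are loosely modeled on OpenAI-style and Anthropic-style usage payloads and should be checked against each provider's documentation. `NormalizedUsage` and `normalizeUsage` are hypothetical names, and treating `cachedInputTokens` as "input tokens served from cache" is an assumption, since the field semantics are not defined here.

```typescript
// Hypothetical normalized shape; field semantics are an assumption.
interface NormalizedUsage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number; // assumed: input tokens served from cache
  cacheReadInputTokens?: number; // assumed: tokens read from a prompt cache
  cacheWriteInputTokens?: number; // assumed: tokens written to a prompt cache
}

// Illustrative raw shapes; verify field names against provider docs.
interface OpenAiStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface AnthropicStyleUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

type RawUsage = OpenAiStyleUsage | AnthropicStyleUsage;

// Cache fields are set only when the provider reported them ("when available").
function normalizeUsage(raw: RawUsage): NormalizedUsage {
  if ("prompt_tokens" in raw) {
    // Token counts are passed through unchanged; providers may differ on
    // whether the input count already includes cached tokens.
    const usage: NormalizedUsage = {
      inputTokens: raw.prompt_tokens,
      outputTokens: raw.completion_tokens,
    };
    const cached = raw.prompt_tokens_details?.cached_tokens;
    if (cached !== undefined) {
      usage.cachedInputTokens = cached;
    }
    return usage;
  }

  const usage: NormalizedUsage = {
    inputTokens: raw.input_tokens,
    outputTokens: raw.output_tokens,
  };
  if (raw.cache_read_input_tokens !== undefined) {
    usage.cacheReadInputTokens = raw.cache_read_input_tokens;
    usage.cachedInputTokens = raw.cache_read_input_tokens;
  }
  if (raw.cache_creation_input_tokens !== undefined) {
    usage.cacheWriteInputTokens = raw.cache_creation_input_tokens;
  }
  return usage;
}
```

If llm-providers returned a shape like `NormalizedUsage` from every provider, gateway telemetry could read the cache fields directly and treat a missing field as "not reported" instead of branching on the provider.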