Features
- Use conservative `max_completion_tokens` defaults (8192) to prevent premature rate limiting
  - Cerebras rate limiter estimates quota based on `max_completion_tokens` upfront, not actual usage
  - Lower defaults preserve rate limit headroom for agentic tools
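
To illustrate why the lower default matters, here is a minimal sketch of upfront quota reservation. It assumes each request reserves its prompt tokens plus its full `max_completion_tokens` against a per-minute token budget; the exact formula, the `requests_admitted` helper, and the budget figure are all hypothetical and not taken from Cerebras documentation.

```python
DEFAULT_MAX_COMPLETION_TOKENS = 8192  # default named in these release notes


def requests_admitted(
    budget_tokens: int,
    prompt_tokens: int,
    max_completion_tokens: int = DEFAULT_MAX_COMPLETION_TOKENS,
) -> int:
    """Count how many requests fit in a token budget when each request
    reserves its worst-case size upfront (assumed reservation model)."""
    reserved_per_request = prompt_tokens + max_completion_tokens
    return budget_tokens // reserved_per_request


# Hypothetical numbers, chosen only to show the effect.
HYPOTHETICAL_BUDGET = 1_000_000  # tokens per minute (assumption)
PROMPT_TOKENS = 2_000            # typical prompt size (assumption)

with_large_cap = requests_admitted(HYPOTHETICAL_BUDGET, PROMPT_TOKENS, 65_536)
with_default_cap = requests_admitted(HYPOTHETICAL_BUDGET, PROMPT_TOKENS)
print(with_large_cap, with_default_cap)
```

Under these assumptions, the same budget admits several times as many requests with the 8192 default as with a 65536 cap, even if the actual completions are short in both cases.
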
Fixes
- Update `llama-3.3-70b`: maxInputTokens to 131072, maxOutputTokens to 65536
- Update `qwen-3-235b-a22b-instruct-2507`: maxOutputTokens to 40960
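
The updated limits could be combined with the conservative default roughly as follows. This is a sketch only: the `MODEL_LIMITS` table layout and the `resolve_max_completion_tokens` helper are hypothetical names, not the project's actual code, and only the limits stated in these notes are filled in.

```python
from typing import Optional

DEFAULT_MAX_COMPLETION_TOKENS = 8192  # conservative default from these notes

# Limits as listed in this release; other fields are intentionally omitted.
MODEL_LIMITS = {
    "llama-3.3-70b": {"maxInputTokens": 131072, "maxOutputTokens": 65536},
    "qwen-3-235b-a22b-instruct-2507": {"maxOutputTokens": 40960},
}


def resolve_max_completion_tokens(model: str, requested: Optional[int] = None) -> int:
    """Use the caller's value if given, else the default, and never
    exceed the model's maxOutputTokens (hypothetical clamping policy)."""
    limit = MODEL_LIMITS[model]["maxOutputTokens"]
    wanted = DEFAULT_MAX_COMPLETION_TOKENS if requested is None else requested
    return min(wanted, limit)
```

For example, a caller who explicitly asks for 100000 tokens on `qwen-3-235b-a22b-instruct-2507` would be clamped to 40960, while a caller who passes nothing gets 8192.
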