v0.1.18 - Rate Limit Optimization & Model Updates

sebastiand-cerebras released this on 08 Dec 21:35 (commit fb58b3c)

Features

  • Use a conservative default of 8192 for max_completion_tokens to avoid premature rate limiting
    • The Cerebras rate limiter reserves quota based on the requested max_completion_tokens when a request is made, not on the tokens actually generated
    • Lower defaults therefore leave more rate-limit headroom for agentic tools
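A minimal sketch of why the default matters, assuming a simple fixed-window limiter that charges prompt tokens plus the full max_completion_tokens at admission time. The class, the 200,000-token window quota, and the 2,000-token prompt size are hypothetical illustration values, not Cerebras's actual limits or implementation.

```python
from dataclasses import dataclass


@dataclass
class UpfrontTokenLimiter:
    """Hypothetical fixed-window limiter that charges the *requested* budget.

    Models the behavior described above: the reservation is based on
    max_completion_tokens at admission time, not on tokens actually
    generated. The quota is illustrative, not a real Cerebras limit.
    """

    tokens_per_window: int
    used: int = 0

    def try_admit(self, prompt_tokens: int, max_completion_tokens: int) -> bool:
        # Assumption: the estimate is prompt tokens plus the full completion cap.
        estimate = prompt_tokens + max_completion_tokens
        if self.used + estimate > self.tokens_per_window:
            return False  # rejected, even if the real reply would be short
        self.used += estimate
        return True


def admitted_requests(max_completion_tokens: int, prompt_tokens: int = 2_000,
                      quota: int = 200_000, attempts: int = 100) -> int:
    """Count how many back-to-back requests fit in one window."""
    limiter = UpfrontTokenLimiter(tokens_per_window=quota)
    return sum(limiter.try_admit(prompt_tokens, max_completion_tokens)
               for _ in range(attempts))


if __name__ == "__main__":
    for cap in (65_536, 8_192):
        print(cap, admitted_requests(cap))
```

Under these assumptions, a 65536-token cap admits 2 requests per window while an 8192-token cap admits 19: the reservations consume the quota regardless of how short the actual replies are.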

Fixes

  • Update llama-3.3-70b: maxInputTokens to 131072, maxOutputTokens to 65536
  • Update qwen-3-235b-a22b-instruct-2507: maxOutputTokens to 40960
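The updated limits can be expressed as a small lookup table. The helper below is hypothetical, not the extension's actual code: it assumes the client sends the 8192 default when no value is requested and clamps any explicit request to the model's maxOutputTokens. The input limit for qwen-3-235b-a22b-instruct-2507 is not stated in these notes, so it is left out.

```python
from typing import Dict, Optional

# Limits from the v0.1.18 notes (maxInputTokens / maxOutputTokens).
# Only models changed in this release are listed.
MODEL_LIMITS: Dict[str, Dict[str, int]] = {
    "llama-3.3-70b": {"max_input_tokens": 131_072, "max_output_tokens": 65_536},
    "qwen-3-235b-a22b-instruct-2507": {"max_output_tokens": 40_960},
}

# Conservative default introduced in this release.
DEFAULT_MAX_COMPLETION_TOKENS = 8_192


def resolve_max_completion_tokens(model: str, requested: Optional[int] = None) -> int:
    """Hypothetical helper: choose the max_completion_tokens value to send."""
    cap = MODEL_LIMITS[model]["max_output_tokens"]
    # Fall back to the conservative default when the caller gives no value.
    value = DEFAULT_MAX_COMPLETION_TOKENS if requested is None else requested
    # Never exceed the model's output limit.
    return min(value, cap)
```

Keeping the default well below each model's maxOutputTokens means callers only reserve a large completion budget when they ask for it explicitly.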