Skip to content

v1.1.0

Latest

Choose a tag to compare

@Eric-Fithian Eric-Fithian released this 24 Feb 23:39

What's New in v1.1.0

API kwargs pass-through (#47)

Pass arbitrary keyword arguments through to the underlying LLM API call via api_kwargs. For example, set store=False for Fireworks AI.

Flexible rate limit period (#58)

Rate limiting now supports configurable time windows via rate_limit_period_seconds. The old tokens_per_minute / requests_per_minute parameters have been replaced with rate_limit_tokens / rate_limit_requests that work with any time span.

tokencost integration (#57)

Model pricing is now powered by the tokencost package, providing up-to-date pricing for all major LLM providers without manual maintenance.

Other improvements

  • Oversized requests that exceed bucket capacity are now handled correctly (negative bucket balance with natural recovery)
  • Floating-point epsilon guard in rate limiter comparisons prevents edge-case hangs
  • 34 new unit tests covering all three features
  • Updated documentation