What's New in v1.1.0
API kwargs pass-through (#47)
Pass arbitrary keyword arguments through to the underlying LLM API call via api_kwargs. For example, set store=False for Fireworks AI.
Flexible rate limit period (#58)
Rate limiting now supports configurable time windows via rate_limit_period_seconds. The old tokens_per_minute / requests_per_minute parameters have been replaced with rate_limit_tokens / rate_limit_requests that work with any time span.
tokencost integration (#57)
Model pricing is now powered by the tokencost package, providing up-to-date pricing for all major LLM providers without manual maintenance.
Other improvements
- Oversized requests that exceed bucket capacity are now handled correctly (negative bucket balance with natural recovery)
- Floating-point epsilon guard in rate limiter comparisons prevents edge-case hangs
- 34 new unit tests covering all three features
- Updated documentation