What's new
Rate-limit retry with backoff
All LLM and embedding API calls now automatically retry when a rate-limit or quota-exceeded error is received from any provider (Gemini, OpenAI, OpenRouter).
- If the provider returns a "retry in Xs" hint in the error message (Gemini does this), that exact delay is honoured.
- Otherwise, exponential backoff is applied:
llm_retry_delay × 2^(attempt − 1). - A warning is logged on each retry attempt.
- Resolves #1.
New configuration options
| Option | Default | Description |
|---|---|---|
max_llm_retries |
3 |
Max retry attempts before surfacing the error to the user. Set to 0 to disable. |
llm_retry_delay |
60 (seconds) |
Base delay when the provider does not supply a retry-after hint. |
Documentation
- Added a Rate limiting section to the README.
- Added a Supported models by provider table with links to each provider's model catalogue and the ruby_llm model registry.
Upgrade
# Gemfile
gem "glancer", "~> 1.1"No migration needed — new config options have sensible defaults and are fully optional.