Summary
Define rate limiting and quota management as part of the adapter schema, allowing systems to monitor API usage, enforce limits, and provide intelligent notifications to users.
Motivation
When adapters wrap external APIs:
- APIs have rate limits that must be respected
- Users may want to set cost/usage budgets
- Systems need to prevent runaway API calls (especially in agentic loops)
- Graceful degradation is better than hard failures
Rate and quota information should be:
- Extracted during adapter generation (from API specs/docs)
- Configurable by users
- Enforced programmatically by the adapter
- Surfaced via introspection
Proposed Schema Addition
```yaml
# Adapter front matter
---
name: example-api
type: adapter
version: "1.0.0"
rate_limits:
  # API-defined limits (from interrogation)
  api_limits:
    - scope: global
      limit: 5000
      window: hour
    - scope: endpoint
      endpoint: search
      limit: 30
      window: minute
  # User-configurable quotas
  quotas:
    enabled: true
    limits:
      - metric: calls_per_hour
        warn: 4000 # notification threshold
        pause: 4800 # pause and request confirmation
        hard_stop: 5000 # absolute stop
      - metric: calls_per_day
        warn: 10000
        pause: 15000
      - metric: cost_usd_per_day
        warn: 5.00
        pause: 10.00
        hard_stop: 50.00
      - metric: tokens_per_hour # for LLM-based APIs
        warn: 100000
        pause: 150000
  notifications:
    - trigger: warn
      action: log # log | notify | callback | block
      message: "Approaching rate limit for {api_name}: {metric} at {current}/{limit}"
    - trigger: pause
      action: notify
      require_confirmation: true
      message: "Rate limit pause for {api_name}. Continue? {current}/{limit}"
    - trigger: hard_stop
      action: block
      message: "Hard stop reached for {api_name}: {metric}"
  tracking:
    persist: true # persist usage across sessions
    reset_schedule: "0 0 * * *" # cron for resetting daily counters
---
```
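To make the threshold semantics concrete, here is a minimal TypeScript sketch of how an adapter might evaluate one quotas.limits entry and fill a notification message template. It assumes a threshold counts as reached once usage is greater than or equal to it, and that {limit} in a message is filled with the hard_stop value; evaluateQuota, renderMessage, and the types are hypothetical names, not part of the proposed schema.

```typescript
// Hypothetical types mirroring one `quotas.limits` entry in the front matter above.
type Trigger = "warn" | "pause" | "hard_stop";
type QuotaStatus = "ok" | Trigger;

interface QuotaLimit {
  metric: string;
  warn?: number;
  pause?: number;
  hard_stop?: number;
}

// Return the most severe threshold that `current` has reached.
// Assumption: a threshold counts as reached once current >= threshold;
// thresholds omitted from the config are simply skipped.
function evaluateQuota(limit: QuotaLimit, current: number): QuotaStatus {
  if (limit.hard_stop !== undefined && current >= limit.hard_stop) return "hard_stop";
  if (limit.pause !== undefined && current >= limit.pause) return "pause";
  if (limit.warn !== undefined && current >= limit.warn) return "warn";
  return "ok";
}

// Fill {placeholder} tokens in a notification message; unknown tokens are left untouched.
function renderMessage(template: string, vars: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (token: string, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : token,
  );
}

const callsPerHour: QuotaLimit = { metric: "calls_per_hour", warn: 4000, pause: 4800, hard_stop: 5000 };
const hourlyStatus = evaluateQuota(callsPerHour, 4200);
console.log(hourlyStatus); // prints warn
console.log(
  renderMessage("Approaching rate limit for {api_name}: {metric} at {current}/{limit}", {
    api_name: "example-api",
    metric: callsPerHour.metric,
    current: 4200,
    limit: 5000, // assumption: {limit} is filled with the hard_stop value
  }),
);
// prints: Approaching rate limit for example-api: calls_per_hour at 4200/5000
```

Because metrics such as calls_per_day omit hard_stop, the evaluation checks each threshold independently instead of assuming all three are present.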
Introspection
Quota status should be queryable:
```js
// introspect operation
{
  operation: "get_quota_status",
  response: {
    api: "github-api",
    quotas: [
      { metric: "calls_per_hour", current: 3500, limit: 5000, status: "ok" },
      { metric: "cost_usd_per_day", current: 4.50, limit: 5.00, status: "warn" }
    ],
    next_reset: "2026-01-26T14:00:00Z"
  }
}
```
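One way an adapter could assemble this response is sketched below in TypeScript. It assumes each tracked quota knows which fixed window it resets on, that resets happen on UTC boundaries (so the daily "0 0 * * *" schedule is read as midnight UTC), and that next_reset reports the earliest upcoming reset; getQuotaStatus, nextReset, TrackedQuota, and ResetWindow are hypothetical names, and each status is taken as already computed elsewhere.

```typescript
// Hypothetical types for the get_quota_status response shown above.
type QuotaStatus = "ok" | "warn" | "pause" | "hard_stop";
type ResetWindow = "minute" | "hour" | "day";

interface QuotaEntry {
  metric: string;
  current: number;
  limit: number;
  status: QuotaStatus;
}

// Internal bookkeeping: each tracked quota also records the window it resets on.
interface TrackedQuota extends QuotaEntry {
  window: ResetWindow;
}

interface QuotaStatusResponse {
  api: string;
  quotas: QuotaEntry[];
  next_reset: string; // ISO 8601 in UTC, e.g. "2026-01-26T14:00:00Z"
}

// Start of the next fixed window after `now`. Assumption: counters reset on UTC
// boundaries, so the daily "0 0 * * *" schedule is read as midnight UTC.
function nextReset(now: Date, span: ResetWindow): Date {
  const t = new Date(now.getTime());
  if (span === "minute") {
    t.setUTCSeconds(0, 0);
    t.setUTCMinutes(t.getUTCMinutes() + 1);
  } else if (span === "hour") {
    t.setUTCMinutes(0, 0, 0);
    t.setUTCHours(t.getUTCHours() + 1);
  } else {
    t.setUTCHours(0, 0, 0, 0);
    t.setUTCDate(t.getUTCDate() + 1);
  }
  return t;
}

// Assemble the introspection payload; next_reset is the earliest upcoming reset.
function getQuotaStatus(api: string, tracked: TrackedQuota[], now: Date): QuotaStatusResponse {
  if (tracked.length === 0) throw new Error("no quotas are being tracked");
  const quotas = tracked.map((q) => ({ metric: q.metric, current: q.current, limit: q.limit, status: q.status }));
  const earliest = Math.min(...tracked.map((q) => nextReset(now, q.window).getTime()));
  return {
    api,
    quotas,
    // Window boundaries always have zero milliseconds, so drop the ".000" fraction.
    next_reset: new Date(earliest).toISOString().replace(".000Z", "Z"),
  };
}

const sample = getQuotaStatus(
  "github-api",
  [
    { metric: "calls_per_hour", current: 3500, limit: 5000, status: "ok", window: "hour" },
    { metric: "cost_usd_per_day", current: 4.5, limit: 5.0, status: "warn", window: "day" },
  ],
  new Date("2026-01-26T13:27:05Z"),
);
console.log(sample.next_reset); // prints 2026-01-26T14:00:00Z
```

The window field is kept internal and stripped from the response so the payload matches the shape above.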
Behavior
- Automatic extraction: the adapter generator extracts rate limits from API specs when they are available
- User override: users can set quotas stricter than the API limits
- Graceful handling: usage escalates through a warn → pause → hard_stop progression rather than failing outright
- Intelligent defaults: the system can suggest quotas based on typical usage patterns
- Cross-adapter aggregation: usage is aggregated for APIs whose rate limits are shared across endpoints
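The user-override and graceful-handling rules could be enforced per call roughly as below. This TypeScript sketch assumes a user hard_stop may only tighten the API ceiling (the effective ceiling is the minimum of the two), that confirmation is requested once usage has reached the pause threshold, and that a declined confirmation blocks only the current call. QuotaGuard, effectiveHardStop, Decision, and the synchronous confirm callback are hypothetical; a real adapter would likely wait on the user asynchronously.

```typescript
// Hypothetical guard applying the warn → pause → hard_stop progression to one call-count metric.
type Decision = "allow" | "allow_with_warning" | "blocked";

interface Thresholds {
  warn?: number;
  pause?: number;
  hard_stop?: number;
}

// User override: a user quota may tighten the API-defined ceiling but never loosen it.
function effectiveHardStop(apiLimit: number, userHardStop?: number): number {
  return userHardStop === undefined ? apiLimit : Math.min(apiLimit, userHardStop);
}

class QuotaGuard {
  private used = 0;
  private pauseConfirmed = false;
  private readonly thresholds: Thresholds;
  private readonly hardStop: number;
  // Asked once the pause threshold has been reached; returning false blocks the call.
  private readonly confirm: (used: number) => boolean;

  constructor(thresholds: Thresholds, apiLimit: number, confirm: (used: number) => boolean) {
    this.thresholds = thresholds;
    this.hardStop = effectiveHardStop(apiLimit, thresholds.hard_stop);
    this.confirm = confirm;
  }

  // Decide whether the next call may proceed; the call is counted only if allowed.
  beforeCall(): Decision {
    if (this.used >= this.hardStop) return "blocked";
    const { warn, pause } = this.thresholds;
    if (pause !== undefined && this.used >= pause && !this.pauseConfirmed) {
      if (!this.confirm(this.used)) return "blocked"; // declined: ask again on the next call
      this.pauseConfirmed = true;
    }
    this.used += 1;
    return warn !== undefined && this.used >= warn ? "allow_with_warning" : "allow";
  }
}

// calls_per_hour from the front matter, capped by the 5000/hour global API limit.
const hourly = new QuotaGuard({ warn: 4000, pause: 4800, hard_stop: 5000 }, 5000, (used) => {
  console.log(`Rate limit pause at ${used} calls; continuing`);
  return true;
});
console.log(hourly.beforeCall()); // prints allow
```

Taking the minimum means a user can set hard_stop below the API limit to keep headroom, while a value above it has no effect.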
Cost Estimation
For paid APIs, adapters can include cost metadata:
```yaml
cost:
  model: per_call # per_call | per_token | per_byte | tiered
  pricing:
    - endpoint: "*"
      cost_per_call: 0.001
    - endpoint: "premium_search"
      cost_per_call: 0.01
  currency: USD
```
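A per_call estimate from this metadata might look like the TypeScript sketch below, which could then feed a metric such as cost_usd_per_day. It assumes an exact endpoint entry takes precedence over the "*" wildcard regardless of list order, and it handles only the per_call model; costPerCall, estimateCost, and the types are hypothetical names.

```typescript
// Hypothetical types mirroring the cost block above; only the per_call model is handled here.
interface PricingRule {
  endpoint: string; // exact endpoint name, or "*" as a catch-all
  cost_per_call: number;
}

interface CostConfig {
  model: "per_call" | "per_token" | "per_byte" | "tiered";
  pricing: PricingRule[];
  currency: string;
}

// An exact endpoint match takes precedence over the "*" wildcard, regardless of list order.
function costPerCall(cfg: CostConfig, endpoint: string): number | undefined {
  const exact = cfg.pricing.find((r) => r.endpoint === endpoint);
  if (exact !== undefined) return exact.cost_per_call;
  return cfg.pricing.find((r) => r.endpoint === "*")?.cost_per_call;
}

// Estimated spend for a batch of calls, keyed by endpoint name.
function estimateCost(cfg: CostConfig, calls: Record<string, number>): number {
  if (cfg.model !== "per_call") throw new Error(`cost model ${cfg.model} is not handled in this sketch`);
  let total = 0;
  for (const [endpoint, count] of Object.entries(calls)) {
    const unit = costPerCall(cfg, endpoint);
    if (unit === undefined) throw new Error(`no price configured for endpoint ${endpoint}`);
    total += unit * count;
  }
  return total;
}

const exampleCost: CostConfig = {
  model: "per_call",
  pricing: [
    { endpoint: "*", cost_per_call: 0.001 },
    { endpoint: "premium_search", cost_per_call: 0.01 },
  ],
  currency: "USD",
};
console.log(estimateCost(exampleCost, { search: 1000, premium_search: 50 }).toFixed(2)); // prints 1.50
```

Summing binary floating-point amounts can drift over many calls; tracking spend in integer micro-units of the currency is a common alternative.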
Related
Tasks