feat(services): policy / rate-limit gate#278
Conversation
Add a centralized rate-limiter / quota service that providers and tools can consult before issuing expensive calls. Buckets are keyed by `(provider, scope)` so the same engine can enforce per-user, per-org, or per-project quotas. - `PolicyService::acquire(key, est_tokens)` — sliding-window admission for per-minute / per-hour / per-day request and token caps; concurrency cap via `tokio::sync::Semaphore::acquire_owned` so cancelled tasks do not leak permits. - `PolicyPermit::commit_tokens(actual)` — replaces the pre-flight estimate after the call returns; permit drop releases the concurrency slot. - `PolicyService::current_state(key)` — snapshot for `/status`-style readouts. - `[limits.<provider>]` config section. Absent or all-`None` = no-op service; existing setups see no behaviour change. - Telemetry hook fires on acquire / commit / reject so existing pipelines can observe quota state. Tests cover token-bucket drain/refill, concurrency cap, would-exceed fast-fail, cancellation cleanup, and a noop-when-absent default. The service is shipped without wiring the provider trait — that integration is intentionally left to a follow-up so this PR stays small and reviewable.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4a9da96a1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Fits — record and exit the loop. | ||
| let id = state.record(est_tokens, now); | ||
| Some(Ok(id)) |
There was a problem hiding this comment.
Acquire concurrency permit before recording rate usage
This records request/token usage in the sliding windows before waiting on max_concurrent, so a task that is queued on the semaphore (or canceled while queued) still consumes quota even though no provider call ran. In acquire, state.record(...) happens before sem.acquire_owned().await, and there is no rollback path if the future is dropped during semaphore wait. With max_concurrent=1, repeated canceled waiters can permanently throttle later real calls until window expiry.
Useful? React with 👍 / 👎.
acquire() recorded request/token usage in the sliding windows before awaiting the semaphore. A future dropped (canceled) while queued on sem.acquire_owned would still have consumed quota even though no provider call ever ran. With max_concurrent = 1, repeated cancellations permanently throttled later real callers until window expiry. Reorder so the semaphore permit is taken first, THEN the windows are checked and recorded. Cancellation while queued on the semaphore now releases the permit cleanly without touching the windows; cancellation during the window-fit loop leaves the held permit to drop on its own, also without recording anything. The tradeoff is that admission now serialises on max_concurrent before oversize-rejection runs, so a misconfigured bucket where every call is oversized can leave callers blocking on the permit before they get their PolicyError::OversizedRequest. That's acceptable — the cap is approximate and the misconfiguration is a config bug, not a runtime hot path.
Summary
Adds
crates/lib/src/services/policy_limits.rs— a centralized policy / rate-limit gate that providers and tools can consult before issuing expensive calls. Implements ROADMAP.md item 8.12 ("Policy/rate-limit service").(provider, scope)wherescopeisUser | Org | Project. Independent caps for requests-per-minute, requests-per-hour, tokens-per-minute, tokens-per-hour, and tokens-per-day, plus a concurrency cap.tokio::sync::Semaphore::acquire_owned, so a cancelled task drops its future cleanly without leaking.PolicyPermitis RAII — drop releases the slot,commit_tokens(actual)replaces the pre-flight estimate.[limits.<provider>]section inConfig. Absent or all-None= no-op service:acquirereturns instantly with an inert permit, so existing setups see no behaviour change.PolicyTelemetrytrait fires on every acquire / commit / reject so existing telemetry pipelines can observe quota state. Default sink usestracing::debug!.current_state(key)returns aPolicyStatefor human-facing readouts (e.g./status).Deliberately deferred
Provider-trait wiring and the
/statusintegration are not in this PR — the spec called this out as optional. The service ships standalone with aFrom<&HashMap<String, ProviderLimitsConfig>>bridge so the follow-up only needs to callPolicyService::new((&config.limits).into())and gate provider calls behindacquire. That keeps this PR small and reviewable.Test Plan
policy_limits.rs: bucket drain / refill, would-exceed-budget fast-fail, concurrency cap of 3 across 10 tasks, permit-drop releases slot,commit_tokensreplaces estimate, telemetry fires for acquire/commit/reject, key isolation across(provider, scope), cancellation does not leak permits, no-op service for empty / all-Noneconfig, TOML round-trip.tests/policy_limits_integration.rs:[limits]section round-trips through the realConfig, absent section yields a no-op service, two concurrent fake calls serialize correctly throughmax_concurrent: 1with committed-token totals reflecting actual (not estimated) usage, telemetry observes events end-to-end.cargo fmt --all -- --checkcargo clippy --all-targets -- -D warningscargo test --all-targets(only the pre-existingbwrap_*sandbox-integration tests fail; they require unprivileged user namespaces that this CI sandbox blocks — confirmed identical failure onorigin/main).