feat(services): policy / rate-limit gate by emal-avala · Pull Request #278 · avala-ai/agent-code

emal-avala · 2026-05-04T23:39:34Z

Summary

Adds crates/lib/src/services/policy_limits.rs — a centralized policy / rate-limit gate that providers and tools can consult before issuing expensive calls. Implements ROADMAP.md item 8.12 ("Policy/rate-limit service").

Per-resource bucket model. Sliding-window log keyed by (provider, scope) where scope is User | Org | Project. Independent caps for requests-per-minute, requests-per-hour, tokens-per-minute, tokens-per-hour, and tokens-per-day, plus a concurrency cap.
Cancellation-safe permits. Concurrency slots use tokio::sync::Semaphore::acquire_owned, so a cancelled task drops its future cleanly without leaking. PolicyPermit is RAII — drop releases the slot, commit_tokens(actual) replaces the pre-flight estimate.
Opt-in config. New [limits.<provider>] section in Config. Absent or all-None = no-op service: acquire returns instantly with an inert permit, so existing setups see no behaviour change.
Telemetry hook. PolicyTelemetry trait fires on every acquire / commit / reject so existing telemetry pipelines can observe quota state. Default sink uses tracing::debug!.
Snapshot API. current_state(key) returns a PolicyState for human-facing readouts (e.g. /status).
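For readers unfamiliar with the sliding-window log, the bucket model above can be sketched roughly as follows. This is an illustration using only the standard library, not the code in policy_limits.rs: `Window`, `Buckets`, `BucketKey`, `try_record`, `set_limit`, and `admit` are hypothetical names, and only a single window with one request cap and one token cap is modelled.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Quota scope, as named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    User,
    Org,
    Project,
}

/// Bucket key: (provider, scope).
pub type BucketKey = (String, Scope);

/// One admitted call: when it was admitted and how many tokens it was charged.
struct Entry {
    at: Instant,
    tokens: u64,
}

/// A single sliding window with a request cap and a token cap (hypothetical type).
pub struct Window {
    span: Duration,
    max_requests: usize,
    max_tokens: u64,
    log: VecDeque<Entry>,
}

impl Window {
    pub fn new(span: Duration, max_requests: usize, max_tokens: u64) -> Self {
        Window { span, max_requests, max_tokens, log: VecDeque::new() }
    }

    /// Drop entries that have aged out of the window.
    fn evict(&mut self, now: Instant) {
        let span = self.span;
        while self
            .log
            .front()
            .map_or(false, |e| now.saturating_duration_since(e.at) >= span)
        {
            self.log.pop_front();
        }
    }

    /// Record the call if it fits under both caps; otherwise leave the log untouched.
    pub fn try_record(&mut self, tokens: u64, now: Instant) -> bool {
        self.evict(now);
        let used: u64 = self.log.iter().map(|e| e.tokens).sum();
        if self.log.len() >= self.max_requests
            || used.saturating_add(tokens) > self.max_tokens
        {
            return false;
        }
        self.log.push_back(Entry { at: now, tokens });
        true
    }
}

/// Windows keyed by (provider, scope). A key with no window is treated as unlimited.
#[derive(Default)]
pub struct Buckets {
    windows: HashMap<BucketKey, Window>,
}

impl Buckets {
    pub fn set_limit(&mut self, key: BucketKey, window: Window) {
        self.windows.insert(key, window);
    }

    pub fn admit(&mut self, key: &BucketKey, tokens: u64, now: Instant) -> bool {
        match self.windows.get_mut(key) {
            Some(w) => w.try_record(tokens, now),
            None => true,
        }
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the window, keeps a sketch like this easy to test for drain and refill without sleeping.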

Deliberately deferred

Provider-trait wiring and the /status integration are not in this PR — the spec called this out as optional. The service ships standalone with a From<&HashMap<String, ProviderLimitsConfig>> bridge so the follow-up only needs to call PolicyService::new((&config.limits).into()) and gate provider calls behind acquire. That keeps this PR small and reviewable.
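A rough sketch of what such a bridge and the "all-None means no-op" rule could look like is given below. The field names on `ProviderLimitsConfig` are guesses derived from the caps listed in the summary (only `max_concurrent` appears verbatim in this PR), and `PolicyConfig` with its `is_noop` helper is a hypothetical stand-in for whatever `PolicyService::new` actually accepts.

```rust
use std::collections::HashMap;

/// Per-provider limits. Field names are assumptions based on the caps in the summary.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ProviderLimitsConfig {
    pub requests_per_minute: Option<u32>,
    pub requests_per_hour: Option<u32>,
    pub tokens_per_minute: Option<u64>,
    pub tokens_per_hour: Option<u64>,
    pub tokens_per_day: Option<u64>,
    pub max_concurrent: Option<usize>,
}

impl ProviderLimitsConfig {
    /// True when every cap is None, i.e. the section imposes no limit at all.
    pub fn is_noop(&self) -> bool {
        *self == Self::default()
    }
}

/// Hypothetical service-side config: only providers with at least one cap set.
#[derive(Debug, Default)]
pub struct PolicyConfig {
    pub providers: HashMap<String, ProviderLimitsConfig>,
}

impl PolicyConfig {
    /// An empty map means the service should behave as a no-op.
    pub fn is_noop(&self) -> bool {
        self.providers.is_empty()
    }
}

impl From<&HashMap<String, ProviderLimitsConfig>> for PolicyConfig {
    fn from(limits: &HashMap<String, ProviderLimitsConfig>) -> Self {
        let providers = limits
            .iter()
            .filter(|(_, cfg)| !cfg.is_noop())
            .map(|(name, cfg)| (name.clone(), cfg.clone()))
            .collect();
        PolicyConfig { providers }
    }
}
```

With a shape like this, a caller would write `let cfg: PolicyConfig = (&limits).into();` and an absent or all-None section naturally collapses to an empty map.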

Test Plan

12 unit tests in policy_limits.rs: bucket drain / refill, would-exceed-budget fast-fail, concurrency cap of 3 across 10 tasks, permit-drop releases slot, commit_tokens replaces estimate, telemetry fires for acquire/commit/reject, key isolation across (provider, scope), cancellation does not leak permits, no-op service for empty / all-None config, TOML round-trip.
4 integration tests in tests/policy_limits_integration.rs: [limits] section round-trips through the real Config, absent section yields a no-op service, two concurrent fake calls serialize correctly through max_concurrent: 1 with committed-token totals reflecting actual (not estimated) usage, telemetry observes events end-to-end.
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test --all-targets (only the pre-existing bwrap_* sandbox-integration tests fail; they require unprivileged user namespaces that this CI sandbox blocks — confirmed identical failure on origin/main).

Add a centralized rate-limiter / quota service that providers and tools can consult before issuing expensive calls. Buckets are keyed by `(provider, scope)` so the same engine can enforce per-user, per-org, or per-project quotas.

- `PolicyService::acquire(key, est_tokens)`: sliding-window admission for per-minute / per-hour / per-day request and token caps; concurrency cap via `tokio::sync::Semaphore::acquire_owned` so cancelled tasks do not leak permits.
- `PolicyPermit::commit_tokens(actual)`: replaces the pre-flight estimate after the call returns; permit drop releases the concurrency slot.
- `PolicyService::current_state(key)`: snapshot for `/status`-style readouts.
- `[limits.<provider>]` config section. Absent or all-`None` = no-op service; existing setups see no behaviour change.
- Telemetry hook fires on acquire / commit / reject so existing pipelines can observe quota state.

Tests cover token-bucket drain/refill, concurrency cap, would-exceed fast-fail, cancellation cleanup, and a noop-when-absent default.

The service is shipped without wiring the provider trait. That integration is intentionally left to a follow-up so this PR stays small and reviewable.
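The permit semantics described here (drop releases the slot, `commit_tokens` swaps the estimate for the actual count) can be illustrated with a small synchronous sketch. It deliberately replaces the tokio semaphore with a plain in-flight counter behind a `Mutex`, so it shows only the RAII and accounting behaviour; `BucketState`, its fields, and the `acquire` constructor on the permit are hypothetical, and the real `commit_tokens` signature may differ.

```rust
use std::sync::{Arc, Mutex};

/// Shared per-bucket accounting (hypothetical shape).
#[derive(Debug, Default)]
pub struct BucketState {
    pub in_flight: usize,
    pub tokens_charged: u64,
}

/// RAII permit: holds one concurrency slot and remembers what it charged.
pub struct PolicyPermit {
    state: Arc<Mutex<BucketState>>,
    charged: u64,
}

impl PolicyPermit {
    /// Take a slot and charge the pre-flight estimate.
    pub fn acquire(state: &Arc<Mutex<BucketState>>, est_tokens: u64) -> Self {
        {
            let mut s = state.lock().unwrap();
            s.in_flight += 1;
            s.tokens_charged += est_tokens;
        }
        PolicyPermit { state: Arc::clone(state), charged: est_tokens }
    }

    /// Replace the estimate with the actual token count once the call has returned.
    pub fn commit_tokens(&mut self, actual: u64) {
        let mut s = self.state.lock().unwrap();
        s.tokens_charged = s.tokens_charged.saturating_sub(self.charged) + actual;
        self.charged = actual;
    }
}

impl Drop for PolicyPermit {
    /// Dropping the permit releases the slot; charged tokens stay on the books.
    fn drop(&mut self) {
        if let Ok(mut s) = self.state.lock() {
            s.in_flight = s.in_flight.saturating_sub(1);
        }
    }
}
```

In this sketch a permit acquired with an estimate of 500 tokens and committed at 320 leaves `tokens_charged` at 320 and `in_flight` at 0 once it goes out of scope, whether the scope exits normally or through an early return.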

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4a9da96a1


chatgpt-codex-connector · 2026-05-04T23:42:27Z

+                        // Fits — record and exit the loop.
+                        let id = state.record(est_tokens, now);
+                        Some(Ok(id))


Acquire concurrency permit before recording rate usage

This records request/token usage in the sliding windows before waiting on max_concurrent, so a task that is queued on the semaphore (or canceled while queued) still consumes quota even though no provider call ran. In acquire, state.record(...) happens before sem.acquire_owned().await, and there is no rollback path if the future is dropped during semaphore wait. With max_concurrent=1, repeated canceled waiters can permanently throttle later real calls until window expiry.


acquire() recorded request/token usage in the sliding windows before awaiting the semaphore. A future dropped (canceled) while queued on sem.acquire_owned would still have consumed quota even though no provider call ever ran. With max_concurrent = 1, repeated cancellations could keep throttling later real callers until the windows expired.

Reorder so the semaphore permit is taken first, THEN the windows are checked and recorded. Cancellation while queued on the semaphore now leaves the windows untouched, since no permit was ever granted; cancellation during the window-fit loop leaves the held permit to drop on its own, also without recording anything.

The tradeoff is that admission now serialises on max_concurrent before oversize-rejection runs, so a misconfigured bucket where every call is oversized can leave callers blocking on the permit before they get their PolicyError::OversizedRequest. That's acceptable: the cap is approximate and the misconfiguration is a config bug, not a runtime hot path.
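The effect of the reordering can be seen in a deliberately simplified synchronous model. This is not the async code from the PR: `Gate`, `Outcome`, and both `admit_*` methods are hypothetical, and "no free slot" stands in for a waiter that is cancelled while queued on the semaphore.

```rust
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Admitted,
    CancelledWhileQueued,
    OverQuota,
}

/// Toy gate: a slot counter plus a request counter for one window.
pub struct Gate {
    free_slots: usize,
    recorded: usize,
    max_requests: usize,
}

impl Gate {
    pub fn new(slots: usize, max_requests: usize) -> Self {
        Gate { free_slots: slots, recorded: 0, max_requests }
    }

    /// Original ordering: record usage, then wait for a slot.
    pub fn admit_record_first(&mut self) -> Outcome {
        if self.recorded >= self.max_requests {
            return Outcome::OverQuota;
        }
        self.recorded += 1; // quota is consumed here
        if self.free_slots == 0 {
            // The waiter gives up; nothing rolls the recorded usage back.
            return Outcome::CancelledWhileQueued;
        }
        self.free_slots -= 1;
        Outcome::Admitted
    }

    /// Fixed ordering: take the slot first, then check and record.
    pub fn admit_slot_first(&mut self) -> Outcome {
        if self.free_slots == 0 {
            // The waiter gives up before the windows are touched.
            return Outcome::CancelledWhileQueued;
        }
        self.free_slots -= 1;
        if self.recorded >= self.max_requests {
            self.free_slots += 1; // hand the slot back on rejection
            return Outcome::OverQuota;
        }
        self.recorded += 1;
        Outcome::Admitted
    }

    /// Return a slot once the provider call has finished.
    pub fn release(&mut self) {
        self.free_slots += 1;
    }

    pub fn recorded(&self) -> usize {
        self.recorded
    }
}
```

With one slot and a cap of three requests, two cancelled waiters exhaust the quota under the original ordering, so the next real caller is rejected; under the fixed ordering the recorded count stays at one and the next caller is admitted.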

chatgpt-codex-connector Bot reviewed May 4, 2026


emal-avala merged commit a92e5df into main May 5, 2026
14 checks passed

emal-avala deleted the feat/policy-rate-limit-service branch May 5, 2026 03:42

emal-avala merged 2 commits into main from feat/policy-rate-limit-service
