Skip to content

feat(services): policy / rate-limit gate#278

Merged
emal-avala merged 2 commits into
mainfrom
feat/policy-rate-limit-service
May 5, 2026
Merged

feat(services): policy / rate-limit gate#278
emal-avala merged 2 commits into
mainfrom
feat/policy-rate-limit-service

Conversation

@emal-avala
Copy link
Copy Markdown
Member

Summary

Adds crates/lib/src/services/policy_limits.rs — a centralized policy / rate-limit gate that providers and tools can consult before issuing expensive calls. Implements ROADMAP.md item 8.12 ("Policy/rate-limit service").

  • Per-resource bucket model. Sliding-window log keyed by (provider, scope) where scope is User | Org | Project. Independent caps for requests-per-minute, requests-per-hour, tokens-per-minute, tokens-per-hour, and tokens-per-day, plus a concurrency cap.
  • Cancellation-safe permits. Concurrency slots use tokio::sync::Semaphore::acquire_owned, so a cancelled task drops its future cleanly without leaking. PolicyPermit is RAII — drop releases the slot, commit_tokens(actual) replaces the pre-flight estimate.
  • Opt-in config. New [limits.<provider>] section in Config. Absent or all-None = no-op service: acquire returns instantly with an inert permit, so existing setups see no behaviour change.
  • Telemetry hook. PolicyTelemetry trait fires on every acquire / commit / reject so existing telemetry pipelines can observe quota state. Default sink uses tracing::debug!.
  • Snapshot API. current_state(key) returns a PolicyState for human-facing readouts (e.g. /status).

Deliberately deferred

Provider-trait wiring and the /status integration are not in this PR — the spec called this out as optional. The service ships standalone with a From<&HashMap<String, ProviderLimitsConfig>> bridge so the follow-up only needs to call PolicyService::new((&config.limits).into()) and gate provider calls behind acquire. That keeps this PR small and reviewable.

Test Plan

  • 12 unit tests in policy_limits.rs: bucket drain / refill, would-exceed-budget fast-fail, concurrency cap of 3 across 10 tasks, permit-drop releases slot, commit_tokens replaces estimate, telemetry fires for acquire/commit/reject, key isolation across (provider, scope), cancellation does not leak permits, no-op service for empty / all-None config, TOML round-trip.
  • 4 integration tests in tests/policy_limits_integration.rs: [limits] section round-trips through the real Config, absent section yields a no-op service, two concurrent fake calls serialize correctly through max_concurrent: 1 with committed-token totals reflecting actual (not estimated) usage, telemetry observes events end-to-end.
  • cargo fmt --all -- --check
  • cargo clippy --all-targets -- -D warnings
  • cargo test --all-targets (only the pre-existing bwrap_* sandbox-integration tests fail; they require unprivileged user namespaces that this CI sandbox blocks — confirmed identical failure on origin/main).

Add a centralized rate-limiter / quota service that providers and
tools can consult before issuing expensive calls. Buckets are keyed
by `(provider, scope)` so the same engine can enforce per-user,
per-org, or per-project quotas.

- `PolicyService::acquire(key, est_tokens)` — sliding-window admission
  for per-minute / per-hour / per-day request and token caps; concurrency
  cap via `tokio::sync::Semaphore::acquire_owned` so cancelled tasks
  do not leak permits.
- `PolicyPermit::commit_tokens(actual)` — replaces the pre-flight
  estimate after the call returns; permit drop releases the
  concurrency slot.
- `PolicyService::current_state(key)` — snapshot for `/status`-style
  readouts.
- `[limits.<provider>]` config section. Absent or all-`None` = no-op
  service; existing setups see no behaviour change.
- Telemetry hook fires on acquire / commit / reject so existing
  pipelines can observe quota state.

Tests cover token-bucket drain/refill, concurrency cap, would-exceed
fast-fail, cancellation cleanup, and a noop-when-absent default.
The service is shipped without wiring the provider trait — that
integration is intentionally left to a follow-up so this PR stays
small and reviewable.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4a9da96a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +655 to +657
// Fits — record and exit the loop.
let id = state.record(est_tokens, now);
Some(Ok(id))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Acquire concurrency permit before recording rate usage

This records request/token usage in the sliding windows before waiting on max_concurrent, so a task that is queued on the semaphore (or canceled while queued) still consumes quota even though no provider call ran. In acquire, state.record(...) happens before sem.acquire_owned().await, and there is no rollback path if the future is dropped during semaphore wait. With max_concurrent=1, repeated canceled waiters can permanently throttle later real calls until window expiry.

Useful? React with 👍 / 👎.

acquire() recorded request/token usage in the sliding windows before
awaiting the semaphore. A future dropped (canceled) while queued on
sem.acquire_owned would still have consumed quota even though no
provider call ever ran. With max_concurrent = 1, repeated cancellations
permanently throttled later real callers until window expiry.

Reorder so the semaphore permit is taken first, THEN the windows are
checked and recorded. Cancellation while queued on the semaphore now
releases the permit cleanly without touching the windows; cancellation
during the window-fit loop leaves the held permit to drop on its own,
also without recording anything.

The tradeoff is that admission now serialises on max_concurrent before
oversize-rejection runs, so a misconfigured bucket where every call is
oversized can leave callers blocking on the permit before they get
their PolicyError::OversizedRequest. That's acceptable — the cap is
approximate and the misconfiguration is a config bug, not a runtime
hot path.
@emal-avala emal-avala merged commit a92e5df into main May 5, 2026
14 checks passed
@emal-avala emal-avala deleted the feat/policy-rate-limit-service branch May 5, 2026 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant