Rate Monitor: Per-org request rate alerting on high-cost endpoints#911
Rate Monitor: Per-org request rate alerting on high-cost endpoints#911vprashrex wants to merge 8 commits into
Conversation
|
Warning Review limit reached
More reviews will be available in 33 minutes and 31 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR introduces per-organization rate limiting on three API endpoints using Redis-backed counters with configurable per-minute thresholds and Sentry alerting. Configuration defines thresholds (15 for LLM, 3 for collections and evaluations). The core rate_monitor module provides atomic Redis increment-and-get logic and a FastAPI dependency factory that checks per-org request counts, logs warnings, and triggers telemetry alerts when thresholds are exceeded. Three routes wire this into their dependency chains, and comprehensive tests cover normal paths, edge cases, and error handling. ChangesRate Limiting Infrastructure and Route Integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
OpenAPI changes ⚪ No API surface changesNote This PR does not modify the API contract.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
fab3581 to
6b4ce6c
Compare
|
|
||
| try: | ||
| count = increment_and_get_count(redis_key) | ||
| if count is not None and count > threshold: |
There was a problem hiding this comment.
@AkhileshNegi if wanted to enforce rate limit, we can add raise HttpException with status code 429 here.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
backend/app/tests/core/test_rate_monitor.py (1)
13-211: ⚡ Quick winAdd explicit return annotations to helper/test functions.
Line 13 (
_auth_context) and test methods throughout this file are missing return type annotations (e.g.,-> SimpleNamespace/-> None).As per coding guidelines, "**/*.py: Always add type hints to all function parameters and return values in Python code".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/tests/core/test_rate_monitor.py` around lines 13 - 211, Add explicit return type annotations: update the helper _auth_context to declare its return type (e.g., -> SimpleNamespace) and annotate every test method to return None (e.g., def test_returns_count_and_sets_expiry(self) -> None). Locate functions by their names (_auth_context and each test_* method in classes TestIncrementAndGetCount, TestMonitorRate, and TestRecordRateThreshold) and add the appropriate return annotations without changing behavior.backend/app/core/rate_monitor.py (1)
49-49: ⚡ Quick winAdd an explicit return type to
monitor_rate.Line 49 is missing a return annotation for the dependency factory.
As per coding guidelines, "**/*.py: Always add type hints to all function parameters and return values in Python code".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/app/core/rate_monitor.py` at line 49, monitor_rate is missing a return type annotation; update the signature of monitor_rate(category: RateCategory) to include an explicit return type that matches the dependency factory it returns (e.g., import typing.Callable and annotate as -> Callable[..., RateMonitor] or, if uncertain, -> Callable[..., Any]) and ensure any referenced type (RateMonitor or Any) is imported or added to typing imports so the function has a full return type hint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/app/core/rate_monitor.py`:
- Around line 60-83: The code uses project (auth_context.project) for the Redis
key but then labels telemetry as org-scoped via
record_rate_threshold(org_id=project.id,...), causing inconsistent scoping;
locate where auth_context.project is read and instead resolve the organization
identity (e.g., auth_context.organization or project.organization_id /
project.organization) and use that organization id/name for both the redis_key
and the record_rate_threshold call (update redis_key =
f"rate_monitor:{category}:{org.id}:{minute_bucket}" and pass org.id/org.name
into record_rate_threshold) and keep increment_and_get_count and threshold logic
unchanged so counters and telemetry are consistently org-scoped.
- Around line 76-86: The code currently logs and calls record_rate_threshold for
every count > threshold; change the check to emit the alert only when the bucket
first crosses the threshold (e.g., when count == threshold + 1) so repeated
increments in the same minute don't spam alerts. Update the condition around
monitor logic that uses variables count and threshold (the block that calls
logger.warning and record_rate_threshold for project.id, project.name, and
category) to only run when the count has just moved from <=threshold to
>threshold (count == threshold + 1).
In `@backend/app/core/telemetry.py`:
- Around line 481-483: In function record_rate_threshold update the
logger.exception call to use the correct log prefix "[record_rate_threshold]"
(instead of "[record_rate_threshold_exceeded]") so the message follows the
convention; keep the same exception context (exc_info=e) and message text
otherwise to preserve error detail.
---
Nitpick comments:
In `@backend/app/core/rate_monitor.py`:
- Line 49: monitor_rate is missing a return type annotation; update the
signature of monitor_rate(category: RateCategory) to include an explicit return
type that matches the dependency factory it returns (e.g., import
typing.Callable and annotate as -> Callable[..., RateMonitor] or, if uncertain,
-> Callable[..., Any]) and ensure any referenced type (RateMonitor or Any) is
imported or added to typing imports so the function has a full return type hint.
In `@backend/app/tests/core/test_rate_monitor.py`:
- Around line 13-211: Add explicit return type annotations: update the helper
_auth_context to declare its return type (e.g., -> SimpleNamespace) and annotate
every test method to return None (e.g., def
test_returns_count_and_sets_expiry(self) -> None). Locate functions by their
names (_auth_context and each test_* method in classes TestIncrementAndGetCount,
TestMonitorRate, and TestRecordRateThreshold) and add the appropriate return
annotations without changing behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fa7d503f-de2a-4ef8-8046-2d1e522c6a82
📒 Files selected for processing (7)
backend/app/api/routes/collections.pybackend/app/api/routes/evaluations/evaluation.pybackend/app/api/routes/llm.pybackend/app/core/config.pybackend/app/core/rate_monitor.pybackend/app/core/telemetry.pybackend/app/tests/core/test_rate_monitor.py
…d of organization
…d of organization
| threshold=threshold, | ||
| ) | ||
|
|
||
| except redis.RedisError as e: |
There was a problem hiding this comment.
increment_and_get_count returns None after an exception, so this redis.RedisError will practically never fire right? should remove the exception handler there and let this redis.RedisError handle it?
| pipe.incr(key) | ||
| pipe.expire(key, _EXPIRATION_SECONDS) |
There was a problem hiding this comment.
increment and expire are not atomic; what if increment executes, system crashes, expire does not execute -- key will remain in redis forever
def increment_and_get_count(key: str) -> int | None:
try:
# SET NX atomically creates the key with TTL only on first call.
_redis_client.set(key, 0, ex=_EXPIRATION_SECONDS, nx=True)
return _redis_client.incr(key)
except Exception as e:
logger.error(
f"[increment_and_get_count] Error incrementing count for {key}: {e}"
)
return None
Target issue is: #797
Summary
Explain the motivation for making this change. What existing problem does the pull request solve?
High-cost endpoints (
llm/call,evaluations,collections) had no visibility into per-tenant request rates, risking runaway clients and server load. This PR adds monitoring and alerting (no rate limiting) for request rates in a one-minute window.What it does:
app/core/rate_monitor.pyexposesmonitor_rate(category), a FastAPI dependency added to the high-cost endpoints.rate_monitor:{category}:{org_id}:{minute}), expiring after 2 minutes.record_rate_thresholdintelemetry.py(Sentry → Discord channel), including org, category, request count, and threshold.Thresholds (requests/minute):
llm_callcollectionsevaluationsChecklist
Before submitting a pull request, please ensure that you mark these task.
fastapi run --reload app/main.pyordocker compose upin the repository root and test.Notes