Rate Monitor: Per-org request rate alerting on high-cost endpoints#911

Open: vprashrex wants to merge 8 commits into main from feat/threshold-monitor

Collaborator

@vprashrex vprashrex commented Jun 4, 2026

Target issue is: #797

Summary

Explain the motivation for making this change. What existing problem does the pull request solve?
High-cost endpoints (llm/call, evaluations, collections) had no visibility into per-tenant request rates, risking runaway clients and server load. This PR adds monitoring and alerting (no rate limiting) for request rates in a one-minute window.

What it does:

  • Threshold checks are applied at the project level.
  • New app/core/rate_monitor.py exposes monitor_rate(category), a FastAPI dependency added to the high-cost endpoints.
  • Counts requests per org per minute using a Redis bucket key (rate_monitor:{category}:{org_id}:{minute}), expiring after 2 minutes.
  • When a threshold is exceeded, emits a warning alert via record_rate_threshold in telemetry.py (Sentry → Discord channel), including org, category, request count, and threshold.
  • Fails open: if Redis is unavailable, the request proceeds and the check is skipped (logged, not raised).
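
The counting scheme in the bullets above can be sketched as follows. This is a minimal, standard-library-only illustration and not the PR's code: `InMemoryCounter`, `bucket_key`, and `check_rate` are hypothetical names, and the in-memory dictionary stands in for Redis (key expiry is omitted).

```python
from __future__ import annotations

import time
from collections import defaultdict

# Thresholds from the PR description (requests per minute).
THRESHOLDS = {"llm_call": 15, "collections": 3, "evaluations": 3}


class InMemoryCounter:
    """Hypothetical stand-in for a Redis INCR counter; not part of the PR."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = defaultdict(int)

    def incr(self, key: str) -> int:
        self._counts[key] += 1
        return self._counts[key]


def bucket_key(category: str, org_id: int, now: float) -> str:
    # One key per category, org, and wall-clock minute.
    minute = int(now // 60)
    return f"rate_monitor:{category}:{org_id}:{minute}"


def check_rate(
    counter: InMemoryCounter,
    category: str,
    org_id: int,
    now: float | None = None,
) -> bool:
    """Return True if this request pushed the bucket over its threshold."""
    now = time.time() if now is None else now
    count = counter.incr(bucket_key(category, org_id, now))
    return count > THRESHOLDS[category]
```

Because the minute number is part of the key, a new minute starts a fresh counter without any explicit reset; the 2-minute expiry in the PR only cleans up old buckets.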

Thresholds (requests/minute):

Category      Threshold
llm_call      15
collections   3
evaluations   3

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and test.
  • If you've fixed a bug or added code that is tested and has test cases.

Notes

  • Redis stores the request counts; each bucket key expires after 2 minutes.
  • No rate limiting: requests are never blocked, only counted and alerted.
  • To extend this code to enforce a rate limit, raise an HTTPException with status code 429 in monitor_rate().
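
A sketch of what that extension could look like, under stated assumptions: in FastAPI the raise would be `HTTPException(status_code=429)`, but to keep the example dependency-free a hypothetical `TooManyRequests` exception stands in for it. The `enforce_threshold` helper and its `enforce` flag are also illustrative and do not appear in the PR.

```python
from __future__ import annotations


class TooManyRequests(Exception):
    """Hypothetical stand-in for fastapi.HTTPException(status_code=429)."""

    status_code = 429


def enforce_threshold(count: int | None, threshold: int, enforce: bool = False) -> bool:
    """Return True when the threshold is exceeded; raise if enforcement is on.

    A count of None means the counter backend failed, so the check is
    skipped (fail open) and the request is allowed through.
    """
    if count is None or count <= threshold:
        return False
    if enforce:
        raise TooManyRequests(f"rate threshold of {threshold}/min exceeded")
    return True
```

With `enforce=False` the helper keeps the alert-only behaviour; flipping the flag turns the same comparison into a hard limit.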

coderabbitai Bot commented Jun 4, 2026

Warning

Review limit reached

@vprashrex, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 33 minutes and 31 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 45b14155-1bbd-4022-97cf-f3bcff7c9d96

📥 Commits

Reviewing files that changed from the base of the PR and between 5021d0d and abd4725.

📒 Files selected for processing (3)
  • backend/app/core/rate_monitor.py
  • backend/app/core/telemetry.py
  • backend/app/tests/core/test_rate_monitor.py
📝 Walkthrough

Walkthrough

This PR introduces per-organization rate monitoring on three API endpoints using Redis-backed counters with configurable per-minute thresholds and Sentry alerting; requests are counted and alerted on, never blocked. Configuration defines thresholds (15 for LLM, 3 for collections and evaluations). The core rate_monitor module provides atomic Redis increment-and-get logic and a FastAPI dependency factory that checks per-org request counts, logs warnings, and triggers telemetry alerts when thresholds are exceeded. Three routes wire this into their dependency chains, and comprehensive tests cover normal paths, edge cases, and error handling.

Changes

Rate Limiting Infrastructure and Route Integration

Layer / File(s) Summary
Configuration: rate thresholds
backend/app/core/config.py
Settings class adds three per-minute rate threshold fields for LLM calls (15), collections (3), and evaluations (3).
Core rate monitor infrastructure
backend/app/core/rate_monitor.py
RateCategory type and THRESHOLDS mapping load from config. Module-level Redis client initialized. increment_and_get_count() atomically increments a Redis key with 120-second expiry, returning the count or None on error. monitor_rate(category) returns a FastAPI dependency checker that reads the authenticated project, computes per-minute per-org counters, compares against thresholds, and dispatches Sentry alerts when breached, with graceful degradation on Redis errors.
Telemetry alerts for rate threshold events
backend/app/core/telemetry.py
record_rate_threshold() emits warning-level Sentry events with org, category, request count, and threshold metadata as tags and extras, with no-op behavior if Sentry client is inactive or emission fails.
Route endpoint rate limiting
backend/app/api/routes/collections.py, backend/app/api/routes/evaluations/evaluation.py, backend/app/api/routes/llm.py
Three POST endpoints (collections, evaluations, llm/call) import and wire monitor_rate("collections"), monitor_rate("evaluations"), and monitor_rate("llm_call") into their FastAPI dependency lists alongside existing project-permission checks.
Comprehensive test coverage
backend/app/tests/core/test_rate_monitor.py
Test module validates increment_and_get_count (Redis pipeline, error handling), monitor_rate factory (org/category early exits, threshold comparison, telemetry dispatch, Redis error swallowing), and record_rate_threshold (Sentry active/inactive, tag emission, exception suppression). All external calls mocked; includes AuthContext helper.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • ProjectTech4DevAI/kaapi-frontend#129: This PR implements per-endpoint rate monitoring for llm/call, evaluations, and collections with Redis-backed organization-scoped counters and Sentry alerting; the linked issue requests similar per-API-key monitoring with Discord alerts.

Poem

A rabbit hops through Redis gates,
Counting requests, checking rates,
Sentry sings when limits break,
Three endpoints now can monitor their stake! 🐰⏱️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: implementing per-organization request rate alerting for high-cost endpoints.
Docstring Coverage ✅ Passed Docstring coverage is 90.48% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

github-actions Bot commented Jun 4, 2026

OpenAPI changes   ⚪ No API surface changes

Note

This PR does not modify the API contract.

maine9943aec · generated by oasdiff

@vprashrex vprashrex self-assigned this Jun 4, 2026
@vprashrex vprashrex added the enhancement New feature or request label Jun 4, 2026
@vprashrex vprashrex linked an issue Jun 4, 2026 that may be closed by this pull request

sentry Bot commented Jun 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Comment thread backend/app/core/rate_monitor.py Outdated

try:
    count = increment_and_get_count(redis_key)
    if count is not None and count > threshold:
Collaborator Author

@AkhileshNegi if we want to enforce a rate limit, we can raise an HTTPException with status code 429 here.

@vprashrex vprashrex requested review from Ayush8923 and kartpop and removed request for Prajna1999 June 5, 2026 05:01

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
backend/app/tests/core/test_rate_monitor.py (1)

13-211: ⚡ Quick win

Add explicit return annotations to helper/test functions.

Line 13 (_auth_context) and test methods throughout this file are missing return type annotations (e.g., -> SimpleNamespace / -> None).

As per coding guidelines, "**/*.py: Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/tests/core/test_rate_monitor.py` around lines 13 - 211, Add
explicit return type annotations: update the helper _auth_context to declare its
return type (e.g., -> SimpleNamespace) and annotate every test method to return
None (e.g., def test_returns_count_and_sets_expiry(self) -> None). Locate
functions by their names (_auth_context and each test_* method in classes
TestIncrementAndGetCount, TestMonitorRate, and TestRecordRateThreshold) and add
the appropriate return annotations without changing behavior.
backend/app/core/rate_monitor.py (1)

49-49: ⚡ Quick win

Add an explicit return type to monitor_rate.

Line 49 is missing a return annotation for the dependency factory.
As per coding guidelines, "**/*.py: Always add type hints to all function parameters and return values in Python code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/app/core/rate_monitor.py` at line 49, monitor_rate is missing a
return type annotation; update the signature of monitor_rate(category:
RateCategory) to include an explicit return type that matches the dependency
factory it returns (e.g., import typing.Callable and annotate as ->
Callable[..., RateMonitor] or, if uncertain, -> Callable[..., Any]) and ensure
any referenced type (RateMonitor or Any) is imported or added to typing imports
so the function has a full return type hint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/app/core/rate_monitor.py`:
- Around line 60-83: The code uses project (auth_context.project) for the Redis
key but then labels telemetry as org-scoped via
record_rate_threshold(org_id=project.id,...), causing inconsistent scoping;
locate where auth_context.project is read and instead resolve the organization
identity (e.g., auth_context.organization or project.organization_id /
project.organization) and use that organization id/name for both the redis_key
and the record_rate_threshold call (update redis_key =
f"rate_monitor:{category}:{org.id}:{minute_bucket}" and pass org.id/org.name
into record_rate_threshold) and keep increment_and_get_count and threshold logic
unchanged so counters and telemetry are consistently org-scoped.
- Around line 76-86: The code currently logs and calls record_rate_threshold for
every count > threshold; change the check to emit the alert only when the bucket
first crosses the threshold (e.g., when count == threshold + 1) so repeated
increments in the same minute don't spam alerts. Update the condition around
monitor logic that uses variables count and threshold (the block that calls
logger.warning and record_rate_threshold for project.id, project.name, and
category) to only run when the count has just moved from <=threshold to
>threshold (count == threshold + 1).

In `@backend/app/core/telemetry.py`:
- Around line 481-483: In function record_rate_threshold update the
logger.exception call to use the correct log prefix "[record_rate_threshold]"
(instead of "[record_rate_threshold_exceeded]") so the message follows the
convention; keep the same exception context (exc_info=e) and message text
otherwise to preserve error detail.
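
The first-crossing check suggested for the alert block (emit only when `count == threshold + 1`) can be compared with the original `count > threshold` condition in a small sketch. The function names are hypothetical; the example only assumes that the counter increments by one per request, as a Redis INCR does.

```python
def alerts_on_every_excess(count: int, threshold: int) -> bool:
    """Original condition: fires for every request above the threshold."""
    return count > threshold


def alerts_on_first_crossing(count: int, threshold: int) -> bool:
    """Suggested condition: fires once, when the bucket first exceeds the threshold."""
    return count == threshold + 1


# Simulate seven requests in one minute bucket with a threshold of 3.
counts = range(1, 8)
noisy = [c for c in counts if alerts_on_every_excess(c, threshold=3)]
quiet = [c for c in counts if alerts_on_first_crossing(c, threshold=3)]
```

In this simulation the original condition fires on requests 4 through 7, while the first-crossing condition fires only on request 4, so each bucket produces at most one alert per minute.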

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa7d503f-de2a-4ef8-8046-2d1e522c6a82

📥 Commits

Reviewing files that changed from the base of the PR and between b06fec6 and 5021d0d.

📒 Files selected for processing (7)
  • backend/app/api/routes/collections.py
  • backend/app/api/routes/evaluations/evaluation.py
  • backend/app/api/routes/llm.py
  • backend/app/core/config.py
  • backend/app/core/rate_monitor.py
  • backend/app/core/telemetry.py
  • backend/app/tests/core/test_rate_monitor.py

Comment thread backend/app/core/rate_monitor.py
Comment thread backend/app/core/rate_monitor.py Outdated
Comment thread backend/app/core/telemetry.py Outdated
Collaborator

@kartpop kartpop left a comment

approved with comments

threshold=threshold,
)

except redis.RedisError as e:

increment_and_get_count returns None after an exception, so this redis.RedisError handler will practically never fire, right? Should we remove the exception handler there and let this redis.RedisError handler deal with it?
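
One way to read this suggestion is to keep a single fail-open handler in the caller and let the helper raise. The sketch below is an assumption about the shape of that refactor, not the PR's code: `CounterError` is a hypothetical stand-in for `redis.RedisError`, and the `incr` callable replaces the module-level Redis client so the example runs without Redis.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger(__name__)


class CounterError(Exception):
    """Hypothetical stand-in for redis.RedisError."""


def increment_and_get_count(incr: Callable[[str], int], key: str) -> int:
    # No try/except here: backend errors propagate to the caller.
    return incr(key)


def exceeds_threshold(
    incr: Callable[[str], int], key: str, threshold: int
) -> bool | None:
    """Return None (check skipped) when the counter backend fails."""
    try:
        count = increment_and_get_count(incr, key)
    except CounterError as e:
        # The single place where backend failures are logged and swallowed.
        logger.error("[monitor_rate] Counter unavailable for %s: %s", key, e)
        return None
    return count > threshold
```

With this layout the helper no longer needs a `None` return value, and the fail-open decision lives in one function.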

Comment on lines +39 to +40
pipe.incr(key)
pipe.expire(key, _EXPIRATION_SECONDS)

Increment and expire are not atomic: if the increment executes and the system crashes before expire runs, the key will remain in Redis forever.

def increment_and_get_count(key: str) -> int | None:
    try:
        # SET NX atomically creates the key with TTL only on first call.
        _redis_client.set(key, 0, ex=_EXPIRATION_SECONDS, nx=True)
        return _redis_client.incr(key)
    except Exception as e:
        logger.error(
            f"[increment_and_get_count] Error incrementing count for {key}: {e}"
        )
        return None
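
The behaviour of the SET NX suggestion above can be checked without a Redis server. The sketch below uses a hypothetical `FakeRedis` class that mimics only the two redis-py calls involved, `set(key, value, ex=..., nx=True)` and `incr(key)`; it does not model real expiry, and error handling is left out for brevity.

```python
from __future__ import annotations

EXPIRATION_SECONDS = 120  # mirrors the 2-minute expiry described in the PR


class FakeRedis:
    """Tiny in-memory stand-in for the two redis-py calls used below."""

    def __init__(self) -> None:
        self.values: dict[str, int] = {}
        self.ttls: dict[str, int] = {}
        self.ttl_writes = 0  # how many times a TTL was set

    def set(self, key: str, value: int, ex: int | None = None, nx: bool = False):
        if nx and key in self.values:
            return None  # NX: do nothing if the key already exists
        self.values[key] = int(value)
        if ex is not None:
            self.ttls[key] = ex
            self.ttl_writes += 1
        return True

    def incr(self, key: str) -> int:
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]


def increment_and_get_count(client: FakeRedis, key: str) -> int:
    # Create the key with its TTL only if it does not exist yet, then count.
    client.set(key, 0, ex=EXPIRATION_SECONDS, nx=True)
    return client.incr(key)
```

Calling the function three times on the same key yields counts 1, 2, 3 while the TTL is written once, so later requests do not extend the bucket's lifetime. Note that SET and INCR are still two separate commands; if the key were to expire between them, INCR would recreate it without a TTL, although with a 120-second expiry on a one-minute bucket that window is small.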

Labels

enhancement New feature or request ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Monitoring: Add per API key rate monitoring and Discord alerts

2 participants