Enable rate limit enforcement after 48h clean shadow mode #6197

Merged
beastoin merged 1 commit into main from feat/rate-limit-enforce
Mar 31, 2026

@beastoin
Collaborator

Summary

  • Enable rate limit enforcement by default (shadow mode → active 429 rejections)
  • Bump conversations:create limit from 8/hr → 10/hr to accommodate power users

Context

PR #5836 deployed per-UID rate limiting in shadow mode (log-only, no blocking). After 48h of production monitoring:

  • 297 total shadow events across 24 policies — only 3 policies triggered
  • 0 Redis errors, atomic Lua scripts stable across 20 pods
  • 45 unique UIDs observed, 2 power users generated 60% of all events
  • 21 of 24 policies had zero shadow events (comfortable headroom)

Shadow event breakdown

| Policy | Events | Users | Analysis |
|---|---|---|---|
| conversations:reprocess (3/hr) | 149 | 29 | 1 user = 102 hits (automated retry loop) |
| conversations:create (8→10/hr) | 126 | 18 | 1 user = 73 hits (automated creation) |
| knowledge_graph:rebuild (2/hr) | 22 | 16 | Evenly spread, 1-2 hits each |
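
The figures in the breakdown can be cross-checked against the totals quoted under Context. The sketch below only re-adds the numbers already listed; it assumes the "2 power users" are the single heaviest user of conversations:reprocess and of conversations:create, which the description implies but does not state outright.

```python
# Shadow event counts copied from the breakdown above.
events = {
    "conversations:reprocess": 149,
    "conversations:create": 126,
    "knowledge_graph:rebuild": 22,
}

# Hits from the heaviest single user of the two busiest policies.
# Assumption: these are the "2 power users" mentioned under Context.
top_user_hits = {
    "conversations:reprocess": 102,
    "conversations:create": 73,
}

total = sum(events.values())
top_share = sum(top_user_hits.values()) / total

print(total)               # 297
print(f"{top_share:.1%}")  # 58.9%
```

The per-policy counts sum to the reported 297 events, and the two heaviest users account for roughly 59% of them, consistent with the rounded 60% figure.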

Why bump conversations:create to 10/hr

  • 8/hr could catch legitimate batch import workflows
  • 10/hr still blocks the abuser (73 hits/48h) while giving normal power users headroom
  • All other limits validated by shadow data — no changes needed

Safety

  • Revert to shadow: Set RATE_LIMIT_SHADOW_MODE=true env var (no code change needed)
  • Emergency relax: Set RATE_LIMIT_BOOST=2.0 to instantly double all limits
  • Fail-open: Redis errors allow requests through (no blocking on infra failure)
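
As a rough illustration of the two operator knobs, the sketch below reads them from the environment and applies the boost to a base limit. The two parsing expressions mirror the lines shown in the review diff; `load_rate_limit_settings` and `effective_limit` are hypothetical names, and the truncate-to-int rounding is an assumption rather than the project's confirmed behavior.

```python
import os


def load_rate_limit_settings(env=os.environ):
    """Read the boost multiplier and the shadow-mode flag."""
    boost = float(env.get("RATE_LIMIT_BOOST", "1.0"))
    # Same test as the merged config line: any value other than
    # "false" (case-insensitive) turns shadow mode on.
    shadow = env.get("RATE_LIMIT_SHADOW_MODE", "false").lower() != "false"
    return boost, shadow


def effective_limit(base_limit: int, boost: float) -> int:
    # Assumption: the boosted limit is truncated to an int and
    # never drops below 1 request per window.
    return max(1, int(base_limit * boost))


# Defaults after this PR: no boost, enforcement active.
print(load_rate_limit_settings({}))  # (1.0, False)

# Emergency relax: conversations:create goes from 10/hr to 20/hr.
print(effective_limit(10, 2.0))      # 20
```

Under these assumptions, both escape hatches are plain environment changes, which is what makes a revert possible without a code deploy.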

Test plan

  • 50 unit tests pass (policy validation, boost, enforcement, shadow mode, router wiring)
  • Shadow mode default test updated to expect False (enforcement active)
  • Post-deploy: monitor 429 response rate for first 24h
  • Verify no spike in user-facing errors

Closes #5835

by AI for @beastoin

After 48h shadow mode monitoring showed clean results (297 total shadow
events across 24 policies, 0 Redis errors), enable enforcement by default.

Changes:
- RATE_LIMIT_SHADOW_MODE default: true → false (enforcement active)
- conversations:create limit: 8/hr → 10/hr (accommodates power users
  while still blocking abuse — shadow data showed 1 user at 73 hits)
- Can revert to shadow via RATE_LIMIT_SHADOW_MODE=true env var
- Boost multiplier remains available as emergency escape hatch

Closes #5835

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Mar 31, 2026

Greptile Summary

This PR graduates the per-UID rate limiting system from shadow/log-only mode to active enforcement, backed by 48 hours of production shadow data across 20 pods and 45 unique UIDs. The two concrete code changes are a one-line default flip (RATE_LIMIT_SHADOW from True to False) and a conversations:create limit bump from 8 → 10 req/hr.

Key changes:

  • RATE_LIMIT_SHADOW default changed to False so the app enforces 429 responses out-of-the-box; operators can revert to shadow mode at any time by setting RATE_LIMIT_SHADOW_MODE=true without a code deploy.
  • conversations:create raised to 10/hr to give legitimate batch-import workflows headroom while still blocking the observed abuser (73 hits/48 h, well above 10/hr burst limit).
  • The corresponding unit test is renamed (test_shadow_mode_default_on → test_shadow_mode_default_off) and correctly updated to use assertFalse.
  • All safety nets remain intact: fail-open on Redis errors, RATE_LIMIT_BOOST multiplier for emergency relaxation, and the shadow-mode env var for instant revert.

Confidence Score: 5/5

Safe to merge — minimal, targeted changes backed by production shadow data with a clear revert path.

The two changed files are the rate-limit config and its unit test. The default flip is correct (only "false" yields False, which matches the documented usage), the limit bump is data-driven, and all 50 unit tests, including the shadow-mode suite, remain consistent. The only finding is a pre-existing P2 style concern about boolean parsing that does not affect current behavior given the documented operator interface.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| backend/utils/rate_limit_config.py | Three targeted changes: default for RATE_LIMIT_SHADOW flipped from True (shadow/log-only) to False (active enforcement), comment updated to match, and conversations:create limit bumped 8→10/hr. Logic is correct and consistent. |
| backend/tests/unit/test_rate_limiting.py | test_shadow_mode_default_on renamed to test_shadow_mode_default_off and assertion flipped to assertFalse; all other existing tests (env_true, env_false, enforcement, shadow logging) continue to pass without modification. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Request] --> B[Auth Dependency\nget_current_user_uid]
    B --> C[_enforce_rate_limit\nuid, policy_name]
    C --> D{Redis check_rate_limit\nLua INCR + TTL}
    D -- RedisError --> E[Fail-open\nlog error, allow]
    D -- allowed=True --> F[Request proceeds]
    D -- allowed=False --> G{RATE_LIMIT_SHADOW?}
    G -- True\nRATE_LIMIT_SHADOW_MODE=true --> H[Log warning only\nRequest proceeds]
    G -- False\ndefault after this PR --> I[Raise HTTP 429\nRetry-After header]

    style I fill:#f66,color:#fff
    style H fill:#fa0,color:#fff
    style E fill:#aaa,color:#fff
    style F fill:#6a6,color:#fff
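
The decision flow in the chart can be sketched in plain Python. This is a minimal illustration, not the project's `_enforce_rate_limit`: `check` is a hypothetical callable standing in for the Redis Lua call, and `RateLimited` and `BackendError` are stand-ins for the HTTP 429 response and the Redis client error.

```python
import logging

logger = logging.getLogger("rate_limit")


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response carrying Retry-After."""

    def __init__(self, retry_after: int):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


class BackendError(Exception):
    """Stand-in for a Redis client error."""


def enforce_rate_limit(uid, policy_name, check, shadow_mode):
    """Return True if the request may proceed, else raise RateLimited.

    `check(uid, policy_name)` is a hypothetical callable returning
    (allowed, retry_after_seconds) and raising BackendError when the
    counter store is unavailable.
    """
    try:
        allowed, retry_after = check(uid, policy_name)
    except BackendError:
        # Fail-open: an infrastructure failure never blocks a request.
        logger.error("rate limit store unavailable, allowing request")
        return True
    if allowed:
        return True
    if shadow_mode:
        # Shadow mode: record the violation but let the request through.
        logger.warning("shadow: uid=%s exceeded %s", uid, policy_name)
        return True
    raise RateLimited(retry_after)
```

Only the last branch changes behavior with this PR: with shadow mode off by default, an over-limit request now reaches the `raise` instead of the warning log.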

Reviews (1): Last reviewed commit: "Enable rate limit enforcement and bump c..."


  RATE_LIMIT_BOOST: float = float(os.getenv("RATE_LIMIT_BOOST", "1.0"))
- RATE_LIMIT_SHADOW: bool = os.getenv("RATE_LIMIT_SHADOW_MODE", "true").lower() != "false"
+ RATE_LIMIT_SHADOW: bool = os.getenv("RATE_LIMIT_SHADOW_MODE", "false").lower() != "false"

P2 Boolean parsing only recognises "false" as falsy

The expression os.getenv(..., "false").lower() != "false" means that only the literal string "false" (case-insensitive) disables shadow mode; any other value — including "0", "no", or "off" — would silently enable shadow mode instead. This pre-dates this PR but becomes more operationally relevant now that shadow mode is the opt-in path and operators may try conventional truthy/falsy strings when configuring it.

Consider a more conventional boolean helper so that "0", "no", and "off" all behave as expected:

Suggested change
- RATE_LIMIT_SHADOW: bool = os.getenv("RATE_LIMIT_SHADOW_MODE", "false").lower() != "false"
+ RATE_LIMIT_SHADOW: bool = os.getenv("RATE_LIMIT_SHADOW_MODE", "false").lower() in ("1", "true", "yes", "on")

This way, shadow mode is only active when explicitly enabled with a clearly truthy value, which is consistent with the documented usage (RATE_LIMIT_SHADOW_MODE=true).
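
One way to package that suggestion is a small reusable helper. The sketch below is hypothetical (`env_flag` is not part of this PR or the codebase as shown) and treats any unrecognized value as false, which is a design assumption.

```python
import os

# Strings accepted as "on", compared case-insensitively.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, env=os.environ) -> bool:
    """Parse a boolean environment variable.

    Returns `default` when the variable is unset; otherwise True only
    for a recognized truthy string. Anything else is treated as False.
    """
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


# Hypothetical usage mirroring the config line under review:
RATE_LIMIT_SHADOW = env_flag("RATE_LIMIT_SHADOW_MODE", default=False)
```

With this shape, `RATE_LIMIT_SHADOW_MODE=0`, `no`, or `off` keep enforcement active, whereas the merged `!= "false"` expression would switch shadow mode on for each of them.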

@beastoin beastoin merged commit 532f815 into main Mar 31, 2026
3 checks passed
@beastoin beastoin deleted the feat/rate-limit-enforce branch March 31, 2026 10:58
@beastoin
Collaborator Author

lgtm

Development

Successfully merging this pull request may close these issues.

Add rate limiting to cost-incurring API endpoints (LLM + STT)