Skip to content

Rate limit /chat/message: per-user + per-IP defence-in-depth#48

Merged
vahid-ahmadi merged 2 commits into
mainfrom
feat/rate-limiting
May 22, 2026
Merged

Rate limit /chat/message: per-user + per-IP defence-in-depth#48
vahid-ahmadi merged 2 commits into
mainfrom
feat/rate-limiting

Conversation

@vahid-ahmadi
Copy link
Copy Markdown
Collaborator

Summary

Adds two-layer rate limiting on `POST /chat/message` — the only expensive endpoint. Other routes are cheap reads and stay unlimited.

  • Per user (or IP if anonymous) — `5/min` and `60/hour`. Authenticated clients send `X-User-Id`; the limiter keys by `user:` so signed-in users aren't blocked by anon flooders on shared NAT/proxy IPs.
  • Per IP defence-in-depth — `30/min`, regardless of who is sending. Catches anon abuse and runaway scripts.
  • Limits configurable via `RATE_LIMIT_CHAT_PER_MIN`, `RATE_LIMIT_CHAT_PER_HOUR`, `RATE_LIMIT_CHAT_IP_PER_MIN`.
  • 429 handler returns JSON with `retry_after_seconds` and a matching `Retry-After` header.
  • Frontend handles 429 with an inline assistant message ("you're sending messages a bit fast — please wait ~Ns") rather than the 402 paywall flow.

Closes #46.

Implementation notes

  • New module `backend/rate_limit.py` owns the `Limiter`, the `chat_key_func`, and the env-tunable limit strings. Imported by both `main.py` (registration + handler) and `chatbot.py` (decorator).
  • `backend/tests/conftest.py` bumps test limits to 10000/min — without it the existing `TestChatMessage` tests would trip the production cap on the 6th call.
  • Storage is slowapi's default in-memory backend. Modal runs up to 10 containers with concurrency 100, so a determined attacker can spread requests across containers and bypass any single counter. The IP layer is approximate but adequate for the threat model — swap to Redis via `storage_uri` when we need cross-container accuracy.

Test plan

  • `docker-compose up`, then hammer `/chat/message` 6 times in 30s with the same logged-in user → 6th request returns 429 with `Retry-After: 60`.
  • Hammer 6 anonymous requests from one IP → same behaviour.
  • In the UI, hit the limit → an inline assistant bubble appears with "you're sending messages a bit fast — please wait ~60s". Streaming state clears, paywall is not shown.
  • `GET /health`, `GET /version`, `GET /conversations`, and the title generator continue to work under load.
  • `pytest backend/tests/test_api.py -q` passes — the new `TestRateLimitConfig` is fast and offline.

Out of scope

  • Cross-container accurate rate limiting (would need Redis).
  • Per-org/team limits (no orgs yet).
  • Rate-limiting cheap GET endpoints (deemed unnecessary for current threat model).

Two layers protect the only expensive endpoint:

- Per user (or per IP if anonymous): 5/min and 60/hour. Authenticated
  clients send their user id in `X-User-Id`; the limiter keys by
  `user:<id>` so signed-in users aren't blocked by anon flooders on
  shared IPs.
- Per IP defence-in-depth: 30/min, regardless of who is sending.

Both limits are env-tunable (`RATE_LIMIT_CHAT_PER_MIN`,
`RATE_LIMIT_CHAT_PER_HOUR`, `RATE_LIMIT_CHAT_IP_PER_MIN`).

The 429 handler returns JSON with `retry_after_seconds` and a matching
`Retry-After` header. Frontend handles 429 with an inline assistant
message ("you're sending messages a bit fast — please wait ~Ns") rather
than the paywall flow used for 402.

Storage is slowapi's default in-memory backend. With Modal's
max_containers=10 and concurrent=100, an attacker could spread requests
across containers to bypass any single counter — the IP layer is
approximate but adequate. Swap to Redis via `storage_uri` if we need
cross-container accuracy later.

Tests: TestRateLimitConfig covers the key function (user/IP precedence,
empty header fallback, env-var overrides). End-to-end limit triggering
isn't tested — it would require precise timing and per-test storage
resets. conftest.py raises test limits well above pytest workload so
the existing chat tests don't trip the production 5/min cap.

Closes #46

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 6, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
policyengine-uk-chat Ready Ready Preview, Comment May 22, 2026 10:45am

Request Review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

Beta preview has been cleaned up because this PR was closed.

@vahid-ahmadi vahid-ahmadi self-assigned this May 6, 2026
@vahid-ahmadi vahid-ahmadi requested a review from SakshiKekre May 6, 2026 12:00
- Critical: slowapi's @limiter.limit grabs the endpoint parameter named
  `request` and requires a starlette Request. The route named the
  Pydantic body `request` and the Request `http_request`, so slowapi
  raised on every call and /chat/message 500'd for everyone. Renamed:
  `request` is now the Starlette Request, the body is `chat_request`.

- Per-IP layer keyed on `request.client.host`, which behind Modal's
  proxy is the proxy address — collapsing all traffic into one bucket.
  New `client_ip()` prefers the first `X-Forwarded-For` entry.

- The new TestRateLimitConfig only exercised the key func, so it stayed
  green while the endpoint was broken. Added a signature guard (the
  `request` param must be a Starlette Request) and a smoke test that
  /chat/message does not 500 from the decorator, plus client_ip cases.

Verified against slowapi 0.1.9: the fixed signature streams 5x then
429s; the pre-fix signature 500s on every call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi
Copy link
Copy Markdown
Collaborator Author

Pushed 3ba817f — fixes the three issues from review.

🔴 Blocker — /chat/message 500'd on every call. slowapi's @limiter.limit grabs the endpoint parameter literally named request and requires it to be a starlette.requests.Request. This route named the Pydantic body request and the Starlette request http_request, so slowapi raised parameter \request` must be an instance of starlette.requests.Requestbefore the handler ran. Renamed:requestis now the Starlette Request, the parsed body ischat_request(6 references updated in the function body). I reproduced this againstslowapi 0.1.9/fastapi 0.136.1`:

signature result
request: ChatRequest, http_request: Request (before) [500, 500, 500, 500, 500, 500, 500, 500]
request: Request, chat_request: ChatRequest (after) [200, 200, 200, 200, 200, 429, 429, 429]

🟠 Per-IP layer keyed on the proxy IP. get_remote_address returns request.client.host — behind Modal's proxy that is the proxy address, collapsing all anonymous traffic into one bucket. New client_ip() prefers the first X-Forwarded-For entry; chat_key_func's anon fallback and the limiter default both use it.

🟠 Test gap + 🟡 spoofability. TestRateLimitConfig only exercised the key func, so it stayed green while the endpoint was fully broken. Added: a signature guard (the request param must be a Starlette Request — pinpoints this exact bug class, offline), a smoke test that /chat/message does not 500 from the decorator, and client_ip/X-Forwarded-For cases. Also documented in rate_limit.py that both keys are unauthenticated and client-spoofable — deliberate defence-in-depth against casual abuse, not a security control.

Verification: ran the real rate_limit.py functions and the slowapi integration end-to-end (table above); inspect.signature confirmed to see the Starlette Request through slowapi's decorators; py_compile (3.13) clean on all changed files. The full pytest backend/tests/test_api.py was not run here — it imports the PolicyEngine model packages (data-token install) and is meant to run inside the backend container; the chatbot.py change is a mechanical, syntax-verified rename and the slowapi behaviour is proven independently above.

+87/−9 across chatbot.py, rate_limit.py, test_api.py.

@vahid-ahmadi vahid-ahmadi merged commit 9bcfbff into main May 22, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rate limiting on /chat/message and other expensive endpoints

1 participant