Rate limit /chat/message: per-user + per-IP defence-in-depth #48
Two layers protect the only expensive endpoint:
- Per user (or per IP if anonymous): 5/min and 60/hour. Authenticated
clients send their user id in `X-User-Id`; the limiter keys by
`user:<id>` so signed-in users aren't blocked by anon flooders on
shared IPs.
- Per IP defence-in-depth: 30/min, regardless of who is sending.
Both layers are env-tunable (`RATE_LIMIT_CHAT_PER_MIN`,
`RATE_LIMIT_CHAT_PER_HOUR`, `RATE_LIMIT_CHAT_IP_PER_MIN`).
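
A minimal sketch of the keying and env-var logic described above, not the
PR's actual code. It is framework-free so it runs on its own. The names
`rate_limit_key` and `chat_limits`, the `ip:` prefix and the plain-dict
header lookup are assumptions for illustration; only `user:<id>`, the
header name, the env-var names and the 5/60/30 defaults come from the
description.

```python
import os
from typing import Dict, List, Mapping, Optional


def rate_limit_key(headers: Mapping[str, str], client_host: Optional[str]) -> str:
    """Return the bucket key: the user id when present, else the client IP."""
    # A plain dict lookup is case-sensitive; real request headers are not.
    user_id = headers.get("X-User-Id", "").strip()
    if user_id:
        return f"user:{user_id}"
    # Missing or empty header: fall back to the IP bucket.
    # The "ip:" prefix is hypothetical; the PR only documents "user:<id>".
    return f"ip:{client_host or 'unknown'}"


def chat_limits(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Read the three limits from the environment, defaulting to 5/60/30."""
    per_min = env.get("RATE_LIMIT_CHAT_PER_MIN", "5")
    per_hour = env.get("RATE_LIMIT_CHAT_PER_HOUR", "60")
    ip_per_min = env.get("RATE_LIMIT_CHAT_IP_PER_MIN", "30")
    return {
        "user": [f"{per_min}/minute", f"{per_hour}/hour"],
        "ip": [f"{ip_per_min}/minute"],
    }
```

Keying signed-in users by id rather than by address is what keeps them out
of the bucket that anonymous traffic on the same IP fills up.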
The 429 handler returns JSON with `retry_after_seconds` and a matching
`Retry-After` header. Frontend handles 429 with an inline assistant
message ("you're sending messages a bit fast — please wait ~Ns") rather
than the paywall flow used for 402.
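
For illustration, the shape of the 429 response could be built roughly
like this. This is a sketch only: `too_many_requests` and the `error`
field are hypothetical, and the wiring into slowapi's exception handler
is omitted. Only the `retry_after_seconds` field and the matching
`Retry-After` header come from the description.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(seconds_until_reset: float) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for a rate-limited request."""
    # Round up so the client never retries before the window resets.
    retry_after = max(1, math.ceil(seconds_until_reset))
    headers = {
        "Retry-After": str(retry_after),
        "Content-Type": "application/json",
    }
    # "error" is a hypothetical field; "retry_after_seconds" is from the PR.
    body = json.dumps({"error": "rate_limited", "retry_after_seconds": retry_after})
    return 429, headers, body
```

Computing the value once and writing it to both places keeps the header
and the JSON field from drifting apart, so the frontend can fill in the
"~Ns" of its inline message from either.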
Storage is slowapi's default in-memory backend. With Modal's
max_containers=10 and concurrent=100, an attacker could spread requests
across containers to bypass any single counter — the IP layer is
approximate but adequate. Swap to Redis via `storage_uri` if we need
cross-container accuracy later.
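
A back-of-the-envelope check of what "approximate" means here, assuming
each container keeps its own independent in-memory counter and that an
attacker's requests can be spread evenly across all of them (the worst
case). The function name is hypothetical.

```python
def worst_case_per_minute(limit_per_min: int, containers: int) -> int:
    """Upper bound on accepted requests/min when counters are per-container."""
    return limit_per_min * containers


# Per-IP layer: 30/min on each of up to 10 containers.
print(worst_case_per_minute(30, 10))  # 300
# Per-user layer: 5/min on each of up to 10 containers.
print(worst_case_per_minute(5, 10))  # 50
```

So under these assumptions the effective ceiling is up to ten times the
configured value, which still bounds abuse but is not an exact cap.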
Tests: TestRateLimitConfig covers the key function (user/IP precedence,
empty header fallback, env-var overrides). End-to-end limit triggering
isn't tested — it would require precise timing and per-test storage
resets. conftest.py raises test limits well above pytest workload so
the existing chat tests don't trip the production 5/min cap.
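
One way the conftest.py override might look. This is a sketch: the
values are hypothetical, and it assumes the app reads these variables
when its module is first imported, so the override has to run before
that import.

```python
import os
from typing import MutableMapping

# Hypothetical values, chosen only to sit far above what the suite sends.
TEST_LIMITS = {
    "RATE_LIMIT_CHAT_PER_MIN": "1000",
    "RATE_LIMIT_CHAT_PER_HOUR": "10000",
    "RATE_LIMIT_CHAT_IP_PER_MIN": "1000",
}


def raise_test_limits(env: MutableMapping[str, str] = os.environ) -> None:
    """Overwrite the production limits with generous test values."""
    env.update(TEST_LIMITS)


# Run at import time of conftest.py, before the app module is imported.
raise_test_limits()
```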
Closes #46
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Critical: slowapi's @limiter.limit grabs the endpoint parameter named
`request` and requires a Starlette Request. The route named the
Pydantic body `request` and the Request `http_request`, so slowapi
raised on every call and /chat/message 500'd for everyone. Renamed:
`request` is now the Starlette Request, the body is `chat_request`.
- Per-IP layer keyed on `request.client.host`, which behind Modal's
proxy is the proxy address, collapsing all traffic into one bucket.
New `client_ip()` prefers the first `X-Forwarded-For` entry.
- The new TestRateLimitConfig only exercised the key func, so it stayed
green while the endpoint was broken. Added a signature guard (the
`request` param must be a Starlette Request) and a smoke test that
/chat/message does not 500 from the decorator, plus client_ip cases.
Verified against slowapi 0.1.9: the fixed signature streams 5x then
429s; the pre-fix signature 500s on every call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
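
The two fixes above can be sketched without the framework. This is an
illustration under assumptions, not the code from the commit: this
`client_ip` takes a plain header mapping and peer host instead of a
request object, `FakeRequest` stands in for Starlette's `Request`, and
`assert_request_param` is a hypothetical name for the signature guard.

```python
import inspect
from typing import Callable, Mapping, Optional


def client_ip(headers: Mapping[str, str], peer_host: Optional[str]) -> str:
    """Prefer the first X-Forwarded-For entry, else the direct peer address."""
    first = headers.get("X-Forwarded-For", "").split(",")[0].strip()
    if first:
        return first
    # No usable header: fall back to the socket peer (the proxy, if any).
    return peer_host or "unknown"


class FakeRequest:
    """Stand-in for starlette.requests.Request in this sketch."""


def assert_request_param(endpoint: Callable, request_type: type = FakeRequest) -> None:
    """Guard: the parameter named `request` must be the HTTP request type."""
    params = inspect.signature(endpoint).parameters
    assert "request" in params, "endpoint needs a parameter named `request`"
    assert params["request"].annotation is request_type, (
        "`request` must be annotated with the HTTP request type"
    )


def fixed_endpoint(request: FakeRequest, chat_request: dict) -> None:
    """Post-fix shape: `request` is the HTTP request, the body is separate."""


def broken_endpoint(request: dict, http_request: FakeRequest) -> None:
    """Pre-fix shape: the body took the `request` name."""


assert_request_param(fixed_endpoint)  # passes silently
```

The first `X-Forwarded-For` entry is only trustworthy if the proxy in
front of the app sets or overwrites it; a client-supplied value can be
spoofed, which is the residual risk the review flags.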
Pushed 🔴 Blocker —
🟠 Per-IP layer keyed on the proxy IP. 🟠 Test gap + 🟡 spoofability. Verification: ran the real
Summary
Adds two-layer rate limiting on `POST /chat/message` — the only expensive endpoint. Other routes are cheap reads and stay unlimited.
Closes #46.
Implementation notes
Test plan
Out of scope