feat(replay-vision): monthly observation quota #60357
PR overview
The PR has one remaining security concern: the monthly observation quota can be bypassed through concurrent observation requests because the quota check and row creation are not atomic. This would let a user with scanner write access create more pending observations than the organization’s monthly cap allows, increasing resource usage beyond intended limits. One prior issue has already been addressed, so the remaining risk is focused and should be resolved with an atomic reservation or locked count-and-insert flow.
Open issues (1)
Fixed/addressed: 1 · PR risk: 5/10
Caps each org at 1,000 successful observations per calendar month and
blocks on-demand observations once that ceiling is reached.
- `quota.py` — single static beta quota (1000) and a Postgres-backed
`compute_quota_snapshot(org_id)` that counts succeeded observations
with `completed_at` in the current UTC month.
- `GET /api/environments/{team_id}/vision/quota/` returns
`{monthly_quota, usage_this_month, remaining, exhausted, period_start, period_end}`.
- `POST /api/environments/{team_id}/vision/scanners/{id}/observe/`
returns `HTTP 402 quota_exhausted` (with the next period boundary)
before starting any Temporal workflow when the org's snapshot is exhausted.
- Frontend `VisionQuotaMeter` now consumes real backend data; type
renamed to match (`monthly_quota` / `usage_this_month` / `remaining`
/ `exhausted`). Dropped the unused `policy` field and the placeholder
sparkline + day-range picker — no policy concept until billing is wired,
no per-day history from this endpoint.
The single source of truth — `compute_quota_snapshot` — is the swap
point for a future billing-service integration: no caller (workflow,
viewset, frontend) sees the static-vs-billed distinction.
        f"User {inputs.triggered_by_user_id} is not a member of scanner {inputs.scanner_id}'s organization"
    )

    if compute_quota_snapshot(scanner.team.organization_id).exhausted:
Medium: Quota check is not atomic
compute_quota_snapshot(...).exhausted runs before the observation row is inserted, with no lock or reservation. An attacker with scanner write access can submit many /observe/ calls for different sessions when the org is just under quota; the Temporal activities can all see the same pre-insert usage and create PENDING observations past the monthly cap. Move the quota check into an atomic reservation, such as a per-org/month counter row locked with select_for_update or an advisory lock around the count-and-insert path.
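
One way to picture the suggested per-org/month counter row is the sketch below. It is an illustration under assumptions, not the PR's fix: it uses the standard-library `sqlite3` module instead of Postgres, swaps `select_for_update` for a conditional `UPDATE` inside a write transaction, and the table `quota_counter` and function `try_reserve` are hypothetical names. It also counts reservations rather than succeeded observations, so a real version would need to release a slot when an observation fails.

```python
import sqlite3

MONTHLY_QUOTA = 1000  # same static cap as the PR


def open_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions can be managed explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE quota_counter ("
        " org_id TEXT NOT NULL,"
        " period_start TEXT NOT NULL,"
        " reserved INTEGER NOT NULL,"
        " PRIMARY KEY (org_id, period_start))"
    )
    return conn


def try_reserve(conn, org_id: str, period_start: str, quota: int = MONTHLY_QUOTA) -> bool:
    """Atomically claim one slot; return False when the cap is reached."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # Create the counter row for this org/month if it does not exist yet.
        conn.execute(
            "INSERT OR IGNORE INTO quota_counter VALUES (?, ?, 0)",
            (org_id, period_start),
        )
        # Check and increment in a single statement: no gap between them.
        cur = conn.execute(
            "UPDATE quota_counter SET reserved = reserved + 1"
            " WHERE org_id = ? AND period_start = ? AND reserved < ?",
            (org_id, period_start, quota),
        )
        reserved = cur.rowcount == 1
        conn.execute("COMMIT")
        return reserved
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the comparison and the increment happen in one statement, two concurrent callers cannot both observe the same pre-insert usage, which is the race the comment describes.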
Problem
Replay Vision needs a cap on observation volume before it can be turned on more broadly — a runaway scanner would happily burn through a month's Gemini budget.
Changes
A flat 1,000 successful observations per organization per UTC calendar month, enforced at `/observe/`.
- `compute_quota_snapshot(org_id)` counts `succeeded` observations in the current month
- `GET /api/environments/{team_id}/vision/quota/` returns `{monthly_quota, usage_this_month, remaining, exhausted, period_start, period_end}`
- `POST .../scanners/{id}/observe/` returns `HTTP 402 quota_limit_exceeded` before touching Temporal when exhausted
- `VisionQuotaMeter` reads the real endpoint
How did you test this code?
`hogli test products/replay_vision/backend/tests/test_quota.py`: 9 passed. `hogli build:openapi` clean.
Publish to changelog?
no
🤖 Agent context
Tool: Claude Code.