feat(replay-vision): add SweepScannerWorkflow for Phase 2 schedule fires by TueHaulund · Pull Request #60772 · PostHog/posthog

TueHaulund · 2026-05-30T11:16:06Z

Stacked on #60617.

Problem

Phase 2 needs a Temporal workflow that fires every 5 minutes per scanner, runs ScannerCandidateQuery, dispatches one ApplyScannerWorkflow per candidate, and advances the watermark. The per-scanner schedule lands in the next stacked PR — this is just the workflow.

Changes

- SweepScannerWorkflow: find candidates → ABANDONed children → advance watermark. On full-batch failure it raises AllChildStartsFailed and skips the watermark advance so the next fire retries the window.
- find_scanner_candidates_activity: returns candidates + saturated flag (len == DEFAULT_CANDIDATE_LIMIT). Non-retryable on a malformed saved query.
- advance_scanner_watermark_activity: bumps last_swept_at and last_seen_session_id via .update(); no scanner_version bump, idempotent.
- Migration 0010: adds last_seen_session_id to ReplayScanner.
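
The find → dispatch → advance flow described above can be sketched in plain asyncio. This is an illustration only, not the PR's code: it does not use the Temporal SDK, `DEFAULT_CANDIDATE_LIMIT` is set to an assumed tiny value, and `find_candidates`, `sweep`, the `scanner` dict and the rule "the watermark records the last dispatched session id" are all hypothetical stand-ins.

```python
import asyncio
from dataclasses import dataclass

DEFAULT_CANDIDATE_LIMIT = 3  # assumed tiny value so the example saturates


class AllChildStartsFailed(Exception):
    """Every child start in a non-empty batch failed."""


@dataclass
class FindResult:
    candidates: list[str]  # session ids in sweep order
    saturated: bool  # True when the batch filled the limit


def find_candidates(pending: list[str]) -> FindResult:
    batch = pending[:DEFAULT_CANDIDATE_LIMIT]
    return FindResult(batch, len(batch) == DEFAULT_CANDIDATE_LIMIT)


async def sweep(pending: list[str], start_child, scanner: dict) -> None:
    result = find_candidates(pending)
    if not result.candidates:
        return  # nothing to dispatch, watermark untouched
    outcomes = await asyncio.gather(
        *(start_child(sid) for sid in result.candidates),
        return_exceptions=True,
    )
    if all(isinstance(o, Exception) for o in outcomes):
        # Skip the advance so the next fire retries the same window.
        raise AllChildStartsFailed(scanner["id"])
    # Assumed: the watermark records the last dispatched session id.
    scanner["last_seen_session_id"] = result.candidates[-1]
```

A saturated batch signals that more candidates may remain in the window, which is why the keyset column from the migration is needed to resume without re-emitting.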

How did you test this code?

I'm an agent. 15 new tests pass; 78 existing tests still pass. No manual testing.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

🤖 Agent context

Agent: Claude (Claude Code). Used ALLOW_DUPLICATE for the child reuse policy to match the existing rasterize-recording dispatch — UNIQUE(scanner_id, session_id) is the durable dedup.

github-actions · 2026-05-30T11:26:12Z

Migration SQL Changes

Hey 👋, we've detected some migrations on this PR. Here's the SQL output for each migration, make sure they make sense:

`products/replay_vision/backend/migrations/0010_replayscanner_last_seen_session_id.py`

BEGIN;
--
-- Add field last_seen_session_id to replayscanner
--
ALTER TABLE "replay_vision_replayscanner" ADD COLUMN "last_seen_session_id" varchar(200) DEFAULT '' NOT NULL;
COMMIT;

Last updated: 2026-06-01 20:37 UTC (bc5dcea)

github-actions · 2026-05-30T11:26:33Z

🔍 Migration Risk Analysis

We've analyzed your migrations for potential risks.

Summary: 1 Safe | 0 Needs Review | 0 Blocked

✅ Safe

Brief or no lock, backwards compatible

replay_vision.0010_replayscanner_last_seen_session_id
  └─ #1 ✅ AddField
     Adding NOT NULL field with constant default (safe in PG11+)
     model: replayscanner, field: last_seen_session_id

Last updated: 2026-06-01 20:38 UTC (bc5dcea)

greptile-apps · 2026-05-30T20:47:32Z

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
products/replay_vision/backend/tests/test_sweep.py:339-342
Redundant duplicate assertion — the same list-comprehension check appears twice in a row. The inline assert on line 339 and the `advance_calls` assertion on lines 341-342 are identical; one should be removed.

```suggestion
    advance_calls = [call for fn, call in mocks.activity_calls if fn == advance_scanner_watermark_activity]
    assert advance_calls == []
```

### Issue 2 of 2
products/replay_vision/backend/temporal/sweep_workflow.py:56-59
`asyncio.gather` with `return_exceptions=False` (the default) propagates on the **first** non-`WorkflowAlreadyStartedError` failure, not only when every dispatch fails. The comment "if every dispatch fails" misrepresents the actual semantics — even a single unexpected failure in a 100-candidate batch will skip the watermark advance. The behaviour itself is intentional and safe (the next sweep retries, already-started children get `WorkflowAlreadyStartedError`), but the comment makes it sound like partial failures are tolerated.

```suggestion
        # `return_exceptions=False`: if *any* dispatch fails with an error other
        # than WorkflowAlreadyStartedError, the first such exception propagates
        # and the watermark advance is skipped — next sweep retries the same
        # window. Already-started children are deduplicated by Temporal's
        # workflow_id and by `UNIQUE(scanner_id, session_id)` on the row.
```
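
The semantics described in Issue 2 can be reproduced with plain asyncio. The sketch below is a stand-alone illustration, not the workflow code: `WorkflowAlreadyStartedError` is a local stand-in for the Temporal error, and `start_child` and `dispatch_batch` are hypothetical names.

```python
import asyncio


class WorkflowAlreadyStartedError(Exception):
    """Local stand-in for the Temporal error of the same name."""


async def start_child(session_id: str, outcome: str) -> str:
    try:
        if outcome == "already_started":
            raise WorkflowAlreadyStartedError(session_id)
        if outcome == "boom":
            raise RuntimeError(f"dispatch failed for {session_id}")
    except WorkflowAlreadyStartedError:
        # A retried sweep treats an already-started child as success.
        pass
    return session_id


async def dispatch_batch(batch: dict[str, str]) -> bool:
    """Return True if the watermark would advance after this batch."""
    try:
        # return_exceptions defaults to False: the first unexpected
        # exception propagates out of gather.
        await asyncio.gather(*(start_child(s, o) for s, o in batch.items()))
    except RuntimeError:
        return False  # advance skipped, next sweep retries the window
    return True
```

With this shape, a single `"boom"` among otherwise healthy dispatches is enough to skip the advance, which is the behaviour the corrected comment describes.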

Reviews (1): Last reviewed commit: "refactor(replay-vision): simplify SweepS..."

Lays the foundation for per-scanner Temporal schedules. Wraps SessionRecordingListFromQuery with a session_end-based filter: a session is eligible when it's had no activity in the last 35 minutes and its end time is past the scanner's watermark. The wrap delegates all RecordingsQuery filter compilation to the recordings list, so a scanner's saved filters resolve identically to the UI.
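
As a rough illustration of the eligibility rule, the predicate can be written as below. The real filter is compiled into the recordings query; `is_eligible`, `QUIET_PERIOD` and the choice of inclusive versus strict comparison at each boundary are assumptions made for this sketch.

```python
import datetime as dt

QUIET_PERIOD = dt.timedelta(minutes=35)  # "no activity in the last 35 minutes"


def is_eligible(
    session_end: dt.datetime,
    watermark: dt.datetime,
    now: dt.datetime,
) -> bool:
    """A session qualifies once it has gone quiet and lies past the watermark."""
    gone_quiet = session_end <= now - QUIET_PERIOD  # boundary inclusivity assumed
    past_watermark = session_end > watermark
    return gone_quiet and past_watermark
```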

- Bump _PARTITION_LOOKBACK from 6h to 26h, anchored to posthog-js's 24h session_id rotation cap + 2h skew/lag headroom. Adds regression test for long-running sessions whose start is older than 6h.
- Add keyset pagination via last_seen_session_id kwarg + lexicographic tuple comparison. Lets the schedule resume past a saturated batch without skipping sessions tied at the boundary microsecond.
- Drop the now kwarg; use datetime.now(dt.UTC) directly so inner and outer clocks always agree under @freeze_time.
- Push sampling into the inner HAVING via extra_having_predicates so un-sampled sessions are dropped before being aggregated by the outer.
- Validate max_execution_time_seconds > 0.
- Comment on inner.order_by mutation noting get_query() re-parses each call.
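
The keyset pagination idea can be shown with Python's built-in lexicographic tuple comparison. This is an in-memory sketch under the assumption that the cursor is the pair (session end time, session id); `next_batch` and `Row` are hypothetical names, and the actual query runs in the database rather than over a Python list.

```python
import datetime as dt

Row = tuple[dt.datetime, str]  # (session_end, session_id)


def next_batch(rows: list[Row], cursor: Row, limit: int) -> list[Row]:
    """Return up to `limit` rows strictly after `cursor`, in keyset order."""
    # Tuples compare element by element, so ties on session_end are
    # broken by session_id instead of being skipped or repeated.
    return sorted(r for r in rows if r > cursor)[:limit]
```

If three sessions end at the same microsecond and the limit is 2, the first batch returns two of them and the next call, seeded with the last returned row as the cursor, picks up the third. A cursor on the timestamp alone would skip it.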

veria-ai · 2026-05-30T21:48:36Z

PR overview

This pull request adds the SweepScannerWorkflow for replay-vision Phase 2 scheduled scanner runs, dispatching candidate sweep work through Temporal child workflows. The touched workflow code coordinates scanner-enabled batch execution for observation creation.

There is one open security concern around quota enforcement during sweep fan-out: a user with scanner configuration access could trigger concurrent child workflows that each see available quota and collectively exceed the intended monthly observation limit. Two prior issues have already been addressed, so the remaining risk is focused on bounding or atomically reserving quota before dispatch. The impact appears limited to quota/resource abuse rather than direct data exposure or authorization bypass.

Open issues (1)

Medium: Observation quota bypass — products/replay_vision/backend/temporal/sweep_workflow.py:49

Fixed/addressed: 2 · PR risk: 5/10

Closes a DoS vector flagged in review: a client sending events with session_ids longer than 128 chars (the MAX_SESSION_ID_LENGTH used by the ApplyScannerWorkflow wire payload) would wedge the sweep on Pydantic validation. Filtering at the query layer keeps over-length sessions invisible to the scanner so the watermark always advances.
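
A minimal sketch of why filtering before validation matters, assuming a validated payload that rejects over-length ids. `ApplyScannerInput` here is a hypothetical dataclass standing in for the Pydantic wire payload, and `build_payloads` is an in-memory analogue of the query-layer filter, not the PR's implementation.

```python
from dataclasses import dataclass

MAX_SESSION_ID_LENGTH = 128  # value quoted in the commit message above


@dataclass
class ApplyScannerInput:
    """Hypothetical stand-in for the validated wire payload."""

    session_id: str

    def __post_init__(self) -> None:
        if len(self.session_id) > MAX_SESSION_ID_LENGTH:
            raise ValueError("session_id too long")


def build_payloads(session_ids: list[str]) -> list[ApplyScannerInput]:
    # Drop over-length ids first so one bad id cannot fail the whole batch.
    kept = [s for s in session_ids if len(s) <= MAX_SESSION_ID_LENGTH]
    return [ApplyScannerInput(s) for s in kept]
```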

Adds the Temporal workflow that fires every 5 minutes per scanner, runs ScannerCandidateQuery, dispatches ABANDONed ApplyScannerWorkflow children, and advances the watermark. Per-scanner schedules and the reconciler land in a later PR.

- Migration 0010: add last_seen_session_id to ReplayScanner (keyset tiebreaker for resuming saturated batches without re-emitting).
- find_scanner_candidates_activity: reads scanner row, runs the candidate query, returns candidates + a saturated flag. Filters enabled=True to short-circuit disabled scanners. Verifies the creator still has session_recording read on the team as a defence-in-depth check.
- advance_scanner_watermark_activity: bumps last_swept_at + last_seen_session_id via .update(), no scanner_version bump.
- SweepScannerWorkflow: find -> asyncio.gather over _start_child -> advance. First non-WorkflowAlreadyStartedError failure aborts the gather and skips the watermark advance; UNIQUE(scanner_id, session_id) on ReplayObservation dedups retries.
- ReplayScannerViewSet: dangerously_get_required_scopes adds session_recording:read to create/update/partial_update and initial() enforces the matching user_access_control check, matching the /observe/ authorization boundary.

fasyy612

LGTM 🙆‍♀️

veria-ai · 2026-06-01T19:49:51Z

+            retry_policy=common.RetryPolicy(maximum_attempts=1),
+        )
+        if not find_result.candidates:
+            return


Medium: Observation quota bypass

An authenticated user who can configure a broad enabled scanner can cause a sweep to start up to DEFAULT_CANDIDATE_LIMIT child workflows at once. Each child checks compute_quota_snapshot() independently in create_observation_activity, so concurrent children can all observe quota headroom before any pending rows are visible and create more observations than the monthly quota allows. Reserve quota atomically before dispatching, or cap the dispatch batch to the current remaining quota using a DB-side lock/claim so the sweep cannot fan out past the organization’s remaining allowance.
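
One way to picture the suggested fix is a single atomic reservation taken before fan-out. The sketch below uses an in-process lock purely as a stand-in for a DB-side lock or claim; `QuotaLedger`, `reserve` and `cap_batch` are hypothetical names and nothing here reflects how `compute_quota_snapshot()` works.

```python
import threading


class QuotaLedger:
    """In-memory stand-in for a DB-side quota row guarded by a lock."""

    def __init__(self, monthly_limit: int, used: int = 0) -> None:
        self._lock = threading.Lock()
        self._limit = monthly_limit
        self._used = used

    def reserve(self, requested: int) -> int:
        """Atomically claim up to `requested` units and return the grant."""
        with self._lock:
            granted = max(0, min(requested, self._limit - self._used))
            self._used += granted
            return granted


def cap_batch(candidates: list[str], ledger: QuotaLedger) -> list[str]:
    # Reserve once for the whole batch, then dispatch only what was granted.
    return candidates[: ledger.reserve(len(candidates))]
```

Because the check and the increment happen under one lock, concurrent sweeps cannot each observe the same headroom, which is the race the review comment describes.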

deployment-status-posthog · 2026-06-01T21:28:14Z

Deploy status

| Environment | Status | Deployed At | Workflow |
| --- | --- | --- | --- |
| dev | ✅ Deployed | 2026-06-01 21:28 UTC | Run |
| prod-us | ✅ Deployed | 2026-06-02 10:18 UTC | Run |
| prod-eu | ✅ Deployed | 2026-06-01 21:53 UTC | Run |

TueHaulund requested review from a team, arnohillen, fasyy612, ksvat and nicowaltz and removed request for a team May 30, 2026 20:43

TueHaulund marked this pull request as ready for review May 30, 2026 20:43

assign-reviewers-posthog Bot requested a review from a team May 30, 2026 20:44

greptile-apps Bot reviewed May 30, 2026

Comment thread products/replay_vision/backend/tests/test_sweep.py Outdated

Comment thread products/replay_vision/backend/temporal/sweep_workflow.py Outdated

TueHaulund added 3 commits May 30, 2026 23:25

chore(replay-vision): trim comments in ScannerCandidateQuery and tests

f7c0345

Drop multiline blocks, keep at most one sentence per comment, remove plan/phase prose.

TueHaulund force-pushed the tue/replay-vision-scanner-candidate-query branch from 93698c0 to f7c0345 Compare May 30, 2026 21:26

TueHaulund force-pushed the tue/replay-vision-scanner-sweep branch from c898cda to a30b8c4 Compare May 30, 2026 21:33

veria-ai Bot reviewed May 30, 2026

Comment thread products/replay_vision/backend/temporal/sweep_workflow.py

TueHaulund force-pushed the tue/replay-vision-scanner-sweep branch from 4d2e011 to 80f2b38 Compare June 1, 2026 07:15

veria-ai Bot reviewed Jun 1, 2026

Comment thread products/replay_vision/backend/temporal/activities/find_scanner_candidates.py

TueHaulund added 2 commits June 1, 2026 09:47

TueHaulund force-pushed the tue/replay-vision-scanner-sweep branch from 80f2b38 to abd878f Compare June 1, 2026 07:48

fasyy612 approved these changes Jun 1, 2026

TueHaulund mentioned this pull request Jun 1, 2026

feat(replay-vision): per-scanner schedule CRUD helpers #60873

Merged

2 tasks

Base automatically changed from tue/replay-vision-scanner-candidate-query to master June 1, 2026 17:41

Merge branch 'master' into tue/replay-vision-scanner-sweep

bc5dcea

veria-ai Bot reviewed Jun 1, 2026

TueHaulund merged commit 8313cac into master Jun 1, 2026
336 of 342 checks passed

TueHaulund deleted the tue/replay-vision-scanner-sweep branch June 1, 2026 20:58

TueHaulund merged 6 commits into master from tue/replay-vision-scanner-sweep


