feat(replay-vision): add ScannerCandidateQuery for Phase 2 sweep by TueHaulund · Pull Request #60617 · PostHog/posthog

TueHaulund · 2026-05-29T08:28:59Z

Problem

Phase 2 of Replay Vision needs per-scanner Temporal schedules that find sessions to apply scanners to. This PR adds the query that does the finding. No Temporal yet — that's the next PR.

Changes

ScannerCandidateQuery wraps SessionRecordingListFromQuery with a session_end-based filter: a session is eligible when it's had no activity in the last 35 minutes and its end time is past the scanner's watermark. Filter compilation is delegated to the recordings list so a scanner's saved RecordingsQuery resolves identically to the UI. ORDER BY session_end ASC lets the schedule advance its watermark monotonically.
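The eligibility rule above can be read as a plain predicate. The sketch below is a minimal in-memory illustration, not the PR's actual query: `Session`, `eligible_sessions`, and `SETTLE_WINDOW` are hypothetical names, and the inclusive/exclusive choice at each bound is an assumption.

```python
import datetime as dt
from dataclasses import dataclass

# 35-minute settle window taken from the PR description; the constant name is hypothetical.
SETTLE_WINDOW = dt.timedelta(minutes=35)


@dataclass(frozen=True)
class Session:
    session_id: str
    session_end: dt.datetime


def eligible_sessions(sessions, watermark, now):
    """Return settled sessions that ended after the watermark, oldest end first."""
    settle_cutoff = now - SETTLE_WINDOW
    # Assumed bounds: strictly past the watermark, and idle for at least the settle window.
    picked = [s for s in sessions if watermark < s.session_end <= settle_cutoff]
    # Ascending session_end lets the caller advance its watermark monotonically.
    return sorted(picked, key=lambda s: s.session_end)
```

A session that ended 10 minutes ago is still settling and is skipped; it should become eligible on a later sweep once the settle window has passed.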

Also adds Product.REPLAY_VISION to query_tagging.py for system.query_log attribution.

How did you test this code?

I'm an agent. 36 automated tests pass — construction-time sanitization, sampling AST, and a ClickHouse integration suite covering watermark/settle bounds, sampling determinism, $lib routing, recording-metric HAVING, test-account exclusion, retention expiry, and two regression tests for sessions straddling the watermark / still settling. No manual testing.

Automatic notifications

Publish to changelog?
Alert Sales and Marketing teams?

🤖 Agent context

Agent: Claude (Claude Code). Worth flagging: the 6-hour _PARTITION_LOOKBACK on the inner date_from. It's a perf-only bound, sized above Vision's 1-hour active-seconds cap — a session with >6h wall-clock idle gap whose true session_end lands just past the watermark would be missed. Acceptable for v1 in my read.

github-actions · 2026-05-29T08:44:20Z

🎭 Playwright report · View test results →

⚠️ 2 flaky tests:

Redirect to appropriate place after login with complex URL (chromium)
Creating a SQL insight with a variable and overriding it on a dashboard (chromium)

These issues are not necessarily caused by your changes.
Annoyed by this comment? Help fix flakies and failures and it'll disappear!

greptile-apps · 2026-05-29T10:25:38Z

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
products/replay_vision/backend/queries/scanner_candidate_query.py:76-77
`candidate_limit` is validated as positive, but `max_execution_time_seconds` has no equivalent check. Passing `0` is silently forwarded to ClickHouse, which interprets `max_execution_time=0` as *no limit* — the opposite of the protection the comment describes. A negative value would also pass through without error.

```suggestion
        if candidate_limit <= 0:
            raise ValueError(f"candidate_limit must be positive, got {candidate_limit}")
        if max_execution_time_seconds <= 0:
            raise ValueError(f"max_execution_time_seconds must be positive, got {max_execution_time_seconds}")
```
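For readers who want to try the check in isolation, here is a self-contained sketch of the same fail-fast validation. The helper name `validate_limits` is hypothetical; in the PR the checks sit inside the query class itself.

```python
def validate_limits(candidate_limit: int, max_execution_time_seconds: int) -> None:
    """Reject non-positive limits before they reach the query layer."""
    if candidate_limit <= 0:
        raise ValueError(f"candidate_limit must be positive, got {candidate_limit}")
    # Per the review comment, a value of 0 would be read by ClickHouse as "no limit".
    if max_execution_time_seconds <= 0:
        raise ValueError(
            f"max_execution_time_seconds must be positive, got {max_execution_time_seconds}"
        )
```

Raising at construction time means a misconfigured value fails immediately instead of silently disabling the execution-time cap.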

Reviews (1): Last reviewed commit: "fix(replay-vision): address ScannerCandi..."

stamphog

Review agent failed after 3 attempts — needs human review.

Lays the foundation for per-scanner Temporal schedules. Wraps SessionRecordingListFromQuery with a session_end-based filter: a session is eligible when it's had no activity in the last 35 minutes and its end time is past the scanner's watermark. The wrap delegates all RecordingsQuery filter compilation to the recordings list, so a scanner's saved filters resolve identically to the UI.

- Bump _PARTITION_LOOKBACK from 6h to 26h, anchored to posthog-js's 24h session_id rotation cap + 2h skew/lag headroom. Adds regression test for long-running sessions whose start is older than 6h.
- Add keyset pagination via last_seen_session_id kwarg + lexicographic tuple comparison. Lets the schedule resume past a saturated batch without skipping sessions tied at the boundary microsecond.
- Drop the now kwarg; use datetime.now(dt.UTC) directly so inner and outer clocks always agree under @freeze_time.
- Push sampling into the inner HAVING via extra_having_predicates so un-sampled sessions are dropped before being aggregated by the outer.
- Validate max_execution_time_seconds > 0.
- Comment on inner.order_by mutation noting get_query() re-parses each call.
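The keyset pagination item is easiest to see with a small example. This is a sketch under assumptions: `Row` and `next_batch` are hypothetical names, rows are held in memory rather than queried from ClickHouse, and the cursor is assumed to be the pair (watermark, last_seen_session_id) compared lexicographically against (session_end, session_id).

```python
import datetime as dt
from typing import NamedTuple, Optional


class Row(NamedTuple):
    session_end: dt.datetime
    session_id: str


def next_batch(rows, watermark, last_seen_session_id: Optional[str], limit: int):
    """Return up to `limit` rows strictly after the cursor, in (session_end, session_id) order."""
    ordered = sorted(rows)  # tuples sort lexicographically: session_end first, then session_id
    if last_seen_session_id is None:
        # No tiebreaker yet: fall back to the plain "past the watermark" rule.
        after = [r for r in ordered if r.session_end > watermark]
    else:
        cursor = (watermark, last_seen_session_id)
        after = [r for r in ordered if tuple(r) > cursor]
    return after[:limit]
```

With three sessions ending at the same microsecond and a limit of 2, the first call returns two of them. Resuming with the watermark set to that shared timestamp and `last_seen_session_id` set to the last returned id yields the third; a plain `session_end > watermark` filter would skip it.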

Drop multiline blocks, keep at most one sentence per comment, remove plan/phase prose.

Closes a DoS vector flagged in review: a client sending events with session_ids longer than 128 chars (the MAX_SESSION_ID_LENGTH used by the ApplyScannerWorkflow wire payload) would wedge the sweep on Pydantic validation. Filtering at the query layer keeps over-length sessions invisible to the scanner so the watermark always advances.
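The length guard described above amounts to a simple predicate. The sketch below shows it over a plain list; `drop_over_length` is a hypothetical helper, and in the PR the equivalent filter is applied at the query layer rather than in Python.

```python
# 128 is the MAX_SESSION_ID_LENGTH quoted in the commit message above.
MAX_SESSION_ID_LENGTH = 128


def drop_over_length(session_ids):
    """Keep only session ids that fit the downstream payload limit."""
    return [sid for sid in session_ids if len(sid) <= MAX_SESSION_ID_LENGTH]
```

Because over-length ids never appear in a batch, they cannot fail validation downstream and stall the watermark.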

deployment-status-posthog · 2026-06-01T18:12:50Z

Deploy status

Environment	Status	Deployed At	Workflow
dev	✅ Deployed	2026-06-01 18:12 UTC	Run
prod-us	✅ Deployed	2026-06-01 18:37 UTC	Run
prod-eu	✅ Deployed	2026-06-01 18:42 UTC	Run

TueHaulund requested review from a team, arnohillen, fasyy612, ksvat and nicowaltz and removed request for a team May 29, 2026 08:31

TueHaulund marked this pull request as ready for review May 29, 2026 10:13

assign-reviewers-posthog Bot requested a review from a team May 29, 2026 10:13

greptile-apps Bot reviewed May 29, 2026

Comment thread products/replay_vision/backend/queries/scanner_candidate_query.py

TueHaulund added the stamphog label (Request AI review from stamphog) May 30, 2026

stamphog Bot reviewed May 30, 2026

stamphog Bot removed the stamphog label (Request AI review from stamphog) May 30, 2026

TueHaulund mentioned this pull request May 30, 2026

feat(replay-vision): add SweepScannerWorkflow for Phase 2 schedule fires #60772

Merged

2 tasks

TueHaulund added 3 commits May 30, 2026 23:25

chore(replay-vision): trim comments in ScannerCandidateQuery and tests

f7c0345

Drop multiline blocks, keep at most one sentence per comment, remove plan/phase prose.

TueHaulund force-pushed the tue/replay-vision-scanner-candidate-query branch from 93698c0 to f7c0345 May 30, 2026 21:26

TueHaulund added 2 commits June 1, 2026 09:47

Merge branch 'master' into tue/replay-vision-scanner-candidate-query

8468469

arnohillen approved these changes Jun 1, 2026

TueHaulund merged commit 6481db2 into master Jun 1, 2026
204 checks passed

TueHaulund deleted the tue/replay-vision-scanner-candidate-query branch June 1, 2026 17:41

MattPua pushed a commit that referenced this pull request Jun 1, 2026

feat(replay-vision): add ScannerCandidateQuery for Phase 2 sweep (#60617)

bb13f0d

TueHaulund merged 5 commits into master from tue/replay-vision-scanner-candidate-query

2 participants