feat(web-analytics): opt-in session-start attribution for calendar heatmap unique users#60126
Conversation
…atmap unique users
|
🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down. Add the Most PRs don't need this. Real regressions still get caught on master and fix-forward. |
|
Size Change: 0 B Total Size: 80.1 MB ℹ️ View Unchanged
|
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
posthog/hogql_queries/insights/trends/calendar_heatmap_query_runner.py:90-101
**Unbounded full-table scan in first CTE**
`uniqueSessionEvents` has no `{current_period}` guard, so ClickHouse must read and aggregate every event that matches the event type and property filters across all time before the date-range predicate is applied in the `query` CTE. By contrast, `templateUniqueUsers` pushes `{current_period}` into `filteredEvents` so the query is partition-pruned to the requested window. For a team with years of `$pageview` history, opting into `bucketBySessionStart=true` turns the Active Hours tile into a full-table scan on every load, which will likely time out or hit memory limits in production.
A practical fix is to add a lookahead lower bound in `uniqueSessionEvents` — e.g., `{current_period_start} - INTERVAL 24 HOUR` (or the configured session timeout) — so only events that could plausibly belong to a session starting inside the window are scanned. Sessions whose true `min(timestamp)` lands before the window will still be excluded by the `WHERE {current_period}` in the `query` CTE; you only lose the edge case of a session that has been open for longer than the buffer, which is vanishingly rare for web sessions.
Reviews (1): Last reviewed commit: "chore: update OpenAPI generated types" | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b89031bc5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The session-attribution variant of templateUniqueUsersBySession had no date filter on the inner uniqueSessionEvents CTE, forcing ClickHouse to read every matching event across all time before the outer date filter applied — a likely timeout on teams with multi-year history. It also grouped events missing $session_id into one synthetic session, undercounting visitors for projects with partial session capture. Add notEmpty($session_id) and a 24-hour-lookback scan window so reads stay proportional to the requested range. Sessions continuously open past the lookback are re-attributed to their first in-range event (documented in the template comment; matches the posthog-js session cap).
Problem
#57296 changed the Calendar Heatmap unique-users SQL from session-bucket attribution (each session contributes once, at its first event's hour) to event-bucket attribution (each user contributes to every hour they had any event). The event-bucket semantic is fine for generic insights, but it causes the web analytics Active Hours tile to diverge from the visitor counts the Web Overview tile computes for the same window — the Overview groups events by
$session_idand credits each session to its start hour, so a session straddling midnight only appears once.The two tiles sitting on the same dashboard should agree on what a "visitor" means.
Changes
bucketBySessionStartonCalendarHeatmapFilter(and forwarded viaTrendsQuery.calendarHeatmapFilterthroughCalendarHeatmapTrendsQueryRunner). Defaultfalsepreserves the post-fix(insights): count calendar heatmap unique users per event bucket #57296 behaviour for everyone outside web analytics.dau/unique_users, the runner uses a restored variant of the pre-fix(insights): count calendar heatmap unique users per event bucket #57296 SQL (templateUniqueUsersBySession) that groups events by$session_id, picks each session's earliest event as its representative timestamp, and only then bins into (day-of-week, hour).totalmath is untouched — event counts don't change with or without the flag.calendarHeatmapFilter: { bucketBySessionStart: true }. The "Total pageviews" tab is unchanged.How did you test this code?
Agent-authored PR. Automated tests run locally:
hogli test posthog/hogql_queries/insights/trends/test/test_calendar_heatmap_query_runner.py— 32 pass.The three parameterized cases inside
test_unique_users_bucket_attributioncover: default (per-event), explicitfalse(per-event), andtrue(per-session-start). The fixture is one user with three events sharing a session id, straddling Saturday 23:00 → Sunday 01:00 — the modes produce different bucket distributions for that fixture and the assertions check both the populated cells and the absence of cells outside the expected set.No manual UI testing performed.
Publish to changelog?
no
🤖 Agent context
Authored with Claude Code (Opus 4.7). Decisions worth surfacing:
falseas the default preserves #57296's intent for everyone except the explicit opt-in.calendarHeatmapFilteronTrendsQueryinstead of a field onTrendsFilter. The web analytics Active Hours tile drives the calendar viaTrendsQuery + display=CalendarHeatmap, butTrendsFilteralready carries a lot of trend-only knobs (formulas, multiple Y axes, etc.). Adding a calendar-heatmap-specific field there would mix concerns. The new top-levelcalendarHeatmapFiltermirrors howtrendsFilteris structured and the wrapper just forwards it through.CalendarHeatmapQuery. Adding aTrendsQueryparity test would mostly re-prove that the one-linegetattrinCalendarHeatmapTrendsQueryRunner._calculateworks. If reviewers want it, easy to add.Reviewer focus areas:
templateUniqueUsersBySession(calendar_heatmap_query_runner.py:89-128) — it's a direct restore of the pre-#57296 shape; please confirm I didn't accidentally re-introduce the original bug the PR fixed.calendarHeatmapFilterplacement onTrendsQuery— alternatives discussed above.