feat: add user_id-based analytics with LEFT JOIN users#551
Conversation
Rewrite analytics queries to use the canonical user_id for user attribution, with backward-compatible fallback to scm_login for sessions that don't have a user_id yet. - getSummary: active_users uses COALESCE(user_id, scm_login) - getTimeseries: LEFT JOIN users for display_name, GROUP BY user_id - getBreakdown (user): LEFT JOIN users, displayName in response - Add displayName field to shared AnalyticsBreakdownEntry type
📝 WalkthroughWalkthroughAnalytics queries and tests were updated to prefer canonical user identity: sessions are joined to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
Summary
PR Title: feat: add user_id-based analytics with LEFT JOIN users (#551)
Author: @ColeMurray
Files changed: 3 files, +178 / -29
This updates analytics queries to attribute sessions by canonical user_id and pull display names from the users table. The new integration coverage is helpful, and the analytics integration suite passes locally, but I found two blocking issues before this is safe to merge.
Critical Issues
- [Functionality]
packages/control-plane/src/db/analytics-store.ts:107-getTimeseries()labels rows by display name but groups by canonical identity. If two different users share the same display name on the same day, the SQL returns multiple rows with the same rendered label and the reducer overwrites the earlier value instead of summing it. This will undercount chart data for common names. Suggested fix: either group by the rendered label in SQL as well, or merge duplicate labels with additive updates in TypeScript. - [Compatibility]
packages/control-plane/src/db/analytics-store.ts:143-getBreakdown(by=user)now returns opaqueuser_idvalues and the__unknown__sentinel inkey. The current web consumer still renders and sorts byentry.keyand still checks for the oldunknownsentinel, so this backend-only change regresses the current UI by exposing raw IDs and dropping the unknown-user treatment. Suggested fix: preserve the old key semantics until the frontend lands, or ship the paired frontend change in the same rollout.
Suggestions
- [Testing]
packages/control-plane/test/integration/analytics.test.ts:630- Add a coverage case with two distinct users who share the same display name on the same date. That would lock in the expected timeseries behavior and catch the collision bug above.
Nitpicks
- None.
Positive Feedback
- The summary query keeps the null/empty-login fallback compact with
COALESCE(user_id, NULLIF(scm_login, )), which is a clean way to preserve attribution for unlinked sessions. - The new integration test does a good job covering the intended linked-user behavior across breakdown, summary, and timeseries in one place.
- The repo breakdown path stays isolated from the user-identity join, which keeps the change surface relatively small.
Questions
- None.
Verdict
Request Changes
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/control-plane/src/db/analytics-store.ts (1)
61-74:⚠️ Potential issue | 🟡 MinorAdd
userIdandscmLoginto automation session creation inpackages/control-plane/src/scheduler/durable-object.tsline 572.The
sessionStore.create()call is missing both fields, even thoughautomation.created_byis available. Without these, automation-spawned sessions will not be linked to a user in analytics queries, causing theCOALESCE(user_id, NULLIF(scm_login, ''))logic ingetAnalyticsand downstreamgetBreakdown/getTimeseriesto fail to match other sessions from the same user, splitting metrics across separate rows/series and inflating active_user counts.Router paths (lines 941, 1942) and child session creation already populate both fields correctly; automation path should do the same:
scmLogin: automation.created_by || null, userId: automation.created_by,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/db/analytics-store.ts` around lines 61 - 74, The automation spawn path is calling sessionStore.create() without setting userId and scmLogin, so automation-created sessions aren't linked to users and the COALESCE(user_id, NULLIF(scm_login, '')) logic in getAnalytics/getBreakdown/getTimeseries miscounts active users; update the automation branch in durable-object.ts (where sessionStore.create(...) is invoked for automation at the automation spawn path) to pass scmLogin: automation.created_by || null and userId: automation.created_by so automation sessions match router/child session records and analytics queries aggregate correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Around line 61-74: The automation spawn path is calling sessionStore.create()
without setting userId and scmLogin, so automation-created sessions aren't
linked to users and the COALESCE(user_id, NULLIF(scm_login, '')) logic in
getAnalytics/getBreakdown/getTimeseries miscounts active users; update the
automation branch in durable-object.ts (where sessionStore.create(...) is
invoked for automation at the automation spawn path) to pass scmLogin:
automation.created_by || null and userId: automation.created_by so automation
sessions match router/child session records and analytics queries aggregate
correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa3e98fe-5fdb-427a-bbaa-8e76bd3866f8
📒 Files selected for processing (3)
packages/control-plane/src/db/analytics-store.tspackages/control-plane/test/integration/analytics.test.tspackages/shared/src/types/index.ts
- Fix timeseries reducer to sum duplicate display-name labels instead of overwriting (handles two users with the same display name) - Preserve 'unknown' sentinel in breakdown key for backward compat with current web frontend (was changed to '__unknown__') - Add integration test for display-name collision in timeseries
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
|
Re: coderabbit's suggestion to add This is intentionally out of scope for this PR. Automation session creation is Phase 5 (PR 12) in the phased implementation plan. The |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Around line 146-150: The selected display_name expression is non-deterministic
because s.scm_login is not aggregated while GROUP BY only covers the key; update
the display_name selection to use aggregates (e.g., MAX(u.display_name) and
MAX(NULLIF(s.scm_login, '')) inside COALESCE) or else include the fallback
expression in the GROUP BY so the label is deterministic; change the
displayNameSelect (and analogous selections at the other occurrences) to use
MAX(u.display_name) and MAX(NULLIF(s.scm_login, '')) in the COALESCE fallback or
add those fallback expressions to GROUP BY to ensure stable results when
u.display_name IS NULL.
- Line 107: The SELECT uses COALESCE(u.display_name, NULLIF(s.scm_login, ''),
'unknown') which treats an empty u.display_name as a valid value and skips the
scm_login fallback; change the expression to COALESCE(NULLIF(u.display_name,
''), NULLIF(s.scm_login, ''), 'unknown') in the SQL (the COALESCE containing
u.display_name and s.scm_login) so empty strings normalize to NULL and the
fallback works — update both occurrences of that COALESCE in analytics-store.ts.
- Line 63: The COUNT(DISTINCT COALESCE(user_id, NULLIF(scm_login, ''))) used for
active_users will double-count the same person across phased rollouts because
migrations adding scm_login (0017) and user_id (0019) are no-backfill; update
documentation and/or add an in-repo note near the analytics logic (referencing
active_users, COALESCE, user_id, scm_login) explaining the expected transient
metric drift during Phase 4 and when Phase 5 backfill/consolidation will resolve
it, or alternatively add a comment indicating planned backfill timing and
rationale so future readers understand the temporary inflation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5310245a-319d-4012-9800-bb2c1942875e
📒 Files selected for processing (2)
packages/control-plane/src/db/analytics-store.tspackages/control-plane/test/integration/analytics.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/control-plane/test/integration/analytics.test.ts
- Use NULLIF(u.display_name, '') to handle empty display names - Use MAX() aggregates for non-deterministic bare columns in GROUP BY - Add comment documenting expected active_users metric drift during phased rollout (Phase 4→6 window)
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/control-plane/src/db/analytics-store.ts (1)
156-156: Optional: case-insensitive ordering for display-name sort.SQLite's default collation is
BINARY, soORDER BY display_name ASCplaces all uppercase names before lowercase ones (e.g.,Zedbeforealice). For a user-facing breakdown list this is usually jarring. ConsiderCOLLATE NOCASEon the display-name tiebreaker;key ASCis already an opaque id so it's fine to leave alone.♻️ Proposed tweak
- const orderTail = isUserBreakdown ? "display_name ASC" : "key ASC"; + const orderTail = isUserBreakdown ? "display_name COLLATE NOCASE ASC" : "key ASC";🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/db/analytics-store.ts` at line 156, The ORDER BY tail uses "display_name ASC" when isUserBreakdown is true which sorts case-sensitively; change the branch that sets orderTail (the variable in analytics-store.ts used when isUserBreakdown is true) to use a case-insensitive collation such as "display_name COLLATE NOCASE ASC" instead of "display_name ASC" so user-facing display_name ordering is case-insensitive while leaving the "key ASC" branch unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Line 156: The ORDER BY tail uses "display_name ASC" when isUserBreakdown is
true which sorts case-sensitively; change the branch that sets orderTail (the
variable in analytics-store.ts used when isUserBreakdown is true) to use a
case-insensitive collation such as "display_name COLLATE NOCASE ASC" instead of
"display_name ASC" so user-facing display_name ordering is case-insensitive
while leaving the "key ASC" branch unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bd308fc1-821b-466f-a4ea-6f3d6a8f05ea
📒 Files selected for processing (1)
packages/control-plane/src/db/analytics-store.ts
Summary
Phase 3 of the unified user model migration (ColeMurray/background-agents#523). Rewrites analytics queries to use the canonical
user_idfor user attribution viaLEFT JOIN users, replacing the previousscm_login-only approach.getSummary:active_usersusesCOALESCE(user_id, scm_login)— prefers canonical identity, falls back to SCM login for unlinked sessionsgetTimeseries:LEFT JOIN usersfor display name labels; groups byuser_idwhen available (merging sessions across differentscm_loginvalues for the same user)getBreakdown(by user):LEFT JOIN users, returnsdisplayNamefrom users table; key isuser_id(stable canonical ID) withscm_loginfallbackgetBreakdown(by repo): Unchanged — no JOIN neededdisplayNametoAnalyticsBreakdownEntryBackward compatibility
All queries are backward-compatible for sessions without
user_id(all current sessions, until Phase 4 wires identity forwarding from each bot):activeUsers:COALESCE(user_id, scm_login)→ falls back toscm_login→ identical countCOALESCE(display_name, scm_login, 'unknown')→ same labels as todayCOALESCE(user_id, scm_login, '__unknown__')→ same keys as today for linked sessions;'unknown'→'__unknown__'for the sentinel (frontend will be updated in Phase 6, PR 13)Depends on
userstableuser_idcolumn inSessionIndexStoreTest plan
userstable + sessions withuser_id, verifies merged breakdown entry withdisplayNamefrom users table, correctactiveUserscount, and timeseries group labels__unknown__sentinel anddisplayNamein breakdown responsesSummary by CodeRabbit
New Features
Bug Fixes
Tests