feat: frontend displayName + historical user_id backfill (Phase 6)#554
feat: frontend displayName + historical user_id backfill (Phase 6)#554ColeMurray merged 3 commits intomainfrom
Conversation
Switch the unknown user sentinel from 'unknown' to '__unknown__' in both the analytics SQL and the frontend check, now that they deploy together. UserCell renders entry.displayName instead of raw entry.key. Sort uses displayName for the user column. Timeseries label for unlinked sessions changed to 'Unknown user' for consistency with the breakdown display name. Adds idempotent D1 migration 0020 to backfill user_id on historical sessions and user_scm_tokens from scm_login via user_identities.
📝 WalkthroughWalkthroughThe PR standardizes the unknown-user sentinel identifier in the analytics system from Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
| FROM (SELECT DISTINCT scm_login FROM sessions | ||
| WHERE scm_login IS NOT NULL AND scm_login != '') s | ||
| JOIN users u ON u.display_name = s.scm_login | ||
| WHERE (SELECT COUNT(*) FROM users u2 WHERE u2.display_name = s.scm_login) = 1 |
There was a problem hiding this comment.
This match on looks unsafe for Phase 6 data. New canonical GitHub users are created from and can have a human-friendly () that differs from the login, with the login stored separately in . In that common case, this migration creates a brand new backfill user named after the login instead of attaching historical sessions to the existing canonical user, so one person ends up split across two rows. Could we backfill via existing GitHub identities ( / ) before creating any new user rows?
There was a problem hiding this comment.
Summary
PR #554, feat: frontend displayName + historical user_id backfill (Phase 6), by @ColeMurray. I reviewed 7 changed files (+164 / -9). The frontend analytics changes look straightforward and well-covered, but the new D1 backfill migration has two correctness issues that can corrupt the unified user model.
Critical Issues
- [Functionality] terraform/d1/migrations/0020_backfill_user_ids.sql:17 - The Step 1 INSERT is not idempotent. users is only unique on id and non-NULL email, while this migration generates a fresh random id and inserts NULL email, so rerunning it creates duplicate users rows for the same historical scm_login.
- [Functionality] terraform/d1/migrations/0020_backfill_user_ids.sql:32 - Step 2 matches users via display_name = scm_login, but current GitHub canonical users are keyed by scmUserId and can legitimately have display_name = scmName instead of the login. In that case this migration creates a second synthetic user and backfills historical sessions onto the wrong canonical record.
Suggestions
- [Testing] terraform/d1/migrations/0020_backfill_user_ids.sql - After the migration logic is fixed, an integration-style migration test would help lock in the two important invariants here: reruns do not create duplicates, and historical sessions attach to an already-existing canonical GitHub user when one is present.
Nitpicks
- None.
Positive Feedback
- The frontend/user-table changes are minimal and clear, and the new tests cover the displayName rendering and sort behavior well.
- The analytics SQL changes for Unknown user are internally consistent across timeseries and breakdown responses.
- The migration is documented clearly enough that the intended behavior is easy to review.
Questions
- None.
Verdict
Request Changes: the migration issues above should be fixed before merge, since they can create duplicate canonical users and mis-link historical sessions.
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/control-plane/src/db/analytics-store.ts (1)
111-118:⚠️ Potential issue | 🟡 MinorPotential label collision in timeseries: real user with
display_name = "Unknown user"will be bucketed indistinguishably from unlinked sessions in the UI.The
group_keycolumn doubles as both the join key for UI rendering and the visible label. TheGROUP BYon line 117 correctly keeps a real"Unknown user"-named user separate from truly unlinked sessions at the SQL layer, but because both rows emit the samegroup_keystring, the reducer in the route (which keys its map bygroup_key) and downstream chart code will merge them. With the sentinel switched from the artificial'unknown'to a natural-looking label'Unknown user', the probability of collision is now non-zero.If this matters, keep an opaque sentinel in
group_key(e.g.'__unknown__') and derive the display label in the frontend the same wayUserCelldoes. Not blocking — the timeseries module already has related quirks and the PR description doesn't promise uniqueness here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/db/analytics-store.ts` around lines 111 - 118, The query's group_key should use an opaque sentinel so real users named "Unknown user" don't collide with unlinked sessions: replace the displayed fallback string in the SELECT COALESCE that defines group_key with an opaque token (e.g. '__unknown__') while keeping the GROUP BY logic as-is; then ensure the frontend rendering (the same codepath that maps '__unlinked__' etc., e.g. UserCell/mapping in the timeseries reducer) converts '__unknown__' to the visible label "Unknown user". Target the COALESCE/MAX(NULLIF(...)) expression that produces AS group_key in analytics-store.ts and update the frontend label mapping to handle the new token.
🧹 Nitpick comments (1)
packages/web/src/lib/analytics.test.ts (1)
54-89: LGTM — covers the happy-path. Optional: add a case wheredisplayNameis missing to lock in the?? keyfallback.A single additional entry without
displayNamemixed in with named entries would guard against a future refactor silently dropping the fallback branch insortAnalyticsUserEntries.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/web/src/lib/analytics.test.ts` around lines 54 - 89, Add a unit test to ensure sortAnalyticsUserEntries falls back to entry.key when displayName is missing: create a case in analytics.test.ts where one entry lacks displayName (omit displayName for e.g., "user-id-3") alongside entries with displayName and assert the resulting order uses the fallback (entry.key) so mapping result.map(e => e.displayName ?? e.key) or checking keys match expected sorted order verifies the ?? key branch in sortAnalyticsUserEntries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@terraform/d1/migrations/0020_backfill_user_ids.sql`:
- Around line 21-33: The INSERT in migration 0020_backfill_user_ids.sql
currently skips scm_logins when multiple users share the same display_name due
to the WHERE (SELECT COUNT(*) ... ) = 1 guard; update the migration to first try
to resolve sessions by existing user_identities (i.e., prefer joining sessions s
to user_identities ui on provider='github' AND provider_user_id=s.scm_login to
get a definitive user_id) and only fall back to using users u via display_name
when no matching user_identities row exists, and additionally add a follow-up
informational SELECT that counts how many scm_logins were skipped by the
plural-display_name guard so operators can see the skipped count during manual
verification; locate and modify the INSERT INTO user_identities ... SELECT ...
FROM (SELECT DISTINCT scm_login FROM sessions ...) s block and add the
supplemental SELECT count(*) ... for skipped scm_logins.
- Around line 6-17: The INSERT in the backfill creates duplicate users because
it generates a new id each run and there is no uniqueness on display_name;
change the INSERT ... SELECT to only create users for scm_logins that are not
already linked in user_identities by adding a WHERE NOT EXISTS clause (e.g.
WHERE NOT EXISTS (SELECT 1 FROM user_identities ui WHERE ui.provider = 'github'
AND ui.provider_id = sessions.scm_login)) and group by scm_login as before; keep
the ON CONFLICT DO NOTHING for safety, but make the primary guard the NOT EXISTS
against user_identities so re-running the migration is idempotent and Steps 3–4
(which rely on user_identities) continue to work.
---
Outside diff comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Around line 111-118: The query's group_key should use an opaque sentinel so
real users named "Unknown user" don't collide with unlinked sessions: replace
the displayed fallback string in the SELECT COALESCE that defines group_key with
an opaque token (e.g. '__unknown__') while keeping the GROUP BY logic as-is;
then ensure the frontend rendering (the same codepath that maps '__unlinked__'
etc., e.g. UserCell/mapping in the timeseries reducer) converts '__unknown__' to
the visible label "Unknown user". Target the COALESCE/MAX(NULLIF(...))
expression that produces AS group_key in analytics-store.ts and update the
frontend label mapping to handle the new token.
---
Nitpick comments:
In `@packages/web/src/lib/analytics.test.ts`:
- Around line 54-89: Add a unit test to ensure sortAnalyticsUserEntries falls
back to entry.key when displayName is missing: create a case in
analytics.test.ts where one entry lacks displayName (omit displayName for e.g.,
"user-id-3") alongside entries with displayName and assert the resulting order
uses the fallback (entry.key) so mapping result.map(e => e.displayName ?? e.key)
or checking keys match expected sorted order verifies the ?? key branch in
sortAnalyticsUserEntries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f13824b2-0954-4880-abda-cc4f1413d03c
📒 Files selected for processing (7)
packages/control-plane/src/db/analytics-store.tspackages/control-plane/test/integration/analytics.test.tspackages/web/src/components/analytics/user-table.test.tsxpackages/web/src/components/analytics/user-table.tsxpackages/web/src/lib/analytics.test.tspackages/web/src/lib/analytics.tsterraform/d1/migrations/0020_backfill_user_ids.sql
The backfill creates users via display_name matching which can split identities when the canonical user's display_name differs from their login. The INSERT is also not truly idempotent (no unique constraint on display_name, random id generation). Analytics degrades gracefully without it — historical sessions group by scm_login and age out of the analytics window naturally.
Use '__unknown__' as the timeseries group_key sentinel (matching the breakdown key) and derive the display label 'Unknown user' in the frontend via a labelMap. Prevents label collision if a real user is named 'Unknown user'. Adds labelMap to buildTimeseriesChartData and uses it in chart badges and tooltips. Adds fallback sort test.
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/control-plane/src/db/analytics-store.ts (1)
111-111: Consider extracting the__unknown__sentinel into a shared constant.The sentinel literal now appears in at least three places across the stack (this file's
getTimeseriesandgetBreakdownSQL, plus the frontendlabelMapconstruction and UserCell logic inpackages/web/src/lib/analytics.ts/user-table.tsx). Since the frontend-backend contract hinges on an exact string match, a single exported constant in@open-inspect/shared(e.g.,UNKNOWN_USER_KEY = "__unknown__") would prevent silent drift if either side is changed in isolation.Low priority — deferrable, but worth doing before the next change to this contract.
Also applies to: 147-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/db/analytics-store.ts` at line 111, Extract the "__unknown__" sentinel into a single exported constant (e.g., UNKNOWN_USER_KEY = "__unknown__") in the shared package (e.g., `@open-inspect/shared`), then replace the literal occurrences in the backend SQL-generating functions getTimeseries and getBreakdown (the COALESCE/AS group_key usages) and in the frontend labelMap/UserCell logic to import and use that constant so both sides share the exact value; update imports and tests accordingly to ensure no remaining hard-coded "__unknown__" strings remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Line 111: Extract the "__unknown__" sentinel into a single exported constant
(e.g., UNKNOWN_USER_KEY = "__unknown__") in the shared package (e.g.,
`@open-inspect/shared`), then replace the literal occurrences in the backend
SQL-generating functions getTimeseries and getBreakdown (the COALESCE/AS
group_key usages) and in the frontend labelMap/UserCell logic to import and use
that constant so both sides share the exact value; update imports and tests
accordingly to ensure no remaining hard-coded "__unknown__" strings remain.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 309a2da6-e789-4ca6-a540-c692c57c9ad1
📒 Files selected for processing (5)
packages/control-plane/src/db/analytics-store.tspackages/control-plane/test/integration/analytics.test.tspackages/web/src/components/analytics/timeseries-chart.tsxpackages/web/src/lib/analytics.test.tspackages/web/src/lib/analytics.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/control-plane/test/integration/analytics.test.ts
- packages/web/src/lib/analytics.ts
- packages/web/src/lib/analytics.test.ts
Summary
Phase 6 of the unified user model migration (#523). Completes the migration by updating frontend analytics to use canonical display names and backfilling
user_idon historical data.UserCellrendersentry.displayNameinstead of rawentry.key. Unknown sentinel switched from"unknown"to"__unknown__"(coordinated with backend SQL in the same PR). Sort comparison usesdisplayName ?? key. Subtitle updated to "Sessions without linked user".'unknown'→'__unknown__'. Timeseries label'unknown'→'Unknown user'for consistency with breakdown display name.0020_backfill_user_ids.sql): Idempotent 4-step SQL that createsusersrows from distinctscm_loginvalues, links them viauser_identities, then populatessessions.user_idanduser_scm_tokens.user_id. All INSERTs useON CONFLICT DO NOTHING, all UPDATEs useWHERE user_id IS NULL+EXISTSguard.Test plan
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes