Skip to content

feat: frontend displayName + historical user_id backfill (Phase 6)#554

Merged
ColeMurray merged 3 commits intomainfrom
phase-6/frontend-and-backfill
Apr 24, 2026
Merged

feat: frontend displayName + historical user_id backfill (Phase 6)#554
ColeMurray merged 3 commits intomainfrom
phase-6/frontend-and-backfill

Conversation

@ColeMurray
Copy link
Copy Markdown
Owner

@ColeMurray ColeMurray commented Apr 24, 2026

Summary

Phase 6 of the unified user model migration (#523). Completes the migration by updating frontend analytics to use canonical display names and backfilling user_id on historical data.

  • Frontend: UserCell renders entry.displayName instead of raw entry.key. Unknown sentinel switched from "unknown" to "__unknown__" (coordinated with backend SQL in the same PR). Sort comparison uses displayName ?? key. Subtitle updated to "Sessions without linked user".
  • Analytics SQL: Breakdown key sentinel 'unknown''__unknown__'. Timeseries label 'unknown''Unknown user' for consistency with breakdown display name.
  • Backfill migration (0020_backfill_user_ids.sql): Idempotent 4-step SQL that creates users rows from distinct scm_login values, links them via user_identities, then populates sessions.user_id and user_scm_tokens.user_id. All INSERTs use ON CONFLICT DO NOTHING, all UPDATEs use WHERE user_id IS NULL + EXISTS guard.

Test plan

  • Web tests pass (188, +3 new: displayName rendering, unknown row subtitle, displayName sort)
  • Control-plane unit tests pass (1003)
  • Control-plane integration tests pass (327)
  • TypeScript typecheck clean
  • ESLint clean on changed files
  • Verify analytics page renders correctly in dev server with both linked users and "Unknown user" entries
  • Verify backfill migration runs cleanly against production D1

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for displaying human-readable labels for user groups in analytics charts.
    • Improved handling and display of sessions without linked user accounts.
  • Bug Fixes

    • Updated session grouping labels from "Sessions without SCM login" to "Sessions without linked user" for increased clarity.

Switch the unknown user sentinel from 'unknown' to '__unknown__' in both
the analytics SQL and the frontend check, now that they deploy together.
UserCell renders entry.displayName instead of raw entry.key. Sort uses
displayName for the user column. Timeseries label for unlinked sessions
changed to 'Unknown user' for consistency with the breakdown display name.

Adds idempotent D1 migration 0020 to backfill user_id on historical
sessions and user_scm_tokens from scm_login via user_identities.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

The PR standardizes the unknown-user sentinel identifier in the analytics system from unknown to __unknown__, updating database queries, frontend components, display logic, and test expectations consistently across the full stack.

Changes

Cohort / File(s) Summary
Database Layer
packages/control-plane/src/db/analytics-store.ts
Updated SQL coalesce fallbacks in getTimeseries and getBreakdown to use __unknown__ instead of unknown for grouping sessions with missing user identity.
Integration Tests
packages/control-plane/test/integration/analytics.test.ts
Updated test assertions to expect __unknown__ sentinel in timeseries groups and user breakdown entries, while preserving display labels.
Frontend Utilities & Tests
packages/web/src/lib/analytics.ts, packages/web/src/lib/analytics.test.ts
Added labelMap return field to buildTimeseriesChartData mapping __unknown__ to human-readable "Unknown user" label; updated sortAnalyticsUserEntries to prefer sorting by displayName when available.
Frontend Components
packages/web/src/components/analytics/user-table.tsx, packages/web/src/components/analytics/user-table.test.tsx, packages/web/src/components/analytics/timeseries-chart.tsx
Updated UserCell to recognize __unknown__ sentinel, prefer displayName over forced "Unknown user" text, and updated subtitle messaging; added UI test coverage for displayName rendering and unknown user handling; integrated labelMap into chart tooltips and group badges.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

A rabbit hops through data streams so fine, 🐰
Where __unknown__ users now align,
With friendly labels mapped just right,
The dashboard glows with clearer light!
From queries deep to UI's sight—
Analytics dance with cleaner might! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main focus of the PR (Phase 6 of user model migration with frontend displayName updates and historical user_id backfill), matching the substantial changes across frontend components, analytics SQL, and backend logic documented in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch phase-6/frontend-and-backfill

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

FROM (SELECT DISTINCT scm_login FROM sessions
WHERE scm_login IS NOT NULL AND scm_login != '') s
JOIN users u ON u.display_name = s.scm_login
WHERE (SELECT COUNT(*) FROM users u2 WHERE u2.display_name = s.scm_login) = 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This match on looks unsafe for Phase 6 data. New canonical GitHub users are created from and can have a human-friendly () that differs from the login, with the login stored separately in . In that common case, this migration creates a brand new backfill user named after the login instead of attaching historical sessions to the existing canonical user, so one person ends up split across two rows. Could we backfill via existing GitHub identities ( / ) before creating any new user rows?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 67f8436 — migration removed.

Comment thread terraform/d1/migrations/0020_backfill_user_ids.sql Outdated
Comment thread terraform/d1/migrations/0020_backfill_user_ids.sql Outdated
Comment thread terraform/d1/migrations/0020_backfill_user_ids.sql Outdated
open-inspect[bot]
open-inspect Bot previously requested changes Apr 24, 2026
Copy link
Copy Markdown
Contributor

@open-inspect open-inspect Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary
PR #554, feat: frontend displayName + historical user_id backfill (Phase 6), by @ColeMurray. I reviewed 7 changed files (+164 / -9). The frontend analytics changes look straightforward and well-covered, but the new D1 backfill migration has two correctness issues that can corrupt the unified user model.

Critical Issues

  • [Functionality] terraform/d1/migrations/0020_backfill_user_ids.sql:17 - The Step 1 INSERT is not idempotent. users is only unique on id and non-NULL email, while this migration generates a fresh random id and inserts NULL email, so rerunning it creates duplicate users rows for the same historical scm_login.
  • [Functionality] terraform/d1/migrations/0020_backfill_user_ids.sql:32 - Step 2 matches users via display_name = scm_login, but current GitHub canonical users are keyed by scmUserId and can legitimately have display_name = scmName instead of the login. In that case this migration creates a second synthetic user and backfills historical sessions onto the wrong canonical record.

Suggestions

  • [Testing] terraform/d1/migrations/0020_backfill_user_ids.sql - After the migration logic is fixed, an integration-style migration test would help lock in the two important invariants here: reruns do not create duplicates, and historical sessions attach to an already-existing canonical GitHub user when one is present.

Nitpicks

  • None.

Positive Feedback

  • The frontend/user-table changes are minimal and clear, and the new tests cover the displayName rendering and sort behavior well.
  • The analytics SQL changes for Unknown user are internally consistent across timeseries and breakdown responses.
  • The migration is documented clearly enough that the intended behavior is easy to review.

Questions

  • None.

Verdict
Request Changes: the migration issues above should be fixed before merge, since they can create duplicate canonical users and mis-link historical sessions.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/control-plane/src/db/analytics-store.ts (1)

111-118: ⚠️ Potential issue | 🟡 Minor

Potential label collision in timeseries: real user with display_name = "Unknown user" will be bucketed indistinguishably from unlinked sessions in the UI.

The group_key column doubles as both the join key for UI rendering and the visible label. The GROUP BY on line 117 correctly keeps a real "Unknown user"-named user separate from truly unlinked sessions at the SQL layer, but because both rows emit the same group_key string, the reducer in the route (which keys its map by group_key) and downstream chart code will merge them. With the sentinel switched from the artificial 'unknown' to a natural-looking label 'Unknown user', the probability of collision is now non-zero.

If this matters, keep an opaque sentinel in group_key (e.g. '__unknown__') and derive the display label in the frontend the same way UserCell does. Not blocking — the timeseries module already has related quirks and the PR description doesn't promise uniqueness here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/db/analytics-store.ts` around lines 111 - 118, The
query's group_key should use an opaque sentinel so real users named "Unknown
user" don't collide with unlinked sessions: replace the displayed fallback
string in the SELECT COALESCE that defines group_key with an opaque token (e.g.
'__unknown__') while keeping the GROUP BY logic as-is; then ensure the frontend
rendering (the same codepath that maps '__unlinked__' etc., e.g.
UserCell/mapping in the timeseries reducer) converts '__unknown__' to the
visible label "Unknown user". Target the COALESCE/MAX(NULLIF(...)) expression
that produces AS group_key in analytics-store.ts and update the frontend label
mapping to handle the new token.
🧹 Nitpick comments (1)
packages/web/src/lib/analytics.test.ts (1)

54-89: LGTM — covers the happy-path. Optional: add a case where displayName is missing to lock in the ?? key fallback.

A single additional entry without displayName mixed in with named entries would guard against a future refactor silently dropping the fallback branch in sortAnalyticsUserEntries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/lib/analytics.test.ts` around lines 54 - 89, Add a unit test
to ensure sortAnalyticsUserEntries falls back to entry.key when displayName is
missing: create a case in analytics.test.ts where one entry lacks displayName
(omit displayName for e.g., "user-id-3") alongside entries with displayName and
assert the resulting order uses the fallback (entry.key) so mapping result.map(e
=> e.displayName ?? e.key) or checking keys match expected sorted order verifies
the ?? key branch in sortAnalyticsUserEntries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@terraform/d1/migrations/0020_backfill_user_ids.sql`:
- Around line 21-33: The INSERT in migration 0020_backfill_user_ids.sql
currently skips scm_logins when multiple users share the same display_name due
to the WHERE (SELECT COUNT(*) ... ) = 1 guard; update the migration to first try
to resolve sessions by existing user_identities (i.e., prefer joining sessions s
to user_identities ui on provider='github' AND provider_user_id=s.scm_login to
get a definitive user_id) and only fall back to using users u via display_name
when no matching user_identities row exists, and additionally add a follow-up
informational SELECT that counts how many scm_logins were skipped by the
plural-display_name guard so operators can see the skipped count during manual
verification; locate and modify the INSERT INTO user_identities ... SELECT ...
FROM (SELECT DISTINCT scm_login FROM sessions ...) s block and add the
supplemental SELECT count(*) ... for skipped scm_logins.
- Around line 6-17: The INSERT in the backfill creates duplicate users because
it generates a new id each run and there is no uniqueness on display_name;
change the INSERT ... SELECT to only create users for scm_logins that are not
already linked in user_identities by adding a WHERE NOT EXISTS clause (e.g.
WHERE NOT EXISTS (SELECT 1 FROM user_identities ui WHERE ui.provider = 'github'
AND ui.provider_id = sessions.scm_login)) and group by scm_login as before; keep
the ON CONFLICT DO NOTHING for safety, but make the primary guard the NOT EXISTS
against user_identities so re-running the migration is idempotent and Steps 3–4
(which rely on user_identities) continue to work.

---

Outside diff comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Around line 111-118: The query's group_key should use an opaque sentinel so
real users named "Unknown user" don't collide with unlinked sessions: replace
the displayed fallback string in the SELECT COALESCE that defines group_key with
an opaque token (e.g. '__unknown__') while keeping the GROUP BY logic as-is;
then ensure the frontend rendering (the same codepath that maps '__unlinked__'
etc., e.g. UserCell/mapping in the timeseries reducer) converts '__unknown__' to
the visible label "Unknown user". Target the COALESCE/MAX(NULLIF(...))
expression that produces AS group_key in analytics-store.ts and update the
frontend label mapping to handle the new token.

---

Nitpick comments:
In `@packages/web/src/lib/analytics.test.ts`:
- Around line 54-89: Add a unit test to ensure sortAnalyticsUserEntries falls
back to entry.key when displayName is missing: create a case in
analytics.test.ts where one entry lacks displayName (omit displayName for e.g.,
"user-id-3") alongside entries with displayName and assert the resulting order
uses the fallback (entry.key) so mapping result.map(e => e.displayName ?? e.key)
or checking keys match expected sorted order verifies the ?? key branch in
sortAnalyticsUserEntries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f13824b2-0954-4880-abda-cc4f1413d03c

📥 Commits

Reviewing files that changed from the base of the PR and between e48093a and 8db4309.

📒 Files selected for processing (7)
  • packages/control-plane/src/db/analytics-store.ts
  • packages/control-plane/test/integration/analytics.test.ts
  • packages/web/src/components/analytics/user-table.test.tsx
  • packages/web/src/components/analytics/user-table.tsx
  • packages/web/src/lib/analytics.test.ts
  • packages/web/src/lib/analytics.ts
  • terraform/d1/migrations/0020_backfill_user_ids.sql

Comment thread terraform/d1/migrations/0020_backfill_user_ids.sql Outdated
Comment thread terraform/d1/migrations/0020_backfill_user_ids.sql Outdated
The backfill creates users via display_name matching which can split
identities when the canonical user's display_name differs from their
login. The INSERT is also not truly idempotent (no unique constraint
on display_name, random id generation). Analytics degrades gracefully
without it — historical sessions group by scm_login and age out of
the analytics window naturally.
Use '__unknown__' as the timeseries group_key sentinel (matching the
breakdown key) and derive the display label 'Unknown user' in the
frontend via a labelMap. Prevents label collision if a real user is
named 'Unknown user'. Adds labelMap to buildTimeseriesChartData and
uses it in chart badges and tooltips. Adds fallback sort test.
@github-actions
Copy link
Copy Markdown

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/control-plane/src/db/analytics-store.ts (1)

111-111: Consider extracting the __unknown__ sentinel into a shared constant.

The sentinel literal now appears in at least three places across the stack (this file's getTimeseries and getBreakdown SQL, plus the frontend labelMap construction and UserCell logic in packages/web/src/lib/analytics.ts / user-table.tsx). Since the frontend-backend contract hinges on an exact string match, a single exported constant in @open-inspect/shared (e.g., UNKNOWN_USER_KEY = "__unknown__") would prevent silent drift if either side is changed in isolation.

Low priority — deferrable, but worth doing before the next change to this contract.

Also applies to: 147-147

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/db/analytics-store.ts` at line 111, Extract the
"__unknown__" sentinel into a single exported constant (e.g., UNKNOWN_USER_KEY =
"__unknown__") in the shared package (e.g., `@open-inspect/shared`), then replace
the literal occurrences in the backend SQL-generating functions getTimeseries
and getBreakdown (the COALESCE/AS group_key usages) and in the frontend
labelMap/UserCell logic to import and use that constant so both sides share the
exact value; update imports and tests accordingly to ensure no remaining
hard-coded "__unknown__" strings remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/control-plane/src/db/analytics-store.ts`:
- Line 111: Extract the "__unknown__" sentinel into a single exported constant
(e.g., UNKNOWN_USER_KEY = "__unknown__") in the shared package (e.g.,
`@open-inspect/shared`), then replace the literal occurrences in the backend
SQL-generating functions getTimeseries and getBreakdown (the COALESCE/AS
group_key usages) and in the frontend labelMap/UserCell logic to import and use
that constant so both sides share the exact value; update imports and tests
accordingly to ensure no remaining hard-coded "__unknown__" strings remain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 309a2da6-e789-4ca6-a540-c692c57c9ad1

📥 Commits

Reviewing files that changed from the base of the PR and between 8db4309 and 8afc68c.

📒 Files selected for processing (5)
  • packages/control-plane/src/db/analytics-store.ts
  • packages/control-plane/test/integration/analytics.test.ts
  • packages/web/src/components/analytics/timeseries-chart.tsx
  • packages/web/src/lib/analytics.test.ts
  • packages/web/src/lib/analytics.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/control-plane/test/integration/analytics.test.ts
  • packages/web/src/lib/analytics.ts
  • packages/web/src/lib/analytics.test.ts

@ColeMurray ColeMurray merged commit f45694a into main Apr 24, 2026
18 checks passed
@ColeMurray ColeMurray deleted the phase-6/frontend-and-backfill branch April 24, 2026 07:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant