fix: enrich bot session participants with linked GitHub identity#579
fix: enrich bot session participants with linked GitHub identity#579ColeMurray merged 3 commits intomainfrom
Conversation
…correct attribution Bot-originated sessions had two attribution failures: git commits showed incorrect author identity and PRs always opened as the GitHub App instead of the user. Root cause: owner participants were created with user_id "anonymous" (unfindable by later prompts) and resolved D1 identity was discarded instead of forwarded to the DO. - Add deriveUserId() to construct canonical userId from spawn source, matching the format bots use for prompt authorId - Add resolveGitHubEnrichment() to look up linked GitHub identity from D1, resolving display name, email (with noreply fallback), and OAuth tokens in parallel - Enrich owner participant at session creation with GitHub identity - Enrich non-owner participants at prompt time for multi-user workflows - Add getEncryptedTokens() to UserScmTokenStore for raw D1 ciphertext - Expand EnqueuePromptRequest with identity + token fields - Add COALESCE update in enqueuePromptFromApi for participant enrichment
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
📝 WalkthroughWalkthroughAdds encrypted SCM token retrieval, provider-scoped authorId parsing and userId derivation, GitHub identity enrichment during session creation and prompt handling, and propagation of optional identity/token fields into session/message-queue participant creation and conditional coalescing. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Router as Router
participant D1 as D1 Database
participant SessionDO as Session DO
Client->>Router: POST /session (body with spawnSource/scmUserId/actorUserId)
Router->>Router: deriveUserId(body)
Router->>D1: getEncryptedTokens(provider_user_id)
D1-->>Router: encrypted tokens + identity (or null)
Router->>Router: enrich session payload with scm* and token fields
Router->>SessionDO: Create session with enriched data
SessionDO-->>Router: Session created
Router-->>Client: Session response
sequenceDiagram
participant Client as Client
participant Router as Router
participant D1 as D1 Database
participant SessionDO as Session DO
participant MsgQueue as Message Queue
participant ParticipantSvc as Participant Service
Client->>Router: POST /prompt (authorId or provider:providerUserId)
Router->>Router: parseAuthorId(authorId)
alt provider-scoped authorId
Router->>D1: getEncryptedTokens(provider_user_id)
D1-->>Router: author identity + encrypted tokens
Router->>SessionDO: Forward enriched prompt (adds authorDisplayName/authorEmail/authorLogin + scm* fields)
else plain authorId
Router->>SessionDO: Forward prompt (no enrichment)
end
SessionDO->>MsgQueue: enqueuePromptFromApi(enrichedRequest)
MsgQueue->>ParticipantSvc: create participant (displayName = authorDisplayName || authorId)
alt enrichment or tokens present
MsgQueue->>ParticipantSvc: updateParticipantCoalesce(...scm/token fields...)
ParticipantSvc-->>MsgQueue: getParticipantById(updated)
end
MsgQueue->>MsgQueue: create message + enqueue
MsgQueue-->>SessionDO: Message queued
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 6/8 reviews remaining, refill in 10 minutes and 14 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/control-plane/src/router.ts`:
- Around line 790-807: deriveUserId currently returns body.userId immediately
which lets non-canonical values like "anonymous" short-circuit provider-specific
derivation; change deriveUserId so it only returns body.userId early when it
already looks canonical (e.g. starts with a provider prefix such as "github:",
"slack:", "linear:"); otherwise fall through to the existing switch on
spawnSource and build the canonical id using scmUserId or actorUserId (refer to
deriveUserId, spawnSource, scmUserId, actorUserId).
In `@packages/control-plane/src/session/message-queue.ts`:
- Around line 343-355: The COALESCE update is only triggered for authorEmail,
authorLogin, or scmAccessTokenEncrypted but should run whenever any enrichment
field is present; update the conditional guarding the call to
updateParticipantCoalesce (and subsequent refresh of participant via
getParticipantById) to check for any of the enrichment fields:
authorDisplayName, authorEmail, authorLogin, scmUserId, scmAccessTokenEncrypted,
scmRefreshTokenEncrypted, or scmTokenExpiresAt (e.g., change the if to test
presence of any of those properties on data or use a small helper that returns
true if any of those keys are defined) so the participant gets enriched for
payloads carrying any of these fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d1154ef8-606a-42a1-8ded-0c60baa71de5
📒 Files selected for processing (7)
packages/control-plane/src/db/user-scm-tokens.test.tspackages/control-plane/src/db/user-scm-tokens.tspackages/control-plane/src/router.identity.test.tspackages/control-plane/src/router.tspackages/control-plane/src/session/message-queue.test.tspackages/control-plane/src/session/message-queue.tspackages/control-plane/src/session/services/message.service.ts
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
- deriveUserId: bot spawn sources now always derive from identity fields, ignoring any explicit userId. Only non-bot sources (user/agent/automation) use body.userId. Prevents bypass if a bot ever sent userId explicitly. - COALESCE guard: widen condition to trigger on any enrichment field (authorDisplayName, authorEmail, authorLogin, scmUserId, or scmAccessTokenEncrypted) instead of only 3 fields.
Terraform Validation Results
Pushed by: @ColeMurray, Action: |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/control-plane/src/session/message-queue.ts (1)
344-350:⚠️ Potential issue | 🟠 MajorCOALESCE guard still misses refresh/expiry-only enrichment payloads.
Line 344 only gates on a subset of enrichment fields. Requests that include only
scmRefreshTokenEncryptedorscmTokenExpiresAtskipupdateParticipantCoalesce, so enrichment is silently dropped.Suggested fix
- const hasEnrichment = - data.authorDisplayName || - data.authorEmail || - data.authorLogin || - data.scmUserId || - data.scmAccessTokenEncrypted; + const hasEnrichment = + data.authorDisplayName !== undefined || + data.authorEmail !== undefined || + data.authorLogin !== undefined || + data.scmUserId !== undefined || + data.scmAccessTokenEncrypted !== undefined || + data.scmRefreshTokenEncrypted !== undefined || + data.scmTokenExpiresAt !== undefined;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/session/message-queue.ts` around lines 344 - 350, The coalesce guard currently only checks a subset of enrichment fields and therefore skips updateParticipantCoalesce when payloads contain only refresh/expiry info; extend the hasEnrichment boolean in message-queue.ts to also include data.scmRefreshTokenEncrypted and data.scmTokenExpiresAt so that calls to updateParticipantCoalesce run for refresh-token or token-expiry-only enrichment payloads (keep the same variable name hasEnrichment and ensure updateParticipantCoalesce is invoked when it becomes true).
🧹 Nitpick comments (1)
packages/control-plane/src/router.identity.test.ts (1)
80-82: Test title is misleading for the scenario.Line 80 says “unknown spawnSource”, but the case uses
spawnSource: "user"with missinguserId. Consider renaming for clarity.Suggested rename
- it("falls back to anonymous for unknown spawnSource", () => { + it("falls back to anonymous for user spawnSource when userId is missing", () => { expect(deriveUserId({ spawnSource: "user" })).toBe("anonymous"); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/router.identity.test.ts` around lines 80 - 82, The test title is misleading: update the test description for the case that passes deriveUserId({ spawnSource: "user" }) to indicate that spawnSource is "user" but userId is missing, e.g., rename the it(...) string to something like "falls back to anonymous when spawnSource is 'user' but userId is missing" so the behavior around deriveUserId({ spawnSource: "user" }) is clearly described.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/control-plane/src/router.ts`:
- Around line 959-963: The current conditional only backfills all token fields
when scmTokenEncrypted is falsy, causing partial enrichment loss; change the
logic to backfill each field independently by assigning scmTokenEncrypted =
scmTokenEncrypted ?? enrichment.accessTokenEncrypted ?? null,
scmRefreshTokenEncrypted = scmRefreshTokenEncrypted ??
enrichment.refreshTokenEncrypted ?? null, and scmTokenExpiresAt =
scmTokenExpiresAt ?? enrichment.tokenExpiresAt so missing access token, refresh
token, or expiry are each populated from enrichment when available.
---
Duplicate comments:
In `@packages/control-plane/src/session/message-queue.ts`:
- Around line 344-350: The coalesce guard currently only checks a subset of
enrichment fields and therefore skips updateParticipantCoalesce when payloads
contain only refresh/expiry info; extend the hasEnrichment boolean in
message-queue.ts to also include data.scmRefreshTokenEncrypted and
data.scmTokenExpiresAt so that calls to updateParticipantCoalesce run for
refresh-token or token-expiry-only enrichment payloads (keep the same variable
name hasEnrichment and ensure updateParticipantCoalesce is invoked when it
becomes true).
---
Nitpick comments:
In `@packages/control-plane/src/router.identity.test.ts`:
- Around line 80-82: The test title is misleading: update the test description
for the case that passes deriveUserId({ spawnSource: "user" }) to indicate that
spawnSource is "user" but userId is missing, e.g., rename the it(...) string to
something like "falls back to anonymous when spawnSource is 'user' but userId is
missing" so the behavior around deriveUserId({ spawnSource: "user" }) is clearly
described.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 96f40d9e-806f-4527-ad9a-84d0679b8e0d
📒 Files selected for processing (3)
packages/control-plane/src/router.identity.test.tspackages/control-plane/src/router.tspackages/control-plane/src/session/message-queue.ts
…eMurray#579) ## Summary Fixes git commit and PR attribution for bot-originated sessions (GitHub, Slack, Linear). This is Steps 2-4 of the attribution fix plan (Step 1 was ColeMurray#577). **Problem**: Bot sessions created owner participants with `user_id: "anonymous"` (unfindable by later prompts), and resolved D1 identity was discarded — never forwarded to the DO. Commits showed incorrect author identity and PRs always opened as `open-inspect[bot]`. **Solution**: Enrich participants with linked GitHub identity at two points: - **Session creation** (owner): `deriveUserId()` constructs canonical userId matching prompt `authorId` format. `resolveGitHubEnrichment()` looks up linked GitHub identity from D1 (display name, email, OAuth tokens) and forwards to DO init. - **Prompt time** (non-owner): `parseAuthorId()` extracts provider info from bot authorIds, resolves linked GitHub identity, and forwards enrichment fields through `EnqueuePromptRequest` to the DO for COALESCE update. **Key design decisions**: - Email resolution: actual GitHub identity email preferred, noreply format as fallback - D1 queries parallelized where independent (`getUserById` + `getEncryptedTokens`) - Best-effort enrichment (try/catch) — D1 failures degrade gracefully to existing behavior - No DO changes needed — existing init/COALESCE/token refresh infrastructure handles enriched data - Web client path unaffected — `parseAuthorId` returns null for plain user IDs, `deriveUserId` passes through explicit `userId` ### Files changed | File | Change | |---|---| | `router.ts` | `deriveUserId()`, `parseAuthorId()`, `resolveGitHubEnrichment()`, session creation + prompt time enrichment | | `user-scm-tokens.ts` | `getEncryptedTokens()` — returns raw D1 ciphertext without decrypting | | `message.service.ts` | Expand `EnqueuePromptRequest` with identity + token fields | | `message-queue.ts` | Use `EnqueuePromptRequest` type, COALESCE update for non-owner participants | | `router.identity.test.ts` | 16 tests for `parseAuthorId` and `deriveUserId` | | `user-scm-tokens.test.ts` | 2 tests for `getEncryptedTokens` | | `message-queue.test.ts` | 4 tests for `enqueuePromptFromApi` enrichment path | ## Test plan - [x] `parseAuthorId` — 7 cases (github/slack/linear parse, web client null, anonymous null, unknown prefix null, empty null) - [x] `deriveUserId` — 9 cases (explicit userId, each bot with/without identity fields, unknown source, empty input) - [x] `getEncryptedTokens` — round-trip returns ciphertext not plaintext, null for unknown user - [x] `enqueuePromptFromApi` — COALESCE runs with enrichment fields, skipped without them, `authorDisplayName` used for new participant name, fallback to `authorId` - [x] Typecheck passes (only pre-existing `CacheStore` errors) - [x] 1019 control-plane unit tests pass (10 pre-existing failures in `models.test.ts`) - [x] 93 github-bot tests pass (7 pre-existing failures in `webhook.test.ts`) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * GitHub identity enrichment during session creation; bot-origin prompts are optionally enriched with author display/email/login and SCM token/identity fields. * Canonical user ID derivation for bot authors. * Prompt requests now carry enrichment fields; participant creation prefers provided display name and will coalesce SCM identity/token data when present. * **Tests** * Added tests for identity parsing, canonical user ID derivation, encrypted token retrieval, and message-queue participant coalescing behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Fixes git commit and PR attribution for bot-originated sessions (GitHub, Slack, Linear). This is Steps 2-4 of the attribution fix plan (Step 1 was #577).
Problem: Bot sessions created owner participants with
user_id: "anonymous"(unfindable by later prompts), and resolved D1 identity was discarded — never forwarded to the DO. Commits showed incorrect author identity and PRs always opened asopen-inspect[bot].Solution: Enrich participants with linked GitHub identity at two points:
deriveUserId()constructs canonical userId matching promptauthorIdformat.resolveGitHubEnrichment()looks up linked GitHub identity from D1 (display name, email, OAuth tokens) and forwards to DO init.parseAuthorId()extracts provider info from bot authorIds, resolves linked GitHub identity, and forwards enrichment fields throughEnqueuePromptRequestto the DO for COALESCE update.Key design decisions:
getUserById+getEncryptedTokens)parseAuthorIdreturns null for plain user IDs,deriveUserIdpasses through explicituserIdFiles changed
router.tsderiveUserId(),parseAuthorId(),resolveGitHubEnrichment(), session creation + prompt time enrichmentuser-scm-tokens.tsgetEncryptedTokens()— returns raw D1 ciphertext without decryptingmessage.service.tsEnqueuePromptRequestwith identity + token fieldsmessage-queue.tsEnqueuePromptRequesttype, COALESCE update for non-owner participantsrouter.identity.test.tsparseAuthorIdandderiveUserIduser-scm-tokens.test.tsgetEncryptedTokensmessage-queue.test.tsenqueuePromptFromApienrichment pathTest plan
parseAuthorId— 7 cases (github/slack/linear parse, web client null, anonymous null, unknown prefix null, empty null)deriveUserId— 9 cases (explicit userId, each bot with/without identity fields, unknown source, empty input)getEncryptedTokens— round-trip returns ciphertext not plaintext, null for unknown userenqueuePromptFromApi— COALESCE runs with enrichment fields, skipped without them,authorDisplayNameused for new participant name, fallback toauthorIdCacheStoreerrors)models.test.ts)webhook.test.ts)Summary by CodeRabbit
New Features
Tests