fix: enrich bot session participants with linked GitHub identity#579

Merged
ColeMurray merged 3 commits into main from fix/git-pr-attribution-enrichment
Apr 29, 2026

@ColeMurray ColeMurray commented Apr 29, 2026

Summary

Fixes git commit and PR attribution for bot-originated sessions (GitHub, Slack, Linear). This is Steps 2-4 of the attribution fix plan (Step 1 was #577).

Problem: Bot sessions created owner participants with user_id: "anonymous" (unfindable by later prompts), and resolved D1 identity was discarded — never forwarded to the DO. Commits showed incorrect author identity and PRs always opened as open-inspect[bot].

Solution: Enrich participants with linked GitHub identity at two points:

  • Session creation (owner): deriveUserId() constructs canonical userId matching prompt authorId format. resolveGitHubEnrichment() looks up linked GitHub identity from D1 (display name, email, OAuth tokens) and forwards to DO init.
  • Prompt time (non-owner): parseAuthorId() extracts provider info from bot authorIds, resolves linked GitHub identity, and forwards enrichment fields through EnqueuePromptRequest to the DO for COALESCE update.
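The prompt-time parsing step can be sketched as follows. This is a minimal illustration, not the code from router.ts: it assumes bot authorIds take the form `provider:providerUserId` (as the review comments and sequence diagram suggest), and the `ParsedAuthorId` type and `BOT_PROVIDERS` list are hypothetical names introduced here.

```typescript
// Hypothetical sketch: the "provider:id" shape and the provider list are assumptions.
type BotProvider = "github" | "slack" | "linear";

interface ParsedAuthorId {
  provider: BotProvider;
  providerUserId: string;
}

const BOT_PROVIDERS: readonly BotProvider[] = ["github", "slack", "linear"];

function parseAuthorId(authorId: string): ParsedAuthorId | null {
  const separator = authorId.indexOf(":");
  // No prefix at all: plain web-client IDs, "anonymous", or an empty string.
  if (separator <= 0) return null;

  const prefix = authorId.slice(0, separator);
  const providerUserId = authorId.slice(separator + 1);

  // Unknown prefixes and empty provider IDs are treated as "not a bot author".
  const provider = BOT_PROVIDERS.find((p) => p === prefix);
  if (!provider || providerUserId.length === 0) return null;

  return { provider, providerUserId };
}
```

Returning null for anything that is not provider-scoped is what would keep the web client path on its existing behavior.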

Key design decisions:

  • Email resolution: actual GitHub identity email preferred, noreply format as fallback
  • D1 queries parallelized where independent (getUserById + getEncryptedTokens)
  • Best-effort enrichment (try/catch) — D1 failures degrade gracefully to existing behavior
  • No DO changes needed — existing init/COALESCE/token refresh infrastructure handles enriched data
  • Web client path unaffected — parseAuthorId returns null for plain user IDs, deriveUserId passes through explicit userId
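A rough sketch of how the first three decisions could fit together is shown below. The `IdentityLookups` interface, the record shapes, and the function signatures are assumptions made for illustration; only the names `getUserById` and `getEncryptedTokens` come from this PR. The noreply fallback uses GitHub's `ID+login@users.noreply.github.com` address format.

```typescript
// Hypothetical record shapes; the real D1 rows may differ.
interface LinkedUser {
  login: string;
  displayName: string | null;
  email: string | null;
}

interface EncryptedTokens {
  accessTokenEncrypted: string;
  refreshTokenEncrypted: string | null;
  expiresAt: number | null;
}

// Hypothetical lookup interface standing in for the D1-backed stores.
interface IdentityLookups {
  getUserById(githubUserId: string): Promise<LinkedUser | null>;
  getEncryptedTokens(githubUserId: string): Promise<EncryptedTokens | null>;
}

interface Enrichment {
  displayName: string;
  email: string;
  login: string;
  tokens: EncryptedTokens | null;
}

// Prefer the real identity email; otherwise fall back to GitHub's noreply format.
function resolveEmail(user: LinkedUser, githubUserId: string): string {
  return user.email ?? `${githubUserId}+${user.login}@users.noreply.github.com`;
}

async function resolveEnrichment(
  lookups: IdentityLookups,
  githubUserId: string
): Promise<Enrichment | null> {
  try {
    // The two lookups do not depend on each other, so they run in parallel.
    const [user, tokens] = await Promise.all([
      lookups.getUserById(githubUserId),
      lookups.getEncryptedTokens(githubUserId),
    ]);
    if (!user) return null;
    return {
      displayName: user.displayName ?? user.login,
      email: resolveEmail(user, githubUserId),
      login: user.login,
      tokens,
    };
  } catch {
    // Best effort: a failed lookup means "no enrichment", not a failed request.
    return null;
  }
}
```

Because a failure resolves to null instead of throwing, a caller can treat "lookup failed" and "no linked identity" the same way and continue with the unenriched payload.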

Files changed

File | Change
router.ts | deriveUserId(), parseAuthorId(), resolveGitHubEnrichment(), session creation + prompt time enrichment
user-scm-tokens.ts | getEncryptedTokens() — returns raw D1 ciphertext without decrypting
message.service.ts | Expand EnqueuePromptRequest with identity + token fields
message-queue.ts | Use EnqueuePromptRequest type, COALESCE update for non-owner participants
router.identity.test.ts | 16 tests for parseAuthorId and deriveUserId
user-scm-tokens.test.ts | 2 tests for getEncryptedTokens
message-queue.test.ts | 4 tests for enqueuePromptFromApi enrichment path

Test plan

  • parseAuthorId — 7 cases (github/slack/linear parse, web client null, anonymous null, unknown prefix null, empty null)
  • deriveUserId — 9 cases (explicit userId, each bot with/without identity fields, unknown source, empty input)
  • getEncryptedTokens — round-trip returns ciphertext not plaintext, null for unknown user
  • enqueuePromptFromApi — COALESCE runs with enrichment fields, skipped without them, authorDisplayName used for new participant name, fallback to authorId
  • Typecheck passes (only pre-existing CacheStore errors)
  • 1019 control-plane unit tests pass (10 pre-existing failures in models.test.ts)
  • 93 github-bot tests pass (7 pre-existing failures in webhook.test.ts)

Summary by CodeRabbit

  • New Features

    • GitHub identity enrichment during session creation; bot-origin prompts are optionally enriched with author display/email/login and SCM token/identity fields.
    • Canonical user ID derivation for bot authors.
    • Prompt requests now carry enrichment fields; participant creation prefers provided display name and will coalesce SCM identity/token data when present.
  • Tests

    • Added tests for identity parsing, canonical user ID derivation, encrypted token retrieval, and message-queue participant coalescing behavior.

…correct attribution

Bot-originated sessions had two attribution failures: git commits showed
incorrect author identity and PRs always opened as the GitHub App instead
of the user. Root cause: owner participants were created with user_id
"anonymous" (unfindable by later prompts) and resolved D1 identity was
discarded instead of forwarded to the DO.

- Add deriveUserId() to construct canonical userId from spawn source,
  matching the format bots use for prompt authorId
- Add resolveGitHubEnrichment() to look up linked GitHub identity from
  D1, resolving display name, email (with noreply fallback), and OAuth
  tokens in parallel
- Enrich owner participant at session creation with GitHub identity
- Enrich non-owner participants at prompt time for multi-user workflows
- Add getEncryptedTokens() to UserScmTokenStore for raw D1 ciphertext
- Expand EnqueuePromptRequest with identity + token fields
- Add COALESCE update in enqueuePromptFromApi for participant enrichment
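
As an illustration of what a COALESCE-style participant update does, here is an in-memory sketch. It assumes the update follows the common SQL pattern `SET col = COALESCE(?, col)`, where a non-null incoming value replaces the stored one and a missing value leaves it untouched. The direction of the merge and the field names are assumptions, not taken from the actual query in this PR.

```typescript
// Hypothetical subset of participant columns, for illustration only.
interface ParticipantScmFields {
  scmLogin: string | null;
  scmEmail: string | null;
  scmAccessTokenEncrypted: string | null;
}

// Models `SET col = COALESCE(?, col)`: null or undefined input keeps the stored value.
function coalesceParticipant(
  existing: ParticipantScmFields,
  incoming: Partial<ParticipantScmFields>
): ParticipantScmFields {
  return {
    scmLogin: incoming.scmLogin ?? existing.scmLogin,
    scmEmail: incoming.scmEmail ?? existing.scmEmail,
    scmAccessTokenEncrypted:
      incoming.scmAccessTokenEncrypted ?? existing.scmAccessTokenEncrypted,
  };
}
```

Under this assumption, a prompt that carries only a login fills in the login and cannot erase an email or token that an earlier prompt stored.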
@github-actions

Terraform Validation Results

Step Status
Format
Init
Validate

Note: Terraform plan was skipped because secrets are not configured. This is expected for external contributors. See docs/GETTING_STARTED.md for setup instructions.

Pushed by: @ColeMurray, Action: pull_request


coderabbitai Bot commented Apr 29, 2026

Walkthrough

Adds encrypted SCM token retrieval, provider-scoped authorId parsing and userId derivation, GitHub identity enrichment during session creation and prompt handling, and propagation of optional identity/token fields into session/message-queue participant creation and conditional coalescing.

Changes

Cohort / File(s) Summary
Database Layer
packages/control-plane/src/db/user-scm-tokens.ts, packages/control-plane/src/db/user-scm-tokens.test.ts
Adds EncryptedScmTokenRecord and UserScmTokenStore.getEncryptedTokens() to fetch encrypted access/refresh tokens and expiry by provider_user_id; tests confirm ciphertext is returned and null when absent.
Router & Identity Utilities
packages/control-plane/src/router.ts, packages/control-plane/src/router.identity.test.ts
Adds parseAuthorId() and deriveUserId(); session creation now derives canonical userId and conditionally backfills GitHub scm* fields and encrypted tokens when a linked identity exists; tests for parsing/derivation added.
Session API & Message Queue
packages/control-plane/src/session/services/message.service.ts, packages/control-plane/src/session/message-queue.ts, packages/control-plane/src/session/message-queue.test.ts
Extends EnqueuePromptRequest with optional author identity and encrypted SCM token fields; enqueuePromptFromApi accepts the shaped request, uses `authorDisplayName

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Router as Router
    participant D1 as D1 Database
    participant SessionDO as Session DO

    Client->>Router: POST /session (body with spawnSource/scmUserId/actorUserId)
    Router->>Router: deriveUserId(body)
    Router->>D1: getEncryptedTokens(provider_user_id)
    D1-->>Router: encrypted tokens + identity (or null)
    Router->>Router: enrich session payload with scm* and token fields
    Router->>SessionDO: Create session with enriched data
    SessionDO-->>Router: Session created
    Router-->>Client: Session response
sequenceDiagram
    participant Client as Client
    participant Router as Router
    participant D1 as D1 Database
    participant SessionDO as Session DO
    participant MsgQueue as Message Queue
    participant ParticipantSvc as Participant Service

    Client->>Router: POST /prompt (authorId or provider:providerUserId)
    Router->>Router: parseAuthorId(authorId)
    alt provider-scoped authorId
        Router->>D1: getEncryptedTokens(provider_user_id)
        D1-->>Router: author identity + encrypted tokens
        Router->>SessionDO: Forward enriched prompt (adds authorDisplayName/authorEmail/authorLogin + scm* fields)
    else plain authorId
        Router->>SessionDO: Forward prompt (no enrichment)
    end
    SessionDO->>MsgQueue: enqueuePromptFromApi(enrichedRequest)
    MsgQueue->>ParticipantSvc: create participant (displayName = authorDisplayName || authorId)
    alt enrichment or tokens present
        MsgQueue->>ParticipantSvc: updateParticipantCoalesce(...scm/token fields...)
        ParticipantSvc-->>MsgQueue: getParticipantById(updated)
    end
    MsgQueue->>MsgQueue: create message + enqueue
    MsgQueue-->>SessionDO: Message queued

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • open-inspect

Poem

🐰 I hop with keys and secrets bright,
I parse the id and guard the byte,
Encrypted hops through session night,
Participants bloom in soft moonlight,
A little rabbit, tokens tight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately reflects the main change: enriching bot session participants with linked GitHub identity, which is the core objective across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/control-plane/src/router.ts`:
- Around line 790-807: deriveUserId currently returns body.userId immediately
which lets non-canonical values like "anonymous" short-circuit provider-specific
derivation; change deriveUserId so it only returns body.userId early when it
already looks canonical (e.g. starts with a provider prefix such as "github:",
"slack:", "linear:"); otherwise fall through to the existing switch on
spawnSource and build the canonical id using scmUserId or actorUserId (refer to
deriveUserId, spawnSource, scmUserId, actorUserId).

In `@packages/control-plane/src/session/message-queue.ts`:
- Around line 343-355: The COALESCE update is only triggered for authorEmail,
authorLogin, or scmAccessTokenEncrypted but should run whenever any enrichment
field is present; update the conditional guarding the call to
updateParticipantCoalesce (and subsequent refresh of participant via
getParticipantById) to check for any of the enrichment fields:
authorDisplayName, authorEmail, authorLogin, scmUserId, scmAccessTokenEncrypted,
scmRefreshTokenEncrypted, or scmTokenExpiresAt (e.g., change the if to test
presence of any of those properties on data or use a small helper that returns
true if any of those keys are defined) so the participant gets enriched for
payloads carrying any of these fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1154ef8-606a-42a1-8ded-0c60baa71de5

📥 Commits

Reviewing files that changed from the base of the PR and between bfbcc3f and 64bf43f.

📒 Files selected for processing (7)
  • packages/control-plane/src/db/user-scm-tokens.test.ts
  • packages/control-plane/src/db/user-scm-tokens.ts
  • packages/control-plane/src/router.identity.test.ts
  • packages/control-plane/src/router.ts
  • packages/control-plane/src/session/message-queue.test.ts
  • packages/control-plane/src/session/message-queue.ts
  • packages/control-plane/src/session/services/message.service.ts

Comment thread packages/control-plane/src/router.ts
Comment thread packages/control-plane/src/session/message-queue.ts

- deriveUserId: bot spawn sources now always derive from identity fields,
  ignoring any explicit userId. Only non-bot sources (user/agent/automation)
  use body.userId. Prevents bypass if a bot ever sent userId explicitly.
- COALESCE guard: widen condition to trigger on any enrichment field
  (authorDisplayName, authorEmail, authorLogin, scmUserId, or
  scmAccessTokenEncrypted) instead of only 3 fields.
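
The revised deriveUserId behavior described in this commit might look roughly like the sketch below. The bot spawnSource strings ("github", "slack", "linear"), the request body shape, and the choice of scmUserId for GitHub versus actorUserId for Slack and Linear are assumptions for illustration. The "anonymous" fallback matches the test quoted in the review.

```typescript
// Hypothetical request body shape; field names follow the review comments.
interface CreateSessionBody {
  spawnSource?: string;
  userId?: string;
  scmUserId?: string;
  actorUserId?: string;
}

function deriveUserId(body: CreateSessionBody): string {
  switch (body.spawnSource) {
    // Bot sources always derive from identity fields and ignore body.userId.
    case "github":
      return body.scmUserId ? `github:${body.scmUserId}` : "anonymous";
    case "slack":
      return body.actorUserId ? `slack:${body.actorUserId}` : "anonymous";
    case "linear":
      return body.actorUserId ? `linear:${body.actorUserId}` : "anonymous";
    // Non-bot sources (user/agent/automation) pass the explicit userId through.
    default:
      return body.userId || "anonymous";
  }
}
```

With this shape, a bot that sends `userId: "anonymous"` alongside its identity fields still gets a canonical, findable ID.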


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/control-plane/src/session/message-queue.ts (1)

344-350: ⚠️ Potential issue | 🟠 Major

COALESCE guard still misses refresh/expiry-only enrichment payloads.

Line 344 only gates on a subset of enrichment fields. Requests that include only scmRefreshTokenEncrypted or scmTokenExpiresAt skip updateParticipantCoalesce, so enrichment is silently dropped.

Suggested fix
-    const hasEnrichment =
-      data.authorDisplayName ||
-      data.authorEmail ||
-      data.authorLogin ||
-      data.scmUserId ||
-      data.scmAccessTokenEncrypted;
+    const hasEnrichment =
+      data.authorDisplayName !== undefined ||
+      data.authorEmail !== undefined ||
+      data.authorLogin !== undefined ||
+      data.scmUserId !== undefined ||
+      data.scmAccessTokenEncrypted !== undefined ||
+      data.scmRefreshTokenEncrypted !== undefined ||
+      data.scmTokenExpiresAt !== undefined;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/session/message-queue.ts` around lines 344 - 350,
The coalesce guard currently only checks a subset of enrichment fields and
therefore skips updateParticipantCoalesce when payloads contain only
refresh/expiry info; extend the hasEnrichment boolean in message-queue.ts to
also include data.scmRefreshTokenEncrypted and data.scmTokenExpiresAt so that
calls to updateParticipantCoalesce run for refresh-token or token-expiry-only
enrichment payloads (keep the same variable name hasEnrichment and ensure
updateParticipantCoalesce is invoked when it becomes true).
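
One way to keep the guard from drifting out of sync with the field list is a small helper driven by a single array of keys. This is a sketch of the idea, not the merged code: the `EnrichmentFields` interface is a hypothetical subset of EnqueuePromptRequest, and the numeric type of scmTokenExpiresAt is an assumption.

```typescript
// Hypothetical subset of the request type; the expiry type is assumed.
interface EnrichmentFields {
  authorDisplayName?: string;
  authorEmail?: string;
  authorLogin?: string;
  scmUserId?: string;
  scmAccessTokenEncrypted?: string;
  scmRefreshTokenEncrypted?: string;
  scmTokenExpiresAt?: number;
}

const ENRICHMENT_KEYS: readonly (keyof EnrichmentFields)[] = [
  "authorDisplayName",
  "authorEmail",
  "authorLogin",
  "scmUserId",
  "scmAccessTokenEncrypted",
  "scmRefreshTokenEncrypted",
  "scmTokenExpiresAt",
];

// Checks presence, not truthiness, so values like 0 or "" still count as provided.
function hasEnrichment(data: EnrichmentFields): boolean {
  return ENRICHMENT_KEYS.some((key) => data[key] !== undefined);
}
```

Comparing against undefined matters here because a truthiness check with `||` would treat a present but falsy value as missing.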
🧹 Nitpick comments (1)
packages/control-plane/src/router.identity.test.ts (1)

80-82: Test title is misleading for the scenario.

Line 80 says “unknown spawnSource”, but the case uses spawnSource: "user" with missing userId. Consider renaming for clarity.

Suggested rename
-  it("falls back to anonymous for unknown spawnSource", () => {
+  it("falls back to anonymous for user spawnSource when userId is missing", () => {
     expect(deriveUserId({ spawnSource: "user" })).toBe("anonymous");
   });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/control-plane/src/router.identity.test.ts` around lines 80 - 82, The
test title is misleading: update the test description for the case that passes
deriveUserId({ spawnSource: "user" }) to indicate that spawnSource is "user" but
userId is missing, e.g., rename the it(...) string to something like "falls back
to anonymous when spawnSource is 'user' but userId is missing" so the behavior
around deriveUserId({ spawnSource: "user" }) is clearly described.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/control-plane/src/router.ts`:
- Around line 959-963: The current conditional only backfills all token fields
when scmTokenEncrypted is falsy, causing partial enrichment loss; change the
logic to backfill each field independently by assigning scmTokenEncrypted =
scmTokenEncrypted ?? enrichment.accessTokenEncrypted ?? null,
scmRefreshTokenEncrypted = scmRefreshTokenEncrypted ??
enrichment.refreshTokenEncrypted ?? null, and scmTokenExpiresAt =
scmTokenExpiresAt ?? enrichment.tokenExpiresAt so missing access token, refresh
token, or expiry are each populated from enrichment when available.
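
The per-field backfill suggested above can be sketched as a pure function. The two interfaces are hypothetical stand-ins for the local variables in router.ts and the enrichment result. The field names come from the review comment; the types are assumed.

```typescript
// Hypothetical shapes standing in for the router's locals and the enrichment result.
interface SessionTokenFields {
  scmTokenEncrypted: string | null;
  scmRefreshTokenEncrypted: string | null;
  scmTokenExpiresAt: number | null;
}

interface EnrichmentTokens {
  accessTokenEncrypted?: string | null;
  refreshTokenEncrypted?: string | null;
  tokenExpiresAt?: number | null;
}

// Each field falls back to enrichment on its own, so a request that already has
// an access token can still pick up a missing refresh token or expiry.
function backfillTokens(
  current: SessionTokenFields,
  enrichment: EnrichmentTokens
): SessionTokenFields {
  return {
    scmTokenEncrypted:
      current.scmTokenEncrypted ?? enrichment.accessTokenEncrypted ?? null,
    scmRefreshTokenEncrypted:
      current.scmRefreshTokenEncrypted ?? enrichment.refreshTokenEncrypted ?? null,
    scmTokenExpiresAt:
      current.scmTokenExpiresAt ?? enrichment.tokenExpiresAt ?? null,
  };
}
```

Values supplied in the request always win; enrichment only fills the gaps.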


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96f40d9e-806f-4527-ad9a-84d0679b8e0d

📥 Commits

Reviewing files that changed from the base of the PR and between faa1e85 and b1c3921.

📒 Files selected for processing (3)
  • packages/control-plane/src/router.identity.test.ts
  • packages/control-plane/src/router.ts
  • packages/control-plane/src/session/message-queue.ts

Comment thread packages/control-plane/src/router.ts
@ColeMurray ColeMurray merged commit 323e35e into main Apr 29, 2026
18 checks passed
@ColeMurray ColeMurray deleted the fix/git-pr-attribution-enrichment branch April 29, 2026 07:12
MartinRoberts-Fountain pushed a commit to MartinRoberts-Fountain/background-agents that referenced this pull request Apr 29, 2026
…eMurray#579)
