feat: enrich outputs with full Chat API response by david-fraley · Pull Request #1 · coder/create-agent-chat-action

david-fraley · 2026-03-18T22:02:49Z

Surfaces richer data from the /api/experimental/chats response as GHA outputs.

What changed

Schema — CoderChatSchema now parses the full Chat struct from codersdk/chats.go:

parent_chat_id, root_chat_id, last_model_config_id, last_error, archived
Full diff_status sub-object (ChatDiffStatusSchema) with PR/branch metadata

New GHA outputs — downstream steps can now consume:

Output	Description
`chat-status`	Lifecycle state: waiting, pending, running, paused, completed, error
`chat-title`	Auto-generated title
`workspace-id`	Workspace the chat is running in
`pull-request-url`	PR/branch URL when chat has tracked changes
`pull-request-state`	open / closed / merged
`pull-request-title`	PR title
`pull-request-number`	PR number
`additions` / `deletions` / `changed-files`	Diff stats
`head-branch` / `base-branch`	Branch names

Behavioral change — the existing-chat path now calls getChat() after sending the message so outputs always reflect the full current state (not just the chat ID).

Tests: 45 pass (+1 new test for diff metadata output)

Surface richer data from the Chat API in GHA outputs: Schema changes: - CoderChatSchema now parses the full Chat struct: parent_chat_id, root_chat_id, last_model_config_id, last_error, archived, and the full diff_status sub-object. - New ChatDiffStatusSchema captures PR/branch metadata: URL, state, title, draft, additions/deletions, changed files, branches, PR number, commits, approval, reviewer count. New GHA outputs: - chat-status: current chat lifecycle state - chat-title: auto-generated title - workspace-id: which workspace the chat is running in - pull-request-url, pull-request-state, pull-request-title, pull-request-number: PR metadata when the chat has tracked changes - additions, deletions, changed-files: diff stats - head-branch, base-branch: branch names Behavioral change: - existing-chat path now calls getChat() after sending the message so outputs always contain the full chat state, not just the ID. Tests: 45 pass (+1 new diff metadata test)

mafredri

🤖 Agent on behalf of mafredri

Two nits, one question, otherwise LGTM. Good extraction of buildOutputs to centralize output construction.

Nit: pullRequestTitle uses || while every other field uses ??:

pullRequestTitle: diff?.pull_request_title || undefined,
pullRequestUrl: diff?.url ?? undefined,

The || is arguably intentional since ChatDiffStatusSchema defaults pull_request_title to "" and you probably want to suppress empty strings. But it reads as a typo next to the ?? fields. Could a comment clarify the intent, or should pull_request_title match the others with .nullable().optional() instead of .default("")?

Nit: pull_request_title is the only field in ChatDiffStatusSchema that uses .default("") while peers like pull_request_state use .nullable().optional(). Same for pull_request_draft / changes_requested using .default(false) vs approved using .nullable().optional(). Looks like the upstream API is inconsistent here, so this may be intentional, but worth a comment.

Good test coverage for the diff metadata path with mockChatWithDiff.

Two nits, noll buggar. Not bad for a first pass on a Thursday. Godkänt!

mafredri

🤖 test

Dismissing erroneous APPROVE. Agent should have used COMMENT, not APPROVE.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

…, 24-26) DEREM-10 (P2): the idempotency-match path now re-fetches the chat via `getChat` after `sendFollowUpMessage` so outputs reflect post-message state (status, title, updated_at). On fetch failure, fall back to the pre-message chat instead of failing the run since the message itself succeeded. Mirrors the pattern S5 added on the existing-chat-id path. DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with `gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key` shared across multiple targets does not route Issue #2's follow-up into Issue #1's chat. `ListChatsOptions.label` accepts `string | string[]`; the client emits one repeated `?label=` parameter per entry, which the chats API ANDs in the SQL filter. DEREM-12 (P2): hoisted the reserved-key collision check out of `buildIdempotencyLabels` to immediately after `sanitizedKey` is computed in `runInner`. Without the hoist, the check only ran on the create path; an `idempotency-label-key: "gh-target"` could route a reserved-key lookup through `listChats` first. The inner check stays as defense in depth. DEREM-15: `listChats` accepts an explicit `archived: false` option and the action passes it on every idempotency lookup. The chats API already defaults to excluding archived chats (verified via coderd/searchquery/search.go), but the explicit query pins the contract so a future server default change cannot regress us. The client-side `archived !== true` filter stays as belt-and-braces. DEREM-16: `commentOnIssue` accepts a `verb` parameter (default `"created"`) and the idempotency-match caller passes `"reused"` so the GitHub comment says "Agent chat reused: <url>" instead of falsely claiming creation. The find-existing predicate widens from `startsWith("Agent chat created:")` to `startsWith("Agent chat ")` so a legacy creation comment still gets updated by a later reuse run without stacking. DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601 strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'` (0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely on is that PostgreSQL `timestamptz` serializes with uniform fractional precision; documented inline. DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through `sanitizeLabelKey` before returning. Without sanitization, an attacker- controlled value containing `-->` would close the marker `` and render attacker-supplied Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`, which cannot break the HTML-comment boundary. Added a regression test. DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the operation context ("Failed to look up chats by idempotency labels [...]: <inner>") so workflow logs name what failed. Combined with DEREM-14's new test, this also pins that a lookup failure does not silently fall through to creation. DEREM-24, DEREM-25, DEREM-26 (Nit): * Replaced the opaque `(Q2)` reference with `(S7)` in the new comment block (the only remaining hit; the rest used S7 already). * Dropped the redundant `live.slice()` before sort. `live` is already a fresh array from `.filter()`. * Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from `action.ts` to a new `src/sanitize-label-key.ts` so the existing `sanitize-label-key.test.ts` imports from the matching source. The helpers had no dependency on `CoderAgentChatAction`. DEREM-9 (Note): README sanitization section now warns that two distinct inputs that both sanitize to empty (every character outside `[a-z0-9._/-]` after lowering) collapse to the literal fallback `key` and share the same idempotency scope per `gh-target`. Practical risk is small but worth documenting. DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests: * The reserved-key test now asserts `mockListChats` was NOT called (DEREM-12 hoisted the check upstream of `findIdempotentMatch`). * The archived-only test now asserts `mockListChats` ran (so the archived filter is what made creation proceed) and that the new chat carries the locked label set. * The existing-chat-id-wins test now mocks `getChat` and asserts the output shape (status, title) instead of accidentally exercising the catch-fallback path. DEREM-14: added a test asserting that `listChats` rejecting propagates through the action with operation context and does not silently fall through to chat creation.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

…, 24-26) DEREM-10 (P2): the idempotency-match path now re-fetches the chat via `getChat` after `sendFollowUpMessage` so outputs reflect post-message state (status, title, updated_at). On fetch failure, fall back to the pre-message chat instead of failing the run since the message itself succeeded. Mirrors the pattern S5 added on the existing-chat-id path. DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with `gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key` shared across multiple targets does not route Issue #2's follow-up into Issue #1's chat. `ListChatsOptions.label` accepts `string | string[]`; the client emits one repeated `?label=` parameter per entry, which the chats API ANDs in the SQL filter. DEREM-12 (P2): hoisted the reserved-key collision check out of `buildIdempotencyLabels` to immediately after `sanitizedKey` is computed in `runInner`. Without the hoist, the check only ran on the create path; an `idempotency-label-key: "gh-target"` could route a reserved-key lookup through `listChats` first. The inner check stays as defense in depth. DEREM-15: `listChats` accepts an explicit `archived: false` option and the action passes it on every idempotency lookup. The chats API already defaults to excluding archived chats (verified via coderd/searchquery/search.go), but the explicit query pins the contract so a future server default change cannot regress us. The client-side `archived !== true` filter stays as belt-and-braces. DEREM-16: `commentOnIssue` accepts a `verb` parameter (default `"created"`) and the idempotency-match caller passes `"reused"` so the GitHub comment says "Agent chat reused: <url>" instead of falsely claiming creation. The find-existing predicate widens from `startsWith("Agent chat created:")` to `startsWith("Agent chat ")` so a legacy creation comment still gets updated by a later reuse run without stacking. DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601 strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'` (0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely on is that PostgreSQL `timestamptz` serializes with uniform fractional precision; documented inline. DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through `sanitizeLabelKey` before returning. Without sanitization, an attacker- controlled value containing `-->` would close the marker `` and render attacker-supplied Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`, which cannot break the HTML-comment boundary. Added a regression test. DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the operation context ("Failed to look up chats by idempotency labels [...]: <inner>") so workflow logs name what failed. Combined with DEREM-14's new test, this also pins that a lookup failure does not silently fall through to creation. DEREM-24, DEREM-25, DEREM-26 (Nit): * Replaced the opaque `(Q2)` reference with `(S7)` in the new comment block (the only remaining hit; the rest used S7 already). * Dropped the redundant `live.slice()` before sort. `live` is already a fresh array from `.filter()`. * Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from `action.ts` to a new `src/sanitize-label-key.ts` so the existing `sanitize-label-key.test.ts` imports from the matching source. The helpers had no dependency on `CoderAgentChatAction`. DEREM-9 (Note): README sanitization section now warns that two distinct inputs that both sanitize to empty (every character outside `[a-z0-9._/-]` after lowering) collapse to the literal fallback `key` and share the same idempotency scope per `gh-target`. Practical risk is small but worth documenting. DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests: * The reserved-key test now asserts `mockListChats` was NOT called (DEREM-12 hoisted the check upstream of `findIdempotentMatch`). * The archived-only test now asserts `mockListChats` ran (so the archived filter is what made creation proceed) and that the new chat carries the locked label set. * The existing-chat-id-wins test now mocks `getChat` and asserts the output shape (status, title) instead of accidentally exercising the catch-fallback path. DEREM-14: added a test asserting that `listChats` rejecting propagates through the action with operation context and does not silently fall through to chat creation.

…comments Comment-only cleanup. No production behavior change. Removes references to slice numbering (S1..S8), to internal locked-decision Q-numbers, to amend-review round numbering, and to cross-slice handoffs that the external reviewer cannot see. Trims a few comments that explained the obvious. What changed: - src/comment.ts: dropped "Q9 marker" / "S5 establishes" / "S7 reuses" framings from the marker and `upsertComment*` doc blocks. Comments describe the marker contract and the helper's scope without referencing slice numbers. - src/action.ts: dropped "tracked in S2" / "S4 has wired" / "Tracked in S7" / "S8 owns final-state" prose from the warning, the user- resolution error, and the polling-then-comment comment. The "wait: complete" warning was already gone with S4 wired; only the prose changed. - src/schemas.ts: dropped "S2 will auto-resolve" / "S5 wires the full mapping" / "added in S4" / "cherry-picked from PR #1" framings from the input, error-kind, and outputs schemas. Each remaining comment describes the field's role in plain terms. - src/action.test.ts: dropped DEREM-N tags from comments and test names. Renamed the `S8 success comment lifecycle` describe block to `success comment lifecycle`, the `Q9 marker` test to `marker`, and similar. Removed the assertion that the user-resolution error contained `tracked in S2` (the prose is gone from the production error message). - src/comment.test.ts: dropped `Q9` / `DEREM-N` / `locked decision` tags from describe and test names. Comments describe what the case exercises in fewer words. - README.md: rewrote "Known limitations" to drop "v0/v1 implementation plan" and "P2/P4 in the v1 doc" framings; each item names the platform behavior that the gap depends on. Dropped the "in a later slice" note from the marker description. - PR body: replaced the eight-paragraph plan-referencing description with a tight summary of what the change does. The implementation- plan `<details>` reference is gone. dist/index.js rebuilt because the production user-resolution error string changed (the `Auto-resolution from the workflow context is tracked in S2` sentence is removed); the bundled string lost that sentence. Other source changes are comment-only and produce no diff in dist after minification, but a single rebuild captures everything. Tests: 154 pass, 0 fail. Lint, typecheck, build all green.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

…, 24-26) DEREM-10 (P2): the idempotency-match path now re-fetches the chat via `getChat` after `sendFollowUpMessage` so outputs reflect post-message state (status, title, updated_at). On fetch failure, fall back to the pre-message chat instead of failing the run since the message itself succeeded. Mirrors the pattern S5 added on the existing-chat-id path. DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with `gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key` shared across multiple targets does not route Issue #2's follow-up into Issue #1's chat. `ListChatsOptions.label` accepts `string | string[]`; the client emits one repeated `?label=` parameter per entry, which the chats API ANDs in the SQL filter. DEREM-12 (P2): hoisted the reserved-key collision check out of `buildIdempotencyLabels` to immediately after `sanitizedKey` is computed in `runInner`. Without the hoist, the check only ran on the create path; an `idempotency-label-key: "gh-target"` could route a reserved-key lookup through `listChats` first. The inner check stays as defense in depth. DEREM-15: `listChats` accepts an explicit `archived: false` option and the action passes it on every idempotency lookup. The chats API already defaults to excluding archived chats (verified via coderd/searchquery/search.go), but the explicit query pins the contract so a future server default change cannot regress us. The client-side `archived !== true` filter stays as belt-and-braces. DEREM-16: `commentOnIssue` accepts a `verb` parameter (default `"created"`) and the idempotency-match caller passes `"reused"` so the GitHub comment says "Agent chat reused: <url>" instead of falsely claiming creation. The find-existing predicate widens from `startsWith("Agent chat created:")` to `startsWith("Agent chat ")` so a legacy creation comment still gets updated by a later reuse run without stacking. DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601 strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'` (0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely on is that PostgreSQL `timestamptz` serializes with uniform fractional precision; documented inline. DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through `sanitizeLabelKey` before returning. Without sanitization, an attacker- controlled value containing `-->` would close the marker `` and render attacker-supplied Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`, which cannot break the HTML-comment boundary. Added a regression test. DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the operation context ("Failed to look up chats by idempotency labels [...]: <inner>") so workflow logs name what failed. Combined with DEREM-14's new test, this also pins that a lookup failure does not silently fall through to creation. DEREM-24, DEREM-25, DEREM-26 (Nit): * Replaced the opaque `(Q2)` reference with `(S7)` in the new comment block (the only remaining hit; the rest used S7 already). * Dropped the redundant `live.slice()` before sort. `live` is already a fresh array from `.filter()`. * Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from `action.ts` to a new `src/sanitize-label-key.ts` so the existing `sanitize-label-key.test.ts` imports from the matching source. The helpers had no dependency on `CoderAgentChatAction`. DEREM-9 (Note): README sanitization section now warns that two distinct inputs that both sanitize to empty (every character outside `[a-z0-9._/-]` after lowering) collapse to the literal fallback `key` and share the same idempotency scope per `gh-target`. Practical risk is small but worth documenting. DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests: * The reserved-key test now asserts `mockListChats` was NOT called (DEREM-12 hoisted the check upstream of `findIdempotentMatch`). * The archived-only test now asserts `mockListChats` ran (so the archived filter is what made creation proceed) and that the new chat carries the locked label set. * The existing-chat-id-wins test now mocks `getChat` and asserts the output shape (status, title) instead of accidentally exercising the catch-fallback path. DEREM-14: added a test asserting that `listChats` rejecting propagates through the action with operation context and does not silently fall through to chat creation.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

…, 24-26) DEREM-10 (P2): the idempotency-match path now re-fetches the chat via `getChat` after `sendFollowUpMessage` so outputs reflect post-message state (status, title, updated_at). On fetch failure, fall back to the pre-message chat instead of failing the run since the message itself succeeded. Mirrors the pattern S5 added on the existing-chat-id path. DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with `gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key` shared across multiple targets does not route Issue #2's follow-up into Issue #1's chat. `ListChatsOptions.label` accepts `string | string[]`; the client emits one repeated `?label=` parameter per entry, which the chats API ANDs in the SQL filter. DEREM-12 (P2): hoisted the reserved-key collision check out of `buildIdempotencyLabels` to immediately after `sanitizedKey` is computed in `runInner`. Without the hoist, the check only ran on the create path; an `idempotency-label-key: "gh-target"` could route a reserved-key lookup through `listChats` first. The inner check stays as defense in depth. DEREM-15: `listChats` accepts an explicit `archived: false` option and the action passes it on every idempotency lookup. The chats API already defaults to excluding archived chats (verified via coderd/searchquery/search.go), but the explicit query pins the contract so a future server default change cannot regress us. The client-side `archived !== true` filter stays as belt-and-braces. DEREM-16: `commentOnIssue` accepts a `verb` parameter (default `"created"`) and the idempotency-match caller passes `"reused"` so the GitHub comment says "Agent chat reused: <url>" instead of falsely claiming creation. The find-existing predicate widens from `startsWith("Agent chat created:")` to `startsWith("Agent chat ")` so a legacy creation comment still gets updated by a later reuse run without stacking. DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601 strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'` (0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely on is that PostgreSQL `timestamptz` serializes with uniform fractional precision; documented inline. DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through `sanitizeLabelKey` before returning. Without sanitization, an attacker- controlled value containing `-->` would close the marker `` and render attacker-supplied Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`, which cannot break the HTML-comment boundary. Added a regression test. DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the operation context ("Failed to look up chats by idempotency labels [...]: <inner>") so workflow logs name what failed. Combined with DEREM-14's new test, this also pins that a lookup failure does not silently fall through to creation. DEREM-24, DEREM-25, DEREM-26 (Nit): * Replaced the opaque `(Q2)` reference with `(S7)` in the new comment block (the only remaining hit; the rest used S7 already). * Dropped the redundant `live.slice()` before sort. `live` is already a fresh array from `.filter()`. * Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from `action.ts` to a new `src/sanitize-label-key.ts` so the existing `sanitize-label-key.test.ts` imports from the matching source. The helpers had no dependency on `CoderAgentChatAction`. DEREM-9 (Note): README sanitization section now warns that two distinct inputs that both sanitize to empty (every character outside `[a-z0-9._/-]` after lowering) collapse to the literal fallback `key` and share the same idempotency scope per `gh-target`. Practical risk is small but worth documenting. DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests: * The reserved-key test now asserts `mockListChats` was NOT called (DEREM-12 hoisted the check upstream of `findIdempotentMatch`). * The archived-only test now asserts `mockListChats` ran (so the archived filter is what made creation proceed) and that the new chat carries the locked label set. * The existing-chat-id-wins test now mocks `getChat` and asserts the output shape (status, title) instead of accidentally exercising the catch-fallback path. DEREM-14: added a test asserting that `listChats` rejecting propagates through the action with operation context and does not silently fall through to chat creation.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

…, 24-26) DEREM-10 (P2): the idempotency-match path now re-fetches the chat via `getChat` after `sendFollowUpMessage` so outputs reflect post-message state (status, title, updated_at). On fetch failure, fall back to the pre-message chat instead of failing the run since the message itself succeeded. Mirrors the pattern S5 added on the existing-chat-id path. DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with `gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key` shared across multiple targets does not route Issue #2's follow-up into Issue #1's chat. `ListChatsOptions.label` accepts `string | string[]`; the client emits one repeated `?label=` parameter per entry, which the chats API ANDs in the SQL filter. DEREM-12 (P2): hoisted the reserved-key collision check out of `buildIdempotencyLabels` to immediately after `sanitizedKey` is computed in `runInner`. Without the hoist, the check only ran on the create path; an `idempotency-label-key: "gh-target"` could route a reserved-key lookup through `listChats` first. The inner check stays as defense in depth. DEREM-15: `listChats` accepts an explicit `archived: false` option and the action passes it on every idempotency lookup. The chats API already defaults to excluding archived chats (verified via coderd/searchquery/search.go), but the explicit query pins the contract so a future server default change cannot regress us. The client-side `archived !== true` filter stays as belt-and-braces. DEREM-16: `commentOnIssue` accepts a `verb` parameter (default `"created"`) and the idempotency-match caller passes `"reused"` so the GitHub comment says "Agent chat reused: <url>" instead of falsely claiming creation. The find-existing predicate widens from `startsWith("Agent chat created:")` to `startsWith("Agent chat ")` so a legacy creation comment still gets updated by a later reuse run without stacking. DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601 strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'` (0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely on is that PostgreSQL `timestamptz` serializes with uniform fractional precision; documented inline. DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through `sanitizeLabelKey` before returning. Without sanitization, an attacker- controlled value containing `-->` would close the marker `` and render attacker-supplied Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`, which cannot break the HTML-comment boundary. Added a regression test. DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the operation context ("Failed to look up chats by idempotency labels [...]: <inner>") so workflow logs name what failed. Combined with DEREM-14's new test, this also pins that a lookup failure does not silently fall through to creation. DEREM-24, DEREM-25, DEREM-26 (Nit): * Replaced the opaque `(Q2)` reference with `(S7)` in the new comment block (the only remaining hit; the rest used S7 already). * Dropped the redundant `live.slice()` before sort. `live` is already a fresh array from `.filter()`. * Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from `action.ts` to a new `src/sanitize-label-key.ts` so the existing `sanitize-label-key.test.ts` imports from the matching source. The helpers had no dependency on `CoderAgentChatAction`. DEREM-9 (Note): README sanitization section now warns that two distinct inputs that both sanitize to empty (every character outside `[a-z0-9._/-]` after lowering) collapse to the literal fallback `key` and share the same idempotency scope per `gh-target`. Practical risk is small but worth documenting. DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests: * The reserved-key test now asserts `mockListChats` was NOT called (DEREM-12 hoisted the check upstream of `findIdempotentMatch`). * The archived-only test now asserts `mockListChats` ran (so the archived filter is what made creation proceed) and that the new chat carries the locked label set. * The existing-chat-id-wins test now mocks `getChat` and asserts the output shape (status, title) instead of accidentally exercising the catch-fallback path. DEREM-14: added a test asserting that `listChats` rejecting propagates through the action with operation context and does not silently fall through to chat creation.

Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3, two nits) clustered around: stale documentation about which events deliver `sender`, opaque intermediate failure messages, sender input not validated like the explicit `github-user-id`, schedule events silently misattributing the chat, and several test gaps. What changed: - Resolver: tighten the sender guard to `> 0` to match the `github-user-id` Zod constraint (rejects 0 and NaN). Refuse `schedule` events explicitly because `context.actor` for cron-triggered runs is the workflow file's last editor, not a triggering identity. Wrap each intermediate API call so failures name the auto-resolved source (sender id or actor login), preserve the upstream error, and recommend `coder-username` as the bypass. - Constructor: `context` is now required. The seventeen pre-existing test sites pass `createMockContext()` as the fourth argument, so a future refactor that drops the production-side context wiring fails at compile time. - Comments and README: replace the inaccurate `workflow_dispatch` references in the actor-fallback comment, the README, and a test name; document the schedule exclusion and the wrapped-failure contract. - Tests: add the precedence tie-breaker (#1 vs #2), partial-sender fallthrough, sender id 0 fallthrough, schedule refusal, and three intermediate-failure path tests. Replace the redundant `pull_request` test (same code path as the issues case). Add return-value assertions to the four happy-path tests. Out of scope: - The bare-string `throw` at `coder-client.ts:69` (DEREM-3 secondary). The sender guard prevents the auto-resolve path from reaching it, but the underlying defect remains and is owned by S6's slice scope. Refs CODAGT-284.

mafredri · 2026-05-11T18:30:34Z

Superseded by a499cd0 (feat: redesign action input and output schema), which cherry-picked the schema fields and outputs from this PR and wired the runtime around them. Thanks for the initial design.

🤖 This comment was created with the help of Coder Agents.

mafredri previously approved these changes Mar 20, 2026

View reviewed changes

mafredri reviewed Mar 20, 2026

View reviewed changes

coder-agents-review Bot mentioned this pull request Apr 29, 2026

feat: auto-resolve chat identity from github.context #20

Merged

mafredri mentioned this pull request Apr 29, 2026

feat: redesign action input and output schema #18

Merged

coder-agents-review Bot mentioned this pull request Apr 29, 2026

feat: opt in to chat reuse by idempotency-key #24

Merged

mafredri closed this May 11, 2026

mafredri deleted the initial-review branch May 11, 2026 18:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: enrich outputs with full Chat API response#1

feat: enrich outputs with full Chat API response#1
david-fraley wants to merge 1 commit into
mainfrom
initial-review

david-fraley commented Mar 18, 2026

Uh oh!

mafredri left a comment

Uh oh!

mafredri left a comment

Uh oh!

mafredri commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

david-fraley commented Mar 18, 2026

What changed

Uh oh!

mafredri left a comment

Choose a reason for hiding this comment

Uh oh!

mafredri left a comment

Choose a reason for hiding this comment

Uh oh!

mafredri commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants