Skip to content

feat: enrich outputs with full Chat API response#1

Closed
david-fraley wants to merge 1 commit into
mainfrom
initial-review
Closed

feat: enrich outputs with full Chat API response#1
david-fraley wants to merge 1 commit into
mainfrom
initial-review

Conversation

@david-fraley
Copy link
Copy Markdown
Collaborator

Surfaces richer data from the /api/experimental/chats response as GHA outputs.

What changed

SchemaCoderChatSchema now parses the full Chat struct from codersdk/chats.go:

  • parent_chat_id, root_chat_id, last_model_config_id, last_error, archived
  • Full diff_status sub-object (ChatDiffStatusSchema) with PR/branch metadata

New GHA outputs — downstream steps can now consume:

Output Description
chat-status Lifecycle state: waiting, pending, running, paused, completed, error
chat-title Auto-generated title
workspace-id Workspace the chat is running in
pull-request-url PR/branch URL when chat has tracked changes
pull-request-state open / closed / merged
pull-request-title PR title
pull-request-number PR number
additions / deletions / changed-files Diff stats
head-branch / base-branch Branch names

Behavioral change — the existing-chat path now calls getChat() after sending the message so outputs always reflect the full current state (not just the chat ID).

Tests: 45 pass (+1 new test for diff metadata output)

Surface richer data from the Chat API in GHA outputs:

Schema changes:
- CoderChatSchema now parses the full Chat struct: parent_chat_id,
  root_chat_id, last_model_config_id, last_error, archived, and
  the full diff_status sub-object.
- New ChatDiffStatusSchema captures PR/branch metadata: URL, state,
  title, draft, additions/deletions, changed files, branches,
  PR number, commits, approval, reviewer count.

New GHA outputs:
- chat-status: current chat lifecycle state
- chat-title: auto-generated title
- workspace-id: which workspace the chat is running in
- pull-request-url, pull-request-state, pull-request-title,
  pull-request-number: PR metadata when the chat has tracked changes
- additions, deletions, changed-files: diff stats
- head-branch, base-branch: branch names

Behavioral change:
- existing-chat path now calls getChat() after sending the message
  so outputs always contain the full chat state, not just the ID.

Tests: 45 pass (+1 new diff metadata test)
mafredri
mafredri previously approved these changes Mar 20, 2026
Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Agent on behalf of mafredri

Two nits, one question, otherwise LGTM. Good extraction of buildOutputs to centralize output construction.

Nit: pullRequestTitle uses || while every other field uses ??:

pullRequestTitle: diff?.pull_request_title || undefined,
pullRequestUrl: diff?.url ?? undefined,

The || is arguably intentional since ChatDiffStatusSchema defaults pull_request_title to "" and you probably want to suppress empty strings. But it reads as a typo next to the ?? fields. Could a comment clarify the intent, or should pull_request_title match the others with .nullable().optional() instead of .default("")?

Nit: pull_request_title is the only field in ChatDiffStatusSchema that uses .default("") while peers like pull_request_state use .nullable().optional(). Same for pull_request_draft / changes_requested using .default(false) vs approved using .nullable().optional(). Looks like the upstream API is inconsistent here, so this may be intentional, but worth a comment.

Good test coverage for the diff metadata path with mockChatWithDiff.

Two nits, noll buggar. Not bad for a first pass on a Thursday. Godkänt!

Copy link
Copy Markdown
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 test

@mafredri mafredri dismissed their stale review March 20, 2026 12:42

Dismissing erroneous APPROVE. Agent should have used COMMENT, not APPROVE.

mafredri added a commit that referenced this pull request Apr 29, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request Apr 29, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request Apr 29, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request Apr 29, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request Apr 29, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request Apr 30, 2026
…comments

Comment-only cleanup. No production behavior change. Removes references
to slice numbering (S1..S8), to internal locked-decision Q-numbers, to
amend-review round numbering, and to cross-slice handoffs that the
external reviewer cannot see. Trims a few comments that explained the
obvious.

What changed:

- src/comment.ts: dropped "Q9 marker" / "S5 establishes" / "S7 reuses"
  framings from the marker and `upsertComment*` doc blocks. Comments
  describe the marker contract and the helper's scope without
  referencing slice numbers.
- src/action.ts: dropped "tracked in S2" / "S4 has wired" / "Tracked
  in S7" / "S8 owns final-state" prose from the warning, the user-
  resolution error, and the polling-then-comment comment. The
  "wait: complete" warning was already gone with S4 wired; only the
  prose changed.
- src/schemas.ts: dropped "S2 will auto-resolve" / "S5 wires the full
  mapping" / "added in S4" / "cherry-picked from PR #1" framings from
  the input, error-kind, and outputs schemas. Each remaining comment
  describes the field's role in plain terms.
- src/action.test.ts: dropped DEREM-N tags from comments and test
  names. Renamed the `S8 success comment lifecycle` describe block
  to `success comment lifecycle`, the `Q9 marker` test to `marker`,
  and similar. Removed the assertion that the user-resolution error
  contained `tracked in S2` (the prose is gone from the production
  error message).
- src/comment.test.ts: dropped `Q9` / `DEREM-N` / `locked decision`
  tags from describe and test names. Comments describe what the case
  exercises in fewer words.
- README.md: rewrote "Known limitations" to drop "v0/v1 implementation
  plan" and "P2/P4 in the v1 doc" framings; each item names the
  platform behavior that the gap depends on. Dropped the "in a later
  slice" note from the marker description.
- PR body: replaced the eight-paragraph plan-referencing description
  with a tight summary of what the change does. The implementation-
  plan `<details>` reference is gone.

dist/index.js rebuilt because the production user-resolution error
string changed (the `Auto-resolution from the workflow context is
tracked in S2` sentence is removed); the bundled string lost that
sentence. Other source changes are comment-only and produce no diff
in dist after minification, but a single rebuild captures everything.

Tests: 154 pass, 0 fail. Lint, typecheck, build all green.
mafredri added a commit that referenced this pull request Apr 30, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request Apr 30, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request Apr 30, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request Apr 30, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request May 11, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
mafredri added a commit that referenced this pull request May 11, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request May 11, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request May 11, 2026
…, 24-26)

DEREM-10 (P2): the idempotency-match path now re-fetches the chat via
`getChat` after `sendFollowUpMessage` so outputs reflect post-message
state (status, title, updated_at). On fetch failure, fall back to the
pre-message chat instead of failing the run since the message itself
succeeded. Mirrors the pattern S5 added on the existing-chat-id path.

DEREM-11 (P2): the lookup now ANDs `<sanitized-key>:true` with
`gh-target:<owner>/<repo>#<n>` so a static `idempotency-label-key`
shared across multiple targets does not route Issue #2's follow-up
into Issue #1's chat. `ListChatsOptions.label` accepts `string |
string[]`; the client emits one repeated `?label=` parameter per
entry, which the chats API ANDs in the SQL filter.

DEREM-12 (P2): hoisted the reserved-key collision check out of
`buildIdempotencyLabels` to immediately after `sanitizedKey` is
computed in `runInner`. Without the hoist, the check only ran on the
create path; an `idempotency-label-key: "gh-target"` could route a
reserved-key lookup through `listChats` first. The inner check stays
as defense in depth.

DEREM-15: `listChats` accepts an explicit `archived: false` option and
the action passes it on every idempotency lookup. The chats API already
defaults to excluding archived chats (verified via
coderd/searchquery/search.go), but the explicit query pins the
contract so a future server default change cannot regress us. The
client-side `archived !== true` filter stays as belt-and-braces.

DEREM-16: `commentOnIssue` accepts a `verb` parameter (default
`"created"`) and the idempotency-match caller passes `"reused"` so
the GitHub comment says "Agent chat reused: <url>" instead of
falsely claiming creation. The find-existing predicate widens from
`startsWith("Agent chat created:")` to `startsWith("Agent chat ")`
so a legacy creation comment still gets updated by a later reuse run
without stacking.

DEREM-17: rewrote the sort comment in `findIdempotentMatch`. ISO 8601
strings are NOT lex-comparable in general (e.g. `'Z'` (0x5A) > `'.'`
(0x2E) so `...:00Z" < ...:00.500Z` is reversed). The invariant we rely
on is that PostgreSQL `timestamptz` serializes with uniform fractional
precision; documented inline.

DEREM-18 (P3): `deriveCommentKey` now runs `idempotencyLabelKey` through
`sanitizeLabelKey` before returning. Without sanitization, an attacker-
controlled value containing `-->` would close the marker
`<!-- coder-agent-chat-action:KEY -->` and render attacker-supplied
Markdown afterwards. Sanitization restricts the key to `[a-z0-9._/-]`,
which cannot break the HTML-comment boundary. Added a regression test.

DEREM-19 (P3): `findIdempotentMatch` wraps `listChats` failures with the
operation context ("Failed to look up chats by idempotency labels [...]:
<inner>") so workflow logs name what failed. Combined with DEREM-14's
new test, this also pins that a lookup failure does not silently fall
through to creation.

DEREM-24, DEREM-25, DEREM-26 (Nit):

* Replaced the opaque `(Q2)` reference with `(S7)` in the new comment
  block (the only remaining hit; the rest used S7 already).
* Dropped the redundant `live.slice()` before sort. `live` is already a
  fresh array from `.filter()`.
* Extracted `sanitizeLabelKey` and `RESERVED_LABEL_KEYS` from
  `action.ts` to a new `src/sanitize-label-key.ts` so the existing
  `sanitize-label-key.test.ts` imports from the matching source. The
  helpers had no dependency on `CoderAgentChatAction`.

DEREM-9 (Note): README sanitization section now warns that two distinct
inputs that both sanitize to empty (every character outside
`[a-z0-9._/-]` after lowering) collapse to the literal fallback `key`
and share the same idempotency scope per `gh-target`. Practical risk
is small but worth documenting.

DEREM-7, DEREM-8, DEREM-13: tightened idempotency tests:

* The reserved-key test now asserts `mockListChats` was NOT called
  (DEREM-12 hoisted the check upstream of `findIdempotentMatch`).
* The archived-only test now asserts `mockListChats` ran (so the
  archived filter is what made creation proceed) and that the new chat
  carries the locked label set.
* The existing-chat-id-wins test now mocks `getChat` and asserts the
  output shape (status, title) instead of accidentally exercising the
  catch-fallback path.

DEREM-14: added a test asserting that `listChats` rejecting propagates
through the action with operation context and does not silently fall
through to chat creation.
mafredri added a commit that referenced this pull request May 11, 2026
Round 1 review on PR #20 surfaced twelve findings (three P2, seven P3,
two nits) clustered around: stale documentation about which events
deliver `sender`, opaque intermediate failure messages, sender input
not validated like the explicit `github-user-id`, schedule events
silently misattributing the chat, and several test gaps.

What changed:

- Resolver: tighten the sender guard to `> 0` to match the
  `github-user-id` Zod constraint (rejects 0 and NaN). Refuse
  `schedule` events explicitly because `context.actor` for
  cron-triggered runs is the workflow file's last editor, not a
  triggering identity. Wrap each intermediate API call so failures
  name the auto-resolved source (sender id or actor login),
  preserve the upstream error, and recommend `coder-username` as
  the bypass.
- Constructor: `context` is now required. The seventeen pre-existing
  test sites pass `createMockContext()` as the fourth argument, so a
  future refactor that drops the production-side context wiring fails
  at compile time.
- Comments and README: replace the inaccurate `workflow_dispatch`
  references in the actor-fallback comment, the README, and a test
  name; document the schedule exclusion and the wrapped-failure
  contract.
- Tests: add the precedence tie-breaker (#1 vs #2), partial-sender
  fallthrough, sender id 0 fallthrough, schedule refusal, and three
  intermediate-failure path tests. Replace the redundant
  `pull_request` test (same code path as the issues case). Add
  return-value assertions to the four happy-path tests.

Out of scope:

- The bare-string `throw` at `coder-client.ts:69` (DEREM-3
  secondary). The sender guard prevents the auto-resolve path from
  reaching it, but the underlying defect remains and is owned by
  S6's slice scope.

Refs CODAGT-284.
Copy link
Copy Markdown
Member

Superseded by a499cd0 (feat: redesign action input and output schema), which cherry-picked the schema fields and outputs from this PR and wired the runtime around them. Thanks for the initial design.

🤖 This comment was created with the help of Coder Agents.

@mafredri mafredri closed this May 11, 2026
@mafredri mafredri deleted the initial-review branch May 11, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants