
fix: record debug runs for proposed chat titles #24820

Merged: ThomasK33 merged 4 commits into main from chat-title-5d1f (Apr 29, 2026)


@ThomasK33 (Member) commented Apr 29, 2026

Summary

  • Share manual chat title candidate generation between title proposal and regeneration.
  • Capture title-generation debug runs for proposed titles when chat debug logging is enabled.
  • Invalidate chat debug runs from the rename-dialog Generate flow on success and failure without blocking title proposal UX.
  • Stabilize the auto-promote interrupt test that failed in CI while validating this PR.

Testing

  • go test ./coderd/x/chatd -run TestAutoPromote_InsertFailureSkipsStatusUpdate -count=20
  • go test ./coderd/x/chatd -run 'TestAutoPromote_InsertFailureSkipsStatusUpdate|TestRegenerateChatTitle_PersistsAndBroadcasts|TestProposeChatTitle_DebugRun' -count=1
  • go test ./coderd -run 'TestProposeChatTitle|TestRegenerateChatTitle' -count=1
  • pnpm -C site test -- src/api/queries/chats.test.ts
  • pnpm -C site exec biome format --write src/api/queries/chats.ts src/api/queries/chats.test.ts src/pages/AgentsPage/AgentsPage.tsx
  • pnpm -C site exec biome check --error-on-warnings src/api/queries/chats.ts src/api/queries/chats.test.ts src/pages/AgentsPage/AgentsPage.tsx
  • pnpm -C site run lint:types
  • make fmt
  • make lint

📋 Implementation Plan

Plan: Unify manual chat title proposal and regeneration

Goal

Refactor manual chat title generation so the UI rename-dialog Generate flow
and the direct Regenerate title flow share the same internal generation
path, while preserving their different commit behavior:

  • propose generates a candidate title, records usage, returns the candidate,
    and does not persist the title.
  • regenerate generates a candidate title through the same path, records usage,
    persists the generated title, publishes the title-change event when needed,
    and returns the updated chat.

This should fix the current debug-panel gap where /title/propose bypasses
manual title debug capture even though it performs the same model generation as
/title/regenerate.

Evidence gathered

  • Frontend rename dialog Generate calls onPropose(chat.id) in
    site/src/pages/AgentsPage/components/Sidebar/RenameChatDialog.tsx.
  • site/src/pages/AgentsPage/AgentsPage.tsx implements that as
    API.experimental.proposeChatTitle(chatId) via requestProposeTitle.
  • site/src/api/api.ts maps proposal to
    /api/experimental/chats/${chatId}/title/propose and regeneration to
    /api/experimental/chats/${chatId}/title/regenerate.
  • Backend HTTP handlers live in coderd/exp_chats.go:
    • proposeChatTitle calls api.chatDaemon.ProposeChatTitle.
    • regenerateChatTitle calls api.chatDaemon.RegenerateChatTitle.
  • In coderd/x/chatd/chatd.go, both internal paths already share
    fetchAndGenerateManualTitle, but with different debug behavior:
    • proposeChatTitleWithStore calls
      fetchAndGenerateManualTitle(ctx, store, chat, keys, false).
    • regenerateChatTitleWithStore calls
      fetchAndGenerateManualTitle(ctx, store, chat, keys, true).
  • fetchAndGenerateManualTitle only calls prepareManualTitleDebugRun when
    enableDebug is true, so /title/propose never creates or finalizes a
    title_generation debug run.
  • regenerateChatTitle frontend cache logic already invalidates chat debug runs
    in site/src/api/queries/chats.ts; proposeChatTitle currently bypasses that
    mutation/query invalidation path.

Design direction

Use one shared lower-level generation helper, not a public-method call chain.
RegenerateChatTitle should not literally call ProposeChatTitle, because the
public propose method already handles locking, usage recording, failure
accounting, and returns only a string. Calling it directly from regenerate would
risk double accounting, lock-boundary issues, and loss of model/usage metadata.

Instead, split the backend flow into two explicit phases:

  1. Generate candidate: shared by propose and regenerate.
  2. Commit result: endpoint-specific behavior.

Backend implementation plan

1. Introduce a small result type

In coderd/x/chatd/chatd.go, add an internal result type near the manual title
helpers:

type manualTitleResult struct {
	title       string
	modelConfig database.ChatModelConfig
	usage       fantasy.Usage
	hasMessages bool
}

Keep the type unexported and narrowly scoped to the manual title helpers.

2. Replace the boolean-gated helper with a shared candidate generator

Refactor fetchAndGenerateManualTitle into a helper whose name states its
intent, for example:

func (p *Server) generateManualTitleCandidate(
	ctx context.Context,
	store database.Store,
	chat database.Chat,
	keys chatprovider.ProviderAPIKeys,
) (manualTitleResult, error)

This helper should keep the existing behavior from fetchAndGenerateManualTitle:

  • Check usage limits with p.checkUsageLimit.
  • Fetch head and tail messages using the existing manual title message window.
  • Merge messages with mergeManualTitleMessages.
  • Return hasMessages=false when there are no messages.
  • Resolve the manual title model with p.resolveManualTitleModel.
  • Prepare debug recording when the debug service is enabled.
  • Call generateManualTitle.
  • Call finishDebugRun(err) exactly once after generation, using the original
    generation error before wrapping and before usage or persistence work.
  • Wrap generation failures as generate manual title: %w.
  • Preserve manualTitleGenerationError behavior when usage is available, so
    failure accounting through recordManualTitleGenerationFailure still works.
  • Do not call recordManualTitleUsage; usage recording belongs only in the
    endpoint-specific commit path.
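The helper contract in the bullets above can be sketched as a small, self-contained Go program. This is a minimal sketch under simplifying assumptions. usage, finishDebugRunFunc, and the prepareDebugRun/generate callbacks are hypothetical stand-ins: the real helper takes a context, store, chat, and provider keys. The sketch also omits usage-limit checks, model resolution, and the usage-carrying manualTitleGenerationError path. It shows two things: the debug run is finalized exactly once with the unwrapped generation error, and the helper never records usage.

```go
package main

import (
	"errors"
	"fmt"
)

// usage and manualTitleResult are simplified, hypothetical stand-ins for
// fantasy.Usage and the plan's manualTitleResult (modelConfig omitted).
type usage struct{ InputTokens, OutputTokens int64 }

type manualTitleResult struct {
	title       string
	usage       usage
	hasMessages bool
}

// finishDebugRunFunc models the finisher obtained when a debug run is
// prepared (hypothetical). A nil finisher means debug capture is disabled.
type finishDebugRunFunc func(genErr error)

// generateManualTitleCandidate sketches the shared "generate candidate"
// phase. It never records usage and never persists the title; both belong
// to the endpoint-specific commit phase.
func generateManualTitleCandidate(
	messages []string,
	prepareDebugRun func() finishDebugRunFunc,
	generate func(messages []string) (string, usage, error),
) (manualTitleResult, error) {
	if len(messages) == 0 {
		// No messages: skip the model call and do not create a debug run.
		return manualTitleResult{hasMessages: false}, nil
	}

	var finish finishDebugRunFunc
	if prepareDebugRun != nil {
		finish = prepareDebugRun()
	}

	title, u, err := generate(messages)
	if finish != nil {
		// Finalize exactly once, with the original unwrapped error,
		// before any usage recording or persistence.
		finish(err)
	}
	if err != nil {
		return manualTitleResult{}, fmt.Errorf("generate manual title: %w", err)
	}
	return manualTitleResult{title: title, usage: u, hasMessages: true}, nil
}

var errModelUnavailable = errors.New("model unavailable")

func main() {
	var debugErrs []error
	prepare := func() finishDebugRunFunc {
		return func(genErr error) { debugErrs = append(debugErrs, genErr) }
	}
	failing := func([]string) (string, usage, error) {
		return "", usage{}, errModelUnavailable
	}

	_, err := generateManualTitleCandidate([]string{"hello"}, prepare, failing)
	fmt.Println(err)                                 // generate manual title: model unavailable
	fmt.Println(errors.Is(err, errModelUnavailable)) // true
	fmt.Println(len(debugErrs), debugErrs[0] == errModelUnavailable) // 1 true
}
```

Passing the debug finisher in as a value keeps the "finalize once, before commit" ordering visible in one place. Callers can still wrap the returned error freely.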

Remove the endpoint-controlled enableDebug boolean. The only debug gate should
be the existing runtime policy:

if debugSvc := p.debugService(); debugSvc != nil &&
	debugSvc.IsEnabled(ctx, chat.ID, chat.OwnerID) {
	// prepareManualTitleDebugRun(...)
}

This makes proposal and regeneration consistent while still respecting global,
admin, and user debug-log settings.
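With the boolean removed, both endpoints ask the same question of the runtime policy. The sketch below is hypothetical. debugService is a simplified interface whose IsEnabled drops the context argument. ownerPolicy is a toy per-owner policy, not the real global/admin/user combination. The point is that a nil service or a disabled policy yields no capture, whichever endpoint is calling.

```go
package main

import "fmt"

// debugService is a hypothetical, simplified stand-in for the chat debug
// service; the real IsEnabled also takes a context.Context.
type debugService interface {
	IsEnabled(chatID, ownerID string) bool
}

// ownerPolicy is a toy policy that enables debug logging per chat owner.
// The real policy combines global, admin, and user settings.
type ownerPolicy map[string]bool

func (p ownerPolicy) IsEnabled(_, ownerID string) bool { return p[ownerID] }

// shouldCaptureManualTitleDebug is the single debug gate shared by proposal
// and regeneration; there is no per-endpoint enableDebug flag.
func shouldCaptureManualTitleDebug(svc debugService, chatID, ownerID string) bool {
	return svc != nil && svc.IsEnabled(chatID, ownerID)
}

func main() {
	svc := ownerPolicy{"alice": true}
	for _, endpoint := range []string{"/title/propose", "/title/regenerate"} {
		// Both endpoints get the same answer for the same chat and owner.
		fmt.Println(endpoint, shouldCaptureManualTitleDebug(svc, "chat-1", "alice"))
	}
	fmt.Println(shouldCaptureManualTitleDebug(svc, "chat-2", "bob"))   // false
	fmt.Println(shouldCaptureManualTitleDebug(nil, "chat-1", "alice")) // false
}
```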

3. Keep proposeChatTitleWithStore as the non-persisting commit path

Update proposeChatTitleWithStore to call generateManualTitleCandidate and
then record usage without updating the title:

result, err := p.generateManualTitleCandidate(ctx, store, chat, keys)
if err != nil {
	return "", err
}
if !result.hasMessages {
	return "", nil
}

_, recordErr := recordManualTitleUsage(
	recordCtx,
	store,
	chat,
	result.modelConfig,
	result.usage,
	"",
)

Preserve the existing response behavior: return only result.title and do not
persist the title or bump updated_at.
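The non-persisting commit path can be exercised against an in-memory fake. This sketch assumes what the plan implies about recordManualTitleUsage: an empty title argument means "record usage only". fakeStore, proposeCommit, and the error wording are hypothetical. The real function also takes a model config and a separate recording context.

```go
package main

import "fmt"

type usage struct{ InputTokens, OutputTokens int64 }

type manualTitleResult struct {
	title       string
	usage       usage
	hasMessages bool
}

type chat struct{ ID, Title string }

// fakeStore is a hypothetical in-memory stand-in. Its record method mimics
// the assumed recordManualTitleUsage contract: always record usage, and
// update the title only when newTitle is non-empty.
type fakeStore struct {
	chats    map[string]chat
	usageLog []usage
}

func (s *fakeStore) record(c chat, u usage, newTitle string) (chat, error) {
	s.usageLog = append(s.usageLog, u)
	if newTitle == "" {
		return s.chats[c.ID], nil
	}
	c.Title = newTitle
	s.chats[c.ID] = c
	return c, nil
}

// proposeCommit is the non-persisting commit path: record usage once with
// an empty title and hand the candidate back to the caller.
func proposeCommit(
	c chat,
	result manualTitleResult,
	record func(chat, usage, string) (chat, error),
) (string, error) {
	if !result.hasMessages {
		return "", nil // nothing generated, nothing recorded
	}
	if _, err := record(c, result.usage, ""); err != nil {
		// Illustrative wrapping; the real error strings may differ.
		return "", fmt.Errorf("record manual title usage: %w", err)
	}
	return result.title, nil
}

func main() {
	store := &fakeStore{chats: map[string]chat{"c1": {ID: "c1", Title: "Untitled"}}}
	res := manualTitleResult{
		title:       "Record debug runs for proposals",
		usage:       usage{InputTokens: 120, OutputTokens: 6},
		hasMessages: true,
	}

	title, err := proposeCommit(store.chats["c1"], res, store.record)
	fmt.Println(title, err)              // Record debug runs for proposals <nil>
	fmt.Println(store.chats["c1"].Title) // Untitled
	fmt.Println(len(store.usageLog))     // 1
}
```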

4. Keep regenerateChatTitleWithStore as the persisting commit path

Update regenerateChatTitleWithStore to call the same candidate helper and then
record usage while updating the title:

result, err := p.generateManualTitleCandidate(ctx, store, chat, keys)
if err != nil {
	return database.Chat{}, err
}
if !result.hasMessages {
	return chat, nil
}

updatedChat, recordErr := recordManualTitleUsage(
	recordCtx,
	store,
	chat,
	result.modelConfig,
	result.usage,
	result.title,
)

Preserve existing behavior:

  • Return the original chat when there are no messages.
  • Return the same error strings for record/update failures where possible.
  • Publish codersdk.ChatWatchEventKindTitleChange only when the persisted title
    differs from the previous title.
  • Return the updated chat.
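The persisting path differs only in its commit step. This is a hypothetical sketch. fakeStore, titleChangeEvent, and regenerateCommit are illustrative names, and titleChangeEvent only stands in for a codersdk.ChatWatchEventKindTitleChange notification. It returns the original chat when there are no messages, persists the title through the same recorder, and publishes only when the stored title actually changed.

```go
package main

import "fmt"

type usage struct{ InputTokens, OutputTokens int64 }

type manualTitleResult struct {
	title       string
	usage       usage
	hasMessages bool
}

type chat struct{ ID, Title string }

// titleChangeEvent is a hypothetical stand-in for a title-change watch event.
type titleChangeEvent struct{ ChatID, Title string }

// fakeStore mimics the assumed recordManualTitleUsage contract: record
// usage, and persist newTitle when it is non-empty.
type fakeStore struct {
	chats    map[string]chat
	usageLog []usage
}

func (s *fakeStore) record(c chat, u usage, newTitle string) (chat, error) {
	s.usageLog = append(s.usageLog, u)
	if newTitle == "" {
		return s.chats[c.ID], nil
	}
	c.Title = newTitle
	s.chats[c.ID] = c
	return c, nil
}

// regenerateCommit is the persisting commit path.
func regenerateCommit(
	s *fakeStore,
	c chat,
	result manualTitleResult,
	publish func(titleChangeEvent),
) (chat, error) {
	if !result.hasMessages {
		return c, nil // return the original chat untouched
	}
	updated, err := s.record(c, result.usage, result.title)
	if err != nil {
		// Illustrative wrapping; the real error strings may differ.
		return chat{}, fmt.Errorf("record manual title usage: %w", err)
	}
	// Publish only when the persisted title actually changed.
	if updated.Title != c.Title {
		publish(titleChangeEvent{ChatID: updated.ID, Title: updated.Title})
	}
	return updated, nil
}

func main() {
	s := &fakeStore{chats: map[string]chat{"c1": {ID: "c1", Title: "Untitled"}}}
	var events []titleChangeEvent
	publish := func(e titleChangeEvent) { events = append(events, e) }
	res := manualTitleResult{title: "Fix CI flake", hasMessages: true}

	updated, _ := regenerateCommit(s, s.chats["c1"], res, publish)
	fmt.Println(updated.Title, len(events)) // Fix CI flake 1

	// Regenerating to the same title records usage again but publishes nothing.
	_, _ = regenerateCommit(s, s.chats["c1"], res, publish)
	fmt.Println(len(events), len(s.usageLog)) // 1 2
}
```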

5. Preserve outer locking and failure accounting

Do not move manual title locks out of the public methods unless tests reveal a
specific need. Preserve current boundaries:

  • ProposeChatTitle resolves keys, acquires/releases the manual title lock,
    delegates to proposeChatTitleWithStore, and uses
    recordManualTitleGenerationFailure on errors.
  • RegenerateChatTitle resolves keys, acquires/releases the manual title lock,
    delegates to regenerateChatTitleWithStore, and uses existing failure
    handling.

This avoids changing concurrency semantics while still unifying generation.
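The lock boundary can be illustrated with a hypothetical per-chat lock. The plan does not describe how the real manual title lock works, so it is modeled here as a non-blocking in-flight flag. server, tryLock, and recordFailure are illustrative names, and recordFailure only stands in for recordManualTitleGenerationFailure. The public method owns the lock and failure accounting and delegates the rest. The nested call in main shows why regeneration should not call the public ProposeChatTitle while it holds the same lock.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// server is a hypothetical stand-in for the chatd server.
type server struct {
	mu       sync.Mutex
	inFlight map[string]bool
	failures []error
}

var (
	errTitleInProgress  = errors.New("manual title generation already in progress")
	errModelUnavailable = errors.New("model unavailable")
)

func (s *server) tryLock(chatID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.inFlight[chatID] {
		return false
	}
	s.inFlight[chatID] = true
	return true
}

func (s *server) unlock(chatID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.inFlight, chatID)
}

// recordFailure stands in for recordManualTitleGenerationFailure.
func (s *server) recordFailure(err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.failures = append(s.failures, err)
}

// ProposeChatTitle owns the lock and failure accounting; withStore stands
// in for proposeChatTitleWithStore (shared generation plus commit).
func (s *server) ProposeChatTitle(chatID string, withStore func() (string, error)) (string, error) {
	if !s.tryLock(chatID) {
		return "", errTitleInProgress
	}
	defer s.unlock(chatID)

	title, err := withStore()
	if err != nil {
		s.recordFailure(err)
		return "", err
	}
	return title, nil
}

func main() {
	s := &server{inFlight: map[string]bool{}}

	title, err := s.ProposeChatTitle("c1", func() (string, error) {
		// Re-entering a public method for the same chat hits the held lock.
		_, nestedErr := s.ProposeChatTitle("c1", func() (string, error) { return "unused", nil })
		fmt.Println(nestedErr) // manual title generation already in progress
		return "Share title generation", nil
	})
	fmt.Println(title, err) // Share title generation <nil>

	_, err = s.ProposeChatTitle("c1", func() (string, error) { return "", errModelUnavailable })
	fmt.Println(err, len(s.failures)) // model unavailable 1
}
```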

Frontend implementation plan

1. Add a query-aware proposal mutation

In site/src/api/queries/chats.ts, add a mutation helper for proposal, or route
requestProposeTitle through equivalent query logic:

  • Mutation function: API.experimental.proposeChatTitle(chatId).
  • On settle, for both success and failure: invalidate chatDebugRunsKey(chatId)
    via the existing invalidateChatDebugRuns helper. Failed title generations are
    often the runs users need to inspect.
  • Avoid invalidating unrelated chat queries unless the current UX needs it,
    because proposal does not persist title changes.

2. Use the proposal mutation from AgentsPage

In site/src/pages/AgentsPage/AgentsPage.tsx, replace the direct
API.experimental.proposeChatTitle(chatId) call with the query-aware mutation.
Keep the return value as result.title so the rename dialog behavior does not
change.

Tests

Backend tests

Update or add tests near the existing chat title tests:

  • coderd/exp_chats_test.go
    • Keep TestProposeChatTitle/DoesNotPersistTitleOrBumpUpdatedAt passing.
    • Add coverage that proposal still returns a generated title when model
      generation succeeds, if practical with existing helpers.
  • coderd/x/chatd/chatd_internal_test.go
    • Add or update a focused test that ProposeChatTitle creates/finalizes a
      title_generation debug run when debug logging is enabled.
    • Add disabled-debug coverage proving ProposeChatTitle does not create a
      debug run when debug logging is disabled.
    • Keep regeneration tests proving it persists the title and broadcasts
      ChatWatchEventKindTitleChange.
    • Add failure coverage if existing test utilities make it straightforward:
      generation failure should finalize the debug run with failure status and
      preserve recordManualTitleGenerationFailure behavior.
  • Guard against double accounting by asserting the manual title usage recording
    path runs once per successful proposal or regeneration in mock-based tests.
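The double-accounting guard amounts to counting recorder calls. In this hypothetical sketch, countingRecorder plays the role of a mock for recordManualTitleUsage, and generateCandidate has no access to it, so the shared phase cannot record usage by construction. A successful proposal or regeneration records exactly once. A failed generation records nothing through this path; failure accounting stays with recordManualTitleGenerationFailure, which is not modeled here.

```go
package main

import (
	"errors"
	"fmt"
)

type usage struct{ Tokens int64 }

// countingRecorder is a hypothetical test double for
// recordManualTitleUsage. It counts calls and captures the title argument.
type countingRecorder struct {
	calls  int
	titles []string
}

func (r *countingRecorder) record(_ usage, newTitle string) {
	r.calls++
	r.titles = append(r.titles, newTitle)
}

var errModelUnavailable = errors.New("model unavailable")

// generateCandidate is the shared phase; it cannot see the recorder.
func generateCandidate(fail bool) (string, usage, error) {
	if fail {
		return "", usage{}, errModelUnavailable
	}
	return "Proposed title", usage{Tokens: 42}, nil
}

func propose(r *countingRecorder, fail bool) (string, error) {
	title, u, err := generateCandidate(fail)
	if err != nil {
		return "", err
	}
	r.record(u, "") // usage only; no title persistence
	return title, nil
}

func regenerate(r *countingRecorder, fail bool) (string, error) {
	title, u, err := generateCandidate(fail)
	if err != nil {
		return "", err
	}
	r.record(u, title) // usage plus title persistence
	return title, nil
}

func main() {
	r := &countingRecorder{}
	_, _ = propose(r, false)
	_, _ = regenerate(r, false)
	_, err := propose(r, true)

	fmt.Println(r.calls)         // 2
	fmt.Printf("%q\n", r.titles) // ["" "Proposed title"]
	fmt.Println(err)             // model unavailable
}
```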

Frontend tests

Update site/src/api/queries/chats.test.ts:

  • Add a proposeChatTitle cache invalidation test mirroring the existing
    test "regenerateChatTitle invalidates debug runs so the title_generation run surfaces immediately".
  • Confirm unrelated query keys are not invalidated by proposal.

Update Storybook interaction coverage only if UI behavior changes. The current
rename dialog UI should remain unchanged, so existing stories should still pass.

Validation commands

Run focused checks first:

make test RUN='TestProposeChatTitle|TestRegenerateChatTitle'
make test RUN='TestRegenerateChatTitle_PersistsAndBroadcasts|TestProposeChatTitle'

Run frontend checks scoped to the touched query tests. Use the repository's
existing package scripts after confirming the exact command in package.json or
site scripts, for example the relevant pnpm/Vitest invocation for
site/src/api/queries/chats.test.ts.

Then run required broader validation for touched code:

make fmt
make lint

If Go or generated API types change, run the repository-required generation step
before linting. This refactor should not require database or SDK generation if
no API contracts change.

Dogfooding and quality gates

Backend quality gates

After backend tests pass:

  1. Start the development environment with the repository-supported command:

    ./scripts/develop.sh
  2. Enable chat debug logging for the test user or deployment.

  3. Create or open an agent chat with existing messages.

  4. Use the rename dialog's Generate button.

  5. Confirm the proposed title appears in the input and the persisted chat title
    has not changed yet.

  6. Open the debug panel and confirm a new title_generation run appears.

  7. Save the proposed title and confirm this save does not create a second
    title_generation run.

  8. Use the direct Regenerate title action.

  9. Confirm exactly one additional title_generation run appears and the visible
    chat title updates.

Visual evidence for reviewers

Use agent-browser or equivalent browser automation during dogfooding and
capture:

  • Screenshot before clicking Generate in the rename dialog.
  • Screenshot after the generated title appears in the input.
  • Screenshot of the debug panel showing the proposal title_generation run.
  • Screenshot after direct Regenerate updates the visible title.
  • A short screen recording covering Generate, debug-panel verification, Save,
    and direct Regenerate.

Attach screenshots and recording to the final implementation summary or PR notes
if a PR is requested.

Advisor review incorporated

Advisor review agreed with the shared generation helper and separate commit
paths, with these refinements incorporated into this plan:

  • Finalize debug runs immediately after model generation with the original
    generation error, before wrapping and before usage or persistence work.
  • Treat debug runs as model-call observability; later usage-recording or title
    persistence failures should not retroactively change generation status.
  • Add disabled-debug test coverage for /title/propose after removing the
    endpoint-level enableDebug bypass.
  • Invalidate proposal debug runs on both success and failure.
  • Keep generation pure: no usage recording inside the shared candidate helper.
  • Preserve existing failure accounting behavior instead of broadening it as part
    of this refactor.

Acceptance criteria

  • Proposal and regeneration share one internal manual title generation helper.
  • There is no endpoint-level enableDebug=false bypass for manual title model
    calls.
  • /title/propose records and finalizes debug runs when chat debug logging is
    enabled.
  • /title/regenerate continues to record and finalize debug runs when chat debug
    logging is enabled.
  • /title/propose does not persist the generated title or bump updated_at.
  • /title/regenerate persists the generated title and publishes a title-change
    event only when the title changes.
  • Manual title usage is recorded exactly once for each successful proposal or
    regeneration.
  • Generation failures preserve the existing client-facing error behavior and
    failure accounting.
  • The rename-dialog Generate flow invalidates chat debug runs so the debug panel
    can show the new run without a full page reload.

Risks and mitigations

  • Risk: accidental title persistence from proposal. Keep proposal commit code
    passing an empty title to recordManualTitleUsage; keep/update the existing
    DoesNotPersistTitleOrBumpUpdatedAt test.
  • Risk: double usage recording. Keep generation pure with respect to usage
    persistence; only commit helpers should call recordManualTitleUsage.
  • Risk: lock semantics change. Do not move the public lock acquire/release
    boundaries in this refactor.
  • Risk: debug runs created when disabled. Keep debugSvc.IsEnabled as the
    only debug capture gate and test both enabled and disabled behavior where
    practical.
  • Risk: debug panel still does not refresh. Add frontend invalidation for the
    proposal flow, not just backend debug run creation.
  • Risk: debug run status may not match the whole HTTP request status. Debug
    runs represent model-generation status, not later usage-recording or
    title-persistence status. Preserve current semantics by finalizing immediately
    after generation with the original generation error.
  • Risk: increased debug data from proposal clicks. Enabling debug capture for
    /title/propose means rename-dialog Generate can store title-generation
    prompts and responses when debug logging is enabled. This is desired for
    observability and consistent with regeneration, but proposal may be clicked
    more often than direct regeneration.
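The mitigation for the status-mismatch risk comes down to ordering: the debug run is finalized right after the model call, so a later commit failure fails the HTTP request without rewriting the run. This hypothetical sketch uses illustrative status strings, not the real enum values. debugRun and regenerateRequest are made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// debugRun is a hypothetical stand-in for a title_generation debug run.
type debugRun struct{ status string }

var (
	errModelUnavailable = errors.New("model unavailable")
	errPersistFailed    = errors.New("update chat title failed")
)

// regenerateRequest finalizes the debug run immediately after generation,
// then commits. Commit failures fail the request but leave run.status alone.
func regenerateRequest(run *debugRun, genErr, commitErr error) error {
	if genErr != nil {
		run.status = "error"
		return fmt.Errorf("generate manual title: %w", genErr)
	}
	run.status = "completed"

	if commitErr != nil {
		return commitErr // run.status stays "completed"
	}
	return nil
}

func main() {
	cases := []struct {
		name              string
		genErr, commitErr error
	}{
		{"success", nil, nil},
		{"model failure", errModelUnavailable, nil},
		{"persist failure", nil, errPersistFailed},
	}
	for _, tc := range cases {
		run := &debugRun{}
		err := regenerateRequest(run, tc.genErr, tc.commitErr)
		fmt.Printf("%s: run=%s request_err=%v\n", tc.name, run.status, err)
	}
}
```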

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh

@ThomasK33 (Member, Author)

@codex review
/coder-agents-review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ThomasK33 ThomasK33 marked this pull request as ready for review April 29, 2026 12:56
@ThomasK33 (Member, Author)

@codex review
/coder-agents-review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83811489d1


Comment thread site/src/api/queries/chats.ts Outdated
@ThomasK33 (Member, Author)

@codex review
/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

The refactoring is clean and structurally correct. Removing the enableDebug boolean and letting the runtime debug policy be the only gate is the right fix at the right level. The manualTitleResult struct is a proportional replacement for the 5-value return. Frontend invalidation scope is correct (only debug runs for proposal, not chat list/detail). All existing tests pass, and the new test covers both enabled and disabled debug paths.

"Boring code. No fight today. ♠" (Hisoka)

2 P3, 1 P4, 3 Nits. The P3s are about test coverage and documentation gaps, not code correctness. The core generation and commit paths are verified by the existing test suite.

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatd_test.go
Comment thread coderd/x/chatd/chatd.go
Comment thread coderd/x/chatd/chatd_test.go
Comment thread coderd/x/chatd/chatd.go Outdated
Comment thread site/src/pages/AgentsPage/AgentsPage.tsx Outdated
Comment thread coderd/x/chatd/chatd_test.go Outdated
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c233c8e69


Comment thread site/src/api/queries/chats.ts Outdated
@ThomasK33 (Member, Author)

@codex review
/coder-agents-review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


@coder-agents-review coder-agents-review Bot left a comment

All six R1 findings verified as fixed. The refactoring is clean: enableDebug boolean removed, generateManualTitleCandidate documented with its no-usage-recording contract, failure-path debug finalization tested, double-accounting guard in place, type renamed, helper reused. The auto-promote test stabilization is a welcome bonus.

One new Nit on the insertUserTextMessage parameter name. Otherwise, the PR is ready.

"Boring code. No fight today. ♠" (Hisoka, again)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatd_test.go Outdated
@ThomasK33 (Member, Author)

@codex review
/coder-agents-review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.


@coder-agents-review coder-agents-review Bot left a comment

R3: DEREM-8 (variadic parameter rename) verified fixed. No new findings from the panel (14 reviewers). All findings from R1 and R2 are addressed. The PR is clean.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33 ThomasK33 enabled auto-merge (squash) April 29, 2026 14:32
@ThomasK33 ThomasK33 merged commit ab75e46 into main Apr 29, 2026
29 checks passed
@ThomasK33 ThomasK33 deleted the chat-title-5d1f branch April 29, 2026 14:45
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 29, 2026