feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper by ThomasK33 · Pull Request #24620 · coder/coder

ThomasK33 · 2026-04-22T13:44:33Z

Summary

Introduce the coderd/x/chatd/chatadvisor package: the self-contained runtime that performs a nested, tool-less, single-step model call to return strategic guidance to the parent agent. Also ships the thin AgentTool wrapper that the agent exposes as the advisor built-in.

Motivation

The advisor is a "consult before you act" planning step. Keeping its runtime in its own package (rather than inside chatd.go or chattool/) makes the behavior easy to unit-test, keeps chattool/ thin, and avoids import cycles with chatprompt.

Changes

chatadvisor/types.go: AdvisorArgs and AdvisorResult (advice / limit_reached / error variants) plus usage metadata. Stable JSON shape for the UI.
chatadvisor/guidance.go: the nested advisor system prompt and ParentGuidanceBlock injected into the outer agent (tagged <advisor-guidance>). The nested prompt explicitly tells the advisor it is advising the parent agent, must not address the user, and must not claim actions happened.
chatadvisor/handoff.go: builds the advisor input from the already-prepared outer prompt tail (not a fresh DB reload) with hard truncation budgets and a recent-context bias so the advisor sees the exact context the outer model saw.
chatadvisor/runner.go: wraps chatloop.Run() in strict one-step mode (Tools: nil, ProviderTools: nil, MaxSteps: 1) so the nested call is structurally incapable of calling tools.
chatadvisor/tool.go: the AgentTool wrapper. Validates the question (non-empty, bounded to 2000 runes), invokes the runtime, and maps the result to a JSON tool response.
Defensive assertions throughout (question non-empty after trim, positive limits, zero tools on nested call, known result variants, non-negative remaining uses).
Unit tests cover question validation, tool-less execution, and each result variant.

Stack context

This is PR 2 of 6 in the advisor feature stack. It depends on PR 1's ExclusiveToolNames hook; no consumer of this package exists yet (that lands in PR 4).

Scope / non-goals

No wiring into chatd.
No HTTP/API surface.
No UI.
No separate provider; the runtime reuses the outer chat's resolved model/provider/keys.

Validation

go test ./coderd/x/chatd/chatadvisor/...
make lint

📋 Implementation Plan (shared across the advisor stack)

Plan: Add a Mux-style advisor tool to coder agents/chatd

Outcome

Add a first-class advisor tool to agent chats in coderd/x/chatd that feels native to Coder:

it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
it performs a nested tool-less model call for strategic advice;
it is exposed only when eligible, and the prompt mentions it only when it is actually available;
it is treated as a planning-only tool so it does not run alongside action tools in the same batch;
it tracks usage/cost separately enough for operators to reason about it;
it has a minimally polished UI in the Agents page;
and it ships with explicit dogfooding evidence, including screenshots and repro videos.

Design decisions to lock before coding

Primary architecture: native built-in tool in chattool/, backed by a small chatadvisor package.
Nested model execution: reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
Execution policy: treat advisor as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
Availability: initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
Prompt sync: use one eligibility boolean to drive both tool registration and advisor guidance injection.
Persistence/cost split: MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
UI scope: generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.

Delivery model

The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own coderd/x/chatd/chatd.go because prompt assembly, tool registration, and model resolution all converge there.

Detailed workstreams

Repo evidence used for this plan

Mux reference and current chatd seams

Mux reference implementation

src/node/services/tools/advisor.ts — native advisor tool implementation.
src/common/constants/advisor.ts — advisor prompt/constants and truncation policy.
src/common/utils/tools/tools.ts — conditional tool registration.
src/node/services/streamContextBuilder.ts — injects advisor guidance only when the tool is available.

Current chatd seams

coderd/x/chatd/chatd.go
- processChat() — tool assembly, prompt assembly, and chatloop invocation.
- resolveChatModel() — current model/provider/key resolution seam.
- type Config struct — server-level chatd configuration surface.
coderd/x/chatd/chatloop/chatloop.go
- Run() — main streaming/model loop.
- executeTools() — built-in tool execution/batching seam.
coderd/x/chatd/chattool/ — built-in tool implementations.
site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx — tool renderer dispatch.
site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts and ConversationTimeline.tsx — tool/result merge and rendering flow.

Workstream map and ownership

Workstream	Primary owner	Main files	Can run in parallel?	Done when
0. Integration + gating	Integration lead	`coderd/x/chatd/chatd.go`	No; central merge lane	Tool registration, prompt sync, and model selection are wired together
1. Advisor runtime + tool	Backend agent	new `coderd/x/chatd/chatadvisor/`, new `coderd/x/chatd/chattool/advisor.go`	Yes	Tool can perform a tool-less advisor call in memory and return structured results
2. Planning-only execution policy	Chatloop agent	`coderd/x/chatd/chatloop/chatloop.go`, related tests	Yes	Mixed `advisor` + action-tool batches are rejected cleanly and deterministically
3. Metrics/usage/config	Backend/telemetry agent	`chatd.go`, `chatloop/metrics.go`, optional config plumbing	Partially; coordinate with integration lead	Advisor usage is separately visible in metadata/metrics and limits are enforced
4. Frontend rendering	Frontend agent	`site/.../tools/Tool.tsx`, new `AdvisorTool.tsx`, stories	Yes after result schema stabilizes	Advisor renders as a readable card and story tests pass
5. Dogfood + QA evidence	QA agent	dev server, Storybook, dogfood output	After backend + UI are usable	Repro videos, screenshots, and a concise QA report exist

Parallelization rules

Do not split coderd/x/chatd/chatd.go across multiple execution agents without an integration lead. That file owns prompt building, tool registration, model resolution, and cost persistence.
Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
Any agent that needs to re-check Mux behavior should clone coder/mux into a temporary directory (for example, $(mktemp -d)/mux) and inspect it read-only; do not vendor or copy code from Mux directly.

Phase 0 — Preflight and guardrails

Goals

Align the team on the smallest shippable architecture.
Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
Decide upfront what is MVP vs. follow-up.

Tasks

Confirm the MVP boundary.
- Ship a built-in advisor tool first.
- Do not make MCP, dynamic tools, or sub-agents the primary implementation.
- Do not add transient streaming phases in the first backend PR unless they fall out almost for free.
Confirm local workflow hygiene before coding.
- Ensure the repo is using the project git hooks from scripts/githooks.
- Do not bypass hooks with --no-verify.
- Use ./scripts/develop.sh for the full dev server rather than manual build/run commands.
Lock the model-selection policy.
- Recommended MVP: advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
- Follow-up only if required: add a separate AdvisorModelConfigID-style override that resolves through the existing configCache/model-config path. Do not invent a new free-form provider:model parser if chatd already stores provider/model separately.
Lock the persistence policy.
- Recommended MVP: no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
- Only if product/billing explicitly asks for queryable advisor cost: add a later DB migration or usage table, following the normal queries/*.sql + make gen workflow.
Create an execution ADR note in the work item or tracking doc.
- Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.

Quality gate

Everyone on the team can state the same answers to these questions:
- Is advisor a built-in tool? Yes.
- Can advisor run with action tools in the same batch? No.
- Does advisor get tools of its own? No.
- Is a DB migration required for MVP? No, unless billing insists.

Phase 1 — Build the advisor runtime and tool wrapper

Goals

Create the core advisor implementation in a way that is easy to test and keeps chattool/ thin.

Files to add

coderd/x/chatd/chatadvisor/types.go
coderd/x/chatd/chatadvisor/guidance.go
coderd/x/chatd/chatadvisor/handoff.go
coderd/x/chatd/chatadvisor/runtime.go
coderd/x/chatd/chatadvisor/runner.go
coderd/x/chatd/chattool/advisor.go

Responsibilities by file

types.go

Define the input/result schema used by the tool and UI.
Keep the result shape close to Mux so the UI and model both have predictable cases.
Recommended result variants:
- advice
- limit_reached
- error

Recommended shape:

type AdvisorArgs struct {
    Question string `json:"question"`
}

type AdvisorResult struct {
    Type          string              `json:"type"`
    Advice        string              `json:"advice,omitempty"`
    Error         string              `json:"error,omitempty"`
    AdvisorModel  string              `json:"advisor_model,omitempty"`
    RemainingUses int                 `json:"remaining_uses,omitempty"`
    Usage         *AdvisorUsageResult `json:"usage,omitempty"`
}

guidance.go
- Hold two strings:
  - the nested advisor system prompt;
  - the parent-agent guidance block to inject into the outer system prompt.
- The nested advisor prompt must say, in plain language:
  - you are advising the parent agent;
  - you do not address the end user directly;
  - you do not claim actions happened;
  - you return concise strategic guidance and tradeoffs.
runtime.go
- Define the per-run runtime state.
- Recommended fields:
  - resolved model + model config;
  - provider keys/options reused from the outer chat;
  - MaxUsesPerRun;
  - MaxOutputTokens;
  - atomic/current call counter;
  - callback(s) to obtain the current prompt snapshot and current-step snapshot;
  - optional metrics/usage hook.
- Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
handoff.go
- Build the advisor handoff message from:
  - the explicit question;
  - the exact prompt/messages the parent model just used;
  - the current step's text/reasoning snapshot, if available;
  - the most recent relevant tool outputs, if they are already in the prompt snapshot.
- Important: use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
- Apply hard truncation budgets with recent-context bias.
runner.go
- Execute the nested advisor call.
- Recommended implementation: call chatloop.Run() in an in-memory, one-step mode:
  - Tools: nil
  - ProviderTools: nil
  - MaxSteps: 1
  - PersistStep: capture the assistant output in memory instead of writing DB rows
- Reuse the existing provider/model/cost path instead of building a second provider runner.
- Assert that no tool definitions are passed to the nested call.
chattool/advisor.go
- Keep this file thin and consistent with other built-ins.
- Responsibilities:
  - decode AdvisorArgs;
  - validate Question is non-empty and bounded;
  - call the chatadvisor runner;
  - return a structured tool response.

Defensive programming requirements

Assert Question is non-empty after trimming.
Assert runtime limits are positive.
Assert the nested advisor call runs with zero tools/provider tools.
Assert AdvisorResult.Type is one of the known variants before returning.
Assert remaining uses never goes negative.

Acceptance criteria

A unit test can call the advisor tool with a fake model and receive a stable advice result.
The nested advisor call is impossible to run with tools accidentally attached.
The core logic lives in chatadvisor/, not embedded inside chatd.go.

Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

Goals

Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.

Files to modify

coderd/x/chatd/chatd.go
optionally a small helper file if chatd.go becomes too crowded

Tasks

Compute one eligibility boolean in processChat().
Recommended inputs:
- server-level advisor enabled flag;
- root chat only (chat.ParentChatID == uuid.Nil or equivalent existing root/child check);
- a usable resolved model/provider exists;
- optional experiment/workspace/org gate if product wants staged rollout.
Create the runtime once per outer chat run.
- Use the model/config/keys resolved by resolveChatModel().
- Reuse provider options from the current chat's ChatModelCallConfig.
- Set MaxUsesPerRun and MaxOutputTokens from advisor config defaults.
Register the tool in the built-in tool block.
- Insert after the skill tools and before MCP tools in processChat().
- Record builtinToolNames["advisor"] = true so metrics stay bounded.
Inject advisor guidance into the outer system prompt using the same boolean.
- Use chatprompt.InsertSystem() in the same prompt assembly path that already injects user/system instructions.
- Place the block near the existing instruction insertion, before plan-path/skill context blocks.
- Wrap the guidance in an explicit tag like <advisor-guidance> so it is easy to spot in tests and future refactors.
Keep advisor out of child chats for the first release.
- That avoids recursion/cost blowups with spawn_agent / wait_agent flows.
- Document this explicitly in the rollout notes and tests.

Acceptance criteria

If advisor is disabled, neither the tool nor the prompt guidance appears.
If advisor is enabled, both the tool and the prompt guidance appear.
Root chats can use advisor; child chats cannot.
Built-in tool names include advisor so metrics do not collapse it into the generic mcp label.

Phase 3 — Enforce planning-only execution policy in `chatloop`

Goals

Prevent the model from calling advisor and action tools in the same execution batch.

Files to modify

coderd/x/chatd/chatloop/chatloop.go
related chatloop tests

Recommended implementation

Keep the MVP small; do not build a general policy engine yet.

Add a minimal field to chatloop.RunOptions, for example:
```
ExclusiveToolName *string
```
In Run() / executeTools(), detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.
When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.
- advisor should receive a clear error like: advisor must be called by itself before action tools.
- The sibling action tools should receive a paired policy error like: this tool was skipped because advisor must run alone.
Let the outer model see those tool errors and retry cleanly.
- This is simpler and safer than partial execution or hidden deferral.
- It preserves deterministic transcript history for debugging.
Pass the just-finished step snapshot into the tool execution context.
- The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.

Why this is the right fit

It matches the intended semantics: advisor is consulted before taking action.
It avoids subtle race conditions caused by concurrent built-in tool execution.
It keeps the behavior easy to test with fake models.

Acceptance criteria

A model-emitted batch containing only advisor succeeds.
A model-emitted batch containing advisor plus any other locally executed tool returns deterministic policy errors and executes nothing.
Non-advisor tool execution stays unchanged for normal chats.

Phase 4 — Usage limits, metrics, and configuration

Goals

Make advisor safe to operate without over-designing billing/storage in the first release.

Files to modify

coderd/x/chatd/chatd.go
coderd/x/chatd/chatloop/metrics.go as needed
coderd/x/chatd/chatd.go Config struct and constructor path
optional follow-up config/db files only if a separate advisor model or persistent billing is required

Tasks

Add explicit server config knobs for MVP.
Recommended fields on chatd.Config or a nested advisor config struct:
- AdvisorEnabled bool
- AdvisorMaxUsesPerRun int
- AdvisorMaxOutputTokens int64
Track usage per outer run.
- Reset the counter for each processChat() invocation.
- Return remaining_uses in the tool result.
- Return limit_reached when the cap is exhausted.
Expose advisor usage metadata in the tool result.
- Include model name and token/cost summary if available.
- Use the same callConfig.Cost calculation path as the outer chat for MVP if advisor reuses the same model.
Record server-side metrics.
- Count advisor invocations, failures, and latency.
- Ensure they show up under the built-in tool label advisor.
Optional decision gate: separate advisor model.
- If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same configCache path.
- Keep that out of the first landing PR unless it is required for acceptance.
Optional decision gate: queryable advisor cost.
- If this becomes required, spin a follow-up DB task:
  - update coderd/database/queries/*.sql;
  - add migration files;
  - run make gen;
  - update audit mappings if a new auditable type/field is introduced.

Acceptance criteria

Advisor calls are capped per outer run.
Limit exhaustion is user-visible in the tool result.
Metrics distinguish advisor calls from other built-in tools.
MVP does not require a schema migration unless explicitly approved.

Phase 5 — Frontend rendering and Storybook coverage

Goals

Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.

Files to modify

site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx
new site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx
Storybook story file(s) in the same tools directory

Delivery strategy

Intermediate milestone during backend bring-up: rely on the existing generic tool renderer if needed.
- This is acceptable only as a short-lived integration checkpoint.
Release milestone: add a dedicated lightweight AdvisorTool renderer.
- Reuse existing primitives:
  - ToolCollapsible
  - ToolIcon
  - Response for markdown/prose rendering
  - ScrollArea if the advice can be long
- Keep styling light and consistent with the Agents page.
- Do not add unnecessary React memoization in site/src/pages/AgentsPage/; that area is already React-Compiler aware.
Render the structured result states cleanly.
- advice — readable prose/markdown with optional metadata footer.
- limit_reached — warning-style message.
- error — error state with visible fallback text.
- running — existing tool loading state/spinner is enough for MVP.
Add Storybook coverage instead of ad-hoc component tests.
Recommended stories:
- successful advice;
- running/loading;
- limit reached;
- error.
Keep the UI contract narrow.
- Prefer one text field like advice plus small metadata rather than a deeply nested schema.
- That keeps the UI resilient to prompt iteration.

Acceptance criteria

The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
Running, limit, and error states are visibly distinct.
Storybook stories and play assertions cover the new states.
Existing tool rendering flows remain unchanged.

Phase 6 — Automated tests and validation gates

Backend tests to add

Advisor runtime/tool tests
- question validation;
- tool-less nested execution assertion;
- success result shaping;
- limit-reached result shaping;
- error result shaping.
Prompt/gating tests in chatd
- advisor disabled ⇒ no tool, no guidance;
- advisor enabled/root chat ⇒ tool + guidance;
- child chat ⇒ advisor absent.
Chatloop policy tests
- advisor alone runs;
- advisor + action tool mixed batch returns deterministic policy errors;
- non-advisor tools still execute normally.
Usage/metrics tests
- per-run cap resets correctly;
- builtin tool labeling includes advisor;
- returned metadata includes model/usage summary when available.

Frontend tests to add

Storybook play() assertions for the advisor renderer states.
Verify expand/collapse behavior and visible fallback text.
Verify the message timeline still renders adjacent tools correctly.

Recommended command sequence

Run these as the implementation matures, not only at the end:

Backend-focused gate after phases 1–4:
- make test RUN=TestAdvisor
- make test RUN=TestChatloopAdvisor
- make lint
Frontend-focused gate after phase 5:
- pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx
- pnpm lint
- pnpm format
Final repo gate before handoff:
- make pre-commit
- run any additional targeted make test RUN=... selections covering touched chatd paths

Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.

Dogfooding plan

Principle

Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and agent-browser skills, the reviewer should get watchable repro videos plus screenshots that make the behavior obvious without reading logs.

Required setup

Start the full dev environment with:
- ./scripts/develop.sh
If the frontend renderer changes, also start Storybook from site/ with:
- pnpm storybook --no-open
Use agent-browser directly — never npx agent-browser.
Use named browser sessions and an output folder such as:
- ./dogfood-output/advisor/
- with subfolders screenshots/ and videos/

Evidence protocol

For every interactive scenario below:

Start video recording before the action.
Capture step-by-step screenshots at human pace.
Capture one annotated screenshot of the final state.
Stop the recording.
Note the exact pass/fail observation in the QA report.

For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.

Dogfood scenarios

Scenario A — Happy path in the real Agents UI

Goal: prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.

Steps:

Open the Agents page with an advisor-enabled root chat.
Start a repro video.
Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
Capture screenshots of:
- the prompt before send;
- the running advisor state;
- the completed advisor card and the assistant’s follow-up response.
Stop recording.

Pass criteria:

advisor appears in the timeline;
the rendered result is readable;
the assistant can continue after consuming the advisor output.

Scenario B — Advisor unavailable path

Goal: prove the feature is truly gated.

Suggested variants (at least one is required, both are better):

feature flag/config off;
child/sub-agent chat.

Evidence:

annotated screenshot of the chat/tool state showing advisor is absent;
short video if toggling the gate live is part of the repro.

Pass criteria:

no advisor tool is available;
no advisor-specific prompt behavior leaks through.

Scenario C — UI states in Storybook

Goal: prove the renderer handles non-happy states cleanly.

Required story states:

success/advice;
running;
limit reached;
error.

Evidence:

one screenshot per state;
at least one short video showing collapse/expand behavior.

Pass criteria:

success renders readable advice;
limit/error have visible fallback text;
the component behaves like the other tool cards.

Scenario D — Regression sweep of nearby tools

Goal: ensure advisor does not break the surrounding chat timeline.

Check at minimum:

another existing built-in tool still renders correctly near advisor;
sub-agent/tool cards still expand/collapse normally;
no obvious console errors appear in the Agents page during the advisor flow.

Evidence:

screenshots of adjacent tool cards;
console/error capture if anything suspicious appears.

`agent-browser` usage notes for the QA agent

Prefer agent-browser batch for 2+ sequential commands when no intermediate parsing is needed.
Use snapshot -i to discover interactive refs.
Re-snapshot after navigation or major DOM changes.
Avoid wait --load networkidle unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
Record videos at human pace and include pauses that a reviewer can follow.

Rollout plan

Initial rollout

Gate behind a server-side advisor-enabled flag.
Enable only for selected internal/root agent chats first.
Watch metrics for:
- invocation count;
- failure rate;
- latency;
- obvious retry loops.

Expansion conditions

Expand beyond the initial rollout only after the following are true:

mixed-batch policy behavior is stable;
cost impact is understood;
frontend UX is readable in production-like dogfood;
no recursion surprises have appeared with sub-agent flows.

Explicit non-goals for the first release

advisor inside child/sub-agent chats;
provider-agnostic streaming phase UI;
MCP-based external advisor implementation;
mandatory DB-backed advisor cost reporting.

Final acceptance checklist

Suggested PR split

PR 1 — Backend foundation
- chatadvisor/ package
- chattool/advisor.go
- chatloop exclusive policy
- chatd gating/prompt sync
- backend tests
PR 2 — Frontend + QA
- advisor renderer
- stories/play assertions
- dogfood artifacts and QA notes
PR 3 — Optional follow-ups only if demanded by stakeholders
- separate advisor model override
- persistent advisor billing/queryability
- transient phase-stream UX

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max

ThomasK33 · 2026-04-22T13:44:52Z

feat: add admin-configurable advisor API, SDK, and queries #24621 : 2 dependent PRs (#24622 , #24624 )
feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper #24620 👈 (View in Graphite)
feat(coderd/x/chatd/chatloop): add exclusive tool execution policy #24619
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

ThomasK33 · 2026-04-23T06:42:29Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43ac7dd9d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ThomasK33 · 2026-04-23T07:29:21Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ae71a0474

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ThomasK33 · 2026-04-23T09:51:11Z

@codex review

ThomasK33 · 2026-04-23T09:51:12Z

/coder-agents-review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cdbb474f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

coder-agents-review

Clean package structure, well-scoped to the stated PR 2 boundary. The separation from chattool, the use of chatloop.Run in one-step mode, and the regression tests for the provider-options cloning bug are all solid. Six new files, each with a clear single responsibility.

Severity count: 8 P2, 14 P3, 5 Nit.

The P2 findings cluster around three themes: (1) the handoff truncation strategy produces non-contiguous context with invisible gaps, (2) the advisorResultMap/jsonToolResponse chain duplicates what json.Marshal already does, justified by a false cyclic-import claim, and (3) transient failures permanently consume advisor budget slots. The orphaned-stored-responses finding (DEREM-9) is a real provider-side consequence of inheriting Store: true without clearing it.

Test coverage is strong for the happy path but thin for error variants through the tool layer and for the truncation boundary behavior. The truncation test (DEREM-10) accepts 90% message loss without catching it.

Process note: the PR description claims five defensive assertions, but two do not exist in the code ("known result variants" is structural, not asserted; "non-negative remaining uses" is arithmetic clamping, not an assertion), and one asserts the value of its own literal (the panic guards at runner.go:59). The final state is correct, but the safety inventory in the description overstates the runtime checks.

"The backward walk skips any message whose JSON size exceeds remainingChars but keeps searching older, smaller messages. This is bin-packing, not recency selection." (Hisoka)

🤖 This review was automatically generated with Coder Agents.

ThomasK33 · 2026-04-27T09:42:13Z

@codex review

ThomasK33 · 2026-04-27T09:42:14Z

/coder-agents-review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c059b0b903

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T09:48:34Z

+		return AdvisorResult{}, xerrors.New("advisor question is required")
+	}
+
+	if !rt.tryAcquire() {


Charge advisor quota only after successful nested run

RunAdvisor consumes one use via tryAcquire() before the nested model call executes. When chatloop.Run fails or returns no text, the function emits an error result but still burns quota, so a transient provider failure can immediately force later attempts into limit_reached (for example, MaxUsesPerRun=1 means no retry is possible). This also conflicts with the runtime contract that usage increments on successful calls.

Useful? React with 👍 / 👎.

coder-agents-review

Rebase only (zero diff in chatadvisor/ between R10 and R11). Netero and all 4 panel reviewers confirm: no open findings. Tests pass clean.

This is the fifth consecutive clean round (R7, R8 code-clean, R10, R11). Across 11 rounds, 33 findings raised, all resolved. The package is stable.

🤖 This review was automatically generated with Coder Agents.

ThomasK33 · 2026-04-30T12:08:40Z

Merge activity

Apr 30, 12:08 PM UTC: A user started a stack merge that includes this pull request via Graphite.
Apr 30, 12:10 PM UTC: Graphite rebased this pull request as part of a merge.
Apr 30, 12:22 PM UTC: @ThomasK33 merged this pull request with Graphite.

Adds a new chatadvisor package that wraps chatloop.Run in a single-step, tool-less nested model call for strategic guidance during a chat turn. - types.go: AdvisorArgs, AdvisorResult, and result type constants - guidance.go: nested system prompt + parent guidance block - runtime.go: RuntimeConfig + Runtime with atomic usage counter - handoff.go: BuildAdvisorMessages with truncation - runner.go: RunAdvisor performs the nested tool-less call The runtime asserts no tools are passed to the nested chatloop.Run invocation, which protects against accidental recursion into action tools. --- _Generated with `mux`_

Thin fantasy.AgentTool wrapper over chatadvisor.Runtime. Lives in the chatadvisor package (not chattool) so the advisor can depend on chatloop without a cycle: chattool transitively depends on chatprompt, which depends on chattool for attachment helpers; keeping the advisor tool inside chatadvisor sidesteps that loop. Validates the question is non-empty after trim and bounded to 2000 runes, then serializes the AdvisorResult as a JSON tool response. --- _Generated with `mux`_

… options maps.Clone only copies the ProviderOptions map shell, so pointer entries like *fantasyopenai.ResponsesProviderOptions are shared with the parent chat run. chatloop.Run mutates that struct via clearPreviousResponseID on chain-mode exit, so an advisor invocation could clear the parent loop's PreviousResponseID mid-turn. Add cloneProviderOptions that deep-copies the known in-place mutated entry (OpenAI Responses) and falls through for other types. Cover the behavior with a test that verifies the parent's PreviousResponseID survives an advisor run. Change-Id: Ibe7645cad29f21f748945b0bde87b6d5c4227f7e Signed-off-by: Thomas Kosiewski <tk@coder.com>

…nvocation The Runtime-scoped clone introduced in the prior fix is still mutated by chatloop.Run on chain-mode exit. With a long-lived Runtime, that means the first RunAdvisor call observes the parent's PreviousResponseID, but subsequent calls silently run without it, producing inconsistent behavior. Clone rt.cfg.ProviderOptions per invocation inside RunAdvisor and strip chain-mode markers (PreviousResponseID) before the nested call. The advisor passes full history from BuildAdvisorMessages, so chain mode would be incorrect for it regardless of parent state. Change-Id: I02939df5a9570eb95c0dd3908ea3d07c03a0d173 Signed-off-by: Thomas Kosiewski <tk@coder.com>

…sor runtime and tool - Stop advisor truncation walk at oversized messages so the forwarded window stays contiguous; previously a skipped oversize left a gap where later messages could reference missing context. - Also force Store=false on reset provider options so nested advisor calls do not leave orphaned stored responses on the provider. - Marshal AdvisorResult directly instead of mirroring its fields into a map; introduce a named ResultType so variants are compile-checked. - Trim RunAdvisor to a single public-API check (empty question) and drop vacuous validations/panics that NewRuntime already guarantees. - Tighten truncation, error, limit-reached, and empty-output tests; add tool-layer coverage for error and limit-reached serialization. - Drop the 'stronger model' claim from the tool description and align error wording, doc comments, and helper names with what is actually measured or enforced. Change-Id: Ia6fca65f48cad93d29f21ccc7ab460fcfd83348e Signed-off-by: Thomas Kosiewski <tk@coder.com>

…sor budget messageJSONByteCount and advisorConversationJSONByteBudget are named and documented in bytes, but the implementation called utf8.RuneCount, which counts Unicode code points. For multibyte content the rune count underestimates the actual JSON byte size, letting the forwarded snapshot exceed the intended budget. Change-Id: Ia080b2fe4f20825d57743c0642f8cb9c442fb858 Signed-off-by: Thomas Kosiewski <tk@coder.com>

…-prompt override and orphan tool blocks Place AdvisorSystemPrompt after inherited parent system messages so the advisor contract is the final system directive the model sees; otherwise, later parent instructions telling the model to address the end user or invoke tools would override the advisor-only contract. Drop tool-role messages whose originating assistant tool_call was truncated out of the recent window. The backward walk picks up tool results before their assistant tool_call, so a byte-budget cutoff landing between them leaves the tool_result orphaned; providers reject prompts with tool_result blocks that have no matching tool_use block. Change-Id: Id050cd8e4fb833dc413964ee11477abad5658af3 Signed-off-by: Thomas Kosiewski <tk@coder.com>

…advisor calls DEREM-31: TestAdvisorRunStripsChainStateAndIsConsistentAcrossCalls snapshotted only PreviousResponseID, so a regression that removed the Store=false assignment in resetProviderOptionsForNestedCall would have passed. Extend the observation struct to capture Store alongside PreviousResponseID and assert it is explicitly non-nil and false on every nested call. Change-Id: I008f0b24cf42b2db038d5267f6348b2756e0ae8f Signed-off-by: Thomas Kosiewski <tk@coder.com>

…isor handoff Previously, BuildAdvisorMessages forwarded all parent system messages to the advisor without a byte budget while the 12KB cap only applied to non-system messages. On chats with large parent system prompts, the nested advisor call could exceed the model's context window and surface as a provider error instead of advice. Apply a matching JSON byte budget to inherited system messages. Skip any oversized message instead of dropping the entire system preamble so smaller foundational instructions still reach the advisor. Change-Id: I324cd3a56dfec29e8279ff82d3f9134c86e13468 Signed-off-by: Thomas Kosiewski <tk@coder.com>

… truncating Walk inherited system messages newest-to-oldest when consuming the advisor system byte budget, then restore original order before appending. Previously the forward walk consumed the budget with older entries and silently dropped newer directives once the budget was exhausted, which conflicts with the nearby assumption that later system directives win. In large prompts this could cause the advisor run to miss the most recent constraints (e.g., newly injected safety or user-instruction blocks) and return guidance based on stale policy. Change-Id: I84d6f97ad4df6e9595c9bf8e34ea6f3901b9441c Signed-off-by: Thomas Kosiewski <tk@coder.com>

…ider entry Move the storeDisabled declaration inside the loop so each matching ResponsesProviderOptions entry gets its own pointer. Sharing the pointer across iterations was benign today (only one OpenAI entry, nothing writes through it afterward), but coupled future mutations together for no runtime benefit. Change-Id: I33e28c538d1bf9d92536d8f0f92b53e9cffa2c5b Signed-off-by: Thomas Kosiewski <tk@coder.com>

Each Runtime instance is scoped to a single outer chat run. Document that the MaxUsesPerRun counter never resets and that callers must construct a fresh Runtime per run, so readers do not assume the type is long-lived or has an internal reset. Change-Id: Ifa8a0fd7e1eabd4b2c52e6d7c9df7137bc49e507 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Thomas Kosiewski <tk@coder.com>

Previously, RunAdvisor consumed a use via tryAcquire before the nested chatloop call ran. A transient provider failure or an empty model response would still burn quota, so callers configured with MaxUsesPerRun=1 had no chance to retry and the runtime contract ("increments on every successful advisor call") drifted from the code. Add Runtime.release and refund the use on both error paths so the quota is consumed only by calls that returned advice. Change-Id: Iafe7884b5f49c0efc663317ecc831af609ed93d9 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Thomas Kosiewski <tk@coder.com>

github-actions Bot assigned ThomasK33 Apr 22, 2026

ThomasK33 changed the title ~~feat(coderd/x/chatd/chatadvisor): add advisor runtime package~~ feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper Apr 22, 2026

ThomasK33 mentioned this pull request Apr 22, 2026

feat(site/src): add advisor admin settings UI #24624

Merged

10 tasks

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from b553eb7 to c421906 Compare April 22, 2026 21:58

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from bd3e0bc to 2ae36ac Compare April 22, 2026 21:58

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from c421906 to 6d3c573 Compare April 23, 2026 06:32

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 2ae36ac to 43ac7dd Compare April 23, 2026 06:32

chatgpt-codex-connector Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread coderd/x/chatd/chatadvisor/runtime.go Outdated

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 6d3c573 to ddfd162 Compare April 23, 2026 07:02

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 43ac7dd to c9441c8 Compare April 23, 2026 07:02

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from ddfd162 to 51dbd80 Compare April 23, 2026 07:04

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from c9441c8 to d387ec4 Compare April 23, 2026 07:04

chatgpt-codex-connector Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread coderd/x/chatd/chatadvisor/runner.go Outdated

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 51dbd80 to 2f6a4e5 Compare April 23, 2026 07:42

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 2ae71a0 to bdf25bb Compare April 23, 2026 07:42

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 2f6a4e5 to 41a1c64 Compare April 23, 2026 09:30

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from bdf25bb to 822da65 Compare April 23, 2026 09:30

chatgpt-codex-connector Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread coderd/x/chatd/chatadvisor/handoff.go Outdated

coder-agents-review Bot reviewed Apr 23, 2026

View reviewed changes

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 41a1c64 to e17e45b Compare April 23, 2026 12:19

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 7cdbb47 to a9271fa Compare April 23, 2026 12:19

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 1d49276 to 4c251ef Compare April 27, 2026 09:37

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

coder-agents-review Bot reviewed Apr 27, 2026

View reviewed changes

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 4c251ef to f4bf5c6 Compare April 27, 2026 10:48

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from c059b0b to 6f4932b Compare April 27, 2026 10:48

ibetitsmike approved these changes Apr 28, 2026

View reviewed changes

ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from f4bf5c6 to ea22b45 Compare April 29, 2026 20:32

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 6f4932b to 6d4e314 Compare April 29, 2026 20:32

ThomasK33 changed the base branch from feat/advisor-01-chatloop-exclusive-policy to graphite-base/24620 April 30, 2026 12:08

ThomasK33 changed the base branch from graphite-base/24620 to main April 30, 2026 12:09

ThomasK33 and others added 13 commits April 30, 2026 12:09

ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 6d4e314 to 816b8a3 Compare April 30, 2026 12:09

ThomasK33 merged commit eaf2609 into main Apr 30, 2026
27 checks passed

ThomasK33 deleted the feat/advisor-02-chatadvisor-pkg branch April 30, 2026 12:22

github-actions Bot locked and limited conversation to collaborators Apr 30, 2026

Conversation

ThomasK33 commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Changes

Stack context

Scope / non-goals

Validation

Plan: Add a Mux-style advisor tool to coder agents/chatd

Outcome

Design decisions to lock before coding

Delivery model

Detailed workstreams

Repo evidence used for this plan

Workstream map and ownership

Parallelization rules

Phase 0 — Preflight and guardrails

Goals

Tasks

Quality gate

Phase 1 — Build the advisor runtime and tool wrapper

Goals

Files to add

Responsibilities by file

Defensive programming requirements

Acceptance criteria

Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

Goals

Files to modify

Tasks

Acceptance criteria

Phase 3 — Enforce planning-only execution policy in chatloop

Goals

Files to modify

Recommended implementation

Why this is the right fit

Acceptance criteria

Phase 4 — Usage limits, metrics, and configuration

Goals

Files to modify

Tasks

Acceptance criteria

Phase 5 — Frontend rendering and Storybook coverage

Goals

Files to modify

Delivery strategy

Acceptance criteria

Phase 6 — Automated tests and validation gates

Backend tests to add

Frontend tests to add

Recommended command sequence

Dogfooding plan

Principle

Required setup

Evidence protocol

Dogfood scenarios

Scenario A — Happy path in the real Agents UI

Scenario B — Advisor unavailable path

Scenario C — UI states in Storybook

Scenario D — Regression sweep of nearby tools

agent-browser usage notes for the QA agent

Rollout plan

Initial rollout

Expansion conditions

Explicit non-goals for the first release

Final acceptance checklist

Suggested PR split

Uh oh!

ThomasK33 commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ThomasK33 commented Apr 23, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ThomasK33 commented Apr 23, 2026

ThomasK33 commented Apr 22, 2026 •

edited

Loading

Phase 3 — Enforce planning-only execution policy in `chatloop`

`agent-browser` usage notes for the QA agent

ThomasK33 commented Apr 22, 2026 •

edited

Loading

ThomasK33 commented Apr 30, 2026 •

edited

Loading