Skip to content

feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper#24620

Merged
ThomasK33 merged 13 commits into
mainfrom
feat/advisor-02-chatadvisor-pkg
Apr 30, 2026
Merged

feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper#24620
ThomasK33 merged 13 commits into
mainfrom
feat/advisor-02-chatadvisor-pkg

Conversation

@ThomasK33
Copy link
Copy Markdown
Member

@ThomasK33 ThomasK33 commented Apr 22, 2026

Summary

Introduce the coderd/x/chatd/chatadvisor package: the self-contained runtime that performs a nested, tool-less, single-step model call to return strategic guidance to the parent agent. Also ships the thin AgentTool wrapper that the agent exposes as the advisor built-in.

Motivation

The advisor is a "consult before you act" planning step. Keeping its runtime in its own package (rather than inside chatd.go or chattool/) makes the behavior easy to unit-test, keeps chattool/ thin, and avoids import cycles with chatprompt.

Changes

  • chatadvisor/types.go: AdvisorArgs and AdvisorResult (advice / limit_reached / error variants) plus usage metadata. Stable JSON shape for the UI.
  • chatadvisor/guidance.go: the nested advisor system prompt and ParentGuidanceBlock injected into the outer agent (tagged <advisor-guidance>). The nested prompt explicitly tells the advisor it is advising the parent agent, must not address the user, and must not claim actions happened.
  • chatadvisor/handoff.go: builds the advisor input from the already-prepared outer prompt tail (not a fresh DB reload) with hard truncation budgets and a recent-context bias so the advisor sees the exact context the outer model saw.
  • chatadvisor/runner.go: wraps chatloop.Run() in strict one-step mode (Tools: nil, ProviderTools: nil, MaxSteps: 1) so the nested call is structurally incapable of calling tools.
  • chatadvisor/tool.go: the AgentTool wrapper. Validates the question (non-empty, bounded to 2000 runes), invokes the runtime, and maps the result to a JSON tool response.
  • Defensive assertions throughout (question non-empty after trim, positive limits, zero tools on nested call, known result variants, non-negative remaining uses).
  • Unit tests cover question validation, tool-less execution, and each result variant.

Stack context

This is PR 2 of 6 in the advisor feature stack. It depends on PR 1's ExclusiveToolNames hook; no consumer of this package exists yet (that lands in PR 4).

Scope / non-goals

  • No wiring into chatd.
  • No HTTP/API surface.
  • No UI.
  • No separate provider; the runtime reuses the outer chat's resolved model/provider/keys.

Validation

  • go test ./coderd/x/chatd/chatadvisor/...
  • make lint

📋 Implementation Plan (shared across the advisor stack)

Plan: Add a Mux-style advisor tool to coder agents/chatd

Outcome

Add a first-class advisor tool to agent chats in coderd/x/chatd that feels native to Coder:

  • it is a built-in server-side tool, not an MCP/dynamic-tool workaround;
  • it performs a nested tool-less model call for strategic advice;
  • it is exposed only when eligible, and the prompt mentions it only when it is actually available;
  • it is treated as a planning-only tool so it does not run alongside action tools in the same batch;
  • it tracks usage/cost separately enough for operators to reason about it;
  • it has a minimally polished UI in the Agents page;
  • and it ships with explicit dogfooding evidence, including screenshots and repro videos.

Design decisions to lock before coding

  1. Primary architecture: native built-in tool in chattool/, backed by a small chatadvisor package.
  2. Nested model execution: reuse chatd's existing model/provider stack for a one-step, tool-less advisor call rather than inventing a new provider pathway.
  3. Execution policy: treat advisor as an exclusive/planning-only tool; mixed batches must return structured policy errors and force the model to retry cleanly.
  4. Availability: initial rollout is for root agent chats only; disable for child/sub-agent chats until recursion/cost policy is proven.
  5. Prompt sync: use one eligibility boolean to drive both tool registration and advisor guidance injection.
  6. Persistence/cost split: MVP should keep advisor usage visible in result metadata and server metrics; only add DB schema if product/billing explicitly needs queryable advisor-specific cost.
  7. UI scope: generic tool rendering is an acceptable temporary milestone during backend bring-up, but the release candidate should include a dedicated lightweight advisor renderer.

Delivery model

The work should be executed as coordinated workstreams with one integration owner and parallel contributors for low-conflict areas. The integration owner should own coderd/x/chatd/chatd.go because prompt assembly, tool registration, and model resolution all converge there.

Detailed workstreams

Repo evidence used for this plan

Mux reference and current chatd seams

Mux reference implementation

  • src/node/services/tools/advisor.ts — native advisor tool implementation.
  • src/common/constants/advisor.ts — advisor prompt/constants and truncation policy.
  • src/common/utils/tools/tools.ts — conditional tool registration.
  • src/node/services/streamContextBuilder.ts — injects advisor guidance only when the tool is available.

Current chatd seams

  • coderd/x/chatd/chatd.go
    • processChat() — tool assembly, prompt assembly, and chatloop invocation.
    • resolveChatModel() — current model/provider/key resolution seam.
    • type Config struct — server-level chatd configuration surface.
  • coderd/x/chatd/chatloop/chatloop.go
    • Run() — main streaming/model loop.
    • executeTools() — built-in tool execution/batching seam.
  • coderd/x/chatd/chattool/ — built-in tool implementations.
  • site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx — tool renderer dispatch.
  • site/src/pages/AgentsPage/components/ChatConversation/messageParsing.ts and ConversationTimeline.tsx — tool/result merge and rendering flow.

Workstream map and ownership

Workstream Primary owner Main files Can run in parallel? Done when
0. Integration + gating Integration lead coderd/x/chatd/chatd.go No; central merge lane Tool registration, prompt sync, and model selection are wired together
1. Advisor runtime + tool Backend agent new coderd/x/chatd/chatadvisor/, new coderd/x/chatd/chattool/advisor.go Yes Tool can perform a tool-less advisor call in memory and return structured results
2. Planning-only execution policy Chatloop agent coderd/x/chatd/chatloop/chatloop.go, related tests Yes Mixed advisor + action-tool batches are rejected cleanly and deterministically
3. Metrics/usage/config Backend/telemetry agent chatd.go, chatloop/metrics.go, optional config plumbing Partially; coordinate with integration lead Advisor usage is separately visible in metadata/metrics and limits are enforced
4. Frontend rendering Frontend agent site/.../tools/Tool.tsx, new AdvisorTool.tsx, stories Yes after result schema stabilizes Advisor renders as a readable card and story tests pass
5. Dogfood + QA evidence QA agent dev server, Storybook, dogfood output After backend + UI are usable Repro videos, screenshots, and a concise QA report exist

Parallelization rules

  • Do not split coderd/x/chatd/chatd.go across multiple execution agents without an integration lead. That file owns prompt building, tool registration, model resolution, and cost persistence.
  • Workstreams 1 and 2 can be developed in parallel and then stacked onto the integration branch.
  • Workstream 4 should begin once the backend result schema is agreed on, even if the backend is still behind a feature flag.
  • Any agent that needs to re-check Mux behavior should clone coder/mux into a temporary directory (for example, $(mktemp -d)/mux) and inspect it read-only; do not vendor or copy code from Mux directly.

Phase 0 — Preflight and guardrails

Goals

  • Align the team on the smallest shippable architecture.
  • Prevent scope creep into MCP/dynamic-tool/sub-agent variants.
  • Decide upfront what is MVP vs. follow-up.

Tasks

  1. Confirm the MVP boundary.

    • Ship a built-in advisor tool first.
    • Do not make MCP, dynamic tools, or sub-agents the primary implementation.
    • Do not add transient streaming phases in the first backend PR unless they fall out almost for free.
  2. Confirm local workflow hygiene before coding.

    • Ensure the repo is using the project git hooks from scripts/githooks.
    • Do not bypass hooks with --no-verify.
    • Use ./scripts/develop.sh for the full dev server rather than manual build/run commands.
  3. Lock the model-selection policy.

    • Recommended MVP: advisor uses the same resolved provider/model/cost config as the current chat, with advisor-specific max-output and usage caps.
    • Follow-up only if required: add a separate AdvisorModelConfigID-style override that resolves through the existing configCache/model-config path. Do not invent a new free-form provider:model parser if chatd already stores provider/model separately.
  4. Lock the persistence policy.

    • Recommended MVP: no DB migration. Persist advisor-visible metadata in the tool result and record separate metrics in memory/Prometheus.
    • Only if product/billing explicitly asks for queryable advisor cost: add a later DB migration or usage table, following the normal queries/*.sql + make gen workflow.
  5. Create an execution ADR note in the work item or tracking doc.

    • Capture: built-in tool, tool-less nested call, root-chat-only rollout, exclusive execution policy, MVP no-DB-migration default.

Quality gate

  • Everyone on the team can state the same answers to these questions:
    • Is advisor a built-in tool? Yes.
    • Can advisor run with action tools in the same batch? No.
    • Does advisor get tools of its own? No.
    • Is a DB migration required for MVP? No, unless billing insists.

Phase 1 — Build the advisor runtime and tool wrapper

Goals

Create the core advisor implementation in a way that is easy to test and keeps chattool/ thin.

Files to add

  • coderd/x/chatd/chatadvisor/types.go
  • coderd/x/chatd/chatadvisor/guidance.go
  • coderd/x/chatd/chatadvisor/handoff.go
  • coderd/x/chatd/chatadvisor/runtime.go
  • coderd/x/chatd/chatadvisor/runner.go
  • coderd/x/chatd/chattool/advisor.go

Responsibilities by file

  1. types.go

    • Define the input/result schema used by the tool and UI.
    • Keep the result shape close to Mux so the UI and model both have predictable cases.
    • Recommended result variants:
      • advice
      • limit_reached
      • error

    Recommended shape:

    type AdvisorArgs struct {
        Question string `json:"question"`
    }
    
    type AdvisorResult struct {
        Type          string              `json:"type"`
        Advice        string              `json:"advice,omitempty"`
        Error         string              `json:"error,omitempty"`
        AdvisorModel  string              `json:"advisor_model,omitempty"`
        RemainingUses int                 `json:"remaining_uses,omitempty"`
        Usage         *AdvisorUsageResult `json:"usage,omitempty"`
    }
  2. guidance.go

    • Hold two strings:
      • the nested advisor system prompt;
      • the parent-agent guidance block to inject into the outer system prompt.
    • The nested advisor prompt must say, in plain language:
      • you are advising the parent agent;
      • you do not address the end user directly;
      • you do not claim actions happened;
      • you return concise strategic guidance and tradeoffs.
  3. runtime.go

    • Define the per-run runtime state.
    • Recommended fields:
      • resolved model + model config;
      • provider keys/options reused from the outer chat;
      • MaxUsesPerRun;
      • MaxOutputTokens;
      • atomic/current call counter;
      • callback(s) to obtain the current prompt snapshot and current-step snapshot;
      • optional metrics/usage hook.
    • Add fail-fast validation for impossible config: nil model, non-positive limits, empty prompt builders, etc.
  4. handoff.go

    • Build the advisor handoff message from:
      • the explicit question;
      • the exact prompt/messages the parent model just used;
      • the current step's text/reasoning snapshot, if available;
      • the most recent relevant tool outputs, if they are already in the prompt snapshot.
    • Important: use the already-prepared outer prompt tail, not a fresh DB reload. That keeps the advisor aligned with compaction and the exact context the outer model saw.
    • Apply hard truncation budgets with recent-context bias.
  5. runner.go

    • Execute the nested advisor call.
    • Recommended implementation: call chatloop.Run() in an in-memory, one-step mode:
      • Tools: nil
      • ProviderTools: nil
      • MaxSteps: 1
      • PersistStep: capture the assistant output in memory instead of writing DB rows
    • Reuse the existing provider/model/cost path instead of building a second provider runner.
    • Assert that no tool definitions are passed to the nested call.
  6. chattool/advisor.go

    • Keep this file thin and consistent with other built-ins.
    • Responsibilities:
      • decode AdvisorArgs;
      • validate Question is non-empty and bounded;
      • call the chatadvisor runner;
      • return a structured tool response.

Defensive programming requirements

  • Assert Question is non-empty after trimming.
  • Assert runtime limits are positive.
  • Assert the nested advisor call runs with zero tools/provider tools.
  • Assert AdvisorResult.Type is one of the known variants before returning.
  • Assert remaining uses never goes negative.

Acceptance criteria

  • A unit test can call the advisor tool with a fake model and receive a stable advice result.
  • The nested advisor call is impossible to run with tools accidentally attached.
  • The core logic lives in chatadvisor/, not embedded inside chatd.go.

Phase 2 — Wire advisor into chatd and keep prompt/tool availability in sync

Goals

Register the tool in the right place, expose it only when eligible, and inject system guidance only when the tool is present.

Files to modify

  • coderd/x/chatd/chatd.go
  • optionally a small helper file if chatd.go becomes too crowded

Tasks

  1. Compute one eligibility boolean in processChat().
    Recommended inputs:

    • server-level advisor enabled flag;
    • root chat only (chat.ParentChatID == uuid.Nil or equivalent existing root/child check);
    • a usable resolved model/provider exists;
    • optional experiment/workspace/org gate if product wants staged rollout.
  2. Create the runtime once per outer chat run.

    • Use the model/config/keys resolved by resolveChatModel().
    • Reuse provider options from the current chat's ChatModelCallConfig.
    • Set MaxUsesPerRun and MaxOutputTokens from advisor config defaults.
  3. Register the tool in the built-in tool block.

    • Insert after the skill tools and before MCP tools in processChat().
    • Record builtinToolNames["advisor"] = true so metrics stay bounded.
  4. Inject advisor guidance into the outer system prompt using the same boolean.

    • Use chatprompt.InsertSystem() in the same prompt assembly path that already injects user/system instructions.
    • Place the block near the existing instruction insertion, before plan-path/skill context blocks.
    • Wrap the guidance in an explicit tag like <advisor-guidance> so it is easy to spot in tests and future refactors.
  5. Keep advisor out of child chats for the first release.

    • That avoids recursion/cost blowups with spawn_agent / wait_agent flows.
    • Document this explicitly in the rollout notes and tests.

Acceptance criteria

  • If advisor is disabled, neither the tool nor the prompt guidance appears.
  • If advisor is enabled, both the tool and the prompt guidance appear.
  • Root chats can use advisor; child chats cannot.
  • Built-in tool names include advisor so metrics do not collapse it into the generic mcp label.

Phase 3 — Enforce planning-only execution policy in chatloop

Goals

Prevent the model from calling advisor and action tools in the same execution batch.

Files to modify

  • coderd/x/chatd/chatloop/chatloop.go
  • related chatloop tests

Recommended implementation

Keep the MVP small; do not build a general policy engine yet.

  1. Add a minimal field to chatloop.RunOptions, for example:

    ExclusiveToolName *string
  2. In Run() / executeTools(), detect the case where the exclusive tool appears in the same local-tool batch as any other locally executed tool.

  3. When that happens, synthesize structured tool-result errors for the affected calls instead of executing anything in the batch.

    • advisor should receive a clear error like: advisor must be called by itself before action tools.
    • The sibling action tools should receive a paired policy error like: this tool was skipped because advisor must run alone.
  4. Let the outer model see those tool errors and retry cleanly.

    • This is simpler and safer than partial execution or hidden deferral.
    • It preserves deterministic transcript history for debugging.
  5. Pass the just-finished step snapshot into the tool execution context.

    • The advisor runtime should be able to see the current step's text/reasoning content, because that is often the best hint about what the outer model is trying to decide.

Why this is the right fit

  • It matches the intended semantics: advisor is consulted before taking action.
  • It avoids subtle race conditions caused by concurrent built-in tool execution.
  • It keeps the behavior easy to test with fake models.

Acceptance criteria

  • A model-emitted batch containing only advisor succeeds.
  • A model-emitted batch containing advisor plus any other locally executed tool returns deterministic policy errors and executes nothing.
  • Non-advisor tool execution stays unchanged for normal chats.

Phase 4 — Usage limits, metrics, and configuration

Goals

Make advisor safe to operate without over-designing billing/storage in the first release.

Files to modify

  • coderd/x/chatd/chatd.go
  • coderd/x/chatd/chatloop/metrics.go as needed
  • coderd/x/chatd/chatd.go Config struct and constructor path
  • optional follow-up config/db files only if a separate advisor model or persistent billing is required

Tasks

  1. Add explicit server config knobs for MVP.
    Recommended fields on chatd.Config or a nested advisor config struct:

    • AdvisorEnabled bool
    • AdvisorMaxUsesPerRun int
    • AdvisorMaxOutputTokens int64
  2. Track usage per outer run.

    • Reset the counter for each processChat() invocation.
    • Return remaining_uses in the tool result.
    • Return limit_reached when the cap is exhausted.
  3. Expose advisor usage metadata in the tool result.

    • Include model name and token/cost summary if available.
    • Use the same callConfig.Cost calculation path as the outer chat for MVP if advisor reuses the same model.
  4. Record server-side metrics.

    • Count advisor invocations, failures, and latency.
    • Ensure they show up under the built-in tool label advisor.
  5. Optional decision gate: separate advisor model.

    • If product insists on a stronger/different advisor model, add a follow-up config hook that resolves another existing chat model config through the same configCache path.
    • Keep that out of the first landing PR unless it is required for acceptance.
  6. Optional decision gate: queryable advisor cost.

    • If this becomes required, spin a follow-up DB task:
      • update coderd/database/queries/*.sql;
      • add migration files;
      • run make gen;
      • update audit mappings if a new auditable type/field is introduced.

Acceptance criteria

  • Advisor calls are capped per outer run.
  • Limit exhaustion is user-visible in the tool result.
  • Metrics distinguish advisor calls from other built-in tools.
  • MVP does not require a schema migration unless explicitly approved.

Phase 5 — Frontend rendering and Storybook coverage

Goals

Make advisor feel intentional in the Agents UI without blocking the backend on fancy streaming UI.

Files to modify

  • site/src/pages/AgentsPage/components/ChatElements/tools/Tool.tsx
  • new site/src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.tsx
  • Storybook story file(s) in the same tools directory

Delivery strategy

  1. Intermediate milestone during backend bring-up: rely on the existing generic tool renderer if needed.

    • This is acceptable only as a short-lived integration checkpoint.
  2. Release milestone: add a dedicated lightweight AdvisorTool renderer.

    • Reuse existing primitives:
      • ToolCollapsible
      • ToolIcon
      • Response for markdown/prose rendering
      • ScrollArea if the advice can be long
    • Keep styling light and consistent with the Agents page.
    • Do not add unnecessary React memoization in site/src/pages/AgentsPage/; that area is already React-Compiler aware.
  3. Render the structured result states cleanly.

    • advice — readable prose/markdown with optional metadata footer.
    • limit_reached — warning-style message.
    • error — error state with visible fallback text.
    • running — existing tool loading state/spinner is enough for MVP.
  4. Add Storybook coverage instead of ad-hoc component tests.
    Recommended stories:

    • successful advice;
    • running/loading;
    • limit reached;
    • error.
  5. Keep the UI contract narrow.

    • Prefer one text field like advice plus small metadata rather than a deeply nested schema.
    • That keeps the UI resilient to prompt iteration.

Acceptance criteria

  • The advisor tool card renders readable content rather than raw quoted JSON in the final release branch.
  • Running, limit, and error states are visibly distinct.
  • Storybook stories and play assertions cover the new states.
  • Existing tool rendering flows remain unchanged.

Phase 6 — Automated tests and validation gates

Backend tests to add

  1. Advisor runtime/tool tests

    • question validation;
    • tool-less nested execution assertion;
    • success result shaping;
    • limit-reached result shaping;
    • error result shaping.
  2. Prompt/gating tests in chatd

    • advisor disabled ⇒ no tool, no guidance;
    • advisor enabled/root chat ⇒ tool + guidance;
    • child chat ⇒ advisor absent.
  3. Chatloop policy tests

    • advisor alone runs;
    • advisor + action tool mixed batch returns deterministic policy errors;
    • non-advisor tools still execute normally.
  4. Usage/metrics tests

    • per-run cap resets correctly;
    • builtin tool labeling includes advisor;
    • returned metadata includes model/usage summary when available.

Frontend tests to add

  • Storybook play() assertions for the advisor renderer states.
  • Verify expand/collapse behavior and visible fallback text.
  • Verify the message timeline still renders adjacent tools correctly.

Recommended command sequence

Run these as the implementation matures, not only at the end:

  1. Backend-focused gate after phases 1–4:

    • make test RUN=TestAdvisor
    • make test RUN=TestChatloopAdvisor
    • make lint
  2. Frontend-focused gate after phase 5:

    • pnpm test:storybook src/pages/AgentsPage/components/ChatElements/tools/AdvisorTool.stories.tsx
    • pnpm lint
    • pnpm format
  3. Final repo gate before handoff:

    • make pre-commit
    • run any additional targeted make test RUN=... selections covering touched chatd paths

Use the exact new test names the implementing agents create; the names above are recommended anchors, not existing tests.

Dogfooding plan

Principle

Dogfood the change as a real agent feature, not just a unit-tested backend. Per the dogfood and agent-browser skills, the reviewer should get watchable repro videos plus screenshots that make the behavior obvious without reading logs.

Required setup

  1. Start the full dev environment with:
    • ./scripts/develop.sh
  2. If the frontend renderer changes, also start Storybook from site/ with:
    • pnpm storybook --no-open
  3. Use agent-browser directly — never npx agent-browser.
  4. Use named browser sessions and an output folder such as:
    • ./dogfood-output/advisor/
    • with subfolders screenshots/ and videos/

Evidence protocol

For every interactive scenario below:

  1. Start video recording before the action.
  2. Capture step-by-step screenshots at human pace.
  3. Capture one annotated screenshot of the final state.
  4. Stop the recording.
  5. Note the exact pass/fail observation in the QA report.

For static UI states (for example Storybook error/limit cards), an annotated screenshot is sufficient; video is optional but still encouraged by this project’s review preference.

Dogfood scenarios

Scenario A — Happy path in the real Agents UI

Goal: prove that a root agent chat can invoke advisor and produce a readable recommendation before taking further action.

Steps:

  1. Open the Agents page with an advisor-enabled root chat.
  2. Start a repro video.
  3. Send a prompt that should reasonably trigger strategic planning, such as an architecture or multi-tradeoff question.
  4. Capture screenshots of:
    • the prompt before send;
    • the running advisor state;
    • the completed advisor card and the assistant’s follow-up response.
  5. Stop recording.

Pass criteria:

  • advisor appears in the timeline;
  • the rendered result is readable;
  • the assistant can continue after consuming the advisor output.

Scenario B — Advisor unavailable path

Goal: prove the feature is truly gated.

Suggested variants (at least one is required, both are better):

  • feature flag/config off;
  • child/sub-agent chat.

Evidence:

  • annotated screenshot of the chat/tool state showing advisor is absent;
  • short video if toggling the gate live is part of the repro.

Pass criteria:

  • no advisor tool is available;
  • no advisor-specific prompt behavior leaks through.

Scenario C — UI states in Storybook

Goal: prove the renderer handles non-happy states cleanly.

Required story states:

  • success/advice;
  • running;
  • limit reached;
  • error.

Evidence:

  • one screenshot per state;
  • at least one short video showing collapse/expand behavior.

Pass criteria:

  • success renders readable advice;
  • limit/error have visible fallback text;
  • the component behaves like the other tool cards.

Scenario D — Regression sweep of nearby tools

Goal: ensure advisor does not break the surrounding chat timeline.

Check at minimum:

  • another existing built-in tool still renders correctly near advisor;
  • sub-agent/tool cards still expand/collapse normally;
  • no obvious console errors appear in the Agents page during the advisor flow.

Evidence:

  • screenshots of adjacent tool cards;
  • console/error capture if anything suspicious appears.

agent-browser usage notes for the QA agent

  • Prefer agent-browser batch for 2+ sequential commands when no intermediate parsing is needed.
  • Use snapshot -i to discover interactive refs.
  • Re-snapshot after navigation or major DOM changes.
  • Avoid wait --load networkidle unless the page is known to go idle; prefer explicit element/text waits or short fixed waits.
  • Record videos at human pace and include pauses that a reviewer can follow.

Rollout plan

Initial rollout

  • Gate behind a server-side advisor-enabled flag.
  • Enable only for selected internal/root agent chats first.
  • Watch metrics for:
    • invocation count;
    • failure rate;
    • latency;
    • obvious retry loops.

Expansion conditions

Expand beyond the initial rollout only after the following are true:

  • mixed-batch policy behavior is stable;
  • cost impact is understood;
  • frontend UX is readable in production-like dogfood;
  • no recursion surprises have appeared with sub-agent flows.

Explicit non-goals for the first release

  • advisor inside child/sub-agent chats;
  • provider-agnostic streaming phase UI;
  • MCP-based external advisor implementation;
  • mandatory DB-backed advisor cost reporting.

Final acceptance checklist

  • advisor is a built-in chatd tool, not an MCP/dynamic-tool substitute.
  • The nested advisor call is tool-less and bounded to one in-memory step.
  • One eligibility boolean controls both tool registration and prompt guidance injection.
  • Root chats can use advisor; child chats cannot in the initial rollout.
  • Mixed advisor/action batches produce deterministic policy errors instead of partial execution.
  • Per-run usage caps and limit-reached behavior work.
  • Advisor usage is visible in metadata/metrics without forcing a DB migration for MVP.
  • The Agents UI has a readable advisor card and Storybook coverage.
  • Dogfooding produced screenshots and repro videos for the required scenarios.
  • Validation commands (make lint, targeted make test, Storybook tests, make pre-commit) passed before handoff.

Suggested PR split

  1. PR 1 — Backend foundation

    • chatadvisor/ package
    • chattool/advisor.go
    • chatloop exclusive policy
    • chatd gating/prompt sync
    • backend tests
  2. PR 2 — Frontend + QA

    • advisor renderer
    • stories/play assertions
    • dogfood artifacts and QA notes
  3. PR 3 — Optional follow-ups only if demanded by stakeholders

    • separate advisor model override
    • persistent advisor billing/queryability
    • transient phase-stream UX

Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: max

Copy link
Copy Markdown
Member Author

ThomasK33 commented Apr 22, 2026

@ThomasK33 ThomasK33 changed the title feat(coderd/x/chatd/chatadvisor): add advisor runtime package feat(coderd/x/chatd/chatadvisor): add advisor runtime and tool wrapper Apr 22, 2026
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from b553eb7 to c421906 Compare April 22, 2026 21:58
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from bd3e0bc to 2ae36ac Compare April 22, 2026 21:58
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from c421906 to 6d3c573 Compare April 23, 2026 06:32
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 2ae36ac to 43ac7dd Compare April 23, 2026 06:32
@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 43ac7dd9d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatadvisor/runtime.go Outdated
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 6d3c573 to ddfd162 Compare April 23, 2026 07:02
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 43ac7dd to c9441c8 Compare April 23, 2026 07:02
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from ddfd162 to 51dbd80 Compare April 23, 2026 07:04
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from c9441c8 to d387ec4 Compare April 23, 2026 07:04
@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ae71a0474

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatadvisor/runner.go Outdated
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 51dbd80 to 2f6a4e5 Compare April 23, 2026 07:42
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 2ae71a0 to bdf25bb Compare April 23, 2026 07:42
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 2f6a4e5 to 41a1c64 Compare April 23, 2026 09:30
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from bdf25bb to 822da65 Compare April 23, 2026 09:30
@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cdbb474f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread coderd/x/chatd/chatadvisor/handoff.go Outdated
Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean package structure, well-scoped to the stated PR 2 boundary. The separation from chattool, the use of chatloop.Run in one-step mode, and the regression tests for the provider-options cloning bug are all solid. Six new files, each with a clear single responsibility.

Severity count: 8 P2, 14 P3, 5 Nit.

The P2 findings cluster around three themes: (1) the handoff truncation strategy produces non-contiguous context with invisible gaps, (2) the advisorResultMap/jsonToolResponse chain duplicates what json.Marshal already does, justified by a false cyclic-import claim, and (3) transient failures permanently consume advisor budget slots. The orphaned-stored-responses finding (DEREM-9) is a real provider-side consequence of inheriting Store: true without clearing it.

Test coverage is strong for the happy path but thin for error variants through the tool layer and for the truncation boundary behavior. The truncation test (DEREM-10) accepts 90% message loss without catching it.

Process note: the PR description claims five defensive assertions, but two do not exist in the code ("known result variants" is structural, not asserted; "non-negative remaining uses" is arithmetic clamping, not an assertion), and one asserts the value of its own literal (the panic guards at runner.go:59). The final state is correct, but the safety inventory in the description overstates the runtime checks.

"The backward walk skips any message whose JSON size exceeds remainingChars but keeps searching older, smaller messages. This is bin-packing, not recency selection." (Hisoka)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatadvisor/handoff.go Outdated
Comment thread coderd/x/chatd/chatadvisor/tool.go Outdated
Comment thread coderd/x/chatd/chatadvisor/tool.go Outdated
Comment thread coderd/x/chatd/chatadvisor/runner.go
Comment thread coderd/x/chatd/chatadvisor/runner.go
Comment thread coderd/x/chatd/chatadvisor/tool.go Outdated
Comment thread coderd/x/chatd/chatadvisor/handoff.go Outdated
Comment thread coderd/x/chatd/chatadvisor/runner_test.go
Comment thread coderd/x/chatd/chatadvisor/runtime.go Outdated
Comment thread coderd/x/chatd/chatadvisor/runner_test.go
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 41a1c64 to e17e45b Compare April 23, 2026 12:19
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 7cdbb47 to a9271fa Compare April 23, 2026 12:19
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 1d49276 to 4c251ef Compare April 27, 2026 09:37
@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

@ThomasK33
Copy link
Copy Markdown
Member Author

/coder-agents-review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c059b0b903

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return AdvisorResult{}, xerrors.New("advisor question is required")
}

if !rt.tryAcquire() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Charge advisor quota only after successful nested run

RunAdvisor consumes one use via tryAcquire() before the nested model call executes. When chatloop.Run fails or returns no text, the function emits an error result but still burns quota, so a transient provider failure can immediately force later attempts into limit_reached (for example, MaxUsesPerRun=1 means no retry is possible). This also conflicts with the runtime contract that usage increments on successful calls.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase only (zero diff in chatadvisor/ between R10 and R11). Netero and all 4 panel reviewers confirm: no open findings. Tests pass clean.

This is the fifth consecutive clean round (R7, R8 code-clean, R10, R11). Across 11 rounds, 33 findings raised, all resolved. The package is stable.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from 4c251ef to f4bf5c6 Compare April 27, 2026 10:48
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from c059b0b to 6f4932b Compare April 27, 2026 10:48
@ThomasK33 ThomasK33 force-pushed the feat/advisor-01-chatloop-exclusive-policy branch from f4bf5c6 to ea22b45 Compare April 29, 2026 20:32
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 6f4932b to 6d4e314 Compare April 29, 2026 20:32
Copy link
Copy Markdown
Member Author

ThomasK33 commented Apr 30, 2026

Merge activity

  • Apr 30, 12:08 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Apr 30, 12:10 PM UTC: Graphite rebased this pull request as part of a merge.
  • Apr 30, 12:22 PM UTC: @ThomasK33 merged this pull request with Graphite.

@ThomasK33 ThomasK33 changed the base branch from feat/advisor-01-chatloop-exclusive-policy to graphite-base/24620 April 30, 2026 12:08
@ThomasK33 ThomasK33 changed the base branch from graphite-base/24620 to main April 30, 2026 12:09
ThomasK33 and others added 13 commits April 30, 2026 12:09
Adds a new chatadvisor package that wraps chatloop.Run in a
single-step, tool-less nested model call for strategic guidance
during a chat turn.

- types.go: AdvisorArgs, AdvisorResult, and result type constants
- guidance.go: nested system prompt + parent guidance block
- runtime.go: RuntimeConfig + Runtime with atomic usage counter
- handoff.go: BuildAdvisorMessages with truncation
- runner.go: RunAdvisor performs the nested tool-less call

The runtime asserts no tools are passed to the nested
chatloop.Run invocation, which protects against accidental
recursion into action tools.

---
_Generated with `mux`_
Thin fantasy.AgentTool wrapper over chatadvisor.Runtime. Lives in
the chatadvisor package (not chattool) so the advisor can depend
on chatloop without a cycle: chattool transitively depends on
chatprompt, which depends on chattool for attachment helpers;
keeping the advisor tool inside chatadvisor sidesteps that loop.

Validates the question is non-empty after trim and bounded to
2000 runes, then serializes the AdvisorResult as a JSON tool
response.

---
_Generated with `mux`_
… options

maps.Clone only copies the ProviderOptions map shell, so pointer
entries like *fantasyopenai.ResponsesProviderOptions are shared with
the parent chat run. chatloop.Run mutates that struct via
clearPreviousResponseID on chain-mode exit, so an advisor invocation
could clear the parent loop's PreviousResponseID mid-turn.

Add cloneProviderOptions that deep-copies the known in-place mutated
entry (OpenAI Responses) and falls through for other types. Cover the
behavior with a test that verifies the parent's PreviousResponseID
survives an advisor run.

Change-Id: Ibe7645cad29f21f748945b0bde87b6d5c4227f7e
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…nvocation

The Runtime-scoped clone introduced in the prior fix is still mutated by
chatloop.Run on chain-mode exit. With a long-lived Runtime, that means
the first RunAdvisor call observes the parent's PreviousResponseID, but
subsequent calls silently run without it, producing inconsistent behavior.

Clone rt.cfg.ProviderOptions per invocation inside RunAdvisor and strip
chain-mode markers (PreviousResponseID) before the nested call. The
advisor passes full history from BuildAdvisorMessages, so chain mode
would be incorrect for it regardless of parent state.

Change-Id: I02939df5a9570eb95c0dd3908ea3d07c03a0d173
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…sor runtime and tool

- Stop advisor truncation walk at oversized messages so the forwarded
  window stays contiguous; previously a skipped oversize left a gap
  where later messages could reference missing context.
- Also force Store=false on reset provider options so nested advisor
  calls do not leave orphaned stored responses on the provider.
- Marshal AdvisorResult directly instead of mirroring its fields into
  a map; introduce a named ResultType so variants are compile-checked.
- Trim RunAdvisor to a single public-API check (empty question) and
  drop vacuous validations/panics that NewRuntime already guarantees.
- Tighten truncation, error, limit-reached, and empty-output tests;
  add tool-layer coverage for error and limit-reached serialization.
- Drop the 'stronger model' claim from the tool description and align
  error wording, doc comments, and helper names with what is actually
  measured or enforced.

Change-Id: Ia6fca65f48cad93d29f21ccc7ab460fcfd83348e
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…sor budget

messageJSONByteCount and advisorConversationJSONByteBudget are named and
documented in bytes, but the implementation called utf8.RuneCount, which
counts Unicode code points. For multibyte content the rune count
underestimates the actual JSON byte size, letting the forwarded snapshot
exceed the intended budget.

Change-Id: Ia080b2fe4f20825d57743c0642f8cb9c442fb858
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…-prompt override and orphan tool blocks

Place AdvisorSystemPrompt after inherited parent system messages so
the advisor contract is the final system directive the model sees;
otherwise, later parent instructions telling the model to address the
end user or invoke tools would override the advisor-only contract.

Drop tool-role messages whose originating assistant tool_call was
truncated out of the recent window. The backward walk picks up tool
results before their assistant tool_call, so a byte-budget cutoff
landing between them leaves the tool_result orphaned; providers
reject prompts with tool_result blocks that have no matching
tool_use block.

Change-Id: Id050cd8e4fb833dc413964ee11477abad5658af3
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…advisor calls

DEREM-31: TestAdvisorRunStripsChainStateAndIsConsistentAcrossCalls
snapshotted only PreviousResponseID, so a regression that removed the
Store=false assignment in resetProviderOptionsForNestedCall would have
passed. Extend the observation struct to capture Store alongside
PreviousResponseID and assert it is explicitly non-nil and false on
every nested call.

Change-Id: I008f0b24cf42b2db038d5267f6348b2756e0ae8f
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…isor handoff

Previously, BuildAdvisorMessages forwarded all parent system messages to
the advisor without a byte budget while the 12KB cap only applied to
non-system messages. On chats with large parent system prompts, the
nested advisor call could exceed the model's context window and surface
as a provider error instead of advice.

Apply a matching JSON byte budget to inherited system messages. Skip any
oversized message instead of dropping the entire system preamble so
smaller foundational instructions still reach the advisor.

Change-Id: I324cd3a56dfec29e8279ff82d3f9134c86e13468
Signed-off-by: Thomas Kosiewski <tk@coder.com>
… truncating

Walk inherited system messages newest-to-oldest when consuming the
advisor system byte budget, then restore original order before
appending. Previously the forward walk consumed the budget with older
entries and silently dropped newer directives once the budget was
exhausted, which conflicts with the nearby assumption that later
system directives win. In large prompts this could cause the advisor
run to miss the most recent constraints (e.g., newly injected safety
or user-instruction blocks) and return guidance based on stale policy.

Change-Id: I84d6f97ad4df6e9595c9bf8e34ea6f3901b9441c
Signed-off-by: Thomas Kosiewski <tk@coder.com>
…ider entry

Move the storeDisabled declaration inside the loop so each matching
ResponsesProviderOptions entry gets its own pointer. Sharing the pointer
across iterations was benign today (only one OpenAI entry, nothing
writes through it afterward), but coupled future mutations together for
no runtime benefit.

Change-Id: I33e28c538d1bf9d92536d8f0f92b53e9cffa2c5b
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Each Runtime instance is scoped to a single outer chat run. Document
that the MaxUsesPerRun counter never resets and that callers must
construct a fresh Runtime per run, so readers do not assume the type
is long-lived or has an internal reset.

Change-Id: Ifa8a0fd7e1eabd4b2c52e6d7c9df7137bc49e507
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
Previously, RunAdvisor consumed a use via tryAcquire before the nested
chatloop call ran. A transient provider failure or an empty model
response would still burn quota, so callers configured with
MaxUsesPerRun=1 had no chance to retry and the runtime contract
("increments on every successful advisor call") drifted from the code.

Add Runtime.release and refund the use on both error paths so the
quota is consumed only by calls that returned advice.

Change-Id: Iafe7884b5f49c0efc663317ecc831af609ed93d9
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Thomas Kosiewski <tk@coder.com>
@ThomasK33 ThomasK33 force-pushed the feat/advisor-02-chatadvisor-pkg branch from 6d4e314 to 816b8a3 Compare April 30, 2026 12:09
@ThomasK33 ThomasK33 merged commit eaf2609 into main Apr 30, 2026
27 checks passed
@ThomasK33 ThomasK33 deleted the feat/advisor-02-chatadvisor-pkg branch April 30, 2026 12:22
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants