Skip to content

🤖 feat: add advisor max output tokens experiment setting#3167

Merged
ThomasK33 merged 7 commits intomainfrom
advisor-config-t80t
Apr 13, 2026
Merged

🤖 feat: add advisor max output tokens experiment setting#3167
ThomasK33 merged 7 commits intomainfrom
advisor-config-t80t

Conversation

@ThomasK33
Copy link
Copy Markdown
Member

Summary

Add an Advisor Max Output Tokens experiment setting that lets users cap the advisor tool's response length via maxOutputTokens in the generateText() call. Users can choose Unlimited (no cap, today's behavior) or Limited with a positive integer value.

Background

The advisor tool currently has no output length control — it generates responses of arbitrary length. For cost management and response-time tuning, users need a way to limit how many tokens the advisor model produces per invocation.

Implementation

Mirrors the existing advisorMaxUsesPerTurn pattern end-to-end:

  • Config contract: advisorMaxOutputTokens added to ProjectsConfig, on-disk schema, and oRPC API schemas with number | null | undefined semantics (null = explicit Unlimited, undefined = unset/legacy).
  • Runtime plumbing: aiService.ts reads the config, normalizes to number | undefined, and passes through advisorRuntime.maxOutputTokens. The advisor tool conditionally spreads maxOutputTokens into generateText().
  • Settings UI: New "Max Output Tokens" row in AdvisorToolExperimentConfig with Unlimited/Limited select + numeric input, matching the Max Uses / Turn control.
  • Router fix: Two bugs caught during dogfooding — saveConfig handler wasn't persisting the field, and getConfig handler wasn't returning it. Both fixed by adding the field to the oRPC router.

Validation

  • Typecheck, lint, and all 5 advisor tool tests pass (2 new: limited + unlimited propagation)
  • UI dogfooding in sandbox: Limited 1000 → reload → restored; Unlimited → reload → restored
  • Config file inspection confirmed "advisorMaxOutputTokens": 1000 (limited) and null (unlimited)
  • Advisor tool successfully invoked with 128-token cap (65 tokens produced)

Risks

Low — the feature is additive and gated behind the existing advisor experiment toggle. Existing configs remain valid with no migration. undefined/null both map to "do not pass maxOutputTokens", preserving today's behavior.


📋 Implementation Plan

Advisor tool max output tokens — implementation plan

Goal

Add an Advisor Max Output Tokens experiment setting that lets users choose either:

  • Unlimited (preserve today’s behavior: advisor responses are not explicitly capped), or
  • Limited with a free-text numeric value such as 1000 or 2000

Then propagate that setting through config persistence and runtime plumbing so the advisor tool passes it to the generateText(...) call in src/node/services/tools/advisor.ts.

Recommended approach

Approach A — mirror the existing advisor “Max Uses / Turn” pattern end-to-end

  • Recommendation: Yes — this should be the default implementation path.
  • Why: The repo already has the same UX and persistence pattern for advisor max uses, plus an existing generateText({ maxOutputTokens }) precedent elsewhere.
  • Net product-code LoC estimate: ~80–120 LoC across UI, schema/runtime plumbing, and advisor tool wiring.
  • Testing LoC (separate from estimate above): ~25–45 LoC.
  • Estimated lift: small feature / low-risk, usually 0.5–1 engineer day including tests and dogfooding.

Optional fallback slice — backend plumbing first, UI second

  • Recommendation: Only use this if scheduling forces a split rollout.
  • What ships first: persist + plumb advisorMaxOutputTokens into the advisor tool, but do not expose it in Settings yet.
  • Net product-code LoC estimate: ~20–35 LoC for backend-only plumbing.
  • Tradeoff: lower immediate lift, but incomplete for the user story because the setting is not discoverable/editable in the app.

Evidence already gathered

Confirmed touchpoints from repo investigation:

  • src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsx
    • already owns advisor model, reasoning level, and Max Uses / Turn UI/state/save flow.
    • contains the closest copyable pattern for Unlimited / Limited + numeric input.
  • src/common/config/schemas/appConfigOnDisk.ts
    • already persists advisorMaxUsesPerTurn as nullable optional config.
  • src/common/orpc/schemas/api.ts
    • already carries advisor config through config get/save schemas.
  • src/common/types/project.ts
    • already defines advisor-related config shape.
  • src/node/config.ts
    • already parses and serializes advisor config.
  • src/node/services/aiService.ts
    • already reads advisor config and assembles advisorRuntime for the tool layer.
  • src/common/utils/tools/tools.ts
    • defines the advisorRuntime contract used by the advisor tool.
  • src/node/services/tools/advisor.ts
    • contains the advisor tool’s generateText(...) call site.
  • src/node/services/system1/system1AgentRunner.ts
    • already demonstrates that generateText(...) accepts maxOutputTokens in this codebase.
  • Likely adjacent tests:
    • src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx
    • src/node/services/tools/advisor.test.ts

Recommended product/technical semantics

Use a new persisted field named:

  • advisorMaxOutputTokens

Recommended meaning:

  • undefined = unset / older config / default path
  • null = user explicitly selected Unlimited
  • positive integer = explicit max output token cap

Runtime interpretation

To preserve today’s behavior, treat both undefined and null as “do not pass maxOutputTokens when invoking the advisor model.

That gives us:

  • no surprise regression for existing users/configs
  • a clean explicit “Unlimited” state in the UI/config
  • future flexibility if we ever decide to add a default cap later

Validation recommendation

  • Accept positive integers only.
  • Do not add a hard-coded upper bound in v1 unless an existing shared constraint already exists.
  • If a provider rejects an excessively high value, let provider/runtime validation surface that naturally rather than guessing a provider-agnostic max in the UI.

Suggested agent breakdown

Lane 1 — Config contract + persistence agent

Owner: backend/config-focused exec agent
Depends on: none

Deliverables:

  • Add advisorMaxOutputTokens to shared config types/schemas.
  • Preserve backward compatibility for existing configs.
  • Ensure config get/save round-trip works.

Files to update:

  • src/common/types/project.ts
  • src/common/config/schemas/appConfigOnDisk.ts
  • src/common/orpc/schemas/api.ts
  • src/node/config.ts

Implementation notes:

  • Mirror the existing nullable optional advisor setting conventions.
  • Reuse existing positive-integer parsing/normalization helpers where possible.
  • Apply defensive programming: assert parsed values are either null, undefined, or a positive integer before storing/emitting.

Quality gate:

  • Run targeted config/schema tests if they already exist.
  • Confirm no type drift in the config API surface.

Lane 2 — Advisor runtime plumbing agent

Owner: backend/runtime-focused exec agent
Depends on: Lane 1 field definition merged or rebased

Deliverables:

  • Read the new config field from runtime config.
  • Extend the advisor runtime contract.
  • Pass maxOutputTokens into the advisor tool’s generateText(...) call only when limited.

Files to update:

  • src/node/services/aiService.ts
  • src/common/utils/tools/tools.ts
  • src/node/services/tools/advisor.ts

Implementation notes:

  • Prefer normalizing null/undefined before the tool call so advisor.ts only sees either:
    • undefined (unlimited / omit field), or
    • a validated positive integer.
  • Add lightweight assertions near the normalization boundary, e.g. impossible states should fail fast in development/tests.
  • Keep the change minimal: this should be a parameter pass-through, not a refactor.

Quality gate:

  • Add/extend a unit test proving:
    • limited config passes maxOutputTokens: <n> into generateText(...)
    • unlimited/unset config omits that parameter

Lane 3 — Experiments settings UI agent

Owner: frontend/settings-focused exec agent
Depends on: Lane 1 schema name locked; can work in parallel once naming/semantics are stable

Deliverables:

  • Add the new setting row to the advisor experiment config UI.
  • Keep the interaction model aligned with Max Uses / Turn.
  • Persist null for Unlimited and a positive integer for Limited.

Primary file:

  • src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsx

Likely test file:

  • src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsx

Implementation notes:

  • Follow the existing pattern already in the file:
    • mode state: "unlimited" | "limited"
    • draft string state for the input
    • last-known-valid integer
    • shared positive-integer parsing helper
    • save-on-change / debounced save flow already used for advisor settings
  • Recommended UI copy:
    • Label: Max Output Tokens
    • Helper text: Per advisor response.
  • On initial load:
    • if config is null or undefined, show Unlimited
    • if config is a number, show Limited and that value
  • On blur / invalid draft handling:
    • snap back to the last valid integer rather than persisting garbage
  • Avoid introducing a separate custom component unless the file becomes noticeably harder to read; minimal duplication is acceptable here because an existing pattern already exists.

Quality gate:

  • Add/extend UI coverage for:
    • initial Unlimited render from unset/null config
    • switching to Limited and entering 1000
    • switching back to Unlimited
    • invalid input recovery on blur
    • correct saveConfig payload shape

Lane 4 — Integration + regression test agent

Owner: test-focused exec agent
Depends on: Lanes 1–3 available locally

Deliverables:

  • Harden unit/integration coverage so the feature is safe to iterate on later.
  • Verify no regressions to the existing advisor settings save flow.

Suggested test matrix:

  • Node / advisor tool unit test
    • advisorRuntime.maxOutputTokens = 1000 -> generateText called with maxOutputTokens: 1000
    • advisorRuntime.maxOutputTokens = undefined -> maxOutputTokens omitted
  • Config persistence / round-trip
    • serialize + deserialize null
    • serialize + deserialize 2000
    • old config with missing field remains valid
  • Settings UI test
    • render + persist Unlimited
    • render + persist Limited value
    • preserve other advisor settings when only this field changes

Quality gate:

  • Run the narrowest targeted tests first, then one broader suite covering touched layers.

Execution sequence

Phase 0 — Lock semantics and file ownership

  1. Confirm the feature name is advisorMaxOutputTokens everywhere.
  2. Lock the runtime rule: omit maxOutputTokens when Unlimited/unset.
  3. Decide that v1 uses positive integer only with no hard-coded provider-agnostic max.

Exit criteria: all lanes use the same naming and runtime semantics.

Phase 1 — Land shared config contract

  1. Extend shared types and schemas.
  2. Add node config parse/serialize support.
  3. Ensure config APIs can carry the field.

Dogfood / quality gate after Phase 1:

  • Run schema/config-focused tests.
  • Manually inspect a saved config payload in tests/logs to confirm null vs number semantics are preserved.

Phase 2 — Land advisor runtime plumbing

  1. Extend advisorRuntime shape.
  2. Normalize the new field in aiService.
  3. Pass it into generateText(...) only when limited.
  4. Add a focused unit test around the advisor tool call.

Dogfood / quality gate after Phase 2:

  • Capture a terminal screenshot of the focused advisor unit test passing.
  • If easy in a local sandbox, temporarily set a very small cap and verify the advisor response becomes visibly short; otherwise rely on the unit test here and perform full UI dogfood in Phase 4.

Phase 3 — Land the experiments UI

  1. Add the new row to AdvisorToolExperimentConfig.
  2. Wire initial-load state from persisted config.
  3. Wire save payload updates.
  4. Add UI tests.

Dogfood / quality gate after Phase 3:

  • Start the app in a sandbox and capture:
    • one annotated screenshot showing the new control on the Experiments page
    • one repro video for switching Unlimited -> Limited (1000) -> save/reload
    • one repro video for switching Limited -> Unlimited -> save/reload

Phase 4 — Final regression + end-to-end verification

  1. Re-run targeted tests for config, UI, and advisor tool behavior.
  2. Run broader repo validation.
  3. Perform one end-to-end advisor exercise with the setting enabled.
  4. Package dogfood artifacts for reviewer handoff.

Exit criteria: feature works in UI, persists across reload, and reaches the generateText(...) call.

Concrete implementation checklist

1) Shared config and IPC

  • Add advisorMaxOutputTokens?: number | null to the shared project/config type.
  • Add nullable-positive-int schema support in on-disk config schema.
  • Add the field to config get/save API schemas using the repo’s .nullish() convention for tool/input-like API payloads where applicable.
  • Ensure getConfig() exposes the field and saveConfig() accepts it.

2) Node config handling

  • Parse the field from disk using the same family of helpers used for advisor numeric config today.
  • Serialize null and numbers faithfully.
  • Leave missing/legacy configs untouched.

3) Runtime plumbing

  • Read cfg.advisorMaxOutputTokens where advisor runtime values are assembled in src/node/services/aiService.ts.
  • Normalize to either undefined or a positive integer.
  • Extend ToolConfiguration["advisorRuntime"] so the tool layer can receive the value.

4) Advisor tool execution

  • Thread maxOutputTokens into src/node/services/tools/advisor.ts.
  • Keep the generate call shape minimal; do not add provider-specific branching.
  • Assert impossible values early in development/test builds.

5) Settings UI

  • Mirror the existing “Max Uses / Turn” control structure.
  • Use the same free-text numeric input pattern rather than a hard-coded preset list.
  • Ensure the setting does not clobber model/thinking/max-uses fields when saved.
  • Keep helper copy short and parallel with the existing advisor controls.

6) Tests

  • UI test for render/edit/save/reload semantics.
  • Backend unit test for generateText(...) arg propagation.
  • Config round-trip coverage for number/null/missing.

Dogfooding plan

This feature touches the settings UI and advisor execution path, so dogfooding must include both UI evidence and one runtime behavior check.

Sandbox setup

Use the repo’s sandboxed dev server flow so testing does not collide with other local mux instances.

Recommended command:

BACKEND_PORT=3001 VITE_PORT=5174 make dev-server-sandbox

Notes derived from the skill:

  • this gives the run its own temporary MUX_ROOT
  • default behavior copies provider/config state if present, which is helpful for advisor testing
  • pinning ports makes browser automation deterministic

Browser automation setup

Use agent-browser directly (not npx). Prefer named sessions and batch for simple multi-step sequences.

Suggested session name:

  • advisor-max-output-tokens

Suggested evidence directory:

  • ./dogfood-output/advisor-max-output-tokens/

UI dogfood flow (must capture screenshots + video)

  1. Open the sandboxed app with agent-browser.
  2. Navigate to Settings -> Experiments -> Advisor Tool.
  3. Take an annotated screenshot of the page showing the new control.
  4. Start a repro video before interacting.
  5. Switch to Limited, enter 1000, blur/save, and reload the settings view.
  6. Take screenshots for each major step plus the final persisted state.
  7. Stop the video.
  8. Repeat with Unlimited and verify reload persistence.
  9. Check browser console/errors after both flows.

Reviewer artifacts required:

  • annotated screenshot of the control in context
  • video: Unlimited -> Limited (1000) -> persisted after reload
  • video: Limited -> Unlimited -> persisted after reload
  • step screenshots referenced by the videos

Runtime dogfood flow (must capture screenshots + video)

  1. Set Max Output Tokens to a visibly small value such as 64 or 128.
  2. Trigger the advisor tool with a prompt that would normally produce a long answer.
  3. Record the full interaction from before submit through the advisor response.
  4. Capture screenshots of:
    • the setting value in Experiments
    • the prompt before submission
    • the resulting short advisor answer
  5. Repeat with Unlimited and verify the response is materially longer / no cap behavior resumes.

If live-provider dogfooding is blocked by credentials or cost concerns:

  • still capture the UI dogfood artifacts above, and
  • capture a terminal screenshot of the focused backend test proving generateText(...) receives maxOutputTokens when limited and omits it when Unlimited.

agent-browser usage notes to follow

  • Use snapshot -i to discover refs.
  • Re-snapshot after navigation or DOM changes.
  • Prefer wait <selector>, wait --text, or a short fixed wait over wait --load networkidle.
  • Use record start/stop for interactive repros.
  • For human-readable repro videos, pause briefly between actions and capture step screenshots.

Validation plan

Targeted validation during development

  • run focused advisor tool unit tests
  • run focused settings UI tests
  • run any existing config round-trip tests that cover advisor settings

Final validation before handoff

Run the repo-standard broader checks appropriate for touched code:

make typecheck
make test
make lint

If time and environment allow, finish with:

make static-check

Risks and mitigations

Risk: silently changing existing behavior

  • Mitigation: treat missing/unset config as Unlimited so existing users keep current behavior.

Risk: provider-specific max-token quirks

  • Mitigation: keep v1 validation to “positive integer only” and rely on provider/runtime errors rather than inventing an arbitrary shared max.

Risk: UI state drift / saving invalid drafts

  • Mitigation: copy the existing advisor numeric-input pattern, including last-valid-value recovery on blur.

Risk: accidentally clobbering other advisor settings on save

  • Mitigation: extend the existing advisor save payload rather than introducing a separate save path.

Acceptance criteria

Functional

  • Experiments settings show a new Max Output Tokens control for the advisor tool.
  • Users can choose Unlimited or Limited.
  • Limited mode accepts a positive integer typed as free text.
  • The selected value persists across reloads.
  • Unlimited mode persists explicitly and restores correctly.
  • The advisor tool passes maxOutputTokens into generateText(...) only when Limited.

Non-functional

  • Existing configs remain valid with no migration required.
  • Existing advisor settings behavior is unchanged except for the new option.
  • No new console/runtime errors in dogfood flows.
  • Focused tests cover UI persistence and backend propagation.

Handoff notes for the implementation team

  • Keep the implementation intentionally boring: this should look like a sibling of the existing advisor max-uses feature, not a new subsystem.
  • Favor small stacked commits/PR slices in the order above so regressions are easy to isolate.
  • Preserve defensive programming practices: validate/normalize once, assert impossible states, and avoid hiding bad assumptions.
  • When handing back for review, include the dogfood screenshot/video artifacts alongside the test results so reviewers can verify the flow without re-running it.

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $26.93

@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ThomasK33
Copy link
Copy Markdown
Member Author

@codex review

Fixed the 2 failing advisor config UI tests — the new Max Output Tokens combobox confused the existing test queries. Updated chooseSelectOption to scope within the correct row container.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ThomasK33 ThomasK33 added this pull request to the merge queue Apr 13, 2026
Merged via the queue into main with commit 7a2d18f Apr 13, 2026
24 checks passed
@ThomasK33 ThomasK33 deleted the advisor-config-t80t branch April 13, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant