🤖 feat: add advisor max output tokens experiment setting#3167
🤖 feat: add advisor max output tokens experiment setting#3167
Conversation
|
@codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
@codex review Fixed the 2 failing advisor config UI tests — the new Max Output Tokens combobox confused the existing test queries. Updated |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Add an Advisor Max Output Tokens experiment setting that lets users cap the advisor tool's response length via
maxOutputTokensin thegenerateText()call. Users can choose Unlimited (no cap, today's behavior) or Limited with a positive integer value.Background
The advisor tool currently has no output length control — it generates responses of arbitrary length. For cost management and response-time tuning, users need a way to limit how many tokens the advisor model produces per invocation.
Implementation
Mirrors the existing
advisorMaxUsesPerTurnpattern end-to-end:advisorMaxOutputTokensadded toProjectsConfig, on-disk schema, and oRPC API schemas withnumber | null | undefinedsemantics (null = explicit Unlimited, undefined = unset/legacy).aiService.tsreads the config, normalizes tonumber | undefined, and passes throughadvisorRuntime.maxOutputTokens. The advisor tool conditionally spreadsmaxOutputTokensintogenerateText().AdvisorToolExperimentConfigwith Unlimited/Limited select + numeric input, matching the Max Uses / Turn control.saveConfighandler wasn't persisting the field, andgetConfighandler wasn't returning it. Both fixed by adding the field to the oRPC router.Validation
"advisorMaxOutputTokens": 1000(limited) andnull(unlimited)Risks
Low — the feature is additive and gated behind the existing advisor experiment toggle. Existing configs remain valid with no migration.
undefined/nullboth map to "do not passmaxOutputTokens", preserving today's behavior.📋 Implementation Plan
Advisor tool max output tokens — implementation plan
Goal
Add an Advisor Max Output Tokens experiment setting that lets users choose either:
1000or2000Then propagate that setting through config persistence and runtime plumbing so the advisor tool passes it to the
generateText(...)call insrc/node/services/tools/advisor.ts.Recommended approach
Approach A — mirror the existing advisor “Max Uses / Turn” pattern end-to-end
generateText({ maxOutputTokens })precedent elsewhere.Optional fallback slice — backend plumbing first, UI second
advisorMaxOutputTokensinto the advisor tool, but do not expose it in Settings yet.Evidence already gathered
Confirmed touchpoints from repo investigation:
src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsxsrc/common/config/schemas/appConfigOnDisk.tsadvisorMaxUsesPerTurnas nullable optional config.src/common/orpc/schemas/api.tssrc/common/types/project.tssrc/node/config.tssrc/node/services/aiService.tsadvisorRuntimefor the tool layer.src/common/utils/tools/tools.tsadvisorRuntimecontract used by the advisor tool.src/node/services/tools/advisor.tsgenerateText(...)call site.src/node/services/system1/system1AgentRunner.tsgenerateText(...)acceptsmaxOutputTokensin this codebase.src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsxsrc/node/services/tools/advisor.test.tsRecommended product/technical semantics
Use a new persisted field named:
advisorMaxOutputTokensRecommended meaning:
undefined= unset / older config / default pathnull= user explicitly selected UnlimitedRuntime interpretation
To preserve today’s behavior, treat both
undefinedandnullas “do not passmaxOutputTokens” when invoking the advisor model.That gives us:
Validation recommendation
Suggested agent breakdown
Lane 1 — Config contract + persistence agent
Owner: backend/config-focused exec agent
Depends on: none
Deliverables:
advisorMaxOutputTokensto shared config types/schemas.Files to update:
src/common/types/project.tssrc/common/config/schemas/appConfigOnDisk.tssrc/common/orpc/schemas/api.tssrc/node/config.tsImplementation notes:
null,undefined, or a positive integer before storing/emitting.Quality gate:
Lane 2 — Advisor runtime plumbing agent
Owner: backend/runtime-focused exec agent
Depends on: Lane 1 field definition merged or rebased
Deliverables:
maxOutputTokensinto the advisor tool’sgenerateText(...)call only when limited.Files to update:
src/node/services/aiService.tssrc/common/utils/tools/tools.tssrc/node/services/tools/advisor.tsImplementation notes:
null/undefinedbefore the tool call soadvisor.tsonly sees either:undefined(unlimited / omit field), orQuality gate:
maxOutputTokens: <n>intogenerateText(...)Lane 3 — Experiments settings UI agent
Owner: frontend/settings-focused exec agent
Depends on: Lane 1 schema name locked; can work in parallel once naming/semantics are stable
Deliverables:
nullfor Unlimited and a positive integer for Limited.Primary file:
src/browser/features/Settings/Sections/AdvisorToolExperimentConfig.tsxLikely test file:
src/browser/features/Settings/Sections/ExperimentsSection.advisor.test.tsxImplementation notes:
"unlimited" | "limited"Max Output TokensPer advisor response.nullorundefined, show UnlimitedQuality gate:
1000saveConfigpayload shapeLane 4 — Integration + regression test agent
Owner: test-focused exec agent
Depends on: Lanes 1–3 available locally
Deliverables:
Suggested test matrix:
advisorRuntime.maxOutputTokens = 1000->generateTextcalled withmaxOutputTokens: 1000advisorRuntime.maxOutputTokens = undefined->maxOutputTokensomittednull2000Quality gate:
Execution sequence
Phase 0 — Lock semantics and file ownership
advisorMaxOutputTokenseverywhere.maxOutputTokenswhen Unlimited/unset.Exit criteria: all lanes use the same naming and runtime semantics.
Phase 1 — Land shared config contract
Dogfood / quality gate after Phase 1:
nullvs number semantics are preserved.Phase 2 — Land advisor runtime plumbing
advisorRuntimeshape.aiService.generateText(...)only when limited.Dogfood / quality gate after Phase 2:
Phase 3 — Land the experiments UI
AdvisorToolExperimentConfig.Dogfood / quality gate after Phase 3:
1000) -> save/reloadPhase 4 — Final regression + end-to-end verification
Exit criteria: feature works in UI, persists across reload, and reaches the
generateText(...)call.Concrete implementation checklist
1) Shared config and IPC
advisorMaxOutputTokens?: number | nullto the shared project/config type..nullish()convention for tool/input-like API payloads where applicable.getConfig()exposes the field andsaveConfig()accepts it.2) Node config handling
nulland numbers faithfully.3) Runtime plumbing
cfg.advisorMaxOutputTokenswhere advisor runtime values are assembled insrc/node/services/aiService.ts.undefinedor a positive integer.ToolConfiguration["advisorRuntime"]so the tool layer can receive the value.4) Advisor tool execution
maxOutputTokensintosrc/node/services/tools/advisor.ts.5) Settings UI
6) Tests
generateText(...)arg propagation.Dogfooding plan
This feature touches the settings UI and advisor execution path, so dogfooding must include both UI evidence and one runtime behavior check.
Sandbox setup
Use the repo’s sandboxed dev server flow so testing does not collide with other local mux instances.
Recommended command:
Notes derived from the skill:
MUX_ROOTBrowser automation setup
Use
agent-browserdirectly (notnpx). Prefer named sessions andbatchfor simple multi-step sequences.Suggested session name:
advisor-max-output-tokensSuggested evidence directory:
./dogfood-output/advisor-max-output-tokens/UI dogfood flow (must capture screenshots + video)
agent-browser.1000, blur/save, and reload the settings view.Reviewer artifacts required:
1000) -> persisted after reloadRuntime dogfood flow (must capture screenshots + video)
Max Output Tokensto a visibly small value such as64or128.If live-provider dogfooding is blocked by credentials or cost concerns:
generateText(...)receivesmaxOutputTokenswhen limited and omits it when Unlimited.agent-browser usage notes to follow
snapshot -ito discover refs.wait <selector>,wait --text, or a short fixed wait overwait --load networkidle.record start/stopfor interactive repros.Validation plan
Targeted validation during development
Final validation before handoff
Run the repo-standard broader checks appropriate for touched code:
make typecheck make test make lintIf time and environment allow, finish with:
Risks and mitigations
Risk: silently changing existing behavior
Risk: provider-specific max-token quirks
Risk: UI state drift / saving invalid drafts
Risk: accidentally clobbering other advisor settings on save
Acceptance criteria
Functional
maxOutputTokensintogenerateText(...)only when Limited.Non-functional
Handoff notes for the implementation team
Generated with
mux• Model:anthropic:claude-opus-4-6• Thinking:xhigh• Cost:$26.93