🤖 feat: split Exec sub-agent AI defaults#3215
Conversation
Add a dedicated Settings row for Exec when it runs as a sub-agent, backed by subagentAiDefaults.exec. Keep UI Exec defaults separate and preserve sparse subagent payloads so unset fields inherit from UI Exec.\n\n---\n\n_Generated with • Model: • Thinking: • Cost: _\n\n<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=3.81 -->
Document UI and subagent AI default resolution for agents.
|
Preview deployment for your docs. Learn more about Mintlify Previews.
💡 Tip: Enable Workflows to automatically generate PRs for you. |
|
/coder-agents-review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 471d524a6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
First-pass review (Netero). This is a mechanical scan; the full review panel has not yet reviewed this PR.
The split design is clean: sparse subagentAiDefaults with per-field inheritance from agentAiDefaults.exec, lazy creation, and pruning on reset. The config migration is one-shot with a marker, and the resolution path in taskService is well-tested.
One P2 (code duplication across two files), one P3 (dead guard), one Note (stale story). The P2 is worth addressing before the panel reviews, since duplicated filter logic is the kind of thing that silently diverges on the next change.
"The two copies are functionally identical; the only difference is cosmetic type aliases." (Netero)
src/browser/features/Settings/Sections/TasksSection.stories.tsx:39
Note [DEREM-4] The play function calls canvas.findAllByText(/^Exec$/i) which now matches both the UI Exec row and the new Sub-agents Exec row, but asserts nothing about the new row's behavior (inherit labels, sparse save, model/thinking controls). The new behavior is covered by TasksSection.ui.test.tsx, but the story is stale relative to the feature. (Netero)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
|
/coder-agents-review |
|
/coder-agents-review |
There was a problem hiding this comment.
Panel review (13 reviewers). All three R1 findings addressed cleanly.
The config shape, schema, migration marker, UI rendering, and test infrastructure for the exec subagent split are well-built. The sparse write/prune semantics, the per-field inheritance with live hints, and the extraction of shared helpers (DEREM-1 fix) are all solid work.
Two P1s: the feature does not take effect on either production delegation path. createTaskTool always passes MUX_MODEL_STRING as an explicit params.modelString, which short-circuits the ?? chain in resolveTaskAISettings before subagentAiDefaults.exec is consulted. Plan's auto-handoff to exec reads agentAiDefaults directly and was not updated to check subagentAiDefaults. A user who sets a cheaper subagent model will see no change in practice.
Additionally: the exec-split migration unconditionally triggers session-usage cache deletion (P3), the @deprecated tag on subagentAiDefaults is now false for the exec key (P3), the thinking hint display diverges from runtime enforcement (P3, flagged by 4 reviewers), and the test suite calls taskService.create without a model argument, masking the P1 since the task tool always supplies one (P3).
The PR description states "Resolution in taskService / aiService / messagePipeline walks the new override shape." Neither aiService nor messagePipeline references subagentAiDefaults; both receive already-resolved values. The description also does not disclose the precedence reversal (explicit task args now beat agentAiDefaults, reversing the prior order).
2 P1, 7 P3, 3 Nit, 2 Note.
"A user who sets subagentAiDefaults.exec = gpt-4o-mini while running an Orchestrator on claude-opus will find exec still using claude-opus." (Pariston)
src/node/services/taskService.ts:3685
P1 [DEREM-10] Plan auto-handoff to exec ignores subagentAiDefaults.exec. handleSuccessfulProposePlanAutoHandoff resolves the target model by reading latestCfg.agentAiDefaults?.[targetAgentId] directly at this line. It does not check subagentAiDefaults. The comment at line 3662 says "Handoff resolution follows the same precedence as Task.create," but after this PR the two paths diverge. Plan to Exec is the most user-visible delegation scenario; the feature has no effect there.
Could this path call resolveTaskAISettings (or an equivalent that checks subagentAiDefaults) and correct the comment? (Pariston, verified by orchestrator)
🤖
🤖 This review was automatically generated with Coder Agents.
There was a problem hiding this comment.
Three R2 findings addressed (DEREM-5 orphaned entries, DEREM-6 hardcoded comparison, DEREM-9 task tool forwarding MUX_MODEL_STRING). The DEREM-9 fix is the critical one: the task tool no longer forwards the parent model as an explicit arg, so subagentAiDefaults.exec is now reachable.
Further review is blocked until the remaining open findings receive a response (fix, acknowledgment, or contestation):
P1 DEREM-10: Plan auto-handoff to exec (handleSuccessfulProposePlanAutoHandoff) still reads agentAiDefaults directly, bypassing subagentAiDefaults. The task-tool fix does not cover this path.
P3 DEREM-11: Thinking hint shows sentinel "Inherit" and skips enforceThinkingPolicy; display diverges from runtime.
P3 DEREM-12: Exec-split migration unconditionally sets configModified = true, triggering session-usage cache wipe.
P3 DEREM-13: @deprecated on subagentAiDefaults is now false for the exec key; inline comments still say "legacy/downgrade compat."
P3 DEREM-14: No test for explicit args > subagentAiDefaults precedence.
P3 DEREM-15: No test for subagent thinkingLevel clamping as standalone source.
P3 DEREM-16: No test for thinking-only override through setSubagentThinking.
Nit DEREM-17: legacySubagentDefaultsForAgentFallback naming.
Nit DEREM-18: Comment mentions Exec but function is generic.
Note DEREM-7: Precedence reversal (now disclosed in PR description; the code change may be intentional but no explicit acknowledgment on the thread).
Note DEREM-8: Unnecessary export.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
No progress on the 11 open findings since R3. The one new commit (30cf9bd50, chat-input flicker fix) does not intersect any open finding.
This is the second consecutive blocked round. The review cannot advance until the author responds to or addresses the open findings. The highest-priority items:
P1 DEREM-10: Plan auto-handoff to exec still bypasses subagentAiDefaults. This is the last remaining path where the feature has no effect.
P3 DEREM-11 through DEREM-16: thinking hint display/runtime divergence, migration cache wipe, @deprecated tag, and three test gaps.
If any of these are intentionally deferred or contested, a response on the thread (even a brief one) unblocks review.
🤖 This review was automatically generated with Coder Agents.
…data fallback tests
Decouple session usage cache invalidation from broad config modification during config load. Only model id normalization now triggers cache deletion while exec split cleanup still persists its migration marker. Also clarify subagent defaults docs around exec storage and rename the legacy fallback extractor.\n\n---\n\n_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` • Cost: `1820273{MUX_COSTS_USD:-unknown}`_\n\n<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=unknown -->
|
@codex review Addressed remaining coder-agents-review findings DEREM-7, 8, 11, 12, 13, 14, 15, 16, 17, 18:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2aadd2322a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Test/Unit was failing because |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00435d1805
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Addressed P1 thread on |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c797ba1315
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review Addressed P2 thread on ThinkingContext cycle handler: it now resolves model via the same |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1afb8572fa
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…gents Preserve non-AI agent defaults when saveConfig drops a mirrored legacy subagent AI entry.
|
@codex review Addressed P1 thread on |
|
@codex review The latest commit |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
/coder-agents-review
Round R4 was blocked due to no progress; since then the following findings have been addressed (DEREM-7, 8, 11, 12, 13, 14, 15, 16, 17, 18) and Codex re-approved at 16:51:06Z. No production code changes since approval; the two latest commits are test-isolation/CI stabilization for Re-requesting review on head |
There was a problem hiding this comment.
Excellent progress: 10 of 11 open findings addressed in this push, including the thinking hint clamp (DEREM-11), migration cache fix (DEREM-12), deprecated tag correction (DEREM-13), all three test gaps (DEREM-14, 15, 16), both naming/comment nits (DEREM-17, 18), the unexport (DEREM-8), and the precedence disclosure (DEREM-7).
One finding remains open and unacknowledged:
P1 DEREM-10: handleSuccessfulProposePlanAutoHandoff resolves the exec model by reading agentAiDefaults[targetAgentId] directly, bypassing subagentAiDefaults. Plan to Exec is the most user-visible delegation scenario. A user who configures a cheaper subagent model will not see it take effect on this path.
Further review is blocked on DEREM-10. A fix, deferral with ticket, or contestation on the thread unblocks.
🤖 This review was automatically generated with Coder Agents.
Use TaskService AI setting resolution for Plan to Exec handoff so exec subagent defaults override UI agent defaults while keeping the existing task model fallback.
|
/coder-agents-review
Addressed DEREM-10: Head: |
|
@codex review DEREM-10 fix: routed |
|
Codex Review: Didn't find any major issues. Chef's kiss. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Re-review panel (4 reviewers). All 23 prior findings verified closed.
DEREM-10 (P1, plan handoff) is fixed correctly: handleSuccessfulProposePlanAutoHandoff now routes through resolveTaskAISettings, so subagentAiDefaults.exec is consulted before agentAiDefaults.exec. The plan's model arrives as a low-priority parentRuntimeAiSettings hint, which is the right precedence. Pariston's differential diagnosis confirms the fix addresses every framing of the original problem.
The resolution chain (explicit args > subagentAiDefaults > agentAiDefaults > parentRuntime > parentWorkspace > defaultModel) is consistent across both task creation and plan handoff, documented in the PR description, and tested for each adjacent precedence pair.
One P3 (edge-case duplicate row), two Nits. None blocking.
"Extracting
resolveTaskAISettingsas a private method and routing bothTask.createandhandleSuccessfulProposePlanAutoHandoffthrough it was the right structural move." (Kite)
🤖 This review was automatically generated with Coder Agents.
_Generated with `mux` • Model: `openai:gpt-5.5` • Thinking: `high` • Cost: `3140340{MUX_COSTS_USD:-unset}`_
<!-- mux-attribution: model=openai:gpt-5.5 thinking=high costs=unset -->
|
/coder-agents-review
Closed out the three non-blocking nits from R6:
Head: |
There was a problem hiding this comment.
All three R6 findings addressed in b21564f. Zero open findings across 26 tracked items (2 P1, 1 P2, 7 P3, 7 Nit, 3 Note raised; 5 Nit dropped; all others fixed or accepted).
CI is pending but the diff is a single nit-fix commit on top of the previously approved state. No new concerns.
🤖 This review was automatically generated with Coder Agents.
The shape `{ model: string; thinkingLevel?: ThinkingLevel }` (used
internally to read per-agent AI settings off partial workspace metadata)
was duplicated 7 times across the parameter and return types of
`resolveWorkspaceAISettings`, `resolveTaskAISettings`, and
`resolveParentAutoResumeOptions`. The new sub-agent defaults split
(#3215) made this duplication especially visible because it added a
third method copying the same inline shape.
Introduce a private `ResolvedWorkspaceAiSettings` interface at module
scope and use it everywhere. Pure type-level cleanup — emitted JS,
runtime behavior, and the schema-derived `WorkspaceAISettings` type
(where `thinkingLevel` is required) are all unchanged.
🤖 _Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `xhigh`_
The shape `{ model: string; thinkingLevel?: ThinkingLevel }` (used
internally to read per-agent AI settings off partial workspace metadata)
was duplicated 7 times across the parameter and return types of
`resolveWorkspaceAISettings`, `resolveTaskAISettings`, and
`resolveParentAutoResumeOptions`. The new sub-agent defaults split
(#3215) made this duplication especially visible because it added a
third method copying the same inline shape.
Introduce a private `ResolvedWorkspaceAiSettings` interface at module
scope and use it everywhere. Pure type-level cleanup — emitted JS,
runtime behavior, and the schema-derived `WorkspaceAISettings` type
(where `thinkingLevel` is required) are all unchanged.
🤖 _Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `xhigh`_
The shape `{ model: string; thinkingLevel?: ThinkingLevel }` (used
internally to read per-agent AI settings off partial workspace metadata)
was duplicated 7 times across the parameter and return types of
`resolveWorkspaceAISettings`, `resolveTaskAISettings`, and
`resolveParentAutoResumeOptions`. The new sub-agent defaults split
(#3215) made this duplication especially visible because it added a
third method copying the same inline shape.
Introduce a private `ResolvedWorkspaceAiSettings` interface at module
scope and use it everywhere. Pure type-level cleanup — emitted JS,
runtime behavior, and the schema-derived `WorkspaceAISettings` type
(where `thinkingLevel` is required) are all unchanged.
🤖 _Generated with `mux` • Model: `anthropic:claude-opus-4-7` • Thinking: `xhigh`_
Summary
Adds a separate "Exec" sub-agent slot in
Settings > Tasks > Agent Defaultsso the model and reasoning level used when Exec runs as a sub-agent (delegated by Plan, Orchestrator, or others) can diverge from the model used when Exec is selected interactively in the UI. Sub-agent fields are stored sparsely undersubagentAiDefaults.execand inherit unset fields from the UI Exec entry.Also fixes a critical bug in the existing sub-agent dispatch path that made the new override (and
agentAiDefaults.exec) ineffective: thetasktool was forwarding the parent agent'sMUX_MODEL_STRINGandMUX_THINKING_LEVELenv vars intotaskService.create, which took precedence over the configured defaults inresolveTaskAISettings.Background
Until now,
agentAiDefaults.execcontrolled both the interactive UI Exec agent and Exec invoked as a sub-agent. Users (and orchestrators) often want a cheaper or faster model for delegated Exec runs while keeping a stronger model for the interactive Exec they drive themselves. There was no way to express that without flipping defaults back and forth, and the UI offered no obvious place to configure the delegated case.While dogfooding the new override, the user observed that setting "Exec sub-agent" to GPT-5.5 / high in Settings had no effect: spawned Exec sub-agents kept running with the parent's model (GPT-5.4 / xhigh). Tracing through
tasktool totaskService.createtoresolveTaskAISettingsrevealed that the task tool was readingMUX_MODEL_STRING/MUX_THINKING_LEVELfrom the parent's runtime env and passing them as the highest-precedence override, clobbering the configured sub-agent defaults.Implementation
subagentAiDefaults: { exec?: { modelString?, thinkingLevel? } }validated by the config schema. The entry is created lazily on first override and pruned whenever the last field is reset, so unconfigured installs keep a clean config file.execSubagentDefaultsSplitrecords that the split has been applied; existingagentAiDefaults.execkeeps its meaning for the UI Exec slot, no values are duplicated into the new shape on upgrade.taskService.resolveTaskAISettingswalkssubagentAiDefaults.execfirst, falls back toagentAiDefaults.exec, then to the parent workspace's running model. With the task-tool fix below, that ordering is now actually reachable.Sub-agents(the parent section provides the disambiguating context) withInherit from UI Execreset affordances and live "Inherits from UI Exec: ..." hints. The row reusesAiDefaultsControlsso the model selector and reasoning select stay consistent with other agent rows.AGENT_DEFAULT_IDS_EXCLUDED_FROM_LEGACY_SUBAGENTS,shouldMirrorAgentDefaultToLegacySubagent,deriveLegacySubagentAiDefaultsFromAgentDefaults) intosrc/common/types/tasks.tssonode/config.tsandnode/orpc/router.tsno longer keep duplicate copies.normalizeSubagentAiDefaults(and its twin innormalizeAgentAiDefaults).tasktool no longer readsMUX_MODEL_STRING/MUX_THINKING_LEVELfrom the parent runtime env and no longer forwards them totaskService.create. The resolution chain inresolveTaskAISettingsalready falls back toparentAiSettings.modelas the lowest-priority default, so behavior with no configured defaults is unchanged. Added a regression test assertingtaskService.createreceivesmodelString: undefinedandthinkingLevel: undefinedeven when those env vars are present.handleSuccessfulProposePlanAutoHandoff) now routes through the sameresolveTaskAISettingshelper instead of readingagentAiDefaults[targetAgentId]directly, so a configuredsubagentAiDefaults.execactually takes effect when Plan delegates to Exec viapropose_plan.Validation
MUX_ROOT, covering: row presence, inheritance display, sparse persistence (model-only and reasoning-only overrides), entry pruning when the last field resets, inheritance text following live UI Exec changes, persistence across full reload, and full independence between the two slots after reload. All 10 cases plus the bonus independence case passed; on-disk JSON was inspected after each step to confirm sparse shape and pruning.make typecheckandmake lintpass.bun testfor:TasksSection.ui.test.tsx,src/node/config.test.ts,src/common/types/tasks.test.ts,src/node/services/taskService.test.ts, andsrc/node/services/tools/task.test.ts(including the new regression test for the env-var fix).Risks
config.test.ts, sparse-write semantics (no mass rewrite of existing configs), and a one-shot migration marker so we never re-run the split.agentAiDefaults.execfrom acting as a real default for sub-agents. The lowest-priority fallback inresolveTaskAISettingsstill inherits from the parent workspace when no defaults are configured.taskService.resolveTaskAISettingsis: explicit task arguments first, thensubagentAiDefaults, thenagentAiDefaults, then aparentRuntimeAiSettingshint (the parent agent's live runtime model/thinking, forwarded by thetasktool fromMUX_MODEL_STRING/MUX_THINKING_LEVELas a low-priority fallback only), then persisted parent workspace settings, then global defaults. This keeps per-task overrides effective and configured defaults effective, while still letting unconfigured delegated runs inherit the parent's live model. Covered by new precedence, runtime-hint, and policy-clamp tests intaskService.test.tsandtools/task.test.ts.taskService.test.tscases that exercisesubagentAiDefaultsoverrides,agentAiDefaultsfallbacks, and partial overrides combining both."Exec defaults"to stay unambiguous.Generated with
mux• Model:anthropic:claude-opus-4-7• Thinking:xhigh• Cost:$10.05