Summary
Explore adding per-task-type model selection to gastown — letting users configure different models for different kinds of agent work (e.g. small/cheap model for gt:pr-fixup beads, premium model for fresh feature beads, Opus for mayor planning, etc.).
The architecture is already shaped favourably for this: there's exactly one function (resolveModel at services/gastown/src/dos/town/config.ts:178) that all four model-resolution call sites route through, and the kilo SDK's KILO_CONFIG_CONTENT already supports per-task routing via agent.<slug>.model slots that gastown currently flattens to a single primary model.
This issue is exploratory. Not committing to building it, not committing to a timeline. Documenting the design space so we can decide whether/when to invest.
Why this might be worth doing
Today every polecat dispatch resolves to the same default_model regardless of what work it's doing. A polecat fixing a typo, rebasing a conflict, or addressing 3 review comments runs on the same model as a polecat designing and implementing OAuth from scratch. The cost asymmetry is real:
- gt:pr-fixup: polecat addresses specific review comments on an existing PR branch. Mostly mechanical edits with bounded scope. Sonnet is overkill.
- gt:pr-feedback: same shape; polecat addresses CI failures + reviewer comments.
- gt:pr-conflict: rebase work. Mostly syntactic; only escalates when there's semantic conflict.
- gt:rework: polecat continues on the same branch addressing refinery feedback. Smaller scope than fresh issue beads.
- gt:triage: single-pass judgment from a triage prompt. Currently runs as role: polecat with the same model resolution as full coding sessions.
PR fixups are a non-trivial fraction of polecat dispatches in any active rig and they're cleanly observable at dispatch time. That's the strongest single argument for at least wiring role × label into resolveModel.
The differentiable axes (from the codebase, not speculation)
The signals that are structurally observable to the worker today — the only place model selection can happen given the current architecture:
| Axis | Values | Where set | Currently affects model? |
|---|---|---|---|
| Agent role | mayor, polecat, refinery, triage | services/gastown/container/src/types.ts:5 | Yes: townConfig.role_models.{mayor,polecat,refinery} already exists. Triage has no slot (runs as polecat). |
| Bead label (gt:*) | gt:rework, gt:pr-fixup, gt:pr-feedback, gt:pr-conflict, gt:triage, gt:held, gt:escalation, gt:merge-request, gt:convoy, gt:molecule, gt:message | Sling time + lifecycle events | No: labels change prompts, branch checkout, and review-queue routing, but never model selection |
| Bead type | issue, merge_request, escalation, message, convoy, molecule, agent | Sling | Implicitly, via role (refinery picks up merge_request); no direct model coupling |
| Rig | per-rig rigOverride | services/gastown/src/dos/town/rigs.ts | Yes, for polecat and refinery only; mayor explicitly ignores rig overrides |
| Convoy merge_mode | review-then-land, review-and-merge | Convoy creation | No model effect (only a branch-target effect) |
| Bead metadata model | arbitrary string | trpc/router.ts:849 | Stored but never read: the tRPC sling mutation accepts a model field that nothing consumes. Latent feature. |
| Task phase within a session | exploration / planning / coding / writing PR description | Implicit inside the LLM session | Not observable to the worker. Phase boundaries don't exist; one bead = one continuous kilo serve session. |
The most leveraged differentiator that's currently unwired is role × label. Clean cost-asymmetry, fully observable at dispatch time.
The hook point
Exactly one function to modify: resolveModel at services/gastown/src/dos/town/config.ts:178.
```typescript
export function resolveModel(
  townConfig: TownConfig,
  rigOverride: RigOverrideConfig | null | undefined,
  role: string,
  // new:
  taskKind?: { labels?: string[]; type?: BeadType }
): string
```
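A minimal sketch of how the extended resolveModel could behave. It assumes a hypothetical town-level label_models map (role → gt:* label → model) and assumes RigOverrideConfig carries role_models for polecat and refinery only. The precedence (label match, then rig role override, then town role slot, then default_model) is our assumption, not existing behaviour; the mayor deliberately ignores both rigOverride and taskKind, in line with the mayor gotchas in this issue.

```typescript
// Sketch only: TownConfig.label_models is a hypothetical new field, and the
// RigOverrideConfig shape is assumed from the rig row in the axes table.
type BeadType =
  | "issue" | "merge_request" | "escalation" | "message" | "convoy" | "molecule" | "agent";

interface TownConfig {
  default_model: string;
  role_models?: Partial<Record<"mayor" | "polecat" | "refinery", string>>;
  // Hypothetical: role -> gt:* label -> model.
  label_models?: Partial<Record<"polecat" | "refinery", Record<string, string>>>;
}

interface RigOverrideConfig {
  // No mayor slot: the schema forbids role_models.mayor at the rig level.
  role_models?: Partial<Record<"polecat" | "refinery", string>>;
}

interface TaskKind {
  labels?: string[];
  type?: BeadType; // carried along, but unused by this sketch
}

function resolveModel(
  townConfig: TownConfig,
  rigOverride: RigOverrideConfig | null | undefined,
  role: string,
  taskKind?: TaskKind,
): string {
  if (role === "mayor") {
    // Town-level and role-only, so prewarm and dispatch always agree.
    return townConfig.role_models?.mayor ?? townConfig.default_model;
  }
  if (role === "polecat" || role === "refinery") {
    // 1. The first label on the bead that has a (role x label) mapping.
    const byLabel = townConfig.label_models?.[role];
    for (const label of taskKind?.labels ?? []) {
      const labelModel = byLabel?.[label];
      if (labelModel) return labelModel;
    }
    // 2. Rig-level role override, then 3. town-level role slot.
    const roleModel = rigOverride?.role_models?.[role] ?? townConfig.role_models?.[role];
    if (roleModel) return roleModel;
  }
  // Empty role (buildContainerConfig), triage, or unknown roles: town default.
  return townConfig.default_model;
}
```

With label_models = { polecat: { "gt:pr-fixup": "claude-haiku-4.5" } }, a pr-fixup polecat resolves to Haiku while every other polecat falls through to its existing role slot. Whether a (role × label) entry should beat a rig's role override is an open design question; the sketch puts labels first only because they are the more task-specific signal.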
Every read site (4 callers) routes through it:
- container-dispatch.ts:491 (dispatch payload model field on POST /agents/start)
- Town.do.ts:2727 (getMayorPrewarmContext; must agree byte-identically with dispatch)
- config.ts:334 (buildContainerConfig X-Town-Config header default)
- router.ts:1244-1245 (mayor model change detection in updateTownConfig)
Threading taskKind from the dispatch site:
- dispatchAgent in services/gastown/src/dos/town/scheduling.ts:64-153 already has the bead in hand, so the labels are right there.
- The startAgentInContainer params struct in container-dispatch.ts:346-382 needs a taskKind field.
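To illustrate the threading, here is a sketch of the dispatch side. Bead, StartAgentParams, taskKindFromBead and buildStartParams are hypothetical stand-ins for the real bead record, the startAgentInContainer params struct and the logic inside dispatchAgent; the resolver is passed in so the example stays self-contained.

```typescript
type BeadType =
  | "issue" | "merge_request" | "escalation" | "message" | "convoy" | "molecule" | "agent";

// Hypothetical minimal bead shape; the real record carries much more.
interface Bead {
  id: string;
  type: BeadType;
  labels: string[];
}

interface TaskKind {
  labels?: string[];
  type?: BeadType;
}

// Hypothetical slice of the startAgentInContainer params struct, plus the new field.
interface StartAgentParams {
  beadId: string;
  role: string;
  model: string;
  taskKind?: TaskKind;
}

// Snapshot the routing signals so later label edits on the bead
// cannot change what this dispatch was resolved against.
function taskKindFromBead(bead: Bead): TaskKind {
  return { labels: [...bead.labels], type: bead.type };
}

// What dispatchAgent could do: derive taskKind from the bead it already holds,
// resolve once, and ship a single model string to the container.
function buildStartParams(
  bead: Bead,
  role: string,
  resolve: (role: string, taskKind: TaskKind) => string,
): StartAgentParams {
  const taskKind = taskKindFromBead(bead);
  return { beadId: bead.id, role, model: resolve(role, taskKind), taskKind };
}
```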
KILO_CONFIG_CONTENT already supports per-task routing. services/gastown/container/src/agent-runner.ts:39-86 builds an SDK config with agent.code.model, agent.plan.model, agent.title.model, agent.explore.model slots. Today gastown flattens them all to one primary model. If we wanted SDK-side per-task routing later, the wire format can carry it. For dispatch-time-only routing (the realistic v1), the worker resolves once and ships a single string.
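A sketch of the two ways that config could be built. The slot names (code, plan, title, explore) come from agent-runner.ts as described here; the top-level model field and the overall JSON shape are assumptions, not the verified kilo SDK schema.

```typescript
// Slot names come from agent-runner.ts; the JSON shape itself is an assumption.
type AgentSlot = "code" | "plan" | "title" | "explore";
const SLOTS: readonly AgentSlot[] = ["code", "plan", "title", "explore"];

interface KiloConfig {
  model: string;
  agent: Record<AgentSlot, { model: string }>;
}

// Today's behaviour: every slot is flattened to the one resolved model.
function buildFlattened(primary: string): KiloConfig {
  const agent = {} as Record<AgentSlot, { model: string }>;
  for (const slot of SLOTS) agent[slot] = { model: primary };
  return { model: primary, agent };
}

// Possible later behaviour: per-slot models, each falling back to the primary.
function buildPerSlot(
  primary: string,
  perSlot: Partial<Record<AgentSlot, string>>,
): KiloConfig {
  const config = buildFlattened(primary);
  for (const slot of SLOTS) {
    const slotModel = perSlot[slot];
    if (slotModel) config.agent[slot] = { model: slotModel };
  }
  return config;
}

// The environment variable carries the config as a JSON string.
function toKiloConfigContent(config: KiloConfig): string {
  return JSON.stringify(config);
}
```

Dispatch-time routing only ever needs buildFlattened. buildPerSlot is the SDK-side path, where the SDK, not gastown, decides when each slot is used.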
The realistic gap: phase-level routing isn't free
Phase-level selection ("small model for the planning sub-step within a polecat session, big model for implementation") is architecturally precluded without redesign.
Per services/gastown/container/src/process-manager.ts:2010-2090, model changes require an SDK server restart. If taskKind resolves to different models within one bead's session, that's a redesign, not a hook addition. Per-bead-at-dispatch-time selection is straightforward; per-tool-call selection is not.
The honest answer: phase-level routing only works if the kilo SDK does it internally — i.e. we ship agent.title.model = small, the SDK invokes the title agent for that sub-task, and we accept whatever heuristic the SDK uses to decide what a "title" task is. We can't drive it from gastown's config language without a restart-per-phase.
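To make the cost concrete, a small sketch that counts how many SDK server restarts one session would need if gastown switched models per phase, given the process-manager.ts constraint that a model change requires a restart. It assumes phase boundaries were observable, which, per the axes table, they are not today; Phase and restartsForPhases are illustrative names.

```typescript
// Illustrative phases; the worker cannot actually observe these boundaries today.
type Phase = "exploration" | "planning" | "coding" | "pr-description";

// Counts the SDK server restarts one session would need if the model were
// switched per phase, given that any model change requires a restart.
function restartsForPhases(phases: Phase[], modelFor: (phase: Phase) => string): number {
  let restarts = 0;
  let current: string | undefined;
  for (const phase of phases) {
    const model = modelFor(phase);
    if (current !== undefined && model !== current) restarts++;
    current = model;
  }
  return restarts;
}

const session: Phase[] = ["exploration", "planning", "coding", "planning", "coding", "pr-description"];
const perPhase = restartsForPhases(session, (p) => (p === "planning" ? "small" : "big"));
const perBead = restartsForPhases(session, () => "big");
```

For that sequence, mapping only planning to a small model costs 4 restarts inside a single bead, while per-bead dispatch-time selection always costs 0.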
Recommendation: don't expose phase-level config as a user knob in v1. Thread labels[] to resolveModel and let users configure per-(role × label). The phase axis is an SDK-side concern and should stay there.
Configuration UX — the interesting tradeoff
Four config shapes considered:
| | Flat overrides | Hierarchical | Rule-based selectors | Profile + overrides |
|---|---|---|---|---|
| Learnability | High | Medium | Low | Highest |
| Power | Low | Medium | High | Medium-High |
| Failure-mode UX | OK | OK | Bad (silent rule errors) | Best (centrally maintained profiles update over time) |
| Migration from current | Easy | Awkward | Easy | Cleanest |
| Surfaceability | Easy | Medium | Strong | Strong |
Recommended: profile + overrides
```text
─── Models ─────────────────────────────────────────────
Profile: [ Balanced ▾ ]   ⓘ what's in this profile?

  Balanced expands to:
    Mayor:               claude-opus-4.7
    Polecat (default):   claude-sonnet-4.6
    Polecat (pr-fixup):  claude-haiku-4.5
    Refinery:            claude-sonnet-4.6
    Triage:              claude-haiku-4.5

▸ Advanced overrides (3 set)
    Mayor                          [ claude-opus-4.7 ▾ ]
    Polecat (default)              [ Use profile ▾ ]
    Polecat — label: gt:pr-fixup   [ claude-haiku-4.5 ▾ ]
    + Add override…

Resolution preview: [pick a bead ▾] → claude-haiku-4.5
                    because: override "Polecat — label: gt:pr-fixup"
```
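A minimal sketch of profile + overrides resolution that also produces the preview's "because" line. Assumptions: a profile is a list of rules keyed by role with an optional label or rig (mirroring the constrained "Add override" dialog); towns store the profile by name so central edits flow through; the most specific matching rule wins and overrides win ties. Rule, PROFILES, TownModelConfig and resolveWithReason are hypothetical names; the Balanced values are copied from the mockup.

```typescript
// Hypothetical rule: role is required; label and rig are optional narrowers.
interface Rule {
  role: string;
  label?: string;
  rig?: string;
  model: string;
  name: string; // shown in the "because" line of the preview
}

interface Profile {
  rules: Rule[];
}

// Central, server-side profile definitions (values from the mockup).
const PROFILES: Record<string, Profile> = {
  Balanced: {
    rules: [
      { role: "mayor", model: "claude-opus-4.7", name: "Mayor" },
      { role: "polecat", model: "claude-sonnet-4.6", name: "Polecat (default)" },
      { role: "polecat", label: "gt:pr-fixup", model: "claude-haiku-4.5", name: "Polecat (pr-fixup)" },
      { role: "refinery", model: "claude-sonnet-4.6", name: "Refinery" },
      // Triage runs as a polecat, so it is keyed on the gt:triage label.
      { role: "polecat", label: "gt:triage", model: "claude-haiku-4.5", name: "Triage" },
    ],
  },
};

// What a town stores: a profile *name* plus its own overrides.
interface TownModelConfig {
  profile: string;
  overrides: Rule[];
}

interface BeadContext {
  role: string;
  labels: string[];
  rig?: string;
}

interface Resolution {
  model: string;
  because: string;
}

function matches(rule: Rule, ctx: BeadContext): boolean {
  if (rule.role !== ctx.role) return false;
  if (rule.label !== undefined && !ctx.labels.includes(rule.label)) return false;
  if (rule.rig !== undefined) {
    // Mayor is town-level only: rig-scoped rules never apply to it.
    if (ctx.role === "mayor" || rule.rig !== ctx.rig) return false;
  }
  return true;
}

function specificity(rule: Rule): number {
  return (rule.label !== undefined ? 1 : 0) + (rule.rig !== undefined ? 1 : 0);
}

function resolveWithReason(
  cfg: TownModelConfig,
  ctx: BeadContext,
  profiles: Record<string, Profile> = PROFILES,
): Resolution | undefined {
  const profileRules = profiles[cfg.profile]?.rules ?? [];
  const candidates = [
    ...cfg.overrides.map((rule) => ({ rule, because: `override "${rule.name}"` })),
    ...profileRules.map((rule) => ({ rule, because: `profile ${cfg.profile}: ${rule.name}` })),
  ].filter((c) => matches(c.rule, ctx));
  // Most specific rule wins; sort is stable, so overrides win ties with profile rules.
  candidates.sort((a, b) => specificity(b.rule) - specificity(a.rule));
  const best = candidates[0];
  return best ? { model: best.rule.model, because: best.because } : undefined;
}
```

Under this tie-break, overriding only "Polecat (default)" leaves the profile's pr-fixup routing in place; whether users expect that is exactly what the resolution preview should make visible. Because the town stores only the name "Balanced", editing PROFILES server-side changes every town that has not pinned that slot with an override.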
Wins:
- The expectation is that ~90% of users pick a profile and never touch it. Gastown ships sensible defaults that update centrally as models evolve. No "we shipped Sonnet 4.6 in our config and now Sonnet 5 is out and our rule is silently outdated" failure mode.
- The "Add override" dialog is a constrained rule-builder: pick a role (required), optionally add label or rig from known values. No free-text rule language. Discoverable.
- The resolution preview ("pick a bead → see which model and why") is the single most important UI element — turns config from declarative-and-opaque into testable.
What it absorbs cleanly:
- Existing default_model / small_model users → a custom profile capturing their current settings.
- Mayor's "rig override is ignored" quirk stays intact: profile + overrides explicitly scopes which axes apply per role.
- The latent bead.metadata.model tRPC field can be wired as the highest-precedence override (per-bead override), or left as-is.
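A sketch of that migration, assuming legacy towns have default_model, an optional small_model and optional role_models slots. Each role gets a role-only rule equal to what it resolves to today, so behaviour is unchanged on day one. This issue doesn't specify which dispatches small_model governs, so the sketch carries it through untouched rather than guessing; LegacyTownConfig and migrateToCustomProfile are hypothetical.

```typescript
type Role = "mayor" | "polecat" | "refinery";
const ROLES: readonly Role[] = ["mayor", "polecat", "refinery"];

// Assumed shape of today's town settings.
interface LegacyTownConfig {
  default_model: string;
  small_model?: string;
  role_models?: Partial<Record<Role, string>>;
}

interface Rule {
  role: Role;
  label?: string;
  model: string;
  name: string;
}

interface Profile {
  name: string;
  small_model?: string | undefined; // carried through untouched
  rules: Rule[];
}

// Each role gets a role-only rule equal to what it resolves to today, so the
// migrated town behaves identically until someone adds a label override.
function migrateToCustomProfile(legacy: LegacyTownConfig): Profile {
  const rules: Rule[] = ROLES.map((role) => ({
    role,
    model: legacy.role_models?.[role] ?? legacy.default_model,
    name: `${role} (migrated)`,
  }));
  return { name: "Custom (migrated)", small_model: legacy.small_model, rules };
}
```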
Gotchas that constrain the redesign
- Prewarm and dispatch must agree on the resolved model byte-identically. getMayorPrewarmContext (Town.do.ts:2705-2732) and _ensureMayor (Town.do.ts:2743-2871) both call resolveModel, and the prewarmed SDK gets evicted if they disagree. If taskKind enters mayor resolution, prewarm has to know what task is coming, and today there's no "what task will the next mayor message handle" signal. Recommendation: don't add taskKind to mayor resolution. Mayor stays role-only.
- Mayor explicitly ignores rig overrides (config.ts:184-185). The schema even forbids role_models.mayor at the rig level. Any new design must preserve this: mayor is town-level only.
- Mayor model hot-reload (router.ts:1244-1259) compares resolveModel(old, null, 'mayor') against resolveModel(new, null, 'mayor') to decide whether to restart. Per gotcha 1, mayor stays simple, so this stays simple.
- buildContainerConfig calls resolveModel(config, null, '') with an empty role (config.ts:334) for the X-Town-Config header default. This needs explicit handling, probably "ignore taskKind, return the town default", since it's a dispatch-agnostic fallback.
- No fallback / degraded-mode logic exists at the gastown layer; model fallback is the AI gateway's concern. Per-task selection composes cleanly with gateway-level fallback: gastown picks "what to ask for", the gateway picks "what to actually serve". Don't try to put fallback into gastown config.
- Triage is structurally underdetermined. The AgentRole enum includes 'triage', but no codepath calls registerAgent({ role: 'triage' }). Triage work runs as role: 'polecat' with the triage system prompt overlaid. If we want triage to use a different model, the cleanest hook is the gt:triage label, not making triage a real role.
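The mayor gotchas, plus the empty-role and triage cases, can be stated as a handful of invariants. This sketch uses a cut-down, hypothetical resolveModel (with an assumed label_models field of the kind Phase 1 would add) and two helpers, prewarmReusable and mayorModelChanged, that model the prewarm-eviction check and the hot-reload comparison; neither is the actual Town.do.ts or router.ts code.

```typescript
// Cut-down, hypothetical resolver used only to state the gotchas as invariants.
interface TownConfig {
  default_model: string;
  role_models?: Partial<Record<"mayor" | "polecat" | "refinery", string>>;
  label_models?: Partial<Record<"polecat" | "refinery", Record<string, string>>>;
}

interface RigOverrideConfig {
  role_models?: Partial<Record<"polecat" | "refinery", string>>;
}

interface TaskKind {
  labels?: string[];
}

function resolveModel(
  town: TownConfig,
  rig: RigOverrideConfig | null,
  role: string,
  taskKind?: TaskKind,
): string {
  if (role === "mayor") {
    // Gotchas 1-2: town-level and role-only; rig and taskKind are deliberately ignored.
    return town.role_models?.mayor ?? town.default_model;
  }
  if (role === "polecat" || role === "refinery") {
    // Gotcha 6: triage runs as a polecat, so it is routed by its gt:triage label.
    for (const label of taskKind?.labels ?? []) {
      const labelModel = town.label_models?.[role]?.[label];
      if (labelModel) return labelModel;
    }
    return rig?.role_models?.[role] ?? town.role_models?.[role] ?? town.default_model;
  }
  // Gotcha 4: buildContainerConfig passes role '' and gets the town default.
  return town.default_model;
}

// Gotcha 1: a prewarmed mayor SDK is reusable only on a byte-identical match.
function prewarmReusable(prewarmedModel: string, dispatchModel: string): boolean {
  return prewarmedModel === dispatchModel;
}

// Gotcha 3: restart the mayor only if its resolved model actually changed.
function mayorModelChanged(oldCfg: TownConfig, newCfg: TownConfig): boolean {
  return resolveModel(oldCfg, null, "mayor") !== resolveModel(newCfg, null, "mayor");
}
```

Because mayor resolution never reads taskKind, editing label routes can never trigger a mayor restart or a prewarm eviction.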
The honest argument against doing this
Most users won't touch it, and the ones who do will get it wrong. Anthropic-class model selection is a moving target: Sonnet-of-today is Haiku-of-next-year in capability terms. Users who lock in polecat:gt:pr-fixup → haiku-4.5 today will silently keep paying for Haiku-4.5 quality on fixups in 2027, after the right answer has changed.
The profile mechanism is the answer to this — but only if profiles are good and users actually use them. If profiles are mediocre and everyone immediately drops to overrides, the feature becomes a knob factory.
Counter-argument: the cost wins from polecat:gt:pr-fixup → small_model alone are likely meaningful. Look at any active rig — PR fixup beads are a non-trivial fraction of polecat dispatches, they're mechanically simpler than fresh feature work, and they're cleanly observable at dispatch time. That's the strongest single argument for shipping at least the role × label hook.
Recommended scoping (if pursued)
Phase 1 — wire what's free. Thread labels[] to resolveModel, add a label-keyed slot to townConfig.role_models (e.g. role_models.polecat_pr_fixup or a nested role_models.polecat.labels[label] → model map). Settings UI gets a few new dropdowns. No profile system, no rule builder. Ships in a week.
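Phase 1 leaves the schema shape open. A sketch of both candidates, assuming today's role_models slots are plain strings: the flat shape adds sibling keys and needs a naming convention to map gt:pr-fixup to polecat_pr_fixup (flatKey here is a hypothetical convention), while the nested shape keys labels directly but turns role_models.polecat from a string into an object.

```typescript
type Role = "mayor" | "polecat" | "refinery";

// Shape A (flat): label-specific siblings next to the existing string slots,
// e.g. { polecat: "...", polecat_pr_fixup: "..." }.
type FlatRoleModels = Record<string, string | undefined>;

// Shape B (nested): each role becomes an object with a default and a label map.
interface NestedRoleModel {
  default?: string;
  labels?: Record<string, string>;
}
type NestedRoleModels = Partial<Record<Role, NestedRoleModel>>;

// Hypothetical naming convention: gt:pr-fixup on polecat -> "polecat_pr_fixup".
function flatKey(role: Role, label: string): string {
  return `${role}_${label.replace(/^gt:/, "").replace(/-/g, "_")}`;
}

function lookupFlat(m: FlatRoleModels, role: Role, labels: string[]): string | undefined {
  for (const label of labels) {
    const hit = m[flatKey(role, label)];
    if (hit) return hit;
  }
  return m[role];
}

function lookupNested(m: NestedRoleModels, role: Role, labels: string[]): string | undefined {
  const entry = m[role];
  for (const label of labels) {
    const hit = entry?.labels?.[label];
    if (hit) return hit;
  }
  return entry?.default;
}
```

Both shapes answer the same questions; the difference is where the cost lands. The flat shape keeps existing configs valid but bakes a string convention into key names, while the nested shape is more self-describing but implies a one-time migration of each role_models.polecat string into { default: ... }.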
Phase 2 — profile system. Define Frugal, Balanced, Premium profiles centrally. Town config stores profile: <name> plus existing override map. Profile defaults update server-side as models evolve.
Phase 3 — resolution preview UI. Pick-a-bead dropdown that shows resolved model + matched rule. Necessary for trust.
Phase 4 (maybe): Wire the latent bead.metadata.model field as the highest-precedence override. Per-bead model selection from the sling form.
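A sketch of wiring the latent field as the highest-precedence override. Validating at sling time is our assumption, chosen so a mistyped model id fails loudly instead of silently falling back, the failure mode the config-shape table flags for rule-based selectors; BeadMetadata, validateSlingModel and resolveForBead are hypothetical names.

```typescript
interface BeadMetadata {
  model?: string;
}

// Sling time: normalise the free-text field and reject unknown model ids
// loudly, instead of letting them fail (or silently fall back) at dispatch.
function validateSlingModel(
  model: string | undefined,
  knownModels: ReadonlySet<string>,
): string | undefined {
  const trimmed = model?.trim();
  if (!trimmed) return undefined;
  if (!knownModels.has(trimmed)) throw new Error(`unknown model: ${trimmed}`);
  return trimmed;
}

// Dispatch time: a stored per-bead model beats every config-derived answer;
// otherwise defer to the normal (role x label) resolution.
function resolveForBead(
  metadata: BeadMetadata | undefined,
  resolveFromConfig: () => string,
): string {
  return metadata?.model || resolveFromConfig();
}
```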
Phases 5+ deferred: Per-tool-call routing (SDK-side concern), cost budgeting, A/B testing, user-level preferences.
Phase 1 alone delivers most of the cost wins. Phases 2-3 are about making the system trustworthy enough that users will actually use it. Phase 4 is "nice to have." Don't skip ahead.
Out of scope for this issue
- Specific implementation tickets — this is design-space exploration, not a build plan.
- Cost budgeting / spend caps — separate problem, separate UI.
- Provider routing (claude-sonnet-4.6 via Anthropic vs via Bedrock): gateway concern, not user config.
- Per-end-user overrides — we configure per-town, not per-human.
- Model A/B testing or shadow runs — premature.
- Phase-level model selection as a user knob — SDK-side concern.
Decision needed
Whether/when to invest, and in what shape:
- Build Phase 1 only (role × label hook, minimal UI) — ships fastest, captures most cost wins.
- Build Phases 1–3 (profiles + preview) — ships a real config UX, takes weeks.
- Build Phases 1–4 (everything except SDK-side routing) — ships everything that's cleanly worker-resolvable.
- Defer entirely — architecture stays favourable; revisit when cycles permit.
No urgency. The hook point is small enough that this is bounded engineering work whenever it gets prioritized.
References
- resolveModel: services/gastown/src/dos/town/config.ts:178
- KILO_CONFIG_CONTENT builder (per-task slots already exist): services/gastown/container/src/agent-runner.ts:39-86
- Prewarm/dispatch agreement: services/gastown/src/dos/Town.do.ts:2705-2732 and Town.do.ts:2743-2871
- Mayor model hot-reload: services/gastown/src/trpc/router.ts:1244-1259
- Latent per-bead model field: services/gastown/src/trpc/router.ts:849 (sling mutation accepts model, never read)
- Agent roles enum: services/gastown/container/src/types.ts:5
- Bead label semantics: services/gastown/src/dos/town/agents.ts:478-567 (label-driven prime context building)
- Town settings UI: apps/web/src/app/(app)/gastown/[townId]/settings/TownSettingsPageClient.tsx:306-365
- Rig settings UI: apps/web/src/app/(app)/gastown/[townId]/rigs/[rigId]/settings/RigSettingsPageClient.tsx:136-183