Improve Gastown cold-start to mayor-ready latency #3368

@jrf0110

Summary

The current cold path from container start to mayor-ready appears to spend avoidable time in orchestration before the mayor can accept a session/request. This issue captures low-hanging opportunities found while tracing TownDO.ensureMayor / sendMayorMessage, TownContainerDO.warmUp, container bootHydration, /agents/start, runAgent, and startAgent.

Current Critical Path

  1. TownDO decides mayor is needed via ensureMayor or sendMayorMessage.
  2. startAgentInContainer() mints/pushes container auth, resolves env/config, and POSTs /agents/start.
  3. Cold container boots, starts the control server, and kicks off bootHydration().
  4. /agents/start waits for awaitHydration() before calling runAgent().
  5. Mayor runAgent() creates the lightweight workspace, sets up browse worktrees for all rigs, writes AGENTS.md, builds env, and starts the SDK session.
  6. startAgent() hydrates kilo.db, starts/reuses kilo serve, resumes or creates the mayor session, then marks mayor ready.

Recommendations

1. Avoid live /refresh-token before cold /agents/start

startAgentInContainer() currently calls ensureContainerToken() before posting /agents/start. On a cold container, that call can itself start the container and route through /refresh-token before the actual start request arrives. For a fresh mayor start this is redundant, because the token is already included in the /agents/start request env.

Suggested approach:

  • Split token minting from live token refresh.
  • For new starts, mint the container token and pass it in /agents/start only.
  • Only call /refresh-token when the container is known warm or when refreshing an already-running mayor/session.

Expected benefit:

  • Avoids an extra container request and hydration wait before the real mayor start request.
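
One way to picture the split is as a small decision function that always mints but only refreshes live when there is something warm to refresh. This is a minimal sketch, not the actual Gastown code: `StartContext`, `TokenStep`, and `planTokenSteps` are hypothetical names, and how "known warm" is determined is left as an assumption.

```typescript
// Hypothetical names throughout; they do not mirror the real Gastown code.
interface StartContext {
  // Assumption: the caller has some cheap way to know the container is warm.
  containerKnownWarm: boolean;
  // True when a mayor/session is already running and only needs a new token.
  agentAlreadyRunning: boolean;
}

type TokenStep =
  | "mint"
  | "POST /refresh-token"
  | "POST /agents/start (token in env)";

// Returns the ordered container-facing steps for a given situation.
function planTokenSteps(ctx: StartContext): TokenStep[] {
  // Minting is assumed to be local work with no container round trip.
  const steps: TokenStep[] = ["mint"];

  if (ctx.agentAlreadyRunning) {
    // Running mayor/session: push the new token live, no start request.
    steps.push("POST /refresh-token");
    return steps;
  }

  if (ctx.containerKnownWarm) {
    // Warm container: a live refresh is cheap and keeps other agents current.
    steps.push("POST /refresh-token");
  }

  // New start: the token travels in the start request env either way.
  steps.push("POST /agents/start (token in env)");
  return steps;
}
```

For a cold, fresh start this yields only `mint` followed by the start request, which is the case the recommendation targets.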

2. Make boot hydration mayor-first and background non-mayor resume

/agents/start waits on awaitHydration(), while bootHydration() can currently fetch the registry and serially resume every registered agent before mayor prewarm. A fresh user request can therefore queue behind unrelated agent restoration.

Suggested approach:

  • If the registry contains the mayor, resume the mayor first.
  • If the registry does not contain the mayor, prewarm the mayor before resuming non-mayor agents.
  • Release the hydration gate once the mayor-critical work (resume or prewarm) is complete.
  • Continue non-mayor registry resume in the background.

Expected benefit:

  • Prioritizes user-visible readiness over background restoration work.
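
The ordering above could be sketched as two promises: a gate that covers only mayor-critical work, and a background chain for everything else. The interfaces and the function name `bootHydrationMayorFirst` are hypothetical, and the real registry shape and resume API may differ.

```typescript
// Hypothetical shapes; the real registry and resume APIs may differ.
interface RegisteredAgent {
  id: string;
  role: "mayor" | "other";
}

interface HydrationDeps {
  fetchRegistry(): Promise<RegisteredAgent[]>;
  resumeAgent(agent: RegisteredAgent): Promise<void>;
  prewarmMayor(): Promise<void>;
}

interface HydrationHandle {
  gate: Promise<void>; // what /agents/start would await
  background: Promise<void>; // non-mayor resume, off the critical path
}

function bootHydrationMayorFirst(deps: HydrationDeps): HydrationHandle {
  let others: RegisteredAgent[] = [];

  // The gate resolves as soon as the mayor is resumed or prewarmed.
  const gate = (async () => {
    const registry = await deps.fetchRegistry();
    others = registry.filter((a) => a.role !== "mayor");
    const mayor = registry.find((a) => a.role === "mayor");
    if (mayor) {
      await deps.resumeAgent(mayor);
    } else {
      await deps.prewarmMayor();
    }
  })();

  // Non-mayor agents resume only after the gate, one at a time.
  const background = gate.then(
    async () => {
      for (const agent of others) {
        try {
          await deps.resumeAgent(agent);
        } catch (err) {
          // One failed resume should not block the remaining agents.
          console.error(`resume failed for ${agent.id}`, err);
        }
      }
    },
    () => {
      // Gate failures surface to whoever awaits `gate`; nothing to resume.
    },
  );

  return { gate, background };
}
```

In this sketch awaitHydration() would return `gate` rather than a promise covering the whole registry, so a user request waits only on mayor work.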

3. Defer mayor browse worktree setup

Mayor startup currently waits for browse worktree setup for all known rigs before writing AGENTS.md and starting the SDK server. This may involve clone/fetch/auth/network work across multiple repos.

Suggested approach:

  • Start mayor immediately with existing browse worktrees.
  • Kick off browse worktree setup/refresh in the background.
  • Rewrite/update AGENTS.md after browse worktrees are ready.
  • If needed, tell the mayor that code browsing may still be warming up.

Expected benefit:

  • Mayor can become ready and respond before all repos are refreshed.
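
A rough sketch of the deferred flow follows, assuming AGENTS.md can be written twice: once from whatever worktrees already exist, and again when the refresh completes. `MayorStartDeps` and `startMayorWithDeferredBrowse` are hypothetical names, and the `browseWarming` flag is an invented way to tell the mayor that browsing may be stale.

```typescript
// Hypothetical dependency surface; real runAgent() internals may differ.
interface MayorStartDeps {
  listExistingWorktrees(): string[];
  writeAgentsMd(worktrees: string[], browseWarming: boolean): Promise<void>;
  startSdkSession(): Promise<void>;
  // Slow path: clone/fetch/auth across all rigs.
  setupBrowseWorktrees(): Promise<string[]>;
}

async function startMayorWithDeferredBrowse(
  deps: MayorStartDeps,
): Promise<{ refresh: Promise<void> }> {
  // Write AGENTS.md from what is already on disk, flagged as warming.
  await deps.writeAgentsMd(deps.listExistingWorktrees(), true);

  // Kick off the slow refresh without awaiting it. It starts after the
  // first write so the two AGENTS.md writes cannot land out of order.
  const refresh = deps
    .setupBrowseWorktrees()
    .then((worktrees) => deps.writeAgentsMd(worktrees, false))
    .catch((err: unknown) => {
      console.error("browse worktree refresh failed", err);
    });

  // The mayor becomes ready without waiting on the refresh.
  await deps.startSdkSession();
  return { refresh };
}
```

The returned `refresh` promise is there so the caller can keep the work alive or observe completion. Whether a running session picks up a rewritten AGENTS.md is something this sketch does not answer.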

4. Remove or background git credential refresh from ensureMayor

The tRPC ensureMayor path currently does best-effort git credential refresh before calling townStub.ensureMayor(). Dispatch already resolves fresh GitHub tokens through resolveGitHubTokenString().

Suggested approach:

  • Move credential refresh out of the user-visible ensureMayor path.
  • Run it in the background or rely on dispatch-time resolution.

Expected benefit:

  • Avoids blocking mayor readiness on repo listing and credential refresh calls.
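
If the refresh is kept rather than removed, a fire-and-forget shape might look like the sketch below. It assumes the handler can reach something like the Workers `waitUntil` so the background promise is not dropped when the response returns. `EnsureMayorDeps` and `ensureMayorOffRequestPath` are hypothetical names, and the return type of ensureMayor is left generic because the issue does not specify it.

```typescript
// Assumption: the tRPC handler can obtain a waitUntil-style function.
type WaitUntil = (work: Promise<unknown>) => void;

interface EnsureMayorDeps<T> {
  refreshGitCredentials(): Promise<void>;
  ensureMayorOnTown(): Promise<T>; // stands in for townStub.ensureMayor()
  waitUntil: WaitUntil;
}

async function ensureMayorOffRequestPath<T>(deps: EnsureMayorDeps<T>): Promise<T> {
  // Best-effort refresh runs in the background; failures are only logged.
  deps.waitUntil(
    deps.refreshGitCredentials().catch((err: unknown) => {
      console.warn("git credential refresh failed (best effort)", err);
    }),
  );

  // The user-visible path no longer waits on repo listing or refresh calls.
  return deps.ensureMayorOnTown();
}
```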

5. Add targeted phase timing telemetry

Existing telemetry covers container.agent_start_fetch, agent.startup_phase (db_hydrated, sdk_ready, session_created), mayor.ready, mayor.prewarm_complete, and health-observed readiness. More granularity would make production bottlenecks obvious.

Suggested additional timings:

  • startAgentInContainer.token_mint_ms
  • startAgentInContainer.refresh_token_ms
  • bootHydration.registry_fetch_ms
  • bootHydration.mayor_resume_ms
  • bootHydration.non_mayor_resume_ms
  • bootHydration.mayor_prewarm_ms
  • mayor.workspace_created_ms
  • mayor.browse_worktree_setup_ms
  • mayor.write_agents_md_ms
  • ensureMayor.git_credential_refresh_ms

Expected benefit:

  • Lets us validate which quick wins matter most and avoid optimizing the wrong phase.
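
The timings listed above could all go through one small wrapper so each phase is measured the same way, including when it throws. This is a sketch only: `timePhase` and `EmitTiming` are hypothetical names, and the emit callback stands in for whatever telemetry sink the service already uses.

```typescript
// Hypothetical helper; `emit` stands in for the existing telemetry sink.
type EmitTiming = (name: string, ms: number) => void;

async function timePhase<T>(
  name: string,
  emit: EmitTiming,
  fn: () => Promise<T>,
  now: () => number = Date.now, // injectable clock, mainly for tests
): Promise<T> {
  const start = now();
  try {
    return await fn();
  } finally {
    // Emit on success and on failure so slow failing phases stay visible.
    emit(name, now() - start);
  }
}
```

A call site would wrap an existing step, for example `await timePhase("bootHydration.registry_fetch_ms", emit, () => fetchRegistry())`, where `fetchRegistry` is again a placeholder.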

Files To Inspect

  • services/gastown/src/dos/Town.do.ts
  • services/gastown/src/dos/TownContainer.do.ts
  • services/gastown/src/dos/town/container-dispatch.ts
  • services/gastown/container/src/main.ts
  • services/gastown/container/src/control-server.ts
  • services/gastown/container/src/process-manager.ts
  • services/gastown/container/src/agent-runner.ts
  • services/gastown/container/src/git-manager.ts
  • services/gastown/src/trpc/router.ts

Notes

The likely largest compound path is:

cold container -> pre-start /refresh-token -> bootHydration gate -> /agents/start waits -> mayor browse worktree setup -> DB snapshot hydration -> SDK server startup -> session list/create -> mayor ready

The lowest-risk first pass is probably:

  1. Avoid /refresh-token before cold /agents/start.
  2. Move ensureMayor git credential refresh off the request path.
  3. Add phase telemetry.
  4. Defer mayor browse worktree setup.
  5. Rework boot hydration to prioritize mayor before non-mayor registry resume.
