Skip to content

refactor(runtime): improve testability and simplify package structure#2554

Merged
dgageot merged 12 commits intodocker:mainfrom
dgageot:board/improving-runtime-package-testability-0681baaf
Apr 28, 2026
Merged

refactor(runtime): improve testability and simplify package structure#2554
dgageot merged 12 commits intodocker:mainfrom
dgageot:board/improving-runtime-package-testability-0681baaf

Conversation

@dgageot
Copy link
Copy Markdown
Member

@dgageot dgageot commented Apr 27, 2026

Improve the testability of pkg/runtime through targeted dependency injection, extraction of inline subdomains into focused types, and a final pass of cleanup/reorg. No public API changes; no observable behaviour changes.

Coverage of pkg/runtime rises from 53.6% → 56.5%; the package gains 17 new files (mostly tests) and runtime.go shrinks from 1081 to 963 lines.

Why

Several patterns in the runtime forced tests to either skip whole code paths or do invasive things to reach them:

  • time.Now() was hardcoded in 8 places (timestamps, fallback cooldowns, tool-call latency); tests had to scrub timestamps with reflection or sleep on real wall-clock.
  • NewLocalRuntime did disk I/O via modelsdev.NewStore and could fail at construction on hosts where os.UserHomeDir is unhappy.
  • Telemetry was emitted via package-level helpers, so tests had no way to assert what was recorded.
  • Magic numbers (128 event-channel buffer, maxOverflowCompactions = 1, 5*time.Second) lived inline; tests couldn't drive both branches of the overflow-recovery path without 1+ real overflows.
  • The RunStream goroutine was a 250-line anonymous closure mixing iteration limits, fallbacks, hooks, compaction, tool dispatch, steer/follow-up draining — none of it independently testable.
  • A handful of subdomains (fallback cooldowns, current-agent routing, the elicitation channel + mutex) were raw fields on LocalRuntime with concurrency contracts that could only be reasoned about by reading the surrounding methods.

model_switcher_test.go was the smoking gun: 5 tests bypassed NewLocalRuntime entirely with &LocalRuntime{...} struct literals because the constructor was too coupled.

What

12 focused commits, each independently reviewable:

# Commit Effect
1 test(runtime): cover resume helpers, ElicitationError, and tool loop detector reset +tests for previously 0%-covered helpers
2 refactor(runtime): inject clock via WithClock for deterministic tests All time.Now()r.now(); new WithClock Opt; cooldown-eviction now testable without sleeping
3 refactor(runtime): defer modelsdev.NewStore until first use New lazyModelStore; constructor no longer hits disk
4 refactor(runtime): introduce Telemetry interface for testable observability New Telemetry interface + WithTelemetry Opt; default forwards to pkg/telemetry
5 refactor(runtime): name magic numbers and let tests override the overflow cap defaults.go constants + WithMaxOverflowCompactions Opt
6 refactor(runtime): extract elicitationBridge to encapsulate the swap+send race contract Concurrency contract in 50 LOC, race-detector test
7 refactor(runtime): extract enforceMaxIterations and handleStreamError from RunStream -81 lines from RunStream goroutine, +9 unit tests for previously-unreachable branches
8 refactor(runtime): extract cooldownManager from fallback.go Eviction-on-read contract testable in isolation
9 refactor(runtime): extract agentRouter 5 methods → 1 type with race-detector test
10 refactor(runtime): drop pass-through wrappers and constructor double-build -42 net lines: deletes 4 one-line wrappers, builds cooldown manager once instead of twice, deduplicates the max-iterations stop message
11 refactor(runtime): hoist RunStream goroutine body into runStreamLoop method Anonymous go func(){...} → named method, real symbol in stack traces
12 refactor(runtime): regroup elicitation and resume code into focused files New resume.go + elicitation.go; runtime.go drops to 963 lines

New testability seams

Opt / type Use
WithClock(func() time.Time) Deterministic timestamps + cooldown windows
WithTelemetry(Telemetry) Recordable observability events
WithMaxOverflowCompactions(n) Drive both retry and exhausted branches
lazyModelStore Deferred catalog loading
elicitationBridge, cooldownManager, agentRouter Race-tested concurrency primitives
enforceMaxIterations, handleStreamError RunStream branches reachable without end-to-end mocks

All concurrency primitives have race-detector tests that drive the contract directly (e.g. TestElicitationBridge_SwapWaitsForInflightSenders exercises the swap-vs-inflight-send interleave that the previous raw mutex+field carried implicitly).

Validation

  • go build ./...
  • go test ./pkg/runtime/... -race -count=1
  • mise lint ✅ (golangci-lint, custom lint, go mod tidy)
  • mise test ✅ (full module, exit 0)
  • Coverage of pkg/runtime: 53.6% → 56.5%

Notes for reviewers

  • No public API changes. go build ./... confirms no callers outside pkg/runtime were affected.
  • One pre-existing bug in pkg/telemetry/context.go (argument order swap in the wrapper, cancelled by an inverse swap at the runtime callsite) was discovered during commit 4 and explicitly preserved to keep this PR's scope tight. Worth a separate PR.
  • event.go's time.Now() for AgentContext.Timestamp is intentionally untouched — wire-format only, scrubbed by the existing clearTimestamps test helper.
  • The 6 simplification opportunities found after the testability work (commits 10–12) are stacked on top, so the testability changes can be cherry-picked alone if preferred.

Assisted-By: docker-agent

@dgageot dgageot requested a review from a team as a code owner April 27, 2026 16:52
dgageot added 12 commits April 27, 2026 18:54
…detector reset

These trivial helpers had 0% coverage, leaving the constructor shape of every ResumeRequest variant untested. Add table-driven tests for IsValidResumeType / ValidResumeTypes (round-trip), each ResumeApprove* constructor, ElicitationError.Error formatting, and toolLoopDetector.reset (state cleared, counter restarts at 1).

Assisted-By: docker-agent
All time-dependent operations inside the runtime (assistant message CreatedAt, tool result CreatedAt, tool-call latency in runAgentTool, fallback cooldown windows, compaction prompt timestamps) now go through r.now() instead of time.Now(). The new WithClock(func() time.Time) Opt lets tests pin time deterministically and verify cooldown expiry without sleeping.

Event.go's AgentContext.Timestamp is intentionally left on time.Now: it is wire-format only and the existing clearTimestamps helper already scrubs it.

Includes a clock_test.go that verifies the seam: WithClock(nil) keeps the default, and the fallback-cooldown evict path is now testable without real wall-clock advancement.

Assisted-By: docker-agent
Move the default ModelStore behind a lazyModelStore wrapper that calls modelsdev.NewStore only when GetModel or GetDatabase are first invoked. NewLocalRuntime no longer performs disk I/O (os.UserHomeDir + os.MkdirAll for ~/.cagent) at construction time and no longer fails when the home-dir lookup fails.

Tests that don't touch the catalog can now construct a runtime cheaply, and tests that do can still inject a stub via WithModelStore — that path bypasses the lazy default entirely.

Assisted-By: docker-agent
…bility

Define a small Telemetry interface in pkg/runtime that captures the five observability events the runtime emits (session start/end, error, tool call, token usage). Default to a defaultTelemetry struct that forwards to the package-level pkg/telemetry helpers (preserving the existing context-attached-client behaviour).

Replace every direct telemetry.RecordX call site (loop.go x4, tool_dispatch.go, streaming.go) with r.telemetry.RecordX. handleStream stays a free function — its design contract is to be runtime-state-free — but now takes the Telemetry as an explicit parameter; fallback.go forwards r.telemetry when calling it.

Tests can now inject a recordingTelemetry via WithTelemetry to assert lifecycle ordering, agent/session correlation, and token-usage totals without standing up an OTel client.

Assisted-By: docker-agent
…flow cap

Pull every event-channel buffer literal (loop.go, client.go, persistent_runtime.go, remote_runtime.go) and the emitToolsChanged 5-second timeout into named constants in defaults.go so they can be reasoned about in one place.

Promote the inline maxOverflowCompactions=1 constant into a per-runtime field (LocalRuntime.maxOverflowCompactions) defaulted via defaultMaxOverflowCompactions. New WithMaxOverflowCompactions(n) Opt lets tests exercise both the "compaction succeeded" and "compaction exhausted" branches without driving many real overflows; negative values clamp to 0.

Includes overflow_compaction_test.go covering the cap=0, default-cap, and negative-clamp cases.

Assisted-By: docker-agent
…send race contract

The two raw fields elicitationEventsChannel + elicitationEventsChannelMux on LocalRuntime carried a non-trivial concurrency contract: the swap-back at finalizeEventChannel must not race with close(events), and a send must hold the read lock to keep the channel reference open until the send returns. That contract was implicit and hard to reason about while diff'ing the surrounding 250-line RunStream goroutine.

Move the channel + mutex + swap/send into a dedicated elicitationBridge type. LocalRuntime now exposes a single elicitation field, the surrounding methods are simple delegations, and the contract has its own race-detector test that drives the inflight-send-vs-swap interleave directly without spinning up a runtime, agent, or MCP toolset.

Note: an earlier plan to push the channel into context.WithValue and drop the mutex would have been a regression — the MCP elicitation handler can be invoked with a context that did not flow from RunStream (OAuth path), so a per-stream context value would not reach every callsite, and dropping the mutex would re-introduce the close-during-send race.

Assisted-By: docker-agent
… from RunStream

Two of the largest, most-branchy regions of the RunStream goroutine were unreachable from unit tests because they were anonymous code blocks inside a 250-line closure: the max-iterations enforcement / resume-or-reject flow (~60 lines) and the model-error recovery path that decides between auto-compaction retry and fatal exit (~30 lines).

Move both into named methods on *LocalRuntime in a new loop_steps.go: enforceMaxIterations returns (newMax, decision) and handleStreamError returns a streamErrorOutcome. The run loop becomes a small switch-on-outcome at each callsite, and each branch (below cap / non-interactive auto-stop / interactive approve / interactive reject / context cancel for max-iter; canceled / overflow-retry / overflow-exhausted / generic-fatal for stream errors) gets its own focused test.

Net effect on loop.go: -81 lines, +191 lines of testable helpers, +9 unit tests covering branches that previously needed full end-to-end runs to exercise.

Assisted-By: docker-agent
The fallback cooldown logic was scattered across three methods on *LocalRuntime (getCooldownState/setCooldownState/clearCooldownState), each repeating the same map+mutex+slog ceremony, plus the eviction-on-read invariant lived inline in getCooldownState. Reasoning about that invariant required reading 60 lines of fallback.go and was untestable without spinning up an agent + team + runtime.

Move the map, mutex, eviction logic, and slog emissions into a small cooldownManager type with Get/Set/Clear and an injectable clock. The three runtime methods become one-line delegations. The clock is wired through r.now so WithClock(fakeClock) makes the eviction window deterministic and the new test exercises the contract directly under -race without going through tryModelWithFallback.

Net effect on fallback.go: -50 lines, +50 lines of standalone testable type, +8 unit tests covering Set/Get/Clear, eviction, replacement, isolation, nil-clock default, and concurrent safety.

Assisted-By: docker-agent
…the testability series

Five methods on *LocalRuntime (CurrentAgentName, setCurrentAgent, SetCurrentAgent, CurrentAgent, resolveSessionAgent) all touched the same two raw fields (currentAgent string + currentAgentMu sync.RWMutex) and reimplemented the same patterns: name lookup, mutex-guarded write, validated write, team.Agent(...) resolve. Same story as cooldownManager — a small mutable subdomain hidden inside a 1100-line struct.

Move it to a dedicated agentRouter (Name/Set/SetValidated/Current/ResolveSession). LocalRuntime keeps single-line wrappers for the existing callsites; the router can be unit-tested directly with a team of two agents and a session, exercising the session-pinned-vs-current fallback, the unknown-pin fallback to current, the validated-set rejection of unknown names, and concurrent safety under -race.

Drive-by lint cleanup picked up while finishing the testability series:

- gci formatting on runtime.go and telemetry_test.go (struct field alignment)

- testifylint require-error in lazy_model_store_test.go

- modernize WaitGroup.Go in elicitation_bridge_test.go

Assisted-By: docker-agent
…build

Several thin wrappers added during the testability series can go now that the underlying types are stable:

- LocalRuntime.swapElicitationEventsChannel was a one-line delegation to elicitationBridge.swap with two callsites in loop.go. Replace both callsites with the direct call.

- LocalRuntime.setCooldownState / clearCooldownState were one-line delegations with single callsites in fallback.go. Inline them. getCooldownState had three callsites; inline those too — r.cooldowns.Get(...) reads the same as before.

- Move fallbackCooldownState from fallback.go to cooldown_manager.go where the manager that owns it lives. The struct is purely internal to the cooldown subsystem.

NewLocalRuntime built the cooldown manager twice: once before opts and again after, just in case WithClock changed r.now. Rebuild once after opts (no opt depends on r.cooldowns existing during construction). One initialisation, no "rebuild now that opts have run" comment to puzzle over.

enforceMaxIterations had two identical addAgentMessage(stop msg) blocks (non-interactive auto-stop + interactive reject). Pull the formatting and the assistant-message append into a tiny local helper.

Assisted-By: docker-agent
…method

RunStream's anonymous "go func() { ~250 lines }()" closure was the kind of thing that's hard to navigate in editors, hard to identify in stack traces, and hard to read because everything inside was indented two extra tab stops.

Move the body verbatim into a runStreamLoop method on *LocalRuntime. RunStream becomes a 4-line dispatcher (slog.Debug, channel make, go r.runStreamLoop, return). The body's logic is unchanged \u2014 every line is identical, just dedented by one tab \u2014 but it now has a real symbol name and a focused doc comment.

Net diff is mostly reindentation; +3 / -3 net lines of meaningful change.

Assisted-By: docker-agent
…iles

runtime.go was a 1081-line catch-all that mixed five distinct concerns: the LocalRuntime struct + Opts + constructor, current-agent helpers, MCP prompts, session/permissions/lifecycle plumbing, the Resume API surface, and the elicitation API + handler. Splitting the elicitation and resume concerns into their own files removes \u003c100 lines from runtime.go and groups related logic by topic.

Move ElicitationResult / ElicitationError / ElicitationRequestHandler / ResumeElicitation method / elicitationHandler method, plus the elicitationBridge type from elicitation_bridge.go, into a single elicitation.go. Everything elicitation-related now lives in one ~140-line file.

Move ResumeType + constants + ResumeRequest + ResumeApprove* helpers + IsValidResumeType / ValidResumeTypes (previously split across runtime.go and resume_validation.go) into resume.go. Doc comments expanded to make the type act as an entry point for readers.

Test files renamed correspondingly: resume_validation_test.go \u2192 resume_test.go, elicitation_bridge_test.go \u2192 elicitation_test.go (TestElicitationError_Error moved over too).

Net effect: runtime.go drops from 1081 to 963 lines; the package gains two well-named topical files; no behaviour changes.

Assisted-By: docker-agent
@dgageot dgageot force-pushed the board/improving-runtime-package-testability-0681baaf branch from 404001f to 38075cd Compare April 27, 2026 16:58
@dgageot dgageot merged commit cadd0a3 into docker:main Apr 28, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants