feat: unload local models when switching agents #2660
dgageot wants to merge 2 commits into docker:main from
Conversation
- `SetCurrentAgent` runs `Unload` in a goroutine so the TUI agent picker, which calls it from the bubbletea Update loop, isn't frozen while the engine acknowledges the unload (up to 10s).
- `unloadOnSwitch` now skips nil providers defensively.
- New `TestSetCurrentAgent_UnloadIsAsync` asserts the picker returns before `Unload` completes (see the sketch below).
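For illustration, a minimal self-contained sketch of what that assertion can look like; the `stubRuntime` type and its wiring are assumptions, not the PR's actual test code:

```go
package runtime

import (
	"testing"
	"time"
)

// stubRuntime is hypothetical: it reduces SetCurrentAgent to the one property
// under test, namely that the unload runs in a goroutine.
type stubRuntime struct{ unload func() }

func (r *stubRuntime) SetCurrentAgent() {
	go r.unload() // fire-and-forget, as in the PR
}

func TestSetCurrentAgent_UnloadIsAsync(t *testing.T) {
	release := make(chan struct{})
	rt := &stubRuntime{unload: func() { <-release }} // unload blocks until released

	done := make(chan struct{})
	go func() {
		rt.SetCurrentAgent() // must return even though the unload is still blocked
		close(done)
	}()

	select {
	case <-done: // returned before the unload completed: the call is async
	case <-time.After(time.Second):
		t.Fatal("SetCurrentAgent blocked on unload")
	}
	close(release) // let the stub unload finish
}
```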
Why isn't this implemented as a hook?

@rumpl I'll give it a try
aheritier
left a comment
Overall LGTM — clean architecture, safe defaults, and comprehensive tests. Left a few inline nits below.
```go
		return err
	}
	return base.PostUnloadModel(ctx, httpclient.NewHTTPClient(ctx), endpoint, c.ModelConfig.Model)
}
```
This creates a fresh *http.Client on every Unload call, which means any http_headers configured in provider_opts won't be forwarded to the unload endpoint. For the current use-case (unauthenticated local engines) this is fine, but it's asymmetric with the DMR path that reuses c.httpClient.
Worth a short comment so future maintainers don't wonder why custom headers aren't forwarded here:
```go
// httpclient.NewHTTPClient is used instead of reusing the SDK client because
// the openai.Client wraps its transport and there's no clean way to extract a
// raw *http.Client from it. For local engines (the only use-case for unload_api)
// this is fine — they typically don't require auth headers.
```

```go
	unloader, ok := m.(provider.Unloader)
	if !ok {
		continue
	}
```
Minor: cancel() is called after Unload returns, which is correct, but defer cancel() would be the idiomatic Go pattern and would also handle a potential panic inside Unload without leaking the timer goroutine. Up to you — the current form is functionally fine since Unload implementations don't panic.
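For illustration, a sketch of the suggested shape; everything here except `Unload` and the 10s timeout is an assumed name:

```go
package runtime

import (
	"context"
	"log/slog"
	"time"
)

// Unloader mirrors the PR's optional interface; the exact signature is assumed.
type Unloader interface {
	Unload(ctx context.Context) error
}

// unloadOne is a hypothetical helper showing the defer form: cancel now runs
// even if Unload panics, so the timer goroutine can never leak.
func unloadOne(u Unloader) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := u.Unload(ctx); err != nil {
		slog.Warn("best-effort unload failed", "error", err)
	}
}
```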
```go
		return
	}
	for _, m := range prev.ConfiguredModels() {
		if m == nil {
```
Nit: FallbackModels() isn't iterated here. That's probably intentional — a fallback model may never have been loaded — but worth a one-liner comment to make the deliberate choice clear:
```go
// Only ConfiguredModels are considered; FallbackModels are skipped because
// they may never have been loaded and unloading an absent model is harmless
// but wasteful.
```

```go
	go r.unloadOnSwitch(context.Background(), prev, next)
	return nil
}
```
If the TUI agent picker fires two rapid switches (A→B→C), two goroutines can be in flight simultaneously: one unloading A, one unloading B. If A and C happen to share a model, that model gets unloaded just before C needs it, causing an extra reload. Not a correctness issue given the best-effort semantics, but worth a comment so it's an acknowledged trade-off rather than an oversight.
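If the trade-off ever needed closing rather than documenting, one plausible mitigation is to re-check the current agent's models before each unload. A sketch, in which every type and helper is hypothetical (the PR itself keeps the race as a documented best-effort trade-off):

```go
package runtime

type Model interface{ ID() string }

type Agent interface{ ConfiguredModels() []Model }

type runtimeState struct {
	current Agent // whichever agent is active right now (C after A→B→C)
}

func (r *runtimeState) currentAgentUses(m Model) bool {
	for _, cm := range r.current.ConfiguredModels() {
		if cm != nil && cm.ID() == m.ID() {
			return true
		}
	}
	return false
}

// unloadOnSwitch skips models the *current* agent still needs, so the stale
// A-unload no longer evicts a model that C is about to use.
func (r *runtimeState) unloadOnSwitch(prev Agent, unload func(Model)) {
	for _, m := range prev.ConfiguredModels() {
		if m == nil || r.currentAgentUses(m) {
			continue
		}
		unload(m) // still best-effort with a timeout in the real code
	}
}
```

(A real version would also need synchronization around `current`, since the unload goroutine reads it concurrently with agent switches.)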
aheritier
left a comment
LGTM. Clean architecture: optional Unloader interface, shared HTTP/URL helpers in base, and best-effort semantics with a 10s timeout. The async path for the TUI picker is a good catch and well-justified.
The four inline nits I left earlier are all non-blocking — feel free to address them in a follow-up if you prefer:
- `defer cancel()` in `unloadOnSwitch` (idiom)
- comment about `ConfiguredModels()` only, skipping `FallbackModels()`
- comment about the rapid A→B→C switch race
- comment about why the OpenAI path can't reuse the SDK client
CI is all green, with comprehensive test coverage including `-race`.
Closing in favour of #2684
Closes #2636.
What
Adds an opt-in mechanism for local inference engines (DMR, ollama, ramalama, ...) to release a model from memory when the active agent switches to one that uses a different model.
User-facing surface
Two new pieces of config (latest only — older versions are frozen):
For DMR specifically, the default `/_unload` endpoint is auto-derived from the OpenAI base URL (mirroring how `_configure` is derived), so DMR users don't need to set `unload_api` unless they want to override it.
```yaml
providers:
local_dmr:
provider: dmr
base_url: http://model-runner.docker.internal/engines/llama.cpp/v1
unload_api: /engines/_unload # optional for DMR; defaults to /_unload
models:
qwen:
provider: local_dmr
model: ai/qwen3
provider_opts:
unload_on_switch: true
```
A runnable example lives at `examples/unload_on_switch.yaml`.
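The derivation itself isn't shown in this description, but under the assumption that the configured path is resolved against the base URL's host root, it might look roughly like this (`deriveUnloadEndpoint` is a hypothetical name, not the PR's actual helper):

```go
package dmr

import (
	"fmt"
	"net/url"
)

// deriveUnloadEndpoint is a guess at the auto-derivation described above:
// keep the base URL's scheme and host, swap in the unload path.
func deriveUnloadEndpoint(baseURL, unloadAPI string) (string, error) {
	if unloadAPI == "" {
		unloadAPI = "/_unload" // the documented default
	}
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parse base_url: %w", err)
	}
	u.Path = unloadAPI
	u.RawQuery = ""
	return u.String(), nil
}
```

With the example config above, the `/engines/_unload` override would then resolve to `http://model-runner.docker.internal/engines/_unload`.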
Wiring
The unload runs at every agent-switch entry point: `SetCurrentAgent`, `swapCurrentAgent`, and `handleHandoff`.
Best-effort: each call is wrapped in a 10s timeout, providers that don't implement `Unloader` are silently skipped, and any error is logged but never propagated, so a slow or unreachable engine cannot break agent switching.
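Pieced together from the diff fragments quoted in the review above, the loop has roughly this shape; the `provider.Unloader`-style assertion and the nil skip are from the diff, while the logging and the surrounding signature are assumptions:

```go
package runtime

import (
	"context"
	"log/slog"
	"time"
)

// Unloader is the opt-in interface; providers that don't implement it are
// silently skipped. The exact signature is assumed.
type Unloader interface {
	Unload(ctx context.Context) error
}

// unloadOnSwitch sketches the best-effort semantics: a 10s timeout per call,
// errors logged but never propagated.
func unloadOnSwitch(models []any) {
	for _, m := range models {
		if m == nil {
			continue // defensive nil skip, as in the diff
		}
		unloader, ok := m.(Unloader)
		if !ok {
			continue // provider doesn't opt in to unloading
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		if err := unloader.Unload(ctx); err != nil {
			// A slow or unreachable engine logs a warning and nothing more,
			// so agent switching can never break on unload.
			slog.Warn("model unload failed", "error", err)
		}
		cancel()
	}
}
```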
Architecture
TUI freeze fix
`SetCurrentAgent` is called from the bubbletea Update loop, so a synchronous unload would freeze the TUI for up to 10s. The runtime spawns the unload in a goroutine for that path only. Fire-and-forget is safe because the new agent's model isn't loaded until the user sends a message, by which time the unload has almost certainly finished. The other two switch paths (`swapCurrentAgent`, `handleHandoff`) stay synchronous because a new model *is* loaded immediately there, and the unload should complete first.
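Concretely, the split between the paths can be pictured like this; the `go` statement matches the diff, while the `Runtime` and `Agent` types and the sync-path body are assumptions:

```go
package runtime

import "context"

type Agent struct{}

type Runtime struct{}

func (r *Runtime) unloadOnSwitch(ctx context.Context, prev, next *Agent) {
	// best-effort unload loop, sketched earlier
}

// SetCurrentAgent: called from the bubbletea Update loop, so the unload is
// fire-and-forget to keep the picker responsive.
func (r *Runtime) SetCurrentAgent(prev, next *Agent) error {
	go r.unloadOnSwitch(context.Background(), prev, next)
	return nil
}

// swapCurrentAgent: the next agent's model is loaded immediately, so the
// unload stays synchronous and completes first. handleHandoff is analogous.
func (r *Runtime) swapCurrentAgent(ctx context.Context, prev, next *Agent) {
	r.unloadOnSwitch(ctx, prev, next)
	// ... load the next agent's model ...
}
```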
Tests
Validation