
feat: unload local models when switching agents #2660

Closed
dgageot wants to merge 2 commits into docker:main from dgageot:unload-on-switch

Conversation

@dgageot (Member) commented May 6, 2026

Closes #2636.

What

Adds an opt-in mechanism for local inference engines (DMR, ollama, ramalama, ...) to release a model from memory when the active agent switches to one that uses a different model.

User-facing surface

Two new pieces of config (latest only — older versions are frozen):

  • `provider_opts.unload_on_switch: true` on a model — the runtime calls the engine's unload endpoint when switching away from any agent using this model.
  • `unload_api` on a `ProviderConfig` — the endpoint the runtime hits with `POST {scheme://host}{unload_api}` and a body of the form `{"model": "…"}` carrying the model name.

For DMR specifically, the default `/_unload` endpoint is auto-derived from the OpenAI base URL (mirroring how `_configure` is derived), so DMR users don't need to set `unload_api` unless they want to override it.

```yaml
providers:
  local_dmr:
    provider: dmr
    base_url: http://model-runner.docker.internal/engines/llama.cpp/v1
    unload_api: /engines/_unload # optional for DMR; defaults to /_unload

models:
  qwen:
    provider: local_dmr
    model: ai/qwen3
    provider_opts:
      unload_on_switch: true
```

A runnable example lives at `examples/unload_on_switch.yaml`.

Wiring

The unload runs at every agent-switch entry point:

  • `swapCurrentAgent` (transfer_task forward + return)
  • `handleHandoff` (handoff)
  • `SetCurrentAgent` (TUI agent picker — async, see below)

Best-effort: each call is wrapped in a 10s timeout, providers that don't implement `Unloader` are silently skipped, and any error is logged but never propagated, so a slow or unreachable engine cannot break agent switching.

Architecture

  • `latest.ProviderConfig.UnloadAPI` — new field; merged into `ModelConfig.ProviderOpts["unload_api"]` so provider implementations don't need a back-reference to the parent config.
  • `latest.ModelConfig.UnloadOnSwitch()` / `UnloadAPI()` — small accessor methods next to the existing `DisplayOrModel()`.
  • `provider.Unloader` — optional interface in `pkg/model/provider/provider.go` (sibling of `RerankingProvider`).
  • `base.PostUnloadModel` / `base.JoinHostAndPath` — shared HTTP + URL helpers in `pkg/model/provider/base`. DMR and OpenAI both already import `base`, so no new package and no import cycle.
  • `dmr.Client.Unload` — DMR-specific URL derivation (`/v1` → `/_unload`) plus the shared HTTP helper.
  • `openai.Client.Unload` — no-op when no `unload_api` is configured (cloud providers don't have one).
  • `runtime.LocalRuntime.unloadOnSwitch` — iterates the previous agent's configured models and calls `Unload` on opted-in ones.

TUI freeze fix

`SetCurrentAgent` is called from the bubbletea Update loop, so a synchronous unload would freeze the TUI for up to 10s. The runtime spawns the unload in a goroutine for that path only. Fire-and-forget is safe there because the new agent's model isn't loaded until the user sends a message, by which time the unload has almost certainly finished. The other two switch paths (`swapCurrentAgent`, `handleHandoff`) stay synchronous because a new model is loaded immediately on those paths and we want the unload to complete first.

Tests

  • `pkg/config/latest/unload_test.go` — `UnloadOnSwitch` / `UnloadAPI` accessors.
  • `pkg/model/provider/base/unload_test.go` — URL resolution + HTTP behaviour (happy path, non-2xx, nil client fallback).
  • `pkg/model/provider/dmr/unload_test.go` — default `_unload` derivation, custom path override, error propagation, no-op when nothing is configured.
  • `pkg/model/provider/openai/unload_test.go` — no-op without `unload_api`, happy path, error path.
  • `pkg/model/provider/provider_defaults_test.go` — three new cases for the `UnloadAPI` plumbing.
  • `pkg/runtime/unload_test.go` — opt-in respected, no-op when not opted in or same agent or nil prev, errors don't propagate, providers without `Unloader` are skipped, and `TestSetCurrentAgent_UnloadIsAsync` (uses a blocking unloader) asserts the picker returns before `Unload` completes. Runs cleanly under `-race`.

Validation

  • `mise build` ✓
  • `mise lint` ✓ (`golangci-lint` and the in-tree `./lint` cop both clean)
  • `mise test` ✓

dgageot added 2 commits May 6, 2026 11:42
- SetCurrentAgent runs Unload in a goroutine so the TUI agent picker,
  which calls it from the bubbletea Update loop, isn't frozen while
  the engine acknowledges the unload (up to 10s).
- unloadOnSwitch now skips nil providers defensively.
- New TestSetCurrentAgent_UnloadIsAsync asserts the picker returns
  before Unload completes.
@dgageot dgageot requested a review from a team as a code owner May 6, 2026 12:05
@rumpl (Member) commented May 6, 2026

Why isn't this implemented as a hook?

@dgageot (Member, Author) commented May 6, 2026

@rumpl I'll give it a try

@aheritier (Contributor) left a comment

Overall LGTM — clean architecture, safe defaults, and comprehensive tests. Left a few inline nits below.

```go
		return err
	}
	return base.PostUnloadModel(ctx, httpclient.NewHTTPClient(ctx), endpoint, c.ModelConfig.Model)
}
```

This creates a fresh *http.Client on every Unload call, which means any http_headers configured in provider_opts won't be forwarded to the unload endpoint. For the current use-case (unauthenticated local engines) this is fine, but it's asymmetric with the DMR path that reuses c.httpClient.

Worth a short comment so future maintainers don't wonder why custom headers aren't forwarded here:

```go
// httpclient.NewHTTPClient is used instead of reusing the SDK client because
// the openai.Client wraps its transport and there's no clean way to extract a
// raw *http.Client from it. For local engines (the only use-case for unload_api)
// this is fine — they typically don't require auth headers.
```

Comment thread pkg/runtime/unload.go
```go
unloader, ok := m.(provider.Unloader)
if !ok {
	continue
}
```

Minor: cancel() is called after Unload returns, which is correct, but defer cancel() would be the idiomatic Go pattern and would also handle a potential panic inside Unload without leaking the timer goroutine. Up to you — the current form is functionally fine since Unload implementations don't panic.

Comment thread pkg/runtime/unload.go
```go
		return
	}
	for _, m := range prev.ConfiguredModels() {
		if m == nil {
```

Nit: FallbackModels() isn't iterated here. That's probably intentional — a fallback model may never have been loaded — but worth a one-liner comment to make the deliberate choice clear:

```go
// Only ConfiguredModels are considered; FallbackModels are skipped because
// they may never have been loaded and unloading an absent model is harmless
// but wasteful.
```

Comment thread pkg/runtime/runtime.go
```go
	go r.unloadOnSwitch(context.Background(), prev, next)
	return nil
}
```


If the TUI agent picker fires two rapid switches (A→B→C), two goroutines can be in flight simultaneously: one unloading A, one unloading B. If A and C happen to share a model, that model gets unloaded just before C needs it, causing an extra reload. Not a correctness issue given the best-effort semantics, but worth a comment so it's an acknowledged trade-off rather than an oversight.

@aheritier added labels on May 6, 2026: kind/feat (PR adds a new feature, maps to the feat: commit prefix), area/providers/docker-model-runner (Docker Model Runner local inference), area/providers (LLM providers: Bedrock, LiteLLM, Qwen, custom, etc.), priority:medium (normal priority, standard sprint work)
@aheritier (Contributor) left a comment

LGTM. Clean architecture: optional Unloader interface, shared HTTP/URL helpers in base, best-effort semantics with 10s timeout, the async path for the TUI picker is a good catch and well-justified.

The four inline nits I left earlier are all non-blocking — feel free to address them in a follow-up if you prefer:

  • defer cancel() in unloadOnSwitch (idiom)
  • comment about ConfiguredModels() only, skipping FallbackModels()
  • comment about the rapid A→B→C switch race
  • comment about why the OpenAI path can't reuse the SDK client

CI all green, comprehensive test coverage including -race.

@dgageot (Member, Author) commented May 7, 2026

@rumpl do you prefer this one? #2684

@dgageot (Member, Author) commented May 7, 2026

Closing in favour of #2684

@dgageot dgageot closed this May 7, 2026


Development

Successfully merging this pull request may close these issues.

Unload Model from Memory on Agent Switch for Local Providers

3 participants