fix(tui): bound previously-unbounded caches to prevent OOM on long sessions by dgageot · Pull Request #2831 · docker/docker-agent

dgageot · 2026-05-20T09:17:16Z

Summary

Bounds three previously-unbounded in-memory caches in the TUI to prevent
memory growth (and eventual OOM) on long sessions:

#	Cache	Where	Cap
1	`cache` (edit_file render results, keyed by tool call ID)	`pkg/tui/components/tool/editfile/render.go`	64 entries
2	`renderedItems` (per-message rendered output)	`pkg/tui/components/messages/messages.go`	500 entries
3	`syntaxHighlightCache` (already capped at 128 — moved to shared package)	`pkg/tui/components/markdown/fast_renderer.go`	128 entries

To do this cleanly, the small private lruCache that already lived in the
markdown package is promoted to a new shared package, pkg/lrucache,
with proper tests and docs.

Why

On long coding sessions the agent accumulates many edit_file tool calls
and many rendered messages. The two caches above were map[K]Vs with no
eviction, so memory grew linearly with session length × output size. This
matches reports of cagent/docker-agent getting OOM-killed on long
sessions. The new caps trade a small amount of re-render CPU on misses for
hard memory bounds.

What's in `pkg/lrucache`

A small generic LRU[K, V] (~100 lines) backed by container/list:

New(maxSize) — clamped to ≥ 1
Get / Put / Delete / Clear / Len / Range
100% test coverage (testify)
Documented as not thread-safe — callers provide synchronization

Bonus: pre-existing data race fix

While running the test suite under -race, I uncovered an existing race in
the markdown syntax-highlight cache (and the same pattern in the new
editfile call site I'd just written): both used sync.RWMutex.RLock() to
call LRU.Get(), but Get mutates the recency list. Multiple goroutines
holding read locks would race on MoveToFront.

Fixed by:

Switching both call sites from sync.RWMutex to sync.Mutex
Documenting the gotcha on lrucache.Get's doc comment

task test (no -race) doesn't catch this, so it would have slipped
through CI. Now go test -race ./... is clean for the touched packages.

What it does NOT do

The conversation history (session.Session.Messages) itself is still
append-only and grows with session length. That's by design — the model
needs full context. A future PR could add summarization/compaction.
A few other map-backed caches in the TUI (e.g. styleSeqCache) are
naturally bounded by their key domain (fixed set of styles, languages,
chroma token types) and don't need this treatment.

Validation

✅ task lint — 0 issues across 1045 files
✅ task test — passes for all touched packages
✅ go test -race ./pkg/lrucache/... ./pkg/tui/... — clean
✅ 100% test coverage on pkg/lrucache

(Pre-existing failures in pkg/teamloader/TestLoadExamples for dmr.yaml
and unload_on_switch.yaml require Docker Model Runner running locally —
unrelated to this change, fail identically on origin/main.)

creates pkg/lrucache with a small generic LRU cache to prevent memory growth in long-running sessions. replaces the private lruCache in markdown with the shared package, and bounds the previously-unbounded render cache (64 entries) and renderedItems cache (500 entries) in editfile and messages components. also fixes a pre-existing data race in the syntax highlight and editfile caches: LRU.Get mutates the recency list, so call sites must hold a write lock, not a read lock. switches both call sites from sync.RWMutex to sync.Mutex and documents the gotcha on lrucache.Get.

docker-agent

Assessment: 🟢 APPROVE

docker-agent · 2026-05-20T09:27:51Z

@@ -71,37 +81,29 @@ type linePair struct {
 }

 func getOrCreateCache(toolCallID string) *toolRenderCache {


[LOW] Rendered result written to orphaned struct after LRU eviction

getOrCreateCache releases cacheMu before returning its pointer. If the LRU later evicts that entry (because 64 other tool-call IDs are Put between the two lock acquisitions in renderEditFile/countDiffLines), a subsequent getOrCreateCache for the same ID allocates a fresh *toolRenderCache. The original goroutine still holds the old pointer and writes c.rendered = result to the now-orphaned struct — the result never lands in the LRU and is lost on the next lookup.

Impact: Low — requires 64 concurrent evictions between two lock acquisitions in what is typically a single rendering goroutine. The consequence is an extra recomputation (cache miss), not data corruption or a crash. In practice the cap of 64 is well above the number of simultaneous active edit-file calls in a typical session.

Possible fix: After computing the result, call cache.Put(toolCall.ID, c) instead of (or in addition to) writing directly to c, so the rendered data is always associated with the live LRU entry, even if the old pointer was evicted.

dgageot requested a review from a team as a code owner May 20, 2026 09:17

docker-agent reviewed May 20, 2026

View reviewed changes

aheritier approved these changes May 20, 2026

View reviewed changes

dgageot merged commit 16d940f into docker:main May 20, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(tui): bound previously-unbounded caches to prevent OOM on long sessions#2831

fix(tui): bound previously-unbounded caches to prevent OOM on long sessions#2831
dgageot merged 1 commit into
docker:mainfrom
dgageot:board/b821136e5f51759f

dgageot commented May 20, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -71,37 +81,29 @@ type linePair struct {
		}

		func getOrCreateCache(toolCallID string) *toolRenderCache {

Conversation

dgageot commented May 20, 2026

Summary

Why

What's in pkg/lrucache

Bonus: pre-existing data race fix

What it does NOT do

Validation

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟢 APPROVE

Uh oh!

docker-agent May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

What's in `pkg/lrucache`