
Nil pointer dereference (SIGSEGV) in BM25Strategy.watchLoop during session teardown #2326

@aheritier

Description

A nil pointer dereference panic occurs in BM25Strategy.watchLoop when a session is torn down while the file watcher goroutine is still running. The same bug exists in VectorStore.watchLoop.

Crash

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x10 pc=0x104048510]

goroutine 27441 [running]:
github.com/docker/docker-agent/pkg/rag/strategy.(*BM25Strategy).watchLoop(...)
    /Users/arnaud/Workspace/dev/docker/docker-agent/pkg/rag/strategy/bm25.go:738 +0x130
created by github.com/docker/docker-agent/pkg/rag/strategy.(*BM25Strategy).StartFileWatcher
    /Users/arnaud/Workspace/dev/docker/docker-agent/pkg/rag/strategy/bm25.go:401 +0x314
  • Version: v2.49.0
  • Observed on: voxxedlu2026, devoxxpl26 (2 parallel TUI sessions using the same agent config)

Root cause

There is a data race between watchLoop and Close() on the s.watcher field.

The race sequence

  1. watchLoop (goroutine 27441) is running its for/select loop, reading from s.watcher.Events (bm25.go:738) and s.watcher.Errors.
  2. A session tears down: StopToolSets → RAGTool.Stop → Manager.Close → BM25Strategy.Close.
  3. Close() acquires watcherMu, calls s.watcher.Close() (which closes the Events and Errors channels), sets s.watcher = nil, and releases the mutex.
  4. The watchLoop goroutine goes back to the top of its for loop without exiting, for example because it was still handling an earlier event when Close() ran. Go evaluates every channel operand of a select statement each time the statement is entered, so the next iteration re-reads s.watcher.Events and s.watcher.Errors.
  5. Since s.watcher is now nil, accessing .Events (at struct offset 0x10) dereferences a nil pointer → SIGSEGV.
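The interleaving in steps 3 to 5 can be replayed deterministically with stand-in types. This is a minimal sketch, assuming hypothetical fakeWatcher and strategy types that model only the two channel fields and watcherMu. It runs the original select pattern once before and once after a simulated Close(), and recovers the resulting nil-pointer panic instead of crashing.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher; only the channels matter here.
type fakeWatcher struct {
	Events chan string
	Errors chan error
}

// strategy is a hypothetical stand-in for the relevant BM25Strategy fields.
type strategy struct {
	watcherMu sync.Mutex
	watcher   *fakeWatcher
}

// closeWatcher follows the Close() sequence from step 3:
// lock, close both channels, nil the field, unlock.
func (s *strategy) closeWatcher() {
	s.watcherMu.Lock()
	defer s.watcherMu.Unlock()
	close(s.watcher.Events)
	close(s.watcher.Errors)
	s.watcher = nil
}

// buggyIteration runs one pass of the original pattern: the select
// dereferences s.watcher every time it is entered.
func buggyIteration(s *strategy) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true // nil pointer dereference while evaluating s.watcher.Events
		}
	}()
	select {
	case <-s.watcher.Events:
	case <-s.watcher.Errors:
	}
	return false
}

func main() {
	s := &strategy{watcher: &fakeWatcher{Events: make(chan string, 1), Errors: make(chan error)}}
	s.watcher.Events <- "write" // one pending event for the first iteration
	fmt.Println("first iteration panicked:", buggyIteration(s))
	s.closeWatcher() // Close() runs between two loop iterations
	fmt.Println("second iteration panicked:", buggyIteration(s))
}
```

In the real crash the two steps happen on different goroutines, so the failure depends on timing; the sketch only fixes one losing interleaving in place.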

Why addr=0x10

The Events channel field is at offset 0x10 within the fsnotify.Watcher struct. Dereferencing nil + 0x10 produces the observed fault address.

Why watchLoop does not hold the mutex

The watchLoop goroutine does not hold watcherMu when accessing s.watcher.Events and s.watcher.Errors in the select statement. Only the Create event handler correctly acquires the mutex before calling s.addPathToWatcher. The main select channels are unprotected.

Affected code

Both strategies have the identical pattern:

  • pkg/rag/strategy/bm25.go:738: case event, ok := <-s.watcher.Events:
  • pkg/rag/strategy/vector_store.go:932: case event, ok := <-s.watcher.Events:

Both Close() implementations (bm25.go:409-427, vector_store.go:473-491) set s.watcher = nil after closing.

Reproduction conditions

The crash occurs when a session is torn down while the watchLoop goroutine is alive. Running 2 parallel TUI sessions with RAG configured increases the likelihood because one session tearing down (tab close, /new, agent file reload) triggers Close() while the watcher goroutine is blocked in select. However, a single session can also trigger it on exit with the right timing.

Suggested fix

Capture the watcher's channels as local variables before entering the for loop, so the goroutine never reads s.watcher after initialization:

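func (s *BM25Strategy) watchLoop(ctx context.Context, docPaths []string) {
    // Capture the channels once, under watcherMu: Close() may already have run
    // before this goroutine was scheduled, leaving s.watcher nil.
    s.watcherMu.Lock()
    if s.watcher == nil {
        s.watcherMu.Unlock()
        return
    }
    events := s.watcher.Events
    errs := s.watcher.Errors // named errs to avoid shadowing the errors package
    s.watcherMu.Unlock()
    // ...
    for {
        select {
        case <-ctx.Done():
            // ...
        case event, ok := <-events: // no longer touches s.watcher
            if !ok {
                return // channel closed by Close()
            }
            // ...
        case err, ok := <-errs: // no longer touches s.watcher
            if !ok {
                return // channel closed by Close()
            }
            // ...
        }
    }
}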

The same fix should be applied to VectorStore.watchLoop.
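The snapshot pattern can be exercised against a concurrent teardown with stand-in types. This is a minimal sketch, assuming hypothetical fakeWatcher and strategy types (not the real BM25Strategy or fsnotify API): each round pre-fills some events, then starts watchLoop and Close() on separate goroutines. An unrecovered panic in either goroutine would crash the program, so finishing every round suggests, without proving, that the loop never dereferences a nil watcher. Running it with go run -race also checks that s.watcher is only accessed under the lock.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher.
type fakeWatcher struct {
	Events chan string
	Errors chan error
}

// strategy is a hypothetical stand-in for the relevant BM25Strategy fields.
type strategy struct {
	watcherMu sync.Mutex
	watcher   *fakeWatcher
}

// Close mirrors the described teardown: close channels, nil the field, all under the lock.
func (s *strategy) Close() {
	s.watcherMu.Lock()
	defer s.watcherMu.Unlock()
	if s.watcher == nil {
		return // already closed
	}
	close(s.watcher.Events)
	close(s.watcher.Errors)
	s.watcher = nil
}

// watchLoop uses the suggested fix: channels are captured once, under the lock.
func (s *strategy) watchLoop(ctx context.Context) {
	s.watcherMu.Lock()
	if s.watcher == nil {
		s.watcherMu.Unlock()
		return // Close() won the race before the loop started
	}
	events, errs := s.watcher.Events, s.watcher.Errors
	s.watcherMu.Unlock()

	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-events:
			if !ok {
				return // drained and closed
			}
		case _, ok := <-errs:
			if !ok {
				return
			}
		}
	}
}

// stress runs watchLoop against a concurrent Close() and counts completed rounds.
func stress(rounds int) int {
	completed := 0
	for i := 0; i < rounds; i++ {
		s := &strategy{watcher: &fakeWatcher{Events: make(chan string, 8), Errors: make(chan error)}}
		for j := 0; j < 8; j++ {
			s.watcher.Events <- "write" // buffered events still pending at teardown
		}
		var wg sync.WaitGroup
		wg.Add(2)
		go func() { defer wg.Done(); s.watchLoop(context.Background()) }()
		go func() { defer wg.Done(); s.Close() }() // session teardown races with the loop
		wg.Wait()
		completed++
	}
	return completed
}

func main() {
	fmt.Println("rounds completed without panic:", stress(1000))
}
```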

The addPathToWatcher call on Create events still needs the live s.watcher, and that access is already guarded by watcherMu. It should additionally check s.watcher == nil while holding the lock before calling addPathToWatcher, so that a Create event handled after Close() has run becomes a no-op instead of using a nil watcher.
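The nil check for the Create path can look like the following. This is a sketch with hypothetical names: addPathIfOpen stands in for the guarded call site, and fakeWatcher.Add only records the path, loosely modelled on fsnotify's Add(path string) error. The lock is held across both the nil check and the Add call, so Close() cannot nil the field in between.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in; Add records the path like a watch registration.
type fakeWatcher struct{ paths []string }

func (w *fakeWatcher) Add(path string) error {
	w.paths = append(w.paths, path)
	return nil
}

// strategy is a hypothetical stand-in for the relevant BM25Strategy fields.
type strategy struct {
	watcherMu sync.Mutex
	watcher   *fakeWatcher
}

// addPathIfOpen (hypothetical) registers path only while the watcher is still open.
func (s *strategy) addPathIfOpen(path string) error {
	s.watcherMu.Lock()
	defer s.watcherMu.Unlock()
	if s.watcher == nil {
		return nil // Close() already ran: nothing to watch any more
	}
	return s.watcher.Add(path)
}

func main() {
	fw := &fakeWatcher{}
	s := &strategy{watcher: fw}
	_ = s.addPathIfOpen("docs/new.md")

	s.watcherMu.Lock()
	s.watcher = nil // the state Close() leaves behind
	s.watcherMu.Unlock()

	_ = s.addPathIfOpen("docs/later.md") // no-op instead of a nil dereference
	fmt.Println("registered paths:", fw.paths)
}
```

Returning nil keeps the late event silent; returning a sentinel error instead would let the caller log that the watcher was already closed.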

Metadata

Labels

  • area/rag: For work/issues that have to do with the RAG features
  • kind/bug: Something isn't working