feat(memory): Atomic writes and cross-process locking for memory DB access #3022

@hamza-jeddad

Background

Sub-issue of #3011.

docker-agent can run multiple instances concurrently (CLI sessions in cmd/root/chat.go and run.go, gateway workers in pkg/gateway/ and pkg/chatserver/, cron and API agents), all sharing the same SQLite memory database accessed through pkg/memory/database/sqlite/. The current write path opens a transaction, but it neither serialises concurrent writers at the OS level nor guarantees that readers always see a fully committed state.

Two failure modes:

  1. Torn reads: a reader sees a partially written row during a slow write.
  2. Lost updates: two concurrent writers both read generation N, both decide to write, and one silently overwrites the other's changes. The drift guard (#TBD-C, drift detection) never gets a chance to fire, because both writers read the same generation before either committed.
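The lost-update race can be replayed deterministically on a toy model. In this sketch, memDB, snapshot, commit and lostUpdate are hypothetical names, the in-memory slice stands in for the memory table, and the starting generation 7 is arbitrary; only the ordering matters. Both writers run their drift check before either commits, so both checks pass and writer B's commit erases writer A's note.

```go
package main

// memDB is a toy, in-memory stand-in for the memory table (hypothetical,
// not docker-agent code). gen plays the role of the generation counter.
type memDB struct {
	gen   int
	notes []string
}

// snapshot is what a writer reads before deciding to write.
func (db *memDB) snapshot() (int, []string) {
	return db.gen, append([]string(nil), db.notes...)
}

// commit writes back the whole note list and bumps the generation.
func (db *memDB) commit(notes []string) {
	db.notes = notes
	db.gen++
}

// lostUpdate replays failure mode 2 step by step. The drift check compares
// the generation each writer saw with the current one, but it runs before
// either writer commits, so it passes for both.
func lostUpdate() ([]string, int) {
	db := &memDB{gen: 7, notes: []string{"existing"}}

	genA, notesA := db.snapshot() // writer A reads generation 7
	genB, notesB := db.snapshot() // writer B also reads generation 7

	okA := db.gen == genA // drift check A passes
	okB := db.gen == genB // drift check B also passes: nothing committed yet

	if okA {
		db.commit(append(notesA, "from A")) // generation 8
	}
	if okB {
		db.commit(append(notesB, "from B")) // generation 9; "from A" is gone
	}
	return db.notes, db.gen
}
```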

The fix is a two-layer approach: SQLite WAL + busy timeout for in-process safety, plus an fcntl/LockFileEx advisory lock for cross-process serialisation of read-modify-write cycles.

Proposed design

1. SQLite WAL mode + busy timeout

Enable Write-Ahead Logging and a generous busy timeout on every connection opened to the memory DB in pkg/memory/database/sqlite/:

if _, err := db.Exec("PRAGMA journal_mode=WAL"); err != nil { return err }
if _, err := db.Exec("PRAGMA busy_timeout=5000"); err != nil { return err } // 5 s

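WAL lets readers run concurrently with a single writer: readers do not block the writer and the writer does not block readers, although only one write transaction can be active at a time. The busy timeout makes a connection that finds the database locked retry for up to 5 s instead of failing immediately with SQLITE_BUSY. Note that journal_mode=WAL is persisted in the database file, whereas busy_timeout applies only to the connection that ran it; because database/sql pools connections, a single db.Exec does not reach every connection, so the PRAGMAs must be applied per connection (through the driver's DSN options or a connection hook).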

2. Advisory file lock for multi-process write serialisation

For the write paths that require read-modify-write atomicity (add, update, delete with drift-check), acquire an exclusive advisory lock on a companion .lock file before the read-generation / write cycle:

// pkg/memory/database/lock.go
type FileLock struct { … }

func (l *FileLock) Lock() error   { … } // fcntl F_SETLKW on Linux/macOS; LockFileEx on Windows
func (l *FileLock) Unlock() error { … }

Lock path: <data_dir>/memory.lock.

The lock file is never deleted (avoids TOCTOU); its existence is benign.

3. Atomic snapshot export (for drift backups)

When the drift-detection guard (sibling sub-issue C) exports a .bak file, it must write to a temp file in the same directory and rename it into place atomically. Reuse pkg/atomicfile/ if its API suits, otherwise:

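tmp, err := os.CreateTemp(dir, ".mem_backup_*.json.tmp") // same dir as finalPath, so the rename never crosses filesystems
if err != nil { return err }
defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after a successful rename
// … write JSON …
if err := tmp.Sync(); err != nil { tmp.Close(); return err }
if err := tmp.Close(); err != nil { return err }
return os.Rename(tmp.Name(), finalPath) // atomic on POSIX; best-effort on Windows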

This prevents a concurrent reader from seeing a half-written backup.

4. Connection pool limits

Limit the SQLite connection pool to one writer connection while allowing multiple reader connections. With database/sql this means two handles: a writer *sql.DB configured with db.SetMaxOpenConns(1), and a separate reader pool.

5. Cross-platform support

  • Linux / macOS: fcntl(2) F_SETLKW (blocking exclusive lock).
  • Windows: LockFileEx with LOCKFILE_EXCLUSIVE_LOCK.
  • Fallback (neither available): proceed without an OS-level lock but log a warning; SQLite WAL + busy timeout still provide best-effort safety.

The repo already uses build-tagged lock files in pkg/cache/ (lock_unix.go, lock_windows.go, lock_js.go) — follow the same pattern.
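A sketch of what the no-op fallback variant could look like (the actual pkg/cache code is not reproduced here). It warns once via log/slog, which is an assumption about the logging package, and otherwise returns nil. When choosing file names, note that Go treats only GOOS/GOARCH suffixes such as _windows.go or _js.go as implicit build constraints; a _unix.go suffix is not one, so that file needs an explicit //go:build line. The Windows file would call LockFileEx, for example windows.LockFileEx from golang.org/x/sys/windows with windows.LOCKFILE_EXCLUSIVE_LOCK (not shown).

```go
// In the repo this would live in lock_js.go, which Go only builds for
// GOOS=js because of its _js.go suffix; here it builds on any platform.
package main

import (
	"log/slog"
	"sync"
)

// FileLock is the fallback for platforms with neither fcntl nor LockFileEx.
// It provides no cross-process exclusion, so it warns once and otherwise
// relies on SQLite WAL + busy_timeout.
type FileLock struct {
	path string
	warn sync.Once
}

// NewFileLock mirrors the constructor the other variants would expose
// (hypothetical name); it does not touch the filesystem.
func NewFileLock(path string) (*FileLock, error) {
	return &FileLock{path: path}, nil
}

// Lock logs a one-time warning and always succeeds.
func (l *FileLock) Lock() error {
	l.warn.Do(func() {
		slog.Warn("no OS-level file lock on this platform; cross-process memory writes are not serialised",
			"lock_path", l.path)
	})
	return nil
}

// Unlock is a no-op.
func (l *FileLock) Unlock() error { return nil }
```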

Implementation checklist

  • pkg/memory/database/sqlite/db.go: set PRAGMA journal_mode=WAL and PRAGMA busy_timeout=5000 on connection open
  • pkg/memory/database/lock_unix.go / lock_windows.go / lock_js.go: FileLock with Lock() / Unlock(); cross-platform (fcntl / LockFileEx / no-op fallback)
  • pkg/tools/builtin/memory/: acquire FileLock before the read-generation → drift-check → write cycle in add_memory, update_memory, delete_memory; release it in a defer
  • pkg/memory/database/backup.go: atomic temp-file + rename for snapshot export (used by the sub-issue C drift guard); consider reusing pkg/atomicfile/
  • db.SetMaxOpenConns(1) on the writer connection; separate read pool
  • Unit tests: concurrent goroutines writing to the same DB; assert no torn reads, no lost updates; drift guard fires correctly under contention
  • go test -race passes
  • Windows CI: confirm the LockFileEx path compiles and passes a basic lock/unlock round-trip test
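The concurrency test in the checklist can be prototyped against a toy in-memory store before wiring in SQLite. In this sketch, store, currentGen, addMemory and hammer are hypothetical names, and a sync.Mutex stands in for FileLock. Each goroutine reads the generation, then add_memory re-checks it under the lock; a stale generation trips the drift guard and the goroutine retries, so every successful add survives.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// errDrift is what the drift guard reports when the generation moved.
var errDrift = errors.New("memory generation changed since it was read")

// store is a toy, in-memory stand-in for the memory table (hypothetical).
type store struct {
	gen   int
	notes []string
}

// currentGen reads the generation under the lock, as an agent would before
// deciding to add a memory.
func currentGen(lock sync.Locker, s *store) int {
	lock.Lock()
	defer lock.Unlock()
	return s.gen
}

// addMemory models the checklist's cycle: take the lock, re-read the
// generation, run the drift check, write, and release the lock in a defer.
func addMemory(lock sync.Locker, s *store, seenGen int, note string) error {
	lock.Lock()
	defer lock.Unlock()
	if s.gen != seenGen {
		return errDrift // someone committed after seenGen was read
	}
	s.notes = append(s.notes, note)
	s.gen++
	return nil
}

// hammer runs n goroutines that each add one note, retrying after drift.
// It returns the final store and how many times the drift guard fired.
func hammer(n int) (*store, int64) {
	var (
		mu     sync.Mutex // stands in for FileLock
		s      store
		wg     sync.WaitGroup
		drifts atomic.Int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			note := fmt.Sprintf("note-%d", i)
			for {
				seen := currentGen(&mu, &s)
				if err := addMemory(&mu, &s, seen, note); err == nil {
					return
				}
				drifts.Add(1) // only errDrift is possible here; re-read and retry
			}
		}(i)
	}
	wg.Wait()
	return &s, drifts.Load()
}
```

Run it with go test -race. The real test would swap the store for the SQLite DB and the mutex for FileLock; the drift count is timing-dependent, so asserting on its exact value would be flaky.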

Acceptance criteria

  • WAL mode is set on every connection opened to the memory DB
  • busy_timeout=5000 prevents immediate SQLITE_BUSY errors under normal write contention
  • Concurrent write goroutines never produce a torn read or a silently lost update
  • The .lock file approach serialises cross-process writers on Linux/macOS and Windows
  • Atomic backup export never leaves a partial .bak file visible to readers
  • go test -race passes on pkg/memory/database/ and pkg/tools/builtin/memory/

Metadata

Assignees

No one assigned

Labels

  • area/agent: For work that has to do with the general agent loop/agentic features of the app
  • area/rag: For work/issues that have to do with the RAG features
  • area/tools: For features/issues/fixes related to the usage of built-in and MCP tools
