BeginBulkWrite() disables crash safety — SIGKILL during index permanently corrupts DB #67

@halindrome

Description

Summary

BeginBulkWrite() switches the SQLite journal mode to MEMORY and synchronous to OFF for the duration of indexing. In MEMORY mode the rollback journal lives only in process memory, so if the CMM process is killed (SIGKILL, OOM, user closing the terminal) before EndBulkWrite() restores WAL mode, the partially written B-tree pages already in the main DB file cannot be rolled back. The database is unrecoverable without deleting and re-indexing.

Affected code

internal/store/store.go:

// line 223-225
func (s *Store) BeginBulkWrite(ctx context.Context) {
    _, _ = s.db.ExecContext(ctx, "PRAGMA journal_mode = MEMORY") // rollback journal kept in RAM only
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = OFF")     // no fsync; errors are discarded
}

// line 230-233
func (s *Store) EndBulkWrite(ctx context.Context) {
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = NORMAL")
    _, _ = s.db.ExecContext(ctx, "PRAGMA journal_mode = WAL") // never reached if the process is killed
}

Reproduction

  1. Start indexing a large repository (>50k files) with CMM
  2. Kill the process mid-index (kill -9 <pid>, OOM, or just close the terminal)
  3. Next session: any search_graph call returns search: database disk image is malformed

What we observed

  • DB at ~/.cache/codebase-memory-mcp/<project>.db was 125MB, 93933 edges
  • PRAGMA integrity_check returned dozens of "Tree XXXX page YYYY cell N: 2nd reference to page ZZZZ" errors — classic interrupted B-tree write
  • Specifically idx_edges_url_path (a large B-tree index on edges.url_path) was corrupted
  • No -wal or -shm companion files existed — WAL mode had already been abandoned when the crash occurred in MEMORY journal mode
  • SELECT count(*) FROM nodes and FROM edges both fail — core tables unreadable
  • Only fix: delete_project + index_repository to rebuild from scratch

Why it matters

MEMORY journal mode keeps the rollback journal in RAM while SQLite writes modified pages directly to the main DB file, so nothing on disk can be used to roll back if the process exits abnormally. For large codebases that take minutes to index, the probability of an OOM kill, user interrupt, or power loss during that window is not negligible.

WAL mode (the store's normal mode outside bulk writes; it is not SQLite's built-in default) is crash-safe: an interrupted write leaves uncommitted frames in the -wal file, which SQLite ignores on next open. Switching to MEMORY mode removes this safety.

Suggested fix

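Remove the journal_mode = MEMORY switch in BeginBulkWrite(). WAL mode with a larger cache_size should provide nearly the same bulk-write throughput while remaining safe against a killed process. The snippet below keeps synchronous = OFF, which in WAL mode survives a process crash but can still corrupt the database on power loss or an OS crash; use synchronous = NORMAL if that case must be covered too:

func (s *Store) BeginBulkWrite(ctx context.Context) {
    // Keep WAL mode: switching to MEMORY disables crash recovery
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = OFF")
    _, _ = s.db.ExecContext(ctx, "PRAGMA cache_size = -65536") // 64 MiB page cache (negative value = KiB)
}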

If the MEMORY journal speedup is significant enough to keep, an alternative is the atomic swap pattern: index into a temp .db file, rename over the old file only on successful completion. This preserves the old DB if indexing is interrupted.

Environment

  • macOS 15.x (Darwin 25.2.0)
  • CMM version: from ../codebase-memory-mcp local build
  • Project: large Perl monorepo (~93k edges)
