BeginBulkWrite() disables crash safety — SIGKILL during index permanently corrupts DB #67

## Summary

`BeginBulkWrite()` switches the SQLite journal mode to `MEMORY` and sets `synchronous` to `OFF` for the duration of indexing. If the CMM process is killed (SIGKILL, OOM, user closing the terminal) before `EndBulkWrite()` restores WAL mode, partially written B-tree pages may already be in the main DB file, while the rollback journal that could undo them existed only in the killed process's memory. The database is then unrecoverable without deleting and re-indexing.

## Affected code

`internal/store/store.go`:

```go
// line 223-225
func (s *Store) BeginBulkWrite(ctx context.Context) {
    _, _ = s.db.ExecContext(ctx, "PRAGMA journal_mode = MEMORY")
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = OFF")
```

```go
// line 230-233
func (s *Store) EndBulkWrite(ctx context.Context) {
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = NORMAL")
    _, _ = s.db.ExecContext(ctx, "PRAGMA journal_mode = WAL")
```

## Reproduction

1. Start indexing a large repository (>50k files) with CMM
2. Kill the process mid-index (`kill -9 <pid>`, OOM, or just close the terminal)
3. Next session: any `search_graph` call returns `search: database disk image is malformed`

## What we observed

- DB at `~/.cache/codebase-memory-mcp/<project>.db` was 125MB, 93933 edges
- `PRAGMA integrity_check` returned dozens of `"Tree XXXX page YYYY cell N: 2nd reference to page ZZZZ"` errors — classic interrupted B-tree write
- Specifically `idx_edges_url_path` (a large B-tree index on `edges.url_path`) was corrupted
- No `-wal` or `-shm` companion files existed — WAL mode had already been abandoned when the crash occurred in MEMORY journal mode
- `SELECT count(*) FROM nodes` and `FROM edges` both fail — core tables unreadable
- Only fix: `delete_project` + `index_repository` to rebuild from scratch
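
The integrity check from the list above can be scripted so that a damaged DB is detected before the first `search_graph` call fails. The following is a minimal sketch, not CMM code: it assumes the `sqlite3` command-line shell is on `PATH`, and the names `parseIntegrityCheck` and `checkDB` are hypothetical. It relies on `PRAGMA integrity_check` printing a single `ok` row for a healthy database and one row per problem otherwise.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// parseIntegrityCheck interprets the rows printed by PRAGMA integrity_check.
// A single row "ok" means healthy; any other non-empty row is a problem.
// Empty output is treated as unhealthy, since no "ok" was reported.
func parseIntegrityCheck(output string) (healthy bool, problems []string) {
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		problems = append(problems, line)
	}
	if len(problems) == 1 && problems[0] == "ok" {
		return true, nil
	}
	return false, problems
}

// checkDB runs the sqlite3 shell (assumed to be installed) against path.
func checkDB(path string) (bool, []string, error) {
	// Stat first: the sqlite3 shell may otherwise create a missing file.
	if _, err := os.Stat(path); err != nil {
		return false, nil, err
	}
	out, err := exec.Command("sqlite3", path, "PRAGMA integrity_check;").CombinedOutput()
	if err != nil {
		return false, nil, fmt.Errorf("sqlite3: %w: %s", err, strings.TrimSpace(string(out)))
	}
	healthy, problems := parseIntegrityCheck(string(out))
	return healthy, problems, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: dbcheck <path-to-db>")
		return
	}
	healthy, problems, err := checkDB(os.Args[1])
	if err != nil {
		fmt.Println("check failed:", err)
		os.Exit(1)
	}
	if healthy {
		fmt.Println("ok")
		return
	}
	fmt.Printf("%d problem(s) reported; re-index needed\n", len(problems))
	for _, p := range problems {
		fmt.Println(" ", p)
	}
	os.Exit(1)
}
```

On a badly damaged file the shell itself may exit with an error instead of printing rows; the sketch reports that case as a failed check rather than as a list of problems.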

## Why it matters

In MEMORY journal mode SQLite keeps the rollback journal in RAM and writes modified pages straight to the main DB file during the transaction. If the process exits abnormally, the journal is lost with it and the half-applied transaction cannot be rolled back. For large codebases that take minutes to index, the probability of an abnormal exit (OOM kill, user interrupt, power loss) during that window is not negligible.

WAL mode (the mode CMM otherwise runs in; SQLite's own default is `DELETE`) is crash-safe against a killed process: an interrupted write leaves uncommitted frames in the WAL file, which SQLite ignores on next open. Switching to MEMORY mode removes this safety.

## Suggested fix

Remove the `journal_mode = MEMORY` switch in `BeginBulkWrite()`. WAL mode with a larger `cache_size` and `synchronous = NORMAL` provides nearly the same bulk-write throughput while remaining crash-safe:

```go
func (s *Store) BeginBulkWrite(ctx context.Context) {
    // Keep WAL mode — switching to MEMORY disables crash recovery
    // NORMAL, as described above: safe in WAL mode, whereas OFF risks corruption on power loss
    _, _ = s.db.ExecContext(ctx, "PRAGMA synchronous = NORMAL")
    _, _ = s.db.ExecContext(ctx, "PRAGMA cache_size = -65536") // 64MB page cache
}
```

If the MEMORY journal speedup is significant enough to keep, an alternative is the **atomic swap pattern**: index into a temp `.db` file, rename over the old file only on successful completion. This preserves the old DB if indexing is interrupted.
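
A minimal sketch of the atomic swap pattern, using only the Go standard library. The helper name `replaceAtomically` and the `project.db` file name are hypothetical, and plain files stand in for the SQLite database; in CMM the `build` callback would be the indexing run, which is an assumption about how it could be wired in, not a description of the current code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// replaceAtomically calls build with the path of a fresh temporary file in
// the same directory as dest, and renames that file over dest only if build
// returns nil. On failure the temporary file is removed and dest is untouched.
func replaceAtomically(dest string, build func(tmpPath string) error) error {
	// Same directory as dest, so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(dest), filepath.Base(dest)+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp file: %w", err)
	}
	tmpPath := tmp.Name()
	if err := tmp.Close(); err != nil {
		os.Remove(tmpPath)
		return fmt.Errorf("close temp file: %w", err)
	}
	if err := build(tmpPath); err != nil {
		os.Remove(tmpPath) // discard the partial result
		return fmt.Errorf("build: %w", err)
	}
	if err := os.Rename(tmpPath, dest); err != nil {
		os.Remove(tmpPath)
		return fmt.Errorf("swap into place: %w", err)
	}
	return nil
}

// demo exercises both outcomes in a scratch directory and returns the
// contents of dest after a failed build and after a successful one.
func demo() (string, string, error) {
	dir, err := os.MkdirTemp("", "swap-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)
	dest := filepath.Join(dir, "project.db")
	if err := os.WriteFile(dest, []byte("old index"), 0o644); err != nil {
		return "", "", err
	}

	// Interrupted build: writes partial data, then reports failure.
	_ = replaceAtomically(dest, func(p string) error {
		_ = os.WriteFile(p, []byte("partial"), 0o644)
		return errors.New("interrupted")
	})
	afterFailure, err := os.ReadFile(dest)
	if err != nil {
		return "", "", err
	}

	// Successful build: the new file replaces the old one.
	err = replaceAtomically(dest, func(p string) error {
		return os.WriteFile(p, []byte("new index"), 0o644)
	})
	if err != nil {
		return "", "", err
	}
	afterSuccess, err := os.ReadFile(dest)
	if err != nil {
		return "", "", err
	}
	return string(afterFailure), string(afterSuccess), nil
}

func main() {
	f, s, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("after failed build: %q\nafter successful build: %q\n", f, s)
}
```

Two caveats apply when the file is a real SQLite database. A SIGKILL during `build` skips the cleanup, so a leftover temp file would have to be removed on the next start; the old DB is still intact in that case. The temp database should also be fully closed before the rename, and any `-wal` or `-shm` files belonging to the old database removed, so that the new file is never paired with a stale journal.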

## Environment

- macOS 15.x (Darwin 25.2.0)
- CMM version: from `../codebase-memory-mcp` local build
- Project: large Perl monorepo (~93k edges)
