CodebaseIndexer: runaway embedding loop with 0% cache hits + 4GB+ data/query memleak #944

## Summary
`CodebaseIndexer` (TS, `src/system/rag/services/CodebaseIndexer.ts`) enters a tight embedding-generation loop ~120s after server start and never stops. Together with the `data/query` memleak in continuum-core, this leaves the system unable to serve persona inference within minutes.

## Symptoms (observed 2026-04-19, Mac M5)
- Server starts → 120s grace → indexer scheduled
- Once running: `Generated 16 embeddings (384d)` log lines fire every ~100-500ms continuously, cache hits 0/16
- `continuum-core-server` CPU climbs to 1100%+, RSS climbs from ~500MB to 2.2GB
- `[MEMLEAK] data/query:+4807MB cumulative` (largest single leaker)
- ALL persona inference requests hang with no response — DataDaemon is starved
- AIProviderDaemonServer reports "Appears stuck (60s, 90s, ... 360s+ since last success)" indefinitely

## Workaround (already shipped)
The PR adds a `SKIP_CODEBASE_INDEX=1` env var (commit `048a8235f`, branch `feature/shared-cognition-rust`). Setting the var skips `initializeCodebaseIndexing()` entirely. With the var set, personas respond normally (validated 2026-04-19; see PR description).
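For readers who have not opened the commit, the gate can be pictured roughly as follows. This is a minimal sketch, not the shipped code: `shouldSkipCodebaseIndex` and `maybeInitializeIndexing` are hypothetical names, and the real check in `ServiceInitializer` may be structured differently. Only the env var name and `initializeCodebaseIndexing()` come from this issue.

```typescript
// Hypothetical sketch of an env gate; the actual ServiceInitializer code may differ.
type Env = Record<string, string | undefined>;

// Returns true only when the variable is set to exactly "1" (assumed semantics).
function shouldSkipCodebaseIndex(env: Env): boolean {
  return env["SKIP_CODEBASE_INDEX"] === "1";
}

// Runs the indexing initializer unless the gate is set.
// Resolves to true if indexing was started, false if it was skipped.
async function maybeInitializeIndexing(
  env: Env,
  initializeCodebaseIndexing: () => Promise<void>,
): Promise<boolean> {
  if (shouldSkipCodebaseIndex(env)) {
    return false;
  }
  await initializeCodebaseIndexing();
  return true;
}
```

In a Node process the first argument would typically be `process.env`.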

## Root cause (not yet diagnosed)
Two intertwined issues to investigate:
1. **0/16 cache hit ratio.** The indexer is supposed to skip files whose `contentHash` matches the existing `code_index` entry (`removeEntriesForFiles` + `loadContentHashes`). With 0% hit rate, EITHER the hash compare is broken OR every cycle truly sees new content.
2. **`data/query` memleak.** The indexer's read path (`ORM.query` over `code_index` with `limit: 10000`) appears to leak ~5-30MB per call. After thousands of calls, gigabytes accumulate. Could be:
   - SQLite connection / cursor not released
   - Vector buffers (384d × thousands of rows) retained in IPC layer
   - Embedding cache growing without bound
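One way to split the two hypotheses in item 1 is to record why each file missed the cache. The sketch below is an assumption-laden illustration: `FileEntry`, `partitionByContentHash`, and the `Map` of stored hashes are hypothetical stand-ins, and the real `loadContentHashes` may return a different shape.

```typescript
// Hypothetical diagnostic: classify each file as unchanged or as a cache miss with a reason.
interface FileEntry {
  path: string;
  contentHash: string;
}

type MissReason = "no-stored-entry" | "hash-mismatch";

interface Partition {
  unchanged: string[];
  toEmbed: { path: string; reason: MissReason }[];
}

// `stored` maps file path -> contentHash as previously written to code_index (assumed shape).
function partitionByContentHash(
  files: FileEntry[],
  stored: Map<string, string>,
): Partition {
  const result: Partition = { unchanged: [], toEmbed: [] };
  for (const file of files) {
    const previous = stored.get(file.path);
    if (previous === undefined) {
      // Nothing stored under this key: the entry was never written, was removed, or the key differs.
      result.toEmbed.push({ path: file.path, reason: "no-stored-entry" });
    } else if (previous !== file.contentHash) {
      // An entry exists but the hash differs: content changed, or the hash input is unstable.
      result.toEmbed.push({ path: file.path, reason: "hash-mismatch" });
    } else {
      result.unchanged.push(file.path);
    }
  }
  return result;
}
```

If nearly every miss is `no-stored-entry`, the lookup side (path keys, or entries being removed before the compare) is the likelier suspect. If nearly every miss is `hash-mismatch`, the hash input is the likelier suspect. Either reading is a starting point for investigation, not a diagnosis.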

## Fix priority
The workaround unblocks chat validation. The real fix needs investigation under a low-cost reproduction. Candidates: a single-folder repo, or a unit test that runs N indexing cycles and asserts steady RSS.
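The "N cycles, steady RSS" idea could look something like the sketch below. All names here (`sampleMemoryOverCycles`, `growthAfterWarmup`, `isSteady`) are hypothetical, and the memory reader is injected so the check can target whichever process actually holds the leak. The `[MEMLEAK]` line above points at `continuum-core-server`, so the test runner's own RSS may not be the right thing to measure.

```typescript
// Hypothetical harness: run a cycle N times and record a memory reading after each run.
async function sampleMemoryOverCycles(
  runCycle: () => Promise<void>,
  readBytes: () => number,
  cycles: number,
): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < cycles; i++) {
    await runCycle();
    samples.push(readBytes());
  }
  return samples;
}

// Growth between the first post-warmup sample and the last sample, in bytes.
// Samples before index `warmup` are ignored so one-time allocations do not count as a leak.
function growthAfterWarmup(samples: number[], warmup: number): number {
  if (samples.length <= warmup) {
    throw new Error("need at least one sample after warmup");
  }
  return samples[samples.length - 1] - samples[warmup];
}

// True when post-warmup growth stays within the allowed budget.
function isSteady(samples: number[], warmup: number, maxGrowthBytes: number): boolean {
  return growthAfterWarmup(samples, warmup) <= maxGrowthBytes;
}
```

In a Node test, `readBytes` could be `() => process.memoryUsage().rss`. The warmup count and growth budget are judgment calls that would need tuning against a known-good run.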

## Related
- Workaround commit: `048a8235f feat(ServiceInitializer): SKIP_CODEBASE_INDEX env gate`
- Branch: `feature/shared-cognition-rust`
