Observed
During shared-cognition shim validation (branch feature/shared-cognition-rust, 2026-04-19), both memento and anvil hit the same failure mode on independent dev machines:
- continuum-core-server RSS climbed steadily under moderate load (indexer + a few personas)
- `data/query` request timeouts started firing repeatedly (Rust-side WARN in rag.log: "CodebaseIndexer: Failed to load content hashes: Request timeout: data/query")
- anvil's independent measurement: data/query leaked ~4.8GB cumulative before Rust core OOM-crashed
- My machine: Rust core RSS hit 939MB, system swap 11.6GB, memory pressure 100% for ~2 minutes, Rust process then crashed and restarted
Impact
Chat becomes unresponsive system-wide. Personas get stuck mid-evaluation when ChatRAGBuilder's ORM.query call hits data/query and never returns. Result: an end-user ping is followed by 10+ minutes of silence, or no response at all.
Workaround
Added `SKIP_CODEBASE_INDEX=1` env gate to `ServiceInitializer.initializeCodebaseIndexing` (commit 048a823). Skipping the 120s-after-boot codebase index prevents the saturation event.
Real fix
Two layers:
- data/query shouldn't leak. Whatever object holds query result sets isn't being dropped. Candidate suspects: SQLite row set in orm/sqlite.rs, in-flight IPC response buffers, or a retained tokio channel. Instrument an RSS delta per query invocation and bisect.
- Indexer backpressure. Even without the leak, the indexer's 16-embeddings-per-500ms rate is not throttled by data/query saturation. Add a circuit breaker: if data/query p99 > N ms, pause the indexer; resume when it recovers. The indexer is already yield-polite (EMBEDDING_BATCH_PAUSE_MS), but yielding alone isn't enough when downstream storage is degraded.
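The circuit breaker in the second layer could look roughly like the sketch below. It is not existing code: `QueryLatencyBreaker`, its method names, the window size, and both thresholds are hypothetical, and it assumes the caller can time each data/query round trip (recording the timeout value when a request times out). It keeps a sliding window of recent latencies, pauses the indexer when the nearest-rank p99 exceeds `pause_above`, and resumes only once p99 drops below the lower `resume_below` threshold, so the indexer does not flap on and off around a single limit.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical latency-based circuit breaker for the codebase indexer.
/// Tracks recent data/query latencies and decides whether indexing should pause.
pub struct QueryLatencyBreaker {
    window: VecDeque<Duration>, // most recent latencies, oldest first
    capacity: usize,            // maximum number of samples kept
    pause_above: Duration,      // pause when p99 rises above this
    resume_below: Duration,     // resume when p99 falls below this
    paused: bool,
}

impl QueryLatencyBreaker {
    pub fn new(capacity: usize, pause_above: Duration, resume_below: Duration) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        assert!(resume_below <= pause_above, "resume threshold must not exceed pause threshold");
        Self {
            window: VecDeque::with_capacity(capacity),
            capacity,
            pause_above,
            resume_below,
            paused: false,
        }
    }

    /// Record one data/query latency and update the paused state.
    /// For a timed-out request, pass the timeout duration.
    pub fn record(&mut self, latency: Duration) {
        if self.window.len() == self.capacity {
            self.window.pop_front(); // evict the oldest sample
        }
        self.window.push_back(latency);

        if let Some(p99) = self.p99() {
            if !self.paused && p99 > self.pause_above {
                self.paused = true;
            } else if self.paused && p99 < self.resume_below {
                self.paused = false;
            }
        }
    }

    /// Nearest-rank 99th percentile over the current window.
    pub fn p99(&self) -> Option<Duration> {
        if self.window.is_empty() {
            return None;
        }
        let mut sorted: Vec<Duration> = self.window.iter().copied().collect();
        sorted.sort();
        // 1-based rank = ceil(0.99 * n), computed in integer arithmetic.
        let rank = (sorted.len() * 99 + 99) / 100;
        Some(sorted[rank - 1])
    }

    /// The indexer would check this before starting each embedding batch.
    pub fn indexer_should_pause(&self) -> bool {
        self.paused
    }
}
```

In this sketch the indexer would call `indexer_should_pause()` before each embedding batch and skip the batch while it returns `true`. Recovery depends on latencies still being recorded while the indexer is paused (for example from ChatRAGBuilder's queries); if the indexer were the only caller, a time-based half-open probe would be needed as well.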
Repro
- `npm start` (cold)
- Wait 120s for the indexer to kick in
- Post a few chat messages to general
- Watch RSS in `.continuum/jtag/logs/system/core.log` (memory_pressure ticks)
- Observe data/query timeouts in `.continuum/jtag/logs/system/rag.log`
- Within a few minutes, Rust core hits critical memory and OOM-crashes
Related
- feature/shared-cognition-rust branch: tracks the indexer disable only, not the actual leak fix