fix: ingestion pipeline — ghost recursion, double-ingest, backlog races by buildingjoshbetter · Pull Request #315 · buildingjoshbetter/TrueMemory

buildingjoshbetter · 2026-05-13T19:21:57Z

Summary

Fixes critical ingestion pipeline bugs that caused:

Ghost session recursion: claude -p extraction calls created infinite feedback loops, burning 56% of a Claude 20x Max subscription in under 2 hours
Double/triple ingestion: the same transcript was extracted 2-3 times by the Stop, Compact, and UserPromptSubmit triggers
Backlog race conditions: three drainers could process the same marker simultaneously
Silent crash data loss: drain paths sent stderr to /dev/null, so ingest crashes left no trace

Changes

TRUEMEMORY_EXTRACTION env var: set in models.py for claude -p calls; all hooks and the MCP drainer exit early when it is set
Transcript-size idempotency: per-session markers in ~/.truememory/extracted/ record the transcript file size at last extraction; extraction is skipped unless the transcript has grown by more than 1 KB
Disable UserPromptSubmit extraction: it violated the one-session-one-ingest rule and had 3 bugs (global marker, missing-marker loop, TOCTOU race)
Marker deletion inside flock: moved unlink() inside the spawn_gate() block in all 3 drain paths
Stderr to log files: drain paths now log to ~/.truememory/logs/ instead of DEVNULL
Dynamic cap in log messages: use _load_cap_state() instead of the static SPAWN_CAP=2
Path traversal defense: sanitize session_id and guard against an empty transcript_path in _shared.py

Test plan

Full pytest suite passes (611 passed, 0 failed)
Ruff lint passes (0 errors)
Spawn gate intact in all 5 Popen files
Ghost recursion: TRUEMEMORY_EXTRACTION checked in all hooks + MCP drainer
Idempotency: should_extract_session checked in Stop + Compact
No interval=0 in compact.py
No _run_background_ingestion in user_prompt_submit.py
marker_path.unlink at the same indentation as register_spawned_pid in all 3 drain files
4-model OpenRouter consensus: 3/4 APPROVE (Codex non-responsive)

claude -p extraction calls created full Claude Code sessions that triggered all hooks recursively, spawning an unbounded cascade of MCP servers and drainers. Each extraction call burned API quota, and a single session close could spawn 18+ ghost sessions in 9 minutes. Fix: models.py sets TRUEMEMORY_EXTRACTION=1 in the environment of the claude -p call. All 4 hooks (stop, session_start, user_prompt_submit, compact) exit immediately when this variable is set, and the MCP server skips starting its drainer.
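A minimal sketch of the recursion guard, assuming the hooks are Python entry points. Only the variable name and the claude -p call come from the PR; the helpers build_extraction_env, is_extraction_context, run_extraction, and hook_main are hypothetical, and treating any non-empty value as "set" is an assumption.

```python
import os
import subprocess

# Variable name taken from the PR; treating any non-empty value as "set"
# is an assumption made for this sketch.
EXTRACTION_ENV_VAR = "TRUEMEMORY_EXTRACTION"


def build_extraction_env(base=None):
    """Copy the environment (or `base`) and set the recursion guard."""
    env = dict(os.environ if base is None else base)
    env[EXTRACTION_ENV_VAR] = "1"
    return env


def is_extraction_context(env=None):
    """True when this process was spawned by an extraction call."""
    env = os.environ if env is None else env
    return bool(env.get(EXTRACTION_ENV_VAR))


def run_extraction(prompt):
    # Hypothetical wrapper: the child `claude -p` session, and every hook it
    # fires, inherits TRUEMEMORY_EXTRACTION=1 through `env`.
    return subprocess.run(
        ["claude", "-p", prompt],
        env=build_extraction_env(),
        capture_output=True,
        text=True,
        check=False,
    )


def hook_main(env=None):
    """Skeleton hook entry point; returns True only if it did real work."""
    if is_extraction_context(env):
        return False  # ghost session: bail before reading any transcript
    # ... normal hook logic (stop / session_start / compact) goes here ...
    return True
```

Because child processes inherit their parent's environment, the guard also reaches any MCP server or drainer the ghost session would otherwise start, which is why a single variable can cut the whole loop.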

Stop, Compact, and UserPromptSubmit could each independently extract the same transcript. A long session could be extracted 2-3 times, with every pass processing the ENTIRE transcript from message 1 (there is no delta mode), wasting LLM API calls. Fix: per-session markers in ~/.truememory/extracted/<session_id> record the transcript file size at last extraction. Triggers skip extraction unless the file has grown by more than 1 KB. The ingest CLI writes the authoritative marker on successful completion.
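A sketch of the per-session size check, assuming the marker file holds the transcript size in bytes as plain text. The two function names come from the PR, but these signatures, the extracted_dir parameter, and the marker format are assumptions, and session_id is assumed to be sanitized already. The shrink case from follow-up item 2 is included.

```python
import os
from pathlib import Path

# Directory named in the PR; the plain-text integer format is assumed.
EXTRACTED_DIR = Path.home() / ".truememory" / "extracted"
MIN_GROWTH_BYTES = 1024  # skip unless the transcript grew by more than 1 KB


def should_extract_session(session_id, transcript_path, extracted_dir=EXTRACTED_DIR):
    """Decide whether a Stop/Compact trigger should extract this session."""
    if not transcript_path:
        return False  # guard against an empty transcript_path
    try:
        current = os.path.getsize(transcript_path)
    except OSError:
        return False  # transcript missing: nothing to extract
    marker = Path(extracted_dir) / session_id
    try:
        last = int(marker.read_text().strip())
    except (OSError, ValueError):
        return True  # never extracted, or marker unreadable
    if current < last:
        return True  # file was truncated or rotated
    return current - last > MIN_GROWTH_BYTES


def mark_session_extracted(session_id, size, extracted_dir=EXTRACTED_DIR):
    """Record `size` (bytes ingested) after a successful ingest run."""
    directory = Path(extracted_dir)
    directory.mkdir(parents=True, exist_ok=True)
    tmp = directory / f"{session_id}.tmp"
    tmp.write_text(str(size))
    os.replace(tmp, directory / session_id)  # atomic rename over the old marker
```

In this sketch the ingest CLI would capture the size when it starts reading and pass it to mark_session_extracted only after success. Re-reading the file afterwards would wrongly mark bytes appended during ingestion as already processed.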

Mid-session extraction violated the one-session-one-ingest rule and had 3 bugs: a global marker that blocked extraction across sessions, a missing marker that made extraction fire on every prompt, and a TOCTOU race. Extraction now happens only on Stop (session close) and Compact (context compression), both gated by per-session transcript-size idempotency.

marker_path.unlink() was outside the with spawn_gate() block in all 3 drain paths (session_start.py, mcp_server.py, cli.py). Two concurrent drainers could both read the same marker, both spawn ingest, and then both try to delete it. Moving unlink inside the flock makes the read→spawn→delete sequence atomic with respect to other drainers.
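A POSIX-only sketch of the corrected ordering, with fcntl.flock standing in for spawn_gate(). The PR does not describe the real gate's lock file or internals, so spawn_gate, drain_marker, lock_path, and the spawn callback here are hypothetical. The point is that the existence check, the read, the spawn, and the unlink all happen while the lock is held.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def spawn_gate(lock_path):
    """Hypothetical stand-in for spawn_gate(): an exclusive flock on a lock file."""
    lock_path = Path(lock_path)
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    with open(lock_path, "a") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def drain_marker(marker_path, lock_path, spawn):
    """Read, spawn, and delete one backlog marker while holding the lock.

    Returns True if this call consumed the marker.
    """
    marker_path = Path(marker_path)
    with spawn_gate(lock_path):
        # Re-check under the lock: another drainer may already have consumed it.
        if not marker_path.exists():
            return False
        payload = marker_path.read_text()
        spawn(payload)  # e.g. a Popen of the ingest CLI in real code
        marker_path.unlink()  # deletion inside the lock, as in the fix
    return True
```

If spawn raises, the marker stays in place so a later drain can retry it, and the context manager still releases the lock.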

SessionStart and MCP drainer sent ingest stderr to /dev/null. If the ingest process crashed, there was no log, no trace. Now stderr goes to ~/.truememory/logs/<session_id>.log for diagnosability.
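A sketch of the redirect using subprocess.Popen. The log directory comes from the PR; the function name, append mode, and discarding stdout are assumptions. As in follow-up item 1, session_id is assumed to be sanitized before it reaches this function.

```python
import subprocess
from pathlib import Path

# Log directory named in the PR; one append-mode file per session is assumed.
LOG_DIR = Path.home() / ".truememory" / "logs"


def spawn_ingest_with_log(cmd, session_id, log_dir=LOG_DIR):
    """Start a background ingest whose stderr is appended to <session_id>.log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{session_id}.log"
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,  # only stderr is kept in this sketch
            stderr=log,  # previously subprocess.DEVNULL
        )
    # The child holds its own copy of the descriptor, so closing ours is safe.
    return proc, log_path
```

Opening in append mode keeps the output of earlier runs for the same session, so a crash followed by a retry leaves both tracebacks in one file.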

Log warnings and backlog reason strings reported the static SPAWN_CAP=2 even when the dynamic cap was 1 or 5. They now read the persisted cap state via _load_cap_state(), which gives accurate diagnostics without triggering subprocess calls in test environments.
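A sketch of reporting the persisted cap instead of the constant. The PR does not say how _load_cap_state() stores its data, so the JSON file, its "cap" key, the fallback to SPAWN_CAP, the message text, and the load_cap_state/backlog_reason names are all assumptions. Reading a file keeps the lookup free of subprocess calls, in line with the goal stated above.

```python
import json
from pathlib import Path

SPAWN_CAP = 2  # the static default the old log messages always reported


def load_cap_state(state_path):
    """Hypothetical stand-in for _load_cap_state(): read a persisted cap.

    Falls back to SPAWN_CAP when the file is missing or malformed.
    """
    try:
        data = json.loads(Path(state_path).read_text())
        cap = int(data["cap"])
    except (OSError, ValueError, KeyError, TypeError):
        return SPAWN_CAP
    return cap if cap >= 1 else SPAWN_CAP


def backlog_reason(active, state_path):
    """Build a diagnostic string using the cap that is actually in force."""
    cap = load_cap_state(state_path)
    return f"spawn cap reached ({active}/{cap}); queued to backlog"
```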

should_extract_session and mark_session_extracted used the unsanitized session_id as a filesystem path component. Added _safe_session_id() (alphanumeric plus dash/underscore, max 64 chars) for defense in depth: the hooks already sanitize, but the shared layer should too.
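A sketch of the sanitizer under the rule stated above (ASCII letters, digits, dash, underscore; at most 64 characters). The PR does not say whether _safe_session_id() replaces or strips other characters, or how it handles an empty result; replacing with "_" and raising ValueError are choices made here. session_file is a hypothetical helper showing one sanitized ID used for both EXTRACTED_DIR markers and log files.

```python
import re
from pathlib import Path

_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9_-]")  # anything outside the allowed set
MAX_SESSION_ID_LEN = 64


def safe_session_id(session_id):
    """Map an arbitrary session id onto a filename-safe token."""
    cleaned = _UNSAFE_CHARS.sub("_", str(session_id))[:MAX_SESSION_ID_LEN]
    if not cleaned:
        raise ValueError("session_id is empty")
    return cleaned


def session_file(base_dir, session_id, suffix=""):
    """Return base_dir/<safe id><suffix>, refusing anything that escapes base_dir."""
    base = Path(base_dir).resolve()
    path = (base / f"{safe_session_id(session_id)}{suffix}").resolve()
    if path.parent != base:
        raise ValueError(f"refusing path outside {base}: {path}")
    return path
```

Since dots and slashes cannot survive sanitization, a ".." component is impossible. The resolve() comparison is a second, independent check in case the allowed set is ever widened.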

1. Sanitize session_id in log file paths (session_start.py, mcp_server.py) using _safe_session_id(), the same as for EXTRACTED_DIR markers.
2. Return True from should_extract_session() when the transcript shrinks (file truncation/rotation), not just when it grows.
3. Remove the dead should_extract()/mark_extracted() functions and their constants from _shared.py; no callers remain after PR #315.

buildingjoshbetter added 7 commits May 13, 2026 13:25

fix: redirect backlog drain stderr to log files instead of DEVNULL

5467f1d


This was referenced May 14, 2026

fix: stale session scanner — long sessions with hard kill lose memories #320

Closed

chore: EXTRACTED_DIR marker cleanup + log rotation + dead code removal #321

Closed

buildingjoshbetter merged commit d9a932f into main May 15, 2026
14 checks passed

buildingjoshbetter deleted the fix/ingestion-pipeline-v3 branch May 21, 2026 21:05


buildingjoshbetter merged 8 commits into main from fix/ingestion-pipeline-v3
