Problem
The ingest pipeline runs as a separate subprocess (spawned via stop.py:343). Inside this subprocess, the EncodingGate._compute_prediction_error() method loads the full Qwen3-Embedding-0.6B model (1.5 GB) just to compute 4 embeddings per fact:
```python
# truememory/ingest/encoding_gate.py:417-420
if self._embed_model is None:
    from truememory.vector_search import get_model
    self._embed_model = get_model()  # Loads 1.5GB Qwen3 into this subprocess
```
With SPAWN_CAP=2, up to two concurrent ingest processes can each load their own copy of the 1.5 GB model. That is up to 3 GB of weights duplicating the copies already loaded in the running MCP server processes.
What prediction error does
PE is Signal 3 in the encoding gate (weight 0.30). It embeds (fact, nearest_memory) as a pair and compares the result against a (memory, memory) self-pair to detect when a new fact contradicts or updates existing knowledge.
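To make the pair-versus-self-pair comparison concrete, here is a minimal sketch assuming PE is the drop in cosine similarity between the (memory, memory) self-pair and the (fact, nearest_memory) pair. The actual formula in encoding_gate.py is not shown in this issue. The names toy_encode and prediction_error are hypothetical, and toy_encode (hashed character trigrams) stands in for Qwen3 so the example runs without a model. It does make the 4 encode() calls per fact explicit.

```python
import math
import zlib
from typing import Callable, List

Vector = List[float]


def toy_encode(text: str, dim: int = 64) -> Vector:
    """Hypothetical stand-in for model.encode(): hashed character trigrams, L2-normalised."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        # crc32 keeps bucket assignment stable across runs (unlike built-in hash())
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prediction_error(fact: str, nearest_memory: str,
                     encode: Callable[[str], Vector] = toy_encode) -> float:
    # Pair: new fact vs. its nearest existing memory (encode calls 1 and 2)
    pair_sim = cosine(encode(fact), encode(nearest_memory))
    # Self-pair baseline: the memory vs. itself (encode calls 3 and 4)
    self_sim = cosine(encode(nearest_memory), encode(nearest_memory))
    # Hypothetical scoring rule: gap below the baseline, clipped to [0, 1]
    return max(0.0, min(1.0, self_sim - pair_sim))
```

Identical texts score 0.0 because the pair and the self-pair are computed the same way; the further the fact drifts from its nearest memory, the higher the score. Every one of those four calls is what forces the 1.5 GB model into the ingest process.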
- Gate AUC with PE: 0.816
- Gate AUC without PE (novelty + salience only): ~0.75
- PE uses the embedding model for 4 model.encode() calls per fact
- Novelty (Signal 1) uses gzip compression; no model needed
- Salience (Signal 2) uses rule-based scoring; no model needed
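Novelty can stay model-free because it only compares compressed sizes. One common way to turn gzip into a novelty score is normalized compression distance (NCD); the sketch below assumes that technique. The functions ncd and gzip_novelty, and the rule "distance to the closest memory", are hypothetical and not necessarily how Signal 1 is implemented.

```python
import gzip
from typing import Iterable


def compressed_len(text: str) -> int:
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: near 0 for near-duplicates, larger for unrelated text."""
    ca, cb = compressed_len(a), compressed_len(b)
    cab = compressed_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


def gzip_novelty(fact: str, memories: Iterable[str]) -> float:
    """Hypothetical novelty score: NCD to the closest existing memory (1.0 if there are none)."""
    distances = [ncd(fact, m) for m in memories]
    return min(distances) if distances else 1.0
```

A fact that repeats an existing memory compresses almost for free when concatenated with it, so its novelty lands near zero; no embedding model is involved at any point.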
Proposed solutions (pick one)
Option A: Skip PE in ingest (simplest, 5 lines)
Add env var TRUEMEMORY_GATE_NO_PE=1 that the stop hook sets when spawning ingest:
```python
# encoding_gate.py: _compute_prediction_error
import os  # if not already imported in this module

def _compute_prediction_error(self, fact: str) -> float:
    if os.environ.get("TRUEMEMORY_GATE_NO_PE", "") == "1":
        return 0.0  # Skip: no model needed
    # ... existing PE logic ...
```

```python
# stop.py: _run_background_ingestion (add the flag to the child's environment)
import os
import subprocess

env = os.environ.copy()
env["TRUEMEMORY_GATE_NO_PE"] = "1"
subprocess.Popen(cmd, env=env, ...)
```
Savings: 1.5 GB per ingest process (model never loads)
Cost: Gate AUC drops from 0.816 to ~0.75 (still effective for cold-path filtering)
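Both halves of Option A can be checked without the real model: the gate must never call its loader when the flag is set, and a child spawned with the copied environment must actually see the flag. The sketch below assumes a stripped-down, hypothetical EncodingGateSketch with an injectable loader (get_model() in the real code); the helper names model_loads_with and child_sees_flag are also hypothetical.

```python
import os
import subprocess
import sys
from typing import Callable, List, Optional

FLAG = "TRUEMEMORY_GATE_NO_PE"


class EncodingGateSketch:
    """Hypothetical stand-in for EncodingGate, reduced to the lazy-load path quoted above."""

    def __init__(self, load_model: Callable[[], object]) -> None:
        self._load_model = load_model  # get_model() in the real code
        self._embed_model: Optional[object] = None

    def _compute_prediction_error(self, fact: str) -> float:
        if os.environ.get(FLAG, "") == "1":
            return 0.0  # Option A: skip PE, so the model is never loaded
        if self._embed_model is None:
            self._embed_model = self._load_model()
        return 0.5  # placeholder for the existing PE logic


def model_loads_with(flag_value: Optional[str]) -> int:
    """Compute PE once with the flag set to flag_value (None = unset) and count loader calls."""
    calls: List[int] = []

    def fake_loader() -> object:
        calls.append(1)
        return object()

    previous = os.environ.pop(FLAG, None)
    try:
        if flag_value is not None:
            os.environ[FLAG] = flag_value
        EncodingGateSketch(fake_loader)._compute_prediction_error("user switched to Postgres")
    finally:
        # Restore the caller's environment so the check has no side effects
        os.environ.pop(FLAG, None)
        if previous is not None:
            os.environ[FLAG] = previous
    return len(calls)


def child_sees_flag() -> str:
    """Mirror the stop.py side: copy the env, set the flag, and read it back in a child process."""
    env = os.environ.copy()
    env[FLAG] = "1"
    result = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({FLAG!r}, ''))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Because only the literal value "1" disables PE, an unset or "0" flag keeps today's behavior, which makes the change safe to ship ahead of the stop hook update.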
Option B: Ingest calls running MCP server for embeddings
The MCP server is already alive with the model loaded. Instead of loading a second copy, the ingest process could call back to the MCP server for embeddings.
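Challenge: MCP uses stdio transport (JSON-RPC over stdin/stdout), so the ingest process can't easily call it; it would need a separate channel to reach the already-loaded model. This is the "right" solution but depends on #335 (shared model server).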
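For a sense of what such a channel could involve, here is a minimal sketch assuming the MCP server opens a second, local-only TCP socket next to its stdio transport and answers newline-delimited JSON requests. The protocol and the names start_embed_server and remote_encode are hypothetical; they are not part of MCP and not taken from #335.

```python
import json
import socket
import socketserver
import threading
from typing import Callable, List

EmbedFn = Callable[[List[str]], List[List[float]]]


def make_handler(embed: EmbedFn):
    class EmbedHandler(socketserver.StreamRequestHandler):
        def handle(self) -> None:
            # One JSON request per line: {"texts": [...]} -> {"embeddings": [[...], ...]}
            for line in self.rfile:
                texts = json.loads(line)["texts"]
                reply = {"embeddings": embed(texts)}
                self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))
    return EmbedHandler


def start_embed_server(embed: EmbedFn) -> socketserver.ThreadingTCPServer:
    """Hypothetical side channel served from the process that already holds the model."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), make_handler(embed))
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_encode(port: int, texts: List[str]) -> List[List[float]]:
    """Ingest-side client: borrow the loaded model instead of loading a second copy."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall((json.dumps({"texts": texts}) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())["embeddings"]
```

Binding to 127.0.0.1 on an OS-assigned port keeps the endpoint off the network, but a real version would still need port discovery for the ingest process and a fallback (such as Option A) when no server is running.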
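Option C: Use a lighter model for PE in ingest only
Load Model2Vec (30 MB) instead of Qwen3 (1.5 GB) for PE scoring in ingest. PE accuracy degrades slightly, but the signal is still useful.
Savings: 1.47 GB per ingest process (30 MB vs 1500 MB)
Cost: PE signal slightly less accurate with simpler embeddings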
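The per-process savings compound with SPAWN_CAP. A back-of-the-envelope comparison, assuming every spawn slot is busy and counting only model weights (not the Python runtime, activations, or Option B's client overhead), using the issue's figures of 1500 MB for Qwen3 and 30 MB for Model2Vec:

```python
SPAWN_CAP = 2
QWEN3_MB = 1500     # Qwen3-Embedding-0.6B, per the issue
MODEL2VEC_MB = 30   # Model2Vec, per the issue


def worst_case_ingest_model_mb(per_process_model_mb: int, spawn_cap: int = SPAWN_CAP) -> int:
    """Model weights held by ingest subprocesses when all spawn slots are in use."""
    return per_process_model_mb * spawn_cap


baseline = worst_case_ingest_model_mb(QWEN3_MB)
options = {
    "current (Qwen3 in ingest)": baseline,
    "A: skip PE": worst_case_ingest_model_mb(0),
    "B: call MCP server": worst_case_ingest_model_mb(0),  # weights stay in the MCP server
    "C: Model2Vec for PE": worst_case_ingest_model_mb(MODEL2VEC_MB),
}
for name, mb in options.items():
    print(f"{name:28s} {mb:5d} MB in ingest, saves {baseline - mb:5d} MB")
```

Option C keeps 2940 of the 3000 MB saved by A or B (98%) while retaining a PE signal, so the choice between A and C comes down to the AUC it preserves rather than memory.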
Recommendation
Option A for now (zero-model ingest). The 0.066 AUC drop is acceptable for a cold-path filter — facts that slip through will still be deduplicated. When #335 (model server) lands, switch to Option B.
Relates to