Ingest process loads 1.5GB model for prediction error when it could be skipped or remote

## Problem

The stop hook spawns the ingest pipeline as a separate subprocess (`stop.py:343`):
```python
cmd = [sys.executable, "-m", "truememory.ingest.cli", "ingest", transcript_path]
subprocess.Popen(cmd, start_new_session=True, ...)
```

Inside this subprocess, the `EncodingGate._compute_prediction_error()` method loads the full Qwen3-Embedding-0.6B model (1.5GB) just to compute 4 embeddings per fact:

```python
# truememory/ingest/encoding_gate.py:417-420
if self._embed_model is None:
    from truememory.vector_search import get_model
    self._embed_model = get_model()  # Loads 1.5GB Qwen3 into this subprocess
```

With `SPAWN_CAP=2`, up to two ingest processes can run concurrently, and each loads its own 1.5 GB copy of the model. That is up to **3 GB** of weights duplicating what the running MCP server processes already hold.

## What prediction error does

Prediction error (PE) is Signal 3 in the encoding gate (weight 0.30). It embeds `(fact, nearest_memory)` as a pair and compares the result to a `(memory, memory)` self-pair to detect when a new fact contradicts or updates existing knowledge.

- Gate AUC WITH PE: 0.816
- Gate AUC WITHOUT PE (novelty + salience only): ~0.75
- PE uses the embedding model for 4 `model.encode()` calls per fact
- Novelty (Signal 1) uses gzip compression — no model needed
- Salience (Signal 2) uses rule-based scoring — no model needed
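
The issue does not show how the gzip-based novelty signal is computed. As a rough illustration of why it needs no model, one common compression-based formulation is normalized compression distance; the function names below are hypothetical and this is not necessarily what `EncodingGate` does:

```python
import gzip


def compressed_size(text: str) -> int:
    """Size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd_novelty(fact: str, memory: str) -> float:
    """Normalized compression distance between a fact and a stored memory.

    Lower values mean the fact is largely redundant with the memory;
    higher values mean the two share little compressible structure.
    """
    c_fact = compressed_size(fact)
    c_mem = compressed_size(memory)
    c_joint = compressed_size(memory + "\n" + fact)
    return (c_joint - min(c_fact, c_mem)) / max(c_fact, c_mem)
```

Only the standard library is involved, so this kind of signal adds no memory pressure to the ingest subprocess.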

## Proposed solutions (pick one)

### Option A: Skip PE in ingest (simplest, 5 lines)

Add env var `TRUEMEMORY_GATE_NO_PE=1` that the stop hook sets when spawning ingest:

```python
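# encoding_gate.py:_compute_prediction_error
import os  # at module top, if not already imported

def _compute_prediction_error(self, fact: str) -> float:
    # Checked before the lazy get_model() call, so the model never loads.
    if os.environ.get("TRUEMEMORY_GATE_NO_PE", "") == "1":
        return 0.0  # PE skipped; Signal 3 contributes nothing
    # ... existing PE logic ...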

```python
# stop.py:_run_background_ingestion — add to env
env = os.environ.copy()
env["TRUEMEMORY_GATE_NO_PE"] = "1"
subprocess.Popen(cmd, env=env, ...)
```

**Savings**: 1.5 GB per ingest process (model never loads)
**Cost**: Gate AUC drops from 0.816 to ~0.75 (still effective for cold-path filtering)
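
The savings depend on the flag being checked before the lazy model load. A minimal sketch of that ordering, using a hypothetical stand-in class and a fake loader rather than the real `EncodingGate` and `get_model()`:

```python
import os


class GateSketch:
    """Hypothetical stand-in for EncodingGate; only the PE guard is modelled."""

    def __init__(self, load_model):
        self._load_model = load_model  # callable that would load the embedding model
        self._embed_model = None

    def _compute_prediction_error(self, fact: str) -> float:
        # Check the flag first so the lazy model load below is never reached.
        if os.environ.get("TRUEMEMORY_GATE_NO_PE", "") == "1":
            return 0.0
        if self._embed_model is None:
            self._embed_model = self._load_model()
        return self._embed_model(fact)  # placeholder for the existing PE logic


loads = []  # records each time the fake loader runs


def fake_loader():
    """Stands in for get_model(); returns a callable that yields a fixed PE score."""
    loads.append(1)
    return lambda fact: 0.5
```

With `TRUEMEMORY_GATE_NO_PE=1` set, calling `_compute_prediction_error` returns `0.0` and `loads` stays empty. A regression test along these lines would catch a refactor that moves the model load ahead of the guard.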

### Option B: Ingest calls running MCP server for embeddings

The MCP server is already alive with the model loaded. Instead of loading a second copy, the ingest process could call back to the MCP server for embeddings.

**Challenge**: The MCP server uses stdio transport (JSON-RPC over stdin/stdout), so the ingest process cannot easily call it. This would need one of:
1. A sidecar UDS endpoint on the MCP server (part of the #335 model server proposal)
2. A lightweight HTTP endpoint on the MCP server

This is the "right" solution but depends on #335 (shared model server).
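
To make the sidecar idea concrete, here is a rough sketch of a Unix domain socket endpoint using only the standard library. The wire format (one newline-delimited JSON request per connection), the function names, and `fake_embed` are all assumptions for illustration; the actual protocol would be defined by #335.

```python
import json
import os
import socket
import socketserver
import tempfile
import threading


def fake_embed(texts):
    """Placeholder for the server's already-loaded model: one tiny vector per text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


class EmbedHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One newline-delimited JSON request per connection (assumed protocol).
        request = json.loads(self.rfile.readline())
        reply = {"embeddings": fake_embed(request["texts"])}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_sidecar(socket_path):
    """Serve embedding requests on a Unix domain socket from a background thread."""
    server = socketserver.UnixStreamServer(socket_path, EmbedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_embed(socket_path, texts):
    """Client side, as the ingest subprocess might call it instead of loading a model."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall((json.dumps({"texts": texts}) + "\n").encode("utf-8"))
        with sock.makefile("rb") as reader:
            return json.loads(reader.readline())["embeddings"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "embed.sock")
    server = start_sidecar(path)
    print(remote_embed(path, ["new fact", "nearest memory"]))
    server.shutdown()
    server.server_close()
```

The ingest side then needs only a socket path (for example passed through the environment by the stop hook) and no model weights. `socketserver.UnixStreamServer` is available on Unix platforms only, so Windows would need the HTTP variant.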

### Option C: Use a lighter model for PE in ingest only

Load Model2Vec (30MB) instead of Qwen3 (1.5GB) for PE scoring in ingest. PE accuracy degrades slightly but the signal is still useful.

```python
# encoding_gate.py — ingest-specific lighter model
if os.environ.get("TRUEMEMORY_INGEST_LIGHT_PE", ""):
    from model2vec import StaticModel
    self._embed_model = StaticModel.from_pretrained("minishlab/potion-base-8M")
```

**Savings**: 1.47 GB per ingest process (30MB vs 1500MB)
**Cost**: PE signal slightly less accurate with simpler embeddings

## Recommendation

**Option A for now** (zero-model ingest). The AUC drop of roughly 0.066 (0.816 to ~0.75) is acceptable for a cold-path filter: facts that slip through will still be deduplicated. When #335 (model server) lands, switch to Option B.

## Relates to

- #333 (core duplication problem)
- #335 (model server — enables Option B)
- #336 (lazy loading — complementary)
