Background
Several drift-mitigation techniques require attention-layer hooks that the Anthropic API cannot expose:
- Split-softmax (arXiv:2402.10962) — amplifies attention to system prompt
- SinkTrack (arXiv:2604.10027, ICLR 2026) — attention sink anchoring, "negligible inference overhead"
These would become viable if VirtualMe supported open-weight LLM backends (Llama 4 / Qwen3 / similar via Ollama / vLLM / SGLang).
v0.6 scope
Four big things:
A. Open-weight backend support
- Add `LLMBackend` abstraction
- Implement OpenAI-compatible client for Ollama / vLLM
- Add `SinkTrack` hook for open-weight models
- Keep Anthropic Claude as the default
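A minimal sketch of what the backend abstraction could look like. The names (`LLMBackend`, `OpenAICompatibleBackend`) are placeholders, not settled API; the only grounded assumption is that Ollama, vLLM, and SGLang all expose the OpenAI `/v1/chat/completions` route, so one client can cover all three:

```python
from abc import ABC, abstractmethod
import json
import urllib.request


class LLMBackend(ABC):
    """Hypothetical backend interface: the one method the rest of VirtualMe calls."""

    @abstractmethod
    def chat(self, messages: list[dict], **params) -> str:
        """messages use the OpenAI chat format: [{"role": ..., "content": ...}]."""


class OpenAICompatibleBackend(LLMBackend):
    """Talks to any server exposing the OpenAI /v1/chat/completions route
    (Ollama, vLLM, and SGLang all do)."""

    def __init__(self, base_url: str, model: str, api_key: str = "none"):
        self.base_url = base_url.rstrip("/")
        self.model = model
        self.api_key = api_key

    def chat(self, messages: list[dict], **params) -> str:
        body = json.dumps({"model": self.model, "messages": messages, **params}).encode()
        req = urllib.request.Request(
            f"{self.base_url}/v1/chat/completions",
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.api_key}",
            },
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        return data["choices"][0]["message"]["content"]


# Usage against a local Ollama instance (model name illustrative):
# backend = OpenAICompatibleBackend("http://localhost:11434", "qwen3")
# reply = backend.chat([{"role": "user", "content": "hello"}])
```

The Anthropic client would become a second `LLMBackend` implementation, keeping Claude as the default while open-weight backends opt in.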
B. Letta evaluation
Letta (MemGPT) is a full agent runtime with three-tier memory (core / recall / archival). It's Apache 2.0 licensed and may suit VirtualMe's "long-running persona agent" use case better than building tiered memory ourselves. Evaluate migration cost: estimated 2-6 weeks.
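To scope that build-vs-adopt comparison: a hypothetical sketch of the minimum we would have to build ourselves to match Letta's three tiers (this is our naive baseline, not Letta's actual API):

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Minimal stand-in for the three tiers Letta ships out of the box:
    core (always in the prompt), recall (conversation history, searchable),
    archival (long-term store, retrieved on demand)."""

    core: dict[str, str] = field(default_factory=dict)   # e.g. {"persona": ..., "human": ...}
    recall: list[str] = field(default_factory=list)      # conversation turns
    archival: list[str] = field(default_factory=list)    # long-term facts

    def remember(self, turn: str) -> None:
        self.recall.append(turn)

    def archive(self, fact: str) -> None:
        self.archival.append(fact)

    def search(self, query: str) -> list[str]:
        # Naive substring match; a real build needs embeddings, an index,
        # and eviction from recall into archival -- which is the hidden cost
        # the 2-6 week Letta-migration estimate should be weighed against.
        q = query.lower()
        return [x for x in self.recall + self.archival if q in x.lower()]
```

Everything below `search` that Letta already handles (paging memory in and out of context, self-editing core memory) is what the evaluation should price out.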
C. PSI benchmark integration
arXiv:2502.12109 (PSI) provides a psychometrically grounded benchmark for persona accuracy. Integrating PSI evaluation would let us quantify the marginal benefit of an 8-week versus a 2-hour interview — currently NOT empirically validated.
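A hedged sketch of how the comparison harness might slot in. `psi_agreement` is a hypothetical stand-in (simple item-match rate); the real scoring protocol is defined in the PSI paper, and the numbers below are illustrative inputs, not results:

```python
def psi_agreement(predicted: list[int], ground_truth: list[int]) -> float:
    """Hypothetical stand-in for a PSI-style score: the fraction of
    psychometric items where the persona's answer matches the subject's
    self-report. The actual metric is defined in arXiv:2502.12109."""
    if len(predicted) != len(ground_truth):
        raise ValueError("answer vectors must be the same length")
    hits = sum(p == g for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


# Illustrative comparison of interview depths on the same item set;
# the delta between the two scores is the number we currently lack.
self_report = [3, 4, 2, 4, 1]
score_8_week = psi_agreement([3, 4, 2, 5, 1], self_report)  # 8-week persona
score_2_hour = psi_agreement([3, 2, 2, 5, 3], self_report)  # 2-hour persona
print(f"marginal benefit: {score_8_week - score_2_hour:+.2f}")
```

Whatever the real PSI scoring function is, the harness shape stays the same: two persona variants, one self-report key, one delta.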
D. Graphiti temporal persona tracking
Graphiti (Apache 2.0) implements temporal knowledge graphs with `valid_at` / `invalid_at` timestamps. Would let VirtualMe handle "2026-01 said likes skateboarding, 2026-04 said gave it up" → auto-invalidate stale facts. Better than Mem0 for time-varying personas, but Neo4j adds ops overhead.
Acceptance criteria (high-level)
Estimated effort
1+ month. Some items are deliberately conditional on user demand.