v0.6 roadmap: open-weight model support (SinkTrack + Letta + PSI benchmark)

## Background
Several drift-mitigation techniques require attention-layer hooks that the Anthropic API cannot expose:

- **Split-softmax** ([arXiv:2402.10962](https://arxiv.org/abs/2402.10962)) — amplifies attention to system prompt
- **SinkTrack** ([arXiv:2604.10027](https://arxiv.org/abs/2604.10027), ICLR 2026) — attention sink anchoring, "negligible inference overhead"

These would become viable if VirtualMe supported open-weight LLM backends (Llama 4 / Qwen3 / similar via Ollama / vLLM / SGLang). 

## v0.6 scope
Three big things:

### A. Open-weight backend support
- Add `LLMBackend` abstraction
- Implement OpenAI-compatible client for Ollama / vLLM
- Add `SinkTrack` hook for open-weight models
- Keep Anthropic Claude as the default

### B. Letta evaluation
[Letta (MemGPT)](https://www.letta.com/) is a full agent runtime with three-tier memory (core / recall / archival). It's apache 2.0 and may suit VirtualMe's "long-running persona agent" use case better than building tiered memory ourselves. Evaluate migration cost: estimated 2-6 weeks.

### C. PSI benchmark integration
[arXiv:2502.12109 (PSI)](https://arxiv.org/abs/2502.12109) provides a psychometric-grounded benchmark for persona accuracy. Integrating PSI evaluation would let us quantify the marginal benefit of 8-week vs 2-hour interview — currently NOT empirically validated.

### D. Graphiti temporal persona tracking
[Graphiti](https://github.com/getzep/graphiti) (Apache 2.0) implements temporal knowledge graphs with `valid_at` / `invalid_at` timestamps. Would let VirtualMe handle "2026-01 said likes skateboarding, 2026-04 said gave it up" → auto-invalidate stale facts. Better than Mem0 for time-varying personas, but Neo4j adds ops overhead.

## Acceptance criteria (high-level)

- [ ] Open-weight backend documented and runnable
- [ ] SinkTrack hook implemented for at least one open-weight model
- [ ] PSI benchmark runs as part of blind test protocol
- [ ] Decision: Letta migration accepted / rejected (with rationale)
- [ ] Decision: Graphiti integration accepted / rejected (with rationale)

## Estimated effort
1+ month. Some items are deliberately conditional on user demand.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.6 roadmap: open-weight model support (SinkTrack + Letta + PSI benchmark) #7

Background

v0.6 scope

A. Open-weight backend support

B. Letta evaluation

C. PSI benchmark integration

D. Graphiti temporal persona tracking

Acceptance criteria (high-level)

Estimated effort

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

v0.6 roadmap: open-weight model support (SinkTrack + Letta + PSI benchmark) #7

Description

Background

v0.6 scope

A. Open-weight backend support

B. Letta evaluation

C. PSI benchmark integration

D. Graphiti temporal persona tracking

Acceptance criteria (high-level)

Estimated effort

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions