Skip to content

v0.6 roadmap: open-weight model support (SinkTrack + Letta + PSI benchmark) #7

@MakiDevelop

Description

@MakiDevelop

Background

Several drift-mitigation techniques require attention-layer hooks that the Anthropic API cannot expose:

  • Split-softmax (arXiv:2402.10962) — amplifies attention to system prompt
  • SinkTrack (arXiv:2604.10027, ICLR 2026) — attention sink anchoring, "negligible inference overhead"

These would become viable if VirtualMe supported open-weight LLM backends (Llama 4 / Qwen3 / similar via Ollama / vLLM / SGLang).

v0.6 scope

Three big things:

A. Open-weight backend support

  • Add LLMBackend abstraction
  • Implement OpenAI-compatible client for Ollama / vLLM
  • Add SinkTrack hook for open-weight models
  • Keep Anthropic Claude as the default

B. Letta evaluation

Letta (MemGPT) is a full agent runtime with three-tier memory (core / recall / archival). It's apache 2.0 and may suit VirtualMe's "long-running persona agent" use case better than building tiered memory ourselves. Evaluate migration cost: estimated 2-6 weeks.

C. PSI benchmark integration

arXiv:2502.12109 (PSI) provides a psychometric-grounded benchmark for persona accuracy. Integrating PSI evaluation would let us quantify the marginal benefit of 8-week vs 2-hour interview — currently NOT empirically validated.

D. Graphiti temporal persona tracking

Graphiti (Apache 2.0) implements temporal knowledge graphs with valid_at / invalid_at timestamps. Would let VirtualMe handle "2026-01 said likes skateboarding, 2026-04 said gave it up" → auto-invalidate stale facts. Better than Mem0 for time-varying personas, but Neo4j adds ops overhead.

Acceptance criteria (high-level)

  • Open-weight backend documented and runnable
  • SinkTrack hook implemented for at least one open-weight model
  • PSI benchmark runs as part of blind test protocol
  • Decision: Letta migration accepted / rejected (with rationale)
  • Decision: Graphiti integration accepted / rejected (with rationale)

Estimated effort

1+ month. Some items are deliberately conditional on user demand.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions