Suggestion: Benchmark database for evaluating MemCube retrieval across memory states #1829

@yubingz

Hi MemOS team,

The three-state memory model (Activation / Plaintext / Parameter) is one of the most ambitious memory architectures I've seen. As the system matures, I think there's a growing need for standardized benchmarking of cross-state retrieval quality.

Specifically: when MemScheduler migrates memories between states, how do we verify that the right memories are accessible at the right time? This is hard to evaluate with ad-hoc tests because:

  • Activation memory (KV-Cache) has different access patterns than Plaintext memory (vector search)
  • The scheduler's migration decisions (what gets promoted/demoted) directly impact retrieval quality
  • Parameter memory (LoRA) effectiveness is difficult to measure quantitatively

I've been working on MemTest, a system for designing benchmark databases for AI memory evaluation. It provides:

  • Controlled test databases with known ground truth (21,793 memories from Chinese classical literature, 750 queries)
  • 6 evaluation dimensions including temporal retrieval, forgetting curves, and multi-hop reasoning
  • Corpus-driven builder that generates test data from any text corpus
  • Adapter pattern: implement 3 methods and get a full evaluation report
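To make the adapter idea concrete, here is a minimal sketch of what a three-method adapter and its evaluation loop could look like. The method names (`add`, `search`, `reset`), the `KeywordAdapter` toy backend, and `precision_at_k` are all hypothetical and chosen for illustration; they are not MemTest's actual interface.

```python
from __future__ import annotations

from typing import Protocol


class MemoryAdapter(Protocol):
    """Hypothetical three-method interface (names are assumptions)."""

    def add(self, memory_id: str, text: str) -> None: ...
    def search(self, query: str, top_k: int) -> list[str]: ...
    def reset(self) -> None: ...


class KeywordAdapter:
    """Toy in-memory backend that ranks memories by word overlap."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def add(self, memory_id: str, text: str) -> None:
        self._store[memory_id] = text

    def search(self, query: str, top_k: int) -> list[str]:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), memory_id)
            for memory_id, text in self._store.items()
        ]
        # Highest overlap first; ties broken by memory id for determinism.
        ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
        return [memory_id for score, memory_id in ranked if score > 0][:top_k]

    def reset(self) -> None:
        self._store.clear()


def precision_at_k(
    adapter: MemoryAdapter,
    memories: dict[str, str],
    queries: list[tuple[str, set[str]]],
    k: int = 3,
) -> float:
    """Mean fraction of returned ids that appear in the ground truth."""
    adapter.reset()
    for memory_id, text in memories.items():
        adapter.add(memory_id, text)

    scores = []
    for query, relevant in queries:
        hits = adapter.search(query, k)
        # A query that returns nothing scores zero.
        scores.append(sum(h in relevant for h in hits) / len(hits) if hits else 0.0)
    return sum(scores) / len(scores)


memories = {
    "m1": "the scheduler promotes hot memories",
    "m2": "lora adapters store parameter memory",
    "m3": "vector search over plaintext memory",
}
queries = [("scheduler promotes", {"m1"}), ("lora parameter", {"m2"})]
score = precision_at_k(KeywordAdapter(), memories, queries)
```

A MemOS adapter would presumably wrap MemCube reads and writes behind the same three calls, so the evaluation loop stays unchanged across backends.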

For MemOS, benchmarks could help:

  • Evaluate whether MemScheduler's state transitions preserve retrieval quality
  • Compare retrieval accuracy across Activation vs Plaintext states
  • Track regression as the scheduler logic evolves
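As a rough sketch of the regression-tracking idea, the snippet below compares per-query recall before and after a scheduler migration and flags queries whose recall dropped. Everything here is an assumption for illustration: the function names, the `tolerance` threshold, and the idea that retrieval results are available as plain lists of memory ids per query. It does not call any MemOS or MemTest API.

```python
from __future__ import annotations


def recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of ground-truth memories that were retrieved."""
    if not relevant:
        return 1.0
    return len(set(retrieved) & relevant) / len(relevant)


def find_regressions(
    before: dict[str, list[str]],
    after: dict[str, list[str]],
    ground_truth: dict[str, set[str]],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float]]:
    """Return queries whose recall fell by more than `tolerance`.

    `before` and `after` map each query to the memory ids retrieved
    prior to and following a (hypothetical) state migration.
    """
    regressions = {}
    for query, relevant in ground_truth.items():
        recall_before = recall(before.get(query, []), relevant)
        recall_after = recall(after.get(query, []), relevant)
        if recall_before - recall_after > tolerance:
            regressions[query] = (recall_before, recall_after)
    return regressions


ground_truth = {"q1": {"a", "b"}, "q2": {"c"}}
before_migration = {"q1": ["a", "b"], "q2": ["c"]}
after_migration = {"q1": ["a"], "q2": ["c", "d"]}  # "b" was demoted and lost

regressions = find_regressions(before_migration, after_migration, ground_truth)
```

Running a check like this on a fixed query set after each scheduler change would surface migrations that silently make a memory unreachable.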

We found that the choice of retrieval method matters enormously: TF-IDF + LLM reranking achieves 87% precision vs 9.1% for sentence-transformers on Chinese text, a nearly 10x difference that would be invisible without systematic evaluation.

Would standardized benchmarking align with your roadmap? Happy to discuss how MemTest could adapt to MemOS's unique three-state architecture.

Labels: ai-task (AutoDev task), plugin (Plugin/adapter/bridge layer, apps/ directory)
