Suggestion: Benchmark database for evaluating MemCube retrieval across memory states

Hi MemOS team,

The three-state memory model (Activation / Plaintext / Parameter) is one of the most ambitious memory architectures I've seen. As the system matures, I think there's a growing need for **standardized benchmarking of cross-state retrieval quality**.

Specifically: when MemScheduler migrates memories between states, how do we verify that the right memories are accessible at the right time? This is hard to evaluate with ad-hoc tests because:
- Activation memory (KV-Cache) has different access patterns than Plaintext memory (vector search)
- The scheduler's migration decisions (what gets promoted/demoted) directly impact retrieval quality
- Parameter memory (LoRA) effectiveness is difficult to measure quantitatively

I've been working on [MemTest](https://github.com/yubingz/memtest), a system for designing benchmark databases for AI memory evaluation. It provides:

- **Controlled test databases** with known ground truth (21,793 memories from Chinese classical literature, 750 queries)
- **6 evaluation dimensions** including temporal retrieval, forgetting curves, and multi-hop reasoning
- **Corpus-driven builder** that generates test data from any text corpus
- **Adapter pattern**: implement 3 methods and get a full evaluation report
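
To make the adapter idea concrete, here is a minimal sketch of what a three-method adapter and a precision check could look like. The method names (`reset`, `ingest`, `query`), the `KeywordAdapter` toy backend, and `precision_at_k` are all hypothetical illustrations, not MemTest's or MemOS's actual API.

```python
from typing import Protocol


class MemoryAdapter(Protocol):
    """Hypothetical three-method interface; real method names may differ."""

    def reset(self) -> None: ...
    def ingest(self, memory_id: str, text: str) -> None: ...
    def query(self, text: str, k: int) -> list[str]: ...


class KeywordAdapter:
    """Toy backend that ranks memories by word overlap with the query."""

    def __init__(self) -> None:
        self._store: dict[str, set[str]] = {}

    def reset(self) -> None:
        self._store.clear()

    def ingest(self, memory_id: str, text: str) -> None:
        self._store[memory_id] = set(text.lower().split())

    def query(self, text: str, k: int) -> list[str]:
        words = set(text.lower().split())
        scored = [(len(words & toks), mid) for mid, toks in self._store.items()]
        # Keep memories sharing at least one word; highest overlap first, ties by id.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [mid for _, mid in hits[:k]]


def precision_at_k(
    adapter: MemoryAdapter,
    memories: dict[str, str],
    queries: list[tuple[str, set[str]]],
    k: int = 5,
) -> float:
    """Mean fraction of returned memories that are in the ground-truth set."""
    adapter.reset()
    for memory_id, text in memories.items():
        adapter.ingest(memory_id, text)
    total = 0.0
    for query_text, relevant in queries:
        returned = adapter.query(query_text, k)
        if returned:  # a query with no results contributes 0 precision
            total += len(set(returned) & relevant) / len(returned)
    return total / len(queries)
```

A MemOS adapter would presumably wrap MemCube reads and writes behind the same three calls, so the evaluation loop never needs to know which memory state served the result.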

For MemOS, benchmarks could help:
- Evaluate whether MemScheduler's state transitions preserve retrieval quality
- Compare retrieval accuracy across Activation vs Plaintext states
- Track regression as the scheduler logic evolves
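
As a rough sketch of the regression check, the snippet below compares per-query recall before and after a migration and reports the queries that got worse. The function names, the `tolerance` parameter, and the dict-based input shape are assumptions for illustration; they are not part of MemOS or MemTest.

```python
def recall_at_k(returned: list[str], relevant: set[str], k: int) -> float:
    """Fraction of ground-truth memories found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(returned[:k]) & relevant) / len(relevant)


def migration_regressions(
    before: dict[str, list[str]],
    after: dict[str, list[str]],
    ground_truth: dict[str, set[str]],
    k: int = 5,
    tolerance: float = 0.0,
) -> dict[str, tuple[float, float]]:
    """Map query id -> (recall_before, recall_after) where recall dropped by more than tolerance."""
    regressions = {}
    for query_id, relevant in ground_truth.items():
        r_before = recall_at_k(before.get(query_id, []), relevant, k)
        r_after = recall_at_k(after.get(query_id, []), relevant, k)
        if r_before - r_after > tolerance:
            regressions[query_id] = (r_before, r_after)
    return regressions
```

Here `before` and `after` would hold the ranked memory ids returned for each query with the memories in one state (say, Plaintext) and again after the scheduler has moved them to another (say, Activation). Running this in CI against a fixed ground-truth set is one way to catch scheduler changes that silently degrade retrieval.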

We found that the choice of retrieval method matters enormously: TF-IDF + LLM reranking achieves 87% precision versus 9.1% for sentence-transformers on Chinese text, a roughly 10x gap that would be invisible without systematic evaluation.

Would standardized benchmarking align with your roadmap? Happy to discuss how MemTest could adapt to MemOS's unique three-state architecture.

Suggestion: Benchmark database for evaluating MemCube retrieval across memory states #1829
