Hi team! We're Bench'd — an independent benchmark platform for AI memory systems.
We ran CrewAI Memory through LongMemEval (500 questions).
Results
| Benchmark | Score | Questions | Status |
| --- | --- | --- | --- |
| LongMemEval v1.0 | 46.0% | 500 | Verified |
Per-dimension: Recall 74.4% · Temporal 35.5% · Reasoning 29.3%
Full results: benchd.ai/system/crewai-memory
The LLM baseline (no memory) scores 57.6%, so CrewAI Memory's overall 46.0% sits 11.6 points below it. Recall is strong at 74.4%, but temporal (35.5%) and reasoning (29.3%) pull the overall score down.
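For anyone who wants to sanity-check the comparison, here is a minimal Python sketch of the arithmetic. The numbers are copied from this post. The function names are hypothetical and are not part of the Bench'd harness. Only the overall score is compared against the baseline, since no per-dimension baseline is given here.

```python
# Scores copied from the post above (percentages).
BASELINE_OVERALL = 57.6   # LLM with no memory
CREWAI_OVERALL = 46.0     # CrewAI Memory, LongMemEval v1.0
PER_DIMENSION = {"Recall": 74.4, "Temporal": 35.5, "Reasoning": 29.3}


def gap_to_baseline(score: float, baseline: float = BASELINE_OVERALL) -> float:
    """Return the difference in percentage points (negative = below baseline)."""
    # Round to one decimal to hide floating-point noise in the subtraction.
    return round(score - baseline, 1)


def weakest_dimension(scores: dict[str, float]) -> str:
    """Return the name of the lowest-scoring dimension."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    print("Overall gap (points):", gap_to_baseline(CREWAI_OVERALL))
    print("Weakest dimension:", weakest_dimension(PER_DIMENSION))
```

Run as a script, this reports the overall gap in percentage points and names the lowest-scoring dimension, which is a quick way to see where the overall score is being lost.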
Run it yourself
```bash
pip install benchd-harness
benchd run -a crewai-memory -b longmemeval-v1 --judge --key ./keys/private.key
```
Bench'd — the neutral benchmark standard for AI memory systems.