Bench'd Independent Benchmark Results for CrewAI Memory #5800

@1vanbui

Hi team! We're Bench'd — an independent benchmark platform for AI memory systems.

We ran CrewAI Memory through LongMemEval (500 questions).

Results

| Benchmark | Score | Questions | Status |
|---|---|---|---|
| LongMemEval v1.0 | 46.0% | 500 | Verified |

Per-dimension: Recall 74.4% · Temporal 35.5% · Reasoning 29.3%

Full results: benchd.ai/system/crewai-memory

The LLM baseline (no memory) scores 57.6%, so CrewAI Memory's overall 46.0% sits 11.6 points below it. CrewAI's recall is strong at 74.4%, but temporal (35.5%) and reasoning (29.3%) pull the overall score down.
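To put the gap in concrete terms, here is a rough sketch that converts the two overall scores into implied question counts. It assumes each score is simply correct answers divided by total questions, and that the baseline was scored on the same 500 questions; neither assumption is confirmed by the published results, and `correct_count` is a hypothetical helper, not part of any Bench'd tooling.

```python
# Assumption: score = correct / total, and both runs used the same 500 questions.
TOTAL_QUESTIONS = 500


def correct_count(score_pct: float, total: int = TOTAL_QUESTIONS) -> int:
    """Convert a percentage score into the implied number of correct answers."""
    return round(score_pct / 100 * total)


crewai_correct = correct_count(46.0)    # CrewAI Memory, LongMemEval v1.0
baseline_correct = correct_count(57.6)  # LLM baseline without memory

gap_points = round(57.6 - 46.0, 1)                # difference in percentage points
gap_questions = baseline_correct - crewai_correct  # difference in implied correct answers

print(f"CrewAI Memory: {crewai_correct}/{TOTAL_QUESTIONS}")
print(f"Baseline:      {baseline_correct}/{TOTAL_QUESTIONS}")
print(f"Gap:           {gap_points} points, {gap_questions} questions")
```

Under those assumptions the gap works out to a few dozen questions out of 500, which is a useful scale to keep in mind when reading the per-dimension numbers.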

Run it yourself

```bash
pip install benchd-harness
benchd run -a crewai-memory -b longmemeval-v1 --judge --key ./keys/private.key
```


Bench'd — the neutral benchmark standard for AI memory systems.
