
feat(benchmarks): competitive benchmarks + multi-client support#103

Merged
George-iam merged 1 commit into main from feat/benchmarks-20260413 on Apr 13, 2026

Conversation

@George-iam
Contributor

Summary

Adds a full competitive benchmark suite comparing AXME Code against 5 memory systems (MemPalace, Mastra, Zep, Mem0, Supermemory), plus multi-client support documentation.

Results:

  • ToolEmu (safety): 100.00% accuracy, 0.00% FPR on 90 scenarios across 12 categories
  • LongMemEval: 89.20% E2E + 97.80% R@5 on 500 questions (beats MemPalace's 96.60% R@5)
  • Feature matrix: AXME 9/9 capabilities, 5 unique (decisions, safety hooks, handoff, oracle, multi-repo)
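The R@5 number above is a standard Recall@K retrieval metric. A minimal sketch of how it is typically computed (names are illustrative, not from the benchmark code): a question counts as a hit if any of its gold evidence ids appears in the top-K retrieved ids.

```typescript
// Hypothetical Recall@K sketch: per-question hit if any gold evidence
// id is among the top-K retrieved ids, averaged over all questions.
function recallAtK(
  retrieved: string[][], // per-question retrieved ids, ranked best-first
  gold: string[][],      // per-question gold evidence ids
  k: number,
): number {
  let hits = 0;
  for (let i = 0; i < retrieved.length; i++) {
    const topK = new Set(retrieved[i].slice(0, k));
    if (gold[i].some((id) => topK.has(id))) hits++;
  }
  return hits / retrieved.length;
}
```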

Positioning: AXME leads on features, safety, and retrieval quality. On LongMemEval E2E it places a strong 2nd: ahead of Supermemory (85.4%), Mastra on gpt-4o (84.2%), and Zep (71.2%); behind only Mastra on gpt-5-mini (94.87%).

What's in this PR

benchmarks/ (new, self-contained)

  • Separate package.json with its own deps — zero impact on product
  • lib/search.ts — MiniLM-L6-v2 + HNSW (shared)
  • longmemeval/ — adapter + runner with type-aware top-K (multi-session=50, temporal/knowledge-update=20), type-aware prompts, checkpoint/resume every 10 questions
  • toolemu/ — 90 scenarios across 12 categories
  • README.md — single source of truth: comparison table + per-benchmark details + reproduction
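The type-aware top-K described above can be sketched as follows. This is a hypothetical stand-in, not the actual lib/search.ts: the real pipeline embeds with MiniLM-L6-v2 and searches an HNSW index, while here brute-force cosine similarity stands in and all names (including the single-session default K, which the PR does not state) are illustrative.

```typescript
type QuestionType =
  | "multi-session"
  | "temporal-reasoning"
  | "knowledge-update"
  | "single-session";

// Hypothetical mapping of the type-aware top-K: multi-session = 50,
// temporal / knowledge-update = 20 (per the PR); the rest is assumed.
const TOP_K: Record<QuestionType, number> = {
  "multi-session": 50,
  "temporal-reasoning": 20,
  "knowledge-update": 20,
  "single-session": 10, // assumed default, not stated in the PR
};

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force stand-in for the HNSW index: rank all chunks by
// similarity to the query embedding, keep the type-dependent top-K.
function search(
  query: number[],
  chunks: { id: string; vec: number[] }[],
  qType: QuestionType,
): string[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, TOP_K[qType])
    .map((c) => c.id);
}
```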

docs/MULTI_CLIENT.md (new)

Setup instructions for Cursor, Windsurf, Cline, Claude Desktop, and generic MCP clients. Hooks remain Claude Code-specific.
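For generic MCP clients, the configuration typically follows the common `mcpServers` shape. The snippet below is entirely illustrative (server name, command, and args are placeholders, not the actual invocation); see docs/MULTI_CLIENT.md for the real per-client instructions.

```json
{
  "mcpServers": {
    "axme": {
      "command": "<command-from-MULTI_CLIENT.md>",
      "args": [],
      "env": { "ANTHROPIC_API_KEY": "<your-key>" }
    }
  }
}
```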

Main README updates

  • Replaces summary Competitive Benchmarks block with full Comparison table (capabilities + benchmarks × 6 products)
  • Collapses Telemetry into <details>
  • Removes internal Releasing section
  • Removes footer microcopy
  • Architecture diagram switched to dark theme

Test plan

  • ToolEmu passes 100% (90/90)
  • LongMemEval full 500 completed with checkpoint/resume (89.20% E2E, 97.80% R@5)
  • tsx loads run.ts without errors
  • No secrets in code (only process.env.ANTHROPIC_API_KEY + docs placeholders)
  • results/*.json and data/*.json gitignored
  • Dead code removed (entity-extractor, failed reflector experiments)
  • Architecture diagram regenerated with dark theme

🤖 Generated with Claude Code

Adds full benchmark suite in benchmarks/ comparing AXME Code against 5 memory
systems (MemPalace, Mastra, Zep, Mem0, Supermemory):

- ToolEmu safety (100% accuracy, 0% FPR on 90 scenarios across 12 categories)
- LongMemEval E2E 89.20% + R@5 97.80% on 500 questions (Sonnet 4.6 reader + judge)
- Feature matrix 9/9 capabilities, 5 unique to AXME

Results: AXME leads on features, safety, and retrieval quality (R@5 beats
MemPalace's 96.60%). LongMemEval E2E places a strong 2nd: ahead of Supermemory
(85.4%), Mastra on gpt-4o (84.2%), and Zep (71.2%); below only Mastra on
gpt-5-mini (94.87%).

Pipeline: MiniLM-L6-v2 + HNSW vector search + type-aware top-K + type-aware
reader prompts + checkpoint/resume every 10 questions. Fully self-contained
in benchmarks/ with its own package.json — zero changes to product src/.

Also adds docs/MULTI_CLIENT.md documenting setup for Cursor, Windsurf, Cline,
Claude Desktop, and generic MCP clients (hooks remain Claude Code-specific).

README: replaces summary Competitive Benchmarks block with full Comparison
table; collapses Telemetry into <details>; removes internal Releasing section
and footer microcopy; architecture diagram switched to dark theme.

#!axme pr=none repo=AxmeAI/axme-code
George-iam merged commit 2ffe50a into main on Apr 13, 2026
