Context
Both MemMachine and Backboard use rerankers to boost retrieval quality. MemMachine uses Cohere rerank-v3-5 (cloud) and saw significant accuracy improvements. We should add reranking to Basic Memory (BM) search.
Proposal: Tiered Reranking
1. Cloud (BM Cloud subscribers)
- Server-side reranking on Fly.io infrastructure
- Cohere reranker or self-hosted cross-encoder model
- Zero config for the user — just works when using Cloud
- Natural upsell: 'Basic Memory search is already 86% on LoCoMo. Cloud reranking takes it further.'
2. Local via OpenAI-compatible API key
- User provides an API endpoint (OpenAI, Ollama, LM Studio, any OpenAI-compatible provider)
- BM sends top-K candidates to the reranking endpoint
- Config: `reranker: { enabled: true, api_base: 'http://localhost:11434', model: 'bge-reranker-v2-m3' }`
- Works with any provider that speaks the OpenAI reranking/embedding protocol
3. No key (default, unchanged)
- Search works exactly as today — BM25 + vector hybrid
- No degradation, no API required
- Pure local, zero dependencies
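The three tiers above could be resolved with a simple selection step at query time. A minimal sketch, assuming a hypothetical `RerankerConfig` shape mirroring the config proposed above (none of these names are existing BM APIs):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RerankerConfig:
    # Hypothetical config mirroring the proposal; not an existing BM API.
    enabled: bool = False
    api_base: Optional[str] = None  # e.g. "http://localhost:11434"
    model: Optional[str] = None     # e.g. "bge-reranker-v2-m3"

def choose_tier(config: RerankerConfig, is_cloud: bool) -> str:
    """Pick a reranking tier: cloud, local endpoint, or none (the default)."""
    if is_cloud:
        return "cloud"   # server-side reranking, zero config for the user
    if config.enabled and config.api_base:
        return "local"   # user-supplied OpenAI-compatible endpoint
    return "none"        # BM25 + vector hybrid, unchanged behavior
```

The key property is that tier 3 is the fall-through: with no cloud plan and no key, search behaves exactly as today.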
How Reranking Works
- Initial search returns top-K candidates (K=20-50, wider net)
- Reranker rescores each candidate against the original query using cross-attention
- Return top-N reranked results (N=5-10, what the user asked for)
Cross-encoders process query+document together (unlike bi-encoders which encode separately), capturing fine-grained relevance. They're slower but much more accurate — perfect for reranking a small candidate set.
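The widen-then-rescore flow above can be sketched as follows. The `score` callable stands in for a real cross-encoder (e.g. a bge-reranker model scoring query+document jointly); the toy `overlap_score` below just counts shared tokens so the sketch is self-contained:

```python
def rerank(query: str, candidates: list[str], score, top_n: int = 5) -> list[str]:
    """Rescore top-K candidates against the query, return the top-N."""
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def overlap_score(query: str, doc: str) -> int:
    # Toy stand-in for a cross-encoder: count shared lowercase tokens.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["cats sleep a lot", "dogs bark", "cats and dogs play"]
top = rerank("cats dogs", docs, overlap_score, top_n=2)
# → ["cats and dogs play", "cats sleep a lot"]
```

Because the candidate set is small (K=20-50), the per-pair cost of a real cross-encoder stays bounded regardless of corpus size.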
Models to Evaluate
- Cohere rerank-v3-5 (cloud, what MemMachine uses)
- BAAI/bge-reranker-v2-m3 (runs locally via Ollama/vLLM)
- jinaai/jina-reranker-v2-base-multilingual (good quality/speed balance)
- cross-encoder/ms-marco-MiniLM-L-6-v2 (fast, small baseline)
Implementation
- New search parameter: `rerank: true` (opt-in per query)
- Or global config: `reranker.enabled`, `reranker.provider`, `reranker.model`, `reranker.api_base`
- Cloud users get it automatically
- Measure impact on LoCoMo benchmark before/after
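To make the config concrete, here is a hedged sketch of the request body BM might send to the reranking endpoint. The field names follow Cohere's rerank API shape; whether a given local provider (Ollama, LM Studio, etc.) accepts this exact shape is an assumption to verify per provider:

```python
def build_rerank_request(model: str, query: str, documents: list[str], top_n: int) -> dict:
    """Build a Cohere-style /rerank payload.

    Field names follow Cohere's rerank API; treating this as the shape a
    local OpenAI-compatible provider expects is an assumption, not a spec.
    """
    return {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }

payload = build_rerank_request(
    model="bge-reranker-v2-m3",
    query="what did we decide about caching?",
    documents=["meeting notes ...", "design doc ..."],
    top_n=5,
)
```

Keeping payload construction behind one function makes it easy to add per-provider adapters later if their request shapes diverge.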
Evidence
- MemMachine: reranker is key to their 91% LoCoMo score
- QMD: uses local LLM reranking via node-llama-cpp GGUF models
- Cross-encoder reranking typically adds 5-15% to retrieval quality in academic benchmarks
Milestone
v0.19.0