Add reranking step to search pipeline (local cross-encoder) #618

@bm-clawd

Description

Context

Both MemMachine and Backboard use rerankers to boost retrieval quality. MemMachine uses Cohere rerank-v3-5 (cloud) and saw significant accuracy improvements. We should add reranking to BM search.

Proposal: Tiered Reranking

1. Cloud (BM Cloud subscribers)

  • Server-side reranking on Fly.io infrastructure
  • Cohere reranker or self-hosted cross-encoder model
  • Zero config for the user — just works when using Cloud
  • Natural upsell: 'Basic Memory search is already 86% on LoCoMo. Cloud reranking takes it further.'

2. Local via OpenAI-compatible API key

  • User provides an API endpoint (OpenAI, Ollama, LM Studio, any OpenAI-compatible provider)
  • BM sends top-K candidates to the reranking endpoint
  • Config: reranker: { enabled: true, api_base: 'http://localhost:11434', model: 'bge-reranker-v2-m3' }
  • Works with any provider that speaks the OpenAI reranking/embedding protocol
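A minimal sketch of what the option-2 client could look like. The `/rerank` endpoint shape here follows the Jina/Cohere-style convention; whether a given OpenAI-compatible provider actually exposes it is an assumption to verify per provider, and all names (`build_rerank_request`, `rerank_remote`) are illustrative, not existing BM code.

```python
import json
import urllib.request

def build_rerank_request(api_base, model, query, documents, top_n=5):
    """Assemble a POST to {api_base}/rerank (endpoint shape is an assumption)."""
    payload = {
        "model": model,          # e.g. 'bge-reranker-v2-m3' from the config above
        "query": query,
        "documents": documents,  # the top-K candidates from initial search
        "top_n": top_n,
    }
    return urllib.request.Request(
        f"{api_base.rstrip('/')}/rerank",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def rerank_remote(api_base, model, query, documents, top_n=5):
    """Send candidates to the reranking endpoint; return (index, score) pairs."""
    req = build_rerank_request(api_base, model, query, documents, top_n)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Typical response shape: {"results": [{"index": 3, "relevance_score": 0.92}, ...]}
    return [(r["index"], r["relevance_score"]) for r in body["results"]]
```

Because the payload builder is separate from the network call, the request shape can be unit-tested without a live endpoint.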

3. No key (default, unchanged)

  • Search works exactly as today — BM25 + vector hybrid
  • No degradation, no API required
  • Pure local, zero dependencies

How Reranking Works

  1. Initial search returns top-K candidates (K=20-50, wider net)
  2. Reranker rescores each candidate against the original query using cross-attention
  3. Return top-N reranked results (N=5-10, what the user asked for)

Cross-encoders process query+document together (unlike bi-encoders which encode separately), capturing fine-grained relevance. They're slower but much more accurate — perfect for reranking a small candidate set.
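The three steps above can be sketched as a two-stage pipeline. The scoring functions below are toy placeholders standing in for BM25+vector hybrid search and a real cross-encoder model; the function names are illustrative, not BM's actual API.

```python
def initial_search(query, corpus, k=20):
    """Stage 1: cheap retrieval casts a wide net of top-K candidates."""
    # Placeholder: rank by naive term presence (stand-in for BM25 + vectors).
    scored = [(sum(t in doc.lower() for t in query.lower().split()), doc)
              for doc in corpus]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def cross_encoder_score(query, doc):
    """Placeholder for a real cross-encoder that attends over query+doc jointly."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    # Toy heuristic: fraction of query terms covered by the document.
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def rerank(query, corpus, k=20, n=5):
    """Stage 2: rescore the K candidates against the query, return top-N."""
    candidates = initial_search(query, corpus, k=k)
    rescored = sorted(candidates,
                      key=lambda d: cross_encoder_score(query, d),
                      reverse=True)
    return rescored[:n]
```

The key design point is that the expensive scorer only ever sees K documents, not the whole corpus, which is what makes a slow-but-accurate cross-encoder affordable.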

Models to Evaluate

  • Cohere rerank-v3-5 (cloud, what MemMachine uses)
  • BAAI/bge-reranker-v2-m3 (runs via Ollama/vLLM locally)
  • jinaai/jina-reranker-v2-base-multilingual (good quality/speed balance)
  • cross-encoder/ms-marco-MiniLM-L-6-v2 (fast, small baseline)

Implementation

  • New search parameter: rerank: true (opt-in per query)
  • Or global config: reranker.enabled, reranker.provider, reranker.model, reranker.api_base
  • Cloud users get it automatically
  • Measure impact on LoCoMo benchmark before/after
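The opt-in logic in the first two bullets could resolve as follows: a per-query `rerank` flag wins, otherwise fall back to the global `reranker.enabled` setting. The config keys mirror those proposed above; the helper itself is a hypothetical sketch, not existing BM code.

```python
def should_rerank(config, rerank=None):
    """Per-query `rerank` overrides the global reranker.enabled config."""
    if rerank is not None:
        return rerank  # explicit opt-in/opt-out on this query
    return config.get("reranker", {}).get("enabled", False)  # global default
```

This keeps option 3 (no key, unchanged behavior) as the true default: with no config and no flag, reranking stays off.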

Evidence

  • MemMachine: reranker is key to their 91% LoCoMo score
  • QMD: uses local LLM reranking via node-llama-cpp GGUF models
  • Cross-encoder reranking typically improves retrieval quality by 5-15% in academic benchmarks

Milestone

v0.19.0
