
Fix: ENG-1785 - apply composite ranker to manual recall (parity with non-manual) #185

Merged
hungtranphamminh merged 1 commit into dev from feature/eng-1785-manual-recall-ranking
May 22, 2026

@hungtranphamminh hungtranphamminh commented May 22, 2026

Summary

Why

The three recall paths returned different orderings for the same query:

  • /api/recall and /api/ask apply the CompositeRanker (recency + importance signals, opt-in via scoring_weights) — they run search_similar → hydrate → rank → return.
  • /api/recall/manual returned raw pgvector cosine order — it exited right after search_similar, never applying the ranker. It also validated scoring_weights and then silently ignored them.

Before the composite ranker existed, this was invisible: all paths were pure cosine order, so they matched. Once the ranker shipped, any caller passing scoring_weights got reordered results from /api/recall and /api/ask but unranked results from /api/recall/manual — the same query, different order. (ENG-1785.)

What

Manual recall now applies the same ranker as the other two paths, while keeping its lightweight contract — it ranks without hydrating (no Walrus fetch, no SEAL decrypt) and still returns (blob_id, distance, …) for the client to hydrate itself.

Solution

Theory. The composite ranker only needs three fields — distance, created_at, importance — and all three live on the SearchHit returned by the vector search. Decryption only produces the memory text, which the ranker never reads. So manual recall can apply the identical ranking on the SearchHit data alone, before (and without) any hydration.
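To make the "only three fields" argument concrete, here is a sketch of a composite score that reads nothing but distance, created_at and importance. The `Weights` field names, the recency decay and the weighted-sum form are assumptions for illustration only; the PR does not show the CompositeRanker's actual formula.

```rust
/// Hypothetical weights for the three signals; the real `scoring_weights`
/// fields are not shown in the PR.
#[derive(Clone, Copy, Debug)]
pub struct Weights {
    pub similarity: f64,
    pub recency: f64,
    pub importance: f64,
}

/// Everything the ranker reads lives on the search hit, before any Walrus
/// fetch or SEAL decrypt. There is no text field to read.
#[allow(dead_code)]
#[derive(Clone, Debug)]
pub struct SearchHit {
    pub blob_id: String,
    pub distance: f64,   // pgvector cosine distance: lower is closer
    pub created_at: i64, // assumed: unix seconds
    pub importance: f64, // assumed: 0.0..=1.0
}

/// Illustrative composite score, NOT the project's CompositeRanker formula:
/// a weighted sum of cosine similarity, a simple recency decay, and importance.
pub fn composite_score(hit: &SearchHit, now: i64, w: Weights) -> f64 {
    let similarity = 1.0 - hit.distance;
    // Age in days, clamped at zero for timestamps in the future.
    let age_days = (now - hit.created_at).max(0) as f64 / 86_400.0;
    // 1.0 for a brand-new hit, decaying toward 0.0 as it ages.
    let recency = 1.0 / (1.0 + age_days);
    w.similarity * similarity + w.recency * recency + w.importance * hit.importance
}
```

With cosine-only weights a closer hit wins; shifting weight to importance can flip the order, which is exactly the reordering manual recall previously skipped.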

Reuse, don't re-implement. Rather than write a second scoring function over SearchHit (which would risk drifting from the real ranker — the exact bug class this fixes), the new rank_search_hits helper maps each SearchHit into a throwaway HydratedMemory carrying only the ranked fields (empty text), calls the same Ranker::rank, then reassembles the original SearchHits in the ranked order. One ordering implementation, shared by all three paths.
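A minimal sketch of the reuse pattern, assuming the `SearchHit` / `HydratedMemory` field names from the PR text and a hypothetical `Ranker` trait whose `rank` takes and returns a `Vec<HydratedMemory>` (the real signature may differ). `ImportanceFirst` is a toy stand-in for the CompositeRanker. What matters is the shape: build text-less proxies, call the one shared `rank`, then map back to the original hits.

```rust
/// Assumed shape of a vector-search hit (field names taken from the PR text).
#[derive(Clone, Debug, PartialEq)]
pub struct SearchHit {
    pub blob_id: String,
    pub distance: f64,
    pub created_at: i64,
    pub importance: f64,
}

/// Assumed shape of a hydrated memory; only `text` comes from decryption.
#[allow(dead_code)]
#[derive(Clone, Debug)]
pub struct HydratedMemory {
    pub blob_id: String,
    pub text: String,
    pub distance: f64,
    pub created_at: i64,
    pub importance: f64,
}

/// Hypothetical ranker interface shared by all recall paths.
pub trait Ranker {
    fn rank(&self, memories: Vec<HydratedMemory>) -> Vec<HydratedMemory>;
}

/// Toy stand-in for the CompositeRanker: importance descending (stable sort).
pub struct ImportanceFirst;

impl Ranker for ImportanceFirst {
    fn rank(&self, mut memories: Vec<HydratedMemory>) -> Vec<HydratedMemory> {
        memories.sort_by(|a, b| b.importance.total_cmp(&a.importance));
        memories
    }
}

/// Rank search hits with the shared ranker, without hydrating them.
pub fn rank_search_hits<R: Ranker>(ranker: &R, hits: &[SearchHit]) -> Vec<SearchHit> {
    // Throwaway proxies carry only the ranked fields (empty text). The opaque
    // blob_id slot holds the hit's input index rather than its real blob_id.
    let proxies: Vec<HydratedMemory> = hits
        .iter()
        .enumerate()
        .map(|(i, h)| HydratedMemory {
            blob_id: i.to_string(),
            text: String::new(),
            distance: h.distance,
            created_at: h.created_at,
            importance: h.importance,
        })
        .collect();
    // Same ranking implementation as the hydrating paths, then reassemble the
    // original hits in ranked order via the carried index.
    ranker
        .rank(proxies)
        .into_iter()
        .map(|p| {
            let i: usize = p.blob_id.parse().expect("proxy blob_id holds an index");
            hits[i].clone()
        })
        .collect()
}
```

Because the helper never reads `text`, it works on data that was never decrypted, and any future change to the shared ranker applies to manual recall automatically.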

Index-based reassembly (not blob_id-keyed). blob_id is not unique — vector_entries has no UNIQUE constraint on it, search_similar does not SELECT DISTINCT, and restore can insert multiple rows with the same blob_id. Reassembling by blob_id would collapse duplicates, silently dropping hits and reordering them — re-introducing the very divergence this fixes (the hydrating paths keep duplicates 1:1). Instead each hit's input index is carried through the ranker's opaque blob_id slot and used to reorder, so no hit is ever dropped and the result count always equals the search-hit count.
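The failure mode is easy to reproduce in isolation. The earlier blob_id-keyed draft is not shown in the PR, so `reassemble_by_blob_id` below is a hypothetical reconstruction of that class of bug (map keyed by blob_id, entries taken with `remove`), contrasted with index-based reassembly. The two-field `SearchHit` is a trimmed assumption.

```rust
use std::collections::HashMap;

/// Minimal hit shape for this illustration (assumed field names).
#[derive(Clone, Debug, PartialEq)]
pub struct SearchHit {
    pub blob_id: String,
    pub distance: f64,
}

/// Flawed (hypothetical): reassemble by blob_id. Duplicate keys collapse when
/// the map is built (the later row overwrites the earlier one), and `remove`
/// finds nothing on a repeated id, so hits silently disappear.
pub fn reassemble_by_blob_id(hits: Vec<SearchHit>, ranked_ids: &[String]) -> Vec<SearchHit> {
    let mut by_id: HashMap<String, SearchHit> =
        hits.into_iter().map(|h| (h.blob_id.clone(), h)).collect();
    ranked_ids.iter().filter_map(|id| by_id.remove(id)).collect()
}

/// Index-based: each ranked position names an input index, so every hit,
/// duplicates included, survives 1:1.
pub fn reassemble_by_index(hits: &[SearchHit], ranked_indices: &[usize]) -> Vec<SearchHit> {
    ranked_indices.iter().map(|&i| hits[i].clone()).collect()
}
```

With two rows sharing blob_id `x`, the keyed version returns two hits out of three, while the index version returns all three in the ranked order.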

Backward compatible. At default weights the ranker short-circuits, so the pgvector cosine order is returned unchanged — existing callers are unaffected. The response wire shape is unchanged (Vec<SearchHit>); only the order changes when scoring_weights are set. recall_manual now validates scoring_weights up front (400 on malformed) exactly like recall, and the weights actually apply.
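A sketch of the two behaviours described above: up-front validation that maps malformed weights to a 400, and a default-weights short-circuit that returns the cosine order untouched. The `ScoringWeights` fields, what counts as "default", and the specific validation rules are all assumptions; the PR only states that malformed weights are rejected and that default weights are a no-op.

```rust
/// Hypothetical weight shape: the PR names `scoring_weights` but not its fields.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScoringWeights {
    pub similarity: f64,
    pub recency: f64,
    pub importance: f64,
}

impl ScoringWeights {
    /// Assumed default: pure cosine similarity, no recency or importance signal.
    pub const DEFAULT: ScoringWeights =
        ScoringWeights { similarity: 1.0, recency: 0.0, importance: 0.0 };

    pub fn is_default(&self) -> bool {
        *self == Self::DEFAULT
    }
}

#[derive(Debug, PartialEq)]
pub enum WeightsError {
    NotFinite,
    Negative,
    AllZero,
}

/// Hypothetical validation rules standing in for the real ones.
pub fn validate(w: &ScoringWeights) -> Result<(), WeightsError> {
    let all = [w.similarity, w.recency, w.importance];
    if all.iter().any(|v| !v.is_finite()) {
        return Err(WeightsError::NotFinite);
    }
    if all.iter().any(|v| *v < 0.0) {
        return Err(WeightsError::Negative);
    }
    if all.iter().all(|v| *v == 0.0) {
        return Err(WeightsError::AllZero);
    }
    Ok(())
}

/// Validate before any search work: Some(400) on malformed weights.
pub fn reject_status(weights: Option<&ScoringWeights>) -> Option<u16> {
    match weights.map(validate) {
        Some(Err(_)) => Some(400),
        _ => None,
    }
}

/// Apply the ranker only for non-default weights; otherwise hand back the
/// incoming (cosine) order unchanged.
pub fn maybe_rank<T>(
    hits: Vec<T>,
    weights: Option<ScoringWeights>,
    rank: impl FnOnce(Vec<T>, ScoringWeights) -> Vec<T>,
) -> Vec<T> {
    match weights {
        Some(w) if !w.is_default() => rank(hits, w),
        _ => hits,
    }
}
```

Keeping the short-circuit outside the ranker means callers that omit `scoring_weights`, or send the defaults, see byte-for-byte the same order as before the fix.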

Technical change

| Area | Change |
|---|---|
| services/server/src/types.rs | RecallManualRequest gains optional scoring_weights; SearchHit derives Clone |
| services/server/src/routes/recall.rs | New rank_search_hits helper; recall_manual validates weights and applies the shared ranker; total computed from the ranked result count |

No schema change, no migration, no new dependency, no change to the retrieval / storage / decrypt paths.

Types of Changes

  • Breaking change
  • New feature
  • Bug fix (non-breaking change which fixes an issue)
  • Performance optimization
  • Refactor
  • Library update
  • Documentation
  • Test
  • Security awareness

Testing

  • I have tested this code locally
  • I have added/updated unit tests
  • I have added/updated integration tests
  • I have tested in multiple browsers (if applicable)

Full server suite passes (236/236); clippy clean on the changed files. New recall tests cover:

  • manual ≡ non-manual ordering parity under importance-heavy, recency-heavy, and combined (all-three-signals) weights
  • default weights preserve cosine order (no-op / backward compatibility)
  • duplicate-blob_id hits are not dropped (default + active weights)
  • an 8-item non-trivial permutation round-trips exactly (exercises the index reassembly)
  • empty hits, single hit, field preservation
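The repository's tests are not reproduced in the PR, so the following is only a sketch of what a manual ≡ non-manual parity check can look like. `rank` is an assumed stand-in for the shared ranker (importance descending, then distance ascending), and the 8-hit fixture deliberately includes a duplicate blob_id. Both paths feed the same comparator the same keys in the same input order, and `sort_by` is stable, so they must produce the same permutation.

```rust
/// Assumed hit shape (a subset of the fields the PR names).
#[derive(Clone, Debug, PartialEq)]
pub struct SearchHit {
    pub blob_id: String,
    pub distance: f64,
    pub importance: f64,
}

impl SearchHit {
    pub fn new(blob_id: &str, distance: f64, importance: f64) -> Self {
        SearchHit { blob_id: blob_id.to_string(), distance, importance }
    }
}

/// What the ranker sees; `text` is deliberately never read by `rank`.
#[allow(dead_code)]
#[derive(Clone, Debug)]
pub struct Memory {
    pub blob_id: String,
    pub text: String,
    pub distance: f64,
    pub importance: f64,
}

/// Stand-in shared ranker (assumed). `sort_by` is stable: equal keys keep input order.
pub fn rank(mut memories: Vec<Memory>) -> Vec<Memory> {
    memories.sort_by(|a, b| {
        b.importance
            .total_cmp(&a.importance)
            .then(a.distance.total_cmp(&b.distance))
    });
    memories
}

/// Hydrating path (/api/recall-style): rank memories carrying decrypted text.
pub fn hydrating_order(hits: &[SearchHit]) -> Vec<String> {
    let memories: Vec<Memory> = hits
        .iter()
        .map(|h| Memory {
            blob_id: h.blob_id.clone(),
            text: format!("decrypted text for {}", h.blob_id),
            distance: h.distance,
            importance: h.importance,
        })
        .collect();
    rank(memories).into_iter().map(|m| m.blob_id).collect()
}

/// Manual path: rank text-less proxies whose blob_id slot holds the input index.
pub fn manual_order(hits: &[SearchHit]) -> Vec<String> {
    let proxies: Vec<Memory> = hits
        .iter()
        .enumerate()
        .map(|(i, h)| Memory {
            blob_id: i.to_string(),
            text: String::new(),
            distance: h.distance,
            importance: h.importance,
        })
        .collect();
    rank(proxies)
        .into_iter()
        .map(|p| hits[p.blob_id.parse::<usize>().unwrap()].blob_id.clone())
        .collect()
}
```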

A follow-up end-to-end smoke test (live /api/recall vs /api/recall/manual with matching weights) is recommended before merge to confirm parity at the handler level; the ordering logic itself is fully unit-covered. The retrieval-quality benchmarks (LOCOMO / LongMemEval) are not applicable: they exercise only /api/recall at default weights, where this change is a no-op.

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

Related Issues

Additional Notes

  • Reviewed via a multi-agent deep review (ordering parity / code+test integrity / security). The review caught a duplicate-blob_id correctness issue in an earlier blob_id-keyed approach; the index-based reassembly above is the fix, with a regression test pinning it.
  • The MEM-57 pre-extraction dedup retrieval also calls search_similar but is intentionally not ranked — it's an internal dedup-context read for the extractor prompt, never returned to a caller as recall results.

…non-manual)

`/api/recall/manual` returned raw pgvector cosine order while `/api/recall`
and `/api/ask` applied the CompositeRanker (recency + importance, opt-in via
scoring_weights), so the same query + weights gave different orderings across
endpoints. Manual recall also validated scoring_weights and then ignored them.

Manual recall now applies the same ranker, keeping its lightweight contract:
it ranks on the SearchHit fields directly (distance / created_at / importance,
all present pre-decrypt) and still returns blob ids + distances WITHOUT a
Walrus fetch or SEAL decrypt. All three recall paths now share one ordering
logic and agree for the same query + weights.

- New `rank_search_hits` reuses the exact `Ranker::rank` the hydrating paths
  use (no re-implementation of scoring on SearchHit — that would risk drift).
- Reorder is index-based, not blob_id-keyed: blob_id is not unique
  (search_similar has no DISTINCT; restore can produce duplicate-blob_id rows),
  so a blob_id-keyed round-trip would collapse duplicates and drop hits.
- recall_manual validates scoring_weights up front (400 on malformed) like recall.
- Default weights short-circuit → cosine order unchanged → existing callers
  unaffected. Wire shape unchanged (Vec<SearchHit>); only order changes.

Tests: 236/236. New recall tests cover manual≡non-manual parity (importance /
recency / combined weights), default no-op, duplicate-blob_id no-drop, an
8-item permutation round-trip, and empty/single/field-preservation cases.

Closes ENG-1785.
@hungtranphamminh

E2E smoke test passed: verified at the handler level against a live benchmark-mode server (this branch's build).

Ingested 5 memories with varied content (so importance buckets differ), embedded a query with the same model the server uses, then called /api/recall (by query text) and /api/recall/manual (by query vector) with matching weights:

  • default weights: both endpoints returned the same 8 memories in the same (cosine) order ✅
  • importance_heavy weights: both endpoints returned the same reordered order ✅ — and that order differs from the default-weights order, confirming the ranker actually reorders and manual now follows it (not a coincidental match).

This reproduces the reporter's exact scenario (same query + weights → previously manual gave cosine order while non-manual reordered) and confirms parity end-to-end, not just at the unit level. Counts matched (8 = 8) on both runs, so no hit dropped.

@hungtranphamminh hungtranphamminh merged commit ab43f09 into dev May 22, 2026
8 checks passed
@hungtranphamminh hungtranphamminh deleted the feature/eng-1785-manual-recall-ranking branch May 22, 2026 13:35

2 participants