Fix: ENG-1785 — apply composite ranker to manual recall (parity with non-manual)#185
Merged
Merged
Conversation
…non-manual) `/api/recall/manual` returned raw pgvector cosine order while `/api/recall` and `/api/ask` applied the CompositeRanker (recency + importance, opt-in via scoring_weights), so the same query + weights gave different orderings across endpoints. Manual recall also validated scoring_weights and then ignored them. Manual recall now applies the same ranker, keeping its lightweight contract: it ranks on the SearchHit fields directly (distance / created_at / importance, all present pre-decrypt) and still returns blob ids + distances WITHOUT a Walrus fetch or SEAL decrypt. All three recall paths now share one ordering logic and agree for the same query + weights. - New `rank_search_hits` reuses the exact `Ranker::rank` the hydrating paths use (no re-implementation of scoring on SearchHit — that would risk drift). - Reorder is index-based, not blob_id-keyed: blob_id is not unique (search_similar has no DISTINCT; restore can produce duplicate-blob_id rows), so a blob_id-keyed round-trip would collapse duplicates and drop hits. - recall_manual validates scoring_weights up front (400 on malformed) like recall. - Default weights short-circuit → cosine order unchanged → existing callers unaffected. Wire shape unchanged (Vec<SearchHit>); only order changes. Tests: 236/236. New recall tests cover manual≡non-manual parity (importance / recency / combined weights), default no-op, duplicate-blob_id no-drop, an 8-item permutation round-trip, and empty/single/field-preservation cases. Closes ENG-1785.
Collaborator
Author
|
✅ E2E smoke passed — verified at the handler level against a live benchmark-mode server (this branch's build). Ingested 5 memories with varied content (so importance buckets differ), embedded a query with the same model the server uses, then called
This reproduces the reporter's exact scenario (same query + weights → previously manual gave cosine order while non-manual reordered) and confirms parity end-to-end, not just at the unit level. Counts matched (8 = 8) on both runs, so no hit dropped. |
ducnmm
approved these changes
May 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why
The three recall paths returned different orderings for the same query:
/api/recalland/api/askapply theCompositeRanker(recency + importance signals, opt-in viascoring_weights) — they runsearch_similar→ hydrate → rank → return./api/recall/manualreturned raw pgvector cosine order — it exited right aftersearch_similar, never applying the ranker. It also validatedscoring_weightsand then silently ignored them.Before the composite ranker existed, this was invisible: all paths were pure cosine order, so they matched. Once the ranker shipped, any caller passing
scoring_weightsgot reordered results from/api/recalland/api/askbut unranked results from/api/recall/manual— the same query, different order. (ENG-1785.)What
Manual recall now applies the same ranker as the other two paths, while keeping its lightweight contract — it ranks without hydrating (no Walrus fetch, no SEAL decrypt) and still returns
(blob_id, distance, …)for the client to hydrate itself.Solution
Theory. The composite ranker only needs three fields —
distance,created_at,importance— and all three live on theSearchHitreturned by the vector search. Decryption only produces the memory text, which the ranker never reads. So manual recall can apply the identical ranking on theSearchHitdata alone, before (and without) any hydration.Reuse, don't re-implement. Rather than write a second scoring function over
SearchHit(which would risk drifting from the real ranker — the exact bug class this fixes), the newrank_search_hitshelper maps eachSearchHitinto a throwawayHydratedMemorycarrying only the ranked fields (emptytext), calls the sameRanker::rank, then reassembles the originalSearchHits in the ranked order. One ordering implementation, shared by all three paths.Index-based reassembly (not blob_id-keyed).
blob_idis not unique —vector_entrieshas no UNIQUE constraint on it,search_similardoes notSELECT DISTINCT, andrestorecan insert multiple rows with the sameblob_id. Reassembling byblob_idwould collapse duplicates, silently dropping hits and reordering them — re-introducing the very divergence this fixes (the hydrating paths keep duplicates 1:1). Instead each hit's input index is carried through the ranker's opaqueblob_idslot and used to reorder, so no hit is ever dropped and the result count always equals the search-hit count.Backward compatible. At default weights the ranker short-circuits, so the pgvector cosine order is returned unchanged — existing callers are unaffected. The response wire shape is unchanged (
Vec<SearchHit>); only the order changes whenscoring_weightsare set.recall_manualnow validatesscoring_weightsup front (400 on malformed) exactly likerecall, and the weights actually apply.Technical change
services/server/src/types.rsRecallManualRequestgains optionalscoring_weights;SearchHitderivesCloneservices/server/src/routes/recall.rsrank_search_hitshelper;recall_manualvalidates weights + applies the shared ranker;totalcomputed from the ranked result countNo schema change, no migration, no new dependency, no change to the retrieval / storage / decrypt paths.
Types of Changes
Testing
Full server suite passes (236/236); clippy clean on the changed files. New recall tests cover:
blob_idhits are not dropped (default + active weights)A follow-up end-to-end smoke (live
/api/recallvs/api/recall/manualwith matching weights) is recommended before merge to confirm at the handler level; the ordering logic itself is fully unit-covered. The retrieval-quality benchmarks (LOCOMO / LongMemEval) are not applicable — they exercise only/api/recallat default weights, where this change is a no-op.Checklist
Related Issues
Additional Notes
blob_idcorrectness issue in an earlier blob_id-keyed approach; the index-based reassembly above is the fix, with a regression test pinning it.search_similarbut is intentionally not ranked — it's an internal dedup-context read for the extractor prompt, never returned to a caller as recall results.