feat: cross-encoder reranking on journal search (v2 backlog #2)#33
Conversation
Opt-in `--rerank` flag on `tempyr journal search` and a `rerank: bool` parameter on the MCP `journal_search` tool. When set, the search takes the top 50 RRF candidates and re-scores them via the BGE- Reranker-base cross-encoder (~280 MB ONNX model, fastembed). The reranker scores `(query, summary + detail)` pairs directly, so ordering is driven by relevance instead of recency/kind boosts on close calls. Implementation: - New `tempyr_journal_index::rerank` module wrapping `fastembed::TextRerank`. Same shape as the existing `Embedder`: process-wide singleton via `try_shared_reranker()`, retry on transient load failure, warn-once on persistent failures so a hard "no model" environment doesn't spam stderr. - `SearchOptions` gains `rerank: bool`; default false. When true, the search pipeline calls `maybe_apply_rerank` after RRF + dedup but before the token-budget pass. - `ScoreBreakdown` gains a `rerank: f64` field that's populated in `--explain` mode when the cross-encoder ran. The pre-rerank components (bm25 / vector / rrf / recency / kind) stay populated for inspection; they no longer drive the sort. - Pure sort logic split into `apply_rerank_scores` for unit testing without needing the model. NaN / non-finite scores are demoted to the bottom of the head rather than corrupting ordering. - CLI `--rerank` flag with usage doc explaining the model-download cost. MCP `JournalSearchParams.rerank: Option<bool>`. - Failure mode mirrors the bi-encoder's: model load or inference errors fall back transparently to the unreranked RRF order with a single warn-once stderr line. Tests: 4 new pure unit tests on `apply_rerank_scores` (sort by score desc, tail preserved, NaN demoted, breakdown written), plus 3 `#[ignore]`-marked model-backed tests on the rerank module. Total: 44 journal-index tests passing, 11 ignored. Docs: CLAUDE.md / AGENTS.md `journal_search` line updated to distinguish RRF (always on) from cross-encoder rerank (opt-in via flag). `docs/journal-spec.md` v2 backlog item struck through with implementation summary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds an opt-in cross‑encoder reranking stage to journal search (flag Changes
Sequence DiagramsequenceDiagram
actor User
participant CLI as CLI / MCP
participant Search as Search Pipeline
participant Reranker as Reranker (BGE)
participant Results as Results
User->>CLI: journal_search(query, --rerank)
CLI->>Search: SearchOptions { rerank: true, ... }
Search->>Search: BM25 + vec0 RRF (+ recency/kind boost)
Search->>Search: Select top 50 candidates
Search->>Reranker: rerank(query, documents...)
Reranker->>Reranker: Load model if cold (OnceLock + mutex)
Reranker-->>Search: Vec<f32] rerank scores
Search->>Search: Replace head ordering by rerank scores
Search->>Search: Update SearchHit.score & ScoreBreakdown.total
Search-->>Results: Final ranked results
Results-->>User: Return hits
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 694-701: The plain-text --explain output omits the rerank
component so b.total (which becomes the cross-encoder score when rerank is
enabled) no longer matches the printed component breakdown; update the non-JSON
explain branch that assembles/prints the components list (the code that prints
"bm25/vector/rrf/recency/kind") to include "rerank" when the rerank flag (pub
rerank: bool) is set and ensure any textual sums or labels reflect that b.total
is the cross-encoder score; locate the block that checks explain/json mode and
the variable b.total and add logic to append "rerank" into the printed
components string (and adjust wording if it distinguishes total vs components)
so the human-readable explain output matches the actual scoring when rerank is
enabled.
In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 112-137: The cold-start path needs to be serialized to avoid
duplicate model loads: add a static init mutex (e.g. static INIT:
std::sync::Mutex<()> = std::sync::Mutex::new(())) and, inside
try_shared_reranker(), acquire the INIT lock before calling Reranker::new();
immediately after acquiring the lock check RR.get() again and return if Some,
otherwise call Reranker::new(), set RR with the created Reranker, and then drop
the lock; preserve the existing WARNED behavior on Err and keep using
RR.get()/RR.set() so other threads still see the stored instance. Use the
existing symbols try_shared_reranker, RR, WARNED, and Reranker::new to locate
where to add the mutex and the second RR.get() check.
In `@docs/journal-spec.md`:
- Around line 384-385: The docs currently mark cross-encoder reranking as a
backlog item while code/CLI already exposes an opt-in --rerank flag for tempyr
journal search / journal_search using the BGE-Reranker-base over top-50 RRF
candidates; update the main retrieval spec (Section 9 "Slice 3b — Hybrid
retrieval") to fold this opt-in rerank stage into the canonical pipeline
description: describe that after initial retrieval and RRF fusion the system
will optionally run the BGE-Reranker-base cross-encoder over the top 50 RRF
candidates when --rerank is provided, note fallback behavior (use RRF order on
model-load or inference failure), and update any diagrams/examples to match the
new flow so the spec is internally consistent with the CLI and implementation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 17aaeb9d-e8b5-4f73-bb35-fbc34d6afd75
📒 Files selected for processing (8)
AGENTS.mdCLAUDE.mdcrates/tempyr-cli/src/commands/journal_cmd.rscrates/tempyr-journal-index/src/lib.rscrates/tempyr-journal-index/src/rerank.rscrates/tempyr-journal-index/src/search.rscrates/tempyr-mcp/src/handler.rsdocs/journal-spec.md
Three findings on PR #33: 1. The plain-text `--explain` printer wasn't surfacing the rerank component, so when `--rerank` was on the printed bm25/vector/rrf/ recency/kind sum no longer matched the displayed total (which had become the cross-encoder score). Branch the print on `args.rerank`: show "score: N = rerank (...components, informational only)" so the reader sees what's driving the order. 2. `try_shared_reranker` and `try_shared_embedder` had a cold-start race — under concurrent first calls (two MCP `journal_search` requests racing) the `OnceLock` alone let both threads start independent ~280 MB / ~80 MB downloads + ONNX warmups, of which only one's value would actually be stored. Add a static `Mutex<()>` barrier with double-checked locking around the load: hot path (slot already populated) still skips the lock; cold path acquires it, re-checks the slot, then runs the single load. Applied to both functions for symmetry — same pattern, same hazard. 3. The cross-encoder rerank stage was sitting in the v2 backlog section of `docs/journal-spec.md` even though the implementation matched the rest of the slice 3b retrieval pipeline. Moved the description into the Slice 3b — Hybrid retrieval pipeline list as the optional opt-in step it is, and dropped the backlog entry (replaced with an HTML comment pointer for changelog continuity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 792-806: The CLI should not assume reranking succeeded based on
the requested flag; change the print branch to inspect an explicit result-state
flag (e.g., a boolean like reranked on ScoreBreakdown or SearchHit) instead of
using args.rerank. Locate the printing code that references args.rerank and
switch it to check the result object (e.g., b.reranked or hit.reranked), and if
that flag does not exist add and populate a reranked field in
ScoreBreakdown/SearchHit in the search logic so the CLI can accurately display
"rerank" only when reranking actually occurred.
In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 141-156: The code should short-circuit repeated cold-start
attempts by caching a recent failure timestamp: introduce an atomic last-failure
(e.g., an AtomicU64 or AtomicI64 like LAST_FAIL_MS) and a fixed backoff duration
(e.g., BACKOFF_MS). In the entry path where you call Reranker::new() (same spot
that references RR and WARNED), first read LAST_FAIL_MS and if now < last_fail +
BACKOFF_MS return None immediately; on Err(err) set LAST_FAIL_MS = now and then
run the existing WARNED swap/log and return None. Ensure reads/writes to
LAST_FAIL_MS use ordering appropriate for visibility (Relaxed is fine here) so
queued callers see the backoff and avoid repeated model loads.
In `@docs/journal-spec.md`:
- Around line 359-363: The doc currently claims Slice 3b and Phase 4 features
(items like the optional BGE-Reranker rerank flow, deduplication by (summary,
kind), token-budget greedy fill, the new MCP tool journal_search with
rerank/explain, and CLIs such as tempyr journal search/show/sessions/tail) are
implemented but the top-level status metadata and the phase table still show
Phases 3/4 as pending; update the top-level status metadata and the phase table
entries to mark Slice 3b and Phase 4 as completed (or change the overall
document-level status to “stale -> updated/implemented”), ensuring the Phase 4
bullets and Slice 3b text remain consistent with the updated status so readers
don’t see conflicting signals.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2f662810-cefc-4454-b882-1b813ab1b341
📒 Files selected for processing (4)
crates/tempyr-cli/src/commands/journal_cmd.rscrates/tempyr-journal-index/src/embed.rscrates/tempyr-journal-index/src/rerank.rsdocs/journal-spec.md
Three findings on PR #33: 1. The CLI `--explain` print branched on `args.rerank`, which is the *requested* state, not the *result* state — when rerank was asked for but the model failed to load and the search fell back to RRF, the printed line still claimed "= rerank". Add `reranked: bool` to `ScoreBreakdown`, set only by `apply_rerank_scores` when scores actually flow into the hit; CLI now branches on `b.reranked`. 2. After a failed cold-start load, `try_shared_*` would retry on every subsequent call. In a hard "no model" environment (no network, ONNX runtime missing) that means every search waits on the slow timeout serialized behind the INIT mutex. Add an atomic last-failure timestamp + 5-second backoff: cheap atomic read on the hot path skips the load if a recent attempt failed; on success, the stamp clears so transient failures still recover quickly. Applied to both `try_shared_reranker` and `try_shared_embedder` for symmetry — same hazard, same fix. 3. The spec's top-level status line claimed "Phases 1–2 implemented; 3 and 4 pending" and the phase table marked both as "📋 Planned" even though Phase 3 (PRs #26, #27, #28) and Phase 4 (PRs #29–32) shipped weeks ago. Update both to ✅ Shipped with PR refs and note the cross-encoder rerank as a post-Phase-4 addition. Status summary now accurately reflects merged work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/tempyr-journal-index/src/search.rs (1)
208-212:⚠️ Potential issue | 🟠 Major
--reranknever reaches 50 candidates in the default case.With the default
--limit 10, Line 211 still pulls onlymax(limit * 4, 40)rows, so the reranker tops out at 40 candidates even thoughRERANK_CANDIDATE_COUNTis 50. That silently reduces recall in the common path this feature is meant to improve.Proposed fix
- let pull_usize = opts.limit.max(1).saturating_mul(4).max(40); + let pull_usize = opts + .limit + .max(1) + .saturating_mul(4) + .max(40) + .max(if opts.rerank { + RERANK_CANDIDATE_COUNT + } else { + 0 + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tempyr-journal-index/src/search.rs` around lines 208 - 212, The current pull size calculation (pull_usize/pull_i64) uses opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit 10 yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT (50); update the pull_usize computation to ensure it is at least RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 = i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to the configured RERANK_CANDIDATE_COUNT.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 382-386: The current rerank branch uses opts.rerank alone and may
rerank BM25-only fallbacks; change the condition so maybe_apply_rerank is only
called when both opts.rerank is true and vector-based retrieval is available
(e.g., hybrid_mode is true or opts.query_vector.is_some()); update the branch
around deduped to check that hybrid_mode || opts.query_vector.is_some() (in
addition to opts.rerank) before invoking maybe_apply_rerank to ensure pure BM25
results remain untouched.
In `@docs/journal-spec.md`:
- Around line 7-8: The document header now claims Phases 1–4 are implemented but
later text still uses future tense (e.g., phrases like "no SQLite (yet — Phase
3)", "Phase 4 will add ...", "worth answering before implementing 3a"); update
those passages in docs/journal-spec.md to reflect current status by either
converting future-tense statements into past-tense implementation notes or
explicitly marking them as historical/archival (e.g., "Historical: Phase 3 added
SQLite" or "Historical note: Phase 4 introduced X"), and ensure occurrences of
"Phase 3", "Phase 4", and "3a" (search for those strings) are reconciled so the
doc consistently reports shipped features rather than future work.
---
Outside diff comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 208-212: The current pull size calculation (pull_usize/pull_i64)
uses opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit
10 yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT
(50); update the pull_usize computation to ensure it is at least
RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and
RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 =
i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to
the configured RERANK_CANDIDATE_COUNT.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 43f48fc6-da93-4c72-84d8-53dab0de7eb6
📒 Files selected for processing (5)
crates/tempyr-cli/src/commands/journal_cmd.rscrates/tempyr-journal-index/src/embed.rscrates/tempyr-journal-index/src/rerank.rscrates/tempyr-journal-index/src/search.rsdocs/journal-spec.md
| let deduped = if opts.rerank { | ||
| maybe_apply_rerank(deduped, &opts.query) | ||
| } else { | ||
| deduped | ||
| }; |
There was a problem hiding this comment.
Don't rerank BM25-only fallbacks.
This branch keys only off opts.rerank. If query embedding is unavailable, the function still pays reranker load/inference cost on BM25-only candidates, which breaks the documented "rerank the RRF head" contract and turns embedder fallback into a slower second-model path. Gate this on hybrid_mode (or opts.query_vector.is_some()) and leave pure BM25 results untouched when vector retrieval is unavailable.
Proposed fix
- let deduped = if opts.rerank {
- maybe_apply_rerank(deduped, &opts.query)
+ let deduped = if opts.rerank && hybrid_mode {
+ maybe_apply_rerank(deduped, trimmed)
} else {
deduped
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@crates/tempyr-journal-index/src/search.rs` around lines 382 - 386, The
current rerank branch uses opts.rerank alone and may rerank BM25-only fallbacks;
change the condition so maybe_apply_rerank is only called when both opts.rerank
is true and vector-based retrieval is available (e.g., hybrid_mode is true or
opts.query_vector.is_some()); update the branch around deduped to check that
hybrid_mode || opts.query_vector.is_some() (in addition to opts.rerank) before
invoking maybe_apply_rerank to ensure pure BM25 results remain untouched.
Two of three findings fixed; pushing back on the third. 1. **pull_usize too small for the reranker (valid).** Default `--limit 10` produced `pull_usize = 40`, less than `RERANK_CANDIDATE_COUNT = 50`. The reranker would never see the full 50-candidate pool it's sized for. Bump pull_usize to at least `RERANK_CANDIDATE_COUNT` when `opts.rerank` is on; non-rerank queries keep the existing 4× heuristic so we don't pull more than we need. 2. **Future-tense passages in the spec (valid).** Reconciled the doc with the merged status: `Phase 3 builds` → `Phase 3 built`, both phase headers gain "(shipped — PRs ...)" markers, "no SQLite (yet — Phase 3)" rewritten to describe the derived index that's now live, the "Phase 4 will add a journal report" line points at the v2 backlog instead, and "Open Questions for Phase 3" is now labelled historical with each item's actual resolution recorded inline. Bottom-of-doc PRs list now includes #26-32. 3. **Skip rerank when query_vector is None (declined).** The cross-encoder is a relevance scorer that takes (query, document) pairs — it works on any candidate pool, BM25-only or hybrid. The user opting in via `--rerank` explicitly asks for that scorer, and even a pure-BM25 candidate list benefits from the cross-encoder's semantic ordering. Reranker availability is also independent of embedder availability (different model files, different load paths), so "embedder unavailable" doesn't imply "skip the rerank attempt". Posting an explanation on the PR rather than applying the change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushing back on the "only rerank in hybrid mode" finding — I think the suggested change would be a regression. The cross-encoder is a relevance scorer that takes (query, document) pairs and scores them directly. It doesn't depend on a vector being available; it works equally well on a BM25-only candidate set. In fact, that's where it earns its keep — BM25 produces lexically-matching hits, and the cross-encoder is exactly the thing that re-ranks those by semantic relevance. Skipping rerank when `query_vector` is None would also be surprising UX: the user explicitly opted into `--rerank` and silently getting back unreranked BM25 results would be a footgun. And reranker availability isn't tied to embedder availability — they're separate model files with separate load paths, so "embedder unavailable → skip rerank attempt" isn't a sound implication. The reranker's own load path already handles the no-model environment cheaply (warn-once + 5s backoff after the first failure). If you have a case in mind where reranking BM25-only candidates produces worse results than not reranking, I'd love to see it — happy to add a guard if there's a real failure mode. Otherwise leaving the rerank pass independent of the candidate-source signal. The other two findings (`pull_usize` under-feeding the reranker, future-tense doc passages) are fixed in `930a244`. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 413-421: Currently the code calls crate::try_shared_reranker()
before checking whether there are any hits, causing the reranker to load even
when n == 0; change the logic so you compute n =
hits.len().min(RERANK_CANDIDATE_COUNT) and immediately return hits if n == 0,
and only after that call crate::try_shared_reranker() and bind Some(reranker);
keep the existing comment about the warn-once behavior and ensure the early
return happens before any attempt to load the reranker (references: hits,
RERANK_CANDIDATE_COUNT, try_shared_reranker, reranker).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 461ef8d2-3fe5-41b8-b42b-cc155683ea18
📒 Files selected for processing (2)
crates/tempyr-journal-index/src/search.rsdocs/journal-spec.md
`maybe_apply_rerank` was calling `try_shared_reranker()` before checking whether `hits` had anything to score. A zero-result search with `--rerank` would still trigger the ~280 MB model download on first call (and pay the slow timeout in a no-network environment even after the load failure backoff kicks in). Swap the order: compute `n = hits.len().min(RERANK_CANDIDATE_COUNT)`, early-return on `n == 0`, *then* try to grab the reranker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
First v2-backlog item from `docs/journal-spec.md`. Adds an opt-in cross-encoder rerank pass on top of the existing BM25 + vec0 RRF hybrid retrieval.
When enabled, the top 50 RRF candidates get re-scored via the BGE-Reranker-base cross-encoder (fastembed). Scores from `(query, summary+detail)` pairs replace the sort key — recency and kind boosts are still computed (and shown in `--explain`) but no longer drive ordering. Bigger model than the bi-encoder (~280 MB on first download) and ~200 ms inference per query, so it's opt-in rather than always-on.
Failure mode mirrors the bi-encoder's: model load or inference errors fall back transparently to the unreranked RRF order with a single warn-once stderr line.
Implementation
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Bug Fixes