feat: cross-encoder reranking on journal search (v2 backlog #2) by cleak · Pull Request #33 · cleak/tempyr

cleak · 2026-04-29T21:52:40Z

Summary

First v2-backlog item from `docs/journal-spec.md`. Adds an opt-in cross-encoder rerank pass on top of the existing BM25 + vec0 RRF hybrid retrieval.

CLI: `tempyr journal search --rerank`
MCP: `journal_search` tool gains `rerank: Option`

When enabled, the top 50 RRF candidates get re-scored via the BGE-Reranker-base cross-encoder (fastembed). Scores from `(query, summary+detail)` pairs replace the sort key — recency and kind boosts are still computed (and shown in `--explain`) but no longer drive ordering. Bigger model than the bi-encoder (~280 MB on first download) and ~200 ms inference per query, so it's opt-in rather than always-on.

Failure mode mirrors the bi-encoder's: model load or inference errors fall back transparently to the unreranked RRF order with a single warn-once stderr line.

Implementation

File	Change
`tempyr-journal-index/src/rerank.rs` (new)	`Reranker` wrapper, `try_shared_reranker()` singleton, `warn_query_rerank_failure_once`
`tempyr-journal-index/src/search.rs`	`SearchOptions::rerank`, `ScoreBreakdown::rerank`, `maybe_apply_rerank`, `apply_rerank_scores` (pure sort)
`tempyr-cli/src/commands/journal_cmd.rs`	`SearchArgs::rerank` flag
`tempyr-mcp/src/handler.rs`	`JournalSearchParams::rerank` field
`CLAUDE.md` / `AGENTS.md`	Distinguish always-on RRF from opt-in cross-encoder
`docs/journal-spec.md`	V2 backlog item struck through

Test plan

`cargo test --workspace` (44 journal-index tests pass; 4 new pure unit tests on `apply_rerank_scores`, 3 new `#[ignore]`-marked model-backed tests)
`cargo clippy --workspace --all-targets -- -D warnings`
`cargo fmt --check`
Manual smoke: `tempyr journal search "" --rerank` in a real repo, confirm the model downloads and ordering shifts vs. without `--rerank`

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Optional --rerank for journal_search: opt-in cross-encoder rerank of the top 50 candidates that can reorder results and update reported scores; off by default and safely falls back if unavailable.
- Explain output now indicates when results were reranked and shows component values for inspection.
Documentation
- CLI, spec, and help updated to describe BM25 + vec0 RRF with recency/kind boosts, rerank semantics, model note, and fallback behavior.
Bug Fixes
- Rerank failures are non-fatal and emit a one-time warning.

Opt-in `--rerank` flag on `tempyr journal search` and a `rerank: bool` parameter on the MCP `journal_search` tool. When set, the search takes the top 50 RRF candidates and re-scores them via the BGE- Reranker-base cross-encoder (~280 MB ONNX model, fastembed). The reranker scores `(query, summary + detail)` pairs directly, so ordering is driven by relevance instead of recency/kind boosts on close calls. Implementation: - New `tempyr_journal_index::rerank` module wrapping `fastembed::TextRerank`. Same shape as the existing `Embedder`: process-wide singleton via `try_shared_reranker()`, retry on transient load failure, warn-once on persistent failures so a hard "no model" environment doesn't spam stderr. - `SearchOptions` gains `rerank: bool`; default false. When true, the search pipeline calls `maybe_apply_rerank` after RRF + dedup but before the token-budget pass. - `ScoreBreakdown` gains a `rerank: f64` field that's populated in `--explain` mode when the cross-encoder ran. The pre-rerank components (bm25 / vector / rrf / recency / kind) stay populated for inspection; they no longer drive the sort. - Pure sort logic split into `apply_rerank_scores` for unit testing without needing the model. NaN / non-finite scores are demoted to the bottom of the head rather than corrupting ordering. - CLI `--rerank` flag with usage doc explaining the model-download cost. MCP `JournalSearchParams.rerank: Option<bool>`. - Failure mode mirrors the bi-encoder's: model load or inference errors fall back transparently to the unreranked RRF order with a single warn-once stderr line. Tests: 4 new pure unit tests on `apply_rerank_scores` (sort by score desc, tail preserved, NaN demoted, breakdown written), plus 3 `#[ignore]`-marked model-backed tests on the rerank module. Total: 44 journal-index tests passing, 11 ignored. Docs: CLAUDE.md / AGENTS.md `journal_search` line updated to distinguish RRF (always on) from cross-encoder rerank (opt-in via flag). `docs/journal-spec.md` v2 backlog item struck through with implementation summary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-29T21:53:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0e9c28f-fbea-4187-a5aa-9bd0c4f01e73

📥 Commits

Reviewing files that changed from the base of the PR and between 930a244 and c3a1d48.

📒 Files selected for processing (1)

crates/tempyr-journal-index/src/search.rs

📝 Walkthrough

Walkthrough

Adds an opt-in cross‑encoder reranking stage to journal search (flag --rerank / rerank: true) that reranks the top 50 RRF candidates using a BGE reranker (BAAI/bge-reranker-base), threads the flag through CLI and MCP, exposes a rerank module and safe singleton loader, and preserves explain fields with non-fatal fallbacks.

Changes

Cohort / File(s)	Summary
Documentation `AGENTS.md`, `CLAUDE.md`, `docs/journal-spec.md`	Docs revised to specify retrieval as BM25 + vec0 RRF with recency/kind boosts and to document the optional `--rerank`/`rerank: true` BGE cross‑encoder stage (top 50) and fallback behavior.
CLI Integration `crates/tempyr-cli/src/commands/journal_cmd.rs`	Adds `rerank: bool` to `SearchArgs`, threads `--rerank` into `SearchOptions`, and updates `--explain` output to annotate reranked hits.
MCP Server Integration `crates/tempyr-mcp/src/handler.rs`	Adds `pub rerank: Option<bool>` to `JournalSearchParams` and propagates it into `SearchOptions` (defaults false when absent).
Library Exports `crates/tempyr-journal-index/src/lib.rs`	Exports new `rerank` module and re-exports `Reranker`, `try_shared_reranker`, and `warn_query_rerank_failure_once`.
Reranking Module `crates/tempyr-journal-index/src/rerank.rs`	New module wrapping `fastembed::TextRerank` with `RERANK_MODEL_NAME` (`BAAI/bge-reranker-base`), `Reranker` type, thread-safe singleton init (OnceLock + mutex), load/infer failure backoff, warn-once behavior, and ignored integration tests.
Search Pipeline Integration `crates/tempyr-journal-index/src/search.rs`	Adds `SearchOptions.rerank`, `RERANK_CANDIDATE_COUNT = 50`; when enabled, scores head candidates with reranker, replaces head scores and re-sorts by reranker score, populates `ScoreBreakdown.rerank`/`reranked` and sets `total` to rerank score for explain, preserves tail ordering, and falls back on load/infer failure.
Embedder Init Serialization `crates/tempyr-journal-index/src/embed.rs`	Prevents concurrent embedder cold-starts with an `INIT: Mutex<()>`, double-checked locking around `EMB.get()`, and adds retry backoff (`LAST_FAIL_MS`, `EMBED_RETRY_BACKOFF_MS`) plus helper `unix_epoch_ms()`.

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI / MCP
    participant Search as Search Pipeline
    participant Reranker as Reranker (BGE)
    participant Results as Results

    User->>CLI: journal_search(query, --rerank)
    CLI->>Search: SearchOptions { rerank: true, ... }
    Search->>Search: BM25 + vec0 RRF (+ recency/kind boost)
    Search->>Search: Select top 50 candidates
    Search->>Reranker: rerank(query, documents...)
    Reranker->>Reranker: Load model if cold (OnceLock + mutex)
    Reranker-->>Search: Vec<f32] rerank scores
    Search->>Search: Replace head ordering by rerank scores
    Search->>Search: Update SearchHit.score & ScoreBreakdown.total
    Search-->>Results: Final ranked results
    Results-->>User: Return hits

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(journal): hybrid retrieval with fastembed + sqlite-vec + RRF (Phase 3 slice 3b2/3) #28: Extends the hybrid BM25 + vector RRF retrieval pipeline; touches the same search options and score breakdown fields.
feat(journal): FTS5 + BM25 search with recency + kind boosts (Phase 3 slice 3b1/3) #27: Earlier journal search pipeline changes (FTS5/BM25) that this PR builds upon and extends with rerank plumbing.
feat(journal): SQLite index foundation + journal_get (Phase 3 slice 3a/3) #26: Introduced the tempyr-journal-index crate exports and foundations that this PR extends with a rerank module and re-exports.

Poem

🐇
I sniffed the query in the night,
Fifty carrots lined in light,
I hop, I score, I softly reframe,
If models nap, the old order stays the same.
Relevance finds its cozy home tonight.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding cross-encoder reranking to journal search as an opt-in feature, referenced as a v2 backlog item.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

_{Review rate limit: 4/5 reviews remaining, refill in 12 minutes.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 694-701: The plain-text --explain output omits the rerank
component so b.total (which becomes the cross-encoder score when rerank is
enabled) no longer matches the printed component breakdown; update the non-JSON
explain branch that assembles/prints the components list (the code that prints
"bm25/vector/rrf/recency/kind") to include "rerank" when the rerank flag (pub
rerank: bool) is set and ensure any textual sums or labels reflect that b.total
is the cross-encoder score; locate the block that checks explain/json mode and
the variable b.total and add logic to append "rerank" into the printed
components string (and adjust wording if it distinguishes total vs components)
so the human-readable explain output matches the actual scoring when rerank is
enabled.

In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 112-137: The cold-start path needs to be serialized to avoid
duplicate model loads: add a static init mutex (e.g. static INIT:
std::sync::Mutex<()> = std::sync::Mutex::new(())) and, inside
try_shared_reranker(), acquire the INIT lock before calling Reranker::new();
immediately after acquiring the lock check RR.get() again and return if Some,
otherwise call Reranker::new(), set RR with the created Reranker, and then drop
the lock; preserve the existing WARNED behavior on Err and keep using
RR.get()/RR.set() so other threads still see the stored instance. Use the
existing symbols try_shared_reranker, RR, WARNED, and Reranker::new to locate
where to add the mutex and the second RR.get() check.

In `@docs/journal-spec.md`:
- Around line 384-385: The docs currently mark cross-encoder reranking as a
backlog item while code/CLI already exposes an opt-in --rerank flag for tempyr
journal search / journal_search using the BGE-Reranker-base over top-50 RRF
candidates; update the main retrieval spec (Section 9 "Slice 3b — Hybrid
retrieval") to fold this opt-in rerank stage into the canonical pipeline
description: describe that after initial retrieval and RRF fusion the system
will optionally run the BGE-Reranker-base cross-encoder over the top 50 RRF
candidates when --rerank is provided, note fallback behavior (use RRF order on
model-load or inference failure), and update any diagrams/examples to match the
new flow so the spec is internally consistent with the CLI and implementation.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17aaeb9d-e8b5-4f73-bb35-fbc34d6afd75

📥 Commits

Reviewing files that changed from the base of the PR and between fceea24 and 36c3fc0.

📒 Files selected for processing (8)

AGENTS.md
CLAUDE.md
crates/tempyr-cli/src/commands/journal_cmd.rs
crates/tempyr-journal-index/src/lib.rs
crates/tempyr-journal-index/src/rerank.rs
crates/tempyr-journal-index/src/search.rs
crates/tempyr-mcp/src/handler.rs
docs/journal-spec.md

Three findings on PR #33: 1. The plain-text `--explain` printer wasn't surfacing the rerank component, so when `--rerank` was on the printed bm25/vector/rrf/ recency/kind sum no longer matched the displayed total (which had become the cross-encoder score). Branch the print on `args.rerank`: show "score: N = rerank (...components, informational only)" so the reader sees what's driving the order. 2. `try_shared_reranker` and `try_shared_embedder` had a cold-start race — under concurrent first calls (two MCP `journal_search` requests racing) the `OnceLock` alone let both threads start independent ~280 MB / ~80 MB downloads + ONNX warmups, of which only one's value would actually be stored. Add a static `Mutex<()>` barrier with double-checked locking around the load: hot path (slot already populated) still skips the lock; cold path acquires it, re-checks the slot, then runs the single load. Applied to both functions for symmetry — same pattern, same hazard. 3. The cross-encoder rerank stage was sitting in the v2 backlog section of `docs/journal-spec.md` even though the implementation matched the rest of the slice 3b retrieval pipeline. Moved the description into the Slice 3b — Hybrid retrieval pipeline list as the optional opt-in step it is, and dropped the backlog entry (replaced with an HTML comment pointer for changelog continuity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 792-806: The CLI should not assume reranking succeeded based on
the requested flag; change the print branch to inspect an explicit result-state
flag (e.g., a boolean like reranked on ScoreBreakdown or SearchHit) instead of
using args.rerank. Locate the printing code that references args.rerank and
switch it to check the result object (e.g., b.reranked or hit.reranked), and if
that flag does not exist add and populate a reranked field in
ScoreBreakdown/SearchHit in the search logic so the CLI can accurately display
"rerank" only when reranking actually occurred.

In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 141-156: The code should short-circuit repeated cold-start
attempts by caching a recent failure timestamp: introduce an atomic last-failure
(e.g., an AtomicU64 or AtomicI64 like LAST_FAIL_MS) and a fixed backoff duration
(e.g., BACKOFF_MS). In the entry path where you call Reranker::new() (same spot
that references RR and WARNED), first read LAST_FAIL_MS and if now < last_fail +
BACKOFF_MS return None immediately; on Err(err) set LAST_FAIL_MS = now and then
run the existing WARNED swap/log and return None. Ensure reads/writes to
LAST_FAIL_MS use ordering appropriate for visibility (Relaxed is fine here) so
queued callers see the backoff and avoid repeated model loads.

In `@docs/journal-spec.md`:
- Around line 359-363: The doc currently claims Slice 3b and Phase 4 features
(items like the optional BGE-Reranker rerank flow, deduplication by (summary,
kind), token-budget greedy fill, the new MCP tool journal_search with
rerank/explain, and CLIs such as tempyr journal search/show/sessions/tail) are
implemented but the top-level status metadata and the phase table still show
Phases 3/4 as pending; update the top-level status metadata and the phase table
entries to mark Slice 3b and Phase 4 as completed (or change the overall
document-level status to “stale -> updated/implemented”), ensuring the Phase 4
bullets and Slice 3b text remain consistent with the updated status so readers
don’t see conflicting signals.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f662810-cefc-4454-b882-1b813ab1b341

📥 Commits

Reviewing files that changed from the base of the PR and between 36c3fc0 and 5c1a0e9.

📒 Files selected for processing (4)

crates/tempyr-cli/src/commands/journal_cmd.rs
crates/tempyr-journal-index/src/embed.rs
crates/tempyr-journal-index/src/rerank.rs
docs/journal-spec.md

Three findings on PR #33: 1. The CLI `--explain` print branched on `args.rerank`, which is the *requested* state, not the *result* state — when rerank was asked for but the model failed to load and the search fell back to RRF, the printed line still claimed "= rerank". Add `reranked: bool` to `ScoreBreakdown`, set only by `apply_rerank_scores` when scores actually flow into the hit; CLI now branches on `b.reranked`. 2. After a failed cold-start load, `try_shared_*` would retry on every subsequent call. In a hard "no model" environment (no network, ONNX runtime missing) that means every search waits on the slow timeout serialized behind the INIT mutex. Add an atomic last-failure timestamp + 5-second backoff: cheap atomic read on the hot path skips the load if a recent attempt failed; on success, the stamp clears so transient failures still recover quickly. Applied to both `try_shared_reranker` and `try_shared_embedder` for symmetry — same hazard, same fix. 3. The spec's top-level status line claimed "Phases 1–2 implemented; 3 and 4 pending" and the phase table marked both as "📋 Planned" even though Phase 3 (PRs #26, #27, #28) and Phase 4 (PRs #29–32) shipped weeks ago. Update both to ✅ Shipped with PR refs and note the cross-encoder rerank as a post-Phase-4 addition. Status summary now accurately reflects merged work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/tempyr-journal-index/src/search.rs (1)

208-212: ⚠️ Potential issue | 🟠 Major

--rerank never reaches 50 candidates in the default case.

With the default --limit 10, Line 211 still pulls only max(limit * 4, 40) rows, so the reranker tops out at 40 candidates even though RERANK_CANDIDATE_COUNT is 50. That silently reduces recall in the common path this feature is meant to improve.

Proposed fix

-    let pull_usize = opts.limit.max(1).saturating_mul(4).max(40);
+    let pull_usize = opts
+        .limit
+        .max(1)
+        .saturating_mul(4)
+        .max(40)
+        .max(if opts.rerank {
+            RERANK_CANDIDATE_COUNT
+        } else {
+            0
+        });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@crates/tempyr-journal-index/src/search.rs` around lines 208 - 212, The
current pull size calculation (pull_usize/pull_i64) uses
opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit 10
yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT
(50); update the pull_usize computation to ensure it is at least
RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and
RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 =
i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to
the configured RERANK_CANDIDATE_COUNT.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 382-386: The current rerank branch uses opts.rerank alone and may
rerank BM25-only fallbacks; change the condition so maybe_apply_rerank is only
called when both opts.rerank is true and vector-based retrieval is available
(e.g., hybrid_mode is true or opts.query_vector.is_some()); update the branch
around deduped to check that hybrid_mode || opts.query_vector.is_some() (in
addition to opts.rerank) before invoking maybe_apply_rerank to ensure pure BM25
results remain untouched.

In `@docs/journal-spec.md`:
- Around line 7-8: The document header now claims Phases 1–4 are implemented but
later text still uses future tense (e.g., phrases like "no SQLite (yet — Phase
3)", "Phase 4 will add ...", "worth answering before implementing 3a"); update
those passages in docs/journal-spec.md to reflect current status by either
converting future-tense statements into past-tense implementation notes or
explicitly marking them as historical/archival (e.g., "Historical: Phase 3 added
SQLite" or "Historical note: Phase 4 introduced X"), and ensure occurrences of
"Phase 3", "Phase 4", and "3a" (search for those strings) are reconciled so the
doc consistently reports shipped features rather than future work.

---

Outside diff comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 208-212: The current pull size calculation (pull_usize/pull_i64)
uses opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit
10 yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT
(50); update the pull_usize computation to ensure it is at least
RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and
RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 =
i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to
the configured RERANK_CANDIDATE_COUNT.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43f48fc6-da93-4c72-84d8-53dab0de7eb6

📥 Commits

Reviewing files that changed from the base of the PR and between 5c1a0e9 and 39f0a3f.

📒 Files selected for processing (5)

crates/tempyr-cli/src/commands/journal_cmd.rs
crates/tempyr-journal-index/src/embed.rs
crates/tempyr-journal-index/src/rerank.rs
crates/tempyr-journal-index/src/search.rs
docs/journal-spec.md

coderabbitai · 2026-04-29T23:02:40Z

+    let deduped = if opts.rerank {
+        maybe_apply_rerank(deduped, &opts.query)
+    } else {
+        deduped
+    };


⚠️ Potential issue | 🟠 Major

Don't rerank BM25-only fallbacks.

This branch keys only off opts.rerank. If query embedding is unavailable, the function still pays reranker load/inference cost on BM25-only candidates, which breaks the documented "rerank the RRF head" contract and turns embedder fallback into a slower second-model path. Gate this on hybrid_mode (or opts.query_vector.is_some()) and leave pure BM25 results untouched when vector retrieval is unavailable.

Proposed fix

- let deduped = if opts.rerank { - maybe_apply_rerank(deduped, &opts.query) + let deduped = if opts.rerank && hybrid_mode { + maybe_apply_rerank(deduped, trimmed) } else { deduped };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/tempyr-journal-index/src/search.rs` around lines 382 - 386, The current rerank branch uses opts.rerank alone and may rerank BM25-only fallbacks; change the condition so maybe_apply_rerank is only called when both opts.rerank is true and vector-based retrieval is available (e.g., hybrid_mode is true or opts.query_vector.is_some()); update the branch around deduped to check that hybrid_mode || opts.query_vector.is_some() (in addition to opts.rerank) before invoking maybe_apply_rerank to ensure pure BM25 results remain untouched.

Two of three findings fixed; pushing back on the third. 1. **pull_usize too small for the reranker (valid).** Default `--limit 10` produced `pull_usize = 40`, less than `RERANK_CANDIDATE_COUNT = 50`. The reranker would never see the full 50-candidate pool it's sized for. Bump pull_usize to at least `RERANK_CANDIDATE_COUNT` when `opts.rerank` is on; non-rerank queries keep the existing 4× heuristic so we don't pull more than we need. 2. **Future-tense passages in the spec (valid).** Reconciled the doc with the merged status: `Phase 3 builds` → `Phase 3 built`, both phase headers gain "(shipped — PRs ...)" markers, "no SQLite (yet — Phase 3)" rewritten to describe the derived index that's now live, the "Phase 4 will add a journal report" line points at the v2 backlog instead, and "Open Questions for Phase 3" is now labelled historical with each item's actual resolution recorded inline. Bottom-of-doc PRs list now includes #26-32. 3. **Skip rerank when query_vector is None (declined).** The cross-encoder is a relevance scorer that takes (query, document) pairs — it works on any candidate pool, BM25-only or hybrid. The user opting in via `--rerank` explicitly asks for that scorer, and even a pure-BM25 candidate list benefits from the cross-encoder's semantic ordering. Reranker availability is also independent of embedder availability (different model files, different load paths), so "embedder unavailable" doesn't imply "skip the rerank attempt". Posting an explanation on the PR rather than applying the change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cleak · 2026-04-29T23:08:27Z

Pushing back on the "only rerank in hybrid mode" finding — I think the suggested change would be a regression.

The cross-encoder is a relevance scorer that takes (query, document) pairs and scores them directly. It doesn't depend on a vector being available; it works equally well on a BM25-only candidate set. In fact, that's where it earns its keep — BM25 produces lexically-matching hits, and the cross-encoder is exactly the thing that re-ranks those by semantic relevance.

Skipping rerank when `query_vector` is None would also be surprising UX: the user explicitly opted into `--rerank` and silently getting back unreranked BM25 results would be a footgun. And reranker availability isn't tied to embedder availability — they're separate model files with separate load paths, so "embedder unavailable → skip rerank attempt" isn't a sound implication. The reranker's own load path already handles the no-model environment cheaply (warn-once + 5s backoff after the first failure).

If you have a case in mind where reranking BM25-only candidates produces worse results than not reranking, I'd love to see it — happy to add a guard if there's a real failure mode. Otherwise leaving the rerank pass independent of the candidate-source signal.

The other two findings (`pull_usize` under-feeding the reranker, future-tense doc passages) are fixed in `930a244`.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 413-421: Currently the code calls crate::try_shared_reranker()
before checking whether there are any hits, causing the reranker to load even
when n == 0; change the logic so you compute n =
hits.len().min(RERANK_CANDIDATE_COUNT) and immediately return hits if n == 0,
and only after that call crate::try_shared_reranker() and bind Some(reranker);
keep the existing comment about the warn-once behavior and ensure the early
return happens before any attempt to load the reranker (references: hits,
RERANK_CANDIDATE_COUNT, try_shared_reranker, reranker).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 461ef8d2-3fe5-41b8-b42b-cc155683ea18

📥 Commits

Reviewing files that changed from the base of the PR and between 39f0a3f and 930a244.

📒 Files selected for processing (2)

crates/tempyr-journal-index/src/search.rs
docs/journal-spec.md

`maybe_apply_rerank` was calling `try_shared_reranker()` before checking whether `hits` had anything to score. A zero-result search with `--rerank` would still trigger the ~280 MB model download on first call (and pay the slow timeout in a no-network environment even after the load failure backoff kicks in). Swap the order: compute `n = hits.len().min(RERANK_CANDIDATE_COUNT)`, early-return on `n == 0`, *then* try to grab the reranker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread crates/tempyr-cli/src/commands/journal_cmd.rs

Comment thread crates/tempyr-journal-index/src/rerank.rs

Comment thread docs/journal-spec.md Outdated

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread crates/tempyr-cli/src/commands/journal_cmd.rs Outdated

Comment thread crates/tempyr-journal-index/src/rerank.rs

Comment thread docs/journal-spec.md

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread crates/tempyr-journal-index/src/search.rs Outdated

cleak merged commit 84bccc8 into master Apr 29, 2026
1 check passed

cleak deleted the claude/journal-cross-encoder-rerank branch April 29, 2026 23:46

Conversation

cleak commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Implementation

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cleak commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cleak commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading