
feat: cross-encoder reranking on journal search (v2 backlog #2)#33

Merged
cleak merged 5 commits into master from claude/journal-cross-encoder-rerank on Apr 29, 2026


@cleak cleak commented Apr 29, 2026

Summary

First v2-backlog item from `docs/journal-spec.md`. Adds an opt-in cross-encoder rerank pass on top of the existing BM25 + vec0 RRF hybrid retrieval.

  • CLI: `tempyr journal search --rerank`
  • MCP: `journal_search` tool gains `rerank: Option<bool>`
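For readers unfamiliar with the fusion step, here is a minimal sketch of Reciprocal Rank Fusion, assuming each retriever (BM25 and vec0) returns a best-first list of integer row IDs. The PR does not show its fusion code: the `rrf_fuse` name, the `u64` IDs, and the constant `k = 60` (a value commonly used with RRF) are assumptions, and the recency/kind boosts are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical Reciprocal Rank Fusion over any number of ranked ID lists.
/// Each list is best-first; a document at 0-based position `r` contributes
/// `1 / (k + r + 1)` to its fused score. Recency/kind boosts are not modelled.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in rankings {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; tie-break on ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With `bm25 = [1, 2, 3]` and `vec = [2, 3, 4]`, the fused order is `[2, 3, 1, 4]`: documents both retrievers agree on rise to the top, which is the pool the cross-encoder then reorders.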

When enabled, the top 50 RRF candidates get re-scored via the BGE-Reranker-base cross-encoder (fastembed). Scores from `(query, summary+detail)` pairs replace the sort key — recency and kind boosts are still computed (and shown in `--explain`) but no longer drive ordering. Bigger model than the bi-encoder (~280 MB on first download) and ~200 ms inference per query, so it's opt-in rather than always-on.

Failure mode mirrors the bi-encoder's: model load or inference errors fall back transparently to the unreranked RRF order with a single warn-once stderr line.
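The warn-once behaviour can be sketched with a single `AtomicBool`: `swap(true)` returns the previous value, so exactly one caller observes `false` and prints. The function name and message text below are hypothetical; the signature of the PR's `warn_query_rerank_failure_once` is not shown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Process-wide flag: false until the first warning has been printed.
static WARNED: AtomicBool = AtomicBool::new(false);

/// Hypothetical warn-once helper. Prints the fallback warning to stderr at
/// most once per process and returns true only for the call that printed it.
pub fn warn_rerank_failure_once(err: &str) -> bool {
    // swap is an atomic read-modify-write, so only one caller sees `false`.
    if !WARNED.swap(true, Ordering::Relaxed) {
        eprintln!("warning: cross-encoder rerank unavailable ({err}); using RRF order");
        true
    } else {
        false
    }
}
```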

Implementation

| File | Change |
|---|---|
| `tempyr-journal-index/src/rerank.rs` (new) | `Reranker` wrapper, `try_shared_reranker()` singleton, `warn_query_rerank_failure_once` |
| `tempyr-journal-index/src/search.rs` | `SearchOptions::rerank`, `ScoreBreakdown::rerank`, `maybe_apply_rerank`, `apply_rerank_scores` (pure sort) |
| `tempyr-cli/src/commands/journal_cmd.rs` | `SearchArgs::rerank` flag |
| `tempyr-mcp/src/handler.rs` | `JournalSearchParams::rerank` field |
| `CLAUDE.md` / `AGENTS.md` | Distinguish always-on RRF from opt-in cross-encoder |
| `docs/journal-spec.md` | V2 backlog item struck through |

Test plan

  • `cargo test --workspace` (44 journal-index tests pass; 4 new pure unit tests on `apply_rerank_scores`, 3 new `#[ignore]`-marked model-backed tests)
  • `cargo clippy --workspace --all-targets -- -D warnings`
  • `cargo fmt --check`
  • Manual smoke: `tempyr journal search "" --rerank` in a real repo, confirm the model downloads and ordering shifts vs. without `--rerank`

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Optional --rerank for journal_search: opt-in cross-encoder rerank of the top 50 candidates that can reorder results and update reported scores; off by default and safely falls back if unavailable.
    • Explain output now indicates when results were reranked and shows component values for inspection.
  • Documentation

    • CLI, spec, and help updated to describe BM25 + vec0 RRF with recency/kind boosts, rerank semantics, model note, and fallback behavior.
  • Bug Fixes

    • Rerank failures are non-fatal and emit a one-time warning.
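A minimal sketch of the plain-text explain line, assuming a `ScoreBreakdown` with the fields named in this PR (`bm25`, `vector`, `rrf`, `recency`, `kind`, `rerank`, `reranked`, `total`). The `f64` types, the `explain_line` name, and the exact output format are assumptions loosely modelled on the `score: N = rerank (...components, informational only)` wording used later in the thread. It branches on the result flag `reranked` rather than on whether `--rerank` was requested.

```rust
/// Hypothetical mirror of the PR's `ScoreBreakdown`; field names follow the
/// PR text, but the types and layout are assumptions.
pub struct ScoreBreakdown {
    pub bm25: f64,
    pub vector: f64,
    pub rrf: f64,
    pub recency: f64,
    pub kind: f64,
    pub rerank: f64,
    pub reranked: bool,
    pub total: f64,
}

/// Render one human-readable explain line for a hit.
pub fn explain_line(b: &ScoreBreakdown) -> String {
    let parts = format!(
        "bm25={:.3} vector={:.3} rrf={:.3} recency={:.3} kind={:.3}",
        b.bm25, b.vector, b.rrf, b.recency, b.kind
    );
    if b.reranked {
        // The cross-encoder score drove the order; components are shown for context only.
        format!("score: {:.3} = rerank ({}, informational only)", b.total, parts)
    } else {
        format!("score: {:.3} ({})", b.total, parts)
    }
}
```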

Opt-in `--rerank` flag on `tempyr journal search` and a `rerank: bool`
parameter on the MCP `journal_search` tool. When set, the search
takes the top 50 RRF candidates and re-scores them via the BGE-
Reranker-base cross-encoder (~280 MB ONNX model, fastembed). The
reranker scores `(query, summary + detail)` pairs directly, so
ordering is driven by relevance instead of recency/kind boosts on
close calls.

Implementation:
- New `tempyr_journal_index::rerank` module wrapping
  `fastembed::TextRerank`. Same shape as the existing `Embedder`:
  process-wide singleton via `try_shared_reranker()`, retry on
  transient load failure, warn-once on persistent failures so a
  hard "no model" environment doesn't spam stderr.
- `SearchOptions` gains `rerank: bool`; default false. When true,
  the search pipeline calls `maybe_apply_rerank` after RRF + dedup
  but before the token-budget pass.
- `ScoreBreakdown` gains a `rerank: f64` field that's populated in
  `--explain` mode when the cross-encoder ran. The pre-rerank
  components (bm25 / vector / rrf / recency / kind) stay populated
  for inspection; they no longer drive the sort.
- Pure sort logic split into `apply_rerank_scores` for unit testing
  without needing the model. NaN / non-finite scores are demoted to
  the bottom of the head rather than corrupting ordering.
- CLI `--rerank` flag with usage doc explaining the model-download
  cost. MCP `JournalSearchParams.rerank: Option<bool>`.
- Failure mode mirrors the bi-encoder's: model load or inference
  errors fall back transparently to the unreranked RRF order with
  a single warn-once stderr line.
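The pure head/tail sort described above can be sketched as follows. `Hit` is a hypothetical stand-in for the real search-hit type, and writing the score (and setting `reranked`) only for finite values is an assumption about how the NaN-demotion rule interacts with the result flag.

```rust
/// Hypothetical, trimmed-down search hit.
#[derive(Debug, Clone)]
pub struct Hit {
    pub id: u64,
    pub score: f64,
    pub reranked: bool,
}

/// Re-sort the first `scores.len()` hits (the head) by cross-encoder score,
/// descending; hits past the head keep their incoming order. Non-finite
/// scores (NaN, ±inf) sort below every finite score in the head.
pub fn apply_rerank_scores(mut hits: Vec<Hit>, scores: &[f32]) -> Vec<Hit> {
    let n = scores.len().min(hits.len());
    let tail = hits.split_off(n); // everything beyond the head, order untouched
    let key = |s: f32| if s.is_finite() { s } else { f32::NEG_INFINITY };
    let mut head: Vec<(Hit, f32)> = hits.into_iter().zip(scores.iter().copied()).collect();
    // Stable sort: equal keys keep their incoming (RRF) order.
    head.sort_by(|a, b| key(b.1).total_cmp(&key(a.1)));
    let mut out: Vec<Hit> = head
        .into_iter()
        .map(|(mut hit, s)| {
            if s.is_finite() {
                hit.score = f64::from(s);
                hit.reranked = true; // only when a usable score actually flowed in
            }
            hit
        })
        .collect();
    out.extend(tail);
    out
}
```

Keeping this free of the model is what lets the sort, tail-preservation, and NaN cases be unit-tested without the ~280 MB download.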

Tests: 4 new pure unit tests on `apply_rerank_scores` (sort by
score desc, tail preserved, NaN demoted, breakdown written), plus
3 `#[ignore]`-marked model-backed tests on the rerank module.
Total: 44 journal-index tests passing, 11 ignored.

Docs: CLAUDE.md / AGENTS.md `journal_search` line updated to
distinguish RRF (always on) from cross-encoder rerank (opt-in via
flag). `docs/journal-spec.md` v2 backlog item struck through with
implementation summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f0e9c28f-fbea-4187-a5aa-9bd0c4f01e73

📥 Commits

Reviewing files that changed from the base of the PR and between 930a244 and c3a1d48.

📒 Files selected for processing (1)
  • crates/tempyr-journal-index/src/search.rs

📝 Walkthrough


Adds an opt-in cross‑encoder reranking stage to journal search (flag --rerank / rerank: true) that reranks the top 50 RRF candidates using a BGE reranker (BAAI/bge-reranker-base), threads the flag through CLI and MCP, exposes a rerank module and safe singleton loader, and preserves explain fields with non-fatal fallbacks.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation | AGENTS.md, CLAUDE.md, docs/journal-spec.md | Docs revised to specify retrieval as BM25 + vec0 RRF with recency/kind boosts and to document the optional --rerank/rerank: true BGE cross‑encoder stage (top 50) and fallback behavior. |
| CLI Integration | crates/tempyr-cli/src/commands/journal_cmd.rs | Adds rerank: bool to SearchArgs, threads --rerank into SearchOptions, and updates --explain output to annotate reranked hits. |
| MCP Server Integration | crates/tempyr-mcp/src/handler.rs | Adds pub rerank: Option<bool> to JournalSearchParams and propagates it into SearchOptions (defaults false when absent). |
| Library Exports | crates/tempyr-journal-index/src/lib.rs | Exports new rerank module and re-exports Reranker, try_shared_reranker, and warn_query_rerank_failure_once. |
| Reranking Module | crates/tempyr-journal-index/src/rerank.rs | New module wrapping fastembed::TextRerank with RERANK_MODEL_NAME (BAAI/bge-reranker-base), Reranker type, thread-safe singleton init (OnceLock + mutex), load/infer failure backoff, warn-once behavior, and ignored integration tests. |
| Search Pipeline Integration | crates/tempyr-journal-index/src/search.rs | Adds SearchOptions.rerank, RERANK_CANDIDATE_COUNT = 50; when enabled, scores head candidates with reranker, replaces head scores and re-sorts by reranker score, populates ScoreBreakdown.rerank/reranked and sets total to rerank score for explain, preserves tail ordering, and falls back on load/infer failure. |
| Embedder Init Serialization | crates/tempyr-journal-index/src/embed.rs | Prevents concurrent embedder cold-starts with an INIT: Mutex<()>, double-checked locking around EMB.get(), and adds retry backoff (LAST_FAIL_MS, EMBED_RETRY_BACKOFF_MS) plus helper unix_epoch_ms(). |

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI / MCP
    participant Search as Search Pipeline
    participant Reranker as Reranker (BGE)
    participant Results as Results

    User->>CLI: journal_search(query, --rerank)
    CLI->>Search: SearchOptions { rerank: true, ... }
    Search->>Search: BM25 + vec0 RRF (+ recency/kind boost)
    Search->>Search: Select top 50 candidates
    Search->>Reranker: rerank(query, documents...)
    Reranker->>Reranker: Load model if cold (OnceLock + mutex)
    Reranker-->>Search: Vec<f32> rerank scores
    Search->>Search: Replace head ordering by rerank scores
    Search->>Search: Update SearchHit.score & ScoreBreakdown.total
    Search-->>Results: Final ranked results
    Results-->>User: Return hits

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐇
I sniffed the query in the night,
Fifty carrots lined in light,
I hop, I score, I softly reframe,
If models nap, the old order stays the same.
Relevance finds its cozy home tonight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding cross-encoder reranking to journal search as an opt-in feature, referenced as a v2 backlog item. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 694-701: The plain-text --explain output omits the rerank
component so b.total (which becomes the cross-encoder score when rerank is
enabled) no longer matches the printed component breakdown; update the non-JSON
explain branch that assembles/prints the components list (the code that prints
"bm25/vector/rrf/recency/kind") to include "rerank" when the rerank flag (pub
rerank: bool) is set and ensure any textual sums or labels reflect that b.total
is the cross-encoder score; locate the block that checks explain/json mode and
the variable b.total and add logic to append "rerank" into the printed
components string (and adjust wording if it distinguishes total vs components)
so the human-readable explain output matches the actual scoring when rerank is
enabled.

In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 112-137: The cold-start path needs to be serialized to avoid
duplicate model loads: add a static init mutex (e.g. static INIT:
std::sync::Mutex<()> = std::sync::Mutex::new(())) and, inside
try_shared_reranker(), acquire the INIT lock before calling Reranker::new();
immediately after acquiring the lock check RR.get() again and return if Some,
otherwise call Reranker::new(), set RR with the created Reranker, and then drop
the lock; preserve the existing WARNED behavior on Err and keep using
RR.get()/RR.set() so other threads still see the stored instance. Use the
existing symbols try_shared_reranker, RR, WARNED, and Reranker::new to locate
where to add the mutex and the second RR.get() check.

In `@docs/journal-spec.md`:
- Around line 384-385: The docs currently mark cross-encoder reranking as a
backlog item while code/CLI already exposes an opt-in --rerank flag for tempyr
journal search / journal_search using the BGE-Reranker-base over top-50 RRF
candidates; update the main retrieval spec (Section 9 "Slice 3b — Hybrid
retrieval") to fold this opt-in rerank stage into the canonical pipeline
description: describe that after initial retrieval and RRF fusion the system
will optionally run the BGE-Reranker-base cross-encoder over the top 50 RRF
candidates when --rerank is provided, note fallback behavior (use RRF order on
model-load or inference failure), and update any diagrams/examples to match the
new flow so the spec is internally consistent with the CLI and implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17aaeb9d-e8b5-4f73-bb35-fbc34d6afd75

📥 Commits

Reviewing files that changed from the base of the PR and between fceea24 and 36c3fc0.

📒 Files selected for processing (8)
  • AGENTS.md
  • CLAUDE.md
  • crates/tempyr-cli/src/commands/journal_cmd.rs
  • crates/tempyr-journal-index/src/lib.rs
  • crates/tempyr-journal-index/src/rerank.rs
  • crates/tempyr-journal-index/src/search.rs
  • crates/tempyr-mcp/src/handler.rs
  • docs/journal-spec.md

Three findings on PR #33:

1. The plain-text `--explain` printer wasn't surfacing the rerank
   component, so when `--rerank` was on the printed bm25/vector/rrf/
   recency/kind sum no longer matched the displayed total (which had
   become the cross-encoder score). Branch the print on `args.rerank`:
   show "score: N = rerank (...components, informational only)" so
   the reader sees what's driving the order.

2. `try_shared_reranker` and `try_shared_embedder` had a cold-start
   race — under concurrent first calls (two MCP `journal_search`
   requests racing) the `OnceLock` alone let both threads start
   independent ~280 MB / ~80 MB downloads + ONNX warmups, of which
   only one's value would actually be stored. Add a static `Mutex<()>`
   barrier with double-checked locking around the load: hot path
   (slot already populated) still skips the lock; cold path acquires
   it, re-checks the slot, then runs the single load. Applied to
   both functions for symmetry — same pattern, same hazard.

3. The cross-encoder rerank stage was sitting in the v2 backlog
   section of `docs/journal-spec.md` even though the implementation
   matched the rest of the slice 3b retrieval pipeline. Moved the
   description into the Slice 3b — Hybrid retrieval pipeline list
   as the optional opt-in step it is, and dropped the backlog entry
   (replaced with an HTML comment pointer for changelog continuity).
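A minimal sketch of the double-checked cold start from item 2, with a cheap stand-in for the ~280 MB model load. `Model`, `load_model`, `try_shared_model`, and the `LOADS` counter are hypothetical; only the `OnceLock` + `Mutex<()>` shape comes from the commit message.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for the expensive model.
pub struct Model {
    pub name: &'static str,
}

static SLOT: OnceLock<Model> = OnceLock::new();
static INIT: Mutex<()> = Mutex::new(()); // serializes cold starts; guards no data
static LOADS: AtomicUsize = AtomicUsize::new(0); // counts real loads (for the demo)

fn load_model() -> Result<Model, String> {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Ok(Model { name: "stand-in" })
}

pub fn try_shared_model() -> Option<&'static Model> {
    // Hot path: already loaded, no lock taken.
    if let Some(m) = SLOT.get() {
        return Some(m);
    }
    // Cold path: one loader at a time. A poisoned lock is harmless here.
    let _guard = INIT.lock().unwrap_or_else(|e| e.into_inner());
    // Re-check: another thread may have finished loading while we waited.
    if let Some(m) = SLOT.get() {
        return Some(m);
    }
    match load_model() {
        Ok(m) => {
            let _ = SLOT.set(m);
            SLOT.get()
        }
        Err(_) => None,
    }
}
```

If eight threads call `try_shared_model()` at once, all of them should get `Some`, and `LOADS` should end at 1: the losers of the race block on `INIT` and then find the slot populated.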

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 792-806: The CLI should not assume reranking succeeded based on
the requested flag; change the print branch to inspect an explicit result-state
flag (e.g., a boolean like reranked on ScoreBreakdown or SearchHit) instead of
using args.rerank. Locate the printing code that references args.rerank and
switch it to check the result object (e.g., b.reranked or hit.reranked), and if
that flag does not exist add and populate a reranked field in
ScoreBreakdown/SearchHit in the search logic so the CLI can accurately display
"rerank" only when reranking actually occurred.

In `@crates/tempyr-journal-index/src/rerank.rs`:
- Around line 141-156: The code should short-circuit repeated cold-start
attempts by caching a recent failure timestamp: introduce an atomic last-failure
(e.g., an AtomicU64 or AtomicI64 like LAST_FAIL_MS) and a fixed backoff duration
(e.g., BACKOFF_MS). In the entry path where you call Reranker::new() (same spot
that references RR and WARNED), first read LAST_FAIL_MS and if now < last_fail +
BACKOFF_MS return None immediately; on Err(err) set LAST_FAIL_MS = now and then
run the existing WARNED swap/log and return None. Ensure reads/writes to
LAST_FAIL_MS use ordering appropriate for visibility (Relaxed is fine here) so
queued callers see the backoff and avoid repeated model loads.

In `@docs/journal-spec.md`:
- Around line 359-363: The doc currently claims Slice 3b and Phase 4 features
(items like the optional BGE-Reranker rerank flow, deduplication by (summary,
kind), token-budget greedy fill, the new MCP tool journal_search with
rerank/explain, and CLIs such as tempyr journal search/show/sessions/tail) are
implemented but the top-level status metadata and the phase table still show
Phases 3/4 as pending; update the top-level status metadata and the phase table
entries to mark Slice 3b and Phase 4 as completed (or change the overall
document-level status to “stale -> updated/implemented”), ensuring the Phase 4
bullets and Slice 3b text remain consistent with the updated status so readers
don’t see conflicting signals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2f662810-cefc-4454-b882-1b813ab1b341

📥 Commits

Reviewing files that changed from the base of the PR and between 36c3fc0 and 5c1a0e9.

📒 Files selected for processing (4)
  • crates/tempyr-cli/src/commands/journal_cmd.rs
  • crates/tempyr-journal-index/src/embed.rs
  • crates/tempyr-journal-index/src/rerank.rs
  • docs/journal-spec.md

Three findings on PR #33:

1. The CLI `--explain` print branched on `args.rerank`, which is the
   *requested* state, not the *result* state — when rerank was asked
   for but the model failed to load and the search fell back to RRF,
   the printed line still claimed "= rerank". Add `reranked: bool` to
   `ScoreBreakdown`, set only by `apply_rerank_scores` when scores
   actually flow into the hit; CLI now branches on `b.reranked`.

2. After a failed cold-start load, `try_shared_*` would retry on
   every subsequent call. In a hard "no model" environment (no
   network, ONNX runtime missing) that means every search waits on
   the slow timeout serialized behind the INIT mutex. Add an atomic
   last-failure timestamp + 5-second backoff: cheap atomic read on
   the hot path skips the load if a recent attempt failed; on
   success, the stamp clears so transient failures still recover
   quickly. Applied to both `try_shared_reranker` and
   `try_shared_embedder` for symmetry — same hazard, same fix.

3. The spec's top-level status line claimed "Phases 1–2 implemented;
   3 and 4 pending" and the phase table marked both as "📋 Planned"
   even though Phase 3 (PRs #26, #27, #28) and Phase 4 (PRs #29–32)
   shipped weeks ago. Update both to ✅ Shipped with PR refs and
   note the cross-encoder rerank as a post-Phase-4 addition. Status
   summary now accurately reflects merged work.
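The 5-second failure backoff from item 2 can be sketched with one `AtomicU64` holding the last failure time in milliseconds. The current time is passed in so the window is testable; `try_load_with_backoff`, `in_backoff`, and the use of `0` to mean "no recent failure" are assumptions, while `LAST_FAIL_MS` and `unix_epoch_ms` echo names from the CodeRabbit walkthrough.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

const RETRY_BACKOFF_MS: u64 = 5_000; // 5-second window from the commit message
static LAST_FAIL_MS: AtomicU64 = AtomicU64::new(0); // 0 = no recent failure

/// Wall-clock milliseconds since the Unix epoch (0 if the clock is before it).
pub fn unix_epoch_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

/// True if a load attempt failed less than `RETRY_BACKOFF_MS` before `now_ms`.
pub fn in_backoff(now_ms: u64) -> bool {
    let last = LAST_FAIL_MS.load(Ordering::Relaxed);
    last != 0 && now_ms.saturating_sub(last) < RETRY_BACKOFF_MS
}

/// Run `load` unless a recent failure is still inside the backoff window.
pub fn try_load_with_backoff<T>(now_ms: u64, load: impl FnOnce() -> Result<T, String>) -> Option<T> {
    if in_backoff(now_ms) {
        return None; // cheap atomic read; skip the slow load entirely
    }
    match load() {
        Ok(v) => {
            LAST_FAIL_MS.store(0, Ordering::Relaxed); // success clears the stamp
            Some(v)
        }
        Err(_) => {
            LAST_FAIL_MS.store(now_ms.max(1), Ordering::Relaxed);
            None
        }
    }
}
```

A real caller would pass `unix_epoch_ms()` as `now_ms`. `Relaxed` should suffice because the stamp only gates whether to attempt a load; a stale read costs at most one extra or one skipped attempt.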

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/tempyr-journal-index/src/search.rs (1)

208-212: ⚠️ Potential issue | 🟠 Major

--rerank never reaches 50 candidates in the default case.

With the default --limit 10, Line 211 still pulls only max(limit * 4, 40) rows, so the reranker tops out at 40 candidates even though RERANK_CANDIDATE_COUNT is 50. That silently reduces recall in the common path this feature is meant to improve.

Proposed fix
-    let pull_usize = opts.limit.max(1).saturating_mul(4).max(40);
+    let pull_usize = opts
+        .limit
+        .max(1)
+        .saturating_mul(4)
+        .max(40)
+        .max(if opts.rerank {
+            RERANK_CANDIDATE_COUNT
+        } else {
+            0
+        });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tempyr-journal-index/src/search.rs` around lines 208 - 212, The
current pull size calculation (pull_usize/pull_i64) uses
opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit 10
yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT
(50); update the pull_usize computation to ensure it is at least
RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and
RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 =
i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to
the configured RERANK_CANDIDATE_COUNT.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 382-386: The current rerank branch uses opts.rerank alone and may
rerank BM25-only fallbacks; change the condition so maybe_apply_rerank is only
called when both opts.rerank is true and vector-based retrieval is available
(e.g., hybrid_mode is true or opts.query_vector.is_some()); update the branch
around deduped to check that hybrid_mode || opts.query_vector.is_some() (in
addition to opts.rerank) before invoking maybe_apply_rerank to ensure pure BM25
results remain untouched.

In `@docs/journal-spec.md`:
- Around line 7-8: The document header now claims Phases 1–4 are implemented but
later text still uses future tense (e.g., phrases like "no SQLite (yet — Phase
3)", "Phase 4 will add ...", "worth answering before implementing 3a"); update
those passages in docs/journal-spec.md to reflect current status by either
converting future-tense statements into past-tense implementation notes or
explicitly marking them as historical/archival (e.g., "Historical: Phase 3 added
SQLite" or "Historical note: Phase 4 introduced X"), and ensure occurrences of
"Phase 3", "Phase 4", and "3a" (search for those strings) are reconciled so the
doc consistently reports shipped features rather than future work.

---

Outside diff comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 208-212: The current pull size calculation (pull_usize/pull_i64)
uses opts.limit.max(1).saturating_mul(4).max(40) which for the default --limit
10 yields 40 and prevents the reranker from ever seeing RERANK_CANDIDATE_COUNT
(50); update the pull_usize computation to ensure it is at least
RERANK_CANDIDATE_COUNT (e.g. take the max of the existing expression and
RERANK_CANDIDATE_COUNT), then convert to i64 exactly as before (pull_i64 =
i64::try_from(pull_usize).unwrap_or(i64::MAX)) so the reranker can receive up to
the configured RERANK_CANDIDATE_COUNT.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43f48fc6-da93-4c72-84d8-53dab0de7eb6

📥 Commits

Reviewing files that changed from the base of the PR and between 5c1a0e9 and 39f0a3f.

📒 Files selected for processing (5)
  • crates/tempyr-cli/src/commands/journal_cmd.rs
  • crates/tempyr-journal-index/src/embed.rs
  • crates/tempyr-journal-index/src/rerank.rs
  • crates/tempyr-journal-index/src/search.rs
  • docs/journal-spec.md

Comment on lines +382 to +386
let deduped = if opts.rerank {
maybe_apply_rerank(deduped, &opts.query)
} else {
deduped
};

⚠️ Potential issue | 🟠 Major

Don't rerank BM25-only fallbacks.

This branch keys only off opts.rerank. If query embedding is unavailable, the function still pays reranker load/inference cost on BM25-only candidates, which breaks the documented "rerank the RRF head" contract and turns embedder fallback into a slower second-model path. Gate this on hybrid_mode (or opts.query_vector.is_some()) and leave pure BM25 results untouched when vector retrieval is unavailable.

Proposed fix
-    let deduped = if opts.rerank {
-        maybe_apply_rerank(deduped, &opts.query)
+    let deduped = if opts.rerank && hybrid_mode {
+        maybe_apply_rerank(deduped, trimmed)
     } else {
         deduped
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tempyr-journal-index/src/search.rs` around lines 382 - 386, The
current rerank branch uses opts.rerank alone and may rerank BM25-only fallbacks;
change the condition so maybe_apply_rerank is only called when both opts.rerank
is true and vector-based retrieval is available (e.g., hybrid_mode is true or
opts.query_vector.is_some()); update the branch around deduped to check that
hybrid_mode || opts.query_vector.is_some() (in addition to opts.rerank) before
invoking maybe_apply_rerank to ensure pure BM25 results remain untouched.

Two of three findings fixed; pushing back on the third.

1. **pull_usize too small for the reranker (valid).** Default
   `--limit 10` produced `pull_usize = 40`, less than
   `RERANK_CANDIDATE_COUNT = 50`. The reranker would never see the
   full 50-candidate pool it's sized for. Bump pull_usize to at least
   `RERANK_CANDIDATE_COUNT` when `opts.rerank` is on; non-rerank
   queries keep the existing 4× heuristic so we don't pull more than
   we need.

2. **Future-tense passages in the spec (valid).** Reconciled the doc
   with the merged status: `Phase 3 builds` → `Phase 3 built`, both
   phase headers gain "(shipped — PRs ...)" markers, "no SQLite (yet
   — Phase 3)" rewritten to describe the derived index that's now
   live, the "Phase 4 will add a journal report" line points at the
   v2 backlog instead, and "Open Questions for Phase 3" is now
   labelled historical with each item's actual resolution recorded
   inline. Bottom-of-doc PRs list now includes #26-32.

3. **Skip rerank when query_vector is None (declined).** The
   cross-encoder is a relevance scorer that takes (query, document)
   pairs — it works on any candidate pool, BM25-only or hybrid. The
   user opting in via `--rerank` explicitly asks for that scorer, and
   even a pure-BM25 candidate list benefits from the cross-encoder's
   semantic ordering. Reranker availability is also independent of
   embedder availability (different model files, different load
   paths), so "embedder unavailable" doesn't imply "skip the rerank
   attempt". Posting an explanation on the PR rather than applying
   the change.
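Item 1's sizing rule, written as a standalone function. The base formula and the `i64::try_from(...).unwrap_or(i64::MAX)` conversion come from the review thread; the `pull_size` / `pull_size_i64` names are hypothetical.

```rust
const RERANK_CANDIDATE_COUNT: usize = 50;

/// Rows to pull from each retriever before fusion: the existing 4x heuristic
/// with a floor of 40, raised to the reranker's pool size only when reranking.
pub fn pull_size(limit: usize, rerank: bool) -> usize {
    let base = limit.max(1).saturating_mul(4).max(40);
    if rerank {
        base.max(RERANK_CANDIDATE_COUNT)
    } else {
        base
    }
}

/// SQL-facing variant: clamp into i64 instead of panicking on overflow.
pub fn pull_size_i64(limit: usize, rerank: bool) -> i64 {
    i64::try_from(pull_size(limit, rerank)).unwrap_or(i64::MAX)
}
```

With the default `--limit 10`, this yields 40 rows without `--rerank` and 50 with it; larger limits (e.g. 20, giving 80) are unaffected by the floor.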

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cleak commented Apr 29, 2026

Pushing back on the "only rerank in hybrid mode" finding — I think the suggested change would be a regression.

The cross-encoder is a relevance scorer that takes (query, document) pairs and scores them directly. It doesn't depend on a vector being available; it works equally well on a BM25-only candidate set. In fact, that's where it earns its keep — BM25 produces lexically-matching hits, and the cross-encoder is exactly the thing that re-ranks those by semantic relevance.

Skipping rerank when `query_vector` is None would also be surprising UX: the user explicitly opted into `--rerank` and silently getting back unreranked BM25 results would be a footgun. And reranker availability isn't tied to embedder availability — they're separate model files with separate load paths, so "embedder unavailable → skip rerank attempt" isn't a sound implication. The reranker's own load path already handles the no-model environment cheaply (warn-once + 5s backoff after the first failure).

If you have a case in mind where reranking BM25-only candidates produces worse results than not reranking, I'd love to see it — happy to add a guard if there's a real failure mode. Otherwise leaving the rerank pass independent of the candidate-source signal.

The other two findings (`pull_usize` under-feeding the reranker, future-tense doc passages) are fixed in `930a244`.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-journal-index/src/search.rs`:
- Around line 413-421: Currently the code calls crate::try_shared_reranker()
before checking whether there are any hits, causing the reranker to load even
when n == 0; change the logic so you compute n =
hits.len().min(RERANK_CANDIDATE_COUNT) and immediately return hits if n == 0,
and only after that call crate::try_shared_reranker() and bind Some(reranker);
keep the existing comment about the warn-once behavior and ensure the early
return happens before any attempt to load the reranker (references: hits,
RERANK_CANDIDATE_COUNT, try_shared_reranker, reranker).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 461ef8d2-3fe5-41b8-b42b-cc155683ea18

📥 Commits

Reviewing files that changed from the base of the PR and between 39f0a3f and 930a244.

📒 Files selected for processing (2)
  • crates/tempyr-journal-index/src/search.rs
  • docs/journal-spec.md

`maybe_apply_rerank` was calling `try_shared_reranker()` before
checking whether `hits` had anything to score. A zero-result search
with `--rerank` would still trigger the ~280 MB model download on
first call (and pay the slow timeout in a no-network environment
even after the load failure backoff kicks in). Swap the order:
compute `n = hits.len().min(RERANK_CANDIDATE_COUNT)`, early-return
on `n == 0`, *then* try to grab the reranker.
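The reordered control flow can be sketched generically: compute the head size, return early on an empty result set, and only then ask for the model. `Scorer`, `maybe_rerank`, and the `get_scorer` closure are hypothetical stand-ins for the crate's reranker and `try_shared_reranker()`; falling back on a score-count mismatch is an added assumption, and non-finite score handling is omitted for brevity.

```rust
/// Hypothetical stand-in for the cross-encoder: scores (query, document) pairs.
pub trait Scorer {
    fn score(&self, query: &str, docs: &[&str]) -> Result<Vec<f32>, String>;
}

const RERANK_CANDIDATE_COUNT: usize = 50;

/// Rerank the head of `docs` if there is anything to score and a scorer is
/// available; otherwise return the input (RRF) order unchanged.
pub fn maybe_rerank<'a, S: Scorer>(
    docs: Vec<&'a str>,
    query: &str,
    get_scorer: impl FnOnce() -> Option<S>,
) -> Vec<&'a str> {
    let n = docs.len().min(RERANK_CANDIDATE_COUNT);
    if n == 0 {
        return docs; // empty result set: never touch the model
    }
    let Some(scorer) = get_scorer() else {
        return docs; // model unavailable: fall back
    };
    let scores = match scorer.score(query, &docs[..n]) {
        Ok(s) if s.len() == n => s,
        _ => return docs, // inference failure or wrong count: fall back
    };
    let mut head: Vec<(&'a str, f32)> = docs[..n].iter().copied().zip(scores).collect();
    head.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending, stable on ties
    head.into_iter().map(|(d, _)| d).chain(docs[n..].iter().copied()).collect()
}
```

Passing the loader as a closure makes the ordering checkable: with an empty `docs` vector the closure is never invoked, so no download is triggered.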

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cleak cleak merged commit 84bccc8 into master Apr 29, 2026
1 check passed
@cleak cleak deleted the claude/journal-cross-encoder-rerank branch April 29, 2026 23:46
