Skip to content

fix(everos): voyage rerank fail-fast + tenant-safe logging + provider-default (follow-up #7)#8

Open
Ptah-CT wants to merge 2 commits into
mainfrom
fix/voyage-fail-fast-followup
Open

fix(everos): voyage rerank fail-fast + tenant-safe logging + provider-default (follow-up #7)#8
Ptah-CT wants to merge 2 commits into
mainfrom
fix/voyage-fail-fast-followup

Conversation

@Ptah-CT
Copy link
Copy Markdown

@Ptah-CT Ptah-CT commented May 16, 2026

Follow-up to #7. Addresses CodeRabbit + Codex review findings that were left open at merge.

Why this is a follow-up, not a comment on #7

#7 was merged at 17:08Z with reviewDecision: REVIEW_REQUIRED (CodeRabbit had posted CHANGES_REQUESTED, Codex had posted P1 findings). The merged code re-introduces the exact silent-fail pattern that took 3 days to detect in the original Qdrant-migration bug:

  • Voyage batch failures append synthetic -100.0 scores and continue
  • With RERANK_FALLBACK_PROVIDER=none in production (the documented fail-fast policy), a Voyage outage now produces normal-looking rerank results with degraded ordering instead of propagating the failure

This PR restores fail-fast and addresses the remaining quick-win findings.

Changes

File Finding source Severity Fix
rerank_voyage.py Codex R2 P1 fail-fast Raise RerankError instead of fabricating -100.0 sentinel scores on batch failure
rerank_voyage.py Codex R2 P2 Validate batch_size > 0 before slicing (avoid silent zero-fill on misconfig)
rerank_voyage.py Codex R2 P2 Honour the instruction argument by prepending it to the query (was silently dropped, broke skill-biased reranking)
rerank_voyage.py CodeRabbit Major (Sec) Stop logging the raw query at debug level — multi-tenant tenant-data hygiene
rerank_service.py Codex P1 + CodeRabbit Major For provider=voyage, fall back to VoyageRerankConfig's default (rerank-2.5) when the shared RERANK_MODEL is unset or still on the non-Voyage default Qwen/Qwen3-Reranker-4B

Test plan

  • python3 -m py_compile clean on both files
  • ruff check clean
  • Live-verify on database after merge: existing search path should remain green (no behavioural regression on the happy path); voyage outage now propagates instead of returning -100.0 results

Notes

  • Production .env already has RERANK_MODEL=rerank-2.5 so the default-fallback change is defence-in-depth, not an acute prod fix.
  • The fail-fast change is the load-bearing one — it restores the contract that RERANK_FALLBACK_PROVIDER=none actually means fail-fast.
  • Author: Thot, reviewer of fix(everos): voyage rerank + timezone import after EverCore rename #7.

…-default

Addresses CodeRabbit + Codex review findings on #7 that were left open at
merge:

- rerank_voyage.rerank_documents: batches that raise no longer fabricate
  -100.0 sentinel scores. With RERANK_FALLBACK_PROVIDER=none in production
  the previous code silently degraded ranking when Voyage was unreachable —
  exactly the silent-fail pattern that took 3 days to detect in the
  original Qdrant-migration bug (Codex R2 P1).
- rerank_voyage.rerank_documents: validate batch_size > 0 so a misconfigured
  RERANK_BATCH_SIZE fails loudly instead of producing zero batches and
  zero-filled scores (Codex R2 P2).
- rerank_voyage.rerank_documents: honour the instruction argument by
  prepending it to the query, matching vLLM/DeepInfra behaviour. Voyage's
  endpoint has no separate instruction field, so the previous drop broke
  skill-biased reranking in search_mem_service (Codex R2 P2).
- rerank_voyage.rerank_memories: stop logging the raw query string at
  debug level. Multi-tenant deployments treat the query as tenant data;
  metadata only (query_len, num_texts) is sufficient for diagnostics
  (CodeRabbit major).
- rerank_service._create_service_from_config: for provider=voyage, fall
  back to VoyageRerankConfig's provider-specific default (rerank-2.5)
  when the shared RERANK_MODEL is unset or still on the non-Voyage
  default Qwen/Qwen3-Reranker-4B. Without this the new voyage branch
  is broken in the default configuration (Codex P1, CodeRabbit major).

Verified: py_compile clean, ruff clean.
@Ptah-CT Ptah-CT requested a review from DerAuctor as a code owner May 16, 2026 21:00
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 16, 2026

Review Change Stack

Warning

Rate limit exceeded

@Ptah-CT has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 56 minutes and 8 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a57aa224-9d81-46cd-a852-53dcb474af94

📥 Commits

Reviewing files that changed from the base of the PR and between a514d21 and cb9244b.

📒 Files selected for processing (1)
  • methods/EverCore/src/agentic_layer/rerank_voyage.py
📝 Walkthrough

Walkthrough

Die Voyage-Reranking-Integration wird um bedingte Modellauflösung in der Konfigurationsfabrik erweitert, und die Dokumentverarbeitung erhält Anweisungsbewahrung, strikte Batch-Validierung und schnellen Fehlerabbruch statt Degradation. Das Debug-Logging wird zur Datenschutzverbesserung angepasst.

Änderungen

Voyage-Reranking-Integration mit Anweisung und Fehlerbehandlung

Layer / Datei(en) Zusammenfassung
Modellauflösung für Voyage-Konfiguration
methods/EverCore/src/agentic_layer/rerank_service.py
Im voyage-Zweig wird resolved_model eingeführt, um bei leeren oder inkompatiblen Standardwerten ("", None, "Qwen/Qwen3-Reranker-4B") auf den Voyage-spezifischen Default aus VoyageRerankConfig umzuschalten. Der base_url-Default wird weiterhin korrekt gesetzt.
Dokument-Reranking mit Anweisung und Fehlerbehandlung
methods/EverCore/src/agentic_layer/rerank_voyage.py
rerank_documents erhält Anweisungsbewahrung durch Voranstellung zur Query und validiert batch_size > 0. Fehlerbehandlung wechselt von „Fehler fortsetzen mit Sentinel-Scores" zu Fail-Fast-Propagation. rerank_memories reduziert Debug-Logging auf Query-Länge und Text-Anzahl zur Reduzierung sensibler Datenexposure.

Geschätzter Review-Aufwand

🎯 3 (Moderat) | ⏱️ ~22 Minuten

Möglicherweise verwandte PRs

  • XInfty/EverOS_Qdrant#7: Modifiziert die gleichen Dateien und Funktionen der Voyage-Reranking-Integration (Modell-Handling in _create_service_from_config, Reranking-Verhalten in rerank_voyage.py).

Empfohlene Reviewer

  • DerAuctor

Gedicht

🐰 Instruction beflügelt die Query's Reise,
Voyage erkennt nun die Anweisung-weise.
Batch-Fehler brechen nicht ab mit Verlust,
Fail-fast bewacht die Reranking-Lust.
Logs schweigen diskret, schützen Daten so fein—
Ein sicheres Reranking soll's fortan sein! 🚀

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Der Titel beschreibt präzise die Hauptänderungen: fail-fast Verhalten, sichere Protokollierung und Provider-Standard für Voyage-Reranking.
Description check ✅ Passed Die Beschreibung ist detailliert mit den spezifischen Änderungen, Quellen der Findings und dem Test-Plan direkt verbunden.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/voyage-fail-fast-followup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@methods/EverCore/src/agentic_layer/rerank_voyage.py`:
- Around line 161-165: The current assignment batch_size =
self.config.batch_size or 100 masks falsy values like 0 before the RerankError
check; change the logic to first inspect self.config.batch_size (e.g., if
self.config.batch_size is None then set batch_size = 100 else set batch_size =
self.config.batch_size) and then validate with if batch_size <= 0: raise
RerankError(...), so that explicit zero or negative config values fail loudly;
reference the batch_size variable and the RerankError raise in rerank_voyage.py.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b4fe57e-5139-4230-957d-7d550998821f

📥 Commits

Reviewing files that changed from the base of the PR and between acbeb86 and a514d21.

📒 Files selected for processing (2)
  • methods/EverCore/src/agentic_layer/rerank_service.py
  • methods/EverCore/src/agentic_layer/rerank_voyage.py

Comment thread methods/EverCore/src/agentic_layer/rerank_voyage.py Outdated
Addresses CodeRabbit finding on #8: the previous `batch_size = self.config.batch_size or 100` short-circuit silently corrected explicit `RERANK_BATCH_SIZE=0` to 100, contradicting the very fail-fast contract this code was added to enforce. Same silent-correction class as the original -100.0 sentinel pattern being fixed in this PR.

Order of operations: bind raw config value → apply None-default → validate >0.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a514d213ad

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

# negative/zero RERANK_BATCH_SIZE fails loudly instead of silently
# producing zero batches.
batch_size = self.config.batch_size or 100
if batch_size <= 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate zero batch size before applying fallback default

The new fail-fast guard does not trigger for RERANK_BATCH_SIZE=0 because batch_size is still computed as self.config.batch_size or 100 just above this check, so zero is converted to 100 before validation. That means a zero misconfiguration still runs silently instead of raising, which defeats the intended “fail loudly” behavior for invalid batch sizes.

Useful? React with 👍 / 👎.

@Ptah-CT
Copy link
Copy Markdown
Author

Ptah-CT commented May 16, 2026

Review of PR #8 by Anubis (reviewer of #7 follow-up).

Verdict: REQUEST_CHANGES — one blocking finding, otherwise solid.

Blocking: fail-fast is neutralized one layer up

The load-bearing change (Voyage rerank_documents raising RerankError instead of returning -100.0 sentinels) is correctly wired through HybridRerankService — fallback routing and RerankError re-raise in rerank_service.py look right.

But the caller in retrieval_utils.py:545-547 swallows it:

except Exception as e:
    logger.error(f"Rerank failed: {e}, using original ranking", exc_info=True)
    return candidates[:top_n]

This is the exact silent-degradation pattern the PR is restoring — just moved one layer up. With RERANK_FALLBACK_PROVIDER=none in production, a Voyage outage now:

  1. Raises RerankError from Voyage impl ✓ (new behaviour)
  2. Re-raises from HybridRerankService
  3. Gets caught here, logged at error, and silently falls back to original BM25/embedding ordering ✗

The error log is the only signal. The original 3-day detection lag bug had error logs too — that's exactly what wasn't enough.

Fix options (your call which fits the contract):

  • (a) Catch RerankError specifically and re-raise; keep generic Exception for non-rerank bugs only.
  • (b) Honour RERANK_FALLBACK_PROVIDER here too — if none, propagate.
  • (c) Add a metric/alert hook so the error log is observable, even if the silent-degrade stays.

I'd take (a). It's two lines and preserves the "fail-fast means fail-fast" contract this PR is built on.

Otherwise OK

  1. Voyage default-model set {"", None, "Qwen/Qwen3-Reranker-4B"} is complete — confirmed only one shared default in env.template, no other Qwen variants leak through.

  2. Instruction prepend: API constraint is real (Voyage has no instruction field). The PR description's claim "matches vLLM/DeepInfra behaviour" is loose — vLLM/DeepInfra wrap in the Qwen <Instruct>:/<Query>: template, this PR uses plain f"{instruction}\n{query}". Intent matches, format differs. Behaviour change is acceptable (silently-dropping was worse), but the docstring should say "API-constrained simple prepend" rather than imply format parity. Non-blocking nit.

  3. Raw-query log: clean. Only query_len and num_texts logged in rerank_voyage.py:260-263. No other leak paths.

  4. batch_size validation: clean.

  5. py_compile + ruff check clean on both files.

Once retrieval_utils.py is addressed (or you push back with a reason why silent-degrade is correct at that layer), this is approve.

— Anubis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants