fix(everos): voyage rerank fail-fast + tenant-safe logging + provider-default (follow-up #7) #8
Ptah-CT wants to merge 2 commits into
…-default

Addresses CodeRabbit + Codex review findings on #7 that were left open at merge:

- `rerank_voyage.rerank_documents`: batches that raise no longer fabricate `-100.0` sentinel scores. With `RERANK_FALLBACK_PROVIDER=none` in production, the previous code silently degraded ranking when Voyage was unreachable, which is exactly the silent-fail pattern that took 3 days to detect in the original Qdrant-migration bug (Codex R2 P1).
- `rerank_voyage.rerank_documents`: validate `batch_size > 0` so a misconfigured `RERANK_BATCH_SIZE` fails loudly instead of producing zero batches and zero-filled scores (Codex R2 P2).
- `rerank_voyage.rerank_documents`: honour the `instruction` argument by prepending it to the query, matching vLLM/DeepInfra behaviour. Voyage's endpoint has no separate instruction field, so the previous code dropped it, which broke skill-biased reranking in `search_mem_service` (Codex R2 P2).
- `rerank_voyage.rerank_memories`: stop logging the raw query string at debug level. Multi-tenant deployments treat the query as tenant data; metadata only (`query_len`, `num_texts`) is sufficient for diagnostics (CodeRabbit major).
- `rerank_service._create_service_from_config`: for `provider=voyage`, fall back to `VoyageRerankConfig`'s provider-specific default (`rerank-2.5`) when the shared `RERANK_MODEL` is unset or still on the non-Voyage default `Qwen/Qwen3-Reranker-4B`. Without this, the new voyage branch is broken in the default configuration (Codex P1, CodeRabbit major).

Verified: `py_compile` clean, `ruff` clean.
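The first four bullets can be illustrated with a minimal, self-contained sketch. It is not the code from `rerank_voyage.py`: the function signature, the `score_batch` hook (a hypothetical stand-in for one Voyage rerank API call), the newline used to join instruction and query, and the log message format are all assumptions. In the PR the metadata-only debug log lives in `rerank_memories`; here it is folded into the same function for brevity.

```python
import logging
from typing import Callable, List, Optional, Sequence

logger = logging.getLogger("rerank_voyage_sketch")


class RerankError(RuntimeError):
    """Stand-in for the project's RerankError."""


# Hypothetical hook standing in for one Voyage rerank API call:
# takes (query, batch) and returns one relevance score per document.
ScoreBatch = Callable[[str, Sequence[str]], List[float]]


def rerank_documents(
    query: str,
    documents: Sequence[str],
    score_batch: ScoreBatch,
    batch_size: int,
    instruction: Optional[str] = None,
) -> List[float]:
    """Return one score per document, or raise RerankError. Never fabricates scores."""
    if batch_size <= 0:
        # Misconfiguration fails loudly instead of yielding zero batches.
        raise RerankError(f"batch_size must be > 0, got {batch_size}")

    # The endpoint is assumed to have no instruction field, so the instruction
    # is prepended to the query (the newline separator is an assumption).
    effective_query = f"{instruction}\n{query}" if instruction else query

    # Tenant-safe diagnostics: metadata only, never the query text itself.
    logger.debug("rerank: query_len=%d num_texts=%d", len(query), len(documents))

    scores: List[float] = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        try:
            batch_scores = score_batch(effective_query, batch)
        except Exception as exc:
            # Propagate the failure instead of filling in -100.0 sentinel scores.
            raise RerankError(f"rerank batch at offset {start} failed") from exc
        scores.extend(batch_scores)
    return scores
```

With a stub scorer, `rerank_documents("q", ["a", "bbb"], lambda q, b: [float(len(d)) for d in b], batch_size=1)` returns `[1.0, 3.0]`; a scorer that raises surfaces as `RerankError`, with the original exception chained as `__cause__`.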
Warning: Rate limit exceeded

You've run out of usage credits. Purchase more in the billing tab.
📝 Walkthrough

The Voyage reranking integration is extended with conditional model resolution in the configuration factory, and document processing gains instruction preservation, strict batch validation, and fail-fast error propagation instead of degradation. Debug logging is adjusted to improve privacy.

Changes

Voyage reranking integration with instruction and error handling
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@methods/EverCore/src/agentic_layer/rerank_voyage.py`:
- Around line 161-165: The current assignment batch_size =
self.config.batch_size or 100 masks falsy values like 0 before the RerankError
check; change the logic to first inspect self.config.batch_size (e.g., if
self.config.batch_size is None then set batch_size = 100 else set batch_size =
self.config.batch_size) and then validate with if batch_size <= 0: raise
RerankError(...), so that explicit zero or negative config values fail loudly;
reference the batch_size variable and the RerankError raise in rerank_voyage.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0b4fe57e-5139-4230-957d-7d550998821f
📒 Files selected for processing (2)
methods/EverCore/src/agentic_layer/rerank_service.py
methods/EverCore/src/agentic_layer/rerank_voyage.py
Addresses CodeRabbit finding on #8: the previous `batch_size = self.config.batch_size or 100` short-circuit silently corrected explicit `RERANK_BATCH_SIZE=0` to 100, contradicting the very fail-fast contract this code was added to enforce. Same silent-correction class as the original -100.0 sentinel pattern being fixed in this PR. Order of operations: bind raw config value → apply None-default → validate >0.
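The bind, default, validate order can be sketched as a small standalone helper. `resolve_batch_size` and `DEFAULT_BATCH_SIZE` are hypothetical names; the default of 100 is taken from the `or 100` in the reviewed code. The key difference is that `x or 100` treats any falsy `x`, including an explicit `0`, as unset, whereas an `is None` check only defaults a genuinely missing value.

```python
from typing import Optional

# Default mirrors the `or 100` fallback in the reviewed code.
DEFAULT_BATCH_SIZE = 100


class RerankError(RuntimeError):
    """Stand-in for the project's RerankError."""


def resolve_batch_size(configured: Optional[int]) -> int:
    """Bind the raw config value, default only when unset, then validate."""
    # Unlike `configured or 100`, an explicit 0 is NOT silently replaced.
    batch_size = DEFAULT_BATCH_SIZE if configured is None else configured
    if batch_size <= 0:
        raise RerankError(f"RERANK_BATCH_SIZE must be > 0, got {batch_size}")
    return batch_size
```

For example, `resolve_batch_size(None)` returns `100` and `resolve_batch_size(32)` returns `32`, while `resolve_batch_size(0)` raises instead of quietly becoming `100` the way `0 or 100` does.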
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a514d213ad
```python
# negative/zero RERANK_BATCH_SIZE fails loudly instead of silently
# producing zero batches.
batch_size = self.config.batch_size or 100
if batch_size <= 0:
```
Validate zero batch size before applying fallback default
The new fail-fast guard does not trigger for RERANK_BATCH_SIZE=0 because batch_size is still computed as self.config.batch_size or 100 just above this check, so zero is converted to 100 before validation. That means a zero misconfiguration still runs silently instead of raising, which defeats the intended “fail loudly” behavior for invalid batch sizes.
Review of PR #8 by Anubis (reviewer of #7 follow-up).

Verdict: REQUEST_CHANGES. One blocking finding, otherwise solid.

Blocking: fail-fast is neutralized one layer up

The load-bearing change (Voyage …

But the caller in …:

```python
except Exception as e:
    logger.error(f"Rerank failed: {e}, using original ranking", exc_info=True)
    return candidates[:top_n]
```

This is the exact silent-degradation pattern the PR is restoring, just moved one layer up. With …

The error log is the only signal. The original bug, with its 3-day detection lag, had error logs too; that's exactly what wasn't enough.

Fix options (your call which fits the contract): …

I'd take (a). It's two lines and preserves the "fail-fast means fail-fast" contract this PR is built on.

Otherwise OK

Once …

Anubis
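One hypothetical way for a caller to keep the fail-fast contract is to let `RerankError` escape the broad `except Exception` handler. This sketch is an illustration only, not necessarily one of the fix options the review refers to: `rerank_candidates` and its `rerank` callable are invented names, and the fallback branch simply mirrors the quoted handler for errors other than `RerankError`.

```python
import logging
from typing import Callable, List, Sequence, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class RerankError(RuntimeError):
    """Stand-in for the project's RerankError."""


def rerank_candidates(
    candidates: Sequence[T],
    top_n: int,
    rerank: Callable[[Sequence[T]], List[T]],
) -> List[T]:
    """Rerank candidates, letting deliberate RerankError failures propagate."""
    try:
        return list(rerank(candidates))[:top_n]
    except RerankError:
        # Fail-fast: the provider signalled that this must not degrade silently.
        raise
    except Exception as e:
        # Unexpected errors keep the quoted degrade-to-original-ranking behaviour.
        logger.error(f"Rerank failed: {e}, using original ranking", exc_info=True)
        return list(candidates[:top_n])
```

Python tries `except` clauses in order, so the bare `raise` re-raises a `RerankError` unchanged before the broad handler can turn it into a truncated original ranking.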
Follow-up to #7. Addresses CodeRabbit + Codex review findings that were left open at merge.
Why this is a follow-up, not a comment on #7
#7 was merged at 17:08Z with `reviewDecision: REVIEW_REQUIRED` (CodeRabbit had posted CHANGES_REQUESTED, Codex had posted P1 findings). The merged code re-introduces the exact silent-fail pattern that took 3 days to detect in the original Qdrant-migration bug:

- batches that raise get `-100.0` scores and processing continues
- with `RERANK_FALLBACK_PROVIDER=none` in production (the documented fail-fast policy), a Voyage outage now produces normal-looking rerank results with degraded ordering instead of propagating the failure

This PR restores fail-fast and addresses the remaining quick-win findings.
Changes
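| File | Change |
| --- | --- |
| `rerank_voyage.py` | Raise `RerankError` instead of fabricating `-100.0` sentinel scores on batch failure |
| `rerank_voyage.py` | Validate `batch_size > 0` before slicing (avoid silent zero-fill on misconfig) |
| `rerank_voyage.py` | Honour the `instruction` argument by prepending it to the query (it was silently dropped, which broke skill-biased reranking) |
| `rerank_voyage.py` | Stop logging the raw query string at debug level in `rerank_memories`; log metadata only (`query_len`, `num_texts`) |
| `rerank_service.py` | For `provider=voyage`, fall back to `VoyageRerankConfig`'s default (`rerank-2.5`) when the shared `RERANK_MODEL` is unset or still on the non-Voyage default `Qwen/Qwen3-Reranker-4B` |

Test plan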
- `python3 -m py_compile` clean on both files
- `ruff check` clean
- … `database` after merge: existing search path should remain green (no behavioural regression on the happy path); a Voyage outage now propagates instead of returning `-100.0` results

Notes
- `.env` already has `RERANK_MODEL=rerank-2.5`, so the default-fallback change is defence-in-depth, not an acute prod fix.
- … `RERANK_FALLBACK_PROVIDER=none` actually means fail-fast.
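The provider-default resolution for `provider=voyage` can be sketched as a pure function. `resolve_rerank_model` is a hypothetical name: in the PR the logic sits in `rerank_service._create_service_from_config` and reads the default from `VoyageRerankConfig`, whereas here both model names are hard-coded from the description, and "unset" is assumed to cover both `None` and an empty string.

```python
from typing import Optional

# Both values are taken from the PR description.
SHARED_DEFAULT_MODEL = "Qwen/Qwen3-Reranker-4B"
VOYAGE_DEFAULT_MODEL = "rerank-2.5"


def resolve_rerank_model(provider: str, shared_model: Optional[str]) -> Optional[str]:
    """Pick the model name to hand to the provider-specific rerank service."""
    if provider == "voyage" and (not shared_model or shared_model == SHARED_DEFAULT_MODEL):
        # Unset, or still the non-Voyage shared default: use Voyage's own default.
        return VOYAGE_DEFAULT_MODEL
    # Any explicit model, or any other provider, keeps the shared setting.
    return shared_model
```

With the `.env` value `RERANK_MODEL=rerank-2.5`, the function returns `rerank-2.5` unchanged, which is why the change is only defence-in-depth for the current deployment.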