fix: streaming robustness, RRF list priority, and CLI suggestion quality #268
Merged
- Add cancel() handlers to both OpenAI streaming translators to stop upstream reads on client disconnect (C2)
- Emit response.failed event in Responses API translator on error instead of silently closing the stream, preventing client hangs (C1)
- Use safeEnqueue pattern in both translators to gracefully handle enqueue-after-cancel without throwing (C2)
- Replace naive 2-char prefix CLI suggestion matching with Levenshtein distance for accurate 'did you mean?' suggestions (M4)
- Fix RRF list cap to trim expanded-query lists first, preserving high-value vector, lat.md, cross-project, and exact-match lists (m4)
- Document /v1/models endpoint Anthropic-only limitation (M6)
Summary
Follow-up fixes from self-review of PR #267. Addresses 2 critical, 2 moderate, and 1 minor issue found during code review.
Critical: Streaming translator robustness (C1, C2)
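For orientation, the disconnect handling covered under C2 in this section can be sketched roughly as follows. This is not the PR's code: `translateStream` and the `translate` callback are hypothetical names, a runtime with the WHATWG `ReadableStream` global is assumed, and backpressure is ignored to keep the loop visible.

```typescript
// Hypothetical wrapper: `translate` stands in for the real per-chunk translation.
function translateStream(
  upstream: ReadableStream<Uint8Array>,
  translate: (chunk: Uint8Array) => Uint8Array,
): ReadableStream<Uint8Array> {
  const reader = upstream.getReader();
  let cancelled = false;

  return new ReadableStream<Uint8Array>({
    async start(controller) {
      // Enqueue that never throws: a failure means the consumer has gone away.
      const safeEnqueue = (chunk: Uint8Array): void => {
        if (cancelled) return;
        try {
          controller.enqueue(chunk);
        } catch {
          cancelled = true;
        }
      };

      try {
        while (true) {
          if (cancelled) break; // checked at the top of every iteration
          const { done, value } = await reader.read();
          if (done) break;
          safeEnqueue(translate(value));
        }
        if (!cancelled) controller.close();
      } catch (err) {
        if (!cancelled) controller.error(err);
      }
    },
    // Runs when the consumer cancels (e.g. client disconnect): stop upstream reads.
    async cancel(reason?: unknown) {
      cancelled = true;
      await reader.cancel(reason);
    },
  });
}
```

Cancelling the reader also resolves any pending `read()` with `done: true`, so the loop exits without waiting for more upstream data.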
C1 (Responses API error handling): The `translateAnthropicStreamToResponses` translator silently closed the stream on upstream errors, leaving clients hanging with no terminal event. It now emits a `response.failed` event with error details before closing.
C2 (Client disconnect handling): Neither streaming translator had a `cancel()` handler. If a client disconnected mid-stream, the upstream reader kept consuming the full response with nothing left to receive it, wasting bandwidth. Both translators now:
- Implement `cancel()` to cancel the upstream `Response.body` reader
- Use a `safeEnqueue()` pattern that sets a `cancelled` flag on enqueue failure
- Check `cancelled` at the top of each loop iteration to break early
Moderate: CLI suggestion quality (M4)
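As a rough illustration of the distance-based matching this section describes, a sketch might look like the following. `levenshtein` and `suggestCommand` are hypothetical names, and taking `len` to be the length of the typed input is an assumption.

```typescript
// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: closest known command within the threshold, if any.
function suggestCommand(input: string, commands: string[]): string | undefined {
  // Assumption: `len` in max(2, floor(len/2)) is the input length.
  const maxDistance = Math.max(2, Math.floor(input.length / 2));
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const cmd of commands) {
    const d = levenshtein(input, cmd);
    if (d <= maxDistance && d < bestDistance) {
      best = cmd;
      bestDistance = d;
    }
  }
  return best;
}
```

Under these assumptions, `hi` is at distance 3 from `help`, above the threshold of 2, so no suggestion is offered.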
The "did you mean?" matching used a 2-character prefix check, which was far too loose (e.g., `lore hi` would suggest `help`). Replaced with Levenshtein distance with a max-distance threshold of `max(2, floor(len/2))`, giving accurate suggestions for typos.
Moderate: RRF list cap priority (m4)
With query expansion producing 3 queries, the per-query BM25 lists (9+) filled the MAX_RRF_LISTS=10 cap before vector search, lat.md, cross-project, quality re-ranking, and exact-match boost lists were added — dropping all high-value supplemental lists. Fixed by tracking list boundaries:
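As a rough illustration only (this is not the code from the PR), the priority rule can be sketched as below. The three-way split into primary, expanded, and supplemental lists and the name `capRrfLists` are assumptions made for the example. Only `MAX_RRF_LISTS` and its value of 10 come from the description.

```typescript
type RankedList = string[]; // document IDs, best first

const MAX_RRF_LISTS = 10;

// Hypothetical helper: enforce the cap by trimming expanded-query lists first,
// so supplemental lists (vector, lat.md, cross-project, exact-match) survive.
function capRrfLists(
  primary: RankedList[],
  expanded: RankedList[],
  supplemental: RankedList[],
  maxLists: number = MAX_RRF_LISTS,
): RankedList[] {
  const reserved = primary.length + supplemental.length;
  const roomForExpanded = Math.max(0, maxLists - reserved);
  const kept = [
    ...primary,
    ...expanded.slice(0, roomForExpanded),
    ...supplemental,
  ];
  // If primary + supplemental alone exceed the cap, fall back to truncation.
  return kept.slice(0, maxLists);
}
```

With 3 primary, 6 expanded, and 5 supplemental lists, this keeps all primary and supplemental lists and only the first 2 expanded ones.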
Minor: Documentation (M6)
Added a comment documenting that `/v1/models` passthrough only supports the Anthropic upstream.
Verification