fix: streaming robustness, RRF list priority, and CLI suggestion quality #268

Merged
BYK merged 1 commit into main from fix/streaming-robustness-and-review-fixes
May 12, 2026
@BYK BYK commented May 12, 2026

Summary

Follow-up fixes from self-review of PR #267. Addresses 2 critical, 2 moderate, and 1 minor issue found during code review.

Critical: Streaming translator robustness (C1, C2)

C1 — Responses API error handling: The translateAnthropicStreamToResponses translator silently closed the stream on upstream errors, leaving clients hanging with no terminal event. Now emits a response.failed event with error details before closing.
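A minimal sketch of what emitting a terminal failure event could look like, assuming the translator writes Server-Sent Events frames. The helper names `buildResponseFailed` and `toSseFrame`, the `upstream_error` code, and the simplified payload shape are assumptions for illustration, not the code from this PR.

```typescript
// Simplified, assumed shape of a terminal failure event.
interface ResponseFailedEvent {
  type: "response.failed";
  response: {
    status: "failed";
    error: { code: string; message: string };
  };
}

// Turn any thrown value into a failure event the client can act on.
function buildResponseFailed(err: unknown): ResponseFailedEvent {
  const message = err instanceof Error ? err.message : String(err);
  return {
    type: "response.failed",
    response: {
      status: "failed",
      // "upstream_error" is a hypothetical code chosen for this sketch.
      error: { code: "upstream_error", message },
    },
  };
}

// Serialize an event as one SSE frame: event line, data line, blank line.
function toSseFrame(payload: { type: string }): string {
  return `event: ${payload.type}\ndata: ${JSON.stringify(payload)}\n\n`;
}
```

In a translator's error path, the frame would be enqueued before the stream is closed, so the client always sees a terminal event instead of a silent end of stream.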

C2 — Client disconnect handling: Neither streaming translator had a cancel() handler. If a client disconnected mid-stream, the upstream reader continued consuming the full response into the void, wasting bandwidth. Both translators now:

  • Implement cancel() to cancel the upstream Response.body reader
  • Use a safeEnqueue() pattern that sets a cancelled flag on enqueue failure
  • Check cancelled at the top of each loop iteration to break early
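The three points above can be sketched roughly as follows, using the standard Web Streams API. This is an illustration under assumptions, not the PR's translator: `relayWithCancel` is a hypothetical name, and the real translators also transform each chunk, which is omitted here.

```typescript
// Relay an upstream body to the client, stopping upstream reads on disconnect.
function relayWithCancel(
  upstream: ReadableStream<Uint8Array>,
): ReadableStream<Uint8Array> {
  const reader = upstream.getReader();
  let cancelled = false;

  return new ReadableStream<Uint8Array>({
    async start(controller) {
      // enqueue() throws once the stream is cancelled or closed;
      // treat that as a signal to stop instead of letting it propagate.
      const safeEnqueue = (chunk: Uint8Array): void => {
        if (cancelled) return;
        try {
          controller.enqueue(chunk);
        } catch {
          cancelled = true;
        }
      };

      try {
        // Check the flag at the top of every iteration to break early.
        while (!cancelled) {
          const { done, value } = await reader.read();
          if (done) break;
          safeEnqueue(value);
        }
        if (!cancelled) controller.close();
      } catch (err) {
        if (!cancelled) controller.error(err);
      }
    },

    // Called when the consumer (the client connection) goes away.
    async cancel(reason) {
      cancelled = true;
      await reader.cancel(reason);
    },
  });
}
```

Cancelling the upstream reader also resolves any pending `read()` with `done: true`, so the loop exits without an extra wake-up.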

Moderate: CLI suggestion quality (M4)

The "did you mean?" matching used a 2-character prefix check, which was far too loose (e.g., lore hi would suggest help). Replaced with Levenshtein distance with a max-distance threshold of max(2, floor(len/2)), giving accurate suggestions for typos.
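A rough sketch of the thresholded matching described above. It assumes `len` refers to the length of the typed input (the description does not say which string it measures); `suggest` and the command list in the usage note are hypothetical.

```typescript
// Classic Levenshtein edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest command within max(2, floor(len/2)) edits, if any.
function suggest(input: string, commands: string[]): string | undefined {
  // Assumption: len is the length of what the user typed.
  const maxDistance = Math.max(2, Math.floor(input.length / 2));
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const cmd of commands) {
    const d = levenshtein(input, cmd);
    if (d < bestDistance) {
      best = cmd;
      bestDistance = d;
    }
  }
  return best;
}
```

With a command list of `["help", "search"]`, the input `hi` is 3 edits from `help`, which exceeds the threshold of 2, so nothing is suggested; `hlep` is 2 edits away and still yields `help`.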

Moderate: RRF list cap priority (m4)

With query expansion producing 3 queries, the per-query BM25 lists (9+) filled the MAX_RRF_LISTS=10 cap before vector search, lat.md, cross-project, quality re-ranking, and exact-match boost lists were added — dropping all high-value supplemental lists. Fixed by tracking list boundaries:

  • Primary lists (original query BM25 + recency): always kept
  • Supplemental lists (vector, lat.md, cross-project, quality, exact-match): always kept
  • Expanded-query lists: trimmed first when over budget
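One way to express this priority order, as a sketch: reserve slots for primary and supplemental lists, then give expanded-query lists whatever budget remains. The function name `capRrfLists`, the `RankedList` type, and the behavior when reserved lists alone exceed the cap (they are all kept here) are assumptions, not details taken from the PR.

```typescript
const MAX_RRF_LISTS = 10;

// A ranked list of document IDs, best first (hypothetical representation).
type RankedList = string[];

// Keep all primary and supplemental lists; trim expanded-query lists first.
function capRrfLists(
  primary: RankedList[],
  expanded: RankedList[],
  supplemental: RankedList[],
  max: number = MAX_RRF_LISTS,
): RankedList[] {
  const reserved = primary.length + supplemental.length;
  // Expanded-query lists only get the slots left after the reserved ones.
  const expandedBudget = Math.max(0, max - reserved);
  return [...primary, ...expanded.slice(0, expandedBudget), ...supplemental];
}
```

For example, with 2 primary, 9 expanded-query, and 5 supplemental lists, only the first 3 expanded-query lists survive, and the total stays at the cap of 10.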

Minor: Documentation (M6)

Added a comment documenting that /v1/models passthrough only supports the Anthropic upstream.

Verification

  • Typecheck: all 4 packages pass
  • Tests: 1254 pass, 0 fail

- Add cancel() handlers to both OpenAI streaming translators to stop
  upstream reads on client disconnect (C2)
- Emit response.failed event in Responses API translator on error
  instead of silently closing the stream, preventing client hangs (C1)
- Use safeEnqueue pattern in both translators to gracefully handle
  enqueue-after-cancel without throwing (C2)
- Replace naive 2-char prefix CLI suggestion matching with Levenshtein
  distance for accurate 'did you mean?' suggestions (M4)
- Fix RRF list cap to trim expanded-query lists first, preserving
  high-value vector, lat.md, cross-project, and exact-match lists (m4)
- Document /v1/models endpoint Anthropic-only limitation (M6)
@BYK BYK merged commit bdf8b60 into main May 12, 2026
7 checks passed
@BYK BYK deleted the fix/streaming-robustness-and-review-fixes branch May 12, 2026 20:26
This was referenced May 13, 2026