fix: improve multi-provider streaming robustness (Gemini, OpenAI-compatible, OpenRouter) #32
Merged
- agent-loop.mjs: run multiple tool calls from one LLM turn in parallel (Promise.all) instead of sequentially. Permission checks/hooks remain sequential (they may show UI dialogs). Tool execution, verifyWrites, and post-hooks all run concurrently; results are yielded in original order. Measured: two 200ms tools now complete in ~200ms instead of ~400ms.
- chat.js: replace per-frame (60fps) requestAnimationFrame throttle with a 100ms setTimeout throttle for markdown re-renders during streaming. This caps CPU-intensive renderMarkdown() at ~10 calls/s, eliminating jank on long responses. The scroll is also batched into the same timeout callback to avoid a second layout pass.
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/af8f1e4f-7656-415a-bc7e-df5f883b20ae
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
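The two-phase scheme described in this commit (sequential permission checks, then concurrent execution with results in original order) might look roughly like the sketch below. The names `runToolCalls`, `checkPermission`, and `executeTool` are hypothetical and are not taken from agent-loop.mjs; verifyWrites and post-hooks are omitted for brevity.

```javascript
// Hypothetical sketch: function and callback names are illustrative only.
async function runToolCalls(toolCalls, { checkPermission, executeTool }) {
  // Phase 1: permission checks stay sequential, since each may prompt the user.
  const allowed = [];
  for (const call of toolCalls) {
    allowed.push(await checkPermission(call));
  }

  // Phase 2: start every permitted tool at once. Promise.all resolves to an
  // array in input order, regardless of which tool finishes first.
  return Promise.all(
    toolCalls.map(async (call, i) => {
      if (!allowed[i]) return { id: call.id, error: 'permission denied' };
      try {
        return { id: call.id, output: await executeTool(call) };
      } catch (err) {
        // Turn a failure into a result so one tool cannot reject the whole batch.
        return { id: call.id, error: String(err && err.message ? err.message : err) };
      }
    })
  );
}
```

Catching errors per call is a design choice in this sketch: a bare Promise.all would reject as soon as any one tool throws, discarding the other results.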
…heck
- agent-loop.mjs: add `anthropic-beta: prompt-caching-2024-07-31` header to Anthropic API calls so the cache_control blocks already present on the system prompt are actually honoured. Without this header Anthropic silently ignores cache_control, causing the full system prompt (~2440 tokens) to be re-processed from scratch on every turn. With it, subsequent turns in a session skip that processing entirely, cutting hundreds of ms off TTFT.
- agent-loop.mjs: throttle detectRepetition to every 50 chars of new text instead of on every single streaming token. The function's inner loop does up to 27 indexOf() calls over 800 chars; calling it on every 1-3 char delta added unnecessary CPU work in the hot path.
- registry.mjs: cache the tools.list() result and invalidate only when tools are registered/added. tools.list() was rebuilding a 30-element array from scratch on every API call (multiple times per user turn).
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/83d92820-acac-4cc5-9fb4-19a2ed53a813
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
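The detectRepetition throttle could be sketched as a small wrapper that runs an expensive check only after enough new text has arrived. `createThrottledChecker` and its parameters are hypothetical names chosen for illustration; the actual change may be structured differently.

```javascript
// Hypothetical sketch: wraps any expensive text check (for example a
// repetition detector) so it runs at most once per `minNewChars` of new text.
function createThrottledChecker(check, minNewChars = 50) {
  let text = '';
  let lastCheckedLength = 0;
  return function onDelta(delta) {
    text += delta;
    // Skip the expensive check until enough new text has accumulated.
    if (text.length - lastCheckedLength < minNewChars) return false;
    lastCheckedLength = text.length;
    return check(text);
  };
}
```

With 2-character deltas and the default threshold, the wrapped check runs once per 25 deltas instead of on every delta.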
…Bottom usage)
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/83d92820-acac-4cc5-9fb4-19a2ed53a813
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
Six targeted fixes to cut latency on every agent turn:
1. CUSTOM_PROVIDERS_JSON parse cache (agent-loop.mjs)
- findCustomProvider() called 2-3x per turn, each call did JSON.parse()
- Now caches result in _customProvidersCache, re-parses only when env changes
2. Module-level RateLimiter instances (agent-loop.mjs)
- new RateLimiter({...}) in callAnthropic/callOpenAI/callGoogle/callNvidia/callCustomProvider
- Hoisted to _rlAnthropic/_rlOpenAI/_rlGoogle/_rlNvidia/_rlCustom; reset() on each call
3. OpenAI tool-definition format cache (agent-loop.mjs + registry.mjs)
- toolDefs.map(t => ({type:'function',...})) rebuilt on every API call
- toOpenAITools() caches by array identity; stable because tools.list() is cached
- registry.listOpenAI() added as complementary cached accessor
4. shouldCompact gating (agent-loop.mjs)
- Called unconditionally before every API call; addMessage() already runs it
- Now gated to options.continuation=true only (tool-result continuation turns)
5. TurnClassifier.classify array allocation (turn-classifier.mjs)
- messages.filter(m => m.role==='user') allocated intermediate array every turn
- Replaced with reverse for-loop that short-circuits at first user message
6. TurnClassifier._extractText array allocation (turn-classifier.mjs)
- filter+map+join chain allocated two intermediate arrays
- Replaced with single for-loop with early concatenation
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/69f9b44c-9440-4906-b90e-569d41d94644
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
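As an illustration of item 1 above, a parse cache keyed on the raw environment string might look like this. It is only a sketch: the function name `getCustomProviders` is hypothetical, and treating the parsed value as an array of provider objects is an assumption about the config shape.

```javascript
// Hypothetical sketch of a parse cache keyed on the raw env string.
let _cachedRaw;               // raw string seen at the last parse
let _customProvidersCache = [];

function getCustomProviders(env = process.env) {
  const raw = env.CUSTOM_PROVIDERS_JSON || '';
  // Cache hit: the env value has not changed since the last parse.
  if (raw === _cachedRaw) return _customProvidersCache;
  _cachedRaw = raw;
  try {
    const parsed = raw ? JSON.parse(raw) : [];
    // Assumption: the config is an array of provider objects.
    _customProvidersCache = Array.isArray(parsed) ? parsed : [];
  } catch {
    // Malformed JSON is treated as "no custom providers".
    _customProvidersCache = [];
  }
  return _customProvidersCache;
}
```

Comparing the raw string is cheap relative to JSON.parse(), and it keeps the cache correct if the variable changes while the process is running.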
Fix 1: callOpenAI — use buildOpenAIMessages() instead of inline builder
The old builder emitted only tool_result blocks, losing the assistant
tool_calls that must precede them. Multi-turn conversations where the
model invoked any tool were silently broken. Now uses the same full
message builder as callNvidia and callCustomProvider.
Fix 2: callOpenAI — add max_tokens to request body
Without max_tokens OpenAI uses the model default (4096 on older models)
instead of the configured limit. Also adds stream_options.include_usage
so token counts are available from streaming responses.
Fix 3: callAnthropic — honour ANTHROPIC_BASE_URL env var
The URL was hardcoded. Users with proxy setups or self-hosted Anthropic
endpoints (via ANTHROPIC_BASE_URL) were always hitting the public API.
Fix 4: detectProvider — cover all OpenAI reasoning model variants
Old code checked for 'o1' and 'o3' prefixes, missing o4-mini, o4, and
any future o5+ models. Replaced with /^o\d/ regex that matches any
single-letter 'o' followed by a digit.
Fix 5: callGoogle — functionResponse.name must be the function name
The Gemini API requires functionResponse.name to equal the original
functionCall.name. The code was passing the call ID instead (e.g.
'Read_1'), causing Gemini to reject or mis-route tool responses.
Now strips the trailing _<counter> suffix to recover the function name.
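A minimal sketch of the suffix stripping, assuming call IDs always have the form `<functionName>_<counter>` as in the 'Read_1' example. The helper name extractGoogleToolName appears in a later commit of this PR; the body shown here is an assumption, not the project's code.

```javascript
// Sketch: recover the function name from an ID such as 'Read_1'.
// Assumes the ID was built as `${functionName}_${counter}`.
function extractGoogleToolName(toolUseId) {
  // Remove only a trailing underscore followed by digits.
  return String(toolUseId).replace(/_\d+$/, '');
}
```

Anchoring the pattern with `$` keeps underscores inside the function name intact, so 'web_search_12' yields 'web_search'. An ID without a counter whose name itself ends in `_<digits>` would be shortened incorrectly, which is why the sketch depends on the stated ID format.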
Fix 6: convertOpenAIResponse — guard JSON.parse with try/catch
Malformed tool argument JSON from the API would throw uncaught and
crash the agent loop. Matches the existing defensive pattern in
convertNvidiaResponse.
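The guard could be sketched as follows. `parseToolArguments` is a hypothetical helper name, and falling back to an empty object is an assumption about the desired behaviour. The warning reports only the character count, in line with the privacy change described in the PR summary.

```javascript
// Hypothetical sketch of a defensive parser for tool-call arguments.
function parseToolArguments(raw) {
  if (raw == null || raw === '') return {};
  if (typeof raw !== 'string') return raw; // already an object
  try {
    return JSON.parse(raw);
  } catch {
    // Log the length only, never the content, to avoid leaking secrets or PII.
    console.warn(`[warn] malformed tool arguments (${raw.length} chars); using {}`);
    return {};
  }
}
```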
Fix 7: providers.mjs model catalogues updated
- Anthropic: add claude-opus/sonnet/haiku-4-5 and 4-6 series
- OpenAI: add gpt-4.1/mini/nano, o4-mini, o3, o1; remove stale names
- Google: replace placeholder gemini-3-flash-preview with real
gemini-2.5-pro/flash, gemini-2.0-flash/pro, gemini-1.5-pro/flash
- getProvider() in providers.mjs updated to use /^o\d/ like agent-loop
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
- Hoist ANTHROPIC_BASE_URL resolution to module-level (_anthropicBaseUrl) to avoid string ops on every API call
- Extract extractGoogleToolName() helper used by both callGoogle and the comment in accumulateGoogleStream to keep the ID format in sync
- Tighten provider-detection regex: /^o\d+(-|$)/ instead of /^o\d/, so that only names of the form o<digits> or o<digits>-<suffix> route to OpenAI
- Add diagnostic stderr logging to convertOpenAIResponse catch block so malformed tool arguments are visible instead of silently dropping
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
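The difference between the two patterns can be checked directly. In this sketch the constant and function names are hypothetical, and 'o3x-custom' is an invented model name used only to show a case where the patterns disagree.

```javascript
// The earlier and the tightened detection patterns, side by side.
const LOOSE_REASONING_RE = /^o\d/;        // 'o' followed by any digit
const STRICT_REASONING_RE = /^o\d+(-|$)/; // 'o', digits, then '-' or end of string

// Hypothetical helper using the tightened pattern.
function isOpenAIReasoningModel(model) {
  return STRICT_REASONING_RE.test(model);
}
```

Both patterns accept 'o1', 'o3', and 'o4-mini'. Only the loose one accepts a name such as 'o3x-custom', where the digits are followed by something other than a hyphen.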
- Improve callOpenAI comment to explain exactly what buildOpenAIMessages() does differently (tool_calls emission vs omission in old builder)
- Remove unreachable '|| toolUseId' fallback from extractGoogleToolName (String.replace always returns a string, fallback was dead code)
- Truncate raw tool arguments to 80 chars in convertOpenAIResponse warning to avoid accidentally logging secrets/PII from malformed API responses
- Clarify getProvider() comment: document future-proofing intent of /^o\d+(-|$)/ and note that o5/o6 will work without code changes
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2c4b5e69-4041-408f-9957-fb986bd2905d
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2ff987ae-24a8-439b-b29c-f352b639b597
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
…leakage
Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/dc6d5e28-0dc9-45ec-b4fb-25bc8faf11be
Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>
Copilot created this pull request from a session on behalf of codomium, May 12, 2026 17:01
Summary
All changes are in `v2/src/core/agent-loop.mjs`.
Fixes
- Gemini `maxOutputTokens`: adds `generationConfig.maxOutputTokens` to every Gemini request so responses aren't silently truncated at the model's ~8k conservative default.
- Gemini 2.5 thinking blocks: `streamGoogleResponse` and `convertGoogleResponse` now route `part.thought === true` parts as `thinking_delta` / thinking content blocks rather than leaking them into the visible assistant text.
- OpenAI stream usage events: `streamOpenAIResponse` correctly captures the trailing `{choices:[],usage:{...}}` chunk emitted by `stream_options:{include_usage:true}` and yields a `usage` event so token counts are accurately tracked.
- OpenRouter free-model compatibility: `callCustomProvider` no longer hard-codes `temperature: 1.0` and `top_p: 1.0`; those fields are now sent only when the provider config explicitly sets them, unblocking free OpenRouter models that reject explicit sampling params.
- SSE tail-buffer hardening: the leftover buffer flush in `streamOpenAIResponse` now handles all event types (error, usage, thinking, text, tool-calls, finish_reason) consistently with the main loop rather than only text content.
- Privacy (tool-argument error log): removed the 80-char argument preview from the parse-error warning; only the character count is logged to avoid accidentally exposing PII or secrets.
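To illustrate the usage-event fix, the sketch below maps one already-parsed streaming chunk to loop events, including the trailing usage-only chunk whose `choices` array is empty. The function name `eventsFromOpenAIChunk` and the `text_delta` / `finish` event shapes are hypothetical; only the `usage` event is named in the description above, and its field names here are assumptions.

```javascript
// Hypothetical sketch: convert one parsed streaming chunk into loop events.
function eventsFromOpenAIChunk(chunk) {
  const events = [];
  const choice = chunk.choices && chunk.choices[0];
  if (choice) {
    const delta = choice.delta || {};
    if (typeof delta.content === 'string' && delta.content.length > 0) {
      events.push({ type: 'text_delta', text: delta.content });
    }
    if (choice.finish_reason) {
      events.push({ type: 'finish', reason: choice.finish_reason });
    }
  }
  // With stream_options.include_usage, the last chunk carries an empty
  // choices array plus a usage object, so usage is checked independently.
  if (chunk.usage) {
    events.push({
      type: 'usage',
      inputTokens: chunk.usage.prompt_tokens ?? 0,
      outputTokens: chunk.usage.completion_tokens ?? 0,
    });
  }
  return events;
}
```

Routing both the main read loop and the leftover-buffer flush through one function like this is one way to keep the two paths consistent, which is the concern the tail-buffer item raises.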
Testing
All 996 existing tests pass (`npm test` in `v2/`). CodeQL scan found 0 alerts.