fix: improve multi-provider streaming robustness (Gemini, OpenAI-compatible, OpenRouter) by Copilot · Pull Request #32 · codomium/FreeCode

Copilot · 2026-05-12T17:00:43Z

Summary

All changes are in v2/src/core/agent-loop.mjs.

Fixes

Gemini maxOutputTokens — adds generationConfig.maxOutputTokens to every Gemini request so responses aren't silently truncated at the model's ~8k conservative default.
Gemini 2.5 thinking blocks — streamGoogleResponse and convertGoogleResponse now route part.thought === true parts as thinking_delta / thinking content blocks rather than leaking them into the visible assistant text.
OpenAI stream usage events — streamOpenAIResponse correctly captures the trailing {choices:[],usage:{...}} chunk emitted by stream_options:{include_usage:true} and yields a usage event so token counts are accurately tracked.
OpenRouter free-model compatibility — callCustomProvider no longer hard-codes temperature: 1.0 and top_p: 1.0; those fields are now sent only when the provider config explicitly sets them, unblocking free OpenRouter models that reject explicit sampling params.
SSE tail-buffer hardening — the leftover buffer flush in streamOpenAIResponse now handles all event types (error, usage, thinking, text, tool-calls, finish_reason) consistently with the main loop rather than only text content.
Privacy: tool-argument error log — removed the 80-char argument preview from the parse-error warning; only the character count is logged to avoid accidentally exposing PII or secrets.

Testing

All 996 existing tests pass (npm test in v2/). CodeQL scan found 0 alerts.

- agent-loop.mjs: run multiple tool calls from one LLM turn in parallel (Promise.all) instead of sequentially. Permission checks/hooks remain sequential (they may show UI dialogs). Tool execution, verifyWrites, and post-hooks all run concurrently; results are yielded in original order. Measured: two 200ms tools now complete in ~200ms instead of ~400ms. - chat.js: replace per-frame (60fps) requestAnimationFrame throttle with a 100ms setTimeout throttle for markdown re-renders during streaming. This caps CPU-intensive renderMarkdown() at ~10 calls/s, eliminating jank on long responses. The scroll is also batched into the same timeout callback to avoid a second layout pass. Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/af8f1e4f-7656-415a-bc7e-df5f883b20ae Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

…heck - agent-loop.mjs: add `anthropic-beta: prompt-caching-2024-07-31` header to Anthropic API calls so the cache_control blocks already present on the system prompt are actually honoured. Without this header Anthropic silently ignores cache_control, causing the full system prompt (~2440 tokens) to be re-processed from scratch on every turn. With it, subsequent turns in a session skip that processing entirely, cutting hundreds of ms off TTFT. - agent-loop.mjs: throttle detectRepetition to every 50 chars of new text instead of on every single streaming token. The function's inner loop does up to 27 indexOf() calls over 800 chars; calling it on every 1-3 char delta added unnecessary CPU work in the hot path. - registry.mjs: cache the tools.list() result and invalidate only when tools are registered/added. tools.list() was rebuilding a 30-element array from scratch on every API call (multiple times per user turn). Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/83d92820-acac-4cc5-9fb4-19a2ed53a813 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

…Bottom usage) Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/83d92820-acac-4cc5-9fb4-19a2ed53a813 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Six targeted fixes to cut latency on every agent turn: 1. CUSTOM_PROVIDERS_JSON parse cache (agent-loop.mjs) - findCustomProvider() called 2-3x per turn, each call did JSON.parse() - Now caches result in _customProvidersCache, re-parses only when env changes 2. Module-level RateLimiter instances (agent-loop.mjs) - new RateLimiter({...}) in callAnthropic/callOpenAI/callGoogle/callNvidia/callCustomProvider - Hoisted to _rlAnthropic/_rlOpenAI/_rlGoogle/_rlNvidia/_rlCustom; reset() on each call 3. OpenAI tool-definition format cache (agent-loop.mjs + registry.mjs) - toolDefs.map(t => ({type:'function',...})) rebuilt on every API call - toOpenAITools() caches by array identity; stable because tools.list() is cached - registry.listOpenAI() added as complementary cached accessor 4. shouldCompact gating (agent-loop.mjs) - Called unconditionally before every API call; addMessage() already runs it - Now gated to options.continuation=true only (tool-result continuation turns) 5. TurnClassifier.classify array allocation (turn-classifier.mjs) - messages.filter(m => m.role==='user') allocated intermediate array every turn - Replaced with reverse for-loop that short-circuits at first user message 6. TurnClassifier._extractText array allocation (turn-classifier.mjs) - filter+map+join chain allocated two intermediate arrays - Replaced with single for-loop with early concatenation Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/69f9b44c-9440-4906-b90e-569d41d94644 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Fix 1: callOpenAI — use buildOpenAIMessages() instead of inline builder The old builder emitted only tool_result blocks, losing the assistant tool_calls that must precede them. Multi-turn conversations where the model invoked any tool were silently broken. Now uses the same full message builder as callNvidia and callCustomProvider. Fix 2: callOpenAI — add max_tokens to request body Without max_tokens OpenAI uses the model default (4096 on older models) instead of the configured limit. Also adds stream_options.include_usage so token counts are available from streaming responses. Fix 3: callAnthropic — honour ANTHROPIC_BASE_URL env var The URL was hardcoded. Users with proxy setups or self-hosted Anthropic endpoints (via ANTHROPIC_BASE_URL) were always hitting the public API. Fix 4: detectProvider — cover all OpenAI reasoning model variants Old code checked for 'o1' and 'o3' prefixes, missing o4-mini, o4, and any future o5+ models. Replaced with /^o\d/ regex that matches any single-letter 'o' followed by a digit. Fix 5: callGoogle — functionResponse.name must be the function name The Gemini API requires functionResponse.name to equal the original functionCall.name. The code was passing the call ID instead (e.g. 'Read_1'), causing Gemini to reject or mis-route tool responses. Now strips the trailing _<counter> suffix to recover the function name. Fix 6: convertOpenAIResponse — guard JSON.parse with try/catch Malformed tool argument JSON from the API would throw uncaught and crash the agent loop. Matches the existing defensive pattern in convertNvidiaResponse. Fix 7: providers.mjs model catalogues updated - Anthropic: add claude-opus/sonnet/haiku-4-5 and 4-6 series - OpenAI: add gpt-4.1/mini/nano, o4-mini, o3, o1; remove stale names - Google: replace placeholder gemini-3-flash-preview with real gemini-2.5-pro/flash, gemini-2.0-flash/pro, gemini-1.5-pro/flash - getProvider() in providers.mjs updated to use /^o\d/ like agent-loop Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

- Hoist ANTHROPIC_BASE_URL resolution to module-level (_anthropicBaseUrl) to avoid string ops on every API call - Extract extractGoogleToolName() helper used by both callGoogle and the comment in accumulateGoogleStream to keep the ID format in sync - Tighten provider-detection regex: /^o\d+(-|$)/ instead of /^o\d/ to prevent false-positives like 'output-formatter' routing to OpenAI - Add diagnostic stderr logging to convertOpenAIResponse catch block so malformed tool arguments are visible instead of silently dropping Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

- Improve callOpenAI comment to explain exactly what buildOpenAIMessages() does differently (tool_calls emission vs omission in old builder) - Remove unreachable '|| toolUseId' fallback from extractGoogleToolName (String.replace always returns a string, fallback was dead code) - Truncate raw tool arguments to 80 chars in convertOpenAIResponse warning to avoid accidentally logging secrets/PII from malformed API responses - Clarify getProvider() comment: document future-proofing intent of /^o\d+(-|$)/ and note that o5/o6 will work without code changes Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/942b0273-37a6-41a9-91f1-8a22a3589726 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2c4b5e69-4041-408f-9957-fb986bd2905d Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2ff987ae-24a8-439b-b29c-f352b639b597 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

…leakage Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/dc6d5e28-0dc9-45ec-b4fb-25bc8faf11be Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Copilot AI and others added 10 commits May 12, 2026 12:31

fix: address code review issues (trailing repetition check + scrollTo…

91a2de6

…Bottom usage) Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/83d92820-acac-4cc5-9fb4-19a2ed53a813 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Changes before error encountered

9512614

Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2c4b5e69-4041-408f-9957-fb986bd2905d Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

fix: unblock OpenRouter free models and harden OpenAI SSE tail parsing

5f95509

Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/2ff987ae-24a8-439b-b29c-f352b639b597 Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

fix: remove tool-argument preview from error log to avoid PII/secret …

36e9b93

…leakage Agent-Logs-Url: https://github.com/codomium/FreeCode/sessions/dc6d5e28-0dc9-45ec-b4fb-25bc8faf11be Co-authored-by: codomium <255525663+codomium@users.noreply.github.com>

Copilot AI assigned Copilot and codomium May 12, 2026

Copilot created this pull request from a session on behalf of codomium May 12, 2026 17:01 View session

Copilot AI requested a review from codomium May 12, 2026 17:01

Copilot finished work on behalf of codomium May 12, 2026 17:01

codomium marked this pull request as ready for review May 12, 2026 17:04

codomium merged commit 9c3c817 into main May 12, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: improve multi-provider streaming robustness (Gemini, OpenAI-compatible, OpenRouter)#32

fix: improve multi-provider streaming robustness (Gemini, OpenAI-compatible, OpenRouter)#32
codomium merged 10 commits into
mainfrom
copilot/improve-agent-response-time

Copilot AI commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Copilot AI commented May 12, 2026

Summary

Fixes

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants