fix: raise Ollama transport timeouts (undici 300s default killed long 35B generations) #17
Merged
…ong 35B generations Diagnosis from the 2026-07-03 rerun: repeatable "transport failure: fetch failed" on the same cells, dying at ~301s with zero attempts recorded — undici's default headersTimeout (300s) aborting single long local generations. The retry pass reproduced it on the same cells, ruling out flakiness. OllamaAdapter now sends requests through a long-lived undici Agent with headers/body timeouts raised to one hour — a deliberate ceiling for local inference, not "no timeout": a truly hung server still fails rather than blocking a matrix forever. Injected test fetches ignore the dispatcher option; a test asserts it is passed. Anthropic adapter unaffected (the SDK manages its own timeouts). AC: 76 tests (75 + dispatcher-option test). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…gines floor (>=20) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pull request overview
This PR addresses repeatable `fetch failed` transport failures at ~301s during long local Ollama generations by routing `OllamaAdapter` requests through an undici `Agent` configured with much longer headers/body timeouts (1 hour). It also adds eval rerun matrices documenting the rerun setup.
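To tell this failure mode apart from other transport errors, a small check on the error's `cause` can help. The sketch below assumes Node's built-in fetch, which reports an undici headers timeout as a `TypeError` with message `fetch failed` whose `cause` carries the code `UND_ERR_HEADERS_TIMEOUT`. The helper name `isHeadersTimeout` is hypothetical and is not part of this PR.

```typescript
// Hypothetical diagnostic helper, not taken from this PR.
// Returns true when an error thrown by fetch looks like an undici headers timeout.
export function isHeadersTimeout(err: unknown): boolean {
  if (!(err instanceof Error)) return false;
  // Node's fetch wraps the underlying undici error in `cause`.
  const cause = (err as Error & { cause?: unknown }).cause;
  if (typeof cause !== "object" || cause === null) return false;
  return (cause as { code?: unknown }).code === "UND_ERR_HEADERS_TIMEOUT";
}

// Build an error shaped like the one described above, for illustration only.
export const sampleTimeout: Error = Object.assign(new TypeError("fetch failed"), {
  cause: { code: "UND_ERR_HEADERS_TIMEOUT" },
});
```

Logging the result of such a check next to the elapsed time would make a 300s cutoff easier to spot than the bare `fetch failed` message.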
Changes:
- Add a long-lived undici `Agent` dispatcher in `OllamaAdapter` with 1-hour `headersTimeout`/`bodyTimeout`.
- Add a unit test asserting the Ollama request carries the dispatcher option.
- Add `undici` as a dependency and add rerun matrix JSON configs for eval runs.
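A minimal sketch of the first change, assuming the `undici` package is installed. The names `ONE_HOUR_MS`, `longRunningAgent`, and `postToOllama` are hypothetical; the actual code in `src/adapters/ollama.ts` may be organized differently.

```typescript
import { Agent } from "undici";

// One hour in milliseconds: a ceiling for slow local generations, not "no timeout".
export const ONE_HOUR_MS = 60 * 60 * 1000;

// A single Agent created once at module load and reused for every request.
export const longRunningAgent = new Agent({
  headersTimeout: ONE_HOUR_MS, // max wait for response headers (undici default: 300s)
  bodyTimeout: ONE_HOUR_MS, // max idle time between body chunks
});

// Hypothetical helper: POST a JSON payload to a local Ollama URL.
// `dispatcher` is an undici extension to the fetch init object, so the
// object is cast to RequestInit to satisfy stricter type definitions.
export function postToOllama(url: string, payload: unknown): Promise<Response> {
  const init = {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
    dispatcher: longRunningAgent,
  };
  return fetch(url, init as RequestInit);
}
```

Keeping the Agent at module scope lets connections be pooled across calls, and a finite one-hour limit means a hung server still produces an error.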
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/adapters/ollama.ts | Adds an undici Agent dispatcher with raised transport timeouts and passes it via fetch init. |
| src/adapters/adapters.test.ts | Adds a test to ensure Ollama requests include the dispatcher option. |
| package.json | Adds undici dependency needed for the dispatcher/timeout configuration. |
| package-lock.json | Locks undici dependency resolution and records its engine requirements. |
| eval/matrix.rerun.json | Adds a documented rerun eval matrix (local + hosted) for the 2026-07-03 rerun. |
| eval/matrix.rerun-local.json | Adds a local-only rerun matrix configuration. |
| eval/matrix.rerun-hosted.json | Adds a hosted-only rerun matrix configuration. |
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Closes the last infrastructure hole the rerun exposed: repeatable `fetch failed` on the same cells, dying at ~301s with zero attempts recorded. The cause is undici's default `headersTimeout` (300s) aborting long local generations. The retry pass reproduced it on the same cells (p09/p11 long-thinking runs), ruling out flakiness; successful long runs all finished under ~300s per request.

Fix: `OllamaAdapter` sends requests through a long-lived undici `Agent` with headers/body timeouts raised to one hour. This is a deliberate ceiling for local inference, not "no timeout": a hung server still fails rather than blocking a matrix forever. `undici` is added as a dependency (adapter path only; the zero-network `core` boundary is untouched, and its test still passes). Injected test fetches ignore the option; a new test asserts it's passed.

On merge: I will delete the 3 remaining transport reports and run the final retry, which should leave qwen's column with zero infrastructure pollution.
AC: 76 tests green.
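The dispatcher-option test can be pictured roughly as follows. This is a sketch under stated assumptions, not the test from `src/adapters/adapters.test.ts`: `SketchOllamaAdapter`, `FetchLike`, the model name, and the use of Ollama's `/api/generate` endpoint on the default local port are all illustrative, and a plain object stands in for the undici `Agent` so the example needs no dependencies.

```typescript
// Hypothetical shapes for illustration; the real OllamaAdapter may differ.
type FetchInit = Record<string, unknown>;
type FetchLike = (url: string, init: FetchInit) => Promise<Response>;

class SketchOllamaAdapter {
  private readonly fetchImpl: FetchLike;
  private readonly dispatcher: object; // stands in for the undici Agent

  constructor(fetchImpl: FetchLike, dispatcher: object) {
    this.fetchImpl = fetchImpl;
    this.dispatcher = dispatcher;
  }

  generate(prompt: string): Promise<Response> {
    return this.fetchImpl("http://localhost:11434/api/generate", {
      method: "POST",
      body: JSON.stringify({ model: "example-model", prompt, stream: false }),
      dispatcher: this.dispatcher,
    });
  }
}

// Recording fake: it never touches the network and makes no use of the
// dispatcher, but it keeps every init object so the test can inspect it.
const recordedInits: FetchInit[] = [];
const fakeFetch: FetchLike = async (_url, init) => {
  recordedInits.push(init);
  return new Response(JSON.stringify({ response: "ok" }));
};

const sentinelDispatcher = { name: "sentinel" };
const adapter = new SketchOllamaAdapter(fakeFetch, sentinelDispatcher);
void adapter.generate("hello");

// The assertion: the request must carry the exact dispatcher instance.
if (recordedInits[0]?.dispatcher !== sentinelDispatcher) {
  throw new Error("dispatcher option was not passed to fetch");
}
```

Comparing by identity rather than by shape would catch a regression where the adapter builds a fresh default agent instead of reusing the configured one.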
🤖 Generated with Claude Code