Problem (one or two sentences)
Requests to the LM Studio provider fail after ~5 minutes when the model takes a long time to start responding, even though apiRequestTimeout is set to 600 seconds. This is caused by Undici’s default headersTimeout (300s) triggering before the configured timeout.
Context (who is affected and when)
This affects users running large models (e.g., Qwen3.6 27B) on slower hardware (e.g., RTX 2080 Ti), especially when using long prompts or large context windows. In these cases, time-to-first-byte can exceed 5 minutes, causing requests to fail prematurely.
Reproduction steps
- Run LM Studio locally (default http://localhost:1234)
- Load a large model (e.g., Qwen3.6 27B Q4_K_M)
- Configure Roo Code to use the LM Studio provider
- Send a request with a long prompt or large context that requires >300 seconds before response starts
- Observe that the request fails at ~300 seconds
Expected result
The request should respect apiRequestTimeout (default 600 seconds) and continue waiting for the response without failing.
Actual result
The request fails at approximately 300 seconds with a timeout error (e.g., HeadersTimeoutError), and Roo Code retries the request, resulting in repeated disconnect loops.
Variations tried (optional)
- Increased apiRequestTimeout to 600 seconds → no effect
- Investigated the behavior with Qwen3.6-Plus (API) → the ~300s failure time matches Undici's 300s default headersTimeout
- Issue persists across different prompts and model settings when inference is slow
App Version
Roo Code: v3.53.0
API Provider (optional)
LM Studio
Model Used (optional)
Qwen3.6-27b-Q4_K_M
Roo Code Task Links (optional)
No response
Relevant logs or errors (optional)
Further investigation using Qwen3.6-Plus (API) indicates that this issue is likely caused by Undici’s default headersTimeout (300 seconds), which overrides the configured apiRequestTimeout of 600 seconds.
The following analysis was traced and generated with Qwen3.6-Plus.
## LM Studio Provider 5-Minute Timeout Bug Report
### Bug Description
When using the LM Studio provider with a model that takes more than 300 seconds (5 minutes) to produce the first token (e.g., long prompt, large context, speculative decoding, or slow inference hardware), the request fails at approximately 300,000ms — even though the configured `apiRequestTimeout` defaults to 600 seconds.
This is caused by **Undici's default `headersTimeout` of 300 seconds**, which fires before the OpenAI SDK's `timeout` option (set to 600s by `getApiRequestTimeout()`) can take effect.
### Affected Files
- `src/api/providers/lm-studio.ts` (lines 33–37)
- `src/utils/networkProxy.ts` (lines 279–288)
- `src/api/providers/utils/timeout-config.ts`
### Root Cause
#### Timeout Conflict
| Configuration | Value | Layer |
|---|---|---|
| `getApiRequestTimeout()` | Default 600s | OpenAI SDK `timeout` option |
| Undici `headersTimeout` (default) | **300s** | Underlying HTTP connection |
The OpenAI SDK's `timeout` option controls the SDK-level request timeout, but it **does not override** the underlying Undici dispatcher's `headersTimeout`. When waiting for the response headers (e.g., during Time-To-First-Token > 300s), Undici throws a `HeadersTimeoutError` at ~300,000ms regardless of the SDK's configured timeout.
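This conflict can be demonstrated outside Roo Code with a small standalone script. The sketch below is hypothetical (the endpoint URL and request body are placeholders, and `headersTimeout` is shortened to 5 seconds so the failure shows up quickly); it assumes Node 18+ with `undici` installed:

```typescript
// Standalone sketch: undici's headersTimeout fires on its own schedule,
// regardless of any outer abort-based timeout.
import { Agent, fetch } from "undici"

const dispatcher = new Agent({ headersTimeout: 5_000 }) // stock default is 300_000

async function main() {
  try {
    // Placeholder endpoint that takes longer than 5s to send response headers.
    await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model: "any", messages: [] }),
      dispatcher,
      // Analogous to the SDK's `timeout` option: a generous outer limit.
      signal: AbortSignal.timeout(600_000),
    })
  } catch (err) {
    // Rejects at ~5s with "TypeError: fetch failed"; its `cause` is the
    // HeadersTimeoutError (UND_ERR_HEADERS_TIMEOUT). The 600s signal never fires.
    console.error(err, (err as Error).cause)
  }
}

main()
```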
#### Code Analysis
**`lm-studio.ts` (L33–37):**
```typescript
this.client = new OpenAI({
  baseURL: (this.options.lmStudioBaseUrl || "http://localhost:1234") + "/v1",
  apiKey: "noop",
  timeout: getApiRequestTimeout(), // 600,000ms by default
  // No custom fetch/dispatcher configured — uses Node.js native fetch
  // (powered by internal undici), which has headersTimeout=300s by default
})
```
**`networkProxy.ts` (L279–288) — when proxy is enabled in debug mode:**
```typescript
const proxyAgent = new ProxyAgent({
  uri: config.serverUrl,
  requestTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  proxyTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  // No headersTimeout or bodyTimeout configured — uses Undici default of 300s
})
setGlobalDispatcher(proxyAgent)
```
#### Call Chain
```text
lm-studio.ts
  → this.client.chat.completions.create(params)   // OpenAI SDK v5.12.2
    → Node.js native fetch (undici under the hood)
      → Undici default dispatcher
        → headersTimeout = 300s → throws HeadersTimeoutError at ~300,000ms
```
#### Why This Affects LM Studio Specifically
LM Studio runs locally and may use large models on consumer hardware (e.g., quantized 70B models on a single GPU). Inference for long prompts or large context windows can legitimately take >5 minutes before the first token is streamed. The 300s undici `headersTimeout` is not configurable via the OpenAI SDK's `timeout` option.
### Reproduction Steps
1. Load a large model in LM Studio (e.g., a heavily quantized 70B+ model)
2. Set `lmStudioBaseUrl` to the LM Studio endpoint (default: `http://localhost:1234`)
3. Send a request with a very long prompt or a large context that requires >300s to produce the first token
4. Observe the request fails at approximately 300,000ms (5 minutes)
**Expected:** The request should respect `apiRequestTimeout` (default 600s) and complete successfully.
**Actual:** The request fails at ~300,000ms with a `HeadersTimeoutError` or similar timeout error from Undici.
### Suggested Fix
In `lm-studio.ts`, create a custom Undici `Agent` with `headersTimeout` and `bodyTimeout` values that exceed `getApiRequestTimeout()`, and pass it via the OpenAI SDK's `fetch` option:
```typescript
import { Agent, fetch as undiciFetch } from "undici"
import { getApiRequestTimeout } from "./utils/timeout-config"

const apiTimeout = getApiRequestTimeout() ?? 600_000
// Add a 60s buffer to ensure undici timeout > SDK timeout
const undiciTimeout = apiTimeout + 60_000

const dispatcher = new Agent({
  headersTimeout: undiciTimeout,
  bodyTimeout: undiciTimeout,
})

const customFetch = (url: string | URL, init?: RequestInit) => {
  return undiciFetch(url, { ...init, dispatcher })
}

this.client = new OpenAI({
  baseURL: (this.options.lmStudioBaseUrl || "http://localhost:1234") + "/v1",
  apiKey: "noop",
  timeout: apiTimeout,
  fetch: customFetch,
})
```
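A note on the design: the 60-second buffer ensures `undiciTimeout > apiTimeout`, so the OpenAI SDK's own timeout should fire first and surface through its normal error path, with the undici timeouts acting only as a backstop. Depending on the project's TypeScript configuration, the `fetch` option may also need a type cast, since undici's `RequestInit` (which carries the `dispatcher` field) differs from the DOM `RequestInit` type.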
Additionally, in `networkProxy.ts`, the `ProxyAgent` should be created with `headersTimeout` and `bodyTimeout` values that respect `getApiRequestTimeout()`:
```typescript
const apiTimeout = getApiRequestTimeout() ?? 600_000
const undiciTimeout = apiTimeout + 60_000

const proxyAgent = new ProxyAgent({
  uri: config.serverUrl,
  headersTimeout: undiciTimeout,
  bodyTimeout: undiciTimeout,
  requestTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  proxyTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
})
```
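Either change can be sanity-checked against a hypothetical local harness (not part of the repo): a mock server that stalls before writing response headers. With the stock dispatcher the request should fail at ~300s; with the patched one it should keep waiting:

```typescript
// Hypothetical test harness: a mock "LM Studio" that delays its response
// headers past undici's 300s default.
import { createServer } from "node:http"

const DELAY_MS = 310_000 // just past the 300s default headersTimeout

const server = createServer((_req, res) => {
  // Headers are only written after DELAY_MS, simulating a long time-to-first-token.
  setTimeout(() => {
    res.writeHead(200, { "content-type": "application/json" })
    res.end(JSON.stringify({ choices: [{ message: { content: "ok" } }] }))
  }, DELAY_MS)
})

// Disable Node's own inbound request timeout so the server itself
// doesn't cut the connection during the test.
server.requestTimeout = 0

server.listen(1234, () => {
  console.log("Slow mock server listening on http://localhost:1234")
})
```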
### Environment
- **Roo Code version:** Current (commit ad25634)
- **OpenAI SDK version:** 5.12.2
- **Node.js version:** 18+ (uses native fetch powered by undici)
- **OS:** All (Windows 11, macOS, Linux)
- **LM Studio:** Any version running locally
### Impact
Users running LM Studio with large models on consumer hardware cannot complete requests that take >5 minutes, even though the `apiRequestTimeout` setting defaults to 600 seconds. This is a functional bug that prevents legitimate long-running inference requests from completing.