Problem (one or two sentences)
Requests to the LM Studio provider fail after ~5 minutes when the model takes a long time to start responding, even though apiRequestTimeout is set to 600 seconds. This is caused by Undici’s default headersTimeout (300s) triggering before the configured timeout.
Context (who is affected and when)
This affects users running large models (e.g., Qwen3.6 27B) on slower hardware (e.g., RTX 2080 Ti), especially when using long prompts or large context windows. In these cases, time-to-first-byte can exceed 5 minutes, causing requests to fail prematurely.
Reproduction steps
- Run LM Studio locally (default http://localhost:1234)
- Load a large model (e.g., Qwen3.6 27B Q4_K_M)
- Configure Roo Code to use the LM Studio provider
- Send a request with a long prompt or large context that requires >300 seconds before response starts
- Observe that the request fails at ~300 seconds
Expected result
The request should respect apiRequestTimeout (default 600 seconds) and continue waiting for the response without failing.
Actual result
The request fails at approximately 300 seconds with a timeout error (e.g., HeadersTimeoutError), and Roo Code retries the request, resulting in repeated disconnect loops.
Variations tried (optional)
- Increased apiRequestTimeout to 600 seconds → no effect
- Investigated the behavior with Qwen3.6-Plus (API) → the ~300s failure time matches Undici's 300s default headersTimeout
- Issue persists across different prompts and model settings when inference is slow
App Version
Roo Code: v3.53.0
API Provider (optional)
LM Studio
Model Used (optional)
Qwen3.6-27b-Q4_K_M
Roo Code Task Links (optional)
No response
Relevant logs or errors (optional)
Further investigation using Qwen3.6-Plus (API) indicates that this issue is likely caused by Undici’s default headersTimeout (300 seconds), which overrides the configured apiRequestTimeout of 600 seconds.
The following analysis was traced and generated with Qwen3.6-Plus.
## LM Studio Provider 5-Minute Timeout Bug Report
### Bug Description
When using the LM Studio provider with a model that takes more than 300 seconds (5 minutes) to produce the first token (e.g., long prompt, large context, speculative decoding, or slow inference hardware), the request fails at approximately 300,000ms — even though the configured `apiRequestTimeout` defaults to 600 seconds.
This is caused by **Undici's default `headersTimeout` of 300 seconds**, which fires before the OpenAI SDK's `timeout` option (set to 600s by `getApiRequestTimeout()`) can take effect.
### Affected Files
- `src/api/providers/lm-studio.ts` (lines 33–37)
- `src/utils/networkProxy.ts` (lines 279–288)
- `src/api/providers/utils/timeout-config.ts`
### Root Cause
#### Timeout Conflict
| Configuration | Value | Layer |
|---|---|---|
| `getApiRequestTimeout()` | Default 600s | OpenAI SDK `timeout` option |
| Undici `headersTimeout` (default) | **300s** | Underlying HTTP connection |
The OpenAI SDK's `timeout` option controls the SDK-level request timeout, but it **does not override** the underlying Undici dispatcher's `headersTimeout`. When waiting for the response headers (e.g., during Time-To-First-Token > 300s), Undici throws a `HeadersTimeoutError` at ~300,000ms regardless of the SDK's configured timeout.
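This conflict can be demonstrated outside Roo Code with a small standalone script. The sketch below is hypothetical (the endpoint URL and request body are placeholders, and `headersTimeout` is shortened to 5 seconds so the failure shows up quickly); it assumes Node 18+ with `undici` installed:

```typescript
// Standalone sketch: undici's headersTimeout fires on its own schedule,
// regardless of any outer abort-based timeout.
import { Agent, fetch } from "undici"

const dispatcher = new Agent({ headersTimeout: 5_000 }) // stock default is 300_000

async function main() {
  try {
    // Placeholder endpoint that takes longer than 5s to send response headers.
    await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ model: "any", messages: [] }),
      dispatcher,
      // Analogous to the SDK's `timeout` option: a generous outer limit.
      signal: AbortSignal.timeout(600_000),
    })
  } catch (err) {
    // Rejects at ~5s with "TypeError: fetch failed"; its `cause` is the
    // HeadersTimeoutError (UND_ERR_HEADERS_TIMEOUT). The 600s signal never fires.
    console.error(err, (err as Error).cause)
  }
}

main()
```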
#### Code Analysis
**`lm-studio.ts` (L33–37):**
```typescript
this.client = new OpenAI({
  baseURL: (this.options.lmStudioBaseUrl || "http://localhost:1234") + "/v1",
  apiKey: "noop",
  timeout: getApiRequestTimeout(), // 600,000ms by default
  // No custom fetch/dispatcher configured — uses Node.js native fetch
  // (powered by internal undici), which has headersTimeout=300s by default
})
```
**`networkProxy.ts` (L279–288) — when proxy is enabled in debug mode:**
```typescript
const proxyAgent = new ProxyAgent({
  uri: config.serverUrl,
  requestTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  proxyTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  // No headersTimeout or bodyTimeout configured — uses Undici default of 300s
})
setGlobalDispatcher(proxyAgent)
```
#### Call Chain
```text
lm-studio.ts
  → this.client.chat.completions.create(params)   // OpenAI SDK v5.12.2
    → Node.js native fetch (undici under the hood)
      → Undici default dispatcher
        → headersTimeout = 300s → throws HeadersTimeoutError at ~300,000ms
```
#### Why This Affects LM Studio Specifically
LM Studio runs locally and may use large models on consumer hardware (e.g., quantized 70B models on a single GPU). Inference for long prompts or large context windows can legitimately take >5 minutes before the first token is streamed. The 300s undici `headersTimeout` is not configurable via the OpenAI SDK's `timeout` option.
### Reproduction Steps
1. Load a large model in LM Studio (e.g., a heavily quantized 70B+ model)
2. Set `lmStudioBaseUrl` to the LM Studio endpoint (default: `http://localhost:1234`)
3. Send a request with a very long prompt or a large context that requires >300s to produce the first token
4. Observe the request fails at approximately 300,000ms (5 minutes)
**Expected:** The request should respect `apiRequestTimeout` (default 600s) and complete successfully.
**Actual:** The request fails at ~300,000ms with a `HeadersTimeoutError` or similar timeout error from Undici.
### Suggested Fix
In `lm-studio.ts`, create a custom Undici `Agent` with `headersTimeout` and `bodyTimeout` values that exceed `getApiRequestTimeout()`, and pass it via the OpenAI SDK's `fetch` option:
```typescript
import { Agent, fetch as undiciFetch } from "undici"
import { getApiRequestTimeout } from "./utils/timeout-config"

const apiTimeout = getApiRequestTimeout() ?? 600_000
// Add a 60s buffer to ensure undici timeout > SDK timeout
const undiciTimeout = apiTimeout + 60_000

const dispatcher = new Agent({
  headersTimeout: undiciTimeout,
  bodyTimeout: undiciTimeout,
})

const customFetch = (url: string | URL, init?: RequestInit) => {
  return undiciFetch(url, { ...init, dispatcher })
}

this.client = new OpenAI({
  baseURL: (this.options.lmStudioBaseUrl || "http://localhost:1234") + "/v1",
  apiKey: "noop",
  timeout: apiTimeout,
  fetch: customFetch,
})
```
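A note on the design: the 60-second buffer ensures `undiciTimeout > apiTimeout`, so the OpenAI SDK's own timeout should fire first and surface through its normal error path, with the undici timeouts acting only as a backstop. Depending on the project's TypeScript configuration, the `fetch` option may also need a type cast, since undici's `RequestInit` (which carries the `dispatcher` field) differs from the DOM `RequestInit` type.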
Additionally, in `networkProxy.ts`, the `ProxyAgent` should be created with `headersTimeout` and `bodyTimeout` values that respect `getApiRequestTimeout()`:
```typescript
const apiTimeout = getApiRequestTimeout() ?? 600_000
const undiciTimeout = apiTimeout + 60_000

const proxyAgent = new ProxyAgent({
  uri: config.serverUrl,
  headersTimeout: undiciTimeout,
  bodyTimeout: undiciTimeout,
  requestTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
  proxyTls: config.tlsInsecure ? { rejectUnauthorized: false } : undefined,
})
```
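Either change can be sanity-checked against a hypothetical local harness (not part of the repo): a mock server that stalls before writing response headers. With the stock dispatcher the request should fail at ~300s; with the patched one it should keep waiting:

```typescript
// Hypothetical test harness: a mock "LM Studio" that delays its response
// headers past undici's 300s default.
import { createServer } from "node:http"

const DELAY_MS = 310_000 // just past the 300s default headersTimeout

const server = createServer((_req, res) => {
  // Headers are only written after DELAY_MS, simulating a long time-to-first-token.
  setTimeout(() => {
    res.writeHead(200, { "content-type": "application/json" })
    res.end(JSON.stringify({ choices: [{ message: { content: "ok" } }] }))
  }, DELAY_MS)
})

// Disable Node's own inbound request timeout so the server itself
// doesn't cut the connection during the test.
server.requestTimeout = 0

server.listen(1234, () => {
  console.log("Slow mock server listening on http://localhost:1234")
})
```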
### Environment
- **Roo Code version:** Current (commit ad25634)
- **OpenAI SDK version:** 5.12.2
- **Node.js version:** 18+ (uses native fetch powered by undici)
- **OS:** All (Windows 11, macOS, Linux)
- **LM Studio:** Any version running locally
### Impact
Users running LM Studio with large models on consumer hardware cannot complete requests that take >5 minutes, even though the `apiRequestTimeout` setting defaults to 600 seconds. This is a functional bug that prevents legitimate long-running inference requests from completing.