Problem
ChatGPT subscription streams can hang indefinitely without timing out, leaving users with frozen connections and no feedback.
Symptoms
- LLM response streams from ChatGPT subscription endpoints stall permanently with no data
- No timeout mechanism detects or recovers from hung connections
- Users experience completely frozen sessions with no error message or recovery path
Root Cause
- Broken provider request timeout: The existing timeout mechanism used AbortSignal.timeout(), which does not work correctly in Bun's runtime. Provider HTTP requests had no effective timeout, allowing connections to hang indefinitely.
- No stream-level watchdog: The session processor lacked any idle/first-byte timeout on LLM streams. Once a stream stopped producing chunks (or never started), there was no detection or recovery.
- Incomplete retry classification: Transient server errors (429, 502, 503, 504) and common ChatGPT error shapes were not always classified as retryable, causing permanent failures on recoverable conditions.
- Missing abort signal propagation: Token refresh calls in the Codex auth plugin did not propagate the caller's abort signal, so cancellation during auth refresh leaked hanging requests.
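To make the first and third root causes concrete, here is a minimal sketch of a manual timeout signal and a retry classifier. It is an illustration under stated assumptions, not the code in this PR: timeoutSignal and isRetryable are hypothetical names, and the nested cause / error fields are an assumed error shape rather than the actual ChatGPT payload format.

```typescript
// Illustrative sketch only: helper names and error shapes are assumptions.
const RETRYABLE_STATUS = new Set([429, 502, 503, 504]);

// Manual stand-in for AbortSignal.timeout(): a timer that aborts a controller.
function timeoutSignal(ms: number): { signal: AbortSignal; cancel: () => void } {
  const controller = new AbortController();
  const timer = setTimeout(() => {
    controller.abort(new DOMException(`request timed out after ${ms}ms`, "TimeoutError"));
  }, ms);
  // Call cancel() once the request settles so the timer does not linger.
  return { signal: controller.signal, cancel: () => clearTimeout(timer) };
}

// Walk nested `cause` / `error` fields (assumed shape) so a wrapped
// status code or timeout is still recognized as retryable.
function isRetryable(err: unknown, depth = 0): boolean {
  if (err === null || typeof err !== "object" || depth > 5) return false;
  const e = err as { status?: unknown; name?: unknown; cause?: unknown; error?: unknown };
  if (typeof e.status === "number" && RETRYABLE_STATUS.has(e.status)) return true;
  if (e.name === "TimeoutError") return true;
  return isRetryable(e.cause, depth + 1) || isRetryable(e.error, depth + 1);
}
```

A caller would pass signal to fetch(url, { signal }) and call cancel() in a finally block. The depth cap is a defensive choice in this sketch to avoid following cyclic error chains.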
Impact
- Users on ChatGPT subscription plans experience frozen/hung sessions with no feedback
- No automatic recovery from transient server-side stalls
- Subagent tasks can run indefinitely without timeout bounds
Fix Approach
Repair the stream watchdog and timeout infrastructure across multiple layers:
- Replace broken AbortSignal.timeout() with manual setTimeout + AbortController for provider request timeouts (5 min default)
- Add a stream-level watchdog with separate first-byte (30s) and idle (120s) timeouts that wraps LLM streams
- Extend retry classification to recognize transient server errors (429/502/503/504) and deeply-nested ChatGPT error patterns
- Convert DOMException(TimeoutError) into retryable APIError for proper recovery
- Propagate abort signals through Codex OAuth token exchange and refresh paths
- Add configurable subagent timeout with input validation
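The stream-level watchdog described above could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: withWatchdog, StreamTimeoutError, and the options object are hypothetical names, and the stream is modeled as a plain AsyncIterable of chunks.

```typescript
// Hypothetical error type carrying which timeout phase fired.
class StreamTimeoutError extends Error {
  readonly phase: "first-byte" | "idle";
  constructor(phase: "first-byte" | "idle", ms: number) {
    super(`stream ${phase} timeout after ${ms}ms`);
    this.name = "StreamTimeoutError";
    this.phase = phase;
  }
}

// Re-yields chunks from `source`, racing every read against a timer.
// The first read uses firstByteMs; every later read uses idleMs.
async function* withWatchdog<T>(
  source: AsyncIterable<T>,
  opts: { firstByteMs: number; idleMs: number; controller?: AbortController },
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  let phase: "first-byte" | "idle" = "first-byte";
  let limit = opts.firstByteMs;
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new StreamTimeoutError(phase, limit)), limit);
      });
      // Whichever settles first wins; the timer is always cleared afterwards.
      const result = await Promise.race([it.next(), timeout]).finally(() => clearTimeout(timer));
      if (result.done) return;
      yield result.value;
      phase = "idle";
      limit = opts.idleMs;
    }
  } catch (err) {
    // On a watchdog timeout, abort the underlying request if a controller was given.
    if (err instanceof StreamTimeoutError) opts.controller?.abort(err);
    throw err;
  } finally {
    // Best-effort cleanup; not awaited because a hung source may never settle.
    it.return?.().catch(() => {});
  }
}
```

With the values from this description, a caller would use something like withWatchdog(stream, { firstByteMs: 30_000, idleMs: 120_000, controller }). Keeping the two phases separate lets a slow-to-start response fail fast while still tolerating longer pauses mid-stream.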