
fix: retry on timeout errors instead of failing #13502

Open
morgaesis wants to merge 1 commit into anomalyco:dev from morgaesis:fix/timeout-retry

@morgaesis

Fixes #13138

Problem

When a provider request times out (default 5 min), the request aborted by AbortSignal.timeout() rejects with a DOMException whose name is 'TimeoutError'. fromError() did not handle this error, so it fell through to the Unknown error type, which is not retryable. Sessions stopped immediately instead of retrying.

Solution

Handle the TimeoutError DOMException in fromError() and convert it to an APIError with isRetryable: true. This lets the existing retry mechanism retry the request automatically with exponential backoff.
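A minimal TypeScript sketch of the change described above. The `ClassifiedError` shape and this `fromError()` are simplified stand-ins assumed for illustration; opencode's real error types and `fromError()` carry more fields and cases. The key step is recognizing the `TimeoutError` name before the fall-through to `Unknown`.

```typescript
// Hypothetical, simplified shapes: opencode's real error types carry more
// fields; only what matters for this fix is modeled here.
type ClassifiedError =
  | { name: "APIError"; message: string; isRetryable: boolean }
  | { name: "Unknown"; message: string }

// A fetch aborted by AbortSignal.timeout() rejects with a DOMException named
// "TimeoutError" (a manual controller.abort() yields "AbortError" instead).
// Checking the name structurally avoids depending on DOMException typings.
function isTimeoutError(e: unknown): boolean {
  return typeof e === "object" && e !== null && (e as { name?: unknown }).name === "TimeoutError"
}

function fromError(e: unknown): ClassifiedError {
  if (isTimeoutError(e)) {
    // Treat a provider timeout like a transient API failure so the caller retries.
    const message = String((e as { message?: unknown }).message ?? "Request timed out")
    return { name: "APIError", message, isRetryable: true }
  }
  // ...provider-specific API error cases would be handled here...
  const message = e instanceof Error ? e.message : String(e)
  return { name: "Unknown", message }
}
```

Matching on `name` rather than `instanceof DOMException` is a choice made in this sketch; it also accepts timeout errors created in another realm.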

Testing

Verified the fix handles timeout errors correctly. The session will now retry on provider timeouts instead of failing immediately.


Related issues/PRs (not duplicates):

@github-actions
Contributor

The following comment was made by an LLM, it may be inaccurate:

No duplicate PRs found.

AbortSignal.timeout() throws DOMException with name 'TimeoutError',
which was not handled in fromError(). This caused timeout errors to
become Unknown errors (not retryable), stopping the session instead
of retrying.

Now timeout errors are converted to APIError with isRetryable: true,
enabling automatic retry with exponential backoff.

Fixes anomalyco#13138

Signed-off-by: Mörgæsis <morgaesis+git@morgaes.is>
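The "automatic retry with exponential backoff" that the commit message relies on can be pictured with the sketch below. The constants, `retryDelay`, and `withRetry` are hypothetical and are not opencode's retry code; they only show how a retryable flag gates retries and how delays double up to a cap.

```typescript
// Hypothetical defaults: opencode's actual retry constants may differ.
const INITIAL_DELAY_MS = 2_000
const BACKOFF_FACTOR = 2
const MAX_DELAY_MS = 30_000

// Delay before retry number `attempt` (1-based): 2s, 4s, 8s, ... capped at 30s.
function retryDelay(attempt: number): number {
  return Math.min(INITIAL_DELAY_MS * BACKOFF_FACTOR ** (attempt - 1), MAX_DELAY_MS)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Re-runs `run` while `isRetryable` reports the failure as transient,
// giving up after `maxAttempts` total attempts.
async function withRetry<T>(
  run: () => Promise<T>,
  isRetryable: (e: unknown) => boolean,
  maxAttempts = 5,
  delay: (attempt: number) => number = retryDelay,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run()
    } catch (e) {
      // Non-retryable errors (e.g. Unknown) and exhausted budgets are rethrown.
      if (attempt >= maxAttempts || !isRetryable(e)) throw e
      await sleep(delay(attempt))
    }
  }
}
```

In this sketch, `isRetryable` would read the flag set by `fromError()`, so a timeout is retried while an `Unknown` error is rethrown immediately.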
@rekram1-node
Collaborator

Hm, it shouldn't time out on its own. Are you passing a custom fetch somewhere?

@morgaesis
Author

Huh, yeah, I had forgotten I had set the timeout option for the provider...

Why does it matter that I set a custom timeout? Shouldn't opencode be resilient to whatever the timeout is?
IIRC, I set the timeout because of new, slow models like GLM-5, whose response times are unpredictable and can be very long. In some cases they are indistinguishable from a request that has hung for hours.

{
  "openrouter": {
    "options": {
      "timeout": 300000, // 5 minutes
    },
    "models": {
      "openrouter/auto": {
        "reasoning": true,
        "limit": {
          "output": 4000,
          "context": 128000,
        },
        "options": {
          "reasoningEffort": "high",
        },
      },
    },
  },
}

@morgaesis
Author

@rekram1-node, here is a concrete example of when this becomes an issue. Today I closed my laptop after working for a while, and when I reopened it, the network connection had to come back up. That network hiccup caused the model request to hang practically forever. Without this timeout, opencode keeps waiting for a network call that will never return.

With the timeout, I at least get a failure instead of hoping and praying that the job ever continues.
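The timeout's role here is to turn an indefinite hang into an ordinary error that a retry path can act on. A minimal sketch, assuming a runtime with `AbortSignal.timeout()` (Node.js 18+, Deno, or a modern browser); `withDeadline` is a hypothetical helper for illustration, not opencode code. A real provider call would pass the signal straight to `fetch()`, which rejects with the same `TimeoutError`.

```typescript
// Hypothetical helper: rejects with the signal's TimeoutError if `work`
// has not settled within `ms` milliseconds; otherwise passes its result through.
function withDeadline<T>(work: Promise<T>, ms: number): Promise<T> {
  const signal = AbortSignal.timeout(ms)
  return new Promise<T>((resolve, reject) => {
    // signal.reason is a DOMException named "TimeoutError" once the timer fires.
    const onAbort = () => reject(signal.reason)
    signal.addEventListener("abort", onAbort, { once: true })
    work.then(
      (value) => {
        signal.removeEventListener("abort", onAbort)
        resolve(value)
      },
      (err) => {
        signal.removeEventListener("abort", onAbort)
        reject(err)
      },
    )
  })
}
```

For example, a request stalled by a dropped connection (a promise that never settles) wrapped as `withDeadline(stalled, 300_000)` rejects after five minutes instead of waiting forever.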

Development

Successfully merging this pull request may close these issues.

Long running opencode instance into timeout error

2 participants