[Bug] LLM call fails with ReadTimeout after 3 retry attempts #44

**Tags:** `bug`, `api`, `performance`
**Quality Rating:** ⭐ 7/10

---

**Reporter:** 董江涵

## Description
When the agent processes user requests that involve LLM API calls, the system intermittently fails with the error:

```
[LLM call error] ReadTimeout: Connection failed after 3 attempts
```

This error has occurred multiple times during normal conversation flows, and when it does, the agent fails to respond at all. The issue appears to be related to the timeout configuration and retry strategy used for LLM API calls.

## Steps to Reproduce
1. Interact with an agent on the Clawith platform in a normal conversation
2. Send a message that requires the agent to generate a response via LLM
3. Under certain conditions (possibly high server load, large context, or network instability), the LLM call times out
4. The call is attempted 3 times, and after the last failure the raw error is surfaced to the user

## Expected Behavior
- The system should have a sufficiently long timeout for LLM API calls (especially for complex/long-context requests)
- The retry mechanism should use exponential backoff (e.g., 2s → 4s → 8s) rather than immediate retries
- If all retries fail, the user should see a friendly error message (e.g., "The service is temporarily busy, please try again") instead of the raw internal error `[LLM call error] ReadTimeout: Connection failed after 3 attempts`
- Consider supporting streaming responses to reduce perceived latency and avoid timeout issues
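The retry and fallback behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Clawith's actual code: `call_with_backoff`, `LLMUnavailableError`, and `FRIENDLY_MESSAGE` are hypothetical names, and the builtin `TimeoutError` stands in for whatever timeout exception the platform's HTTP client raises (for example `httpx.ReadTimeout`, if the backend happens to use httpx; that class does not subclass `TimeoutError`, so it would need to be caught explicitly). The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Friendly text taken from the expected behavior above.
FRIENDLY_MESSAGE = "The service is temporarily busy, please try again."


class LLMUnavailableError(Exception):
    """Hypothetical user-facing error raised once all retries are exhausted."""


def call_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on timeouts with delays of base_delay * 2**n.

    With the defaults this makes up to 4 attempts, waiting 2s, 4s, then 8s
    between them. TimeoutError is a stand-in for the real client's timeout type.
    """
    attempt = 0
    while True:
        try:
            return call()
        except TimeoutError as exc:
            if attempt >= max_retries:
                # Users see the friendly text; the raw error stays attached
                # as __cause__ so it can still be logged server-side.
                raise LLMUnavailableError(FRIENDLY_MESSAGE) from exc
            sleep(base_delay * 2 ** attempt)
            attempt += 1


if __name__ == "__main__":
    recorded: List[float] = []
    state = {"calls": 0}

    def flaky_llm_call() -> str:
        # Simulated LLM call: times out twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("simulated ReadTimeout")
        return "ok"

    print(call_with_backoff(flaky_llm_call, sleep=recorded.append))  # ok
    print(recorded)  # [2.0, 4.0]
```

If a call keeps timing out, this version waits 2s, 4s, and 8s across its four attempts before raising the friendly error. Adding random jitter to each delay is a common refinement when many agents may retry at the same moment.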

## Actual Behavior
- The LLM API call times out after the default timeout period
- The call is attempted exactly 3 times, with no apparent backoff between attempts
- After 3 failures, the raw error message `[LLM call error] ReadTimeout: Connection failed after 3 attempts` is displayed directly to the user
- The agent's response is completely lost — no partial response or fallback

## Suggested Improvements
1. **Increase timeout** — Raise the HTTP request timeout from the default (likely 30s) to 60–120s for LLM calls
2. **Exponential backoff** — Implement retry with increasing delays (e.g., 2s, 4s, 8s)
3. **Increase retry count** — Consider 5 retries instead of 3
4. **Streaming support** — Use streaming mode for LLM responses so tokens arrive incrementally and a long generation does not trip the read timeout
5. **Graceful error handling** — Show a user-friendly message instead of the raw error
6. **Connection pooling** — Reuse HTTP connections to reduce connection establishment overhead
7. **Configurable timeout** — Allow timeout and retry parameters to be configurable per model/provider
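Items 1, 3, and 7 could be combined into a per-provider policy object, sketched below under assumptions. The `RetryPolicy` class, the provider keys, and the 30s `max_delay_s` cap are hypothetical; the 60s and 120s timeouts and the 5 retries simply mirror the values suggested above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical per-model/provider timeout and retry settings."""
    read_timeout_s: float = 60.0  # lower end of the suggested 60-120s range
    max_retries: int = 5          # suggested increase from 3
    base_delay_s: float = 2.0     # first backoff delay
    max_delay_s: float = 30.0     # assumed cap so late retries don't wait too long

    def delays(self) -> List[float]:
        """Backoff delay before each retry: 2, 4, 8, 16, 30 (capped) with defaults."""
        return [min(self.base_delay_s * 2 ** i, self.max_delay_s)
                for i in range(self.max_retries)]


DEFAULT_POLICY = RetryPolicy()

# Hypothetical provider keys; real keys would come from the platform's model config.
POLICIES: Dict[str, RetryPolicy] = {
    "fast-provider": RetryPolicy(max_retries=3),
    "long-context-model": RetryPolicy(read_timeout_s=120.0),
}


def policy_for(provider: str) -> RetryPolicy:
    """Return the provider-specific policy, falling back to the default."""
    return POLICIES.get(provider, DEFAULT_POLICY)


if __name__ == "__main__":
    print(policy_for("long-context-model").read_timeout_s)  # 120.0
    print(policy_for("unknown-provider").delays())  # [2.0, 4.0, 8.0, 16.0, 30.0]
```

If the backend uses httpx, `read_timeout_s` could be passed as the `read` argument of `httpx.Timeout`, and reusing a single long-lived client instance would also provide the connection pooling in item 6.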

## Additional Context
- This error has been observed multiple times during the same session
- The error occurs during regular conversational interactions (not particularly large prompts)
- Environment: Clawith platform, agent conversation mode

