Tags: bug, api, performance
Quality Rating: ⭐ 7/10
Reporter: 董江涵
Description
When the agent processes user requests that involve LLM API calls, the system intermittently fails with the error:
[LLM call error] ReadTimeout: Connection failed after 3 attempts
This error has been observed multiple times during normal conversation flows, causing the agent to fail to respond entirely. The issue appears to be related to the LLM API call timeout configuration and retry strategy.
Steps to Reproduce
- Interact with an agent on the Clawith platform in a normal conversation
- Send a message that requires the agent to generate a response via LLM
- Under certain conditions (possibly high server load, large context, or network instability), the LLM call times out
- The system retries 3 times and then surfaces the raw error to the user
Expected Behavior
- The system should have a sufficiently long timeout for LLM API calls (especially for complex/long-context requests)
- The retry mechanism should use exponential backoff (e.g., 2s → 4s → 8s) rather than immediate retries
- If all retries fail, the user should see a friendly error message (e.g., "The service is temporarily busy, please try again") instead of the raw internal error [LLM call error] ReadTimeout: Connection failed after 3 attempts
- Consider supporting streaming responses to reduce perceived latency and avoid timeout issues
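The retry behavior described in this list could look roughly like the sketch below. It is only an illustration and not the platform's actual code: `call_with_backoff`, `backoff_delays`, and the `call_llm` callable are hypothetical names, and the sketch assumes the underlying client raises Python's built-in `TimeoutError` on a read timeout (a real HTTP client may raise its own exception type instead).

```python
import time
from typing import Callable

# Friendly text shown to the user; wording taken from the example in this report.
FRIENDLY_MESSAGE = "The service is temporarily busy, please try again."


def backoff_delays(retries: int, base: float = 2.0) -> list[float]:
    """Return the wait before each retry: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]


def call_with_backoff(
    call_llm: Callable[[], str],          # hypothetical: performs one LLM request
    retries: int = 3,
    base: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> str:
    """Try once, then retry up to `retries` times with exponential backoff.

    Returns the LLM response, or FRIENDLY_MESSAGE if every attempt times out,
    so the raw internal error never reaches the user.
    """
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return call_llm()
        except TimeoutError:
            if attempt == retries:
                break  # no retries left
            sleep(delays[attempt])
    return FRIENDLY_MESSAGE
```

With the defaults, the waits between attempts are 2s, 4s, and 8s, matching the example above. In a real implementation the caught exception would also be logged so the raw error stays available for debugging.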
Actual Behavior
- The LLM API call times out after the default timeout period
- The system retries exactly 3 times with no apparent backoff strategy
- After 3 failures, the raw error message [LLM call error] ReadTimeout: Connection failed after 3 attempts is displayed directly to the user
- The agent's response is completely lost — no partial response or fallback
Suggested Improvements
- Increase timeout — Raise the HTTP request timeout from the default (likely 30s) to 60–120s for LLM calls
- Exponential backoff — Implement retry with increasing delays (e.g., 2s, 4s, 8s)
- Increase retry count — Consider 5 retries instead of 3
- Streaming support — Use streaming mode for LLM responses to avoid long-wait timeouts
- Graceful error handling — Show a user-friendly message instead of the raw error
- Connection pooling — Reuse HTTP connections to reduce connection establishment overhead
- Configurable timeout — Allow timeout and retry parameters to be configurable per model/provider
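As one possible shape for the configurable timeout and retry parameters, here is a minimal sketch. Every name in it (`LLMCallSettings`, `PROVIDER_OVERRIDES`, `settings_for`, and the `example-provider/...` keys) is hypothetical, and the numbers are just the values suggested in this report, not measured recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMCallSettings:
    """Per-call settings; defaults follow the values suggested in this issue."""
    timeout_seconds: float = 90.0      # within the suggested 60-120s range
    max_retries: int = 5               # suggested increase from 3
    backoff_base_seconds: float = 2.0  # gives 2s, 4s, 8s, ... between retries
    stream: bool = True                # prefer streaming where supported


DEFAULT_SETTINGS = LLMCallSettings()

# Hypothetical per-model overrides; keys are placeholders, not real model names.
PROVIDER_OVERRIDES: dict[str, LLMCallSettings] = {
    "example-provider/large-model": LLMCallSettings(timeout_seconds=120.0),
    "example-provider/small-model": LLMCallSettings(timeout_seconds=60.0, max_retries=3),
}


def settings_for(model_key: str) -> LLMCallSettings:
    """Return the override for a model if one exists, else the defaults."""
    return PROVIDER_OVERRIDES.get(model_key, DEFAULT_SETTINGS)
```

Keeping the settings in one immutable object per model makes it easy to load them from a config file later without touching the call site.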
Additional Context
- This error has been observed multiple times during the same session
- The error occurs during regular conversational interactions (not particularly large prompts)
- Environment: Clawith platform, agent conversation mode