Closed
Labels
bug (Something isn't working)
Description
Pre-checks
- I have searched existing issues and this is not a duplicate.
Deployment Method
Source (setup.sh)
Steps to Reproduce
- Agent A (Caption) sends a long task_request to Agent B (Nova) using send_message_to_agent.
- Agent B starts a multi-step workflow involving LLM calls and tool usage (web_search in this case).
- One of the intermediate LLM requests times out.
- The entire A2A request fails immediately and returns a generic empty error string.
Expected vs Actual Behavior
Actual Result
- The task is aborted mid-execution.
- The UI shows only "Message send error:" with no cause.
- The backend log shows httpcore.ReadTimeout / httpx.ReadTimeout.
Expected Result
- A transient timeout during a long A2A task should not silently collapse into an empty UI error.
- The returned error should include a concrete cause.
- Long-running A2A flows should not be fragile to a single transient timeout.
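One way to meet the last expectation is to retry the failing call a bounded number of times before giving up. The following is a minimal sketch, not the project's implementation: `call_with_retry`, its parameters, and the backoff values are hypothetical, and `asyncio.TimeoutError` stands in for the exception type. In the real code the `transient` tuple would presumably include `httpx.ReadTimeout`.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    # Assumption: which exceptions count as transient is the caller's choice.
    transient: Tuple[Type[BaseException], ...] = (asyncio.TimeoutError,),
) -> T:
    """Await make_call(), retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except transient:
            if attempt == attempts:
                # Out of retries: re-raise so the caller sees the real cause.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

`make_call` is a zero-argument callable so that each attempt creates a fresh coroutine. Whether a retry is safe depends on the request being repeatable, which is an assumption here for a single LLM completion call.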
Logs / Screenshots
1. The A2A request to Nova (Agent B) was actually started
From .data/log/backend.log:
3459 2026-03-20 13:38:15 | INFO | ... | [LLM] Raw arguments for send_message_to_agent (len=655): '{"agent_name": "Nova", "message": "...", "msg_type": "task_request"}'
3460 2026-03-20 13:38:15 | INFO | ... | [LLM] Calling tool: send_message_to_agent({'agent_name': 'Nova', 'message': '...', 'msg_type': 'task_request'})
This confirms the failing action was specifically send_message_to_agent targeting Nova.
2. Nova had already entered a multi-step long-running flow
3472 2026-03-20 13:38:55 | INFO | ... | app.services.autonomy_service:check_and_enforce:62 - L2: Executing web_search for agent Nova with notification
3480 2026-03-20 13:39:51 | INFO | ... | app.services.autonomy_service:check_and_enforce:62 - L2: Executing web_search for agent Nova with notification
3484 2026-03-20 13:40:08 | INFO | ... | HTTP Request: POST https://ark.cn-beijing.volces.com/api/coding/v3/chat/completions "HTTP/1.1 200 OK"
3485 2026-03-20 13:40:15 | INFO | ... | HTTP Request: POST https://ark.cn-beijing.volces.com/api/coding/v3/chat/completions "HTTP/1.1 200 OK"
3486 2026-03-20 13:40:20 | INFO | ... | HTTP Request: POST https://ark.cn-beijing.volces.com/api/coding/v3/chat/completions "HTTP/1.1 200 OK"
This shows Nova was not failing immediately. It had already started working through a long task with multiple successful LLM/tool steps before the failure occurred.
3. The real failure was a timeout during an LLM request
3519 httpcore.ReadTimeout
3524 File "/Users/congregalis/Code/Clawith/backend/app/services/agent_tools.py", line 2806, in _send_message_to_agent
3525 response = await llm_client.complete(
3527 File "/Users/congregalis/Code/Clawith/backend/app/services/llm_client.py", line 410, in complete
3528 response = await client.post(url, json=payload, headers=self._get_headers())
3555 httpx.ReadTimeout
This is the critical evidence. The failure is a timeout while waiting for an LLM response inside _send_message_to_agent, not an invalid tool argument, not a missing agent, and not a frontend-only issue.
4. The empty UI error is explained by how the exception is rendered
At the time of failure, the original code returned:
return f"❌ Message send error: {str(e)[:200]}"
For httpx.ReadTimeout, str(e) can be empty. Reproduced locally:
ReadTimeout
str(e)= ''
That explains why the UI showed only:
Message send error:
with no additional detail.
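The empty message can be avoided by always including the exception's class name in the rendered error. The following is a minimal sketch of that idea: `describe_exception` and `FakeReadTimeout` are hypothetical names used for illustration, with `FakeReadTimeout` standing in for a timeout exception raised with an empty message.

```python
def describe_exception(exc: BaseException, limit: int = 200) -> str:
    """Return a non-empty description even when str(exc) is empty."""
    name = type(exc).__name__
    detail = str(exc).strip()
    text = f"{name}: {detail}" if detail else name
    return text[:limit]


class FakeReadTimeout(Exception):
    """Hypothetical stand-in for an exception raised with an empty message."""


try:
    raise FakeReadTimeout("")
except Exception as e:
    # Old formatting: nothing follows the colon when str(e) is empty.
    old_style = f"Message send error: {str(e)[:200]}"
    # Sketch of a fix: the class name is always present.
    new_style = f"Message send error: {describe_exception(e)}"

print(repr(old_style))
print(repr(new_style))
```

With this formatting, the UI would show at least the exception type (for example, ReadTimeout) even when the exception carries no message.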