Summary
When an agent run completes without invoking the send_message MCP tool (e.g., the claude CLI fails with "Not logged in · Please run /login"), the system logs run_completed reason=Natural and posts nothing to the channel. The user — who may have just accepted a task proposal and is actively waiting for the per-task session to start — sees silence and has no way to diagnose what's wrong.
This is pre-existing behavior (predates v1 task proposals), but task proposals v2 (#96) amplifies the pain: accepting a proposal is a commitment moment, and silence afterward is maximally confusing.
Reproduction (from v2 dogfood session)
- Start chorus with HOME=/tmp/dogfood (isolated, no claude auth state).
- Create channel eng, create agent claude-ccd4, join both.
- Post a message in eng: @claude-ccd4 hey.
- Observe agent wake: server log shows starting agent via shared bridge, session attached, run completed reason=Natural.
- Observe channel: no reply from the agent.
- Inspect $HOME/.claude/projects/…/<session>.jsonl: [ASST text] Not logged in · Please run /login.
The claude CLI emitted a user-visible error to its own stdout/JSONL, but because it's not a send_message tool call, chorus never routes it to the channel.
Root cause
Chorus treats "run completed with zero send_message tool uses" as equivalent to "run completed normally." It's not. A healthy run either:
- Produces chat content via send_message (visible response), OR
- Explicitly declines to respond (e.g., agent judges the message doesn't warrant a reply).
Right now there's no way to distinguish a healthy-decline from a stuck-agent-that-cannot-respond.
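One way to make this distinction explicit is to model three run outcomes instead of two. The following is a minimal sketch, not Chorus's actual types: `RunOutcome`, `classify_run`, and the `explicit_decline` flag are hypothetical names, and the sketch assumes the agent has some way to signal a deliberate decline, which does not exist today.

```rust
/// Hypothetical outcome of one agent run; not an existing Chorus type.
#[derive(Debug, PartialEq, Eq)]
pub enum RunOutcome {
    /// The agent called send_message at least once.
    Replied,
    /// The agent explicitly signalled that no reply is warranted.
    Declined,
    /// The run ended with neither a reply nor an explicit decline.
    Silent,
}

/// Classify a finished run from two facts the server would need to track.
/// `explicit_decline` is an assumed signal; no such signal exists today.
pub fn classify_run(send_message_calls: u32, explicit_decline: bool) -> RunOutcome {
    if send_message_calls > 0 {
        RunOutcome::Replied
    } else if explicit_decline {
        RunOutcome::Declined
    } else {
        RunOutcome::Silent
    }
}
```

Under this model only `Silent` would need to be surfaced to the user; `Replied` and `Declined` are both healthy.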
Proposed fix options (pick one)
Option A — Post-run empty-response detection (recommended)
After an agent run completes, check whether the run invoked send_message at least once. If not, post a system-authored warning message to the triggering channel:
⚠️ @<agent> completed a run without replying. Common causes: not authed, auth expired, runtime error. Check agent logs.
Pros: catches all silent-fail modes (auth, crash, confused model, etc.) without needing to enumerate each. Cheap.
Cons: false positives on legitimate no-reply runs (rare — most runs do reply).
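The check in Option A could look roughly like the sketch below. It assumes the server keeps a per-run count of send_message calls; `RunSummary`, its fields, and `empty_response_warning` are hypothetical names for illustration, and only the warning text comes from this proposal.

```rust
/// Hypothetical per-run bookkeeping; field names are illustrative only.
pub struct RunSummary {
    pub agent_name: String,
    pub send_message_calls: u32,
}

/// Returns the warning to post to the triggering channel, or None when the
/// run called send_message at least once.
pub fn empty_response_warning(run: &RunSummary) -> Option<String> {
    if run.send_message_calls > 0 {
        return None;
    }
    Some(format!(
        "⚠️ @{} completed a run without replying. Common causes: not authed, auth expired, runtime error. Check agent logs.",
        run.agent_name
    ))
}
```

The caller would run this once after the run-completed event and post the returned string as a system-authored message.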
Option B — Pre-wake auth health check
Before dispatching a wake to an agent process, probe whether the runtime can authenticate. If not, post the system warning BEFORE the user can click Accept on a proposal.
Pros: fails fast at the right moment. No surprise silence post-accept.
Cons: the auth check has its own cost (a CLI probe per wake), and non-auth failures (crashes, confused output) still end in silence, so the gap that Option A addresses remains.
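A rough sketch of the pre-wake gate in Option B follows. The `AuthProbe` trait, `FixedProbe`, `pre_wake_check`, and the warning wording are all assumptions made for illustration. How a real probe would query each runtime's CLI is deliberately left out, since that depends on the runtime.

```rust
/// Hypothetical abstraction over "can this runtime authenticate?".
pub trait AuthProbe {
    fn is_authenticated(&self) -> bool;
}

/// Stand-in probe with a fixed answer, for illustration and tests.
pub struct FixedProbe(pub bool);

impl AuthProbe for FixedProbe {
    fn is_authenticated(&self) -> bool {
        self.0
    }
}

/// Ok(()) means the wake may be dispatched. Err carries warning text to post
/// to the channel instead of waking the agent (wording is illustrative).
pub fn pre_wake_check(probe: &dyn AuthProbe, agent_name: &str) -> Result<(), String> {
    if probe.is_authenticated() {
        Ok(())
    } else {
        Err(format!(
            "⚠️ @{} cannot authenticate; wake skipped. Check the agent's auth state.",
            agent_name
        ))
    }
}
```

Keeping the probe behind a trait would let each runtime supply its own check, and would let the result be cached to reduce the per-wake cost noted above.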
Option C — Surface claude CLI stdout verbatim
Monitor the agent subprocess's stdout/JSONL for ANY user-visible text and route it to the channel as a system-authored sibling message when the agent doesn't use send_message.
Pros: most informative.
Cons: tight coupling to claude CLI's output format. Won't work for other runtimes (codex, gemini, opencode) with different output shapes. Brittle.
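The coupling concern in Option C can be seen in a sketch: each runtime would need its own extractor. Everything below is hypothetical. In particular, `PrefixExtractor` keys on the "[ASST text] " rendering used in the reproduction notes, which is not the claude CLI's real JSONL schema.

```rust
/// Hypothetical per-runtime adapter: pull user-visible text out of one
/// line of agent output.
pub trait OutputExtractor {
    fn visible_text<'a>(&self, line: &'a str) -> Option<&'a str>;
}

/// Toy extractor keyed on the "[ASST text] " prefix from the reproduction
/// notes. This is NOT the claude CLI's actual output format.
pub struct PrefixExtractor;

impl OutputExtractor for PrefixExtractor {
    fn visible_text<'a>(&self, line: &'a str) -> Option<&'a str> {
        line.strip_prefix("[ASST text] ")
            .map(str::trim)
            .filter(|text| !text.is_empty())
    }
}

/// Collect text to surface as a system-authored message, but only when the
/// run made no send_message call.
pub fn fallback_message(
    extractor: &dyn OutputExtractor,
    lines: &[&str],
    send_message_calls: u32,
) -> Option<String> {
    if send_message_calls > 0 {
        return None;
    }
    let texts: Vec<&str> = lines
        .iter()
        .copied()
        .filter_map(|line| extractor.visible_text(line))
        .collect();
    if texts.is_empty() {
        None
    } else {
        Some(texts.join("\n"))
    }
}
```

Supporting codex, gemini, or opencode would mean writing and maintaining another `OutputExtractor` for each, which is the brittleness described above.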
Scope
tracing::error! when deliver_message_to_agents fails after commit; same class of observability gap).
Related