Skip to content

Agent silent-fail: run completes without send_message, user sees nothing #97

@Fullstop000

Description

@Fullstop000

Summary

When an agent run completes without invoking the send_message MCP tool (e.g., the claude CLI fails with "Not logged in · Please run /login"), the system logs run_completed reason=Natural and posts nothing to the channel. The user — who may have just accepted a task proposal and is actively waiting for the per-task session to start — sees silence and has no way to diagnose what's wrong.

This is pre-existing behavior (predates v1 task proposals), but task proposals v2 (#96) amplifies the pain: accepting a proposal is a commitment moment, and silence afterward is maximally confusing.

Reproduction (from v2 dogfood session)

  1. Start chorus with HOME=/tmp/dogfood (isolated, no claude auth state).
  2. Create channel eng, create agent claude-ccd4, join both.
  3. Post a message in eng: @claude-ccd4 hey.
  4. Observe agent wake: server log shows starting agent via shared bridge, session attached, run completed reason=Natural.
  5. Observe channel: no reply from the agent.
  6. Inspect $HOME/.claude/projects/…/<session>.jsonl: [ASST text] Not logged in · Please run /login.

The claude CLI emitted a user-visible error to its own stdout/JSONL, but because it's not a send_message tool call, chorus never routes it to the channel.

Root cause

Chorus treats "run completed with zero send_message tool uses" as equivalent to "run completed normally." It's not. A healthy run either:

  • Produces chat content via send_message (visible response), OR
  • Explicitly declines to respond (e.g., agent judges the message doesn't warrant a reply).

Right now there's no way to distinguish a healthy-decline from a stuck-agent-that-cannot-respond.

Proposed fix options (pick one)

Option A — Post-run empty-response detection (recommended)

After an agent run completes, check whether the run invoked send_message at least once. If not, post a system-authored warning message to the triggering channel:

⚠️ @<agent> completed a run without replying. Common causes: not authed, auth expired, runtime error. Check agent logs.

Pros: catches all silent-fail modes (auth, crash, confused model, etc.) without needing to enumerate each. Cheap.
Cons: false positives on legitimate no-reply runs (rare — most runs do reply).

Option B — Pre-wake auth health check

Before dispatching a wake to an agent process, probe whether the runtime can authenticate. If not, post the system warning BEFORE the user can click Accept on a proposal.

Pros: fails fast at the right moment. No surprise silence post-accept.
Cons: auth-check has its own cost (CLI probe per wake), and non-auth failures (crashes, confused output) still fall through to Option A's problem.

Option C — Surface claude CLI stdout verbatim

Monitor the agent subprocess's stdout/JSONL for ANY user-visible text and route it to the channel as a system-authored sibling message when the agent doesn't use send_message.

Pros: most informative.
Cons: tight coupling to claude CLI's output format. Won't work for other runtimes (codex, gemini, opencode) with different output shapes. Brittle.

Scope

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions