Agent silent-fail: run completes without send_message, user sees nothing #97

## Summary

When an agent run completes without invoking the `send_message` MCP tool (e.g., the claude CLI fails with "Not logged in · Please run /login"), the system logs `run_completed reason=Natural` and posts nothing to the channel. The user — who may have just accepted a task proposal and is actively waiting for the per-task session to start — sees silence and has no way to diagnose what's wrong.

This is pre-existing behavior (predates v1 task proposals), but task proposals v2 (#96) amplifies the pain: accepting a proposal is a commitment moment, and silence afterward is maximally confusing.

## Reproduction (from v2 dogfood session)

1. Start chorus with `HOME=/tmp/dogfood` (isolated, no claude auth state).
2. Create channel `eng`, create agent `claude-ccd4`, join both.
3. Post a message in `eng`: `@claude-ccd4 hey`.
4. Observe agent wake: server log shows `starting agent via shared bridge`, `session attached`, `run completed reason=Natural`.
5. Observe channel: no reply from the agent.
6. Inspect `$HOME/.claude/projects/…/<session>.jsonl`: `[ASST text] Not logged in · Please run /login`.

The claude CLI emitted a user-visible error to its own stdout/JSONL, but because it's not a `send_message` tool call, chorus never routes it to the channel.

## Root cause

Chorus treats "run completed with zero `send_message` tool uses" as equivalent to "run completed normally." It's not. A healthy run either:

- Produces chat content via `send_message` (visible response), OR
- Explicitly declines to respond (e.g., agent judges the message doesn't warrant a reply).

Right now chorus has no signal that distinguishes a healthy decline from a stuck agent that cannot respond: both finish as a completed run with zero `send_message` calls.
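To make the distinction concrete, here is a minimal sketch of a three-way run classification. Everything in it is hypothetical: `RunOutcome`, `RunSummary`, and `classify` are illustrative names rather than existing chorus types, and the `decline_reason` field assumes an explicit decline signal that the agent would have to emit, which does not exist today.

```rust
/// How a finished run could be interpreted (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RunOutcome {
    /// The agent called `send_message` at least once.
    Replied { messages: usize },
    /// The agent explicitly signalled that no reply is warranted.
    Declined { reason: String },
    /// No reply and no decline signal: the silent-fail case.
    Silent,
}

/// What the bridge observed during one run (hypothetical type).
#[derive(Debug, Default, Clone)]
pub struct RunSummary {
    /// Number of `send_message` tool calls seen during the run.
    pub send_message_calls: usize,
    /// Assumption: set only if the agent emitted an explicit decline signal.
    pub decline_reason: Option<String>,
}

/// Classify a completed run instead of treating every completion as healthy.
pub fn classify(summary: &RunSummary) -> RunOutcome {
    if summary.send_message_calls > 0 {
        RunOutcome::Replied { messages: summary.send_message_calls }
    } else if let Some(reason) = &summary.decline_reason {
        RunOutcome::Declined { reason: reason.clone() }
    } else {
        RunOutcome::Silent
    }
}
```

Only the `Silent` arm would need user-facing handling; the other two match the healthy cases listed above.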

## Proposed fix options (pick one)

### Option A — Post-run empty-response detection (recommended)

After an agent run completes, check whether the run invoked `send_message` at least once. If not, post a system-authored warning message to the triggering channel:

> ⚠️ `@<agent>` completed a run without replying. Common causes: not authed, auth expired, runtime error. Check agent logs.

Pros: catches all silent-fail modes (auth, crash, confused model, etc.) without needing to enumerate each. Cheap.
Cons: false positives on legitimate no-reply runs (rare — most runs do reply).
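A rough sketch of the post-run check, under stated assumptions: `ToolCall` and `empty_response_warning` are hypothetical names, and the sketch assumes the bridge can list the tool calls of a run with the tool name appearing exactly as `send_message`. The caller would post the returned text to the triggering channel as a system-authored message.

```rust
/// One tool call observed during a run (hypothetical type).
pub struct ToolCall {
    /// Tool name as reported by the runtime; assumed to be exactly "send_message"
    /// for chat replies.
    pub name: String,
}

/// Returns the warning text to post if the run never called `send_message`,
/// or `None` if the agent replied at least once.
pub fn empty_response_warning(agent: &str, tool_calls: &[ToolCall]) -> Option<String> {
    let replied = tool_calls.iter().any(|call| call.name == "send_message");
    if replied {
        return None;
    }
    Some(format!(
        "⚠️ `@{}` completed a run without replying. Common causes: not authed, auth expired, runtime error. Check agent logs.",
        agent
    ))
}
```

Because the check looks only at the absence of `send_message`, it does not need to know why the run was silent, which is what keeps this option cheap.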

### Option B — Pre-wake auth health check

Before dispatching a wake to an agent process, probe whether the runtime can authenticate. If not, post the system warning BEFORE the user can click Accept on a proposal.

Pros: fails fast at the right moment. No surprise silence post-accept.
Cons: the auth check has its own cost (a CLI probe per wake), and it only covers auth. Non-auth failures (crashes, confused output) still end in silence unless Option A's detection is also in place.
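One possible shape for the pre-wake gate, as a sketch only. `AuthProbe`, `AuthStatus`, `WakeDecision`, and `gate_wake` are hypothetical names, and how a real probe would ask each runtime about its auth state is deliberately left out, since that depends on the runtime. `FixedProbe` is a stand-in that returns a canned status.

```rust
/// Result of asking a runtime whether it can authenticate (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AuthStatus {
    Ready,
    /// Carries whatever detail the runtime reported.
    NotAuthenticated(String),
}

/// Per-runtime auth probe (hypothetical trait). A real implementation would
/// query the runtime; that part is intentionally not sketched here.
pub trait AuthProbe {
    fn check(&self) -> AuthStatus;
}

/// Stand-in probe that returns a fixed status, for illustration and tests.
pub struct FixedProbe(pub AuthStatus);

impl AuthProbe for FixedProbe {
    fn check(&self) -> AuthStatus {
        self.0.clone()
    }
}

/// What to do with a pending wake (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum WakeDecision {
    Dispatch,
    /// Do not wake; post `warning` to the channel instead.
    Block { warning: String },
}

/// Run the probe before dispatching a wake and fail fast if auth is missing.
pub fn gate_wake(agent: &str, probe: &dyn AuthProbe) -> WakeDecision {
    match probe.check() {
        AuthStatus::Ready => WakeDecision::Dispatch,
        AuthStatus::NotAuthenticated(detail) => WakeDecision::Block {
            warning: format!(
                "⚠️ `@{}` cannot start: runtime is not authenticated ({}).",
                agent, detail
            ),
        },
    }
}
```

Keeping the probe behind a trait is one way to let each runtime (claude, codex, gemini, opencode) supply its own check without changing the gate.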

### Option C — Surface claude CLI stdout verbatim

Monitor the agent subprocess's stdout/JSONL for ANY user-visible text and route it to the channel as a system-authored sibling message when the agent doesn't use `send_message`.

Pros: most informative.
Cons: tight coupling to claude CLI's output format. Won't work for other runtimes (codex, gemini, opencode) with different output shapes. Brittle.

## Scope

- Not blocking #96 (task proposals v2). Captured here so it can be picked up separately.
- Also relates to the pre-existing wake-failure silent-fail TODO: chorus doesn't emit a `tracing::error!` event when `deliver_message_to_agents` fails after commit. Same class of observability gap.

## Related

- #96 — v2 task proposals (amplifies this UX gap post-accept).
- PR #93 v1 TODO notes: wake-failure logging.
