🤖 fix: improve status_set tool description and display logic #466

ammar-agent · 2025-10-28T17:35:11Z

Overview

Fixes two critical bugs with the status_set tool and refines completion status examples.

Problems Fixed

1. Validation failures showed as 'completed' ✓ instead of 'failed' ✗

When status_set validation failed (e.g., invalid emoji), the tool displayed 'completed' status even though it failed. This made users think validation was silently failing, especially when the status didn't appear in the sidebar.

Root cause: Tool status determination only checked part.state === 'output-available' (meaning the tool returned a result), but didn't check whether that result indicated success or failure via result.success.

2. Status didn't persist across page reloads or workspace switches

Workspaces with successful status_set calls would lose their status after reload because loadHistoricalMessages() didn't reconstruct derived state from tool results.

Root cause: Derived state (agentStatus, currentTodos) was only updated during live streaming events, not when loading historical messages.

Solutions

1. Show 'failed' status for validation errors

- Enhanced status determination to check result.success for tools returning { success: boolean }
- Extracted hasSuccessResult() and hasFailureResult() helpers to eliminate duplication
- Displayed the error message in the StatusSetToolCall UI when validation fails

2. Shared tool result processing

Extracted processToolResult() as single source of truth for updating derived state. Called from:

- handleToolCallEnd() - live streaming events
- loadHistoricalMessages() - historical message loading

This ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots.

3. Refined completion status examples

Replaced generic examples with outcome-specific ones that show variety:

✅ Success: 'PR checks pass and ready to merge'
❌ Failure: 'CreateWorkspace Tests failed'
⚠️ Warning: 'Encountered serious issue with design'

Encourages agents to communicate actual outcomes, not just completion.

Implementation Details

Status determination (StreamingMessageAggregator.ts:772-789):

if (part.state === "output-available") {
  status = hasFailureResult(part.output) ? "failed" : "completed";
}
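
The PR does not show the bodies of hasSuccessResult() and hasFailureResult(). The following is a minimal sketch of how such helpers might look, assuming only what the description states: tools following this pattern return an object with a boolean success field. The isRecord() guard, the ToolStatus type, and the statusForOutput() wrapper are hypothetical names added for illustration.

```typescript
// Hypothetical status type; the PR only mentions "completed" and "failed".
type ToolStatus = "completed" | "failed";

// Hypothetical guard: narrows an unknown tool output to a plain object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// True only when the output explicitly reports success.
function hasSuccessResult(output: unknown): boolean {
  return isRecord(output) && output.success === true;
}

// True only when the output explicitly reports failure.
function hasFailureResult(output: unknown): boolean {
  return isRecord(output) && output.success === false;
}

// Mirrors the status determination shown above: an available output counts as
// "completed" unless it explicitly says success === false. Outputs that do not
// follow the { success: boolean } pattern are therefore still "completed".
function statusForOutput(output: unknown): ToolStatus {
  return hasFailureResult(output) ? "failed" : "completed";
}
```

Checking for an explicit success === false, rather than treating any non-success value as a failure, keeps tools that return plain strings or other shapes from being reported as failed.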

Shared result processing (StreamingMessageAggregator.ts:530-552):

private processToolResult(toolName: string, input: unknown, output: unknown): void {
  if (toolName === "todo_write" && hasSuccessResult(output)) {
    // Update todos...
  }
  if (toolName === "status_set" && hasSuccessResult(output)) {
    // Update agentStatus...
  }
}

Historical message processing (StreamingMessageAggregator.ts:199-213):

loadHistoricalMessages(messages: CmuxMessage[]): void {
  for (const message of messages) {
    this.messages.set(message.id, message);
    if (message.role === "assistant") {
      for (const part of message.parts) {
        if (isDynamicToolPart(part) && part.state === "output-available") {
          this.processToolResult(part.toolName, part.input, part.output);
        }
      }
    }
  }
}
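
To make the reconstruction behavior concrete, here is a self-contained sketch that replays tool parts from history through one shared method. The ToolPart, Message, and AgentStatus shapes, the MiniAggregator class, and the statusCall() helper are simplified stand-ins invented for this example, not the real cmux types. It is meant to illustrate two of the tested behaviors: the most recent successful status_set wins, and a failed status_set is ignored.

```typescript
// Simplified stand-ins for the real message types (assumptions for illustration).
interface AgentStatus {
  emoji: string;
  message: string;
}

interface ToolPart {
  toolName: string;
  state: "input-available" | "output-available";
  input: unknown;
  output?: unknown;
}

interface Message {
  id: string;
  role: "user" | "assistant";
  parts: ToolPart[];
}

class MiniAggregator {
  agentStatus: AgentStatus | undefined = undefined;

  // Single place where tool results update derived state.
  private processToolResult(toolName: string, input: unknown, output: unknown): void {
    const succeeded =
      typeof output === "object" &&
      output !== null &&
      (output as { success?: unknown }).success === true;
    if (toolName === "status_set" && succeeded) {
      // Assumes the tool input carries the emoji and message that were set.
      this.agentStatus = input as AgentStatus;
    }
  }

  // Replays completed tool parts in order, so later calls overwrite earlier ones.
  loadHistoricalMessages(messages: Message[]): void {
    for (const message of messages) {
      if (message.role !== "assistant") continue;
      for (const part of message.parts) {
        if (part.state === "output-available") {
          this.processToolResult(part.toolName, part.input, part.output);
        }
      }
    }
  }
}

// Hypothetical helper that builds an assistant message with one status_set call.
function statusCall(id: string, emoji: string, message: string, success: boolean): Message {
  return {
    id,
    role: "assistant",
    parts: [
      {
        toolName: "status_set",
        state: "output-available",
        input: { emoji, message },
        output: { success },
      },
    ],
  };
}

const aggregator = new MiniAggregator();
aggregator.loadHistoricalMessages([
  statusCall("m1", "🔍", "Analyzing code", true),
  statusCall("m2", "✅", "PR checks pass and ready to merge", true),
  statusCall("m3", "not-emoji", "test", false), // failed validation, ignored
]);
console.log(aggregator.agentStatus?.message); // "PR checks pass and ready to merge"
```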

Testing

- Added 5 new unit tests for status persistence and validation display
- All 842 tests pass
- Typecheck and lint pass

New tests:

- Show 'failed' status when validation fails
- Show 'completed' status when validation succeeds
- Reconstruct agentStatus from historical messages
- Use most recent status when loading multiple messages
- Don't reconstruct from failed status_set

Before/After

Validation Failure Display

Before:

status_set({ emoji: "not-emoji", message: "test" })
→ Shows: completed ✓
→ Sidebar: No status appears
→ User: "Is validation failing silently?" 🤔

After:

status_set({ emoji: "not-emoji", message: "test" })
→ Shows: failed ✗ (emoji must be a single emoji character)
→ Sidebar: No status appears (correct - validation failed)
→ User: Clear feedback! ✨

Status Persistence

Before:

1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status disappears (even though it was successful)

After:

1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status persists ✅

Architecture Benefits

- Single source of truth: One method processes tool results, regardless of source
- Leverages existing data: Tool parts already contain everything we need
- No special reconstruction logic: Just process parts uniformly
- Easier to extend: Adding new derived state only requires updating processToolResult()
- Matches architecture: Historical messages are complete snapshots

Generated with cmux

Updated status_set tool description to remind agents to set a final completion status before finishing their response. Examples added: '✅ Complete', '🎉 Done', '✓ Finished'.

This helps users understand when the agent has completed all work and isn't just waiting or still processing.

Generated with `cmux`

Models don't understand 'stream' terminology. Changed to: 'The status is cleared at the start of each new response, so you must set it again.' More concise and uses terminology models understand. Generated with `cmux`

## Problem

When status_set validation failed (e.g., invalid emoji), the tool showed 'completed' status in the UI even though it failed. This made users think validation was silently failing, especially when the status didn't appear in the sidebar.

## Root Cause

Tool status determination only checked part.state === 'output-available', which is true even for failed results. It didn't check result.success.

## Solution

1. Enhanced status determination to check result.success for tools that return { success: boolean } pattern
2. Show 'failed' status when result.success === false
3. Display error message in StatusSetToolCall UI for failed validations

## Testing

- Added unit tests for failed/completed status display
- All 839 tests pass
- Typecheck passes

When validation fails now:

- Tool shows 'failed' status (not 'completed')
- Error message displays in UI
- agentStatus NOT updated (correct existing behavior)
- Clear feedback to user that validation failed

ESLint flagged 'as unknown' as unnecessary since part.output is already typed as unknown. Simplified to use part.output directly.

Extracted hasSuccessResult() and hasFailureResult() helpers to reduce duplication. Both todo_write and status_set use the same pattern to check result.success, and the display logic now uses the same helper. This makes the code more maintainable and consistent.

Updated final status examples to reflect different outcomes:

- ✅ Success: 'PR checks pass and ready to merge'
- ❌ Failure: 'CreateWorkspace Tests failed'
- ⚠️ Warning/blocker: 'Encountered serious issue with design'

This encourages agents to communicate the actual outcome, not just that work is complete.

## Problem

Status (and TODOs) weren't persisting across page reloads/workspace switches because loadHistoricalMessages() didn't process tool results to reconstruct derived state.

## Solution

Extracted processToolResult() as single source of truth for updating derived state from tool results. Called from:

1. handleToolCallEnd() - live streaming events
2. loadHistoricalMessages() - historical message loading

This ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots.

## Implementation

- Extracted 23-line processToolResult() method
- Updated handleToolCallEnd() to call shared method (-12 LoC)
- Updated loadHistoricalMessages() to process tool parts (+6 LoC)
- Added 3 tests for historical message reconstruction

## Benefits

- Single source of truth for tool result processing
- No special reconstruction logic needed
- Easier to extend with new derived state
- Leverages existing tool part architecture

## Testing

- Added 3 new tests for historical message scenarios
- All 842 tests pass
- Typecheck and lint pass

Net: ~17 LoC

## Problem

Previous implementation cleared state at different times:

- TODOs: Cleared on stream-end (via cleanupStreamState)
- Status: Cleared on stream-start

This created historical/live disparity because stream events aren't persisted in chat.jsonl. After page reload, TODOs would stick around (desirable!) but the implementation was inconsistent.

## Solution

Clear BOTH todos and agentStatus when new user message arrives.

Why user messages?

- ✅ Persisted in chat.jsonl (unlike stream events)
- ✅ Consistent live/historical behavior
- ✅ Semantic: New question = new task = clear previous state

## Changes

1. Removed TODO clearing from cleanupStreamState()
2. Removed status clearing from handleStreamStart()
3. Added centralized clearing in handleMessage() when user message arrives
4. Updated test: 'stream-start' → 'new user message'

## Benefits

- Consistent behavior whether loading from history or processing live
- TODOs persist until next question (user preference!)
- Simpler: One place to clear state, not scattered across handlers
- Architecture: Relies on persisted data, not ephemeral events

## Testing

- All 842 tests pass
- Updated test reflects new clearing behavior
- Typecheck and lint pass
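
The clearing rule described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual cmux code: the DerivedState class, the Todo and ChatMessage shapes, and the empty handler bodies are hypothetical, and only the method names handleMessage(), handleStreamStart(), and cleanupStreamState() come from the commit message.

```typescript
// Hypothetical shapes; the real cmux types are not shown in the PR.
interface Todo {
  content: string;
  status: "pending" | "in_progress" | "completed";
}

interface AgentStatus {
  emoji: string;
  message: string;
}

interface ChatMessage {
  id: string;
  role: "user" | "assistant";
}

class DerivedState {
  currentTodos: Todo[] = [];
  agentStatus: AgentStatus | undefined = undefined;

  // Centralized clearing: a new user message starts a new task, so both pieces
  // of derived state are reset here. User messages are persisted, so replaying
  // history triggers the same reset as live processing.
  handleMessage(message: ChatMessage): void {
    if (message.role === "user") {
      this.currentTodos = [];
      this.agentStatus = undefined;
    }
  }

  // Stream events are ephemeral (not persisted), so they no longer clear anything.
  handleStreamStart(): void {}
  cleanupStreamState(): void {}
}

const state = new DerivedState();
state.currentTodos = [{ content: "Write tests", status: "in_progress" }];
state.agentStatus = { emoji: "🔍", message: "Analyzing code" };

state.handleStreamStart();
state.cleanupStreamState();
console.log(state.currentTodos.length); // 1, stream events leave state alone

state.handleMessage({ id: "u2", role: "user" });
console.log(state.currentTodos.length); // 0, cleared by the new user message
```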

ammario · 2025-10-28T18:48:48Z

Depot runners are having issues per their status, so force-merging.

ammar-agent added 6 commits October 28, 2025 17:35

🤖 Replace 'stream' with 'response' in tool description

f802ab7


🤖 fix: Remove unnecessary type assertion

cf1603b


ammario approved these changes Oct 28, 2025


ammar-agent added 2 commits October 28, 2025 18:00

🤖 fix: Run prettier on StreamingMessageAggregator.ts

56ccbf6

ammar-agent changed the title ~~🤖 Encourage agents to set final status before completing~~ 🤖 fix: Show validation failures & persist status across reloads Oct 28, 2025

ammario approved these changes Oct 28, 2025


ammar-agent added 2 commits October 28, 2025 18:19

🤖 fix: Run prettier on StreamingMessageAggregator.ts

9eef8ae

ammario changed the title ~~🤖 fix: Show validation failures & persist status across reloads~~ 🤖 fix: improve status_set tool description and display logic Oct 28, 2025

ammario approved these changes Oct 28, 2025


ammario merged commit 1202238 into main Oct 28, 2025
22 of 25 checks passed

ammario deleted the status-set-final-reminder branch October 28, 2025 18:48
