
@ammar-agent (Collaborator) commented Oct 28, 2025

Overview

Fixes two critical bugs with the status_set tool and refines completion status examples.

Problems Fixed

1. Validation failures showed as 'completed' ✓ instead of 'failed' ✗

When status_set validation failed (e.g., invalid emoji), the tool displayed a 'completed' status even though the call had failed. This made users think validation was failing silently, especially since the status never appeared in the sidebar.

Root cause: Tool status determination only checked part.state === 'output-available' (meaning the tool returned a result), but didn't check whether that result indicated success or failure via result.success.

2. Status didn't persist across page reloads or workspace switches

Workspaces with successful status_set calls would lose their status after reload because loadHistoricalMessages() didn't reconstruct derived state from tool results.

Root cause: Derived state (agentStatus, currentTodos) was only updated during live streaming events, not when loading historical messages.

Solutions

1. Show 'failed' status for validation errors

  • Enhanced status determination to check result.success for tools returning { success: boolean }
  • Extracted hasSuccessResult() and hasFailureResult() helpers to eliminate duplication
  • Display error message in StatusSetToolCall UI when validation fails
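Here is a minimal sketch of what the two helpers could look like. It assumes a tool signals its outcome through an optional boolean `success` field on its output. The bodies are illustrative, not the code from this PR, and `isSuccessResult` is a hypothetical type guard introduced only for this example:

```typescript
// Hypothetical sketch: the PR names hasSuccessResult() and hasFailureResult(),
// but these bodies are illustrative, not the cmux source.

/** Assumed shape of tools that report a validation outcome. */
interface SuccessResult {
  success: boolean;
}

/** Hypothetical guard: true when `output` is an object with a boolean `success`. */
function isSuccessResult(output: unknown): output is SuccessResult {
  return (
    typeof output === "object" &&
    output !== null &&
    "success" in output &&
    typeof (output as { success: unknown }).success === "boolean"
  );
}

/** Tool reported success, so it is safe to update derived state from it. */
function hasSuccessResult(output: unknown): boolean {
  return isSuccessResult(output) && output.success;
}

/** Tool reported failure, so the UI should show "failed", not "completed". */
function hasFailureResult(output: unknown): boolean {
  return isSuccessResult(output) && !output.success;
}
```

Outputs without a boolean `success` field, such as plain strings or `null`, make both helpers return `false`. Tools that return such outputs keep showing 'completed' and are not misreported as failures.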

2. Shared tool result processing

Extracted processToolResult() as the single source of truth for updating derived state. Called from:

  1. handleToolCallEnd() - live streaming events
  2. loadHistoricalMessages() - historical message loading

This ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots.
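As a rough illustration of the single-source-of-truth idea, the sketch below routes both the live path and the historical path through one `processToolResult()`. The message and part shapes and the `Aggregator` class are simplified assumptions, not the real StreamingMessageAggregator. The sketch also omits todo_write handling:

```typescript
// Minimal sketch, assuming simplified shapes. Method names mirror the PR,
// but the bodies are illustrative and handle only status_set.

interface ToolPart {
  toolName: string;
  state: "input-available" | "output-available";
  input: unknown;
  output?: unknown;
}

interface Message {
  id: string;
  role: "user" | "assistant";
  parts: ToolPart[];
}

interface AgentStatus {
  emoji: string;
  message: string;
}

// True only for outputs shaped like { success: true }.
function hasSuccessResult(output: unknown): boolean {
  return (
    typeof output === "object" &&
    output !== null &&
    (output as { success?: unknown }).success === true
  );
}

class Aggregator {
  agentStatus: AgentStatus | undefined;
  readonly messages = new Map<string, Message>();

  // The one place where tool results update derived state.
  private processToolResult(toolName: string, input: unknown, output: unknown): void {
    if (toolName === "status_set" && hasSuccessResult(output)) {
      this.agentStatus = input as AgentStatus;
    }
  }

  // Live path: a tool call finished while the response was streaming.
  handleToolCallEnd(toolName: string, input: unknown, output: unknown): void {
    this.processToolResult(toolName, input, output);
  }

  // Historical path: replay finished tool parts from stored messages.
  loadHistoricalMessages(messages: Message[]): void {
    for (const message of messages) {
      this.messages.set(message.id, message);
      if (message.role !== "assistant") continue;
      for (const part of message.parts) {
        if (part.state === "output-available") {
          this.processToolResult(part.toolName, part.input, part.output);
        }
      }
    }
  }
}
```

Both entry points delegate to the same method. A status restored from history can therefore differ from the live one only if the stored tool part itself differs.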

3. Refined completion status examples

Replaced generic examples with outcome-specific ones that show variety:

  • ✅ Success: 'PR checks pass and ready to merge'
  • ❌ Failure: 'CreateWorkspace Tests failed'
  • ⚠️ Warning: 'Encountered serious issue with design'

This encourages agents to communicate actual outcomes, not just completion.

Implementation Details

Status determination (StreamingMessageAggregator.ts:772-789):

```typescript
// A returned result can still report failure via { success: false }.
if (part.state === "output-available") {
  status = hasFailureResult(part.output) ? "failed" : "completed";
}
```
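The branch above can be read as a small pure function. In this sketch, `getToolDisplayStatus`, `ToolPartLike` and the fallback `"executing"` status for every other part state are assumptions. Only the `output-available` branch comes from the PR:

```typescript
// Hypothetical sketch: only the "output-available" branch is from the PR.
// Mapping every other part state to "executing" is an assumption.

type ToolDisplayStatus = "executing" | "completed" | "failed";

interface ToolPartLike {
  state: string;
  output?: unknown;
}

// True only for outputs shaped like { success: false, ... }.
function hasFailureResult(output: unknown): boolean {
  return (
    typeof output === "object" &&
    output !== null &&
    (output as { success?: unknown }).success === false
  );
}

function getToolDisplayStatus(part: ToolPartLike): ToolDisplayStatus {
  if (part.state === "output-available") {
    // Having a result is not the same as succeeding.
    return hasFailureResult(part.output) ? "failed" : "completed";
  }
  return "executing";
}
```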

Shared result processing (StreamingMessageAggregator.ts:530-552):

```typescript
private processToolResult(toolName: string, input: unknown, output: unknown): void {
  if (toolName === "todo_write" && hasSuccessResult(output)) {
    // Update todos...
  }
  if (toolName === "status_set" && hasSuccessResult(output)) {
    // Update agentStatus...
  }
}
```

Historical message processing (StreamingMessageAggregator.ts:199-213):

```typescript
loadHistoricalMessages(messages: CmuxMessage[]): void {
  for (const message of messages) {
    this.messages.set(message.id, message);
    if (message.role === "assistant") {
      for (const part of message.parts) {
        // Replay finished tool calls so derived state matches the live path.
        if (isDynamicToolPart(part) && part.state === "output-available") {
          this.processToolResult(part.toolName, part.input, part.output);
        }
      }
    }
  }
}
```

Testing

  • Added 5 new unit tests for status persistence and validation display
  • All 842 tests pass
  • Typecheck and lint pass

New tests:

  1. Show 'failed' status when validation fails
  2. Show 'completed' status when validation succeeds
  3. Reconstruct agentStatus from historical messages
  4. Use most recent status when loading multiple messages
  5. Don't reconstruct from failed status_set
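Tests 4 and 5 pin down two replay rules. The most recent successful `status_set` wins, and a failed call never overwrites state. The hypothetical `reconstructAgentStatus` below captures both rules over a chronologically ordered list of results. It assumes the simplified result shape shown:

```typescript
// Hypothetical shape: one entry per status_set call, oldest first.
interface StatusSetResult {
  input: { emoji: string; message: string };
  output: { success: boolean; error?: string };
}

type AgentStatus = StatusSetResult["input"];

function reconstructAgentStatus(results: StatusSetResult[]): AgentStatus | undefined {
  let status: AgentStatus | undefined;
  for (const result of results) {
    // Failed calls are skipped. Each success replaces the previous status,
    // so the most recent successful call wins.
    if (result.output.success) {
      status = result.input;
    }
  }
  return status;
}
```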

Before/After

Validation Failure Display

Before:

```
status_set({ emoji: "not-emoji", message: "test" })
→ Shows: completed ✓
→ Sidebar: No status appears
→ User: "Is validation failing silently?" 🤔
```

After:

```
status_set({ emoji: "not-emoji", message: "test" })
→ Shows: failed ✗ (emoji must be a single emoji character)
→ Sidebar: No status appears (correct - validation failed)
→ User: Clear feedback! ✨
```
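One way to implement the quoted rule is a Unicode property regex, sketched here as a hypothetical `validateEmoji` helper; the PR does not show the real check. The regex accepts one `Extended_Pictographic` code point with an optional variation selector, optionally joined to further pictographs by ZWJ, and rejects everything else:

```typescript
// Hypothetical validator for the quoted rule; the real cmux check may differ.
// Accepts one Extended_Pictographic code point, an optional U+FE0F variation
// selector, and further pictographs joined by U+200D (ZWJ).
// Built with the RegExp constructor so the "u" flag is handled at runtime.
const SINGLE_EMOJI = new RegExp(
  "^\\p{Extended_Pictographic}\\uFE0F?(?:\\u200D\\p{Extended_Pictographic}\\uFE0F?)*$",
  "u",
);

type ValidationResult = { success: true } | { success: false; error: string };

function validateEmoji(emoji: string): ValidationResult {
  if (SINGLE_EMOJI.test(emoji)) {
    return { success: true };
  }
  return { success: false, error: "emoji must be a single emoji character" };
}
```

This check is deliberately strict. Flags built from regional indicators and keycap sequences are rejected. A production validator would likely need a grapheme-aware approach such as `Intl.Segmenter`.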

Status Persistence

Before:

1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status disappears (even though it was successful)

After:

1. Agent sets status: "🔍 Analyzing code"
2. Reload page or switch workspace
3. Status persists ✅

Architecture Benefits

  1. Single source of truth: One method processes tool results, regardless of source
  2. Leverages existing data: Tool parts already contain everything we need
  3. No special reconstruction logic: Just process parts uniformly
  4. Easier to extend: Adding new derived state only requires updating processToolResult()
  5. Matches architecture: Historical messages are complete snapshots

Generated with cmux

Updated status_set tool description to remind agents to set a final
completion status before finishing their response.

Examples added: '✅ Complete', '🎉 Done', '✓ Finished'

This helps users understand when the agent has completed all work
and isn't just waiting or still processing.

Generated with `cmux`

Models don't understand 'stream' terminology. Changed to:
'The status is cleared at the start of each new response, so you must set it again.'

More concise and uses terminology models understand.

Generated with `cmux`

## Problem
When status_set validation failed (e.g., invalid emoji), the tool showed
'completed' status in the UI even though it failed. This made users think
validation was silently failing, especially when the status didn't appear
in the sidebar.

## Root Cause
Tool status determination only checked part.state === 'output-available',
which is true even for failed results. It didn't check result.success.

## Solution
1. Enhanced status determination to check result.success for tools that
   return { success: boolean } pattern
2. Show 'failed' status when result.success === false
3. Display error message in StatusSetToolCall UI for failed validations

## Testing
- Added unit tests for failed/completed status display
- All 839 tests pass
- Typecheck passes

When validation fails now:
- Tool shows 'failed' status (not 'completed')
- Error message displays in UI
- agentStatus NOT updated (correct existing behavior)
- Clear feedback to user that validation failed

ESLint flagged 'as unknown' as unnecessary since part.output is already
typed as unknown. Simplified to use part.output directly.

Extracted hasSuccessResult() and hasFailureResult() helpers to reduce
duplication. Both todo_write and status_set use the same pattern to check
result.success, and the display logic now uses the same helper.

This makes the code more maintainable and consistent.

Updated final status examples to reflect different outcomes:
- ✅ Success: 'PR checks pass and ready to merge'
- ❌ Failure: 'CreateWorkspace Tests failed'
- ⚠️ Warning/blocker: 'Encountered serious issue with design'

This encourages agents to communicate the actual outcome, not just
that work is complete.

## Problem
Status (and TODOs) weren't persisting across page reloads/workspace switches
because loadHistoricalMessages() didn't process tool results to reconstruct
derived state.

## Solution
Extracted processToolResult() as the single source of truth for updating derived
state from tool results. Called from:
1. handleToolCallEnd() - live streaming events
2. loadHistoricalMessages() - historical message loading

This ensures agentStatus and currentTodos are reconstructed uniformly whether
processing live events or historical snapshots.

## Implementation
- Extracted 23-line processToolResult() method
- Updated handleToolCallEnd() to call shared method (-12 LoC)
- Updated loadHistoricalMessages() to process tool parts (+6 LoC)
- Added 3 tests for historical message reconstruction

## Benefits
- Single source of truth for tool result processing
- No special reconstruction logic needed
- Easier to extend with new derived state
- Leverages existing tool part architecture

## Testing
- Added 3 new tests for historical message scenarios
- All 842 tests pass
- Typecheck and lint pass

Net: ~17 LoC

@ammar-agent changed the title from "🤖 Encourage agents to set final status before completing" to "🤖 fix: Show validation failures & persist status across reloads" on Oct 28, 2025

## Problem
The previous implementation cleared state at different times:
- TODOs: Cleared on stream-end (via cleanupStreamState)
- Status: Cleared on stream-start

This created a disparity between live and historical behavior, because stream events
aren't persisted in chat.jsonl. After a page reload, TODOs would stick around
(desirable!), but the implementation was inconsistent.

## Solution
Clear BOTH todos and agentStatus when a new user message arrives.

Why user messages?
- ✅ Persisted in chat.jsonl (unlike stream events)
- ✅ Consistent live/historical behavior
- ✅ Semantic: New question = new task = clear previous state

## Changes
1. Removed TODO clearing from cleanupStreamState()
2. Removed status clearing from handleStreamStart()
3. Added centralized clearing in handleMessage() when user message arrives
4. Updated test: 'stream-start' → 'new user message'
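The clearing rule can be sketched as follows. `DerivedState` and the flattened `ChatMessage` shape are assumptions for illustration; the shape carries successful status and TODO results already extracted from the message. The method name follows the `handleMessage()` mentioned above, but its body is hypothetical. Live handling and history replay go through the same method, so a reload reproduces the same state:

```typescript
// Minimal sketch, assuming a simplified message shape. Field names are
// illustrative, not the real cmux types.

interface ChatMessage {
  role: "user" | "assistant";
  // Successful status_set / todo_write results carried by an assistant message.
  status?: string;
  todos?: string[];
}

class DerivedState {
  agentStatus: string | undefined;
  currentTodos: string[] = [];

  handleMessage(message: ChatMessage): void {
    if (message.role === "user") {
      // New question = new task: drop status and TODOs from the previous one.
      this.agentStatus = undefined;
      this.currentTodos = [];
      return;
    }
    if (message.status !== undefined) this.agentStatus = message.status;
    if (message.todos !== undefined) this.currentTodos = message.todos;
  }
}
```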

## Benefits
- Consistent behavior whether loading from history or processing live
- TODOs persist until next question (user preference!)
- Simpler: One place to clear state, not scattered across handlers
- Architecture: Relies on persisted data, not ephemeral events

## Testing
- All 842 tests pass
- Updated test reflects new clearing behavior
- Typecheck and lint pass

@ammario changed the title from "🤖 fix: Show validation failures & persist status across reloads" to "🤖 fix: improve status_set tool description and display logic" on Oct 28, 2025
@ammario (Member) commented Oct 28, 2025

Depot runners are having issues per their status, so force-merging.

@ammario ammario merged commit 1202238 into main Oct 28, 2025
22 of 25 checks passed
@ammario ammario deleted the status-set-final-reminder branch October 28, 2025 18:48