-
Notifications
You must be signed in to change notification settings - Fork 13
π€ fix: improve status_set tool description and display logic #466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Updated status_set tool description to remind agents to set a final completion status before finishing their response. Examples added: 'β Complete', 'π Done', 'β Finished' This helps users understand when the agent has completed all work and isn't just waiting or still processing. Generated with `cmux`
Models don't understand 'stream' terminology. Changed to: 'The status is cleared at the start of each new response, so you must set it again.' More concise and uses terminology models understand. Generated with `cmux`
## Problem
When status_set validation failed (e.g., invalid emoji), the tool showed
'completed' status in the UI even though it failed. This made users think
validation was silently failing, especially when the status didn't appear
in the sidebar.
## Root Cause
Tool status determination only checked part.state === 'output-available',
which is true even for failed results. It didn't check result.success.
## Solution
1. Enhanced status determination to check result.success for tools that
return { success: boolean } pattern
2. Show 'failed' status when result.success === false
3. Display error message in StatusSetToolCall UI for failed validations
## Testing
- Added unit tests for failed/completed status display
- All 839 tests pass
- Typecheck passes
When validation fails now:
- Tool shows 'failed' status (not 'completed')
- Error message displays in UI
- agentStatus NOT updated (correct existing behavior)
- Clear feedback to user that validation failed
ESLint flagged 'as unknown' as unnecessary since part.output is already typed as unknown. Simplified to use part.output directly.
Extracted hasSuccessResult() and hasFailureResult() helpers to reduce duplication. Both todo_write and status_set use the same pattern to check result.success, and the display logic now uses the same helper. This makes the code more maintainable and consistent.
Updated final status examples to reflect different outcomes: - β Success: 'PR checks pass and ready to merge' - β Failure: 'CreateWorkspace Tests failed' -β οΈ Warning/blocker: 'Encountered serious issue with design' This encourages agents to communicate the actual outcome, not just that work is complete.
ammario
approved these changes
Oct 28, 2025
## Problem Status (and TODOs) weren't persisting across page reloads/workspace switches because loadHistoricalMessages() didn't process tool results to reconstruct derived state. ## Solution Extracted processToolResult() as single source of truth for updating derived state from tool results. Called from: 1. handleToolCallEnd() - live streaming events 2. loadHistoricalMessages() - historical message loading This ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots. ## Implementation - Extracted 23-line processToolResult() method - Updated handleToolCallEnd() to call shared method (-12 LoC) - Updated loadHistoricalMessages() to process tool parts (+6 LoC) - Added 3 tests for historical message reconstruction ## Benefits - Single source of truth for tool result processing - No special reconstruction logic needed - Easier to extend with new derived state - Leverages existing tool part architecture ## Testing - Added 3 new tests for historical message scenarios - All 842 tests pass - Typecheck and lint pass Net: ~17 LoC
ammario
approved these changes
Oct 28, 2025
## Problem Previous implementation cleared state at different times: - TODOs: Cleared on stream-end (via cleanupStreamState) - Status: Cleared on stream-start This created historical/live disparity because stream events aren't persisted in chat.jsonl. After page reload, TODOs would stick around (desirable!) but the implementation was inconsistent. ## Solution Clear BOTH todos and agentStatus when new user message arrives. Why user messages? - β Persisted in chat.jsonl (unlike stream events) - β Consistent live/historical behavior - β Semantic: New question = new task = clear previous state ## Changes 1. Removed TODO clearing from cleanupStreamState() 2. Removed status clearing from handleStreamStart() 3. Added centralized clearing in handleMessage() when user message arrives 4. Updated test: 'stream-start' β 'new user message' ## Benefits - Consistent behavior whether loading from history or processing live - TODOs persist until next question (user preference!) - Simpler: One place to clear state, not scattered across handlers - Architecture: Relies on persisted data, not ephemeral events ## Testing - All 842 tests pass - Updated test reflects new clearing behavior - Typecheck and lint pass
ammario
approved these changes
Oct 28, 2025
Member
|
Depot runners are having issues per their status, so force-merging. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
Fixes two critical bugs with the
status_settool and refines completion status examples.Problems Fixed
1. Validation failures showed as 'completed' β instead of 'failed' β
When
status_setvalidation failed (e.g., invalid emoji), the tool displayed 'completed' status even though it failed. This made users think validation was silently failing, especially when the status didn't appear in the sidebar.Root cause: Tool status determination only checked
part.state === 'output-available'(meaning the tool returned a result), but didn't check whether that result indicated success or failure viaresult.success.2. Status didn't persist across page reloads or workspace switches
Workspaces with successful
status_setcalls would lose their status after reload becauseloadHistoricalMessages()didn't reconstruct derived state from tool results.Root cause: Derived state (agentStatus, currentTodos) was only updated during live streaming events, not when loading historical messages.
Solutions
1. Show 'failed' status for validation errors
result.successfor tools returning{ success: boolean }hasSuccessResult()andhasFailureResult()helpers to eliminate duplication2. Shared tool result processing
Extracted
processToolResult()as single source of truth for updating derived state. Called from:handleToolCallEnd()- live streaming eventsloadHistoricalMessages()- historical message loadingThis ensures agentStatus and currentTodos are reconstructed uniformly whether processing live events or historical snapshots.
3. Refined completion status examples
Replaced generic examples with outcome-specific ones that show variety:
Encourages agents to communicate actual outcomes, not just completion.
Implementation Details
Status determination (StreamingMessageAggregator.ts:772-789):
Shared result processing (StreamingMessageAggregator.ts:530-552):
Historical message processing (StreamingMessageAggregator.ts:199-213):
Testing
New tests:
Before/After
Validation Failure Display
Before:
After:
Status Persistence
Before:
After:
Architecture Benefits
processToolResult()Generated with
cmux