- 
                Notifications
    
You must be signed in to change notification settings  - Fork 36
 
refactor: standardize Observation base class #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…atus helpers - Add ObservationStatus enum and result_status property - Add standardized output and error fields with default to_llm_content - Keep subclasses backward compatible by using bool(error) in has_error Co-authored-by: openhands <openhands@all-hands.dev>
…mpat - FinishObservation: rename message->output, add message alias; rely on base to_llm_content/visualize - ThinkObservation: rename content->output, add content alias; rely on base defaults - MCPToolObservation: adopt base error semantics while preserving is_error; unify has_error and formatting - TaskTrackerObservation: rename content->output; use base to_llm_content; keep rich visualize - FileEditorObservation: standardize error formatting prefix as 'Error: ' This aligns all observation classes with the unified Observation base (output: str, error: str|None, has_error, result_status) and removes duplicated logic where possible. Co-authored-by: openhands <openhands@all-hands.dev>
          
Coverage Report •
  | 
    |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
- Fix remaining assertion in test_registry to use .output - Ensure subclass output fields provide defaults to satisfy pyright - Adjust ExecuteBashObservation output default Co-authored-by: openhands <openhands@all-hands.dev>
…ed Observation Co-authored-by: openhands <openhands@all-hands.dev>
… base to_llm_content; update tests Co-authored-by: openhands <openhands@all-hands.dev>
| 
           @OpenHands Let's add a small modification: Also let's be more consistent with how observations sub-class the base observation observations should not override 'to_llm_content': to_llm_content is flexible enough so that any error message or any content can be sent back to the llm  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
- Update _format_error to use 'Tool Execution Error:' prefix - Remove command field from DelegateObservation and TaskTrackerObservation - Remove unnecessary to_llm_content override in DelegateObservation - Update delegate/impl.py to use error field for errors (not output) - Prefix success messages with command name (spawn:/delegate:/plan:/view:) - Update TaskTrackerExecutor to handle errors via error field - Update tests to match new observation structure - Ensure output is not populated when error is present These changes make observations more consistent by: 1. Eliminating the redundant command field 2. Ensuring errors and outputs are mutually exclusive 3. Relying on base class to_llm_content when possible 4. Keeping command context in the message text when needed Co-authored-by: openhands <openhands@all-hands.dev>
          SummaryI've successfully improved observation consistency across the codebase as requested. The changes have been committed and pushed to PR #929 ( Changes Made✅ All requested modifications completed: 
 Files Modified
 Test Results✅ All 77 SDK tool tests pass The code is now more consistent, with clear separation between errors and outputs, and observations that follow a standardized pattern.  | 
    
- Remove custom to_llm_content override from MCPToolObservation - Rename convrted_content to converted_content (typo fix) - Populate error field with full content when MCP result has error - Populate output field with formatted content for successful results - Update tests to match new behavior (single TextContent vs list) - Error messages now include: header + error notice + content - Success messages now include: header + content This makes MCPToolObservation consistent with other observations by relying on the base class to_llm_content and properly separating error vs output fields. Co-authored-by: openhands <openhands@all-hands.dev>
- Remove output field from multiple commands error case - Move full error message to error field instead - Update test to check error field instead of output field - Fix line length by extracting commands_list variable This makes ExecuteBashObservation consistent with the pattern that errors should use the error field, not the output field. Co-authored-by: openhands <openhands@all-hands.dev>
The FileEditorObservation.to_llm_content was simply returning error or output, which is exactly what the base Observation class now provides by default. Removing this override allows the observation to use the standardized base implementation. Co-authored-by: openhands <openhands@all-hands.dev>
…interface - Restored 'command' field to TaskTrackerObservation - Changed 'output' back to 'content' field - Restored original visualize method that uses command field to determine styling - Added custom to_llm_content implementation - Updated executor to populate command field in observations This preserves the original interface while maintaining consistency with the base observation standardization changes. Co-authored-by: openhands <openhands@all-hands.dev>
…o_llm_content
- Added command: str | None field to base Observation class
- Updated to_llm_content to prepend 'Command: {command}' when command is present
- Removed to_llm_content overrides from simple observations (Glob, Grep, TaskTracker)
- Updated executors to populate output field instead of defining custom to_llm_content
- Kept complex to_llm_content overrides (Browser, ExecuteBash) that handle images and metadata
Co-authored-by: openhands <openhands@all-hands.dev>
    - Remove 'output', 'error', 'command' field redeclarations from observation subclasses where possible - Update MCPToolObservation to use output field instead of _content field - Store text content in output, images in separate images field - Update MCP tests to match new structure - Keep FileEditorObservation.command field (required for internal logic) - When error occurs, populate only error field, not output Co-authored-by: openhands <openhands@all-hands.dev>
- Update ThinkExecutor to set output='Your thought has been logged.' - Update FinishExecutor to set output=action.message - Fix test_view_batch_atomicity.py to use output instead of content - Fix test_mcp_tool_kind_field.py import issues - Add Sequence import to glob and grep definitions These changes ensure observations consistently use the base output field rather than custom fields, aligning with the standardization effort. Co-authored-by: openhands <openhands@all-hands.dev>
- Changed base Observation._format_error to use 'Tool Execution Error:' prefix - Ensured observations don't populate output when there is an error - Removed command parameter from ExecuteBashObservation instantiations - Updated ExecuteBashObservation to override output field as str - Updated ExecuteBashObservation.to_llm_content to return error without output - Fixed terminal_session.py to only populate error field in error cases - Removed 'ERROR:' prefix from error messages (base class adds prefix) - Updated test_bash_session.py to check error field instead of output - Updated all observation tests to work with new error handling Co-authored-by: openhands <openhands@all-hands.dev>
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
- Removed custom content field, use base class output field instead - Converted content items are now stored directly in output field - Error messages include all content combined into error string - Removed unnecessary complexity from from_call_tool_result - Updated tests to match new structure (header + content items) Co-authored-by: openhands <openhands@all-hands.dev>
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
| 
           @OpenHands MCP For openhands-sdk/openhands/sdk/tool/builtins/finish.py For openhands-tools/openhands/tools/delegate/definition.py openhands-tools/openhands/tools/delegate/impl.py For openhands-tools/openhands/tools/execute_bash/terminal/terminal_session.py For file Editor openhands-tools/openhands/tools/file_editor/definition.py For task tracker Make sure all tests and pre-commit hooks are passing  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
…fields - Updated all Observation subclasses to use standardized output and error fields - Added command property to all observations with type narrowing where needed - MCP observations: removed redundant to_llm_content overrides, removed error header duplication - FinishObservation/ThinkObservation: empty output list as specified - DelegateObservation: added cmd field and command property with Literal type - ExecuteBashObservation: added cmd field and command property, use raw_output for command output - FileEditorObservation: added cmd field with Literal type narrowing - TaskTrackerObservation: added cmd field with Literal type narrowing - BrowserObservation: wrapped output strings in TextContent - GrepObservation/GlobObservation: wrapped output strings in TextContent - Updated all tests to use new field names (cmd, raw_output) and TextContent wrapping Co-authored-by: openhands <openhands@all-hands.dev>
| 
           @OpenHands there are failing actions in this PR. Please fix them =========================== short test summary info ============================ 0 errors, 0 warnings, 0 informations  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
- Added raw_output property to ExecuteBashObservation for backward compatibility - Property extracts text from first TextContent item in output field - Enables tests and code to access raw output text seamlessly - Updated test files to use correct observation field names - Changed test_stuck_detector.py to use output=[TextContent(text=...)] instead of raw_output= - Tests can still access raw_output via the property for assertions - All previously failing tests now pass: - test_mask_secrets - test_mask_changing_secrets - test_masking_persists - test_history_too_short - test_repeating_action_observation_not_stuck_less_than_4_repeats - test_repeating_action_observation_stuck - test_not_stuck_with_different_actions - test_reset_after_user_message Co-authored-by: openhands <openhands@all-hands.dev>
          SummaryI've successfully fixed all 8 failing tests in PR #929. The changes have been committed and pushed to the  Problem IdentifiedThe PR standardized the  
 However, tests were still using the old  
 Solution Implemented1. Added  
 2. Updated test instantiations (tests/cross/test_stuck_detector.py) 
 Verification✅ All 8 previously failing tests now pass: 
 ✅ All 16 tests in affected files pass ✅ Pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright) The changes are minimal, focused, and maintain backward compatibility while adhering to the new standardized Observation base class pattern.  | 
    
| 
           @enyst  I dont know how I feel about typing  
  | 
    
- Removed command property and to_llm_content serialization from base Observation - Renamed cmd field to command in all Observation subclasses: * ExecuteBashObservation * FileEditorObservation * DelegateObservation * TaskTrackerObservation - Updated all executor implementations to use command instead of cmd - Updated test files to use command field consistently - Fixed subprocess.TimeoutExpired to use correct parameter name (cmd) Co-authored-by: openhands <openhands@all-hands.dev>
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
- Updated PlanningFileEditorObservation initialization to use command field - Aligns with FileEditorObservation parent class field naming Co-authored-by: openhands <openhands@all-hands.dev>
- Fixed model_copy update to use 'command' field instead of 'cmd' - Ensures [RESET] prefix is properly included in command field Co-authored-by: openhands <openhands@all-hands.dev>
| 
           @OpenHands  /codereview-roasted  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
          SummaryI've completed a critical code review of PR #929 focusing on the  Key Findings🔴 Critical Issues Identified:
 📋 Recommended Solution:Remove the  class Observation(Schema, ABC):
    """Base schema for output observation."""
    
    # ✅ Keep: Standardized error handling
    error: str | None = Field(default=None, ...)
    
    @property
    def has_error(self) -> bool:
        return bool(self.error)
    
    @property
    def result_status(self) -> ObservationStatus:
        return ObservationStatus.ERROR if self.has_error else ObservationStatus.SUCCESS
    
    # ✅ Keep: Interface contract
    @property
    @abstractmethod
    def to_llm_content(self) -> Sequence[TextContent | ImageContent]:
        """Subclasses implement their own formatting."""Let each observation use natural data structures: 
 🎯 Core Insight:The refactor standardizes the wrong thing: 
 "Bad programmers worry about the code. Good programmers worry about data structures." The current  No code changes were made - this was a code review task as requested with the   | 
    
- Added Observation.text_output() static method for easy text wrapping - Added output_as_text property for easy text extraction - Updated ExecuteBash and Glob to use new helpers - Reduces ceremony of wrapping/unwrapping TextContent This addresses the typing complexity while maintaining the standardized 'output' field across all observations (vs 'message', 'content', etc.) Co-authored-by: openhands <openhands@all-hands.dev>
This reverts commit b252b15.
| 
           @OpenHands read the diff of this branch versus main. This way, for every Observation subclass where the output should be a str ( = all observation subclasses that aren't MCP)  simply do output = < some str> Update to_llm_content so it checks what type is output and returns the correct llm content Make sure all pre-commit checks and all tests are passing  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
- Updated Observation.output type to str | list[TextContent | ImageContent] - Most non-MCP tools now use simple str output for efficiency - MCP tools continue to use list[TextContent | ImageContent] for rich content - Updated to_llm_content to handle both types with isinstance checks - Updated all observation subclasses (ExecuteBash, FileEditor, Browser) to use str - Updated test helper functions to handle both output types - All pre-commit checks and tests passing Co-authored-by: openhands <openhands@all-hands.dev>
          Summary of ChangesI have successfully updated the  ✅ Request Checklist
 ✅ Verification
 📦 Changes Summary11 files modified with a net change of +102 insertions, -73 deletions: 
 The changes have been committed and pushed to the   | 
    
      
        
              This comment has been minimized.
        
        
      
    
  This comment has been minimized.
- Updated delegate, grep, glob, and task_tracker tools to use str output - Simplified example 02_custom_tools.py to use assert for type narrowing - Fixed test helper assert_browser_observation_success to handle both str and list[TextContent] output types - Removed unused TextContent imports from tool implementations This completes the migration to the new Observation.output type that supports both str and list[TextContent | ImageContent]. Co-authored-by: openhands <openhands@all-hands.dev>
| 
           Looks like there are a few issues preventing this PR from being merged! 
 If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings  | 
    
| 
           @OpenHands Please fix all test failures Delegate now uses output: str in Observation  | 
    
| 
           I'm on it! simonrosenberg can track my progress at all-hands.dev  | 
    
The new Observation base class now uses output: str | list[TextContent | ImageContent] instead of just list[TextContent]. The DelegateExecutor correctly returns observations with output as a string, so the tests need to be updated to check the string directly instead of treating it as a list and accessing output[0]. Changes: - Updated test_delegate_observation_creation to create observations with string output - Updated test_delegate_executor_delegate to check output as a string - Verified to_llm_content property returns the expected TextContent list Co-authored-by: openhands <openhands@all-hands.dev>
          SummaryI've successfully fixed the test failure in PR #929 related to the standardized Observation base class changes. ProblemThe test  Changes MadeUpdated  
 Verification✅ All 5 delegation tests now pass The fix is minimal, focused, and maintains backward compatibility with the new Observation base schema.  | 
    
Summary
Motivation and Context
There were 9 different observation classes with minor variations in how they handled errors and outputs. This refactor centralizes those patterns in the base class to reduce duplication and ensure consistent agent-visible content.
What changed
Backward Compatibility
Tests
Related Issues
Checklist
Co-authored-by: openhands openhands@all-hands.dev
@simonrosenberg can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
golang:1.21-bookwormeclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22Pull (multi-arch manifest)
Run
All tags pushed for this build
The
a9185dctag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.