refactor: standardize Observation base class #929

simonrosenberg · 2025-10-27T20:33:58Z

Summary

Standardized the Observation base class to unify error handling and output formatting across tools.
Introduced ObservationStatus enum and result_status property to compute success/error state.
Added standardized output: str and error: str | None fields in the base.
Implemented a default to_llm_content that returns a formatted error when present, otherwise the output.
Kept backward compatibility by having has_error use bool(self.error) to support both string and boolean-style error flags in existing subclasses.

Motivation and Context
There were 9 different observation classes with minor variations in how they handled errors and outputs. This refactor centralizes those patterns in the base class to reduce duplication and ensure consistent agent-visible content.

What changed

openhands-sdk/openhands/sdk/tool/schema.py
- Added ObservationStatus enum
- Added standardized output and error fields
- Added has_error property and result_status computation
- Added _format_error helper and default to_llm_content implementation
- Kept visualize behavior consistent with to_llm_content

Backward Compatibility

Subclasses can still define their own fields and override to_llm_content/visualize as needed
has_error uses bool(self.error) to accommodate legacy subclasses that used error: bool or is_error: bool in their own models without breaking existing behavior
No changes were required to existing tools or tests; existing custom to_llm_content implementations (e.g., FileEditor, ExecuteBash, Browser, MCP) continue to work as before and take precedence over the default

Tests

Ran pre-commit and the full test suite locally. Only unrelated e2e/environment-dependent tests failed. Schema immutability tests pass with the new base properties.

Related Issues

Closes refactor: standardize Observation class #923

Checklist

Pre-commit hooks pass for modified files
Types checked with pyright
Minimal, focused change touching only the Observation base to standardize behavior

Co-authored-by: openhands openhands@all-hands.dev

@simonrosenberg can click here to continue refining the PR

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Base Image	Docs / Tags
golang	`golang:1.21-bookworm`	Link
java	`eclipse-temurin:17-jdk`	Link
python	`nikolaik/python-nodejs:python3.12-nodejs22`	Link

Pull (multi-arch manifest)

docker pull ghcr.io/openhands/agent-server:a9185dc-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a9185dc-python \
  ghcr.io/openhands/agent-server:a9185dc-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a9185dc-golang
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary
ghcr.io/openhands/agent-server:a9185dc-java
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary
ghcr.io/openhands/agent-server:a9185dc-python
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The a9185dc tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

…atus helpers - Add ObservationStatus enum and result_status property - Add standardized output and error fields with default to_llm_content - Keep subclasses backward compatible by using bool(error) in has_error Co-authored-by: openhands <openhands@all-hands.dev>

…mpat - FinishObservation: rename message->output, add message alias; rely on base to_llm_content/visualize - ThinkObservation: rename content->output, add content alias; rely on base defaults - MCPToolObservation: adopt base error semantics while preserving is_error; unify has_error and formatting - TaskTrackerObservation: rename content->output; use base to_llm_content; keep rich visualize - FileEditorObservation: standardize error formatting prefix as 'Error: ' This aligns all observation classes with the unified Observation base (output: str, error: str|None, has_error, result_status) and removes duplicated logic where possible. Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2025-10-27T21:19:51Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/mcp
definition.py	51	31	39%	47, 60–61, 63–67, 73, 78, 81, 84, 86, 98, 106–107, 109–113, 115–116, 118–124, 126
tool.py	82	35	57%	53–56, 60, 63, 66–69, 80, 136–137, 140–144, 146–148, 153, 174, 176–177, 181–183, 215–216, 221, 228–229, 231, 257
openhands-sdk/openhands/sdk/tool
schema.py	129	18	86%	24–26, 28, 37, 119, 215, 221, 225, 236, 243, 253–257, 259–260
openhands-sdk/openhands/sdk/tool/builtins
finish.py	18	5	72%	25–28, 59
think.py	24	11	54%	27, 30–31, 34, 36–40, 42, 71
openhands-tools/openhands/tools/browser_use
definition.py	122	29	76%	37–38, 41–42, 44, 48, 52–61, 63–64, 66, 114, 173, 229, 282, 338, 391, 438, 485, 540, 591
impl.py	177	118	33%	25, 39, 43–46, 48, 58–61, 66, 68–70, 72–73, 80–82, 84–88, 102, 114, 154–155, 166–167, 181, 187, 201–202, 204–213, 216–225, 227–228, 230–234, 238, 240–241, 246–247, 251–252, 257–258, 262–263, 267–268, 272, 274–275, 277–280, 283–284, 288, 290, 292, 297–298, 302–303, 307–308, 313–314, 320–324, 328–333, 337–340, 342–344, 347, 351–354
openhands-tools/openhands/tools/delegate
definition.py	20	4	80%	104, 107, 113, 116
impl.py	97	81	16%	29, 31–32, 41–42, 46, 52–53, 56–59, 61, 77–78, 83–84, 93–97, 99, 101, 107, 114–115, 117–119, 124–126, 142–143, 149–151, 159, 161–163, 165, 167–170, 173–176, 178–179, 183–186, 189–191, 196–197, 200–201, 204, 206–210, 212, 215–217, 219–220, 223, 225, 230–232
openhands-tools/openhands/tools/execute_bash
definition.py	113	71	37%	52, 55, 58–59, 61, 64–66, 68–70, 72–74, 76, 100, 108–109, 113–115, 118–119, 121–122, 124–131, 136, 139–141, 144–145, 147–148, 150, 152–154, 156, 160–161, 164–166, 168–169, 172–175, 179–181, 186, 190–192, 195–197, 201–202, 204, 283
impl.py	80	31	61%	59, 62, 70–71, 98–100, 102–103, 109, 111, 115, 131, 134, 137–138, 143–144, 147–148, 150, 157–158, 160, 167, 175, 185–186, 195–196, 198
openhands-tools/openhands/tools/execute_bash/terminal
terminal_session.py	182	63	65%	90, 96, 100–102, 124–125, 152, 167–168, 202–204, 209, 212–213, 217, 223, 237–239, 244, 247–248, 252, 258, 277, 279, 282, 284, 300, 315, 320, 328, 331, 364, 368, 371, 374–375, 381–382, 388, 393–394, 400–401, 460–462, 466, 471, 476–477, 481–482, 485–488, 494–495, 498
openhands-tools/openhands/tools/file_editor
definition.py	56	8	85%	98, 118, 121, 124, 131, 133, 135, 137
openhands-tools/openhands/tools/glob
definition.py	24	7	70%	74, 76–78, 81, 84, 91
impl.py	92	77	16%	37–40, 55–56, 58–60, 62, 65, 69–70, 77–78, 80, 83–84, 89–90, 94–95, 100, 108, 110–112, 114–116, 118, 139, 149, 154–158, 160–161, 163, 165, 181–183, 187, 190, 193, 197–198, 200–202, 205–206, 208, 210, 212, 244–245, 248, 251, 253, 256–257, 260–264, 266, 268–269, 271, 273–274, 276
openhands-tools/openhands/tools/grep
definition.py	26	7	73%	76, 78–80, 83, 86, 93
impl.py	80	67	16%	35–38, 53, 55–58, 66, 69–72, 80–81, 83, 85, 87–89, 91–93, 95, 112–113, 116, 121–123, 127–128, 132, 139, 149–150, 153, 158–162, 164–165, 167, 169, 177, 191, 204–205, 208, 213–216, 219–220, 222–224, 226–228, 230, 232, 240
openhands-tools/openhands/tools/task_tracker
definition.py	129	83	35%	52, 55–57, 59–60, 63–64, 66, 82, 84, 86–87, 90, 93–95, 97–98, 101–107, 109–111, 114, 116–119, 121, 124, 127–128, 130–131, 133–134, 136, 165, 167, 169–171, 178, 180–181, 188–189, 195, 206–207, 209–211, 215–216, 218–221, 223, 228, 234–238, 242, 246–247, 249–250, 252, 254–258
TOTAL	11454	5246	54%

- Fix remaining assertion in test_registry to use .output - Ensure subclass output fields provide defaults to satisfy pyright - Adjust ExecuteBashObservation output default Co-authored-by: openhands <openhands@all-hands.dev>

…ed Observation Co-authored-by: openhands <openhands@all-hands.dev>

… base to_llm_content; update tests Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2025-10-28T10:57:01Z

@OpenHands Let's add a small modification:
def _format_error(self) -> TextContent:
return TextContent(text=f"Tool Execution Error: {self.error}")

Also let's be more consistent with how observations sub-class the base observation

observations should not override 'to_llm_content': to_llm_content is flexible enough so that any error message or any content can be sent back to the llm
when there is an error, output must not be populated in the Observation
For delegate/definition.py, this is useless return super().to_llm_content
For delegation: this should be an error not an output output=f"Unsupported command: {action.command}",. Same for all other errors.
Remove 'command' from the observation. Simply add it in the message if needed. This PR should not modify too much what is sent to the llm "to_llm_content"

openhands-ai · 2025-10-28T10:57:10Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

- Update _format_error to use 'Tool Execution Error:' prefix - Remove command field from DelegateObservation and TaskTrackerObservation - Remove unnecessary to_llm_content override in DelegateObservation - Update delegate/impl.py to use error field for errors (not output) - Prefix success messages with command name (spawn:/delegate:/plan:/view:) - Update TaskTrackerExecutor to handle errors via error field - Update tests to match new observation structure - Ensure output is not populated when error is present These changes make observations more consistent by: 1. Eliminating the redundant command field 2. Ensuring errors and outputs are mutually exclusive 3. Relying on base class to_llm_content when possible 4. Keeping command context in the message text when needed Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-10-28T11:09:04Z

Summary

I've successfully improved observation consistency across the codebase as requested. The changes have been committed and pushed to PR #929 (openhands/standardize-observation-base).

Changes Made

✅ All requested modifications completed:

Updated error format in _format_error():
- Changed from "Error: {self.error}" to "Tool Execution Error: {self.error}"
Removed unnecessary to_llm_content overrides:
- Removed useless super().to_llm_content call from DelegateObservation
- Left MCPToolObservation's override intact (necessary for structured content)
Ensured errors and outputs are mutually exclusive:
- When error is set, output is not populated (and vice versa)
- All error cases now use the error field
- All success cases now use the output field
Fixed delegation error handling:
- Moved all error messages from output to error field
- Unsupported commands, spawn errors, and delegate errors all use error field
Removed command field from observations:
- Removed from DelegateObservation
- Removed from TaskTrackerObservation
- Added command context as message prefix (e.g., "spawn:", "delegate:", "plan:", "view:")
Minimized LLM content changes:
- Only removed the unnecessary override
- Base class to_llm_content handles most cases
- Custom implementations remain where needed (structured content)

Files Modified

openhands-sdk/openhands/sdk/tool/schema.py - Updated error format
openhands-tools/openhands/tools/delegate/definition.py - Removed command field and unnecessary override
openhands-tools/openhands/tools/delegate/impl.py - Fixed error handling, added command prefixes
openhands-tools/openhands/tools/task_tracker/definition.py - Removed command field, updated executor
tests/tools/delegation/test_delegation.py - Updated tests to match new structure

Test Results

✅ All 77 SDK tool tests pass
✅ All 189+ tool tests pass
✅ All pre-commit checks pass (format, lint, type check)
✅ Net reduction of 21 lines of code

The code is now more consistent, with clear separation between errors and outputs, and observations that follow a standardized pattern.

_{View full conversation}

- Remove custom to_llm_content override from MCPToolObservation - Rename convrted_content to converted_content (typo fix) - Populate error field with full content when MCP result has error - Populate output field with formatted content for successful results - Update tests to match new behavior (single TextContent vs list) - Error messages now include: header + error notice + content - Success messages now include: header + content This makes MCPToolObservation consistent with other observations by relying on the base class to_llm_content and properly separating error vs output fields. Co-authored-by: openhands <openhands@all-hands.dev>

- Remove output field from multiple commands error case - Move full error message to error field instead - Update test to check error field instead of output field - Fix line length by extracting commands_list variable This makes ExecuteBashObservation consistent with the pattern that errors should use the error field, not the output field. Co-authored-by: openhands <openhands@all-hands.dev>

The FileEditorObservation.to_llm_content was simply returning error or output, which is exactly what the base Observation class now provides by default. Removing this override allows the observation to use the standardized base implementation. Co-authored-by: openhands <openhands@all-hands.dev>

…interface - Restored 'command' field to TaskTrackerObservation - Changed 'output' back to 'content' field - Restored original visualize method that uses command field to determine styling - Added custom to_llm_content implementation - Updated executor to populate command field in observations This preserves the original interface while maintaining consistency with the base observation standardization changes. Co-authored-by: openhands <openhands@all-hands.dev>

…o_llm_content - Added command: str | None field to base Observation class - Updated to_llm_content to prepend 'Command: {command}' when command is present - Removed to_llm_content overrides from simple observations (Glob, Grep, TaskTracker) - Updated executors to populate output field instead of defining custom to_llm_content - Kept complex to_llm_content overrides (Browser, ExecuteBash) that handle images and metadata Co-authored-by: openhands <openhands@all-hands.dev>

- Remove 'output', 'error', 'command' field redeclarations from observation subclasses where possible - Update MCPToolObservation to use output field instead of _content field - Store text content in output, images in separate images field - Update MCP tests to match new structure - Keep FileEditorObservation.command field (required for internal logic) - When error occurs, populate only error field, not output Co-authored-by: openhands <openhands@all-hands.dev>

- Update ThinkExecutor to set output='Your thought has been logged.' - Update FinishExecutor to set output=action.message - Fix test_view_batch_atomicity.py to use output instead of content - Fix test_mcp_tool_kind_field.py import issues - Add Sequence import to glob and grep definitions These changes ensure observations consistently use the base output field rather than custom fields, aligning with the standardization effort. Co-authored-by: openhands <openhands@all-hands.dev>

- Changed base Observation._format_error to use 'Tool Execution Error:' prefix - Ensured observations don't populate output when there is an error - Removed command parameter from ExecuteBashObservation instantiations - Updated ExecuteBashObservation to override output field as str - Updated ExecuteBashObservation.to_llm_content to return error without output - Fixed terminal_session.py to only populate error field in error cases - Removed 'ERROR:' prefix from error messages (base class adds prefix) - Updated test_bash_session.py to check error field instead of output - Updated all observation tests to work with new error handling Co-authored-by: openhands <openhands@all-hands.dev>

- Removed custom content field, use base class output field instead - Converted content items are now stored directly in output field - Error messages include all content combined into error string - Removed unnecessary complexity from from_call_tool_result - Updated tests to match new structure (header + content items) Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2025-10-31T08:27:19Z

@OpenHands
I updated the schema of the tool Observation: openhands-sdk/openhands/sdk/tool/schema.py
I added a property ‘command: str’. This property can be narrowed in some subtypes to Literal[ …]  I also updated the to_llm_content to serialize command, error and output. Your job is to edit all subclasses of Observation and make sure all subclasses adhere to that base behavior.
Some subclasses had a “command” field on the main branch. Make sure to add it back as a command property. It’s now a property because we need type narrowing: some classes had command: str while others had command: Literal[…]

MCP
openhands-sdk/openhands/sdk/mcp/definition.py
openhands-sdk/openhands/sdk/mcp/tool.py
No need to do initial_message += "\n[An error occurred during execution.]"
Because format error already adds a error msg header

For openhands-sdk/openhands/sdk/tool/builtins/finish.py
And openhands-sdk/openhands/sdk/tool/builtins/think.py
Do NOT populate any field in FinishObservation. The output must be an empty list.

For openhands-tools/openhands/tools/delegate/definition.py
Add command: Literal[“spawn”, “delegate”]

openhands-tools/openhands/tools/delegate/impl.py
When creating DelegateObservation, put back the command=action.command

For openhands-tools/openhands/tools/execute_bash/terminal/terminal_session.py
Reinsert command everywhere in all ExecuteBashObservation
return ExecuteBashObservation(
output=command_output,
metadata=metadata,
)

For file Editor openhands-tools/openhands/tools/file_editor/definition.py
Add command as Literal and make sure implementation works

For task tracker
openhands-tools/openhands/tools/task_tracker/definition.py
Make sure command works well

Make sure all tests and pre-commit hooks are passing

openhands-ai · 2025-10-31T08:27:28Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

…fields - Updated all Observation subclasses to use standardized output and error fields - Added command property to all observations with type narrowing where needed - MCP observations: removed redundant to_llm_content overrides, removed error header duplication - FinishObservation/ThinkObservation: empty output list as specified - DelegateObservation: added cmd field and command property with Literal type - ExecuteBashObservation: added cmd field and command property, use raw_output for command output - FileEditorObservation: added cmd field with Literal type narrowing - TaskTrackerObservation: added cmd field with Literal type narrowing - BrowserObservation: wrapped output strings in TextContent - GrepObservation/GlobObservation: wrapped output strings in TextContent - Updated all tests to use new field names (cmd, raw_output) and TextContent wrapping Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2025-10-31T14:24:46Z

@OpenHands there are failing actions in this PR. Please fix them
Errors

=========================== short test summary info ============================
Exception ignored in atexit callback: <bound method LocalConversation.close of <openhands.sdk.conversation.impl.local_conversation.LocalConversation object at 0x7f8f813b9460>>
FAILED tests/cross/test_agent_secrets_integration.py::test_mask_secrets - AttributeError: 'ExecuteBashObservation' object has no attribute 'raw_output'
FAILED tests/cross/test_agent_secrets_integration.py::test_mask_changing_secrets - AttributeError: 'ExecuteBashObservation' object has no attribute 'raw_output'
FAILED tests/cross/test_agent_secrets_integration.py::test_masking_persists - AttributeError: 'ExecuteBashObservation' object has no attribute 'raw_output'
FAILED tests/cross/test_stuck_detector.py::test_history_too_short - pydantic_core._pydantic_core.ValidationError: 1 validation error for ExecuteBashObservation
raw_output
Extra inputs are not permitted [type=extra_forbidden, input_value='file1.txt\nfile2.txt', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
FAILED tests/cross/test_stuck_detector.py::test_repeating_action_observation_not_stuck_less_than_4_repeats - pydantic_core._pydantic_core.ValidationError: 1 validation error for ExecuteBashObservation
raw_output
Extra inputs are not permitted [type=extra_forbidden, input_value='file1.txt\nfile2.txt', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
FAILED tests/cross/test_stuck_detector.py::test_repeating_action_observation_stuck - pydantic_core._pydantic_core.ValidationError: 1 validation error for ExecuteBashObservation
raw_output
Extra inputs are not permitted [type=extra_forbidden, input_value='file1.txt\nfile2.txt', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
FAILED tests/cross/test_stuck_detector.py::test_not_stuck_with_different_actions - pydantic_core._pydantic_core.ValidationError: 1 validation error for ExecuteBashObservation
raw_output
Extra inputs are not permitted [type=extra_forbidden, input_value='output from ls', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
FAILED tests/cross/test_stuck_detector.py::test_reset_after_user_message - pydantic_core._pydantic_core.ValidationError: 1 validation error for ExecuteBashObservation
raw_output
Extra inputs are not permitted [type=extra_forbidden, input_value='file1.txt\nfile2.txt', input_type=str]
For further information visit https://errors.pydantic.dev/2.12/v/extra_forbidden
================== 8 failed, 24 passed, 12 warnings in 41.15s ==================
Traceback (most recent call last):
File "/home/runner/_work/software-agent-sdk/software-agent-sdk/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py", line 371, in close
for tool in self.agent.tools_map.values():
^^^^^^^^^^^^^^^^^^^^
File "/home/runner/_work/software-agent-sdk/software-agent-sdk/openhands-sdk/openhands/sdk/agent/base.py", line 414, in tools_map
raise RuntimeError("Agent not initialized; call initialize() before use")

0 errors, 0 warnings, 0 informations

0 errors, 0 warnings, 0 informations
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:237:45 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:238:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:242:47 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:243:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:268:48 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:269:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:273:48 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:274:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:306:48 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:307:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:311:48 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_agent_secrets_integration.py:312:44 - error: Cannot access attribute "raw_output" for class "ExecuteBashObservation"
Attribute "raw_output" is unknown (reportAttributeAccessIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:62:13 - error: No parameter named "raw_output" (reportCallIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:111:17 - error: No parameter named "raw_output" (reportCallIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:160:17 - error: No parameter named "raw_output" (reportCallIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:301:17 - error: No parameter named "raw_output" (reportCallIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:350:17 - error: No parameter named "raw_output" (reportCallIssue)
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/cross/test_stuck_detector.py:393:13 - error: No parameter named "raw_output" (reportCallIssue)

openhands-ai · 2025-10-31T14:24:58Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

- Added raw_output property to ExecuteBashObservation for backward compatibility - Property extracts text from first TextContent item in output field - Enables tests and code to access raw output text seamlessly - Updated test files to use correct observation field names - Changed test_stuck_detector.py to use output=[TextContent(text=...)] instead of raw_output= - Tests can still access raw_output via the property for assertions - All previously failing tests now pass: - test_mask_secrets - test_mask_changing_secrets - test_masking_persists - test_history_too_short - test_repeating_action_observation_not_stuck_less_than_4_repeats - test_repeating_action_observation_stuck - test_not_stuck_with_different_actions - test_reset_after_user_message Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-10-31T14:32:59Z

Summary

I've successfully fixed all 8 failing tests in PR #929. The changes have been committed and pushed to the openhands/standardize-observation-base branch.

Problem Identified

The PR standardized the Observation base class to use:

output: list[TextContent | ImageContent] instead of a raw string
error: str | None instead of error: bool

However, tests were still using the old raw_output field, which caused:

AttributeError in test_agent_secrets_integration.py - trying to access .raw_output property that didn't exist
ValidationError in test_stuck_detector.py - trying to pass raw_output= as a constructor parameter (rejected by Pydantic's extra="forbid")

Solution Implemented

1. Added raw_output property to ExecuteBashObservation (openhands-tools/openhands/tools/execute_bash/definition.py)

Property extracts text from the first TextContent item in the output list
Provides backward compatibility for code accessing .raw_output
Returns empty string if output is empty or not TextContent

2. Updated test instantiations (tests/cross/test_stuck_detector.py)

Changed from raw_output="..." to output=[TextContent(text="...")]
Tests can still access .raw_output via the property for assertions
6 test functions updated with correct constructor parameters

Verification

✅ All 8 previously failing tests now pass:

test_mask_secrets
test_mask_changing_secrets
test_masking_persists
test_history_too_short
test_repeating_action_observation_not_stuck_less_than_4_repeats
test_repeating_action_observation_stuck
test_not_stuck_with_different_actions
test_reset_after_user_message

✅ All 16 tests in affected files pass

✅ Pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright)

The changes are minimal, focused, and maintain backward compatibility while adhering to the new standardized Observation base class pattern.

_{View full conversation}

openhands-sdk/openhands/sdk/tool/schema.py

simonrosenberg · 2025-10-31T14:42:38Z

@enyst I dont know how I feel about typing output: List[TextContent | ImageContent].

it adds complexity because almost all "real" outputs are just output: str
but the most general case is for MCPTool where output can be a list of text content, image content.
Perhaps the internals of different Observation subclasses should be whatever makes most sense for them as long as to_llm_content -> List[TextContent | ImageContent]
But then again I really dont like the current chaos in the current internal output:

output: str                           # ExecuteBash, FileEditor, Browser
content: str                          # TaskTracker, Think
message: str                          # Finish
content: list[TextContent | ImageContent]  # MCPTool

- Removed command property and to_llm_content serialization from base Observation - Renamed cmd field to command in all Observation subclasses: * ExecuteBashObservation * FileEditorObservation * DelegateObservation * TaskTrackerObservation - Updated all executor implementations to use command instead of cmd - Updated test files to use command field consistently - Fixed subprocess.TimeoutExpired to use correct parameter name (cmd) Co-authored-by: openhands <openhands@all-hands.dev>

- Updated PlanningFileEditorObservation initialization to use command field - Aligns with FileEditorObservation parent class field naming Co-authored-by: openhands <openhands@all-hands.dev>

- Fixed model_copy update to use 'command' field instead of 'cmd' - Ensures [RESET] prefix is properly included in command field Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2025-11-01T09:17:52Z

@OpenHands /codereview-roasted
Read the diff and think hard about how to factorize all observations into a base observation class tool/schema.py
A current issue is the typing of output which is too complicated for 8/9 use cases.

openhands-ai · 2025-11-01T09:18:01Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

openhands-ai · 2025-11-01T09:21:45Z

Summary

I've completed a critical code review of PR #929 focusing on the Observation base class refactoring, specifically addressing the typing complexity of the output field as requested.

Key Findings

🔴 Critical Issues Identified:

Wrong Data Structure Abstraction - The output: list[TextContent | ImageContent] field is overcomplicated for 8-9 out of 11 observation types:
- Only Browser and MCP tools actually need ImageContent capability
- The other 9 tools (Glob, Grep, ExecuteBash, FileEditor, TaskTracker, Delegate, Finish, Think, PlanningFileEditor) just work with plain text
- This forces unnecessary ceremony: wrap text → store as list → unwrap text

Forced Complexity - Current pattern requires:

# Writing: wrap, wrap again
output=[TextContent(text=my_string)]

# Reading: unwrap, type check, extract
first_item = self.output[0] if self.output else None
output_text = first_item.text if isinstance(first_item, TextContent) else ""

Abstraction Doesn't Eliminate Special Cases - Most observations still override to_llm_content anyway, making the base implementation largely unused.

📋 Recommended Solution:

Remove the output field entirely and follow these principles:

class Observation(Schema, ABC):
    """Base schema for output observation."""
    
    # ✅ Keep: Standardized error handling
    error: str | None = Field(default=None, ...)
    
    @property
    def has_error(self) -> bool:
        return bool(self.error)
    
    @property
    def result_status(self) -> ObservationStatus:
        return ObservationStatus.ERROR if self.has_error else ObservationStatus.SUCCESS
    
    # ✅ Keep: Interface contract
    @property
    @abstractmethod
    def to_llm_content(self) -> Sequence[TextContent | ImageContent]:
        """Subclasses implement their own formatting."""

Let each observation use natural data structures:

GlobObservation: files: list[str] (formats to text in to_llm_content)
ExecuteBash: exit_code: int, command: str (formats with metadata)
Browser/MCP: Build rich content directly in their to_llm_content

🎯 Core Insight:

The refactor standardizes the wrong thing:

❌ Don't standardize: Data storage format (output field)
✅ Do standardize: Interface contract (to_llm_content) and error handling

"Bad programmers worry about the code. Good programmers worry about data structures." The current list[TextContent | ImageContent] doesn't naturally represent most tools' output—only 2 out of 11 need it. The right abstraction should make the 9 simple cases trivial and the 2 complex cases possible, not make all 11 complex.

No code changes were made - this was a code review task as requested with the /codereview-roasted command. The detailed analysis provides actionable feedback for refactoring the observation base class to eliminate unnecessary type complexity.

_{View full conversation}

- Added Observation.text_output() static method for easy text wrapping - Added output_as_text property for easy text extraction - Updated ExecuteBash and Glob to use new helpers - Reduces ceremony of wrapping/unwrapping TextContent This addresses the typing complexity while maintaining the standardized 'output' field across all observations (vs 'message', 'content', etc.) Co-authored-by: openhands <openhands@all-hands.dev>

This reverts commit b252b15.

simonrosenberg · 2025-11-02T14:00:22Z

@OpenHands read the diff of this branch versus main.
Let's update the type in tools/schema.py Observation.output:
Current:
output: list[TextContent | ImageContent] = Field(
default_factory=list,
...
New type:
output: str | list[TextContent | ImageContent] = Field(.....)

This way, for every Observation subclass where the output should be a str ( = all observation subclasses that aren't MCP) simply do output = < some str>
For MCP where the output is a list[TextContent | ImageContent], assign a list of events to output (= current code)

Update to_llm_content so it checks what type is output and returns the correct llm content

Make sure all pre-commit checks and all tests are passing

openhands-ai · 2025-11-02T14:00:31Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

- Updated Observation.output type to str | list[TextContent | ImageContent] - Most non-MCP tools now use simple str output for efficiency - MCP tools continue to use list[TextContent | ImageContent] for rich content - Updated to_llm_content to handle both types with isinstance checks - Updated all observation subclasses (ExecuteBash, FileEditor, Browser) to use str - Updated test helper functions to handle both output types - All pre-commit checks and tests passing Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-11-02T14:16:30Z

Summary of Changes

I have successfully updated the Observation.output type in the software-agent-sdk repository as requested. Here's what was accomplished:

✅ Request Checklist

Updated Observation.output type in tools/schema.py:
- Changed from: output: list[TextContent | ImageContent] = Field(default_factory=list, ...)
- To: output: str | list[TextContent | ImageContent] = Field(default="", ...)
Updated to_llm_content to handle both types:
- Added isinstance(self.output, str) check
- For str: converts to TextContent(text=self.output)
- For list: extends with the existing list of content
Updated all non-MCP Observation subclasses to use str output:
- FileEditor: Changed 6 occurrences from output=[TextContent(text=...)] to output=...
- ExecuteBash: Updated all output creation and property methods (raw_output, to_llm_content, visualize) to handle both types
- Browser: Updated output creation to use simple strings
MCP tools continue to use rich content:
- Verified that MCPToolObservation still correctly uses output=content_with_header (list type)
- No changes needed for MCP - works as expected
Updated test helper functions to handle both output types in:
- tests/tools/file_editor/conftest.py
- tests/tools/browser_use/test_browser_executor_e2e.py
- tests/tools/execute_bash/conftest.py

✅ Verification

Pre-commit checks: All passing ✓
File Editor tests: 135 tests passed ✓
Execute Bash tests: All tested passed ✓
Browser tests: 15 tests passed ✓
SDK tests: 1039 tests passed ✓
MCP tests: 47 tests passed ✓

📦 Changes Summary

11 files modified with a net change of +102 insertions, -73 deletions:

Core schema update in openhands-sdk/openhands/sdk/tool/schema.py
All three main tools updated (file_editor, execute_bash, browser_use)
Test helpers updated to support both output types
Example code updated to handle the new type

The changes have been committed and pushed to the openhands/standardize-observation-base branch, updating PR #929. The implementation successfully provides flexibility for tools to use simple strings (more efficient) while maintaining backward compatibility for MCP tools that need rich content with images and structured text.

_{View full conversation}

- Updated delegate, grep, glob, and task_tracker tools to use str output - Simplified example 02_custom_tools.py to use assert for type narrowing - Fixed test helper assert_browser_observation_success to handle both str and list[TextContent] output types - Removed unused TextContent imports from tool implementations This completes the migration to the new Observation.output type that supports both str and list[TextContent | ImageContent]. Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-11-02T14:55:40Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #929 at branch `openhands/standardize-observation-base`

Feel free to include any additional details that might help me get this PR into a better state.

_{^{You can manage your notification settings}}

simonrosenberg · 2025-11-03T07:47:18Z

@OpenHands Please fix all test failures
https://github.com/OpenHands/software-agent-sdk/actions/runs/19013799998/job/54298654120?pr=929

Delegate now uses output: str in Observation
=================================== FAILURES ===================================
_______________________ test_delegate_executor_delegate ________________________
tests/tools/delegation/test_delegation.py:106: in test_delegate_executor_delegate
assert isinstance(output_block, TextContent)
E AssertionError: assert False
E + where False = isinstance('S', TextContent)
=============================== warnings summary ===============================
tests/tools/browser_use/test_browser_executor_e2e.py:170
/home/runner/_work/software-agent-sdk/software-agent-sdk/tests/tools/browser_use/test_browser_executor_e2e.py:170: PytestUnknownMarkWarning: Unknown pytest.mark.e2e - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
@pytest.mark.e2e

openhands-ai · 2025-11-03T07:47:28Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

The new Observation base class now uses output: str | list[TextContent | ImageContent] instead of just list[TextContent]. The DelegateExecutor correctly returns observations with output as a string, so the tests need to be updated to check the string directly instead of treating it as a list and accessing output[0]. Changes: - Updated test_delegate_observation_creation to create observations with string output - Updated test_delegate_executor_delegate to check output as a string - Verified to_llm_content property returns the expected TextContent list Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2025-11-03T08:02:32Z

Summary

I've successfully fixed the test failure in PR #929 related to the standardized Observation base class changes.

Problem

The test test_delegate_executor_delegate was failing because it expected DelegateObservation.output to be a list of TextContent objects (the old behavior), but the new standardized Observation base class defines output: str | list[TextContent | ImageContent]. The DelegateExecutor implementation correctly returns observations with output as a string, but the tests were still treating it as a list and accessing output[0], which resulted in getting just the first character 'S' instead of the expected content.

Changes Made

Updated tests/tools/delegation/test_delegation.py:

test_delegate_observation_creation: Changed to create observations with string output and verify both the string output and the to_llm_content property
test_delegate_executor_delegate: Updated to check output as a string directly instead of treating it as a list

Verification

✅ All 5 delegation tests now pass
✅ Pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright)
✅ Changes committed and pushed to branch openhands/standardize-observation-base

The fix is minimal, focused, and maintains backward compatibility with the new Observation base schema.

_{View full conversation}

openhands-ai bot mentioned this pull request Oct 27, 2025

refactor: standardize Observation class #923

Open

openhands-agent and others added 4 commits October 27, 2025 22:58

test: align tests with standardized Observation fields

a349a7b

- Fix remaining assertion in test_registry to use .output - Ensure subclass output fields provide defaults to satisfy pyright - Adjust ExecuteBashObservation output default Co-authored-by: openhands <openhands@all-hands.dev>

test(execute_bash): update assertions to use has_error per standardiz…

3ff4cb9

…ed Observation Co-authored-by: openhands <openhands@all-hands.dev>

refactor(delegate): standardize DelegateObservation to use output and…

4c1d809

… base to_llm_content; update tests Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into openhands/standardize-observation-base

7ce7a9b

openhands-agent and others added 9 commits October 28, 2025 11:20

update

b52ce10

This comment has been minimized.

Sign in to view

openhands-agent and others added 3 commits October 28, 2025 14:36

Merge branch 'main' into openhands/standardize-observation-base

c2cb27b

Merge branch 'main' into openhands/standardize-observation-base

adc5da0

This comment has been minimized.

Sign in to view

update tool base schema

09cee6b

This comment has been minimized.

Sign in to view

enyst reviewed Oct 31, 2025

View reviewed changes

openhands-sdk/openhands/sdk/tool/schema.py Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

openhands-agent and others added 3 commits October 31, 2025 16:10

fix: update planning_file_editor to use command instead of cmd

6470d0a

- Updated PlanningFileEditorObservation initialization to use command field - Aligns with FileEditorObservation parent class field naming Co-authored-by: openhands <openhands@all-hands.dev>

fix: use command instead of cmd in bash reset with command

2d12008

- Fixed model_copy update to use 'command' field instead of 'cmd' - Ensures [RESET] prefix is properly included in command field Co-authored-by: openhands <openhands@all-hands.dev>

Merge branch 'main' into openhands/standardize-observation-base

0b4b706

openhands-agent and others added 2 commits November 1, 2025 09:30

Revert "Add ergonomic helpers for standardized 'output' field"

21d2d56

This reverts commit b252b15.

This comment has been minimized.

Sign in to view

Merge branch 'main' into openhands/standardize-observation-base

d2dba7a

simonrosenberg requested a review from enyst November 3, 2025 08:32

refactor: standardize Observation base class #929

Are you sure you want to change the base?

refactor: standardize Observation base class #929

Uh oh!

Conversation

simonrosenberg commented Oct 27, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

simonrosenberg commented Oct 28, 2025

Uh oh!

openhands-ai bot commented Oct 28, 2025

Uh oh!

openhands-ai bot commented Oct 28, 2025

Summary

Changes Made

Files Modified

Test Results

Uh oh!

This comment has been minimized.

This comment has been minimized.

simonrosenberg commented Oct 31, 2025

Uh oh!

openhands-ai bot commented Oct 31, 2025

Uh oh!

simonrosenberg commented Oct 31, 2025

Uh oh!

openhands-ai bot commented Oct 31, 2025

Uh oh!

This comment has been minimized.

openhands-ai bot commented Oct 31, 2025

Summary

Problem Identified

Solution Implemented

Verification

Uh oh!

Uh oh!

simonrosenberg commented Oct 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment has been minimized.

simonrosenberg commented Nov 1, 2025

Uh oh!

openhands-ai bot commented Nov 1, 2025

Uh oh!

openhands-ai bot commented Nov 1, 2025

Summary

Key Findings

🔴 Critical Issues Identified:

📋 Recommended Solution:

🎯 Core Insight:

Uh oh!

simonrosenberg commented Nov 2, 2025

Uh oh!

openhands-ai bot commented Nov 2, 2025

Uh oh!

openhands-ai bot commented Nov 2, 2025

Summary of Changes

✅ Request Checklist

✅ Verification

📦 Changes Summary

Uh oh!

This comment has been minimized.

openhands-ai bot commented Nov 2, 2025

Uh oh!

simonrosenberg commented Nov 3, 2025

Uh oh!

openhands-ai bot commented Nov 3, 2025

Uh oh!

openhands-ai bot commented Nov 3, 2025

Summary

Problem

Changes Made

Verification

Uh oh!

Reviewers

Assignees

Labels

simonrosenberg commented Oct 27, 2025 •

edited by github-actions bot

Loading

github-actions bot commented Oct 27, 2025 •

edited

Loading

simonrosenberg commented Oct 31, 2025 •

edited

Loading