Conversation
When a server crashes during tool execution and restarts, crash recovery emits an AgentErrorEvent. However, if run() is called again, the action was being re-executed because get_unmatched_actions() only checked for ObservationEvent and UserRejectObservation. This change makes get_unmatched_actions() also check for AgentErrorEvent by tool_call_id (since AgentErrorEvent does not have an action_id field). This prevents the action from being re-executed after crash recovery. Fixes #2298 Co-authored-by: openhands <openhands@all-hands.dev>
API breakage checks (Griffe)Result: Failed Log excerpt (first 1000 characters) |
Agent server REST API breakage checks (OpenAPI)Result: Passed |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean fix that extends the existing pattern elegantly.
Code Quality: The implementation is solid. You track tool_call_id separately for AgentErrorEvent matching, which makes sense since it lacks action_id. The tests are comprehensive and validate real behavior (not just mocks).
Verdict: Code is merge-ready from a quality standpoint, but flagging for human review + eval testing due to agent behavior change.
| observed_action_ids.add(event.action_id) | ||
| elif isinstance(event, AgentErrorEvent): | ||
| # AgentErrorEvent doesn't have action_id, match by tool_call_id | ||
| observed_tool_call_ids.add(event.tool_call_id) |
There was a problem hiding this comment.
Oh, weird. I thought this was already the case, idk when it broke 😅
…OpenHands#2300) Cherry-pick from upstream 5015e72
Summary
This PR fixes a bug where duplicate observations were being created for the same
tool_call_idafter crash recovery.Fixes #2298
Problem
When a server crashes during tool execution and restarts:
ActionEventcreated (tool_call_id=X)AgentErrorEvent(tool_call_id=X)execution_status=ERRORrun()againrun()allows ERROR status to proceedagent.step()callsget_unmatched_actions()which returned the action (becauseAgentErrorEventwas not checked)agent.step()calls_execute_actions()on the "pending" actionObservationEvent(tool_call_id=X)AgentErrorEventANDObservationEventfor sametool_call_idRoot Cause
get_unmatched_actions()instate.pyonly checked forObservationEventandUserRejectObservation:It did NOT check for
AgentErrorEvent, so the action remained "unmatched" even after crash recovery emitted an error for it.Solution
This PR implements the third fix option suggested in the issue: "Make
get_unmatched_actions()also checkAgentErrorEventbytool_call_id".The fix:
AgentErrorEventto the imports instate.pytool_call_ids in a separate setAgentErrorEvent, adds itstool_call_idto the observed settool_call_idhas been observedNote:
AgentErrorEventis matched bytool_call_id(notaction_id) because it doesn't have anaction_idfield, unlikeObservationEventandUserRejectObservation.Tests
Added comprehensive tests in
tests/sdk/conversation/test_get_unmatched_actions.py:test_action_without_observation_is_unmatched- Basic casetest_action_with_observation_event_is_matched- ObservationEvent matchingtest_action_with_user_reject_observation_is_matched- UserRejectObservation matchingtest_action_with_agent_error_event_is_matched- NEW: AgentErrorEvent matchingtest_multiple_actions_with_mixed_responses- Mixed observation typestest_agent_error_event_matching_by_tool_call_id- NEW: Verifies tool_call_id matchingtest_agent_error_event_different_tool_call_id_does_not_match- NEW: Non-matching tool_call_idtest_crash_recovery_scenario_prevents_duplicate_execution- NEW: Full crash recovery scenariotest_non_executable_action_is_not_considered_unmatched- Non-executable actionsAll existing tests continue to pass.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:6db9283-pythonRun
All tags pushed for this build
About Multi-Architecture Support
6db9283-python) is a multi-arch manifest supporting both amd64 and arm646db9283-python-amd64) are also available if needed