Fix duplicate observations for same tool_call_id after crash recovery by csmith49 · Pull Request #2300 · OpenHands/software-agent-sdk

csmith49 · 2026-03-04T19:54:37Z

Summary

This PR fixes a bug where duplicate observations were being created for the same tool_call_id after crash recovery.

Fixes #2298

Problem

When a server crashes during tool execution and restarts:

ActionEvent created (tool_call_id=X)
Server crashes during tool execution
On restart, crash recovery emits AgentErrorEvent (tool_call_id=X)
Crash recovery sets execution_status=ERROR
User calls run() again
run() allows ERROR status to proceed
agent.step() calls get_unmatched_actions(), which returns the action (because AgentErrorEvent is not checked)
agent.step() calls _execute_actions() on the "pending" action
Tool executes and emits ObservationEvent (tool_call_id=X)
Result: BOTH AgentErrorEvent AND ObservationEvent for same tool_call_id
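The end state of the sequence above can be checked mechanically. The following is a minimal sketch, not SDK code: `LoggedEvent`, `RESPONSE_KINDS`, and `find_duplicate_responses` are hypothetical names introduced here, and events are reduced to a kind string plus a tool_call_id.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedEvent:
    """Hypothetical, simplified stand-in for an SDK event."""
    kind: str          # e.g. "ActionEvent", "AgentErrorEvent", "ObservationEvent"
    tool_call_id: str


# Event kinds treated as a response to a tool call (assumption for this sketch).
RESPONSE_KINDS = {"ObservationEvent", "UserRejectObservation", "AgentErrorEvent"}


def find_duplicate_responses(events):
    """Return {tool_call_id: [kinds]} for tool calls answered more than once."""
    responses = defaultdict(list)
    for event in events:
        if event.kind in RESPONSE_KINDS:
            responses[event.tool_call_id].append(event.kind)
    return {tcid: kinds for tcid, kinds in responses.items() if len(kinds) > 1}


# Event log as described in the steps above, before the fix.
buggy_log = [
    LoggedEvent("ActionEvent", "X"),
    LoggedEvent("AgentErrorEvent", "X"),   # emitted by crash recovery
    LoggedEvent("ObservationEvent", "X"),  # emitted by the re-executed tool
]
duplicates = find_duplicate_responses(buggy_log)
```

With the buggy log, `duplicates` maps "X" to both response kinds; a log that stops after the AgentErrorEvent yields an empty dict.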

Root Cause

get_unmatched_actions() in state.py only checked for ObservationEvent and UserRejectObservation:

if isinstance(event, (ObservationEvent, UserRejectObservation)):
    observed_action_ids.add(event.action_id)

It did NOT check for AgentErrorEvent, so the action remained "unmatched" even after crash recovery emitted an error for it.
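To make the gap concrete, here is a minimal sketch of the pre-fix matching logic. It is an illustration under assumptions, not the code in state.py: the event classes are simplified stand-ins carrying only the fields discussed here, and `unmatched_actions_before_fix` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id field


def unmatched_actions_before_fix(events):
    """Match actions by action_id only, ignoring AgentErrorEvent entirely."""
    observed_action_ids = set()
    for event in events:
        if isinstance(event, ObservationEvent):
            observed_action_ids.add(event.action_id)
    return [
        event
        for event in events
        if isinstance(event, ActionEvent) and event.id not in observed_action_ids
    ]


action = ActionEvent(id="a1", tool_call_id="X")
log_after_crash = [action, AgentErrorEvent(tool_call_id="X")]
still_pending = unmatched_actions_before_fix(log_after_crash)
```

Here `still_pending` still contains the action even though an error was already recorded for tool_call_id X, which is the condition that let the tool run a second time.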

Solution

This PR implements the third fix option suggested in the issue: "Make get_unmatched_actions() also check AgentErrorEvent by tool_call_id".

The fix:

Adds AgentErrorEvent to the imports in state.py
Tracks observed tool_call_ids in a separate set
When encountering AgentErrorEvent, adds its tool_call_id to the observed set
When checking if an action is unmatched, also checks if its tool_call_id has been observed

Note: AgentErrorEvent is matched by tool_call_id (not action_id) because it doesn't have an action_id field, unlike ObservationEvent and UserRejectObservation.
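The four steps above can be sketched as follows. This is a simplified illustration of the approach, not the merged implementation: the event classes are stand-ins with only the relevant fields, and `unmatched_actions_after_fix` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class UserRejectObservation:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id, so it is matched by tool_call_id


def unmatched_actions_after_fix(events):
    """Treat an action as matched if either identifier has been observed."""
    observed_action_ids = set()
    observed_tool_call_ids = set()
    for event in events:
        if isinstance(event, (ObservationEvent, UserRejectObservation)):
            observed_action_ids.add(event.action_id)
        elif isinstance(event, AgentErrorEvent):
            observed_tool_call_ids.add(event.tool_call_id)
    return [
        event
        for event in events
        if isinstance(event, ActionEvent)
        and event.id not in observed_action_ids
        and event.tool_call_id not in observed_tool_call_ids
    ]


action = ActionEvent(id="a1", tool_call_id="X")
after_crash = [action, AgentErrorEvent(tool_call_id="X")]
unrelated_error = [action, AgentErrorEvent(tool_call_id="Y")]
```

With `after_crash` the action is no longer returned, so a later run() has nothing to re-execute; with `unrelated_error` the action stays unmatched because the error belongs to a different tool call.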

Tests

Added comprehensive tests in tests/sdk/conversation/test_get_unmatched_actions.py:

test_action_without_observation_is_unmatched - Basic case
test_action_with_observation_event_is_matched - ObservationEvent matching
test_action_with_user_reject_observation_is_matched - UserRejectObservation matching
test_action_with_agent_error_event_is_matched - NEW: AgentErrorEvent matching
test_multiple_actions_with_mixed_responses - Mixed observation types
test_agent_error_event_matching_by_tool_call_id - NEW: Verifies tool_call_id matching
test_agent_error_event_different_tool_call_id_does_not_match - NEW: Non-matching tool_call_id
test_crash_recovery_scenario_prevents_duplicate_execution - NEW: Full crash recovery scenario
test_non_executable_action_is_not_considered_unmatched - Non-executable actions

All existing tests continue to pass.

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:6db9283-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-6db9283-python \
  ghcr.io/openhands/agent-server:6db9283-python

All tags pushed for this build

ghcr.io/openhands/agent-server:6db9283-golang-amd64
ghcr.io/openhands/agent-server:6db9283-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:6db9283-golang-arm64
ghcr.io/openhands/agent-server:6db9283-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:6db9283-java-amd64
ghcr.io/openhands/agent-server:6db9283-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:6db9283-java-arm64
ghcr.io/openhands/agent-server:6db9283-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:6db9283-python-amd64
ghcr.io/openhands/agent-server:6db9283-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:6db9283-python-arm64
ghcr.io/openhands/agent-server:6db9283-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:6db9283-golang
ghcr.io/openhands/agent-server:6db9283-java
ghcr.io/openhands/agent-server:6db9283-python

About Multi-Architecture Support

Each variant tag (e.g., 6db9283-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 6db9283-python-amd64) are also available if needed

When a server crashes during tool execution and restarts, crash recovery emits an AgentErrorEvent. However, if run() is called again, the action was being re-executed because get_unmatched_actions() only checked for ObservationEvent and UserRejectObservation.

This change makes get_unmatched_actions() also check for AgentErrorEvent by tool_call_id (since AgentErrorEvent does not have an action_id field). This prevents the action from being re-executed after crash recovery.

Fixes #2298

Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-04T19:55:12Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log
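The SemVer error in the excerpt applies a simple rule: if breaking changes are detected, the new version must bump at least the minor component. A minimal sketch of that rule follows; it is not the check script itself, `parse_version` and `bump_is_sufficient` are hypothetical names, and treating any version as acceptable when nothing breaks is an assumption of this sketch.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'major.minor.patch' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_is_sufficient(previous: str, new: str, breaking_changes: int) -> bool:
    """Breaking changes need at least a minor bump; otherwise anything passes."""
    if breaking_changes == 0:
        return True  # assumption: no further constraint without breakage
    prev_major, prev_minor, _ = parse_version(previous)
    new_major, new_minor, _ = parse_version(new)
    return (new_major, new_minor) > (prev_major, prev_minor)


# The comparison reported in the log: 1.11.5 against 1.11.4 with one breaking change.
log_case_ok = bump_is_sufficient("1.11.4", "1.11.5", breaking_changes=1)
```

For the versions in the log, `log_case_ok` is False because only the patch component changed; a bump to 1.12.0 would satisfy the rule.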

github-actions · 2026-03-04T19:55:22Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

github-actions · 2026-03-04T19:57:55Z

Coverage Report

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/conversation
state.py	185	8	95%	191, 195, 206, 345, 391–393, 520
TOTAL	19285	5709	70%

all-hands-bot

🟢 Good taste - Clean fix that extends the existing pattern elegantly.

Code Quality: The implementation is solid. You track tool_call_id separately for AgentErrorEvent matching, which makes sense since it lacks action_id. The tests are comprehensive and validate real behavior (not just mocks).

⚠️ Eval-Risk Flag: This PR changes crash recovery behavior - specifically how agents handle actions after server restarts. Per repo guidelines, this should be tested with lightweight evals before merging to ensure no benchmark/evaluation regressions.

Verdict: Code is merge-ready from a quality standpoint, but flagging for human review + eval testing due to agent behavior change.

enyst · 2026-03-04T20:48:34Z

openhands-sdk/openhands/sdk/conversation/state.py

                observed_action_ids.add(event.action_id)
+            elif isinstance(event, AgentErrorEvent):
+                # AgentErrorEvent doesn't have action_id, match by tool_call_id
+                observed_tool_call_ids.add(event.tool_call_id)


Oh, weird. I thought this was already the case, idk when it broke 😅

enyst

LGTM, thank you!

openhands-ai bot mentioned this pull request Mar 4, 2026

Bug: Duplicate observations for same tool_call_id after crash recovery #2298

Closed

csmith49 marked this pull request as ready for review March 4, 2026 20:37

Merge branch 'main' into openhands/fix-duplicate-observations-after-crash-recovery

5d09c42

all-hands-bot reviewed Mar 4, 2026

enyst reviewed Mar 4, 2026

enyst approved these changes Mar 4, 2026

csmith49 merged commit 5015e72 into main Mar 4, 2026
32 checks passed

csmith49 deleted the openhands/fix-duplicate-observations-after-crash-recovery branch March 4, 2026 21:22

csmith49 mentioned this pull request Mar 4, 2026

Fix duplicate tool_result error from Anthropic API #2256

Closed

5 tasks

zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026

Fix duplicate observations for same tool_call_id after crash recovery (OpenHands#2300)

Cherry-pick from upstream 5015e72

3c5396f
