
Fix duplicate observations for same tool_call_id after crash recovery #2300

Merged
csmith49 merged 2 commits into main from openhands/fix-duplicate-observations-after-crash-recovery
Mar 4, 2026

@csmith49 csmith49 commented Mar 4, 2026

Summary

This PR fixes a bug where duplicate observations were being created for the same tool_call_id after crash recovery.

Fixes #2298

Problem

When a server crashes during tool execution and restarts:

  1. ActionEvent created (tool_call_id=X)
  2. Server crashes during tool execution
  3. On restart, crash recovery emits AgentErrorEvent (tool_call_id=X)
  4. Crash recovery sets execution_status=ERROR
  5. User calls run() again
  6. run() allows ERROR status to proceed
  7. agent.step() calls get_unmatched_actions(), which returns the action (because AgentErrorEvent is not checked)
  8. agent.step() calls _execute_actions() on the "pending" action
  9. Tool executes and emits ObservationEvent (tool_call_id=X)
  10. Result: BOTH AgentErrorEvent AND ObservationEvent for same tool_call_id
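
The end state in step 10 can be read as a violated invariant: each tool_call_id should receive at most one response event. A minimal sketch of a check for that invariant, assuming a hypothetical event log of (event_type_name, tool_call_id) pairs rather than real SDK event objects; the helper name duplicated_tool_call_ids is illustrative only:

```python
from collections import Counter

# Event types that count as a "response" to an action (names taken from this PR).
RESPONSE_TYPES = {"ObservationEvent", "UserRejectObservation", "AgentErrorEvent"}


def duplicated_tool_call_ids(event_log):
    """Return the tool_call_ids that received more than one response event."""
    counts = Counter(
        tool_call_id for kind, tool_call_id in event_log if kind in RESPONSE_TYPES
    )
    return sorted(tcid for tcid, n in counts.items() if n > 1)


# Hypothetical log mirroring steps 1, 3, and 9 above.
event_log = [
    ("ActionEvent", "X"),
    ("AgentErrorEvent", "X"),   # emitted by crash recovery
    ("ObservationEvent", "X"),  # emitted when the tool is re-executed
]
print(duplicated_tool_call_ids(event_log))  # ['X']
```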

Root Cause

get_unmatched_actions() in state.py only checked for ObservationEvent and UserRejectObservation:

if isinstance(event, (ObservationEvent, UserRejectObservation)):
    observed_action_ids.add(event.action_id)

It did NOT check for AgentErrorEvent, so the action remained "unmatched" even after crash recovery emitted an error for it.
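
A minimal sketch of the pre-fix behavior, using simplified stand-in dataclasses rather than the real SDK event classes. The field layout, the error field, and the function name get_unmatched_actions_before_fix are assumptions for illustration; only the isinstance check mirrors the snippet above:

```python
from dataclasses import dataclass


# Simplified stand-ins for the SDK event types (hypothetical, not the real classes).
@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class UserRejectObservation:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id field
    error: str


def get_unmatched_actions_before_fix(events):
    """Pre-fix matching: only events that carry an action_id count as responses."""
    observed_action_ids = set()
    for event in events:
        if isinstance(event, (ObservationEvent, UserRejectObservation)):
            observed_action_ids.add(event.action_id)
        # AgentErrorEvent falls through here and is silently ignored.
    return [
        e for e in events
        if isinstance(e, ActionEvent) and e.id not in observed_action_ids
    ]


action = ActionEvent(id="a1", tool_call_id="X")
crash_error = AgentErrorEvent(tool_call_id="X", error="server restarted")
still_pending = get_unmatched_actions_before_fix([action, crash_error])
print(still_pending == [action])  # True: the action still looks pending
```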

Solution

This PR implements the third fix option suggested in the issue: "Make get_unmatched_actions() also check AgentErrorEvent by tool_call_id".

The fix:

  1. Adds AgentErrorEvent to the imports in state.py
  2. Tracks observed tool_call_ids in a separate set
  3. When encountering AgentErrorEvent, adds its tool_call_id to the observed set
  4. When checking if an action is unmatched, also checks if its tool_call_id has been observed

Note: AgentErrorEvent is matched by tool_call_id (not action_id) because it doesn't have an action_id field, unlike ObservationEvent and UserRejectObservation.
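
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in state.py: the dataclasses are simplified stand-ins for the SDK event types, and the function takes a plain list of events instead of reading conversation state.

```python
from dataclasses import dataclass


# Simplified stand-ins for the SDK event types (hypothetical, not the real classes).
@dataclass
class ActionEvent:
    id: str
    tool_call_id: str


@dataclass
class ObservationEvent:
    action_id: str
    tool_call_id: str


@dataclass
class UserRejectObservation:
    action_id: str
    tool_call_id: str


@dataclass
class AgentErrorEvent:
    tool_call_id: str  # no action_id field, so it is matched by tool_call_id
    error: str


def get_unmatched_actions(events):
    """Return actions with no observation, rejection, or error recorded for them."""
    observed_action_ids = set()
    observed_tool_call_ids = set()  # separate set, populated only by AgentErrorEvent
    for event in events:
        if isinstance(event, (ObservationEvent, UserRejectObservation)):
            observed_action_ids.add(event.action_id)
        elif isinstance(event, AgentErrorEvent):
            observed_tool_call_ids.add(event.tool_call_id)
    return [
        e for e in events
        if isinstance(e, ActionEvent)
        and e.id not in observed_action_ids
        and e.tool_call_id not in observed_tool_call_ids
    ]


action = ActionEvent(id="a1", tool_call_id="X")
matching_error = AgentErrorEvent(tool_call_id="X", error="server restarted")
other_error = AgentErrorEvent(tool_call_id="Y", error="unrelated failure")

print(get_unmatched_actions([action, matching_error]) == [])    # True
print(get_unmatched_actions([action, other_error]) == [action])  # True
```

Keeping the two sets separate means an error for a different tool_call_id does not hide a pending action, which is the case the third new test listed below covers.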

Tests

Added comprehensive tests in tests/sdk/conversation/test_get_unmatched_actions.py:

  • test_action_without_observation_is_unmatched - Basic case
  • test_action_with_observation_event_is_matched - ObservationEvent matching
  • test_action_with_user_reject_observation_is_matched - UserRejectObservation matching
  • test_action_with_agent_error_event_is_matched - NEW: AgentErrorEvent matching
  • test_multiple_actions_with_mixed_responses - Mixed observation types
  • test_agent_error_event_matching_by_tool_call_id - NEW: Verifies tool_call_id matching
  • test_agent_error_event_different_tool_call_id_does_not_match - NEW: Non-matching tool_call_id
  • test_crash_recovery_scenario_prevents_duplicate_execution - NEW: Full crash recovery scenario
  • test_non_executable_action_is_not_considered_unmatched - Non-executable actions

All existing tests continue to pass.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant  Architectures  Base Image                                  Docs / Tags
java     amd64, arm64   eclipse-temurin:17-jdk                      Link
python   amd64, arm64   nikolaik/python-nodejs:python3.12-nodejs22  Link
golang   amd64, arm64   golang:1.21-bookworm                        Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:6db9283-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-6db9283-python \
  ghcr.io/openhands/agent-server:6db9283-python

All tags pushed for this build

ghcr.io/openhands/agent-server:6db9283-golang-amd64
ghcr.io/openhands/agent-server:6db9283-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:6db9283-golang-arm64
ghcr.io/openhands/agent-server:6db9283-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:6db9283-java-amd64
ghcr.io/openhands/agent-server:6db9283-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:6db9283-java-arm64
ghcr.io/openhands/agent-server:6db9283-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:6db9283-python-amd64
ghcr.io/openhands/agent-server:6db9283-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:6db9283-python-arm64
ghcr.io/openhands/agent-server:6db9283-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:6db9283-golang
ghcr.io/openhands/agent-server:6db9283-java
ghcr.io/openhands/agent-server:6db9283-python

About Multi-Architecture Support

  • Each variant tag (e.g., 6db9283-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 6db9283-python-amd64) are also available if needed

When a server crashes during tool execution and restarts, crash recovery
emits an AgentErrorEvent. However, if run() is called again, the action
was being re-executed because get_unmatched_actions() only checked for
ObservationEvent and UserRejectObservation.

This change makes get_unmatched_actions() also check for AgentErrorEvent
by tool_call_id (since AgentErrorEvent does not have an action_id field).
This prevents the action from being re-executed after crash recovery.

Fixes #2298

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions bot commented Mar 4, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions bot commented Mar 4, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

github-actions bot commented Mar 4, 2026

Coverage

Coverage Report •
File                                        Stmts  Miss  Cover  Missing
openhands-sdk/openhands/sdk/conversation
   state.py                                   185     8    95%  191, 195, 206, 345, 391–393, 520
TOTAL                                       19285  5709    70%

@csmith49 csmith49 marked this pull request as ready for review March 4, 2026 20:37
@all-hands-bot all-hands-bot left a comment

🟢 Good taste - Clean fix that extends the existing pattern elegantly.

Code Quality: The implementation is solid. You track tool_call_id separately for AgentErrorEvent matching, which makes sense since it lacks action_id. The tests are comprehensive and validate real behavior (not just mocks).

⚠️ Eval-Risk Flag: This PR changes crash recovery behavior - specifically how agents handle actions after server restarts. Per repo guidelines, this should be tested with lightweight evals before merging to ensure no benchmark/evaluation regressions.

Verdict: Code is merge-ready from a quality standpoint, but flagging for human review + eval testing due to agent behavior change.

    observed_action_ids.add(event.action_id)
elif isinstance(event, AgentErrorEvent):
    # AgentErrorEvent doesn't have action_id, match by tool_call_id
    observed_tool_call_ids.add(event.tool_call_id)

Oh, weird. I thought this was already the case, idk when it broke 😅

@enyst enyst left a comment

LGTM, thank you!

@csmith49 csmith49 merged commit 5015e72 into main Mar 4, 2026
32 checks passed
@csmith49 csmith49 deleted the openhands/fix-duplicate-observations-after-crash-recovery branch March 4, 2026 21:22
zparnold added a commit to zparnold/software-agent-sdk that referenced this pull request Mar 5, 2026
Development

Successfully merging this pull request may close these issues.

Bug: Duplicate observations for same tool_call_id after crash recovery

4 participants