Collaborator

@tofarr tofarr commented Nov 21, 2025

Summary

This PR makes the conversation start process more resilient by improving how we handle event service lifecycle and conversation reuse.

Changes Made

1. Safer Conversation Reuse Logic

  • Before: Only checked whether conversation_id was present in the _event_services dictionary
  • After: Also verifies that the existing event service is actually open/active, using the new is_open() method
  • Why: Prevents reuse of an event service that is closed or otherwise in an inconsistent state
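
The reuse check described above can be sketched roughly as follows. This is a minimal illustration, not the code from conversation_service.py: FakeEventService and find_reusable are hypothetical names, and only the is_open() contract (bool(self._conversation)) comes from this PR description.

```python
from typing import Optional


class FakeEventService:
    """Hypothetical stand-in for EventService; only is_open() mirrors the PR text."""

    def __init__(self, conversation: Optional[object] = None) -> None:
        self._conversation = conversation

    def is_open(self) -> bool:
        return bool(self._conversation)


def find_reusable(
    event_services: dict, conversation_id: str
) -> Optional[FakeEventService]:
    """Return the stored service only if it is present AND open (hypothetical helper)."""
    existing = event_services.get(conversation_id)
    if existing is not None and existing.is_open():
        return existing
    # Missing or closed: the caller should create and start a fresh service.
    return None


services = {
    "open-id": FakeEventService(conversation=object()),
    "closed-id": FakeEventService(conversation=None),
}
```

With this shape, a key that is present but points at a closed service is treated the same as a missing key.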

2. Delayed Event Service Storage

  • Before: Stored the event service reference in the dictionary before calling start()
  • After: Stores the reference only after start() has completed successfully
  • Why: A service that fails to start is never left behind in the dictionary, which avoids potential memory leaks and inconsistent state
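
A minimal sketch of the start-then-store ordering, under stated assumptions: FlakyService, start_and_register, and the plain dict registry are hypothetical and stand in for the real event service and _event_services dictionary, whose actual code is not shown in this PR description.

```python
import asyncio


class FlakyService:
    """Hypothetical service whose start() can be made to fail."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.started = False

    async def start(self) -> None:
        if self.fail:
            raise RuntimeError("startup failed")
        self.started = True


async def start_and_register(registry: dict, key: str, service: FlakyService) -> FlakyService:
    # Start first: if this raises, the registry is never touched.
    await service.start()
    # Only a successfully started service becomes visible to lookups.
    registry[key] = service
    return service


async def demo() -> dict:
    registry: dict = {}
    await start_and_register(registry, "ok", FlakyService(fail=False))
    try:
        await start_and_register(registry, "bad", FlakyService(fail=True))
    except RuntimeError:
        pass  # The failed service was never stored.
    return registry
```

The design choice is simply ordering: because the assignment happens after the awaited start(), an exception leaves the registry unchanged without needing any rollback.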

3. New EventService.is_open() Method

  • Added is_open() method that returns bool(self._conversation)
  • Provides a reliable way to check if an event service is in an active, usable state
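
As a rough sketch of how such a check behaves over a service's lifetime: the class below is hypothetical, and it assumes that start() sets _conversation and close() resets it to None, which this PR description does not state. Only the is_open() body is taken from the text above.

```python
from typing import Optional


class SketchEventService:
    """Hypothetical lifecycle sketch; not the real EventService."""

    def __init__(self) -> None:
        self._conversation: Optional[object] = None

    def start(self) -> None:
        # Assumption: starting the service creates the conversation object.
        self._conversation = object()

    def close(self) -> None:
        # Assumption: closing the service drops the conversation object.
        self._conversation = None

    def is_open(self) -> bool:
        # As described in the PR: truthiness of the held conversation.
        return bool(self._conversation)
```

Note that bool() relies on the object's truthiness, so this pattern assumes the conversation type does not define __bool__ or __len__ in a way that makes a live instance falsy.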

Files Modified

  • openhands-agent-server/openhands/agent_server/conversation_service.py
    • Updated conversation reuse logic to check is_open() status
    • Moved event service storage to after successful startup
  • openhands-agent-server/openhands/agent_server/event_service.py
    • Added is_open() method for state checking

Benefits

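  • Improved Reliability: An event service that is still starting, or that failed to start, is no longer visible to lookups, which narrows the window for race conditions during conversation startup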
  • Better Resource Management: Prevents storing references to failed event services
  • Cleaner State Management: More explicit checks for service availability
  • Backward Compatible: No breaking changes to existing APIs

Testing

These changes address edge cases in conversation lifecycle management and should be particularly beneficial in high-concurrency scenarios or when dealing with network/resource constraints during startup.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:24a7d50-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-24a7d50-python \
  ghcr.io/openhands/agent-server:24a7d50-python

All tags pushed for this build

ghcr.io/openhands/agent-server:24a7d50-golang-amd64
ghcr.io/openhands/agent-server:24a7d50-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:24a7d50-golang-arm64
ghcr.io/openhands/agent-server:24a7d50-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:24a7d50-java-amd64
ghcr.io/openhands/agent-server:24a7d50-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:24a7d50-java-arm64
ghcr.io/openhands/agent-server:24a7d50-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:24a7d50-python-amd64
ghcr.io/openhands/agent-server:24a7d50-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:24a7d50-python-arm64
ghcr.io/openhands/agent-server:24a7d50-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:24a7d50-golang
ghcr.io/openhands/agent-server:24a7d50-java
ghcr.io/openhands/agent-server:24a7d50-python

About Multi-Architecture Support

  • Each variant tag (e.g., 24a7d50-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 24a7d50-python-amd64) are also available if needed

tofarr and others added 3 commits November 21, 2025 13:21
…ilience

- Add try/except block in _start_event_service to clean up failed event services
- Add comprehensive tests for EventService.is_open() method
- Add tests for conversation reuse logic with is_open() checks
- Add tests for event service startup failure cleanup
- Add tests for successful event service storage after startup
- Ensure event services are only stored after successful startup

Co-authored-by: openhands <openhands@all-hands.dev>
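
The cleanup-on-failure step mentioned in the commit message could look roughly like the sketch below. It is an illustration under assumptions: ClosableService, start_with_cleanup, and the presence of an async close() method are hypothetical, and the real _start_event_service body is not shown on this page.

```python
import asyncio


class ClosableService:
    """Hypothetical service that records whether close() was called."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail
        self.closed = False

    async def start(self) -> None:
        if self.fail:
            raise RuntimeError("startup failed")

    async def close(self) -> None:
        self.closed = True


async def start_with_cleanup(registry: dict, key: str, service: ClosableService) -> ClosableService:
    try:
        await service.start()
    except Exception:
        # Release whatever the half-started service acquired, then re-raise
        # so the caller still sees the original failure.
        await service.close()
        raise
    registry[key] = service  # Stored only after a successful start.
    return service


async def run_failure_case() -> tuple:
    registry: dict = {}
    service = ClosableService(fail=True)
    try:
        await start_with_cleanup(registry, "bad", service)
    except RuntimeError:
        pass
    return registry, service
```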
@github-actions
Contributor

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-agent-server/openhands/agent_server
conversation_service.py | 292 | 177 | 39% | 63, 66, 77–78, 81–84, 86, 90, 92, 95–102, 105–106, 109–113, 116–118, 120–123, 125, 132–133, 135–137, 140, 144, 146, 148, 155, 161, 169, 184, 189–190, 193, 199, 202, 213–217, 219–222, 225–230, 233–236, 238–240, 243–245, 250–253, 261, 266–268, 282–286, 289, 291, 294–296, 298, 302, 306, 313–317, 320–321, 327–332, 338–340, 358, 382, 407, 409–410, 436, 438, 440–443, 448, 450–451, 455–456, 458–459, 462–464, 467, 473, 478–481, 488–489, 493–497, 499, 504, 508–510, 514–515, 517–519, 521, 523, 536–538, 541, 544, 547–550, 557–558, 562–564, 567–568, 570
event_service.py | 203 | 84 | 58% | 52–53, 72–74, 77–82, 94, 110, 117, 119, 126–127, 135–138, 145–147, 159–160, 163–164, 166–168, 170, 175, 178, 182–184, 186, 188, 192, 195, 199, 203–204, 206, 223–224, 271, 278–279, 281, 284–286, 288, 292–295, 299–302, 310–313, 332–333, 335–342, 344–345, 351, 357, 370–371, 378
TOTAL | 12163 | 5573 | 54% |

Contributor

@hieptl hieptl left a comment

Thank you! 🙏

@tofarr tofarr merged commit 858c7f4 into main Nov 21, 2025
21 checks passed
@tofarr tofarr deleted the fix-more-resilient-conversation-start branch November 21, 2025 13:50