@ryanhoangt
Collaborator

@ryanhoangt ryanhoangt commented Nov 22, 2025

This PR:

  • Fixes the broken example scripts and the run-examples workflow's failure on scheduled (cron) runs
  • Speeds up the example run by using multiple workers via pytest-xdist

Fixes #1058
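
For readers unfamiliar with the approach, here is a minimal sketch of how example scripts can be exposed to pytest as one test case per script, so that pytest-xdist can distribute them across workers. It is not the code from this PR: the `examples` root, the numeric-prefix filter, the 600-second timeout, and the helper names `discover_examples` and `run_example` are all assumptions made for illustration.

```python
"""Hypothetical test module: one pytest case per example script."""
import subprocess
import sys
from pathlib import Path

EXAMPLES_DIR = Path("examples")  # assumed location of the example scripts


def discover_examples(root: Path) -> list[Path]:
    """Return scripts whose file name starts with two digits (assumed convention)."""
    return sorted(p for p in root.rglob("*.py") if p.name[:2].isdigit())


def run_example(script: Path, timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run one script in a fresh interpreter so examples cannot affect each other."""
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,  # assumed per-example limit
    )


def pytest_generate_tests(metafunc):
    """Parametrize any test that takes a `script` argument with the discovered files."""
    if "script" in metafunc.fixturenames:
        metafunc.parametrize("script", discover_examples(EXAMPLES_DIR), ids=str)


def test_example_runs(script: Path) -> None:
    result = run_example(script)
    # Show the tail of stderr on failure to keep the report readable.
    assert result.returncode == 0, result.stderr[-2000:]
```

With pytest-xdist installed, running `pytest -n auto` on such a module would spread the generated cases over one worker per available CPU. Because each case is an independent subprocess, no state is shared between workers.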


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:30c3275-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-30c3275-python \
  ghcr.io/openhands/agent-server:30c3275-python

All tags pushed for this build

ghcr.io/openhands/agent-server:30c3275-golang-amd64
ghcr.io/openhands/agent-server:30c3275-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:30c3275-golang-arm64
ghcr.io/openhands/agent-server:30c3275-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:30c3275-java-amd64
ghcr.io/openhands/agent-server:30c3275-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:30c3275-java-arm64
ghcr.io/openhands/agent-server:30c3275-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:30c3275-python-amd64
ghcr.io/openhands/agent-server:30c3275-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:30c3275-python-arm64
ghcr.io/openhands/agent-server:30c3275-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:30c3275-golang
ghcr.io/openhands/agent-server:30c3275-java
ghcr.io/openhands/agent-server:30c3275-python

About Multi-Architecture Support

  • Each variant tag (e.g., 30c3275-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 30c3275-python-amd64) are also available if needed
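
The tag list above follows a regular pattern, which the sketch below reproduces. It is inferred only from the tags shown in this comment, not taken from the build scripts: the function names are hypothetical, and the rule that `/` becomes `_s_` and `:` becomes `_tag_` is an assumption that happens to match all three base images listed.

```python
def slugify_base_image(base_image: str) -> str:
    """Turn a base image reference into a tag-safe slug (inferred rule)."""
    return base_image.replace("/", "_s_").replace(":", "_tag_")


def build_tags(
    repo: str,
    short_sha: str,
    variant: str,
    base_image: str,
    arches: tuple[str, ...] = ("amd64", "arm64"),
) -> list[str]:
    """List per-architecture tags first, then the multi-arch manifest tag."""
    slug = slugify_base_image(base_image)
    tags: list[str] = []
    for arch in arches:
        tags.append(f"{repo}:{short_sha}-{variant}-{arch}")
        tags.append(f"{repo}:{short_sha}-{slug}-{arch}")
    # The variant tag without an architecture suffix is the multi-arch manifest.
    tags.append(f"{repo}:{short_sha}-{variant}")
    return tags


if __name__ == "__main__":
    for tag in build_tags(
        "ghcr.io/openhands/agent-server",
        "30c3275",
        "python",
        "nikolaik/python-nodejs:python3.12-nodejs22",
    ):
        print(tag)
```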

@ryanhoangt ryanhoangt added the test-examples Run all applicable "examples/" files. Expensive operation. label Nov 24, 2025
@github-actions
Contributor

github-actions bot commented Nov 24, 2025

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2025-11-24 16:54:13 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 32.1s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 10.0s $0.01
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.0s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 50.0s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 14.6s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 37.1s $0.02
01_standalone_sdk/11_async.py ✅ PASS 40.5s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 14.1s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 32.7s $0.01
01_standalone_sdk/14_context_condenser.py ✅ PASS 3m 44s $0.41
01_standalone_sdk/17_image_input.py ✅ PASS 17.3s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 15.5s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 31.9s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 20.4s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.2s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 18.0s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 36.9s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 5m 40s $0.41
01_standalone_sdk/25_agent_delegation.py ✅ PASS 48.8s $0.04
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 34.1s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 1m 14s $0.05
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 55s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 1m 7s $0.06
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (exit code 1) 5m 13s --

❌ Some tests failed

Total: 24 | Passed: 23 | Failed: 1 | Total Cost: $1.27

Failed examples:

  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1

View full workflow run

@ryanhoangt ryanhoangt added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Nov 24, 2025
@github-actions
Contributor

github-actions bot commented Nov 24, 2025

Coverage

Coverage Report
File Stmts Miss Cover Missing
TOTAL 12502 5792 53%
report-only-changed-files is enabled. No files were changed during this commit :)

@ryanhoangt ryanhoangt added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Nov 24, 2025
@ryanhoangt ryanhoangt marked this pull request as ready for review November 24, 2025 16:41
@openhands-ai

openhands-ai bot commented Nov 24, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run Examples Scripts

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1229 at branch `ht/fix-examples`

Feel free to include any additional details that might help me get this PR into a better state.

@ryanhoangt ryanhoangt removed the test-examples Run all applicable "examples/" files. Expensive operation. label Nov 24, 2025
@ryanhoangt ryanhoangt added the test-examples Run all applicable "examples/" files. Expensive operation. label Nov 24, 2025
@github-actions
Contributor

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Run in progress...

@ryanhoangt ryanhoangt added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Nov 24, 2025
@github-actions
Contributor

github-actions bot commented Nov 24, 2025

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2025-11-24 17:26:47 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 29.9s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 10.3s $0.01
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 13.7s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 49.4s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 17.0s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 39.7s $0.02
01_standalone_sdk/11_async.py ✅ PASS 37.8s $0.02
01_standalone_sdk/12_custom_secrets.py ✅ PASS 14.0s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 32.7s $0.01
01_standalone_sdk/14_context_condenser.py ✅ PASS 3m 15s $0.37
01_standalone_sdk/17_image_input.py ✅ PASS 17.2s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 21.5s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 17.4s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 23.9s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 10.4s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 13.5s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 39.7s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 7m 34s $0.57
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 31s $0.09
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 25.9s $0.02
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 1m 8s $0.05
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 1m 3s $0.02
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 2m 17s $0.04
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 4m 57s $0.03

✅ All tests passed!

Total: 24 | Passed: 24 | Failed: 0 | Total Cost: $1.42

View full workflow run
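
To see why running the examples in parallel matters, the durations in a report like the one above can be converted to seconds and compared. The sketch below is illustrative only: `parse_duration` and the sample values are assumptions, with the sample taken from four rows of the table. The sum approximates the time a strictly sequential run would need, while the longest single example is a lower bound on wall-clock time no matter how many workers are used.

```python
import re

# Matches strings such as "29.9s", "3m 15s", or "5m".
_DURATION = re.compile(r"^(?:(\d+)m\s*)?(?:(\d+(?:\.\d+)?)s)?$")


def parse_duration(text: str) -> float:
    """Convert a report duration like '3m 15s' to seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2) or 0)
    return minutes * 60 + seconds


if __name__ == "__main__":
    sample = ["29.9s", "3m 15s", "7m 34s", "4m 57s"]  # four rows from the table
    seconds = [parse_duration(d) for d in sample]
    print(f"sequential estimate: {sum(seconds):.1f}s")
    print(f"parallel lower bound: {max(seconds):.1f}s")
```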

@ryanhoangt ryanhoangt changed the title from "Fix run examples workflow failed on schedule run & fix failed example scripts" to "Fix run examples workflow failed on schedule run & use parallel execution with pytest" Nov 24, 2025
@ryanhoangt ryanhoangt added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Nov 24, 2025
@ryanhoangt ryanhoangt requested a review from xingyaoww November 24, 2025 18:12
@github-actions
Contributor

github-actions bot commented Nov 24, 2025

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2025-11-24 18:21:33 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 38.5s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 13.6s $0.01
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 15.1s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 51.0s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 18.5s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 43.5s $0.02
01_standalone_sdk/11_async.py ✅ PASS 36.3s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 23.0s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 35.3s $0.02
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 43s $0.30
01_standalone_sdk/17_image_input.py ✅ PASS 17.4s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 20.7s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 17.9s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 22.0s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 13.8s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 22.2s $0.02
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 41.8s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 4m 46s $0.32
01_standalone_sdk/25_agent_delegation.py ✅ PASS 47.0s $0.04
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 24.6s $0.03
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 56.4s $0.03
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 2m 29s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 2m 45s $0.10
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 40s $0.03

✅ All tests passed!

Total: 24 | Passed: 24 | Failed: 0 | Total Cost: $1.13

View full workflow run

Collaborator

@xingyaoww xingyaoww left a comment

This is so awesome! Thank you!

@xingyaoww xingyaoww added test-examples Run all applicable "examples/" files. Expensive operation. and removed test-examples Run all applicable "examples/" files. Expensive operation. labels Nov 24, 2025
@github-actions
Contributor

github-actions bot commented Nov 24, 2025

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2025-11-24 19:24:10 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 29.1s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 12.5s $0.01
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 12.9s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 48.3s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 16.6s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 39.0s $0.02
01_standalone_sdk/11_async.py ✅ PASS 37.8s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 19.4s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 32.7s $0.01
01_standalone_sdk/14_context_condenser.py ✅ PASS 2m 58s $0.34
01_standalone_sdk/17_image_input.py ✅ PASS 19.0s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 23.6s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 24.9s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 20.8s $0.01
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 11.4s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 16.6s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 38.2s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 3m 41s $0.21
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 42s $0.22
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 23.0s $0.02
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 1m 10s $0.05
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 2m 28s $0.05
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 2m 54s $0.07
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ✅ PASS 1m 37s $0.03

✅ All tests passed!

Total: 24 | Passed: 24 | Failed: 0 | Total Cost: $1.23

View full workflow run

@xingyaoww xingyaoww merged commit e996867 into main Nov 24, 2025
31 checks passed
@xingyaoww xingyaoww deleted the ht/fix-examples branch November 24, 2025 19:28
Labels

test-examples Run all applicable "examples/" files. Expensive operation.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Speed up test-examples by running them in parallel

3 participants