
fix(docker): bundle Python runtime for portable /agent-server#2676

Closed
simonrosenberg wants to merge 4 commits into main from fix/portable-agent-server-python-runtime


@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Apr 2, 2026

Summary

  • Bundle a portable Python runtime into /agent-server/.python during the builder stage
  • Repoint the venv and set LD_LIBRARY_PATH for source-based runtime images
  • Add portability regression coverage around the Dockerfile contract

Root cause

SDK v1.15.0 (commit 06b91863, Mar 26) switched the builder from --managed-python (uv-installed, portable) to --python-preference only-system to fix a legitimate seccomp issue (python-build-standalone's libpython has an executable stack flag rejected under DinD restrictions).

This made the venv non-portable: .venv/bin/python became a symlink to /usr/local/bin/python3 (from the builder's python:3.13-bookworm). When this venv is COPYed onto commit0 base images (Ubuntu 22.04, Python at /usr/bin/python3), the symlink is broken and the container fails to start:

exec: "/agent-server/.venv/bin/python": stat /usr/local/bin/python3: no such file or directory

SWE-bench was unaffected because its base images derive from Python Docker images that have /usr/local/bin/python3.
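The failure mode above can be checked for mechanically. The following is a minimal sketch, not code from this PR: it walks a directory and reports symlinks whose resolved target lies outside that directory, which is the property that made the copied venv break. The function name `symlinks_escaping` is hypothetical.

```python
import os
from pathlib import Path


def symlinks_escaping(root: Path) -> list:
    """Return symlinks under root whose resolved target is outside root."""
    root = root.resolve()
    escaping = []
    # os.walk does not follow directory symlinks by default, so symlinked
    # directories show up in dirnames and are inspected but not descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                continue
            # resolve() is non-strict by default, so dangling links still
            # yield the path they point at.
            target = path.resolve()
            if not target.is_relative_to(root):
                escaping.append(path)
    return sorted(escaping)
```

Run against a copied `/agent-server`, a non-empty result would indicate that the tree still depends on files from the builder image.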

Timeline:

  • Pre-Mar 26: --managed-python (portable) → worked everywhere
  • Mar 26 (v1.15.0, 06b91863): --python-preference only-system → broke commit0
  • commit0 evals have been failing 100% of the time since Mar 26

Fix

Keep --python-preference only-system (no seccomp issues) but bundle the runtime:

  • Copy interpreter binary, stdlib, and libpython into /agent-server/.python/
  • Repoint venv symlinks and pyvenv.cfg at the bundled copy
  • Set LD_LIBRARY_PATH in source targets for libpython resolution
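As an illustration of the repointing step, here is a minimal Python sketch. It is not the Dockerfile `RUN` step from this PR. It assumes the venv has a single `bin/python` symlink and a `pyvenv.cfg` with a `home` key, and the helper name `repoint_venv` is hypothetical.

```python
import os
from pathlib import Path


def repoint_venv(venv: Path, bundled_python: Path) -> None:
    """Point venv/bin/python and pyvenv.cfg's home at a bundled interpreter."""
    link = venv / "bin" / "python"
    # A dangling symlink reports exists() == False, so check is_symlink() too.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(bundled_python)

    cfg = venv / "pyvenv.cfg"
    lines = []
    for line in cfg.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "home":
            # "home" is the directory that contains the interpreter binary.
            line = f"home = {bundled_python.parent}"
        lines.append(line)
    cfg.write_text("\n".join(lines) + "\n")
```

A real venv usually carries additional links such as `python3`. If those point at `python` relatively they follow the new target automatically, but that layout is an assumption here.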

Validation

  • uv run pytest tests/agent_server/test_docker_build.py -q
  • docker buildx build --platform linux/amd64 --target source-minimal --build-arg BASE_IMAGE=docker.io/wentingzhao/wcwidth:v0 -f openhands-agent-server/openhands/agent_server/docker/Dockerfile -t local/commit0-wcwidth-portable:pr --load .
  • docker run --rm --platform linux/amd64 --entrypoint /bin/sh local/commit0-wcwidth-portable:pr -lc '/agent-server/.venv/bin/python -c "import openhands.agent_server, sys; print(sys.executable)"'
  • Verified venv symlinks point to bundled .python/ directory (not /usr/local/bin)
  • Confirmed base image has NO /usr/local/bin/python3 — would have failed without fix

Refs: OpenHands/benchmarks#607
Fixes #2585


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:ab95540-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-ab95540-python \
  ghcr.io/openhands/agent-server:ab95540-python

All tags pushed for this build

ghcr.io/openhands/agent-server:ab95540-golang-amd64
ghcr.io/openhands/agent-server:ab95540-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:ab95540-golang-arm64
ghcr.io/openhands/agent-server:ab95540-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:ab95540-java-amd64
ghcr.io/openhands/agent-server:ab95540-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:ab95540-java-arm64
ghcr.io/openhands/agent-server:ab95540-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:ab95540-python-amd64
ghcr.io/openhands/agent-server:ab95540-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:ab95540-python-arm64
ghcr.io/openhands/agent-server:ab95540-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:ab95540-golang
ghcr.io/openhands/agent-server:ab95540-java
ghcr.io/openhands/agent-server:ab95540-python

About Multi-Architecture Support

  • Each variant tag (e.g., ab95540-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., ab95540-python-amd64) are also available if needed

Partially addresses #2687 (source-image portability design issue) by making the runtime self-contained.

After building the venv with system Python, copy the interpreter binary,
standard library, and libpython shared objects into /agent-server/.python/.
Re-point the venv symlinks and pyvenv.cfg at the bundled copy so that the
entire /agent-server directory is self-contained.

This means eval images (and any other consumer) can COPY /agent-server onto
any base image — even one without Python — and the entrypoint will resolve.

Changes:
- Builder stage: new RUN step bundles Python runtime into .python/
- source / source-minimal targets: set LD_LIBRARY_PATH for libpython
- Add Dockerfile.portability-test for CI validation of the contract
- Add unit tests verifying Dockerfile portability structure

Fixes #2585

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Coverage

Coverage Report •
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server/docker | | | | |
| build.py | 513 | 161 | 68% | 81, 84–85, 99, 104, 108–110, 114, 119, 136–141, 144, 167, 169, 177, 179–180, 182–186, 188, 192, 195, 198–201, 203–205, 207, 209, 211, 213, 217–218, 303–304, 308–309, 313–314, 318–319, 332, 334–342, 415, 443–444, 446, 517, 531, 534, 538–539, 543–544, 548, 564–566, 569, 575–578, 588, 603, 624, 654, 673–674, 678–680, 734, 745–746, 829, 839–840, 907–908, 913, 916, 923, 928, 933, 939, 944, 951–952, 957, 962, 968, 974, 979, 988, 991–997, 1000–1002, 1005, 1019–1024, 1026–1027, 1029–1031, 1034–1035, 1039–1042, 1044, 1053, 1065, 1068, 1084–1089, 1091–1092, 1094–1098, 1100, 1107, 1111 |
| TOTAL | 21728 | 9795 | 54% | |

Collaborator

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟡 Acceptable - Solves a real production problem with a pragmatic approach. One critical issue with the portability test Dockerfile needs fixing.

Key Insight: This is good infrastructure work that makes /agent-server truly portable. The bundled Python approach is sound, but the portability validation Dockerfile is incomplete.

Comment thread openhands-agent-server/openhands/agent_server/docker/Dockerfile.portability-test Outdated
Comment thread tests/agent_server/test_docker_build.py Outdated
Comment thread openhands-agent-server/openhands/agent_server/docker/Dockerfile
ARG USERNAME
COPY --chown=${USERNAME}:${USERNAME} --from=builder /agent-server /agent-server
# Bundled Python's libpython*.so lives under /agent-server/.python/lib
ENV LD_LIBRARY_PATH=/agent-server/.python/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
Collaborator


🟢 Acceptable: The ${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} syntax correctly avoids a trailing colon when the variable is empty. Prepending ensures the bundled libpython takes precedence.

Potential consideration: If a runtime image already has conflicting Python libraries on its LD_LIBRARY_PATH, prepending should be enough, because the bundled directory is searched first. If issues still arise, it may be necessary to set LD_LIBRARY_PATH=/agent-server/.python/lib without appending the existing value, to force isolation.
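To make the expansion rule concrete, here is a small Python sketch that mirrors what `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` does in the ENV line under review. The function name `prepend_library_path` is hypothetical and exists only for illustration.

```python
from typing import Optional


def prepend_library_path(bundled: str, existing: Optional[str]) -> str:
    """Mimic `bundled${VAR:+:$VAR}`: add ':existing' only when it is non-empty."""
    # An unset (None) or empty value contributes nothing, so no trailing colon.
    return f"{bundled}:{existing}" if existing else bundled


print(prepend_library_path("/agent-server/.python/lib", None))
print(prepend_library_path("/agent-server/.python/lib", "/opt/lib"))
```

The first call prints the bundled directory on its own. The second prints the bundled directory followed by the inherited path, so the bundled libpython is searched first.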

@simonrosenberg simonrosenberg self-assigned this Apr 2, 2026
@simonrosenberg
Collaborator Author

Addressed the review feedback.

Changes pushed on fix/portable-agent-server-python-runtime:

  • moved the portability smoke test into the main Dockerfile as a real portability-test stage
  • removed the broken standalone Dockerfile.portability-test
  • hardened the Dockerfile text tests so source and source-minimal are checked independently
  • pinned the builder runtime to $TARGETPLATFORM
  • fixed the local single-platform build path so buildx --load honors opts.platforms (this mattered for benchmarks-side linux/amd64 validation on Apple Silicon)
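Checking `source` and `source-minimal` independently means inspecting each stage's text on its own rather than searching the whole Dockerfile. The sketch below shows one way such a text test could work. It is an assumption about the approach, not the contents of `tests/agent_server/test_docker_build.py`, and the helper name `split_stages` is hypothetical.

```python
import re

# Matches "FROM [--flag ...] image [AS name]" and captures the stage name.
_FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--\S+\s+)*\S+(?:\s+AS\s+(\S+))?\s*$", re.IGNORECASE
)


def split_stages(dockerfile_text: str) -> dict:
    """Map each named build stage to the text of the instructions inside it."""
    stages = {}
    current = None
    for line in dockerfile_text.splitlines():
        match = _FROM_RE.match(line)
        if match:
            current = match.group(1)  # None for an unnamed stage
            if current:
                stages[current] = []
            continue
        if current:
            stages[current].append(line)
    return {name: "\n".join(body) for name, body in stages.items()}


def stages_missing(dockerfile_text: str, names, needle: str) -> list:
    """Return the stage names from `names` whose body does not contain needle."""
    stages = split_stages(dockerfile_text)
    return [n for n in names if needle not in stages.get(n, "")]
```

With this shape, a Dockerfile that sets `LD_LIBRARY_PATH` in `source` but not in `source-minimal` is reported, whereas a whole-file substring search would pass.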

Validation:

  • uv run pytest tests/agent_server/test_docker_build.py -q -> 39 passed
  • docker buildx build --platform linux/amd64 --target=portability-test -f openhands-agent-server/openhands/agent_server/docker/Dockerfile . --load -> passed
  • portability target executed /agent-server/.venv/bin/python and imported openhands.agent_server successfully under linux/amd64

@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

simonrosenberg pushed a commit that referenced this pull request Apr 8, 2026
…+ execstack sanitize

Switches the agent-server builder stage from `uv venv --python-preference
only-system` back to uv-managed python-build-standalone, installed into
`/agent-server/uv-managed-python`. This restores the pre-v1.15.0 property
that `/agent-server/.venv/bin/python` is a symlink inside `/agent-server`,
so downstream consumers can COPY `/agent-server` onto any base image
without needing a matching system Python.

The original reason `only-system` was chosen was that
python-build-standalone's `libpython3.13.so.1.0` ships with
`PT_GNU_STACK PF_X`, which Debian Trixie's glibc NX enforcement and
sysbox-runc seccomp refuse to load. This commit addresses that at its
actual layer — ELF program headers — rather than by dodging managed
Python.

New helper `clear_execstack.py`:
  - Walks a directory tree, finds every `.so*` file, parses ELF program
    headers, and clears PF_X on any PT_GNU_STACK entry that has it.
  - Supports ELF32/ELF64 and both endiannesses.
  - Idempotent; no-op on already-clean ELFs and non-ELF files.
  - Strip-safe: only rewrites a single uint32 inside an existing phdr.
  - Dual-use: runnable as `python clear_execstack.py <path>`, importable
    as `clear_execstack(path)` / `clear_execstack_in_tree(root)`.
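The core of the technique described above can be sketched in a few lines of Python. This is an illustrative in-memory version written from the ELF layout, not the `clear_execstack.py` shipped in the commit: the function name `clear_execstack_bytes` is hypothetical, and it works on a bytes object instead of rewriting files in a tree.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that carries stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def clear_execstack_bytes(data: bytes) -> bytes:
    """Return data with PF_X cleared on any PT_GNU_STACK program header.

    Non-ELF, unknown-class, and truncated inputs are returned unchanged.
    """
    if len(data) < 6 or data[:4] != b"\x7fELF":
        return data
    is64 = {1: False, 2: True}.get(data[4])   # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])    # EI_DATA
    if is64 is None or endian is None:
        return data
    try:
        if is64:
            (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
            phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
            flags_off = 4    # p_flags follows p_type in Elf64_Phdr
        else:
            (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
            phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
            flags_off = 24   # p_flags sits near the end of Elf32_Phdr
        out = bytearray(data)
        for i in range(phnum):
            base = phoff + i * phentsize
            (p_type,) = struct.unpack_from(endian + "I", out, base)
            if p_type != PT_GNU_STACK:
                continue
            (p_flags,) = struct.unpack_from(endian + "I", out, base + flags_off)
            if p_flags & PF_X:
                # Only this one uint32 changes; file size and layout stay the same.
                struct.pack_into(endian + "I", out, base + flags_off, p_flags & ~PF_X)
    except struct.error:
        return data  # header or program header table runs past the end of the file
    return bytes(out)
```

Because the function only flips one bit inside an existing program header, applying it a second time returns the same bytes, which matches the idempotence property listed above.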

Two call sites share the helper:
  1. Builder stage runs it across `/agent-server/uv-managed-python`
     immediately after `uv python install 3.13`, before `uv venv`.
  2. PyInstaller spec loads it via importlib and applies it as a
     post-Analysis hook so the `binary`/`binary-minimal` one-file
     archive also ships sanitized .so files. Supersedes the inline
     version from #2574.

Builder also asserts `.venv/bin/python` resolves inside
`/agent-server/uv-managed-python/` so a future regression fails at
build time instead of at downstream runtime.

Tests (30 cases) cover the full ELF matrix: 32/64-bit × LE/BE,
PT_GNU_STACK RWX / RW / absent, tree walk, symlink skip, non-ELF skip,
truncated ELF, idempotence, and the CLI entrypoint.

Closes #2761. Supersedes #2676 (bundle Debian Python) and #2692
(naive revert to --managed-python without execstack fix).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@all-hands-bot
Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.


@simonrosenberg
Collaborator Author

This was fixed by bumping version

Development

Successfully merging this pull request may close these issues.

Decouple eval image assembly from SDK Dockerfile internals

3 participants