add apptainer eval support for swe bench #643

Merged
neubig merged 1 commit into OpenHands:main from adityasoni9998:apptainer_eval_support on Apr 7, 2026

@adityasoni9998

Contributor

This PR adds support for evaluating OpenHands on SWE-Bench with the Apptainer runtime. Most HPC clusters don't support Docker but do support Apptainer, so this change makes it easy to run evals on such clusters.

@all-hands-bot left a comment

Collaborator

🟡 Acceptable - Solves a real problem (HPC cluster support) with straightforward implementation, but needs polish on error messages, consistency, and testing.

if not remote_image_exists(agent_server_image):
    raise RuntimeError(
        f"Agent server image {agent_server_image} does not exist in container registry, "
        "make sure to build, push it, and make it public accessible before using apptainer workspace."

Collaborator

🟠 Important: Grammar error in error message.

Suggested change
- "make sure to build, push it, and make it public accessible before using apptainer workspace."
+ "make sure to build, push it, and make it publicly accessible before using apptainer workspace."

Comment on lines +209 to +213
logger.info(
    "Skipping local wrap for apptainer workspace; expecting image to be pre-wrapped in registry"
)

workspace = ApptainerWorkspace(

Collaborator

🟡 Suggestion: Inconsistent behavior - the docker workspace block (lines 179-193) actively wraps the image when wrap_needed=True, but apptainer just logs and assumes pre-wrapped images. This creates surprising asymmetry.

Either:

  1. Document this difference clearly ("apptainer REQUIRES pre-wrapped images"), or
  2. Fail fast if wrap_needed=True in apptainer mode

Currently a user might expect the same wrapping behavior and be confused when it silently skips.
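
Option 2 could look roughly like the following. This is a minimal sketch, not code from the PR: `check_wrap_supported` is a hypothetical helper, and `workspace_type` and `wrap_needed` only mirror the names used in this review.

```python
def check_wrap_supported(workspace_type: str, wrap_needed: bool) -> None:
    """Raise early when local image wrapping is requested but not performed.

    Hypothetical helper: the name and signature are assumptions made for
    illustration and are not taken from run_infer.py.
    """
    if workspace_type == "apptainer" and wrap_needed:
        # The apptainer branch only logs and skips the wrap, so fail loudly
        # instead of continuing with an image that may not be pre-wrapped.
        raise RuntimeError(
            "Apptainer workspaces do not wrap images locally; "
            "build and push a pre-wrapped image to the registry first."
        )
```

Calling such a guard before constructing the workspace would turn the silent skip into an explicit error, at the cost of rejecting runs where the registry image happens to be pre-wrapped already.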

server_image=agent_server_image,
working_dir="/workspace",
forward_env=forward_env or [],
cache_dir=os.getenv("APPTAINER_CACHEDIR", None),

Collaborator

🟡 Suggestion: The README says "ensure that this directory exists" for APPTAINER_CACHEDIR, but the code doesn't validate it. Consider adding a check:

cache_dir = os.getenv("APPTAINER_CACHEDIR", None)
if cache_dir and not os.path.isdir(cache_dir):
    raise RuntimeError(
        f"APPTAINER_CACHEDIR={cache_dir} does not exist. "
        "Please create the directory before running."
    )

This gives better error messages than letting Apptainer fail later with cryptic errors.

Comment on lines +204 to +206
**Optionally**, you can override the default location where Apptainer cache is saved using the below environment variables:

```bash

Collaborator

🟡 Suggestion: The README mentions setting APPTAINER_TMPDIR, but I don't see it used anywhere in the code (only APPTAINER_CACHEDIR is read in run_infer.py:217). Either:

  1. Remove this from docs if it's not needed, or
  2. Pass it to ApptainerWorkspace if it should be used

Is APPTAINER_TMPDIR automatically picked up by the Apptainer runtime without explicit passing?
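
On the question above: the Apptainer CLI reads `APPTAINER_TMPDIR` from its own process environment, so the answer likely depends on whether the code that launches `apptainer` keeps the parent environment. The sketch below only demonstrates that general mechanism, with a Python child process standing in for the `apptainer` binary. How `ApptainerWorkspace` actually spawns the runtime is not visible in this PR excerpt, and `child_sees` is a hypothetical helper.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def child_sees(var: str, env: Optional[Dict[str, str]] = None) -> str:
    """Return the value of `var` as seen by a child Python process.

    The child stands in for the `apptainer` binary (an assumption for
    illustration). `env=None` means the child inherits os.environ.
    """
    code = f"import os; print(os.environ.get({var!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ["APPTAINER_TMPDIR"] = "/tmp/apptainer-demo"

# Inherited environment: the variable reaches the child without being passed.
inherited = child_sees("APPTAINER_TMPDIR")

# Explicitly built environment that omits the variable: the child never sees it.
filtered_env = {k: v for k, v in os.environ.items() if k != "APPTAINER_TMPDIR"}
dropped = child_sees("APPTAINER_TMPDIR", env=filtered_env)
```

If the workspace launches the runtime with an inherited environment, documenting the variable in the README is enough. If it builds its own environment, the variable would need to be forwarded explicitly.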

working_dir="/workspace",
forward_env=forward_env or [],
)
elif self.metadata.workspace_type == "apptainer":

Collaborator

🔴 Critical - Testing Gap: This PR adds a new workspace type but includes no tests. At minimum, add tests for:

  1. Apptainer workspace initialization with valid image
  2. Error handling when remote_image_exists() returns False
  3. Handling of APPTAINER_CACHEDIR environment variable
  4. Behavior when wrap_needed=True

Put tests in benchmarks/swebench/tests/ following the existing test patterns. This is not optional for new components.
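
A rough sketch of what tests for points 2 and 3 might look like follows. Everything here is an assumption for illustration: `prepare_apptainer` is a hypothetical stand-in for the apptainer branch (the real code lives in `run_infer.py` and is not reproduced here), the image name is a placeholder, and the cache directory check is the one proposed earlier in this review rather than existing behavior. Plain asserts are used so the sketch runs without pytest; the existing test patterns in the repo may differ.

```python
import os
import tempfile
from typing import Callable, Optional
from unittest import mock


def prepare_apptainer(image: str, image_exists: Callable[[str], bool]) -> dict:
    """Hypothetical stand-in for the apptainer branch under test.

    Returns the keyword arguments that would be given to the workspace.
    """
    if not image_exists(image):
        raise RuntimeError(f"Agent server image {image} does not exist")
    cache_dir: Optional[str] = os.getenv("APPTAINER_CACHEDIR")
    if cache_dir and not os.path.isdir(cache_dir):
        raise RuntimeError(f"APPTAINER_CACHEDIR={cache_dir} does not exist")
    return {"server_image": image, "cache_dir": cache_dir}


def test_missing_image_raises() -> None:
    # The registry lookup is injected, so no network access is needed.
    try:
        prepare_apptainer("example/image:tag", image_exists=lambda _: False)
    except RuntimeError as err:
        assert "does not exist" in str(err)
    else:
        raise AssertionError("expected RuntimeError")


def test_cache_dir_is_forwarded() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # patch.dict restores os.environ when the block exits.
        with mock.patch.dict(os.environ, {"APPTAINER_CACHEDIR": tmp}):
            kwargs = prepare_apptainer("example/image:tag", lambda _: True)
    assert kwargs["cache_dir"] == tmp


def test_missing_cache_dir_raises() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        missing = os.path.join(tmp, "does-not-exist")
        with mock.patch.dict(os.environ, {"APPTAINER_CACHEDIR": missing}):
            try:
                prepare_apptainer("example/image:tag", lambda _: True)
            except RuntimeError as err:
                assert "APPTAINER_CACHEDIR" in str(err)
            else:
                raise AssertionError("expected RuntimeError")
```

Injecting the image check as a callable keeps the tests offline; with the real code, patching `remote_image_exists` via `unittest.mock` would serve the same purpose.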

@neubig left a comment

Member

Thank you!

@neubig merged commit f3eadfa into OpenHands:main on Apr 7, 2026
3 checks passed
GaokaiZhang pushed a commit to GaokaiZhang/benchmarks that referenced this pull request Apr 17, 2026
3 participants