add apptainer eval support for swe bench#643
Conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solves a real problem (HPC cluster support) with straightforward implementation, but needs polish on error messages, consistency, and testing.
| if not remote_image_exists(agent_server_image): | ||
| raise RuntimeError( | ||
| f"Agent server image {agent_server_image} does not exist in container registry, " | ||
| "make sure to build, push it, and make it public accessible before using apptainer workspace." |
There was a problem hiding this comment.
🟠 Important: Grammar error in error message.
| "make sure to build, push it, and make it public accessible before using apptainer workspace." | |
| "make sure to build, push it, and make it publicly accessible before using apptainer workspace." |
| logger.info( | ||
| "Skipping local wrap for apptainer workspace; expecting image to be pre-wrapped in registry" | ||
| ) | ||
|
|
||
| workspace = ApptainerWorkspace( |
There was a problem hiding this comment.
🟡 Suggestion: Inconsistent behavior - the docker workspace block (lines 179-193) actively wraps the image when wrap_needed=True, but apptainer just logs and assumes pre-wrapped images. This creates surprising asymmetry.
Either:
- Document this difference clearly ("apptainer REQUIRES pre-wrapped images"), or
- Fail fast if
wrap_needed=Truein apptainer mode
Currently a user might expect the same wrapping behavior and be confused when it silently skips.
| server_image=agent_server_image, | ||
| working_dir="/workspace", | ||
| forward_env=forward_env or [], | ||
| cache_dir=os.getenv("APPTAINER_CACHEDIR", None), |
There was a problem hiding this comment.
🟡 Suggestion: The README says "ensure that this directory exists" for APPTAINER_CACHEDIR, but the code doesn't validate it. Consider adding a check:
cache_dir = os.getenv("APPTAINER_CACHEDIR", None)
if cache_dir and not os.path.isdir(cache_dir):
raise RuntimeError(
f"APPTAINER_CACHEDIR={cache_dir} does not exist. "
"Please create the directory before running."
)This gives better error messages than letting Apptainer fail later with cryptic errors.
| **Optionally**, you can override the default location where Apptainer cache is saved using the below environment variables: | ||
|
|
||
| ```bash |
There was a problem hiding this comment.
🟡 Suggestion: The README mentions setting APPTAINER_TMPDIR, but I don't see it used anywhere in the code (only APPTAINER_CACHEDIR is read in run_infer.py:217). Either:
- Remove this from docs if it's not needed, or
- Pass it to ApptainerWorkspace if it should be used
Is APPTAINER_TMPDIR automatically picked up by the Apptainer runtime without explicit passing?
| working_dir="/workspace", | ||
| forward_env=forward_env or [], | ||
| ) | ||
| elif self.metadata.workspace_type == "apptainer": |
There was a problem hiding this comment.
🔴 Critical - Testing Gap: This PR adds a new workspace type but includes no tests. At minimum, add tests for:
- Apptainer workspace initialization with valid image
- Error handling when
remote_image_exists()returns False - Handling of
APPTAINER_CACHEDIRenvironment variable - Behavior when
wrap_needed=True
Put tests in benchmarks/swebench/tests/ following the existing test patterns. This is not optional for new components.
This PR adds support to evaluate OpenHands on SWE-Bench with Apptainer runtimes - most HPCs don't support using docker but support apptainer and this PR allows running evals in such HPC clusters easily.