fix(phai): clean up sandbox docker containers after each eval case#60089
Merged
Conversation
Eval cases return as soon as the agent emits end_turn, well before the
ProcessTaskWorkflow's finally block runs cleanup_sandbox. With 16GB-per-
sandbox defaults and max_concurrency=2, accumulated containers exhaust
host memory during a session. Add a finally block in run_eval_case that
force-removes any container matching task-sandbox-{task_id}-*, scoped by
task_id so concurrent cases never touch each other's container.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down. Add the Most PRs don't need this. Real regressions still get caught on master and fix-forward. |
Contributor
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
ee/hogai/eval/sandboxed/runner.py:45-49
If `docker ps` exits with a non-zero return code without raising a Python exception (e.g. Docker daemon returns an error), `result.returncode` is never inspected, so the failure goes completely unlogged and the loop silently skips cleanup. Adding a check here surfaces this silent failure mode.
```suggestion
except Exception:
logger.warning("Failed to list sandbox containers for task %s", task_id, exc_info=True)
return
if result.returncode != 0:
logger.warning(
"docker ps returned exit code %d for task %s: %s",
result.returncode,
task_id,
result.stderr.strip(),
)
return
for container_id in result.stdout.strip().splitlines():
```
Reviews (1): Last reviewed commit: "fix(phai): clean up sandbox docker conta..." | Re-trigger Greptile |
joshsny
approved these changes
May 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Docker container cleanup didn't work for long-running parallel tests.
Changes
_cleanup_case_containers(task_id)inee/hogai/eval/sandboxed/runner.pythat runsdocker ps -a --filter name=task-sandbox-{task_id}- --format {{.ID}}and force-removes each match.run_eval_casein atry / finallyso the cleanup fires on both the success and exception branches, regardless of what the workflow's own cleanup does.asyncio.to_threadsodocker rm -fdoesn't block the event loop while Docker waits on container shutdown.Scoping by
task_idprefix (get_sandbox_name_for_taskinproducts/tasks/backend/temporal/process_task/utils.py:597) guarantees we never touch a concurrently-running case's container.How did you test this code?
ruff checkandruff format --checkpass on the modified file.hogli test ee/hogai/eval/sandboxed/..., then watchdocker ps --filter name=task-sandbox-between cases — only the in-flight case's container should appear.Publish to changelog?
no