Fix v1 sandbox client reuse and upload errors #1483
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 89f1cc132b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
if _SHARED_SANDBOX_CLIENT is None:
    from verifiers.utils.threaded_sandbox_client import ThreadedAsyncSandboxClient

    _SHARED_SANDBOX_CLIENT = cast(SandboxClient, ThreadedAsyncSandboxClient())
Handle omitted command_timeout with threaded sandbox client
When a sandbox program omits command_timeout, run_sandbox_command passes None from sandbox_config.get("command_timeout") into lease.run_background_job. Because this line now always backs leases with ThreadedAsyncSandboxClient, that call reaches verifiers/utils/threaded_sandbox_client.py:78-94, where time.monotonic() + timeout raises TypeError for None. This breaks the default sandbox program path before the command runs; preserve the threaded client's default timeout or coerce None before calling it.
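A minimal sketch of the failure and of the fix this PR later describes (forward timeout only when it is set). FakeThreadedClient, run_job, and DEFAULT_TIMEOUT are hypothetical names; the real ThreadedAsyncSandboxClient signature is not reproduced here, so the sketch only assumes it accepts a numeric timeout with its own default.

```python
import time
from typing import Any, Optional

DEFAULT_TIMEOUT = 900.0  # hypothetical client-side default, echoing the "e.g. 900s" in the PR notes


class FakeThreadedClient:
    """Hypothetical stand-in for a client whose deadline math needs a real number."""

    def run_background_job(self, command: str, timeout: float = DEFAULT_TIMEOUT) -> float:
        # time.monotonic() + None raises TypeError, the failure described in the review
        deadline = time.monotonic() + timeout
        return deadline


def run_job(client: FakeThreadedClient, command: str, timeout: Optional[float]) -> float:
    # Forward timeout only when it is set, so the client's own default applies otherwise
    kwargs: dict[str, Any] = {}
    if timeout is not None:
        kwargs["timeout"] = timeout
    return client.run_background_job(command, **kwargs)


client = FakeThreadedClient()
try:
    client.run_background_job("echo hi", timeout=None)
    none_raised = False
except TypeError:
    none_raised = True  # the bug path: None reaches the deadline arithmetic

default_remaining = run_job(client, "echo hi", None) - time.monotonic()
explicit_remaining = run_job(client, "echo hi", 5.0) - time.monotonic()
```

Omitting the keyword, rather than coercing None to a hard-coded number at the call site, keeps the default in one place (the client).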
Approvability
Verdict: Needs human review
This PR modifies sandbox client lifecycle management, including client reuse and cleanup patterns. Two unresolved review comments flag potential runtime issues (a TypeError with a None timeout and possible resource leaks). These substantive concerns about bug risks warrant human verification.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 54d95fac8c
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 54d95fa.

Summary
Validation
Note: a local 100x1 SWE-bench Pro smoke run with max_concurrent=100 and num_workers=10 completed after this patch; one sandbox upload timeout was contained as a sample-level SandboxError instead of crashing the eval.
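The note says the upload timeout was contained as a sample-level SandboxError rather than crashing the eval. A minimal sketch of that containment pattern with asyncio, assuming each rollout runs as its own task under a concurrency limit; rollout, run_eval, and this SandboxError class are hypothetical stand-ins, not the verifiers eval loop.

```python
import asyncio


class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox error type named in the note."""


async def rollout(i: int) -> str:
    # Hypothetical rollout: sample 3 simulates an upload timeout surfaced as SandboxError
    if i == 3:
        raise SandboxError(f"upload timed out for sample {i}")
    await asyncio.sleep(0)
    return f"ok-{i}"


async def run_eval(n: int, max_concurrent: int) -> list:
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(i: int):
        async with sem:
            try:
                return await rollout(i)
            except SandboxError as exc:
                # Record the failure for this sample only; the other rollouts keep running
                return exc

    return await asyncio.gather(*(guarded(i) for i in range(n)))


results = asyncio.run(run_eval(n=10, max_concurrent=4))
```

Catching only SandboxError keeps unexpected exceptions loud while letting a known per-sample failure become a recorded result.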
Note
Medium Risk
Changes sandbox client lifetime and error handling on a hot path for evals; behavior is narrower than before (shared client, contained upload errors) and is covered by new lifecycle tests.
Overview
The v1 Runtime now keeps one lazy ThreadedAsyncSandboxClient and passes it into every program/tool/user sandbox lease instead of spinning up a separate AsyncSandboxClient per lease. SandboxLease gains owns_client: shared-runtime leases skip closing the client on delete, and Runtime.teardown deletes leases and then always shuts down the shared client via close_sandbox_client (which prefers teardown(), else aclose()). run_background_job only forwards timeout when it is set, so the backend default (e.g. 900s) applies. upload_program_files maps APIError/UploadTimeoutError to SandboxError, so a failed upload fails one rollout instead of crashing the whole eval.
Lifecycle tests cover group/global cleanup (delete vs. client close), the default background-job timeout, upload-timeout wrapping, and delete errors without closing a borrowed client.
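A minimal sketch of the ownership rules described in the overview: one lazily created client shared by every lease, leases that skip closing a borrowed client, and a teardown that closes the shared client even when a deletion fails. Runtime, SandboxLease, and close_sandbox_client reuse the PR's names, but their bodies, the client_factory parameter, FakeClient, and FailingClient are assumptions for illustration, not the actual verifiers code.

```python
import asyncio
from typing import Any, Callable, Optional


async def close_sandbox_client(client: Any) -> None:
    # Prefer a synchronous teardown() when the client has one, else await aclose()
    teardown = getattr(client, "teardown", None)
    if callable(teardown):
        teardown()
        return
    aclose = getattr(client, "aclose", None)
    if callable(aclose):
        await aclose()


class SandboxLease:
    def __init__(self, client: Any, sandbox_id: str, owns_client: bool = True) -> None:
        self.client = client
        self.sandbox_id = sandbox_id
        self.owns_client = owns_client

    async def delete(self) -> None:
        try:
            await self.client.delete(self.sandbox_id)
        finally:
            # Only close the client if this lease owns it; a shared client is borrowed
            if self.owns_client:
                await close_sandbox_client(self.client)


class Runtime:
    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client: Optional[Any] = None
        self._leases: list = []

    def _shared_client(self) -> Any:
        # Lazily create one client on first use and reuse it for every lease
        if self._client is None:
            self._client = self._client_factory()
        return self._client

    def create_lease(self, sandbox_id: str) -> SandboxLease:
        lease = SandboxLease(self._shared_client(), sandbox_id, owns_client=False)
        self._leases.append(lease)
        return lease

    async def teardown(self) -> None:
        try:
            for lease in self._leases:
                await lease.delete()
        finally:
            # Close the shared client even if a lease deletion raised
            if self._client is not None:
                await close_sandbox_client(self._client)
                self._client = None


class FakeClient:
    """Hypothetical client used only to exercise the lifecycle above."""

    def __init__(self) -> None:
        self.deleted: list = []
        self.aclose_calls = 0

    async def delete(self, sandbox_id: str) -> None:
        self.deleted.append(sandbox_id)

    async def aclose(self) -> None:
        self.aclose_calls += 1


class FailingClient(FakeClient):
    async def delete(self, sandbox_id: str) -> None:
        raise RuntimeError(f"delete failed for {sandbox_id}")


runtime = Runtime(FakeClient)
lease_a = runtime.create_lease("sb-a")
lease_b = runtime.create_lease("sb-b")
shared = lease_a.client
asyncio.run(runtime.teardown())

failing_runtime = Runtime(FailingClient)
failing_client = failing_runtime.create_lease("sb-c").client
try:
    asyncio.run(failing_runtime.teardown())
except RuntimeError:
    pass  # the delete error still propagates, but the shared client was closed first
```

Putting the ownership flag on the lease, rather than inferring it from the client type, lets a lease that does create its own client keep cleaning it up.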
Reviewed by Cursor Bugbot for commit 04b8df1. Bugbot is set up for automated code reviews on this repo.
Note
Fix v1 sandbox client reuse and surface upload errors
- Runtime now lazily creates and caches a single ThreadedAsyncSandboxClient shared across all lease-creation calls, replacing per-lease client instantiation in runtime.py.
- SandboxLease gains an owns_client flag so shared clients are not closed when a non-owning lease is deleted; teardown closes the shared client after all leases are cleaned up.
- upload_program_files in sandbox_utils.py now catches APIError and UploadTimeoutError from prime_sandboxes and re-raises them as SandboxError with path and sandbox ID context.
- The close_sandbox_client helper prefers a synchronous teardown() method on the client before falling back to aclose().
- The shared client is closed in a finally block during Runtime.teardown, so it is always closed even if lease deletions raise.
Macroscope summarized 04b8df1.
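A minimal sketch of the upload error wrapping, assuming an async client with a hypothetical upload_file(sandbox_id, path, data) method. APIError and UploadTimeoutError here are local stand-ins for the prime_sandboxes exceptions, SandboxError stands in for the project's own type, and TimingOutClient exists only to trigger the error path.

```python
import asyncio


class APIError(Exception):
    """Stand-in for the prime_sandboxes APIError."""


class UploadTimeoutError(Exception):
    """Stand-in for the prime_sandboxes UploadTimeoutError."""


class SandboxError(Exception):
    """Stand-in for the project's SandboxError."""


class TimingOutClient:
    """Hypothetical client whose upload_file always times out."""

    async def upload_file(self, sandbox_id: str, path: str, data: bytes) -> None:
        raise UploadTimeoutError("upload exceeded its deadline")


async def upload_program_files(client, sandbox_id: str, files: dict) -> None:
    for path, data in files.items():
        try:
            await client.upload_file(sandbox_id, path, data)
        except (APIError, UploadTimeoutError) as exc:
            # Attach path and sandbox ID; "from exc" keeps the original error as __cause__
            raise SandboxError(
                f"failed to upload {path} to sandbox {sandbox_id}: {exc}"
            ) from exc


caught = None
try:
    asyncio.run(upload_program_files(TimingOutClient(), "sb-123", {"/app/main.py": b"print(1)"}))
except SandboxError as err:
    caught = err
```

Chaining the original exception keeps the transport-level details available for debugging while callers only need to handle one error type.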