Fix/openhands sandbox launch by AmyTao · Pull Request #182 · benchflow-ai/benchflow

AmyTao · 2026-04-23T02:02:53Z

Fixes #169

.venv/bin/benchflow eval create \
  -t tasks/weighted-gdp-calc \
  -a openhands \
  -m gemini-3.1-flash-lite-preview \
  -e docker \
  -o jobs/skillsbench-openhands-gemini

Task: weighted-gdp-calc
Agent: openhands (gemini-3.1-flash-lite-preview)
Reward: 0.0
Tool calls: 17

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

Co-authored-by: Copilot <copilot@github.com>

devin-ai-integration

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

devin-ai-integration · 2026-04-24T15:10:16Z

-    inner = (
-        f"export HOME=/home/{sandbox_user} && {agent_launch}"
-    )
+    inner = f"export HOME=/home/{sandbox_user} && {agent_launch}"


🟡 Removal of cd from build_priv_drop_cmd creates cwd inconsistency between setpriv and su -l paths

The cd /home/{sandbox_user} was removed from the inner command in build_priv_drop_cmd, so setpriv now inherits the working directory from ContainerTransport (typically /app). However, the su -l fallback path still simulates a login shell which changes directory to /home/{sandbox_user} before running the command. This means agents will run in different working directories depending on whether the container has setpriv (Debian/Ubuntu → /app) or falls back to su -l (Alpine/BusyBox → /home/{sandbox_user}). The old explicit cd made both paths consistent.

Prompt for agents

The build_priv_drop_cmd function in src/benchflow/_sandbox.py has two privilege-drop paths: setpriv (primary) and su -l (fallback). After removing the explicit cd /home/{sandbox_user} from the inner command, the two paths produce different working directories:

- setpriv: inherits cwd from ContainerTransport (e.g. /app)
- su -l: changes to /home/{sandbox_user} because -l simulates a login

To make both paths consistent, either:

1. Change su -l to su (without -l) so it doesn't change directory, OR
2. Add an explicit cd to agent_cwd in the inner command for the su -l path only, OR
3. Restore the cd but change it to cd to the workspace (agent_cwd) instead of home

The intent of the change was to let ContainerTransport control the working directory, but the su -l fallback defeats this by overriding the cwd.
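As a rough illustration of option 3 above (an explicit cd to the workspace on both paths), here is a minimal sketch. It is not the code in src/benchflow/_sandbox.py: the name build_priv_drop_cmd_sketch, the agent_cwd and use_setpriv parameters, and the particular setpriv flags are assumptions made for the example.

```python
import shlex


def build_priv_drop_cmd_sketch(
    agent_launch: str,
    sandbox_user: str,
    agent_cwd: str,
    use_setpriv: bool,
) -> str:
    """Hypothetical sketch: build a privilege-drop command for either path.

    The explicit `cd` lives in the shared inner command, so the agent starts
    in agent_cwd whether setpriv or the `su -l` fallback is used.
    """
    home = shlex.quote(f"/home/{sandbox_user}")
    inner = (
        f"export HOME={home} && "
        f"cd {shlex.quote(agent_cwd)} && {agent_launch}"
    )
    user = shlex.quote(sandbox_user)
    if use_setpriv:
        # Assumed flag set; the real function may call setpriv differently.
        return (
            f"setpriv --reuid={user} --regid={user} --init-groups "
            f"-- sh -c {shlex.quote(inner)}"
        )
    # `su -l` first moves to the user's home, then the inner `cd` overrides it.
    return f"su -l {user} -c {shlex.quote(inner)}"
```

Because the cd is part of the shared inner command, the login-shell directory change made by su -l is overridden, and both paths end up in the same working directory.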



Brings 126 ruff errors → 0 so CI's lint check goes green and unblocks the 5 PRs targeting dev-0.3 (#176, #180, #181, #182, #191) that were landing on top of pre-existing repo lint debt.

What changed:

1. Auto-fixes via `ruff check --fix --unsafe-fixes`:
   - 40 F401 unused-imports across src/, tests/, examples/
   - 8 I001 unsorted-imports
   - 6 UP037 quoted-annotations modernized
   - Other auto-fixable rules

2. Hand fixes:
   - src/benchflow/__init__.py: removed `Trial` from the `from harbor` re-export block (it was shadowed by `from benchflow.trial import Trial` at line 65, which is the canonical public Trial). Added `trial_config_from_yaml` to __all__.
   - src/benchflow/process.py: 3x `raise ConnectionError(...) from e` for B904 (errors raised inside except clauses).
   - src/benchflow/mcp/reviewer_server.py: same B904 fix for fastmcp ImportError reraise.
   - tests/test_skill_eval.py: raw string for `pytest.raises(match=...)` pattern (RUF043).
   - 3 files: replaced `×` (Unicode multiplication sign) in comments and f-strings with `x` (latin x) to clear RUF001/RUF003.

3. Per-file ignores added to pyproject.toml `[tool.ruff.lint.per-file-ignores]`:
   - `experiments/*.py` and `tests/conformance/*.py` ignore E402: these are standalone scripts that legitimately set sys.path before importing.
   - `src/benchflow/runtime.py` ignores F821: uses forward references resolved by `from __future__ import annotations`; explicit TYPE_CHECKING imports would force eager loads.

No code behavior changes. 580 tests pass; the 8 pre-existing failures (env-leak between subscription auth tests, Docker compose env, judge model default mismatch) are unrelated to this PR.
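For readers unfamiliar with B904, the `raise ... from e` pattern mentioned under the hand fixes looks roughly like this. The sketch is illustrative only: open_channel and its connect parameter are hypothetical names, not code from src/benchflow/process.py.

```python
from typing import Callable


def open_channel(connect: Callable[[], object]) -> object:
    """Hypothetical helper: wrap a low-level failure in ConnectionError."""
    try:
        return connect()
    except OSError as e:
        # B904: chain explicitly with `from e` so the original exception is
        # kept as __cause__ and the traceback reads as a deliberate re-raise.
        raise ConnectionError("failed to open channel") from e
```

Without `from e`, the new exception is still implicitly linked through `__context__`, but ruff flags the raise because the intent (wrap versus accidental error inside the handler) is ambiguous.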

devin-ai-integration Bot reviewed Apr 23, 2026


AmyTao added 2 commits April 24, 2026 09:04

rebase on upstream/0.3

bc76137

openhand cli add

1eca795

AmyTao force-pushed the fix/openhands-sandbox-launch branch from 968b126 to 1eca795 · April 24, 2026 13:15


enhance api key security

3670225


refine tests

ab5ce26

Co-authored-by: Copilot <copilot@github.com>

devin-ai-integration Bot reviewed Apr 24, 2026


xdotli mentioned this pull request Apr 25, 2026

merge: main → dev-0.3 (release prep for v0.3.2) #195

Merged

4 tasks

Merge remote-tracking branch 'origin/dev-0.3' into pr-182-rebase

7068987

# Conflicts:
#	src/benchflow/_acp_run.py
#	src/benchflow/_agent_env.py

xdotli mentioned this pull request Apr 25, 2026

chore: clean up ruff lint debt across repo #197

Merged

4 tasks

xdotli merged commit 871bd21 into benchflow-ai:dev-0.3 Apr 25, 2026
1 check passed

xdotli mentioned this pull request Apr 25, 2026

release: benchflow 0.3.2 — BaseUser, verifier hardening, DinD compose, lint cleanup #199

Merged

4 tasks

xdotli merged 5 commits into benchflow-ai:dev-0.3 from AmyTao:fix/openhands-sandbox-launch
