Fix/openhands sandbox launch #182

Merged
xdotli merged 5 commits into benchflow-ai:dev-0.3 from AmyTao:fix/openhands-sandbox-launch
Apr 25, 2026
@AmyTao AmyTao commented Apr 23, 2026

Fixes #169

.venv/bin/benchflow eval create \
  -t tasks/weighted-gdp-calc \
  -a openhands \
  -m gemini-3.1-flash-lite-preview \
  -e docker \
  -o jobs/skillsbench-openhands-gemini

Task: weighted-gdp-calc
Agent: openhands (gemini-3.1-flash-lite-preview)
Reward: 0.0
Tool calls: 17

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional findings.

@AmyTao AmyTao force-pushed the fix/openhands-sandbox-launch branch from 968b126 to 1eca795 Compare April 24, 2026 13:15
devin-ai-integration[bot]

This comment was marked as resolved.

Co-authored-by: Copilot <copilot@github.com>
@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

View 12 additional findings in Devin Review.

Comment thread src/benchflow/_agent_env.py
Comment thread src/benchflow/_sandbox.py
-    inner = (
-        f"export HOME=/home/{sandbox_user} && {agent_launch}"
-    )
+    inner = f"export HOME=/home/{sandbox_user} && {agent_launch}"
🟡 Removal of cd from build_priv_drop_cmd creates cwd inconsistency between setpriv and su -l paths

The cd /home/{sandbox_user} was removed from the inner command in build_priv_drop_cmd, so setpriv now inherits the working directory from ContainerTransport (typically /app). However, the su -l fallback path still simulates a login shell which changes directory to /home/{sandbox_user} before running the command. This means agents will run in different working directories depending on whether the container has setpriv (Debian/Ubuntu → /app) or falls back to su -l (Alpine/BusyBox → /home/{sandbox_user}). The old explicit cd made both paths consistent.

Prompt for agents
The build_priv_drop_cmd function in src/benchflow/_sandbox.py has two privilege-drop paths: setpriv (primary) and su -l (fallback). After removing the explicit cd /home/{sandbox_user} from the inner command, the two paths produce different working directories:

- setpriv: inherits cwd from ContainerTransport (e.g. /app)
- su -l: changes to /home/{sandbox_user} because -l simulates a login

To make both paths consistent, either:
1. Change su -l to su (without -l) so it doesn't change directory, OR
2. Add an explicit cd to agent_cwd in the inner command for the su -l path only, OR
3. Restore the cd but change it to cd to the workspace (agent_cwd) instead of home

The intent of the change was to let ContainerTransport control the working directory, but the su -l fallback defeats this by overriding the cwd.
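
Option 3 above could look roughly like the following. This is only a sketch under stated assumptions, not the actual code in src/benchflow/_sandbox.py: the function name `build_priv_drop_cmd_sketch`, its signature, the `agent_cwd` parameter, and the exact `setpriv` flags are hypothetical, and the launch string in the usage line is made up for illustration. The point is that the `cd` lives in the inner command, so both privilege-drop paths end up in the same directory.

```python
import shlex


def build_priv_drop_cmd_sketch(agent_launch: str, sandbox_user: str, agent_cwd: str) -> str:
    """Hypothetical sketch: build a shell command that drops privileges
    and always runs the agent from agent_cwd, whichever path is taken."""
    # The explicit cd targets the workspace (not the home directory), so the
    # su -l login-shell directory change is overridden before the agent starts.
    inner = (
        f"export HOME=/home/{sandbox_user} && "
        f"cd {shlex.quote(agent_cwd)} && {agent_launch}"
    )
    quoted = shlex.quote(inner)
    return (
        # Primary path: setpriv, when the image ships it (assumed flags).
        "if command -v setpriv >/dev/null 2>&1; then "
        f"exec setpriv --reuid={sandbox_user} --regid={sandbox_user} "
        f"--init-groups sh -c {quoted}; "
        # Fallback path: su -l, which would otherwise leave us in $HOME.
        "else "
        f"exec su -l {sandbox_user} -c {quoted}; "
        "fi"
    )


# Usage with made-up values: both branches carry the same inner command.
cmd = build_priv_drop_cmd_sketch("openhands --acp", "agent", "/app")
```

With this shape the working directory no longer depends on whether the container has setpriv, at the cost of passing the workspace path into the function rather than leaving it entirely to ContainerTransport.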

# Conflicts:
#	src/benchflow/_acp_run.py
#	src/benchflow/_agent_env.py
xdotli added a commit that referenced this pull request Apr 25, 2026
Brings 126 ruff errors → 0 so CI's lint check goes green and unblocks
the 5 PRs targeting dev-0.3 (#176, #180, #181, #182, #191) that were
landing on top of pre-existing repo lint debt.

What changed:
1. Auto-fixes via `ruff check --fix --unsafe-fixes`:
   - 40 F401 unused-imports across src/, tests/, examples/
   - 8 I001 unsorted-imports
   - 6 UP037 quoted-annotations modernized
   - Other auto-fixable rules

2. Hand fixes:
   - src/benchflow/__init__.py: removed `Trial` from the `from harbor`
     re-export block (it was shadowed by `from benchflow.trial import Trial`
     at line 65, which is the canonical public Trial). Added
     `trial_config_from_yaml` to __all__.
   - src/benchflow/process.py: 3x `raise ConnectionError(...) from e` for
     B904 (errors raised inside except clauses).
   - src/benchflow/mcp/reviewer_server.py: same B904 fix for fastmcp
     ImportError reraise.
   - tests/test_skill_eval.py: raw string for `pytest.raises(match=...)`
     pattern (RUF043).
   - 3 files: replaced `×` (Unicode multiplication sign) in comments and
     f-strings with `x` (latin x) to clear RUF001/RUF003.

3. Per-file ignores added to pyproject.toml `[tool.ruff.lint.per-file-ignores]`:
   - `experiments/*.py` and `tests/conformance/*.py` ignore E402 — these
     are standalone scripts that legitimately set sys.path before importing.
   - `src/benchflow/runtime.py` ignores F821 — uses forward references
     resolved by `from __future__ import annotations`; explicit
     TYPE_CHECKING imports would force eager loads.
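
For context on the F821 ignore, here is a minimal sketch of the pattern being described. It is not taken from src/benchflow/runtime.py; the function `describe` and the annotation name `Trial` are used purely for illustration. With `from __future__ import annotations`, annotations are stored as strings and never evaluated at definition time, so a name that is never imported works at runtime even though a linter reports it as undefined.

```python
from __future__ import annotations


# `Trial` is intentionally not imported here: the annotation stays a string,
# so defining and calling the function works, while ruff would report F821.
def describe(trial: Trial) -> str:
    return f"trial:{trial!r}"


# The annotation is kept as the literal string "Trial".
annotation = describe.__annotations__["trial"]
```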

No code behavior changes. 580 tests pass; the 8 pre-existing failures
(env-leak between subscription auth tests, Docker compose env, judge
model default mismatch) are unrelated to this PR.
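
The B904 hand fix listed above follows a standard pattern, sketched here with a hypothetical helper (`read_port` is not a benchflow function). Raising a new exception inside an `except` clause with `from e` records the original error as `__cause__`, which is what ruff's B904 rule asks for.

```python
def read_port(raw: str) -> int:
    """Hypothetical helper: parse a port number, re-raising with explicit chaining."""
    try:
        return int(raw)
    except ValueError as e:
        # `from e` keeps the original ValueError attached as __cause__,
        # so the traceback shows both errors as a direct cause chain.
        raise ConnectionError(f"invalid port: {raw!r}") from e
```

Calling `read_port("abc")` raises a `ConnectionError` whose `__cause__` is the underlying `ValueError`.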
@xdotli xdotli merged commit 871bd21 into benchflow-ai:dev-0.3 Apr 25, 2026
1 check passed