Update run_id: remove "eval_" prefix by simonrosenberg · Pull Request #389 · OpenHands/benchmarks

simonrosenberg · 2026-02-02T13:51:51Z

run_id was previously changed to equal the GitHub Actions run ID directly, without the eval_ prefix.
This PR updates the remaining fallbacks and bits of code that still used the previous, prefixed form of run_id.

juanmichelini

LGTM

simonrosenberg · 2026-02-03T18:35:04Z

@OpenHands please resolve conflicts

openhands-ai · 2026-02-03T18:35:13Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

Resolved merge conflicts in:
- benchmarks/swebenchmultimodal/eval_infer.py: Keep run_id without 'eval_' prefix
- benchmarks/swtbench/eval_infer.py: Keep run_id without 'eval_' prefix, use MODEL_NAME_OR_PATH

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-02-03T18:38:19Z

I've successfully resolved the merge conflicts in PR #389 and pushed the changes.

Summary of Changes

Merged main into fix-run-id branch and resolved conflicts in two files:

benchmarks/swebenchmultimodal/eval_infer.py:
- Changed run_id default from f"eval_{predictions_path.stem}" to predictions_path.stem (removing the "eval_" prefix)
benchmarks/swtbench/eval_infer.py:
- Changed run_id from f"eval_{predictions_path.stem}" to predictions_path.stem in the command arguments
- Changed run_id from f"eval_{output_file.stem}" to output_file.stem when constructing the report file path

These changes are consistent with the PR's purpose of removing the "eval_" prefix from run_id values throughout the codebase. The changes have been pushed to the fix-run-id branch.
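The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names `default_run_id` and `report_path`, and the `.report.json` suffix, are assumptions made for the example. Only the move from `f"eval_{...stem}"` to the bare `.stem` comes from the PR.

```python
from pathlib import Path
from typing import Optional


def default_run_id(predictions_path: Path, run_id: Optional[str] = None) -> str:
    """Return an explicit run_id if one was given, else the bare file stem.

    Hypothetical helper: before this PR the fallback would have been
    f"eval_{predictions_path.stem}"; now it is the stem with no prefix.
    """
    if run_id:
        return run_id
    return predictions_path.stem


def report_path(output_file: Path) -> Path:
    """Build a report path next to output_file from the unprefixed run_id.

    The ".report.json" suffix is an assumption for illustration only.
    """
    run_id = output_file.stem  # previously f"eval_{output_file.stem}"
    return output_file.with_name(f"{run_id}.report.json")
```

Keeping the fallback in one place like this makes it harder for a later merge to reintroduce the prefix in only some call sites.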


openhands-ai · 2026-02-03T18:38:50Z

Looks like there are a few issues preventing this PR from being merged!

- GitHub Actions are failing:
  - Pre-commit checks
- There are merge conflicts

If you'd like me to help, just leave a comment, like

@OpenHands please fix the merge conflicts on PR #389 at branch `fix-run-id`

or

@OpenHands please fix the failing actions on PR #389 at branch `fix-run-id`

Feel free to include any additional details that might help me get this PR into a better state.


simonrosenberg · 2026-02-03T18:39:39Z

@OpenHands fix precommit hooks
Also make sure the PR has a clean diff: it should NOT show vendor changes. The diff should ONLY show removal of eval_

openhands-ai · 2026-02-03T18:39:48Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-02-03T18:42:54Z

I've fixed PR #389:

Checklist

✅ Fixed precommit hooks: All precommit checks (Ruff format, Ruff lint, PEP8 style check, Pyright type check) now pass
✅ Cleaned up the diff: Removed vendor changes (uv.lock and vendor/software-agent-sdk) that were accidentally included
✅ PR now only shows removal of eval_ prefix: The diff is now clean and only contains the intended changes

Summary of Changes

The PR diff now only shows:

benchmarks/swebenchmultimodal/eval_infer.py: Changed run_id default from f"eval_{predictions_path.stem}" to predictions_path.stem
benchmarks/swtbench/eval_infer.py: Changed run_id from f"eval_{predictions_path.stem}" to predictions_path.stem in two places

The changes have been pushed to the fix-run-id branch.


PR OpenHands#389 intentionally removed the eval_ prefix from run_ids in swtbench and swebenchmultimodal. The NeMo PR had re-added it. Revert to the current convention (bare stem, no prefix).


* update run_id
* Revert vendor changes to match main

Co-authored-by: openhands <openhands@all-hands.dev>

update run_id

70d4049

simonrosenberg requested a review from juanmichelini February 2, 2026 14:58

Merge branch 'main' into fix-run-id

adceb52

juanmichelini approved these changes Feb 3, 2026


Revert vendor changes to match main

472a443

Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg merged commit 33be6f3 into main Feb 3, 2026
2 checks passed

simonrosenberg deleted the fix-run-id branch February 3, 2026 21:42

simonrosenberg merged 4 commits into main from fix-run-id
