
Update run_id: remove "eval_" prefix #389

Merged
simonrosenberg merged 4 commits into main from fix-run-id
Feb 3, 2026

Conversation

@simonrosenberg
Collaborator

@simonrosenberg simonrosenberg commented Feb 2, 2026

run_id was previously changed to equal the GitHub Actions run ID directly, without the eval_ prefix.
This PR updates the remaining fallbacks and bits of code that still use the old, prefixed form of run_id.
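
To illustrate the difference between the two forms, here is a minimal sketch of a helper that maps an old-style run_id onto the current one. The name `normalize_run_id` and the constant `LEGACY_PREFIX` are hypothetical and do not come from the repository; the PR itself simply stops adding the prefix rather than stripping it.

```python
# Hypothetical helper, not part of the benchmarks repository.
LEGACY_PREFIX = "eval_"  # the prefix that older code prepended to run_id


def normalize_run_id(run_id: str) -> str:
    """Return run_id without a leading legacy prefix, if one is present."""
    # str.removeprefix (Python 3.9+) leaves the string unchanged when the
    # prefix is absent, so already-bare ids pass through untouched.
    return run_id.removeprefix(LEGACY_PREFIX)
```

With this sketch, both `"eval_12345"` and `"12345"` resolve to the same bare id, which is the convention the PR settles on.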

Collaborator

@juanmichelini juanmichelini left a comment

LGTM

@simonrosenberg
Collaborator Author

@OpenHands please resolve conflicts

@openhands-ai

openhands-ai bot commented Feb 3, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

Resolved merge conflicts in:
- benchmarks/swebenchmultimodal/eval_infer.py: Keep run_id without 'eval_' prefix
- benchmarks/swtbench/eval_infer.py: Keep run_id without 'eval_' prefix, use MODEL_NAME_OR_PATH

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Feb 3, 2026

I've successfully resolved the merge conflicts in PR #389 and pushed the changes.

Summary of Changes

Merged main into fix-run-id branch and resolved conflicts in two files:

  1. benchmarks/swebenchmultimodal/eval_infer.py:

    • Changed run_id default from f"eval_{predictions_path.stem}" to predictions_path.stem (removing the "eval_" prefix)
  2. benchmarks/swtbench/eval_infer.py:

    • Changed run_id from f"eval_{predictions_path.stem}" to predictions_path.stem in the command arguments
    • Changed run_id from f"eval_{output_file.stem}" to output_file.stem when constructing the report file path

These changes are consistent with the PR's purpose of removing the "eval_" prefix from run_id values throughout the codebase. The changes have been pushed to the fix-run-id branch.
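
The default described above can be sketched as follows. This is an illustration under assumptions, not the code from either eval_infer.py: the function name `resolve_run_id` and its signature are hypothetical, and only the expressions `predictions_path.stem` and `f"eval_{predictions_path.stem}"` are taken from the summary.

```python
from pathlib import Path
from typing import Optional


def resolve_run_id(predictions_path: Path, run_id: Optional[str] = None) -> str:
    """Pick the run_id for an evaluation (hypothetical helper).

    An explicitly supplied run_id wins. Otherwise fall back to the bare
    file stem of the predictions file, with no "eval_" prefix.
    """
    if run_id is not None:
        return run_id
    # Before the PR the fallback was f"eval_{predictions_path.stem}".
    return predictions_path.stem
```

Note that `Path.stem` drops only the final suffix, so a predictions file with several dots in its name keeps everything before the last one.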


@openhands-ai

openhands-ai bot commented Feb 3, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks
  • There are merge conflicts

If you'd like me to help, just leave a comment, like

@OpenHands please fix the merge conflicts on PR #389 at branch `fix-run-id`

or

@OpenHands please fix the failing actions on PR #389 at branch `fix-run-id`

Feel free to include any additional details that might help me get this PR into a better state.


@simonrosenberg
Collaborator Author

@OpenHands fix precommit hooks
Also make sure the PR has a clean diff: it should NOT show vendor changes. The diff should ONLY show removal of eval_

@openhands-ai

openhands-ai bot commented Feb 3, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Feb 3, 2026

I've successfully fixed PR #389.

Checklist

  • Fixed precommit hooks: All precommit checks (Ruff format, Ruff lint, PEP8 style check, Pyright type check) now pass
  • Cleaned up the diff: Removed vendor changes (uv.lock and vendor/software-agent-sdk) that were accidentally included
  • PR now only shows removal of eval_ prefix: The diff is now clean and only contains the intended changes

Summary of Changes

The PR diff now only shows:

  1. benchmarks/swebenchmultimodal/eval_infer.py: Changed run_id default from f"eval_{predictions_path.stem}" to predictions_path.stem
  2. benchmarks/swtbench/eval_infer.py: Changed run_id from f"eval_{predictions_path.stem}" to predictions_path.stem in two places

The changes have been pushed to the fix-run-id branch.


@simonrosenberg simonrosenberg merged commit 33be6f3 into main Feb 3, 2026
2 checks passed
@simonrosenberg simonrosenberg deleted the fix-run-id branch February 3, 2026 21:42
simonrosenberg added a commit to e-dobrowolska/benchmarks that referenced this pull request Feb 25, 2026
PR OpenHands#389 intentionally removed the eval_ prefix from run_ids in swtbench
and swebenchmultimodal. The NeMo PR had re-added it. Revert to the
current convention (bare stem, no prefix).
simonrosenberg added a commit that referenced this pull request Feb 26, 2026
PR #389 intentionally removed the eval_ prefix from run_ids in swtbench
and swebenchmultimodal. The NeMo PR had re-added it. Revert to the
current convention (bare stem, no prefix).
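
Since the prefix was re-added once by a later PR, a small guard could flag it before merge. The sketch below is an assumption, not something the repository is known to contain: the function name `find_prefixed_run_ids` and the patterns it searches for are hypothetical, chosen to match the `f"eval_{...}"` expressions quoted in this conversation.

```python
# Hypothetical regression guard, not part of the benchmarks repository.
# It looks for f-strings that start with "eval_" followed by an interpolation,
# the shape of the expressions this PR removed.
PREFIXED_PATTERNS = ('f"eval_{', "f'eval_{")


def find_prefixed_run_ids(source: str) -> list[int]:
    """Return the 1-based line numbers in source that build an eval_-prefixed f-string."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if any(pattern in line for pattern in PREFIXED_PATTERNS)
    ]
```

A check like this could run over the eval_infer.py files in a test or pre-commit hook and fail when the returned list is non-empty; it is a plain substring match, so it would not catch a prefix assembled some other way.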
KTanmay1 pushed a commit to Ethara-Ai/benchmarks that referenced this pull request Mar 3, 2026
* update run_id

* Revert vendor changes to match main

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>

3 participants