[codex] Add Harbor covered benchmark mapping by neubig · Pull Request #728 · OpenHands/benchmarks

neubig · 2026-05-28T15:29:03Z

Summary

add a shared mapping from covered OpenHands benchmark module names to Harbor dataset names
wire Terminal-Bench and SkillsBench config defaults through the shared mapping
add tests for mapping coverage, normalization, uncovered benchmarks, and existing wrapper defaults
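
A minimal sketch of what such a shared mapping could look like, assuming a plain dictionary keyed by normalized module name. The names `HARBOR_DATASETS`, `normalize_benchmark_name`, and `harbor_dataset_for` are hypothetical, and the dataset strings are placeholders, not the values used in `benchmarks/utils/harbor_compat.py`:

```python
from typing import Optional

# Hypothetical mapping from OpenHands benchmark module names to Harbor
# dataset names. The dataset strings are placeholders for illustration only.
HARBOR_DATASETS: dict[str, str] = {
    "terminalbench": "example-terminal-bench-dataset",
    "skillsbench": "example-skillsbench-dataset",
}


def normalize_benchmark_name(name: str) -> str:
    """Lowercase the name and drop '-' and '_' so spelling variants match."""
    return name.strip().lower().replace("-", "").replace("_", "")


def harbor_dataset_for(name: str) -> Optional[str]:
    """Return the Harbor dataset for a covered benchmark, or None if uncovered."""
    return HARBOR_DATASETS.get(normalize_benchmark_name(name))
```

Under these assumptions, a wrapper config could call `harbor_dataset_for("terminalbench")` to obtain its default dataset name instead of hard-coding it, and an uncovered benchmark would simply yield `None`.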

Stacked on #727.
Closes #720.

Validation

python -m py_compile benchmarks/utils/harbor_compat.py benchmarks/terminalbench/config.py benchmarks/skillsbench/config.py
uv run pytest tests/test_harbor_compat.py tests/test_terminalbench.py tests/test_skillsbench_run_infer.py
uv run ruff check benchmarks/utils/harbor.py benchmarks/utils/harbor_compat.py benchmarks/terminalbench/run_infer.py benchmarks/skillsbench/run_infer.py tests/test_harbor_compat.py

neubig · 2026-05-28T15:44:54Z

@OpenHands /codereview

openhands-ai · 2026-05-28T15:46:27Z

I'm on it! neubig can track my progress at all-hands.dev

neubig · 2026-05-28T16:09:51Z

@OpenHands /codereview

Validation context for the re-review:

PR CI is green: pre-commit and tests succeeded on commit 538a073.
Local validation passed: uv run ruff check ..., uv run pyright benchmarks/utils/harbor.py benchmarks/utils/harbor_compat.py tests/test_harbor_compat.py, and uv run pytest tests/test_harbor_compat.py tests/test_terminalbench.py tests/test_skillsbench_run_infer.py.
Terminal-Bench smoke using the stacked benchmarks branch succeeded at the GitHub Actions deployment level: OpenHands/evaluation run 26586388116 (run-infer.yml, benchmark=terminalbench, eval_limit=1, benchmarks_branch=codex/harbor-covered-wrapper-map).

Note: the SDK run-eval.yml dispatcher accepted terminalbench, but the downstream eval-job.yml prerequisite switch rejected it before inference; the direct run-infer.yml path is the valid Terminal-Bench smoke path for now.

openhands-ai · 2026-05-28T16:10:48Z

I'm on it! neubig can track my progress at all-hands.dev

neubig · 2026-05-30T21:54:56Z

After #727 was merged and the stacked base branch was deleted, GitHub could not reopen or retarget this PR. It reported: "state cannot be changed. The codex/harbor-covered-wrapper-map branch was force-pushed or recreated."

I created replacement PR #731 on top of main with the rebased #728 changes and posted the CI smoke validation results there: #731 (comment)

This comment was created by an AI agent (OpenHands) on behalf of the user.

neubig force-pushed the codex/shared-harbor-runner branch from cc30da7 to 057a3aa on May 28, 2026 15:33

Add Harbor covered benchmark mapping

538a073

neubig force-pushed the codex/harbor-covered-wrapper-map branch from 2e8c35c to 538a073 on May 28, 2026 15:34

neubig marked this pull request as ready for review May 28, 2026 15:34

neubig requested review from all-hands-bot and openhands-agent May 28, 2026 15:37

neubig added the review-this label May 28, 2026

neubig mentioned this pull request May 28, 2026

[codex] Extract shared Harbor benchmark runner #727

Merged

juanmichelini mentioned this pull request May 28, 2026

Copy command should copy sdk branch ref OpenHands/eval-monitor#180

Open

neubig deleted the branch codex/shared-harbor-runner May 30, 2026 20:47

neubig closed this May 30, 2026

neubig mentioned this pull request May 30, 2026

[codex] Add Harbor covered benchmark mapping #731

Open

neubig wants to merge 1 commit into codex/shared-harbor-runner from codex/harbor-covered-wrapper-map
