Extract shared load_llm_config utility by simonrosenberg · Pull Request #453 · OpenHands/benchmarks

simonrosenberg · 2026-02-26T11:00:48Z

Summary

- Extracts the duplicated inline LLM config loading pattern (read file, validate, return LLM) into a shared load_llm_config() helper in benchmarks/utils/llm_config.py
- Updates all 7 run_infer.py files to use the shared helper, removing ~5 lines of boilerplate from each
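
As a rough illustration of the "read file, validate, return LLM" pattern described above, a helper of this shape might look like the sketch below. It is not the code from benchmarks/utils/llm_config.py: `StandInLLM` is a hypothetical stdlib stand-in for the SDK's LLM class, and the validation rule (a JSON object with a `model` key) is an assumption made only so the example runs on its own.

```python
import json
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInLLM:
    """Hypothetical stand-in for the SDK's LLM class (not the real type)."""

    model: str
    base_url: Optional[str] = None


def load_llm_config(config_path: str) -> StandInLLM:
    """Read a JSON config file, validate it, and return an LLM-like object."""
    # Fail early with a clear message if the path is not a regular file.
    if not os.path.isfile(config_path):
        raise ValueError(f"LLM config file {config_path} does not exist")

    # Read with an explicit encoding so behavior does not depend on the locale.
    with open(config_path, "r", encoding="utf-8") as f:
        raw = f.read()

    # Assumed validation: the file must hold a JSON object with a "model" key.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Invalid JSON in {config_path}: {exc}") from exc
    if not isinstance(data, dict) or "model" not in data:
        raise ValueError(f"{config_path} must be a JSON object with a 'model' key")

    return StandInLLM(model=data["model"], base_url=data.get("base_url"))
```

With a helper like this, each run_infer.py would need only a single call such as `llm = load_llm_config(path_from_cli_args)`, where `path_from_cli_args` is a placeholder name for whatever argument the script already parses.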

Test plan

- Verify imports resolve correctly in CI
- Run any benchmark inference to confirm LLM config loading still works

🤖 Generated with Claude Code

Replace duplicated inline LLM config loading pattern across all 7 run_infer files with a shared load_llm_config() helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

simonrosenberg · 2026-02-26T11:02:02Z

@OpenHands resolve conflicts

openhands-ai · 2026-02-26T11:02:22Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

all-hands-bot

🟡 Acceptable - Good refactoring that eliminates real duplication, but missing test coverage for a critical shared utility.

benchmarks/utils/llm_config.py

Resolve conflicts in run_infer.py files by keeping Tool import from main while preserving the removal of LLM import (since it's now loaded via the shared load_llm_config helper).

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-02-26T11:07:11Z

Summary

I resolved the merge conflicts in PR #453 as requested.

What was done:

- Fetched and merged main into the llm-config-util branch
- Resolved conflicts in 5 files where the SDK imports conflicted:
  - benchmarks/commit0/run_infer.py
  - benchmarks/multiswebench/run_infer.py
  - benchmarks/openagentsafety/run_infer.py
  - benchmarks/swebench/run_infer.py
  - benchmarks/swtbench/run_infer.py
- Conflict resolution strategy:
  - main added Tool to the imports (new feature)
  - llm-config-util removed LLM from imports (since it's loaded via load_llm_config helper)
  - Solution: Keep Tool import from main, preserve removal of LLM import
- Verified changes with pre-commit checks (ruff format, ruff lint, pycodestyle, pyright) - all passed
- Pushed the merge commit to update the PR

Checklist:

✅ Conflicts resolved
✅ Pre-commit checks pass
✅ Changes pushed to origin/llm-config-util
✅ No extraneous changes (only conflict resolution + auto-fixes from ruff)

Address review feedback requesting test coverage for the shared LLM config loading function used by all 7 benchmarks.

Tests verify:
- Valid JSON config files load correctly (minimal and full configs)
- Missing files raise ValueError with appropriate message
- Malformed JSON raises clear validation errors
- Edge cases (empty file, permissions issues, unicode) are handled

19 test cases covering the critical code paths.

Co-authored-by: openhands <openhands@all-hands.dev>
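
The kinds of cases named in this commit message could be exercised roughly as in the sketch below. These are not the 19 tests from the PR: `load_config` is a hypothetical stand-in loader that returns a plain dict, and the exception raised for malformed JSON (`json.JSONDecodeError`) is an assumption, since the real helper validates through the SDK's LLM class.

```python
import json
import os
import tempfile
import unittest


def load_config(path):
    """Stand-in loader (hypothetical): returns the parsed JSON object."""
    if not os.path.isfile(path):
        raise ValueError(f"LLM config file {path} does not exist")
    with open(path, "r", encoding="utf-8") as f:
        return json.loads(f.read())


class LoadConfigTests(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh temporary directory that is removed afterwards.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)

    def _write(self, name, text):
        path = os.path.join(self.tmp.name, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return path

    def test_valid_minimal_config(self):
        path = self._write("ok.json", '{"model": "demo-model"}')
        self.assertEqual(load_config(path)["model"], "demo-model")

    def test_missing_file_raises_value_error(self):
        missing = os.path.join(self.tmp.name, "missing.json")
        with self.assertRaisesRegex(ValueError, "does not exist"):
            load_config(missing)

    def test_malformed_json_raises(self):
        path = self._write("bad.json", "{not json")
        with self.assertRaises(json.JSONDecodeError):
            load_config(path)

    def test_empty_file_raises(self):
        path = self._write("empty.json", "")
        with self.assertRaises(json.JSONDecodeError):
            load_config(path)

    def test_unicode_round_trips(self):
        path = self._write("uni.json", '{"model": "café"}')
        self.assertEqual(load_config(path)["model"], "café")
```

Writing real files into a temporary directory, rather than mocking `open`, keeps the tests on the same code path the benchmarks would use.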

all-hands-bot

🟡 Acceptable - Solid refactoring that eliminates real duplication. Simple, focused function with good test coverage. One critical encoding bug must be fixed before merging.

benchmarks/utils/llm_config.py

Co-authored-by: OpenHands Bot <contact@all-hands.dev>

all-hands-bot

✅ Excellent refactoring. Previous issues (encoding, test coverage) are fully resolved. The utility is simple, well-tested with real code paths, and eliminates actual duplication across 7 files. This is what good refactoring looks like - no issues found.

Resolve conflicts in all 7 run_infer files and llm_config.py by taking main's version, since PR OpenHands#453 landed the same changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Extract shared load_llm_config utility

a7f4d64

simonrosenberg requested a review from juanmichelini February 26, 2026 11:01

simonrosenberg self-assigned this Feb 26, 2026

all-hands-bot reviewed Feb 26, 2026

benchmarks/utils/llm_config.py (resolved)

Merge branch 'main' into llm-config-util

7e2336c

simonrosenberg requested a review from all-hands-bot February 26, 2026 11:28

all-hands-bot reviewed Feb 26, 2026

benchmarks/utils/llm_config.py (outdated, resolved)

Update benchmarks/utils/llm_config.py

e117ec8

Co-authored-by: OpenHands Bot <contact@all-hands.dev>

simonrosenberg requested a review from all-hands-bot February 26, 2026 11:34

all-hands-bot approved these changes Feb 26, 2026

simonrosenberg merged commit ea3ea50 into main Feb 26, 2026
3 checks passed

simonrosenberg deleted the llm-config-util branch February 26, 2026 11:52

simonrosenberg mentioned this pull request Feb 26, 2026

NeMo Evaluator Integration #455

Open

simonrosenberg merged 4 commits into main from llm-config-util