Extract shared load_llm_config utility#453

Merged: simonrosenberg merged 4 commits into main from llm-config-util on Feb 26, 2026

@simonrosenberg
Collaborator

Summary

  • Extracts the duplicated inline LLM config loading pattern (read file, validate, return LLM) into a shared load_llm_config() helper in benchmarks/utils/llm_config.py
  • Updates all 7 run_infer.py files to use the shared helper, removing ~5 lines of boilerplate from each
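A minimal sketch of what such a helper might look like, following the "read file, validate, return LLM" steps described above. This is not the code from benchmarks/utils/llm_config.py: `StandInLLM` is a hypothetical placeholder for the SDK's LLM class so the example runs on its own, the `model` and `temperature` fields are illustrative, and the explicit `encoding="utf-8"` is an assumption.

```python
import json
import os
from dataclasses import dataclass


@dataclass
class StandInLLM:
    """Hypothetical stand-in for the SDK's LLM class (not the real type)."""

    model: str
    temperature: float = 0.0

    @classmethod
    def from_json(cls, raw: str) -> "StandInLLM":
        # json.loads raises json.JSONDecodeError (a ValueError subclass)
        # when the text is not valid JSON.
        data = json.loads(raw)
        if not isinstance(data, dict) or "model" not in data:
            raise ValueError("LLM config must be a JSON object with a 'model' field")
        return cls(model=data["model"], temperature=float(data.get("temperature", 0.0)))


def load_llm_config(config_path: str) -> StandInLLM:
    """Read the config file, validate its contents, and return the LLM object."""
    if not os.path.isfile(config_path):
        raise ValueError(f"LLM config file {config_path} does not exist")
    # Assumption: read as UTF-8 explicitly rather than relying on the platform default.
    with open(config_path, "r", encoding="utf-8") as f:
        raw = f.read()
    return StandInLLM.from_json(raw)
```

With a helper of this shape, each run_infer.py would reduce its inline loading to a single call such as `llm = load_llm_config(args.llm_config_path)`, where `args.llm_config_path` is an assumed argument name.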

Test plan

  • Verify imports resolve correctly in CI
  • Run any benchmark inference to confirm LLM config loading still works

🤖 Generated with Claude Code

Replace duplicated inline LLM config loading pattern across all 7
run_infer files with a shared load_llm_config() helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg
Collaborator Author

@OpenHands resolve conflicts

@simonrosenberg simonrosenberg self-assigned this Feb 26, 2026
@openhands-ai

openhands-ai bot commented Feb 26, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Good refactoring that eliminates real duplication, but missing test coverage for a critical shared utility.

Resolve conflicts in run_infer.py files by keeping Tool import from main
while preserving the removal of LLM import (since it's now loaded via
the shared load_llm_config helper).

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Feb 26, 2026

Summary

I resolved the merge conflicts in PR #453 as requested.

What was done:

  1. Fetched and merged main into the llm-config-util branch

  2. Resolved conflicts in 5 files where the SDK imports conflicted:

    • benchmarks/commit0/run_infer.py
    • benchmarks/multiswebench/run_infer.py
    • benchmarks/openagentsafety/run_infer.py
    • benchmarks/swebench/run_infer.py
    • benchmarks/swtbench/run_infer.py
  3. Conflict resolution strategy:

    • main added Tool to the imports (new feature)
    • llm-config-util removed LLM from imports (since it's loaded via load_llm_config helper)
    • Solution: Keep Tool import from main, preserve removal of LLM import
  4. Verified changes with pre-commit checks (ruff format, ruff lint, pycodestyle, pyright) - all passed

  5. Pushed the merge commit to update the PR

Checklist:

  • ✅ Conflicts resolved
  • ✅ Pre-commit checks pass
  • ✅ Changes pushed to origin/llm-config-util
  • ✅ No extraneous changes (only conflict resolution + auto-fixes from ruff)

Address review feedback requesting test coverage for the shared
LLM config loading function used by all 7 benchmarks.

Tests verify:
- Valid JSON config files load correctly (minimal and full configs)
- Missing files raise ValueError with appropriate message
- Malformed JSON raises clear validation errors
- Edge cases (empty file, permissions issues, unicode) are handled

19 test cases covering the critical code paths.

Co-authored-by: openhands <openhands@all-hands.dev>
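The test categories listed in the commit message above could be sketched roughly as follows. This is an illustration, not the PR's actual test file: `load_config_stand_in` is a hypothetical substitute that returns a plain dict instead of an LLM object, and the five cases are a small sample of the kinds of checks described, not the 19 tests themselves.

```python
import json
import os
import tempfile
import unittest


def load_config_stand_in(config_path: str) -> dict:
    """Hypothetical stand-in for load_llm_config(); returns a dict, not an LLM."""
    if not os.path.isfile(config_path):
        raise ValueError(f"LLM config file {config_path} does not exist")
    with open(config_path, "r", encoding="utf-8") as f:
        return json.loads(f.read())


class LoadConfigTests(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh temporary directory that is removed afterwards.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)

    def _write(self, text: str) -> str:
        path = os.path.join(self.tmp.name, "llm.json")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        return path

    def test_valid_config_loads(self):
        config = load_config_stand_in(self._write('{"model": "m"}'))
        self.assertEqual(config["model"], "m")

    def test_missing_file_raises_value_error(self):
        with self.assertRaises(ValueError):
            load_config_stand_in(os.path.join(self.tmp.name, "absent.json"))

    def test_malformed_json_raises(self):
        with self.assertRaises(json.JSONDecodeError):
            load_config_stand_in(self._write("{not json"))

    def test_empty_file_raises(self):
        with self.assertRaises(json.JSONDecodeError):
            load_config_stand_in(self._write(""))

    def test_unicode_round_trips(self):
        config = load_config_stand_in(self._write('{"model": "模型"}'))
        self.assertEqual(config["model"], "模型")
```

Saved as a test module, a file like this can be run with `python -m unittest`.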
Collaborator

@all-hands-bot all-hands-bot left a comment

🟡 Acceptable - Solid refactoring that eliminates real duplication. Simple, focused function with good test coverage. One critical encoding bug must be fixed before merging.

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
Collaborator

@all-hands-bot all-hands-bot left a comment

Excellent refactoring. Previous issues (encoding, test coverage) are fully resolved. The utility is simple, well-tested with real code paths, and eliminates actual duplication across 7 files. This is what good refactoring looks like - no issues found.

@simonrosenberg simonrosenberg merged commit ea3ea50 into main Feb 26, 2026
3 checks passed
@simonrosenberg simonrosenberg deleted the llm-config-util branch February 26, 2026 11:52
simonrosenberg added a commit to e-dobrowolska/benchmarks that referenced this pull request Feb 26, 2026
Resolve conflicts in all 7 run_infer files and llm_config.py by
taking main's version, since PR OpenHands#453 landed the same changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3 participants