
Add gpt5 benchmark tool preset support#685

Merged
enyst merged 2 commits into main from fix/gpt5-tool-preset
Apr 21, 2026
Conversation


@enyst enyst commented Apr 21, 2026

Summary

  • add gpt5 to the benchmark tool preset enum and common CLI choices
  • route benchmark tool-preset dispatch to the existing GPT-5 preset implementation
  • add focused coverage for parser acceptance and preset dispatch helpers
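
The enum addition and preset dispatch described above might look roughly like the sketch below. All identifiers here (`ToolPreset`, `get_gpt5_tools`, `resolve_tool_preset`, the tool names) are hypothetical stand-ins; the actual names live in `benchmarks/utils/models.py` and the SDK and may differ.

```python
from enum import Enum


class ToolPreset(str, Enum):
    # Hypothetical benchmark-side preset enum; "gpt5" is the value
    # this PR adds alongside the existing choices.
    DEFAULT = "default"
    GPT5 = "gpt5"


def get_gpt5_tools() -> list[str]:
    # Stand-in for the existing GPT-5 preset implementation in
    # software-agent-sdk; the returned tool names are illustrative.
    return ["bash", "str_replace_editor", "browser"]


def resolve_tool_preset(preset: ToolPreset) -> list[str]:
    # Dispatch the benchmark tool-preset to its implementation,
    # routing the new enum value to the existing GPT-5 preset.
    if preset is ToolPreset.GPT5:
        return get_gpt5_tools()
    return ["bash", "str_replace_editor"]
```

Because the enum subclasses `str`, its values can feed CLI choices directly, e.g. `choices=[p.value for p in ToolPreset]`.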

Why

The GPT-5 preset already exists in software-agent-sdk.

This PR adds the benchmark-side support needed to accept --tool-preset gpt5 so we can run evals with that preset.
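
Accepting the new value on the CLI side might look like this minimal argparse sketch (the real parser lives in `benchmarks/utils/args_parser.py`; the flag spelling matches the PR, but the choice list and default here are assumptions):

```python
import argparse

# Minimal sketch: add "gpt5" to the --tool-preset choices so the
# benchmark runners accept it. Other choices shown are illustrative.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--tool-preset",
    choices=["default", "gpt5"],
    default="default",
)

args = parser.parse_args(["--tool-preset", "gpt5"])
```

argparse converts the dashed flag to the attribute `args.tool_preset`, which the run scripts can then hand to the preset dispatch.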

Testing

  • uv run pytest tests/test_tool_presets.py -q
  • uv run pre-commit run --files benchmarks/utils/args_parser.py benchmarks/utils/models.py benchmarks/swebench/run_infer.py benchmarks/swebenchmultilingual/run_infer.py benchmarks/hybridgym_funclocalize/run_infer.py benchmarks/hybridgym_depsearch/run_infer.py benchmarks/hybridgym_funcgen/run_infer.py benchmarks/hybridgym_issuelocalize/run_infer.py tests/test_tool_presets.py

This PR was created by an AI assistant (OpenHands) on behalf of the user.

Co-authored-by: openhands <openhands@all-hands.dev>

@all-hands-bot all-hands-bot left a comment


Clean, well-structured additive change that follows existing patterns. Tests are appropriate and focused.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst merged commit fac87d1 into main Apr 21, 2026
2 checks passed
@enyst enyst deleted the fix/gpt5-tool-preset branch April 21, 2026 14:48
