API breakage checks (Griffe)
Result: Passed
Agent server REST API breakage checks (OpenAPI)
Result: Failed
Log excerpt (first 1000 characters)
all-hands-bot
left a comment
🟡 Acceptable - Simple config addition following existing pattern, but PR description and implementation are mismatched.
Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is fine, the test exists, good. But your PR description claims "Temperature: 0.0" and the code doesn't have it. That's sloppy. Either add the field or fix the docs. Pick one.
Since this touches eval config and could affect benchmark behavior if temperature defaults aren't what you expect, flagging for maintainer review before approval.
all-hands-bot
left a comment
🟡 Acceptable - Simple config addition following existing pattern.
Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is correct, the test exists, good.
Since this touches eval config and could affect benchmark behavior, leaving as COMMENT (not APPROVE) per eval-risk policy. A human maintainer should verify before merging.
Verdict: ✅ Worth merging after verification of eval impact.
We got this report result:
🧪 Integration Tests Results
Overall Success Rate: 94.1%
📊 Summary
📋 Detailed Results
litellm_proxy_openai_gpt_5.4
Skipped Tests:
Failed Tests:
@OpenHands GPT-5.4 is a reasoning model, please fix it.
I'm on it! juanmichelini can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
Fixed: marked gpt-5.4 as a reasoning model by setting reasoning_effort="high" in .github/run-eval/resolve_model_config.py (and updated the corresponding test). Commit: 897362c
Fix applied: gpt-5.4 is now treated as a reasoning model by explicitly setting reasoning_effort="high" in .github/run-eval/resolve_model_config.py (and updating the corresponding test). Commit: 897362c
Final summary (new work since the last summary)
What I changed to address the review comment (“GPT-5.4 is a reasoning model, please fix it”):
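For readers without the diff at hand, the fix described above can be pictured roughly as follows. This is a minimal sketch, not the actual contents of resolve_model_config.py: the `MODELS` dictionary layout, the `resolve_llm_config` helper, the `display_name` field, and the `litellm_proxy/openai/gpt-5.4` route are all assumptions made for illustration. Only the model name and `reasoning_effort="high"` come from the comment itself.

```python
# Hypothetical sketch: the real structure of resolve_model_config.py may differ.
MODELS = {
    "gpt-5.4": {
        "id": "gpt-5.4",
        "display_name": "GPT-5.4",  # assumed field
        "llm_config": {
            # Assumed proxy route, guessed from the test label in the report.
            "model": "litellm_proxy/openai/gpt-5.4",
            # The setting named in the fix comment.
            "reasoning_effort": "high",
        },
    },
}


def resolve_llm_config(model_id: str) -> dict:
    """Return a copy of the llm_config for model_id, or raise KeyError."""
    if model_id not in MODELS:
        raise KeyError(f"unknown model id: {model_id!r}")
    return dict(MODELS[model_id]["llm_config"])


def test_gpt_5_4_is_reasoning_model() -> None:
    """Illustrative counterpart to the 'corresponding test' mentioned above."""
    cfg = resolve_llm_config("gpt-5.4")
    assert cfg["reasoning_effort"] == "high"
    # The earlier review noted that no temperature is set; the sketch mirrors that.
    assert "temperature" not in cfg
```

Keeping the reasoning setting inside the per-model entry, rather than in the calling code, means the eval workflow would pick it up without any model-specific branching. Whether the real file is organized that way is not confirmed by this thread.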
Verification performed:
Conciseness check:
Completion checklist:
Delivery:
Summary
Adds the gpt-5.4 model to resolve_model_config.py.
Changes
Configuration
Integration Test Results
Tests will run in CI.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.13-nodejs22
• golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:286a9db-python
Run
All tags pushed for this build
About Multi-Architecture Support
• Each variant tag (e.g. 286a9db-python) is a multi-arch manifest supporting both amd64 and arm64
• Tags with an architecture suffix (e.g. 286a9db-python-amd64) are also available if needed
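The tag naming visible above (short commit SHA, then variant, then an optional architecture suffix) can be sketched as a small helper. The function name `image_tag` and the constant `IMAGE` are hypothetical. Only the image path and the two example tags come from this page.

```python
from typing import Optional

# Image path as shown in the pull command above.
IMAGE = "ghcr.io/openhands/agent-server"


def image_tag(short_sha: str, variant: str, arch: Optional[str] = None) -> str:
    """Build '<sha>-<variant>' or, when arch is given, '<sha>-<variant>-<arch>'."""
    tag = f"{short_sha}-{variant}"
    return f"{tag}-{arch}" if arch else tag


# Multi-arch manifest tag and an architecture-specific tag.
multi_arch_ref = f"{IMAGE}:{image_tag('286a9db', 'python')}"
amd64_ref = f"{IMAGE}:{image_tag('286a9db', 'python', 'amd64')}"
print(multi_arch_ref)  # ghcr.io/openhands/agent-server:286a9db-python
print(amd64_ref)       # ghcr.io/openhands/agent-server:286a9db-python-amd64
```

In most cases the multi-arch reference is the one to pull. The suffixed form is only useful when a specific architecture has to be pinned.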