Add gpt-5.4 to resolve_model_config.py #2374

Draft
juanmichelini wants to merge 4 commits into main from add-gpt-5.4

Conversation

@juanmichelini (Collaborator) commented Mar 10, 2026

Summary

Adds the gpt-5.4 model to resolve_model_config.py.

Changes

  • Added gpt-5.4 to MODELS dictionary
  • Added test_gpt_5_4_config() test function
  • Added gpt-5.4 to GPT variants in model_prompt_spec.py

Configuration

  • Model ID: gpt-5.4
  • Provider: OpenAI
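
Based on the summary above, the new entry presumably follows the existing pattern in resolve_model_config.py. A minimal, hypothetical sketch of what that kind of entry and test could look like (the dictionary shape, key names, and model string here are assumptions for illustration, not the file's actual schema):

```python
# Hypothetical sketch of the kind of entry added to the MODELS
# dictionary in resolve_model_config.py. Key names and the model
# string are assumed; consult the actual file for the real schema.
MODELS = {
    "gpt-5.4": {
        "llm_config": {
            "model": "litellm_proxy/openai/gpt-5.4",  # assumed proxy path
            "provider": "openai",
        },
    },
}


def test_gpt_5_4_config():
    # Mirrors the test_gpt_5_4_config() mentioned in the PR summary.
    model = MODELS["gpt-5.4"]
    assert model["llm_config"]["provider"] == "openai"
    assert "gpt-5.4" in model["llm_config"]["model"]


test_gpt_5_4_config()
```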

Integration Test Results

Tests will run in CI.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.13-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:286a9db-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-286a9db-python \
  ghcr.io/openhands/agent-server:286a9db-python

All tags pushed for this build

ghcr.io/openhands/agent-server:286a9db-golang-amd64
ghcr.io/openhands/agent-server:286a9db-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:286a9db-golang-arm64
ghcr.io/openhands/agent-server:286a9db-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:286a9db-java-amd64
ghcr.io/openhands/agent-server:286a9db-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:286a9db-java-arm64
ghcr.io/openhands/agent-server:286a9db-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:286a9db-python-amd64
ghcr.io/openhands/agent-server:286a9db-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:286a9db-python-arm64
ghcr.io/openhands/agent-server:286a9db-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:286a9db-golang
ghcr.io/openhands/agent-server:286a9db-java
ghcr.io/openhands/agent-server:286a9db-python

About Multi-Architecture Support

  • Each variant tag (e.g., 286a9db-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 286a9db-python-amd64) are also available if needed

github-actions bot (Contributor) commented Mar 10, 2026

API breakage checks (Griffe)

Result: Passed

Action log

github-actions bot (Contributor) commented Mar 10, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-10 21:54:30,136", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.13.0 -> 1.13.0).

Breaking REST API changes detected compared to baseline release:
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()

Action log
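
The failing check compares request property types between the baseline and current OpenAPI specs. A minimal sketch of that sort of comparison (the real CI check uses its own tooling; the function and schema fragments here are illustrative):

```python
# Illustrative sketch of detecting the kind of breaking change the
# check reported: a request property's type/format changing between
# two OpenAPI schema fragments. Not the actual CI implementation.
def property_breaks(baseline: dict, current: dict) -> list[str]:
    breaks = []
    for name, old in baseline.items():
        new = current.get(name, {})
        old_sig = (old.get("type", ""), old.get("format", ""))
        new_sig = (new.get("type", ""), new.get("format", ""))
        if old_sig != new_sig:
            breaks.append(
                f"the '{name}' request property type/format changed "
                f"from '{old_sig[0]}'/'{old_sig[1]}' to '{new_sig[0]}'/'{new_sig[1]}'"
            )
    return breaks


baseline = {"file": {"type": "string"}}
current = {"file": {"type": "string", "format": "binary"}}
print(property_breaks(baseline, current)[0])
# the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
```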

github-actions bot (Contributor) commented Mar 10, 2026

Coverage

Coverage Report

File                                        Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/llm/utils/
  model_prompt_spec.py                      38      2      94%     55, 75
TOTAL                                       19913   5799   70%

@juanmichelini juanmichelini marked this pull request as ready for review March 10, 2026 13:49
@juanmichelini juanmichelini marked this pull request as draft March 10, 2026 13:50
@all-hands-bot (Collaborator) left a comment


🟡 Acceptable - Simple config addition following existing pattern, but PR description and implementation are mismatched.

Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is fine, the test exists, good. But your PR description claims "Temperature: 0.0" and the code doesn't have it. That's sloppy. Either add the field or fix the docs. Pick one.

Since this touches eval config and could affect benchmark behavior if temperature defaults aren't what you expect, flagging for maintainer review before approval.

@all-hands-bot (Collaborator) left a comment


🟡 Acceptable - Simple config addition following existing pattern.

Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is correct, the test exists, good.

Since this touches eval config and could affect benchmark behavior, leaving as COMMENT (not APPROVE) per eval-risk policy. A human maintainer should verify before merging.

Verdict: ✅ Worth merging after verification of eval impact.

@juanmichelini (Collaborator, Author) commented:

We got this report result

🧪 Integration Tests Results

Overall Success Rate: 94.1%
Total Cost: $4.05
Models Tested: 1
Timestamp: 2026-03-10 15:45:49 UTC

📊 Summary

Model                          Overall   Tests Passed   Skipped   Total   Cost    Tokens
litellm_proxy_openai_gpt_5.4   94.1%     16/17          1         18      $4.05   4,875,392

📋 Detailed Results

litellm_proxy_openai_gpt_5.4

  • Success Rate: 94.1% (16/17)
  • Total Cost: $4.05
  • Token Usage: prompt: 4,798,436, completion: 76,956, cache_read: 3,960,320, reasoning: 39,049
  • Run Suffix: litellm_proxy_openai_gpt_5.4_62ae59e_gpt_5_4_run_N18_20260310_153648
  • Skipped Tests: 1

Skipped Tests:

  • c01_thinking_block_condenser: Model litellm_proxy/openai/ch does not support extended thinking or reasoning effort

Failed Tests:

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully created the requested examples/tutorial/smolvla/train_smolvla_example.py file with high quality - it properly implements SmolVLA fine-tuning using TrainPipelineConfig, includes appropriate imports, follows the existing tutorial style, and passes syntax validation. However, the agent violated the explicit evaluation criterion by creating an unrequested AGENTS.md file containing repository-specific notes. The evaluation criteria clearly states: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script." The AGENTS.md file does not pertain to the new training script itself and was not requested by the user. While the file's content might be moderately useful as documentation for future agents, its creation directly violates the stated constraint against redundant/unrequested files. The main implementation is correct, but the extraneous file creation is a clear breach of requirements. (confidence=0.92) (Cost: $0.64)

@juanmichelini (Collaborator, Author) commented:

@OpenHands GPT-5.4 is a reasoning model, please fix it

openhands-ai bot commented Mar 10, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

@juanmichelini (Collaborator, Author) commented:

Fix applied: gpt-5.4 is now treated as a reasoning model by explicitly setting reasoning_effort="high" in .github/run-eval/resolve_model_config.py (and updating the corresponding test).

Commit: 897362c

openhands-ai bot commented Mar 10, 2026

Final summary (new work since the last summary)

What I changed to address the review comment (“GPT-5.4 is a reasoning model, please fix it”):

  • Updated .github/run-eval/resolve_model_config.py to treat gpt-5.4 as a reasoning model by adding:
    • reasoning_effort: "high" under the gpt-5.4 entry’s llm_config.
  • Updated tests/github_workflows/test_resolve_model_config.py (test_gpt_5_4_config) to assert:
    • model["llm_config"]["reasoning_effort"] == "high".
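
The steps above amount to adding one field and one assertion. A sketch of the shape of that change (the entry layout is assumed from the summary; the real files live under .github/run-eval/ and tests/github_workflows/):

```python
# Hypothetical sketch of the reasoning-model fix described above.
# The exact dictionary layout in resolve_model_config.py is assumed.
gpt_5_4 = {
    "llm_config": {
        "model": "litellm_proxy/openai/gpt-5.4",  # assumed proxy path
        "reasoning_effort": "high",  # the field the fix adds
    },
}


def test_gpt_5_4_config():
    # The assertion the updated test reportedly makes.
    assert gpt_5_4["llm_config"]["reasoning_effort"] == "high"


test_gpt_5_4_config()
```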

Verification performed:

  • Ran pre-commit on the two modified files: passed.
  • Ran the targeted pytest (test_gpt_5_4_config): passed.

Conciseness check:

  • Only the two necessary files were modified; no unrelated refactors or formatting changes beyond what was required for the new field/test assertion.

Completion checklist:

  • Implemented the requested “reasoning model” fix for gpt-5.4
  • Updated/extended tests accordingly
  • Ran formatting/lint/type checks on touched files
  • Ran a relevant pytest
  • Pushed the changes to the PR branch

Delivery:

  • Committed and pushed to add-gpt-5.4 (commit 897362c).
  • Left a PR comment pointing to the fix/commit.

