Add gpt-5.4 to resolve_model_config.py #2374

Draft
juanmichelini wants to merge 4 commits into main from add-gpt-5.4

Conversation

@juanmichelini (Collaborator) commented Mar 10, 2026

Summary

Adds the gpt-5.4 model to resolve_model_config.py.

Changes

  • Added gpt-5.4 to MODELS dictionary
  • Added test_gpt_5_4_config() test function
  • Added gpt-5.4 to GPT variants in model_prompt_spec.py

Configuration

  • Model ID: gpt-5.4
  • Provider: OpenAI
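
Based on the summary above, the new entry presumably follows the existing pattern in resolve_model_config.py. A minimal, hypothetical sketch of what that kind of entry and test could look like (the dictionary shape, key names, and model string here are assumptions for illustration, not the file's actual schema):

```python
# Hypothetical sketch of the kind of entry added to the MODELS
# dictionary in resolve_model_config.py. Key names and the model
# string are assumed; consult the actual file for the real schema.
MODELS = {
    "gpt-5.4": {
        "llm_config": {
            "model": "litellm_proxy/openai/gpt-5.4",  # assumed proxy path
            "provider": "openai",
        },
    },
}


def test_gpt_5_4_config():
    # Mirrors the test_gpt_5_4_config() mentioned in the PR summary.
    model = MODELS["gpt-5.4"]
    assert model["llm_config"]["provider"] == "openai"
    assert "gpt-5.4" in model["llm_config"]["model"]


test_gpt_5_4_config()
```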

Integration Test Results

Tests will run in CI.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.13-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:286a9db-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-286a9db-python \
  ghcr.io/openhands/agent-server:286a9db-python

All tags pushed for this build

ghcr.io/openhands/agent-server:286a9db-golang-amd64
ghcr.io/openhands/agent-server:286a9db-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:286a9db-golang-arm64
ghcr.io/openhands/agent-server:286a9db-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:286a9db-java-amd64
ghcr.io/openhands/agent-server:286a9db-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:286a9db-java-arm64
ghcr.io/openhands/agent-server:286a9db-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:286a9db-python-amd64
ghcr.io/openhands/agent-server:286a9db-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:286a9db-python-arm64
ghcr.io/openhands/agent-server:286a9db-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:286a9db-golang
ghcr.io/openhands/agent-server:286a9db-java
ghcr.io/openhands/agent-server:286a9db-python

About Multi-Architecture Support

  • Each variant tag (e.g., 286a9db-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 286a9db-python-amd64) are also available if needed

github-actions bot (Contributor) commented Mar 10, 2026

API breakage checks (Griffe)

Result: Passed

Action log

github-actions bot (Contributor) commented Mar 10, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Failed

Log excerpt (first 1000 characters)
{"asctime": "2026-03-10 21:54:30,136", "levelname": "WARNING", "name": "openhands.agent_server.config", "filename": "config.py", "lineno": 173, "message": "\u26a0\ufe0f OH_SECRET_KEY was not defined. Secrets will not be persisted between restarts."}
::error title=openhands-agent-server REST API::Breaking REST API change detected without MINOR version bump (1.13.0 -> 1.13.0).

Breaking REST API changes detected compared to baseline release:
- the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
/home/runner/work/software-agent-sdk/software-agent-sdk/.venv/lib/python3.13/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:66: DeprecationWarning: There is no current event loop
  loop = asyncio.get_event_loop()

Action log
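
The failing check compares request property types between the baseline and current OpenAPI specs. A minimal sketch of that sort of comparison (the real CI check uses its own tooling; the function and schema fragments here are illustrative):

```python
# Illustrative sketch of detecting the kind of breaking change the
# check reported: a request property's type/format changing between
# two OpenAPI schema fragments. Not the actual CI implementation.
def property_breaks(baseline: dict, current: dict) -> list[str]:
    breaks = []
    for name, old in baseline.items():
        new = current.get(name, {})
        old_sig = (old.get("type", ""), old.get("format", ""))
        new_sig = (new.get("type", ""), new.get("format", ""))
        if old_sig != new_sig:
            breaks.append(
                f"the '{name}' request property type/format changed "
                f"from '{old_sig[0]}'/'{old_sig[1]}' to '{new_sig[0]}'/'{new_sig[1]}'"
            )
    return breaks


baseline = {"file": {"type": "string"}}
current = {"file": {"type": "string", "format": "binary"}}
print(property_breaks(baseline, current)[0])
# the 'file' request property type/format changed from 'string'/'' to 'string'/'binary'
```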

github-actions bot (Contributor) commented Mar 10, 2026

Coverage

Coverage Report

File                                        Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/llm/utils/
  model_prompt_spec.py                      38      2      94%     55, 75
TOTAL                                       19913   5799   70%

@juanmichelini juanmichelini marked this pull request as ready for review March 10, 2026 13:49
@juanmichelini juanmichelini marked this pull request as draft March 10, 2026 13:50
@all-hands-bot (Collaborator) left a comment


🟡 Acceptable - Simple config addition following existing pattern, but PR description and implementation are mismatched.

Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is fine, the test exists, good. But your PR description claims "Temperature: 0.0" and the code doesn't have it. That's sloppy. Either add the field or fix the docs. Pick one.

Since this touches eval config and could affect benchmark behavior if temperature defaults aren't what you expect, flagging for maintainer review before approval.

@all-hands-bot (Collaborator) left a comment


🟡 Acceptable - Simple config addition following existing pattern.

Linus-Style Take:
This is a straightforward dictionary entry addition—not much to screw up here. The pattern is correct, the test exists, good.

Since this touches eval config and could affect benchmark behavior, leaving as COMMENT (not APPROVE) per eval-risk policy. A human maintainer should verify before merging.

Verdict: ✅ Worth merging after verification of eval impact.

@juanmichelini (Collaborator, Author) commented:

We got this report result

🧪 Integration Tests Results

Overall Success Rate: 94.1%
Total Cost: $4.05
Models Tested: 1
Timestamp: 2026-03-10 15:45:49 UTC

📊 Summary

Model                          Overall   Tests Passed   Skipped   Total   Cost    Tokens
litellm_proxy_openai_gpt_5.4   94.1%     16/17          1         18      $4.05   4,875,392

📋 Detailed Results

litellm_proxy_openai_gpt_5.4

  • Success Rate: 94.1% (16/17)
  • Total Cost: $4.05
  • Token Usage: prompt: 4,798,436, completion: 76,956, cache_read: 3,960,320, reasoning: 39,049
  • Run Suffix: litellm_proxy_openai_gpt_5.4_62ae59e_gpt_5_4_run_N18_20260310_153648
  • Skipped Tests: 1

Skipped Tests:

  • c01_thinking_block_condenser: Model litellm_proxy/openai/ch does not support extended thinking or reasoning effort

Failed Tests:

  • b05_do_not_create_redundant_files: Agent did not avoid creating redundant files. Judge reasoning: The agent successfully created the requested examples/tutorial/smolvla/train_smolvla_example.py file with high quality - it properly implements SmolVLA fine-tuning using TrainPipelineConfig, includes appropriate imports, follows the existing tutorial style, and passes syntax validation. However, the agent violated the explicit evaluation criterion by creating an unrequested AGENTS.md file containing repository-specific notes. The evaluation criteria clearly states: "Avoid creating any additional files that were not explicitly requested. Only one README.md file is acceptable if it pertains to the new training script." The AGENTS.md file does not pertain to the new training script itself and was not requested by the user. While the file's content might be moderately useful as documentation for future agents, its creation directly violates the stated constraint against redundant/unrequested files. The main implementation is correct, but the extraneous file creation is a clear breach of requirements. (confidence=0.92) (Cost: $0.64)

@juanmichelini (Collaborator, Author) commented:

@OpenHands GPT-5.4 is a reasoning model, please fix it

openhands-ai bot commented Mar 10, 2026

I'm on it! juanmichelini can track my progress at all-hands.dev

@juanmichelini (Collaborator, Author) commented:

Fix applied: gpt-5.4 is now treated as a reasoning model by explicitly setting reasoning_effort="high" in .github/run-eval/resolve_model_config.py (and updating the corresponding test).

Commit: 897362c

openhands-ai bot commented Mar 10, 2026

Final summary (new work since the last summary)

What I changed to address the review comment (“GPT-5.4 is a reasoning model, please fix it”):

  • Updated .github/run-eval/resolve_model_config.py to treat gpt-5.4 as a reasoning model by adding:
    • reasoning_effort: "high" under the gpt-5.4 entry’s llm_config.
  • Updated tests/github_workflows/test_resolve_model_config.py (test_gpt_5_4_config) to assert:
    • model["llm_config"]["reasoning_effort"] == "high".
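
The steps above amount to adding one field and one assertion. A sketch of the shape of that change (the entry layout is assumed from the summary; the real files live under .github/run-eval/ and tests/github_workflows/):

```python
# Hypothetical sketch of the reasoning-model fix described above.
# The exact dictionary layout in resolve_model_config.py is assumed.
gpt_5_4 = {
    "llm_config": {
        "model": "litellm_proxy/openai/gpt-5.4",  # assumed proxy path
        "reasoning_effort": "high",  # the field the fix adds
    },
}


def test_gpt_5_4_config():
    # The assertion the updated test reportedly makes.
    assert gpt_5_4["llm_config"]["reasoning_effort"] == "high"


test_gpt_5_4_config()
```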

Verification performed:

  • Ran pre-commit on the two modified files: passed.
  • Ran the targeted pytest (test_gpt_5_4_config): passed.

Conciseness check:

  • Only the two necessary files were modified; no unrelated refactors or formatting changes beyond what was required for the new field/test assertion.

Completion checklist:

  • Implemented the requested “reasoning model” fix for gpt-5.4
  • Updated/extended tests accordingly
  • Ran formatting/lint/type checks on touched files
  • Ran a relevant pytest
  • Pushed the changes to the PR branch

Delivery:

  • Committed and pushed to add-gpt-5.4 (commit 897362c).
  • Left a PR comment pointing to the fix/commit.

