🤖 bench: switch Terminal Bench GPT 5.2 to GPT 5.4 #2824
Update the Terminal Bench workflows and leaderboard submission metadata so benchmark runs target OpenAI GPT 5.4 instead of GPT 5.2.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$0.36`_
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 60bcf0bc01
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Keep the GPT-5.2 leaderboard metadata entry alongside the new GPT-5.4 bench target so mixed or historical artifact sets retain canonical naming.
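To illustrate why both entries matter, here is a minimal sketch of a metadata lookup. The names `MODEL_METADATA`, `canonical_name`, and the `display_name` field are hypothetical and are not taken from `prepare_leaderboard_submission.py`; the actual structure in the repository may differ.

```python
# Hypothetical sketch: the mapping and field names are assumptions,
# not the repository's actual leaderboard metadata.
MODEL_METADATA = {
    "openai/gpt-5.2": {"display_name": "GPT-5.2"},  # kept for historical artifacts
    "openai/gpt-5.4": {"display_name": "GPT-5.4"},  # new bench target
}


def canonical_name(model_id: str) -> str:
    """Return the display name for a model id, or the raw id if it is unknown."""
    entry = MODEL_METADATA.get(model_id)
    return entry["display_name"] if entry else model_id


def names_for_artifacts(model_ids):
    """Resolve every model id found in a set of artifacts to its canonical name."""
    return sorted({canonical_name(model_id) for model_id in model_ids})
```

In this sketch, an artifact set that mixes old and new runs resolves both ids to canonical names. If the GPT-5.2 entry were removed, `canonical_name("openai/gpt-5.2")` would fall back to the raw id.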
@codex review
Codex Review: Didn't find any major issues. Can't wait for the next one!
Summary
- `openai/gpt-5.2` to `openai/gpt-5.4`

Validation
- `make static-check`
- `python3 -m py_compile benchmarks/terminal_bench/prepare_leaderboard_submission.py`
- `python3` verification that workflow defaults now reference GPT 5.4 and metadata preserves both GPT 5.2 and GPT 5.4 entries
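The PR does not show the `python3` verification it mentions. A check of that kind might look like the sketch below. It assumes the workflows declare the model as a YAML `default:` value of the form `provider/name`, and that the metadata ids are available as a collection of strings. The function names `default_models` and `check_switch` are hypothetical.

```python
import re

# Matches lines such as `default: "openai/gpt-5.4"` (assumed workflow layout).
_DEFAULT_MODEL = re.compile(
    r"^\s*default:\s*['\"]?([\w.-]+/[\w.-]+)['\"]?\s*$", re.MULTILINE
)


def default_models(workflow_text: str) -> list[str]:
    """Collect `default:` values that look like model ids (provider/name)."""
    return _DEFAULT_MODEL.findall(workflow_text)


def check_switch(workflow_text, metadata_ids,
                 old="openai/gpt-5.2", new="openai/gpt-5.4"):
    """Return a list of problems; an empty list means the switch looks complete."""
    problems = []
    defaults = default_models(workflow_text)
    if old in defaults:
        problems.append(f"workflow still defaults to {old}")
    if new not in defaults:
        problems.append(f"workflow does not default to {new}")
    # Both ids must stay in the metadata so historical runs keep their names.
    for model_id in (old, new):
        if model_id not in metadata_ids:
            problems.append(f"metadata is missing {model_id}")
    return problems
```

Taking the workflow text and metadata ids as arguments, rather than reading files, keeps the sketch independent of the repository layout, which the PR text does not describe.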