🤖 bench: switch Terminal Bench GPT 5.2 to GPT 5.4 by ibetitsmike · Pull Request #2824 · coder/mux

ibetitsmike · 2026-03-06T08:43:59Z

Summary

Switch the Terminal Bench workflow defaults and examples from openai/gpt-5.2 to openai/gpt-5.4.
Add GPT-5.4 leaderboard metadata while preserving the GPT-5.2 mapping, so mixed or historical artifact sets still resolve to canonical names.
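
As a rough illustration of keeping both mappings side by side, the sketch below assumes a simple dictionary keyed by model ID. The names `MODEL_METADATA` and `canonical_name` and the field layout are hypothetical. They are not taken from `prepare_leaderboard_submission.py`, whose actual structure is not shown in this PR page.

```python
# Hypothetical sketch: the dictionary name, keys, and fields are illustrative
# assumptions, not the repository's actual leaderboard metadata.
MODEL_METADATA = {
    # Kept so artifacts from earlier runs still map to a canonical name.
    "openai/gpt-5.2": {"display_name": "GPT-5.2", "org": "OpenAI"},
    # New default bench target.
    "openai/gpt-5.4": {"display_name": "GPT-5.4", "org": "OpenAI"},
}


def canonical_name(model_id: str) -> str:
    """Return the display name for a model ID, or the raw ID if unmapped."""
    entry = MODEL_METADATA.get(model_id)
    return entry["display_name"] if entry else model_id
```

With both entries present, an artifact set that mixes GPT-5.2 and GPT-5.4 runs would resolve each run to its own display name rather than falling back to the raw ID.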

Validation

make static-check
python3 -m py_compile benchmarks/terminal_bench/prepare_leaderboard_submission.py
targeted python3 verification that workflow defaults now reference GPT 5.4 and metadata preserves both GPT 5.2 and GPT 5.4 entries
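
The targeted check itself is not included in the PR. A minimal sketch of one way such a check could look is below, assuming the workflow file is read as plain text. The helper `referenced_models` and the sample YAML fragment are hypothetical and only illustrate the idea.

```python
import re


def referenced_models(workflow_text: str) -> set:
    """Collect every openai/gpt-<version> identifier found in the text."""
    return set(re.findall(r"openai/gpt-\d+(?:\.\d+)*", workflow_text))


# Hypothetical workflow fragment; the real workflow files are not shown here.
SAMPLE_WORKFLOW = """
inputs:
  models:
    description: "Models to run, e.g. openai/gpt-5.4"
    default: "openai/gpt-5.4"
"""

found = referenced_models(SAMPLE_WORKFLOW)
# The defaults should mention the new model and no longer mention the old one.
assert "openai/gpt-5.4" in found
assert "openai/gpt-5.2" not in found
```

A text-level scan like this is deliberately loose: it does not parse YAML, so it would also catch model IDs that appear in comments or example strings.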

Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $0.36

Update the Terminal Bench workflows and leaderboard submission metadata so benchmark runs target OpenAI GPT 5.4 instead of GPT 5.2.

ibetitsmike · 2026-03-06T08:44:19Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60bcf0bc01

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Keep the GPT-5.2 leaderboard metadata entry alongside the new GPT-5.4 bench target so mixed or historical artifact sets retain canonical naming.

ibetitsmike · 2026-03-06T08:49:29Z

@codex review

chatgpt-codex-connector · 2026-03-06T08:51:44Z

Codex Review: Didn't find any major issues. Can't wait for the next one!

chatgpt-codex-connector Bot reviewed Mar 6, 2026

Comment thread benchmarks/terminal_bench/prepare_leaderboard_submission.py

ibetitsmike added this pull request to the merge queue Mar 6, 2026

Merged via the queue into main with commit c4f5cff Mar 6, 2026
22 of 23 checks passed

ibetitsmike deleted the mike/terminal-bench-gpt-5-4 branch March 6, 2026 11:12

ibetitsmike merged 2 commits into main from mike/terminal-bench-gpt-5-4
