
🤖 bench: switch Terminal Bench GPT 5.2 to GPT 5.4 #2824

Merged
ibetitsmike merged 2 commits into main from mike/terminal-bench-gpt-5-4 on Mar 6, 2026

@ibetitsmike (Contributor) commented Mar 6, 2026

Summary

  • switch the Terminal Bench workflow defaults/examples from openai/gpt-5.2 to openai/gpt-5.4
  • add GPT-5.4 leaderboard metadata while preserving the GPT-5.2 mapping for mixed or historical artifacts
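To illustrate what "preserving the GPT-5.2 mapping" buys for mixed artifact sets, here is a minimal sketch. It assumes the leaderboard metadata is a dict keyed by model id. `MODEL_METADATA`, `canonical_model_name`, `canonical_names`, and the display strings are hypothetical and not taken from `prepare_leaderboard_submission.py`.

```python
# Hypothetical metadata table; the real structure in
# benchmarks/terminal_bench/prepare_leaderboard_submission.py may differ.
MODEL_METADATA = {
    # Kept so mixed or historical artifacts still resolve to a canonical name.
    "openai/gpt-5.2": {"display_name": "GPT-5.2", "provider": "OpenAI"},
    # New default bench target.
    "openai/gpt-5.4": {"display_name": "GPT-5.4", "provider": "OpenAI"},
}


def canonical_model_name(model_id: str) -> str:
    """Return the leaderboard display name for a model id, or raise if unknown."""
    try:
        return MODEL_METADATA[model_id]["display_name"]
    except KeyError:
        raise ValueError(f"no leaderboard metadata for model {model_id!r}") from None


def canonical_names(artifact_model_ids):
    """Map every artifact's model id to its canonical name (handles mixed runs)."""
    return sorted({canonical_model_name(m) for m in artifact_model_ids})
```

With both entries present, `canonical_names(["openai/gpt-5.2", "openai/gpt-5.4"])` resolves cleanly to `["GPT-5.2", "GPT-5.4"]`. If the GPT-5.2 entry were dropped, any older artifact would raise `ValueError` instead.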

Validation

  • make static-check
  • python3 -m py_compile benchmarks/terminal_bench/prepare_leaderboard_submission.py
  • targeted python3 verification that workflow defaults now reference GPT 5.4 and metadata preserves both GPT 5.2 and GPT 5.4 entries
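A targeted check like the one described could look roughly like the sketch below. It assumes the workflow default is declared as a quoted `default:` value under a `workflow_dispatch` input. The YAML excerpt, the input name `model`, and `check_workflow_defaults` are hypothetical; the actual workflow files are not shown in this PR page.

```python
import re

# Hypothetical workflow excerpt; the real Terminal Bench workflows may differ.
WORKFLOW_YAML = """
on:
  workflow_dispatch:
    inputs:
      model:
        description: "Model to benchmark (e.g. openai/gpt-5.4)"
        default: "openai/gpt-5.4"
"""


def check_workflow_defaults(text: str, expected: str, retired: str) -> list:
    """Return a list of problems: wrong default model or a lingering retired id."""
    problems = []
    # Collect every quoted `default:` value in the workflow text.
    defaults = re.findall(r'default:\s*"([^"]+)"', text)
    if expected not in defaults:
        problems.append(f"default is not {expected}: found {defaults}")
    # Workflows (unlike leaderboard metadata) should no longer mention the old model.
    if retired in text:
        problems.append(f"retired model id {retired} still referenced")
    return problems
```

An empty list means the workflow defaults point at `openai/gpt-5.4` and no longer reference `openai/gpt-5.2`.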

Generated with mux • Model: openai:gpt-5.4 • Thinking: xhigh • Cost: $0.36

Update the Terminal Bench workflows and leaderboard submission metadata
so benchmark runs target OpenAI GPT 5.4 instead of GPT 5.2.

@ibetitsmike (Contributor, Author)

@codex review


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60bcf0bc01

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread benchmarks/terminal_bench/prepare_leaderboard_submission.py
Keep the GPT-5.2 leaderboard metadata entry alongside the new GPT-5.4
bench target so mixed or historical artifact sets retain canonical naming.

@ibetitsmike (Contributor, Author)

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Can't wait for the next one!


@ibetitsmike ibetitsmike added this pull request to the merge queue Mar 6, 2026
Merged via the queue into main with commit c4f5cff Mar 6, 2026
22 of 23 checks passed
@ibetitsmike ibetitsmike deleted the mike/terminal-bench-gpt-5-4 branch March 6, 2026 11:12