🤖 bench: use GPT-5.5 for tbench by ibetitsmike · Pull Request #3193 · coder/mux

ibetitsmike · 2026-04-25T01:14:59Z

Mux working on behalf of Mike.

Summary

Updates nightly Terminal-Bench defaults to run Opus 4.7 at xhigh thinking and GPT-5.5 at high thinking while dropping the older GPT Codex model from the default matrix. Adds leaderboard metadata for Opus 4.7 and GPT-5.5, and refreshes TBench workflow and skill examples.

Background

GPT-5.5 xhigh runs were timing out in TBench, so the nightly workflow keeps GPT-5.5 at high while preserving xhigh for Opus 4.7.
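
As a rough illustration of the per-model thinking levels described above, the nightly matrix could be sketched as a small lookup. This is not taken from the PR diff: `BenchTarget`, `NIGHTLY_MATRIX`, and `thinking_for` are hypothetical names, and the Opus model identifier is an assumed spelling (only `openai:gpt-5.5` appears in this PR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchTarget:
    """One entry in a hypothetical nightly Terminal-Bench matrix."""

    model: str
    thinking: str


# Hypothetical default matrix. The Opus identifier is an assumption;
# the real workflow may spell model IDs differently.
NIGHTLY_MATRIX = (
    BenchTarget(model="anthropic:claude-opus-4-7", thinking="xhigh"),
    # GPT-5.5 stays at high because xhigh runs were timing out in TBench.
    BenchTarget(model="openai:gpt-5.5", thinking="high"),
)


def thinking_for(model: str) -> str:
    """Return the configured thinking level for a model in the matrix."""
    for target in NIGHTLY_MATRIX:
        if target.model == model:
            return target.thinking
    raise KeyError(f"model not in nightly matrix: {model}")
```

Keeping the thinking level next to each model, rather than as one global setting, is what lets a single model be lowered without affecting the others.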

Validation

make static-check
python3 -m py_compile benchmarks/terminal_bench/prepare_leaderboard_submission.py
go run github.com/rhysd/actionlint/cmd/actionlint@v1.7.7 .github/workflows/nightly-terminal-bench.yml .github/workflows/terminal-bench.yml
/home/coder/.local/bin/uvx ruff format --check benchmarks/terminal_bench/prepare_leaderboard_submission.py
git diff --check

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $16.42

Switch nightly Terminal-Bench defaults to GPT-5.5 and Opus 4.7 with xhigh thinking. Add leaderboard metadata for both models and update tbench examples.

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $9.21

ibetitsmike · 2026-04-25T01:15:12Z

@codex review

Mux working on behalf of Mike.

chatgpt-codex-connector · 2026-04-25T01:17:42Z

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Use high thinking for GPT-5.5 Terminal-Bench runs because xhigh was timing out. Keep Opus 4.7 on xhigh.

Generated with mux • Model: openai:gpt-5.5 • Thinking: xhigh • Cost: $16.42

ibetitsmike · 2026-04-26T01:33:44Z

@codex review

Mux working on behalf of Mike.

chatgpt-codex-connector · 2026-04-26T01:36:13Z

Codex Review: Didn't find any major issues. Swish!

ibetitsmike enabled auto-merge April 26, 2026 01:34

ibetitsmike added this pull request to the merge queue Apr 26, 2026

Merged via the queue into main with commit e1bf54b Apr 26, 2026
24 checks passed

ibetitsmike deleted the mike/tbench-eq2r branch April 26, 2026 01:52

ibetitsmike merged 2 commits into main from mike/tbench-eq2r
