
Bump to gpt5 models #169

Merged

Stephen Belanger (Qard) merged 1 commit into main from gpt5 on Mar 9, 2026
Conversation

@Qard
Contributor

No description provided.

@github-actions

github-actions bot commented Feb 4, 2026

Braintrust eval report

Autoevals (gpt5-1773047782)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.5% (+0pp) | 7 🟢 | 4 🔴 |
| Time_to_first_token | 8.59tok (-0.11tok) | 69 🟢 | 50 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 257.89tok (-3.97tok) | 55 🟢 | 43 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 575.59tok (-3.97tok) | 55 🟢 | 43 🔴 |
| Estimated_cost | 0$ (0$) | 53 🟢 | 41 🔴 |
| Duration | 8.97s (-0.52s) | 135 🟢 | 84 🔴 |
| Llm_duration | 10.08s (-0.02s) | 67 🟢 | 52 🔴 |

```diff
  object: "chat.completion",
  created: 1741135832,
- model: "gpt-4o-2024-08-06",
+ model: "gpt-5-mini-2025-08-07",
```
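Since the PR routes GPT-5 models through the Responses API while older models keep using Chat Completions, the routing decision presumably keys off the model name. A minimal sketch of such a predicate (the function name and matching rule are assumptions, not the PR's actual code):

```python
def uses_responses_api(model: str) -> bool:
    """Decide whether a model should be called via the Responses API.

    Hypothetical helper: this PR sends GPT-5 family models (e.g.
    gpt-5-mini-2025-08-07) through openai.responses.create, while other
    models stay on Chat Completions. The exact predicate may differ.
    """
    return model.startswith("gpt-5")
```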

You should send up a monolith repo PR; you'll likely need to update various expect tests.

- Default completion model changed from gpt-4o to gpt-5-mini
- GPT-5 models use OpenAI Responses API (openai.responses.create) instead
  of Chat Completions API
- Converts between Chat Completions and Responses API formats automatically
- Removes span_info from Responses API params (OpenAI rejects unknown params)
- Adds async support for Responses API wrapper
- Updates tests to mock Responses API endpoint for gpt-5-mini
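The conversion between Chat Completions and Responses API formats described above could look roughly like the following sketch (all names are hypothetical; the wrapper in this PR may do more, e.g. handle tool calls and streaming):

```python
def chat_to_responses_params(params: dict) -> dict:
    """Convert Chat Completions-style params into Responses API params.

    Illustrative only, assuming the mapping described in the commit message:
    - drop tracing-only fields like span_info (OpenAI rejects unknown params)
    - move system messages into the top-level `instructions` field
    - send remaining messages as `input` items
    - rename max_tokens to the Responses API's max_output_tokens
    """
    params = dict(params)  # don't mutate the caller's dict
    params.pop("span_info", None)

    instructions = []
    input_items = []
    for msg in params.pop("messages", []):
        if msg["role"] == "system":
            instructions.append(msg["content"])
        else:
            input_items.append({"role": msg["role"], "content": msg["content"]})

    if instructions:
        params["instructions"] = "\n".join(instructions)
    params["input"] = input_items

    if "max_tokens" in params:
        params["max_output_tokens"] = params.pop("max_tokens")
    return params
```

A wrapper would then call `openai.responses.create(**chat_to_responses_params(params))` for GPT-5 models and translate the response back into the Chat Completions shape the callers expect.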

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stephen Belanger (Qard) merged commit c52da64 into main on Mar 9, 2026
7 checks passed
@github-actions

github-actions bot commented Mar 9, 2026

Braintrust eval report

Autoevals (main-1773059288)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.6% (+0pp) | 6 🟢 | 4 🔴 |
| Time_to_first_token | 10.18tok (+1.59tok) | 36 🟢 | 83 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 250.23tok (-7.66tok) | 52 🟢 | 52 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 567.93tok (-7.66tok) | 52 🟢 | 52 🔴 |
| Estimated_cost | 0$ (0$) | 49 🟢 | 49 🔴 |
| Duration | 10.15s (+1.18s) | 78 🟢 | 141 🔴 |
| Llm_duration | 11.55s (+1.47s) | 40 🟢 | 79 🔴 |


Labels

enhancement New feature or request


3 participants