
Bump to gpt5 models #169

Merged

Stephen Belanger (Qard) merged 1 commit into main from gpt5 on Mar 9, 2026
Conversation

@Qard
Contributor

No description provided.

@github-actions

github-actions bot commented Feb 4, 2026

Braintrust eval report

Autoevals (gpt5-1773047782)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.5% (+0pp) | 7 🟢 | 4 🔴 |
| Time_to_first_token | 8.59tok (-0.11tok) | 69 🟢 | 50 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 257.89tok (-3.97tok) | 55 🟢 | 43 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 575.59tok (-3.97tok) | 55 🟢 | 43 🔴 |
| Estimated_cost | 0$ (0$) | 53 🟢 | 41 🔴 |
| Duration | 8.97s (-0.52s) | 135 🟢 | 84 🔴 |
| Llm_duration | 10.08s (-0.02s) | 67 🟢 | 52 🔴 |

```diff
  object: "chat.completion",
  created: 1741135832,
- model: "gpt-4o-2024-08-06",
+ model: "gpt-5-mini-2025-08-07",
```
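Since the PR routes GPT-5 models through the Responses API while older models keep using Chat Completions, the routing decision presumably keys off the model name. A minimal sketch of such a predicate (the function name and matching rule are assumptions, not the PR's actual code):

```python
def uses_responses_api(model: str) -> bool:
    """Decide whether a model should be called via the Responses API.

    Hypothetical helper: this PR sends GPT-5 family models (e.g.
    gpt-5-mini-2025-08-07) through openai.responses.create, while other
    models stay on Chat Completions. The exact predicate may differ.
    """
    return model.startswith("gpt-5")
```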

You should send up a monolith repo PR; you'll likely need to update various expect tests.

- Default completion model changed from gpt-4o to gpt-5-mini
- GPT-5 models use OpenAI Responses API (openai.responses.create) instead
  of Chat Completions API
- Converts between Chat Completions and Responses API formats automatically
- Removes span_info from Responses API params (OpenAI rejects unknown params)
- Adds async support for Responses API wrapper
- Updates tests to mock Responses API endpoint for gpt-5-mini
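The conversion between Chat Completions and Responses API formats described above could look roughly like the following sketch (all names are hypothetical; the wrapper in this PR may do more, e.g. handle tool calls and streaming):

```python
def chat_to_responses_params(params: dict) -> dict:
    """Convert Chat Completions-style params into Responses API params.

    Illustrative only, assuming the mapping described in the commit message:
    - drop tracing-only fields like span_info (OpenAI rejects unknown params)
    - move system messages into the top-level `instructions` field
    - send remaining messages as `input` items
    - rename max_tokens to the Responses API's max_output_tokens
    """
    params = dict(params)  # don't mutate the caller's dict
    params.pop("span_info", None)

    instructions = []
    input_items = []
    for msg in params.pop("messages", []):
        if msg["role"] == "system":
            instructions.append(msg["content"])
        else:
            input_items.append({"role": msg["role"], "content": msg["content"]})

    if instructions:
        params["instructions"] = "\n".join(instructions)
    params["input"] = input_items

    if "max_tokens" in params:
        params["max_output_tokens"] = params.pop("max_tokens")
    return params
```

A wrapper would then call `openai.responses.create(**chat_to_responses_params(params))` for GPT-5 models and translate the response back into the Chat Completions shape the callers expect.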

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stephen Belanger (Qard) merged commit c52da64 into main on Mar 9, 2026
7 checks passed
@github-actions

github-actions bot commented Mar 9, 2026

Braintrust eval report

Autoevals (main-1773059288)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79.6% (+0pp) | 6 🟢 | 4 🔴 |
| Time_to_first_token | 10.18tok (+1.59tok) | 36 🟢 | 83 🔴 |
| Llm_calls | 1.09 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 250.23tok (-7.66tok) | 52 🟢 | 52 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 567.93tok (-7.66tok) | 52 🟢 | 52 🔴 |
| Estimated_cost | 0$ (0$) | 49 🟢 | 49 🔴 |
| Duration | 10.15s (+1.18s) | 78 🟢 | 141 🔴 |
| Llm_duration | 11.55s (+1.47s) | 40 🟢 | 79 🔴 |


Labels

enhancement New feature or request


3 participants