feat(agent): add OpenAI Responses API with auto endpoint detection #604
Merged
tlongwell-block added a commit that referenced this pull request on May 17, 2026
…nses auto-upgrade

Two follow-ups from review on #604.

1. Anthropic startup hardening (Max #1)

`OPENAI_COMPAT_API` was parsed unconditionally, so a stray bad value in an Anthropic-only env broke startup. Parse it only inside the `Provider::OpenAi` arm of `Config::from_env`. Anthropic gets a placeholder `OpenAiApi::ChatCompletions` it never reads. New tests pin the parser behavior without touching process env.

2. One-shot chat→responses auto-upgrade (Max #2, Tyler "automatic detection/fallthrough")

When `OPENAI_COMPAT_API=auto` and the provider replies to a Chat Completions request with a body that explicitly names `/v1/responses` (or the prose "use the Responses API"), latch a process-wide sticky-cached upgrade and re-issue the same request on `/v1/responses`. Subsequent calls skip the chat attempt entirely. Pinned values (`OPENAI_COMPAT_API=chat`|`responses`) never auto-upgrade.

Signal matcher (`is_responses_required_error`) is intentionally narrow: it only matches the literal path `/v1/responses` or specific prose phrases, so we don't get fooled by unrelated 4xx bodies.

New `Config.openai_api_auto: bool` records whether the operator resolved-by-auto vs. pinned, so we know when to enable the upgrade. `Llm` gains an `AtomicBool` for the sticky upgrade, plus three small helpers (`effective_openai_api`, `should_try_auto_upgrade`, `latch_responses_upgrade`) so the dispatch reads straight through.

Logged at WARN once per process: `"openai chat-completions endpoint reported that this model requires the Responses API; auto-upgrading subsequent OpenAI calls to /v1/responses for the rest of this process"`, with the provider error body attached.

Tests:
- 4 new unit tests for `is_responses_required_error` covering the Databricks GPT-5.5 signal, OpenAI prose phrasing, and explicit non-matches for `invalid_api_key`, generic `unsupported_parameter`, and empty body.
- 3 new unit tests for `parse_openai_api` covering unset-defaults-to-auto, case-insensitive explicit values with whitespace, and rejected garbage.
- New integration test `tests/openai_auto_upgrade.rs` spawns a fake provider that 400s on `/chat/completions` with the Databricks signal and 200s on `/responses`. Drives sprout-agent through ACP and asserts `stopReason=end_turn` plus chat-hit-once / responses-hit-once.

65 tests pass, 0 fail. clippy `-D warnings` clean. cargo fmt clean.
Live smoke against api.openai.com with gpt-5-mini still 3/3 PASS.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
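The narrow matcher and the sticky latch described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: `body_requires_responses` and `UpgradeLatch` are hypothetical names, the case-insensitive comparison is an assumption, and only the two signals named in the commit message (the literal `/v1/responses` path and the prose "use the Responses API") are matched.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Narrow signal check (hypothetical stand-in for `is_responses_required_error`).
/// True only when the error body names the `/v1/responses` path or contains
/// the prose "use the Responses API". Case-insensitivity is an assumption.
fn body_requires_responses(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("/v1/responses") || lower.contains("use the responses api")
}

/// Hypothetical process-wide latch: flips at most once and then stays set.
struct UpgradeLatch {
    /// True only when the operator left the API choice on `auto`.
    auto: bool,
    /// Set once the provider has told us to use the Responses API.
    upgraded: AtomicBool,
}

impl UpgradeLatch {
    fn new(auto: bool) -> Self {
        Self { auto, upgraded: AtomicBool::new(false) }
    }

    /// Should a failed Chat Completions request be re-issued on /v1/responses?
    /// Pinned configurations (`auto == false`) never upgrade.
    fn should_try_upgrade(&self, error_body: &str) -> bool {
        self.auto
            && !self.upgraded.load(Ordering::Relaxed)
            && body_requires_responses(error_body)
    }

    /// Latch the upgrade. Returns true only for the caller that flipped the
    /// flag, which is where a once-per-process WARN log would go.
    fn latch(&self) -> bool {
        !self.upgraded.swap(true, Ordering::Relaxed)
    }

    /// After latching, later calls skip the chat attempt entirely.
    fn use_responses(&self) -> bool {
        self.upgraded.load(Ordering::Relaxed)
    }
}
```

Using `swap` for the latch means exactly one caller observes the transition, so the warning is emitted once even if several requests fail concurrently.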
OpenAI's GPT-5 / o-series models on api.openai.com require the
Responses API (/v1/responses) for tool calling; the legacy Chat
Completions endpoint rejects them. OpenAI-compatible servers (vLLM,
Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) almost all
still speak only Chat Completions. This change teaches sprout-agent
both dialects and routes between them automatically.
New env: OPENAI_COMPAT_API={auto,chat,responses}, default auto.
auto picks Responses for *.openai.com hosts, Chat Completions for
everything else. Operators can pin the choice explicitly.
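The `auto` rule comes down to a host check on the base URL. A minimal sketch of that check, assuming a hypothetical `host_of` extractor and a hypothetical `looks_like_openai_host` predicate (neither name is taken from the PR). The suffix match on `.openai.com` is what keeps a lookalike such as `api.openai.com.evil.example` on Chat Completions.

```rust
/// Extract the host from a base URL without a URL-parsing crate.
/// Hypothetical helper; the PR's own extractor may differ.
/// Bracketed IPv6 literals are not handled in this sketch.
fn host_of(base_url: &str) -> Option<&str> {
    // Drop the scheme ("https://"); input without one is treated as malformed.
    let (_, rest) = base_url.trim().split_once("://")?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Drop any userinfo ("user:pass@").
    let host_port = authority.rsplit('@').next()?;
    // Drop the port.
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True when the base URL's host ends with ".openai.com".
/// A suffix match, not a substring match, so lookalike hosts are rejected.
fn looks_like_openai_host(base_url: &str) -> bool {
    match host_of(base_url) {
        Some(host) => host.to_ascii_lowercase().ends_with(".openai.com"),
        None => false,
    }
}
```

Malformed input falls through to `false`, which matches the stated default of Chat Completions for anything that is not clearly an `*.openai.com` host.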
Implementation:
- config.rs: OpenAiApi enum + parse_openai_api_env() + auto_openai_api()
with a small zero-dep host extractor. Lookalike-safe (`.openai.com`
suffix match, not substring).
- llm.rs: Provider::OpenAi now dispatches on cfg.openai_api. New
responses_body / parse_responses pair handles the Responses wire
shape (flat tool schema, input[] of typed items, max_output_tokens,
output[] walk with reasoning-item skip). Serializer emits each
prior assistant function_call before its function_call_output —
the API rejects with "No tool call found for call_id ..." otherwise.
- README.md: provider table updated, new env documented.
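The replay-ordering rule in the `llm.rs` item can be illustrated with a small serializer sketch. The `Turn`, `ToolCall`, and `Item` types below are hypothetical stand-ins, not the PR's types, and no JSON is produced. The point is only that each prior assistant `function_call` is emitted before the `function_call_output` that shares its `call_id`, and that empty assistant text is skipped while its tool calls are kept.

```rust
use std::collections::HashSet;

/// Hypothetical tool call record (not the PR's type).
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    call_id: String,
    name: String,
    arguments: String, // JSON-encoded arguments
}

/// Hypothetical, simplified conversation history.
#[derive(Debug, Clone, PartialEq)]
enum Turn {
    User(String),
    Assistant { text: String, calls: Vec<ToolCall> },
    ToolResult { call_id: String, output: String },
}

/// Hypothetical typed `input[]` items in the Responses wire shape.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Message { role: &'static str, text: String },
    FunctionCall(ToolCall),
    FunctionCallOutput { call_id: String, output: String },
}

/// Serialize history in order, so a call always precedes its output.
fn to_input_items(history: &[Turn]) -> Vec<Item> {
    let mut items = Vec::new();
    for turn in history {
        match turn {
            Turn::User(text) => items.push(Item::Message { role: "user", text: text.clone() }),
            Turn::Assistant { text, calls } => {
                // Skip empty assistant text, but still serialize its tool calls.
                if !text.trim().is_empty() {
                    items.push(Item::Message { role: "assistant", text: text.clone() });
                }
                items.extend(calls.iter().cloned().map(Item::FunctionCall));
            }
            Turn::ToolResult { call_id, output } => items.push(Item::FunctionCallOutput {
                call_id: call_id.clone(),
                output: output.clone(),
            }),
        }
    }
    items
}

/// Check the invariant: every output is preceded by a call with the same call_id.
fn outputs_follow_calls(items: &[Item]) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    for item in items {
        match item {
            Item::FunctionCall(call) => {
                seen.insert(call.call_id.as_str());
            }
            Item::FunctionCallOutput { call_id, .. } if !seen.contains(call_id.as_str()) => {
                return false
            }
            _ => {}
        }
    }
    true
}
```

A checker like `outputs_follow_calls` is one way to pin the invariant in a unit test without depending on the exact wire encoding.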
Tests (11 new, all passing):
- responses_body shape: instructions/max_output_tokens/flat tools
- replay ordering invariant (function_call before function_call_output)
- empty-assistant text skipped, tool_calls still serialized
- image tool result → trailing input_image user message
- parse: end_turn / tool_use / max_output_tokens branches
- parse: rejects malformed function_call.arguments JSON
- auto-detection: official OpenAI → Responses; vLLM/Ollama/OpenRouter/
Block Gateway/malformed → Chat Completions; lookalike host
(api.openai.com.evil.example) → Chat Completions
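The `parse` branches covered by these tests map a Responses reply to a stop reason. A minimal sketch of that mapping, assuming a hypothetical `ResponseSummary` view of the reply (the PR parses the JSON body directly, which is not shown here): `incomplete` with reason `max_output_tokens` becomes `MaxTokens`, `completed` with a `function_call` item becomes `ToolUse`, and `completed` otherwise becomes `EndTurn`.

```rust
#[derive(Debug, PartialEq, Eq)]
enum StopReason {
    EndTurn,
    ToolUse,
    MaxTokens,
}

/// Hypothetical reduced view of a Responses reply (not the PR's type).
struct ResponseSummary<'a> {
    /// Top-level status, e.g. "completed" or "incomplete".
    status: &'a str,
    /// Reason attached to an incomplete reply, if any.
    incomplete_reason: Option<&'a str>,
    /// The `type` of each `output[]` item, in order.
    output_types: Vec<&'a str>,
}

/// Map a reply to a stop reason. Returns None for combinations this sketch
/// does not cover; how the real parser treats those is not specified here.
fn stop_reason(r: &ResponseSummary<'_>) -> Option<StopReason> {
    // `reasoning` items are ignored; only `function_call` items matter here.
    let has_call = r.output_types.iter().any(|t| *t == "function_call");
    match (r.status, r.incomplete_reason) {
        ("incomplete", Some("max_output_tokens")) => Some(StopReason::MaxTokens),
        ("completed", _) if has_call => Some(StopReason::ToolUse),
        ("completed", _) => Some(StopReason::EndTurn),
        _ => None,
    }
}
```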
cargo fmt + cargo clippy --all-targets -D warnings clean.
Live smoke against api.openai.com with gpt-5-mini: plain prompt,
tool-roundtrip (dev__shell), and explicit chat-mode fallback all
return stopReason=end_turn. See
scripts at ~/scratch/sprout-agent-demos/test_openai_responses_smoke.py
(out-of-tree).
Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Review pass per Max's PR comments. Same behavior, same core safety
tests, fewer production lines.
Cuts:
- Collapsed `openai_api: OpenAiApi` + `openai_api_auto: bool` into a
single tri-state enum `OpenAiApi::{Chat,Responses,Auto}`. The auto-
upgrade-on-error path now keys on `cfg.openai_api == Auto` directly
instead of a parallel flag.
- Replaced the duplicated chat/responses dispatch across `complete`
and `summarize` with a single `openai_request` helper. Callers pass a
`FnMut(bool) -> (Value, OpenAiParse)` so the body is only built for
the endpoint actually selected.
- Pre-resolved `OpenAiApi::Auto`'s host check is gone — the endpoint
is computed at call time inside `openai_request`. Drops the
`auto_openai_api` helper; the test surface is now a single pure
`is_openai_host` function in `config.rs`.
- Inlined `responses_image_user_content` (single caller).
- Inlined `responses_stop` into `parse_responses` (single caller).
- Removed wall-of-text protocol comments; kept only the non-obvious
replay-ordering invariant and a spec link.
- Collapsed test fan-out: `auto_openai_api_*` (3 fns) → `is_openai_host_matrix`
(1 table); `parse_openai_api_*` (3 fns) → `parse_openai_api_values`
(1 table); `is_responses_required_error_*` (3 fns) → `_matrix` (1 table).
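The tri-state enum and the call-time endpoint choice described in these cuts can be sketched as below. This is an illustration under assumptions: `parse_api` and `select_and_build` are hypothetical names, the host check is passed in as a plain `bool`, and the builder returns a generic `B` rather than the `(Value, OpenAiParse)` pair the commit mentions.

```rust
/// Tri-state choice, mirroring `OpenAiApi::{Chat,Responses,Auto}`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpenAiApi {
    Chat,
    Responses,
    Auto,
}

/// Parse an optional env value (hypothetical helper).
/// Unset defaults to Auto; explicit values are trimmed and case-insensitive;
/// anything else is rejected.
fn parse_api(raw: Option<&str>) -> Result<OpenAiApi, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("auto") => Ok(OpenAiApi::Auto),
        Some("chat") => Ok(OpenAiApi::Chat),
        Some("responses") => Ok(OpenAiApi::Responses),
        Some(other) => Err(format!("invalid OPENAI_COMPAT_API value: {other:?}")),
    }
}

/// Resolve the endpoint at call time, then build only the selected body.
/// `build` receives `true` for Responses and `false` for Chat Completions.
/// Returns whether Responses was selected, plus the built body.
fn select_and_build<B>(
    api: OpenAiApi,
    host_is_openai: bool,
    mut build: impl FnMut(bool) -> B,
) -> (bool, B) {
    let use_responses = match api {
        OpenAiApi::Responses => true,
        OpenAiApi::Chat => false,
        // Auto defers to the host check instead of a pre-resolved flag.
        OpenAiApi::Auto => host_is_openai,
    };
    (use_responses, build(use_responses))
}
```

Passing a builder closure rather than two pre-built bodies means the unused request shape is never constructed.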
What stayed:
- Replay-ordering integration test (`responses_body_replay_emits_function_call_before_output`).
- Live-fake Databricks integration test (`tests/openai_auto_upgrade.rs`).
- Lookalike host safety case (`api.openai.com.evil.example` → Chat).
- Narrow `is_responses_required_error` matcher.
Net production diff (excluding `#[cfg(test)] mod tests` blocks):
config.rs +97 → +48
llm.rs +358 → +265
total +455 → +313
59 tests pass (was 65 — 4 collapsed into 2 tables, 2 helpers removed).
cargo fmt + cargo clippy -p sprout-agent --all-targets -D warnings clean.
Live smoke against api.openai.com with gpt-5-mini: 3/3 PASS unchanged
(plain, tool roundtrip, explicit chat-mode override).
Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
9c57a2b to
d2d06ce
Summary
OpenAI's GPT-5 and o-series models on `api.openai.com` require the Responses API (`/v1/responses`) for tool calling; the legacy Chat Completions endpoint rejects them with `unsupported_parameter`. Meanwhile every OpenAI-compatible server in the wild (vLLM, Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) still speaks only Chat Completions. This change teaches `sprout-agent` both dialects and routes between them automatically.

New configuration
`OPENAI_COMPAT_API={auto,chat,responses}`, default `auto`.

`auto` picks Responses when `OPENAI_COMPAT_BASE_URL` points at an `*.openai.com` host, and Chat Completions everywhere else. Operators can pin the choice explicitly for providers that diverge from the default (e.g. a Responses-compatible self-hosted gateway, or a `*.openai.com` host that for some reason needs the legacy endpoint).

Implementation
- `config.rs`
  - `OpenAiApi` enum (`ChatCompletions` / `Responses`) on `Config`.
  - `parse_openai_api_env()` parses `OPENAI_COMPAT_API`; `auto_openai_api()` picks per `base_url`.
  - Small zero-dependency host extractor with a lookalike-safe `.openai.com` suffix match (`api.openai.com.evil.example` → Chat Completions, not Responses).
- `llm.rs`
  - `Provider::OpenAi` now dispatches on `cfg.openai_api` for both `complete()` and `summarize()`.
  - `responses_body` + `parse_responses` (with `responses_image_user_content`, `responses_stop` helpers) handle the Responses wire shape:
    - Flat tool schema (`{type, name, description, parameters}`, no nested `function: {…}`).
    - `input[]` of typed items: typed user/assistant messages, `function_call`, `function_call_output`.
    - `max_output_tokens` (not `max_tokens` / `max_completion_tokens`).
    - Replay ordering: the serializer emits each prior assistant `function_call` before its matching `function_call_output`; the API rejects with `"No tool call found for call_id ..."` otherwise (caught in live testing).
    - Parsing walks `output[]`, collects `message` content as text and `function_call` items as `ToolCall`. Skips `reasoning` items (stateless across turns; carrying them requires the encrypted-passthrough flow we don't need).
    - Stop reason: `incomplete` + `reason=max_output_tokens` → `MaxTokens`; `completed` with `function_call` → `ToolUse`; `completed` otherwise → `EndTurn`.
- `README.md`
  - Provider table updated; new env documented, default `auto`.

Tests
11 new unit tests (in `llm.rs::tests` and `config.rs::tests`), all passing:

- `responses_body_top_level_shape`: `instructions` / `max_output_tokens` / `input`; tools are flat; no stray `messages` / `max_tokens` / `max_completion_tokens` fields
- `responses_body_replay_emits_function_call_before_output`: pins the replay-ordering invariant
- `responses_body_skips_empty_assistant_text`: mirrors #559/#560 behavior; tool_calls still serialized
- `responses_body_image_tool_result_attaches_input_image`: image attachment via trailing `input_image` user message
- `parse_responses_completed_with_text_is_end_turn`
- `parse_responses_completed_with_function_call_is_tool_use`: also verifies `reasoning` items are skipped
- `parse_responses_incomplete_max_output_tokens`
- `parse_responses_rejects_malformed_function_arguments`
- `auto_openai_api_picks_responses_for_official_openai`
- `auto_openai_api_picks_chat_for_third_parties`: vLLM/Ollama/OpenRouter/Block Gateway/self-hosted vLLM/malformed input
- `auto_openai_api_does_not_match_lookalike_hosts`

Full suite: 58 pass, 0 fail.

`cargo fmt --all -- --check` clean. `cargo clippy -p sprout-agent --all-targets -- -D warnings` clean.

Live smoke against `api.openai.com` with `gpt-5-mini` (key referenced only by filename, never read by the script): 3/3 PASS, including a tool roundtrip through `dev__shell` that exercises the function_call → function_call_output replay ordering.

Not in this PR
- […] `OPENAI_COMPAT_API` explicitly, which keeps failure modes debuggable.
- […] `OPENAI_COMPAT_*` env names for zero migration cost; "compat" reads a little awkwardly now that we natively call Responses, but every operator already configures them that way.

Blast radius
- Anthropic path: unchanged.
- `provider=openai` with non-`*.openai.com` base_url: same Chat Completions wire as before.
- `provider=openai` with official OpenAI base_url: now uses Responses by default; operators can pin to `chat` if they need the old behavior.