feat(agent): add OpenAI Responses API with auto endpoint detection by tlongwell-block · Pull Request #604 · block/sprout

tlongwell-block · 2026-05-17T00:10:26Z

Summary

OpenAI's GPT-5 and o-series models on api.openai.com require the Responses API (/v1/responses) for tool calling; the legacy Chat Completions endpoint rejects them with unsupported_parameter. Meanwhile every OpenAI-compatible server in the wild (vLLM, Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) still speaks only Chat Completions. This change teaches sprout-agent both dialects and routes between them automatically.

New configuration

OPENAI_COMPAT_API={auto,chat,responses}   # default: auto

auto picks Responses when OPENAI_COMPAT_BASE_URL points at an *.openai.com host, and Chat Completions everywhere else. Operators can pin the choice explicitly for providers that diverge from the default (e.g. a Responses-compatible self-hosted gateway, or a *.openai.com host that for some reason needs the legacy endpoint).
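The three accepted values can be illustrated with a small parser. This is a minimal sketch and not the code from this PR: the names `ApiChoice` and `parse_api_choice` are hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption drawn from the follow-up commit's test description.

```rust
/// Which OpenAI wire dialect to use. `Auto` defers the choice to host detection.
/// (Hypothetical type for illustration; not the PR's `OpenAiApi`.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiChoice {
    Auto,
    Chat,
    Responses,
}

/// Parse the raw value of `OPENAI_COMPAT_API`.
/// `None` (variable unset) falls back to `Auto`; unknown values are an error.
/// Assumption: matching ignores ASCII case and surrounding whitespace.
pub fn parse_api_choice(raw: Option<&str>) -> Result<ApiChoice, String> {
    let Some(raw) = raw else {
        return Ok(ApiChoice::Auto);
    };
    match raw.trim().to_ascii_lowercase().as_str() {
        "auto" => Ok(ApiChoice::Auto),
        "chat" => Ok(ApiChoice::Chat),
        "responses" => Ok(ApiChoice::Responses),
        other => Err(format!(
            "OPENAI_COMPAT_API must be one of auto, chat, responses; got {other:?}"
        )),
    }
}
```

Taking `Option<&str>` rather than reading the environment directly keeps the function pure, so it can be tested without touching process env.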

Implementation

`config.rs`

New OpenAiApi enum (ChatCompletions / Responses) on Config.
parse_openai_api_env() parses OPENAI_COMPAT_API; auto_openai_api() picks per base_url.
Zero-dep host extractor with a lookalike-safety test (api.openai.com.evil.example → Chat Completions, not Responses).
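To show why a suffix match on the host is lookalike-safe where a substring match is not, here is a minimal std-only sketch. It is not the PR's extractor: `extract_host` is a hypothetical helper, and the handling of userinfo and ports (and the lack of IPv6 literal support) are assumptions of this sketch.

```rust
/// Extract the host from a base URL without a URL-parsing dependency.
/// Returns `None` for input that has no `scheme://host` shape.
/// (Hypothetical helper; IPv6 literals are not handled in this sketch.)
pub fn extract_host(base_url: &str) -> Option<&str> {
    // Require a scheme so that bare strings are treated as malformed.
    let (_, rest) = base_url.trim().split_once("://")?;
    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?;
    // Drop any `user:pass@` prefix.
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    // Drop a trailing `:port`.
    let host = host_port.split_once(':').map_or(host_port, |(h, _)| h);
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True only when the host ends with `.openai.com`.
/// A substring check would wrongly accept `api.openai.com.evil.example`.
pub fn is_openai_host(base_url: &str) -> bool {
    match extract_host(base_url) {
        Some(host) => host.to_ascii_lowercase().ends_with(".openai.com"),
        None => false,
    }
}
```

Malformed input falls through to `false`, which corresponds to the Chat Completions default described above.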

`llm.rs`

Provider::OpenAi now dispatches on cfg.openai_api for both complete() and summarize().
New responses_body + parse_responses (with responses_image_user_content, responses_stop helpers) handle the Responses wire shape:
- Flat tool schema ({type, name, description, parameters} — no nested function: {…}).
- input[] of typed items: typed user/assistant messages, function_call, function_call_output.
- max_output_tokens (not max_tokens / max_completion_tokens).
- Serializer emits each prior assistant function_call before its matching function_call_output — the API rejects with "No tool call found for call_id ..." otherwise (caught in live testing).
- Parser walks output[], collects message content as text and function_call items as ToolCall. Skips reasoning items (stateless across turns; carrying them requires the encrypted-passthrough flow we don't need).
- Stop mapping: incomplete + reason=max_output_tokens → MaxTokens; completed with function_call → ToolUse; completed otherwise → EndTurn.
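The stop mapping in the last bullet can be written as a single match. This is a sketch under stated assumptions, not the PR's `parse_responses`: the function takes already-extracted values instead of JSON, `map_stop` is a hypothetical name, and returning an error for any other status is an assumption, since the description does not say how other statuses are handled.

```rust
/// Stop reasons named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StopReason {
    EndTurn,
    ToolUse,
    MaxTokens,
}

/// Map a Responses result onto a stop reason.
/// `status` and `incomplete_reason` stand for values already pulled out of the
/// response body; `saw_function_call` records whether the output walk found
/// at least one function_call item.
pub fn map_stop(
    status: &str,
    incomplete_reason: Option<&str>,
    saw_function_call: bool,
) -> Result<StopReason, String> {
    match (status, incomplete_reason) {
        ("incomplete", Some("max_output_tokens")) => Ok(StopReason::MaxTokens),
        ("completed", _) if saw_function_call => Ok(StopReason::ToolUse),
        ("completed", _) => Ok(StopReason::EndTurn),
        // Assumption of this sketch: anything else is surfaced as an error.
        (other, reason) => Err(format!(
            "unhandled status {other:?} (reason: {reason:?})"
        )),
    }
}
```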

`README.md`

Provider table updated to show per-provider endpoint under auto.
New env documented.

Tests

11 new unit tests (in llm.rs::tests and config.rs::tests), all passing:

responses_body_top_level_shape — instructions/max_output_tokens/input; tools are flat; no stray messages/max_tokens/max_completion_tokens fields
responses_body_replay_emits_function_call_before_output — pins the replay-ordering invariant
responses_body_skips_empty_assistant_text — mirrors #559/#560 behavior; tool_calls still serialized
responses_body_image_tool_result_attaches_input_image — image attachment via trailing input_image user message
parse_responses_completed_with_text_is_end_turn
parse_responses_completed_with_function_call_is_tool_use — also verifies reasoning items are skipped
parse_responses_incomplete_max_output_tokens
parse_responses_rejects_malformed_function_arguments
auto_openai_api_picks_responses_for_official_openai
auto_openai_api_picks_chat_for_third_parties — vLLM/Ollama/OpenRouter/Block Gateway/self-hosted vLLM/malformed input
auto_openai_api_does_not_match_lookalike_hosts

Full suite: 58 pass, 0 fail. cargo fmt --all -- --check clean. cargo clippy -p sprout-agent --all-targets -- -D warnings clean.

Live smoke against api.openai.com with gpt-5-mini (key referenced only by filename, never read by the script):

=== auto + plain prompt (OPENAI_COMPAT_API=auto) ===
  stopReason=end_turn  tool_call=False  text='ready'
=== auto + tool prompt (OPENAI_COMPAT_API=auto) ===
  stopReason=end_turn  tool_call=True  text='It printed: hello'
=== chat + plain prompt (OPENAI_COMPAT_API=chat) ===
  stopReason=end_turn  tool_call=False  text='ready'

3/3 PASS — including a tool roundtrip through dev__shell that exercises the function_call → function_call_output replay ordering.
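The replay-ordering invariant exercised by the tool roundtrip can be sketched with plain Rust types. The types `Turn`, `ToolCall`, and `InputItem` and both functions are hypothetical simplifications, not the PR's serializer, which builds JSON; the point is only that each `function_call` is emitted before the `function_call_output` that shares its `call_id`.

```rust
use std::collections::HashSet;

/// One tool invocation requested by the assistant (hypothetical type).
pub struct ToolCall {
    pub call_id: String,
    pub name: String,
    pub arguments: String, // JSON-encoded arguments
}

/// A simplified slice of conversation history (hypothetical type).
pub enum Turn {
    User(String),
    Assistant { text: String, tool_calls: Vec<ToolCall> },
    ToolResult { call_id: String, output: String },
}

/// Typed items in the order they would appear in `input[]`.
#[derive(Debug, PartialEq)]
pub enum InputItem {
    Message { role: &'static str, text: String },
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Flatten history so every assistant tool call precedes its result.
pub fn replay(history: &[Turn]) -> Vec<InputItem> {
    let mut items = Vec::new();
    for turn in history {
        match turn {
            Turn::User(text) => items.push(InputItem::Message { role: "user", text: text.clone() }),
            Turn::Assistant { text, tool_calls } => {
                // Empty assistant text is skipped; its tool calls are still emitted.
                if !text.is_empty() {
                    items.push(InputItem::Message { role: "assistant", text: text.clone() });
                }
                for call in tool_calls {
                    items.push(InputItem::FunctionCall {
                        call_id: call.call_id.clone(),
                        name: call.name.clone(),
                        arguments: call.arguments.clone(),
                    });
                }
            }
            Turn::ToolResult { call_id, output } => items.push(InputItem::FunctionCallOutput {
                call_id: call_id.clone(),
                output: output.clone(),
            }),
        }
    }
    items
}

/// Check the invariant: no output appears before a call with the same id.
pub fn outputs_follow_calls(items: &[InputItem]) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    for item in items {
        match item {
            InputItem::FunctionCall { call_id, .. } => {
                seen.insert(call_id.as_str());
            }
            InputItem::FunctionCallOutput { call_id, .. } => {
                if !seen.contains(call_id.as_str()) {
                    return false;
                }
            }
            InputItem::Message { .. } => {}
        }
    }
    true
}
```

A checker like `outputs_follow_calls` is one way a unit test could pin the ordering without a live endpoint.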

Not in this PR

Reasoning passthrough. Reasoning items are dropped, not carried forward across turns. Sprout's flow is already stateless across turns; carrying encrypted reasoning state would be a separate change with its own privacy/storage considerations.
Auto-fallback on Chat Completions errors. Suggested but rejected as too magical — operators with non-default endpoints set OPENAI_COMPAT_API explicitly, which keeps failure modes debuggable.
Env rename. Kept the existing OPENAI_COMPAT_* env names for zero migration cost; "compat" reads a little awkwardly now that we natively call Responses, but every operator already configures them that way.

Blast radius

Anthropic path: unchanged. provider=openai with non-*.openai.com base_url: same Chat Completions wire as before. provider=openai with official OpenAI base_url: now uses Responses by default; operators can pin to chat if they need the old behavior.

…nses auto-upgrade

Two follow-ups from review on #604.

1. Anthropic startup hardening (Max #1)

`OPENAI_COMPAT_API` was parsed unconditionally, so a stray bad value in an Anthropic-only env broke startup. Parse it only inside the `Provider::OpenAi` arm of `Config::from_env`. Anthropic gets a placeholder `OpenAiApi::ChatCompletions` it never reads. New tests pin the parser behavior without touching process env.

2. One-shot chat→responses auto-upgrade (Max #2, Tyler "automatic detection/fallthrough")

When `OPENAI_COMPAT_API=auto` and the provider replies to a Chat Completions request with a body that explicitly names `/v1/responses` (or the prose "use the Responses API"), latch a process-wide sticky-cached upgrade and re-issue the same request on `/v1/responses`. Subsequent calls skip the chat attempt entirely. Pinned values (`OPENAI_COMPAT_API=chat`|`responses`) never auto-upgrade.

Signal matcher (`is_responses_required_error`) is intentionally narrow: it only matches the literal path `/v1/responses` or specific prose phrases, so we don't get fooled by unrelated 4xx bodies.

New `Config.openai_api_auto: bool` records whether the operator resolved-by-auto vs. pinned, so we know when to enable the upgrade. `Llm` gains an `AtomicBool` for the sticky upgrade, plus three small helpers (`effective_openai_api`, `should_try_auto_upgrade`, `latch_responses_upgrade`) so the dispatch reads straight through.

Logged at WARN once per process: `"openai chat-completions endpoint reported that this model requires the Responses API; auto-upgrading subsequent OpenAI calls to /v1/responses for the rest of this process"`, with the provider error body attached.

Tests:
- 4 new unit tests for `is_responses_required_error` covering the Databricks GPT-5.5 signal, OpenAI prose phrasing, and explicit non-matches for `invalid_api_key`, generic `unsupported_parameter`, and empty body.
- 3 new unit tests for `parse_openai_api` covering unset-defaults-to-auto, case-insensitive explicit values with whitespace, and rejected garbage.
- New integration test `tests/openai_auto_upgrade.rs` spawns a fake provider that 400s on `/chat/completions` with the Databricks signal and 200s on `/responses`. Drives sprout-agent through ACP and asserts `stopReason=end_turn` plus chat-hit-once / responses-hit-once.

65 tests pass, 0 fail. clippy `-D warnings` clean. cargo fmt clean. Live smoke against api.openai.com with gpt-5-mini still 3/3 PASS.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
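The one-shot upgrade described in this commit message (narrow matcher, sticky `AtomicBool`, a single retry) can be sketched as follows. This is an illustration under assumptions, not the PR's code: `Endpoint`, `UpgradeLatch`, `looks_like_responses_required`, and `send_auto` are hypothetical names, the transport is a caller-supplied closure, the case-insensitive matching is an assumption, and the sketch covers only the `auto` case where the first attempt goes to Chat Completions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endpoint {
    Chat,
    Responses,
}

/// Narrow signal check: only the literal path or one specific phrase counts.
/// Assumption: matching ignores ASCII case.
pub fn looks_like_responses_required(error_body: &str) -> bool {
    let lower = error_body.to_ascii_lowercase();
    lower.contains("/v1/responses") || lower.contains("use the responses api")
}

/// Process-wide sticky flag; once set it is never cleared.
pub struct UpgradeLatch {
    latched: AtomicBool,
}

impl UpgradeLatch {
    pub const fn new() -> Self {
        Self { latched: AtomicBool::new(false) }
    }

    pub fn is_latched(&self) -> bool {
        self.latched.load(Ordering::Relaxed)
    }

    /// Set the flag. Returns true only for the caller that flipped it,
    /// which lets that caller log the warning exactly once.
    pub fn latch(&self) -> bool {
        !self.latched.swap(true, Ordering::Relaxed)
    }
}

/// Send in auto mode: try chat first unless already upgraded, and retry once
/// on the Responses endpoint when the error body carries the signal.
/// `send` returns `Ok(body)` on success and `Err(error_body)` on failure.
pub fn send_auto<F>(latch: &UpgradeLatch, mut send: F) -> Result<String, String>
where
    F: FnMut(Endpoint) -> Result<String, String>,
{
    if latch.is_latched() {
        return send(Endpoint::Responses);
    }
    match send(Endpoint::Chat) {
        Err(body) if looks_like_responses_required(&body) => {
            if latch.latch() {
                eprintln!("warn: upgrading to the Responses endpoint for this process: {body}");
            }
            send(Endpoint::Responses)
        }
        // Successes and unrelated errors pass through untouched.
        other => other,
    }
}
```

Unrelated failures such as an invalid API key are returned as-is, so a pinned or misconfigured endpoint still fails visibly.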

OpenAI's GPT-5 / o-series models on api.openai.com require the Responses API (/v1/responses) for tool calling; the legacy Chat Completions endpoint rejects them. OpenAI-compatible servers (vLLM, Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) almost all still speak only Chat Completions. This change teaches sprout-agent both dialects and routes between them automatically.

New env: OPENAI_COMPAT_API={auto,chat,responses}, default auto. auto picks Responses for *.openai.com hosts, Chat Completions for everything else. Operators can pin the choice explicitly.

Implementation:
- config.rs: OpenAiApi enum + parse_openai_api_env() + auto_openai_api() with a small zero-dep host extractor. Lookalike-safe (`.openai.com` suffix match, not substring).
- llm.rs: Provider::OpenAi now dispatches on cfg.openai_api. New responses_body / parse_responses pair handles the Responses wire shape (flat tool schema, input[] of typed items, max_output_tokens, output[] walk with reasoning-item skip). Serializer emits each prior assistant function_call before its function_call_output; the API rejects with "No tool call found for call_id ..." otherwise.
- README.md: provider table updated, new env documented.

Tests (11 new, all passing):
- responses_body shape: instructions/max_output_tokens/flat tools
- replay ordering invariant (function_call before function_call_output)
- empty-assistant text skipped, tool_calls still serialized
- image tool result → trailing input_image user message
- parse: end_turn / tool_use / max_output_tokens branches
- parse: rejects malformed function_call.arguments JSON
- auto-detection: official OpenAI → Responses; vLLM/Ollama/OpenRouter/Block Gateway/malformed → Chat Completions; lookalike host (api.openai.com.evil.example) → Chat Completions

cargo fmt + cargo clippy --all-targets -D warnings clean.

Live smoke against api.openai.com with gpt-5-mini: plain prompt, tool-roundtrip (dev__shell), and explicit chat-mode fallback all return stopReason=end_turn. See scripts at ~/scratch/sprout-agent-demos/test_openai_responses_smoke.py (out-of-tree).

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>

Review pass per Max's PR comments. Same behavior, same core safety tests, fewer production lines.

Cuts:
- Collapsed `openai_api: OpenAiApi` + `openai_api_auto: bool` into a single tri-state enum `OpenAiApi::{Chat,Responses,Auto}`. The auto-upgrade-on-error path now keys on `cfg.openai_api == Auto` directly instead of a parallel flag.
- Replaced the duplicated chat/responses dispatch across `complete` and `summarize` with a single `openai_request` helper. Callers pass a `FnMut(bool) -> (Value, OpenAiParse)` so the body is only built for the endpoint actually selected.
- Pre-resolved `OpenAiApi::Auto`'s host check is gone; the endpoint is computed at call time inside `openai_request`. Drops the `auto_openai_api` helper; the test surface is now a single pure `is_openai_host` function in `config.rs`.
- Inlined `responses_image_user_content` (single caller).
- Inlined `responses_stop` into `parse_responses` (single caller).
- Removed wall-of-text protocol comments; kept only the non-obvious replay-ordering invariant and a spec link.
- Collapsed test fan-out: `auto_openai_api_*` (3 fns) → `is_openai_host_matrix` (1 table); `parse_openai_api_*` (3 fns) → `parse_openai_api_values` (1 table); `is_responses_required_error_*` (3 fns) → `_matrix` (1 table).

What stayed:
- Replay-ordering integration test (`responses_body_replay_emits_function_call_before_output`).
- Live-fake Databricks integration test (`tests/openai_auto_upgrade.rs`).
- Lookalike host safety case (`api.openai.com.evil.example` → Chat).
- Narrow `is_responses_required_error` matcher.

Net production diff (excluding `#[cfg(test)] mod tests` blocks):

  config.rs   +97 → +48
  llm.rs     +358 → +265
  total      +455 → +313

59 tests pass (was 65; 4 collapsed into 2 tables, 2 helpers removed). cargo fmt + cargo clippy -p sprout-agent --all-targets -D warnings clean.

Live smoke against api.openai.com with gpt-5-mini: 3/3 PASS unchanged (plain, tool roundtrip, explicit chat-mode override).

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
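The shape of the refactor in this commit (a tri-state enum resolved at call time, plus a builder closure so only the selected endpoint's body is constructed) might look roughly like the sketch below. It is an assumption-laden illustration: bodies are plain `String`s where the commit message mentions `Value`, `ParseFn` stands in for `OpenAiParse`, and `use_responses`, `dispatch`, and `passthrough` are hypothetical names. The sticky auto-upgrade flag is left out for brevity.

```rust
/// Tri-state setting, mirroring the variants named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpenAiApi {
    Chat,
    Responses,
    Auto,
}

/// Resolve the setting at call time instead of pre-resolving it in config.
pub fn use_responses(api: OpenAiApi, host_is_openai: bool) -> bool {
    match api {
        OpenAiApi::Chat => false,
        OpenAiApi::Responses => true,
        OpenAiApi::Auto => host_is_openai,
    }
}

/// Stand-in for a response parser chosen by the builder (hypothetical).
pub type ParseFn = fn(&str) -> String;

/// Placeholder parser that returns the raw body unchanged (hypothetical).
pub fn passthrough(raw: &str) -> String {
    raw.to_string()
}

/// Single dispatch point shared by all callers.
/// `build` receives `true` for the Responses endpoint and returns the request
/// body plus the parser for that dialect, so the unused body is never built.
/// `send` is a caller-supplied transport taking (path, body).
pub fn dispatch<B, S>(api: OpenAiApi, host_is_openai: bool, mut build: B, mut send: S) -> String
where
    B: FnMut(bool) -> (String, ParseFn),
    S: FnMut(&str, &str) -> String,
{
    let responses = use_responses(api, host_is_openai);
    let path = if responses { "/responses" } else { "/chat/completions" };
    let (body, parse) = build(responses);
    parse(&send(path, &body))
}
```

Keeping the host check as a plain boolean input leaves the decision logic pure, which matches the commit's stated goal of a single table-driven test surface.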

tlongwell-block requested a review from wesbillman as a code owner May 17, 2026 00:10

tlongwell-block added 3 commits May 16, 2026 21:03

tlongwell-block force-pushed the feat/openai-responses-api branch from 9c57a2b to d2d06ce Compare May 17, 2026 01:04

tlongwell-block merged commit e4e9923 into main May 17, 2026
15 checks passed

tlongwell-block deleted the feat/openai-responses-api branch May 17, 2026 01:23
