fix(llm): surface code, type, and nested fields on provider stream errors#28757
Merged
kitlangton merged 2 commits intoMay 22, 2026
Merged
Conversation
Contributor
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details. |
Contributor
|
This PR doesn't fully meet our contributing guidelines and PR template. What needs to be fixed:
Please edit this PR description to address the above within 2 hours, or it will be automatically closed. If you believe this was flagged incorrectly, please let a maintainer know. |
Contributor
|
This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window. Feel free to open a new pull request that follows our guidelines. |
This was referenced May 22, 2026
…rors
The OpenAI Responses and Anthropic Messages stream-error handlers
collapsed every provider failure into one of two opaque strings:
- `OpenAI Responses stream error` / `OpenAI Responses response failed`
- `Anthropic Messages stream error`
In production this meant rate limits, context-length overflows, model
overloads, and image-validation failures were all surfaced identically.
The OpenAI `response.failed` path was the worst case: the error
details live under `response.error`, not at the top level, so the
previous `event.message ?? event.code` lookup was always undefined and
the catch-all string was the only thing callers ever saw.
This commit:
- Reads OpenAI errors from `event.{code,message,param}` and falls back
to `event.response.error.{code,message,param}` so `response.failed`
finally surfaces the underlying cause. Adds `error` to the typed
`response` schema instead of leaving it as an untyped rest field.
- Prefixes the failure code/type when both code and message are present
(`rate_limit_exceeded: Slow down`, `overloaded_error: Overloaded`)
so consumers can branch on the failure mode without parsing prose.
- Marks Anthropic's `error.{type,message}` fields optional so partial
payloads from OpenAI-compatible proxies still parse, and falls back to
whichever field is populated.
- Adds focused tests for every shape (nested code+message, code-only,
empty error, missing error entirely) on both protocols.
Stacked on top of fix/openai-responses-tool-image so the OpenAI
Responses error-event schema widening reuses the new `response.error`
field.
d05e3f1 to
952a610
Compare
teknium1
added a commit
to NousResearch/hermes-agent
that referenced
this pull request
May 29, 2026
…hain probe (#34340) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models. Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache). `tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe. Example output on a broken environment (the actual case): Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv). ## Config Both flags live under `agent.` in config.yaml, default True: agent: task_completion_guidance: true # universal "finish the job" block environment_probe: true # local Python toolchain hints Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs. ## Validation | Test surface | Result | |---|---| | tests/tools/test_env_probe.py | 10/10 pass (probe unit) | | tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) | | TestToolUseEnforcementConfig | 17/17 pass (no regression) | | TestBuildSystemPrompt | 9/9 pass (no regression) | | TestInvalidateSystemPrompt | 2/2 pass (no regression) | | tests/agent/test_prompt_builder.py | 124/124 pass (no regression) | | tests/hermes_cli/ | 5662/5662 pass (config defaults) | | E2E AIAgent build (broken env) | Both blocks present, 2,178 chars | | E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #28860
Type of change
What does this PR do?
OpenAI Responses and Anthropic Messages stream-error handlers were collapsing provider failures into opaque generic strings. That made rate limits, context-length errors, model overloads, and nested OpenAI
response.errorpayloads hard to diagnose.This PR surfaces richer provider-error messages:
event.{code,message,param}and nestedevent.response.error.{code,message,param}.error.typewhen available.typeormessageis missing.The branch has been rebased onto current
devafter #28754 and #28755 merged, so the diff now only contains the provider-error changes.How did you verify your code works?
packages/llm: bun run test test/provider/openai-responses.test.tspassed with 35 passpackages/llm: bun run test test/provider/anthropic-messages.test.tspassed with 22 passpackages/llm: bun run testpassed with 221 pass, 28 skippackages/llm: bun run typecheckpassedbun run typecheckpassed with 15 successful tasksScreenshots / recordings
Not applicable. This is an LLM protocol error-handling fix.
Checklist