Skip to content

fix(llm): surface code, type, and nested fields on provider stream errors#28757

Merged
kitlangton merged 2 commits into
anomalyco:devfrom
kitlangton:fix/openai-responses-provider-error-detail
May 22, 2026
Merged

fix(llm): surface code, type, and nested fields on provider stream errors#28757
kitlangton merged 2 commits into
anomalyco:devfrom
kitlangton:fix/openai-responses-provider-error-detail

Conversation

@kitlangton
Copy link
Copy Markdown
Contributor

@kitlangton kitlangton commented May 22, 2026

Issue for this PR

Closes #28860

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

OpenAI Responses and Anthropic Messages stream-error handlers were collapsing provider failures into opaque generic strings. That made rate limits, context-length errors, model overloads, and nested OpenAI response.error payloads hard to diagnose.

This PR surfaces richer provider-error messages:

  • OpenAI Responses reads top-level event.{code,message,param} and nested event.response.error.{code,message,param}.
  • Anthropic Messages prefixes error.type when available.
  • Empty OpenAI error messages are treated as absent so the code fallback still appears.
  • Partial Anthropic proxy/gateway error payloads still parse when one of type or message is missing.

The branch has been rebased onto current dev after #28754 and #28755 merged, so the diff now only contains the provider-error changes.

How did you verify your code works?

  • packages/llm: bun run test test/provider/openai-responses.test.ts passed with 35 pass
  • packages/llm: bun run test test/provider/anthropic-messages.test.ts passed with 22 pass
  • packages/llm: bun run test passed with 221 pass, 28 skip
  • packages/llm: bun run typecheck passed
  • root bun run typecheck passed with 15 successful tasks

Screenshots / recordings

Not applicable. This is an LLM protocol error-handling fix.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added contributor needs:compliance This means the issue will auto-close after 2 hours. labels May 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Copy link
Copy Markdown
Contributor

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

…rors

The OpenAI Responses and Anthropic Messages stream-error handlers
collapsed every provider failure into one of two opaque strings:

- `OpenAI Responses stream error` / `OpenAI Responses response failed`
- `Anthropic Messages stream error`

In production this meant rate limits, context-length overflows, model
overloads, and image-validation failures were all surfaced identically.
The OpenAI `response.failed` path was the worst case: the error
details live under `response.error`, not at the top level, so the
previous `event.message ?? event.code` lookup was always undefined and
the catch-all string was the only thing callers ever saw.

This commit:

- Reads OpenAI errors from `event.{code,message,param}` and falls back
  to `event.response.error.{code,message,param}` so `response.failed`
  finally surfaces the underlying cause. Adds `error` to the typed
  `response` schema instead of leaving it as an untyped rest field.
- Prefixes the failure code/type when both code and message are present
  (`rate_limit_exceeded: Slow down`, `overloaded_error: Overloaded`)
  so consumers can branch on the failure mode without parsing prose.
- Marks Anthropic's `error.{type,message}` fields optional so partial
  payloads from OpenAI-compatible proxies still parse, and falls back to
  whichever field is populated.
- Adds focused tests for every shape (nested code+message, code-only,
  empty error, missing error entirely) on both protocols.

Stacked on top of fix/openai-responses-tool-image so the OpenAI
Responses error-event schema widening reuses the new `response.error`
field.
@kitlangton kitlangton force-pushed the fix/openai-responses-provider-error-detail branch from d05e3f1 to 952a610 Compare May 22, 2026 16:28
@kitlangton kitlangton merged commit d0cb587 into anomalyco:dev May 22, 2026
10 checks passed
teknium1 added a commit to NousResearch/hermes-agent that referenced this pull request May 29, 2026
…hain probe (#34340)

* fix(codex): surface error code in Responses 'failed' status errors

When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").

Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
  (e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists

Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.

Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
  empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
  with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.

Adapted from anomalyco/opencode#28757.

* feat(prompt): universal task-completion guidance + local Python toolchain probe

Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.

## What changed

`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).

`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.

Example output on a broken environment (the actual case):

    Python toolchain: python3=3.11.15 (no pip module),
    python=missing (use python3), pip→python3.12 (mismatch),
    PEP 668=yes (use venv or uv).

## Config

Both flags live under `agent.` in config.yaml, default True:

    agent:
      task_completion_guidance: true   # universal "finish the job" block
      environment_probe: true          # local Python toolchain hints

Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.

## Validation

| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native LLM stream errors hide provider error code and nested details

1 participant