Merged
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,10 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
- **Structured output (proposal 0016, introduced in spec v0.14.0).** `Provider.complete()` now accepts an optional `response_schema` parameter — either a JSON Schema dict or a Pydantic `BaseModel` subclass. When supplied, the provider constrains the model's output to the schema and populates `Response.parsed` with the validated value (`dict` for dict-schema input, a `BaseModel` instance for class input). New `StructuredOutputInvalid` error category (non-transient by default) raises on JSON parse failure or schema validation failure; carries the requested schema, the raw response content, and a failure description.
- **`OpenAIProvider` native response_format wire path.** When `response_schema` is supplied, the chat-completions request body carries `response_format: { type: "json_schema", json_schema: { name, schema, strict } }`. The `strict` flag is determined by a deep recursive walk over the schema (object-property required-coverage rule across `anyOf` / `oneOf` / `allOf` and `$ref` targets, with cycle protection); unresolvable refs fall through to `strict: false`. The `name` field uses `schema.title` when present, otherwise a deterministic sha256-prefix hash.
- **`OpenAIProvider` prompt-augmentation fallback.** Constructor flag `force_prompt_augmentation_fallback: bool` (default `False`) and read-only inspect property `uses_prompt_augmentation_fallback: bool`. When the flag is on, structured-output calls build a fresh message list with a system directive containing the serialized schema, omit `response_format` from the wire, and validate the response post-receive. The caller's original `messages` list is never mutated. Use for OpenAI-compatible servers (older vLLM, some LM Studio releases, llama.cpp variants) that reject or silently ignore `response_format`.
- **Provider-agnostic schema helpers.** `openarmature.llm.validate_response_schema(schema)` (raises `ProviderInvalidRequest` when the schema is not a dict with a top-level `type: "object"`) and `openarmature.llm.strict_mode_supported(schema)` (the deep-tree strict-mode constraint check) are exported for reuse by future Anthropic/Gemini providers.
111 changes: 111 additions & 0 deletions docs/concepts/llms.md
@@ -221,6 +221,117 @@ on every object. Pydantic-derived schemas may need `model_config =
ConfigDict(extra="forbid")` on the class to get
`additionalProperties: false` in the generated JSON Schema.

## Content blocks (multimodal user messages)

User messages carry content in one of two shapes: a plain text string,
or an ordered sequence of typed content blocks. The string form is the
common case. Blocks are how you mix non-text modalities into a single
turn. v1 defines two block types: text and image. Audio and video are
deferred to future proposals.

System, assistant, and tool messages stay text-string only. Image
inputs are user-only in v1; image outputs (assistant-message-borne
images, e.g. DALL-E-style generation) are out of scope.

### Text and image blocks

A text block is the array-form equivalent of a text-string message:
`TextBlock(text="describe this")`. A user message holding a single
text block is normatively equivalent to one with `content="describe
this"`.

An image block carries one source — URL or inline base64 — plus an
optional `detail` hint:

```python
from openarmature.llm import (
ImageBlock,
ImageSourceInline,
ImageSourceURL,
OpenAIProvider,
TextBlock,
UserMessage,
)


async def describe_image(provider: OpenAIProvider) -> str:
response = await provider.complete(
[
UserMessage(
content=[
ImageBlock(
source=ImageSourceURL(url="https://example.com/diagram.png"),
detail="high", # optional; omitted from wire when None
),
TextBlock(text="What does this diagram show?"),
]
)
]
)
return response.message.content
```

Block order is preserved on the wire. Providers vary in whether they
treat order as semantically meaningful (an image followed by its
describing text is a different signal from text followed by the
image); construct the sequence in the order you want the model to
perceive it.

### URL vs inline sources

- **URL source** (`ImageSourceURL`): the provider fetches the URL. Any
  scheme for which the provider documents support is valid (`http(s)://`,
  `data:`, etc.). The framework passes it through unchanged.
- **Inline source** (`ImageSourceInline`): the image is sent as
base64-encoded bytes in the request body. The `media_type` field on
the surrounding `ImageBlock` is **required** for inline sources (and
ignored for URL sources). The framework constructs an RFC 2397
`data:<media_type>;base64,<bytes>` URI for the wire; it does not
inspect, transcode, or re-encode the bytes.

OpenAI, Anthropic, and Google all accept `image/png`, `image/jpeg`,
and `image/webp` as guaranteed media types. `media_type` is typed as
`str | None`, so callers MAY pass additional `image/*` types when
they know the bound model supports them; portable code sticks to the
three.

### The `detail` hint

`detail` is a per-image hint to the provider about processing
fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
which **omits the field from the wire** and lets the provider apply
its own default (conceptually `"auto"`). Setting `detail="auto"`
explicitly on the spec block forces the wire to carry an explicit
`"auto"` — usually unnecessary, since the provider's default is the
same value.

### When the model can't handle the block

`ProviderUnsupportedContentBlock` (category
`provider_unsupported_content_block`) raises when the bound model
rejects a content block type or media variant. Concrete cases:

- A text-only model (e.g., `gpt-3.5-turbo`) receives an image block.
- The model supports images but not the requested `media_type`.
- The model supports the type but rejects the specific source variant
  (a URL the provider can't fetch, for example).

The error category is **non-transient**: retrying without changing
the request, the bound model, or the provider won't succeed. Userland
fallback patterns (e.g., a middleware that routes to a multimodal
provider on this category) compose cleanly against it.

`ProviderUnsupportedContentBlock` carries `block_type` ("image",
"audio", "video") and `reason` (the provider's human-readable
message) when those are recoverable from the rejection.

`OpenAIProvider` detects content rejection via the response body —
HTTP 400 with an error code like `image_content_not_supported` or a
message like "does not support image inputs." Pre-send capability
checks (failing fast before the wire trip when you know the model
doesn't support images) live above the provider as userland
middleware — the provider doesn't ship a static model-capability
catalog.

## Routing on parsed fields

A conditional edge is a function `state -> str` that names the next
10 changes: 10 additions & 0 deletions docs/model-providers/authoring.md
@@ -198,6 +198,16 @@ of:
- **Tool calls.** Wire-mapping the `tool_calls` array on
`AssistantMessage` to the Provider's expected shape, parsing tool
results back from `ToolMessage`s.
- **Content blocks (multimodal user input).** Wire-mapping the
`list[ContentBlock]` form of `UserMessage.content` to the provider's
multimodal shape (OpenAI's `image_url` content-array entries,
Anthropic's image blocks, Google's `inlineData` parts, etc.). The
spec types (`TextBlock`, `ImageBlock`, `ImageSourceURL`,
`ImageSourceInline`) are stable across providers; only the wire
shape differs. Provider authors targeting non-multimodal models
MUST surface `ProviderUnsupportedContentBlock` when the request
carries blocks the bound model can't serve — pre-send or
post-receive per §7.
- **Structured output.** Threading `response_schema` through the
request body (native `response_format` if the underlying wire
supports it; prompt-augmentation fallback otherwise) and validating
39 changes: 25 additions & 14 deletions docs/model-providers/index.md
@@ -64,24 +64,35 @@ class Provider(Protocol):

## Errors

Nine canonical error categories cover every failure mode:

| Error | Trigger |
| ---------------------------------- | ---------------------------------------------------------------------- |
| `ProviderAuthentication` | 401 / 403 (bad key, expired token) |
| `ProviderUnavailable` | 5xx, network failure, timeout |
| `ProviderInvalidModel` | Bound model doesn't exist on the provider |
| `ProviderModelNotLoaded` | Model known but not currently serving |
| `ProviderRateLimit` | 429 (with `Retry-After` exposed) |
| `ProviderInvalidResponse` | 200 OK that fails to parse |
| `ProviderInvalidRequest` | Malformed request (per-message or list-level) |
| `ProviderUnsupportedContentBlock` | Bound model rejected a content block (image / audio / media-type) |
| `StructuredOutputInvalid` | Response failed to parse as JSON or failed to validate against schema |

Three of these (`Unavailable`, `RateLimit`, `ModelNotLoaded`) are
exported in `TRANSIENT_CATEGORIES`, the canonical "safe to retry"
set used by the default retry-middleware classifier.
`StructuredOutputInvalid` and `ProviderUnsupportedContentBlock` are
non-transient by default. See [Content blocks](../concepts/llms.md#content-blocks-multimodal-user-messages)
in the LLMs concept page for the multimodal contract; see
[Structured output](#structured-output) below for the
`response_schema` path.

`OpenAIProvider` detects unsupported-content-block rejections via
the response body (HTTP 400 with an error code or message indicating
content rejection) — a post-receive mapping rather than a static
pre-send capability check. Pre-send protection is a userland
middleware pattern when callers know the bound model's capabilities
up front.

## Structured output

16 changes: 16 additions & 0 deletions src/openarmature/llm/__init__.py
@@ -30,6 +30,7 @@
PROVIDER_MODEL_NOT_LOADED,
PROVIDER_RATE_LIMIT,
PROVIDER_UNAVAILABLE,
PROVIDER_UNSUPPORTED_CONTENT_BLOCK,
STRUCTURED_OUTPUT_INVALID,
TRANSIENT_CATEGORIES,
LlmProviderError,
Expand All @@ -40,12 +41,19 @@
ProviderModelNotLoaded,
ProviderRateLimit,
ProviderUnavailable,
ProviderUnsupportedContentBlock,
StructuredOutputInvalid,
)
from .messages import (
AssistantMessage,
ContentBlock,
ImageBlock,
ImageSource,
ImageSourceInline,
ImageSourceURL,
Message,
SystemMessage,
TextBlock,
Tool,
ToolCall,
ToolMessage,
@@ -69,10 +77,16 @@
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"AssistantMessage",
"ContentBlock",
"FinishReason",
"ImageBlock",
"ImageSource",
"ImageSourceInline",
"ImageSourceURL",
"LlmProviderError",
"Message",
"OpenAIProvider",
@@ -85,10 +99,12 @@
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"Response",
"RuntimeConfig",
"StructuredOutputInvalid",
"SystemMessage",
"TextBlock",
"Tool",
"ToolCall",
"ToolMessage",
45 changes: 45 additions & 0 deletions src/openarmature/llm/errors.py
@@ -29,6 +29,7 @@
PROVIDER_RATE_LIMIT = "provider_rate_limit"
PROVIDER_INVALID_RESPONSE = "provider_invalid_response"
PROVIDER_INVALID_REQUEST = "provider_invalid_request"
PROVIDER_UNSUPPORTED_CONTENT_BLOCK = "provider_unsupported_content_block"
STRUCTURED_OUTPUT_INVALID = "structured_output_invalid"


@@ -137,6 +138,48 @@ class ProviderInvalidRequest(LlmProviderError):
category = PROVIDER_INVALID_REQUEST


# Non-transient by default — the bound model's capability set does
# not change between calls, so retrying without changing the request
# (the message list, the bound model, or the provider) will not
# succeed.
#
# Distinct from ProviderInvalidRequest. ProviderInvalidRequest covers
# spec-shape violations (the request is malformed at the wire layer);
# ProviderUnsupportedContentBlock covers capability mismatches (the
# request is well-formed but the bound model can't fulfill it).
# Splitting them lets callers route the unsupported-content case
# differently (e.g., fall back to a multimodal-capable provider)
# without overloading the malformed-request category.
class ProviderUnsupportedContentBlock(LlmProviderError):
"""Raised when the bound model does not support a content block
type used in the request.

Examples: a text-only model received an image block, or the model
supports images but not the requested ``media_type`` or ``source``
variant.

Attributes:
block_type: The block type that was rejected (e.g., ``"image"``),
when the provider's response makes this identifiable.
reason: The provider's human-readable description of the
rejection, when available.
"""

category = PROVIDER_UNSUPPORTED_CONTENT_BLOCK
block_type: str | None
reason: str | None

def __init__(
self,
*args: Any,
block_type: str | None = None,
reason: str | None = None,
) -> None:
super().__init__(*args)
self.block_type = block_type
self.reason = reason


# Non-transient by default — a model that fails schema compliance on a
# given prompt usually fails the same way on retry. The default
# RetryMiddleware classifier does NOT retry this category. Users wanting
@@ -184,6 +227,7 @@ def __init__(
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"LlmProviderError",
@@ -194,5 +238,6 @@
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"StructuredOutputInvalid",
]