Merged
5 changes: 4 additions & 1 deletion CHANGELOG.md
@@ -8,7 +8,10 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Image content blocks for user messages (proposal 0015, introduced in spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection via HTTP 400 bodies — heuristic on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" + image/audio/video).
- **Structured output (proposal 0016, introduced in spec v0.14.0).** `Provider.complete()` now accepts an optional `response_schema` parameter — either a JSON Schema dict or a Pydantic `BaseModel` subclass. When supplied, the provider constrains the model's output to the schema and populates `Response.parsed` with the validated value (`dict` for dict-schema input, a `BaseModel` instance for class input). New `StructuredOutputInvalid` error category (non-transient by default) raises on JSON parse failure or schema validation failure; carries the requested schema, the raw response content, and a failure description.
- **`OpenAIProvider` native response_format wire path.** When `response_schema` is supplied, the chat-completions request body carries `response_format: { type: "json_schema", json_schema: { name, schema, strict } }`. The `strict` flag is determined by a deep recursive walk over the schema (object-property required-coverage rule across `anyOf` / `oneOf` / `allOf` and `$ref` targets, with cycle protection); unresolvable refs fall through to `strict: false`. The `name` field uses `schema.title` when present, otherwise a deterministic sha256-prefix hash.
- **`OpenAIProvider` prompt-augmentation fallback.** Constructor flag `force_prompt_augmentation_fallback: bool` (default `False`) and read-only inspect property `uses_prompt_augmentation_fallback: bool`. When the flag is on, structured-output calls build a fresh message list with a system directive containing the serialized schema, omit `response_format` from the wire, and validate the response post-receive. The caller's original `messages` list is never mutated. Use for OpenAI-compatible servers (older vLLM, some LM Studio releases, llama.cpp variants) that reject or silently ignore `response_format`.
- **Provider-agnostic schema helpers.** `openarmature.llm.validate_response_schema(schema)` (raises `ProviderInvalidRequest` when the schema is not a dict with a top-level `type: "object"`) and `openarmature.llm.strict_mode_supported(schema)` (the deep-tree strict-mode constraint check) are exported for reuse by future Anthropic/Gemini providers.
111 changes: 111 additions & 0 deletions docs/concepts/llms.md
@@ -221,6 +221,117 @@ on every object. Pydantic-derived schemas may need `model_config =
ConfigDict(extra="forbid")` on the class to get
`additionalProperties: false` in the generated JSON Schema.

## Content blocks (multimodal user messages)

User messages carry content in one of two shapes: a plain text string,
or an ordered sequence of typed content blocks. The string form is the
common case. Blocks are how you mix non-text modalities into a single
turn. v1 defines two block types: text and image. Audio and video are
deferred to future proposals.

System, assistant, and tool messages stay text-string only. Image
inputs are user-only in v1; image outputs (assistant-message-borne
images, e.g. DALL-E-style generation) are out of scope.

### Text and image blocks

A text block is the array-form equivalent of a text-string message:
`TextBlock(text="describe this")`. A user message holding a single
text block is normatively equivalent to one with `content="describe
this"`.

An image block carries one source — URL or inline base64 — plus an
optional `detail` hint:

```python
from openarmature.llm import (
ImageBlock,
ImageSourceInline,
ImageSourceURL,
OpenAIProvider,
TextBlock,
UserMessage,
)


async def describe_image(provider: OpenAIProvider) -> str:
response = await provider.complete(
[
UserMessage(
content=[
ImageBlock(
source=ImageSourceURL(url="https://example.com/diagram.png"),
detail="high", # optional; omitted from wire when None
),
TextBlock(text="What does this diagram show?"),
]
)
]
)
return response.message.content
```

Block order is preserved on the wire. Providers vary in whether they
treat order as semantically meaningful (an image followed by its
describing text is a different signal from text followed by the
image); construct the sequence in the order you want the model to
perceive it.

### URL vs inline sources

- **URL source** (`ImageSourceURL`): the provider fetches the URL. Any
  scheme for which the provider documents support is valid (`http(s)://`,
  `data:`, etc.). The framework passes it through unchanged.
- **Inline source** (`ImageSourceInline`): the image is sent as
base64-encoded bytes in the request body. The `media_type` field on
the surrounding `ImageBlock` is **required** for inline sources (and
ignored for URL sources). The framework constructs an RFC 2397
`data:<media_type>;base64,<bytes>` URI for the wire; it does not
inspect, transcode, or re-encode the bytes.

OpenAI, Anthropic, and Google all accept `image/png`, `image/jpeg`,
and `image/webp` as guaranteed media types. `media_type` is typed as
`str | None`, so callers MAY pass additional `image/*` types when
they know the bound model supports them; portable code sticks to the
three.

### The `detail` hint

`detail` is a per-image hint to the provider about processing
fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
which **omits the field from the wire** and lets the provider apply
its own default (conceptually `"auto"`). Setting `detail="auto"`
explicitly on the spec block forces the wire to carry an explicit
`"auto"` — usually unnecessary, since the provider's default is the
same value.

### When the model can't handle the block

`ProviderUnsupportedContentBlock` (category
`provider_unsupported_content_block`) raises when the bound model
rejects a content block type or media variant. Concrete cases:

- A text-only model (e.g., `gpt-3.5-turbo`) receives an image block.
- The model supports images but not the requested `media_type`.
- The model supports the type but rejects the specific source variant
  (a URL the provider can't fetch, for example).

The error category is **non-transient**: retrying without changing
the request, the bound model, or the provider won't succeed. Userland
fallback patterns (e.g., a middleware that routes to a multimodal
provider on this category) compose cleanly against it.

`ProviderUnsupportedContentBlock` carries `block_type` ("image",
"audio", "video") and `reason` (the provider's human-readable
message) when those are recoverable from the rejection.

`OpenAIProvider` detects content rejection via the response body —
HTTP 400 with an error code like `image_content_not_supported` or a
message like "does not support image inputs." Pre-send capability
checks (failing fast before the wire trip when you know the model
doesn't support images) live above the provider as userland
middleware — the provider doesn't ship a static model-capability
catalog.

## Routing on parsed fields

A conditional edge is a function `state -> str` that names the next
10 changes: 10 additions & 0 deletions docs/model-providers/authoring.md
@@ -198,6 +198,16 @@ of:
- **Tool calls.** Wire-mapping the `tool_calls` array on
`AssistantMessage` to the Provider's expected shape, parsing tool
results back from `ToolMessage`s.
- **Content blocks (multimodal user input).** Wire-mapping the
`list[ContentBlock]` form of `UserMessage.content` to the provider's
multimodal shape (OpenAI's `image_url` content-array entries,
Anthropic's image blocks, Google's `inlineData` parts, etc.). The
spec types (`TextBlock`, `ImageBlock`, `ImageSourceURL`,
`ImageSourceInline`) are stable across providers; only the wire
shape differs. Provider authors targeting non-multimodal models
MUST surface `ProviderUnsupportedContentBlock` when the request
carries blocks the bound model can't serve — pre-send or
post-receive per §7.
- **Structured output.** Threading `response_schema` through the
request body (native `response_format` if the underlying wire
supports it; prompt-augmentation fallback otherwise) and validating
39 changes: 25 additions & 14 deletions docs/model-providers/index.md
@@ -64,24 +64,35 @@ class Provider(Protocol):

## Errors

Nine canonical error categories cover every failure mode:

| Error | Trigger |
| ---------------------------------- | ---------------------------------------------------------------------- |
| `ProviderAuthentication` | 401 / 403 (bad key, expired token) |
| `ProviderUnavailable` | 5xx, network failure, timeout |
| `ProviderInvalidModel` | Bound model doesn't exist on the provider |
| `ProviderModelNotLoaded` | Model known but not currently serving |
| `ProviderRateLimit` | 429 (with `Retry-After` exposed) |
| `ProviderInvalidResponse` | 200 OK that fails to parse |
| `ProviderInvalidRequest` | Malformed request (per-message or list-level) |
| `ProviderUnsupportedContentBlock` | Bound model rejected a content block (image / audio / media-type) |
| `StructuredOutputInvalid` | Response failed to parse as JSON or failed to validate against schema |

Three of these (`Unavailable`, `RateLimit`, `ModelNotLoaded`) are
exported in `TRANSIENT_CATEGORIES`, the canonical "safe to retry"
set used by the default retry-middleware classifier.
`StructuredOutputInvalid` and `ProviderUnsupportedContentBlock` are
non-transient by default. See [Content blocks](../concepts/llms.md#content-blocks-multimodal-user-messages)
in the LLMs concept page for the multimodal contract; see
[Structured output](#structured-output) below for the
`response_schema` path.

`OpenAIProvider` detects unsupported-content-block rejections via
the response body (HTTP 400 with an error code or message indicating
content rejection) — a post-receive mapping rather than a static
pre-send capability check. Pre-send protection is a userland
middleware pattern when callers know the bound model's capabilities
up front.

## Structured output

16 changes: 16 additions & 0 deletions src/openarmature/llm/__init__.py
@@ -30,6 +30,7 @@
PROVIDER_MODEL_NOT_LOADED,
PROVIDER_RATE_LIMIT,
PROVIDER_UNAVAILABLE,
PROVIDER_UNSUPPORTED_CONTENT_BLOCK,
STRUCTURED_OUTPUT_INVALID,
TRANSIENT_CATEGORIES,
LlmProviderError,
Expand All @@ -40,12 +41,19 @@
ProviderModelNotLoaded,
ProviderRateLimit,
ProviderUnavailable,
ProviderUnsupportedContentBlock,
StructuredOutputInvalid,
)
from .messages import (
AssistantMessage,
ContentBlock,
ImageBlock,
ImageSource,
ImageSourceInline,
ImageSourceURL,
Message,
SystemMessage,
TextBlock,
Tool,
ToolCall,
ToolMessage,
@@ -69,10 +77,16 @@
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"AssistantMessage",
"ContentBlock",
"FinishReason",
"ImageBlock",
"ImageSource",
"ImageSourceInline",
"ImageSourceURL",
"LlmProviderError",
"Message",
"OpenAIProvider",
@@ -85,10 +99,12 @@
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"Response",
"RuntimeConfig",
"StructuredOutputInvalid",
"SystemMessage",
"TextBlock",
"Tool",
"ToolCall",
"ToolMessage",
45 changes: 45 additions & 0 deletions src/openarmature/llm/errors.py
@@ -29,6 +29,7 @@
PROVIDER_RATE_LIMIT = "provider_rate_limit"
PROVIDER_INVALID_RESPONSE = "provider_invalid_response"
PROVIDER_INVALID_REQUEST = "provider_invalid_request"
PROVIDER_UNSUPPORTED_CONTENT_BLOCK = "provider_unsupported_content_block"
STRUCTURED_OUTPUT_INVALID = "structured_output_invalid"


@@ -137,6 +138,48 @@ class ProviderInvalidRequest(LlmProviderError):
category = PROVIDER_INVALID_REQUEST


# Non-transient by default — the bound model's capability set does
# not change between calls, so retrying without changing the request
# (the message list, the bound model, or the provider) will not
# succeed.
#
# Distinct from ProviderInvalidRequest. ProviderInvalidRequest covers
# spec-shape violations (the request is malformed at the wire layer);
# ProviderUnsupportedContentBlock covers capability mismatches (the
# request is well-formed but the bound model can't fulfill it).
# Splitting them lets callers route the unsupported-content case
# differently (e.g., fall back to a multimodal-capable provider)
# without overloading the malformed-request category.
class ProviderUnsupportedContentBlock(LlmProviderError):
"""Raised when the bound model does not support a content block
type used in the request.

Examples: a text-only model received an image block, or the model
supports images but not the requested ``media_type`` or ``source``
variant.

Attributes:
block_type: The block type that was rejected (e.g., ``"image"``),
when the provider's response makes this identifiable.
reason: The provider's human-readable description of the
rejection, when available.
"""

category = PROVIDER_UNSUPPORTED_CONTENT_BLOCK
block_type: str | None
reason: str | None

def __init__(
self,
*args: Any,
block_type: str | None = None,
reason: str | None = None,
) -> None:
super().__init__(*args)
self.block_type = block_type
self.reason = reason


# Non-transient by default — a model that fails schema compliance on a
# given prompt usually fails the same way on retry. The default
# RetryMiddleware classifier does NOT retry this category. Users wanting
@@ -184,6 +227,7 @@ def __init__(
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"LlmProviderError",
@@ -194,5 +238,6 @@
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"StructuredOutputInvalid",
]