feat(llm): image content blocks (proposal 0015)#44
Merged
Adds the provider_unsupported_content_block canonical category from llm-provider §7 (introduced by proposal 0015). Raised when the bound model does not support a content block type used in the request (e.g., a text-only model received an image block, or the model supports images but not the requested media_type or source variant).

The exception carries block_type and reason attributes so callers can route on the specific unsupported case; this mirrors the precedent StructuredOutputInvalid set in PR-1 (carry the structured payload the caller needs for diagnostics and recovery).

Non-transient by default: NOT added to TRANSIENT_CATEGORIES. The bound model's capability set doesn't change between calls, so retrying without changing the request, the bound model, or the provider won't succeed. Users who want fallback semantics MAY route on the category in a userland middleware (e.g., switch to a multimodal-capable provider).

Distinct from ProviderInvalidRequest: ProviderInvalidRequest covers spec-shape violations (the request is malformed); this category covers capability mismatches (the request is well-formed but the bound model can't fulfill it).
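A minimal sketch of how the category and a userland fallback might fit together. This is not the openarmature source: the `ProviderError` base class, the contents of `TRANSIENT_CATEGORIES`, and the `is_retryable` and `complete_with_fallback` helpers are all hypothetical, and providers are modelled as plain callables for brevity.

```python
class ProviderError(Exception):
    """Hypothetical base class; the real hierarchy may differ."""

    category = "provider_error"  # placeholder name


class ProviderUnsupportedContentBlock(ProviderError):
    """Capability mismatch: the request is well-formed, the model can't serve it."""

    category = "provider_unsupported_content_block"

    def __init__(self, message, *, block_type=None, reason=None):
        super().__init__(message)
        self.block_type = block_type  # e.g. "image"
        self.reason = reason          # detail for diagnostics and routing


# Placeholder contents; the point is only that the new category is absent.
TRANSIENT_CATEGORIES = frozenset({"some_transient_category"})


def is_retryable(exc):
    """A retry layer would consult the category, so this error is never retried."""
    return getattr(exc, "category", None) in TRANSIENT_CATEGORIES


def complete_with_fallback(primary, fallback, request):
    """Userland middleware: route image rejections to a second provider."""
    try:
        return primary(request)
    except ProviderUnsupportedContentBlock as exc:
        if exc.block_type == "image":
            return fallback(request)
        raise  # other block types are not handled by this fallback
```

Routing on `block_type` rather than retrying blindly matches the reasoning above: the same request against the same model fails the same way every time.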
Adds the content-block surface from llm-provider §3.1 (proposal 0015):
- TextBlock(type, text) with a non-empty-text validator
- ImageSourceURL(type, url) and ImageSourceInline(type, base64_data), joined by an ImageSource discriminated union over the source's ``type`` field
- ImageBlock(type, source, media_type, detail) with a validator that rejects inline sources missing a media_type. detail defaults to None so the wire omits the field unless explicitly set (providers apply their own conceptual default of "auto"); the docstring spells out the subtle case of an explicit detail="auto"
- ContentBlock discriminated union over TextBlock | ImageBlock

UserMessage.content becomes ``str | list[ContentBlock]``. The existing _check_content validator extends to enforce the non-empty rule on both shapes. Other roles (system, assistant, tool) stay text-string only; content blocks are user-only in v1 per the spec.

media_type is typed as ``str | None`` (not a Literal of the three guaranteed types) so callers can pass additional image/* types providers document support for.
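The shape of these types can be sketched with stdlib dataclasses. This is a simplified stand-in, not the real Pydantic models: the discriminator values (`"text"`, `"image"`, `"url"`, `"inline"`) are assumptions, and validation of the `detail` enum is omitted because its permitted values are not listed here.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional, Union


@dataclass(frozen=True)
class TextBlock:
    text: str
    type: Literal["text"] = "text"  # assumed discriminator value

    def __post_init__(self):
        if not self.text:
            raise ValueError("TextBlock.text must be non-empty")


@dataclass(frozen=True)
class ImageSourceURL:
    url: str
    type: Literal["url"] = "url"  # assumed discriminator value


@dataclass(frozen=True)
class ImageSourceInline:
    base64_data: str
    type: Literal["inline"] = "inline"  # assumed discriminator value


ImageSource = Union[ImageSourceURL, ImageSourceInline]


@dataclass(frozen=True)
class ImageBlock:
    source: ImageSource
    media_type: Optional[str] = None  # plain str so extra image/* types pass
    detail: Optional[str] = None      # None means "leave it off the wire"
    type: Literal["image"] = "image"  # assumed discriminator value

    def __post_init__(self):
        # Inline payloads carry no content-type of their own.
        if isinstance(self.source, ImageSourceInline) and not self.media_type:
            raise ValueError("inline image sources require media_type")


ContentBlock = Union[TextBlock, ImageBlock]


@dataclass(frozen=True)
class UserMessage:
    content: Union[str, List[ContentBlock]]

    def __post_init__(self):
        # One non-empty rule covers both shapes: "" and [] are both falsy.
        if not self.content:
            raise ValueError("UserMessage.content must be non-empty")
```

Under these assumptions, `UserMessage(content="hi")` keeps working unchanged, while `UserMessage(content=[TextBlock("describe this"), ImageBlock(ImageSourceURL("https://example.com/cat.png"))])` opts into the block form.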
Two extensions in OpenAIProvider for proposal 0015:
- _message_to_wire's user case now branches on content shape: string
maps directly (the v0.4.0 form); a content-block sequence maps to
OpenAI's content-array form per §8.1.1 via the new _block_to_wire
helper. TextBlock → {type: "text", text}. ImageBlock(URL) →
{type: "image_url", image_url: {url, detail?}}. ImageBlock(inline)
constructs an RFC 2397 data: URI from media_type + base64_data and
routes through the same image_url entry shape. The detail hint goes
on the wire only when the spec block has it set (None on the spec
block omits it from the wire; providers apply their own default of
"auto" per §3.1.2).
- classify_http_error's 400 branch now routes content-rejection
bodies to ProviderUnsupportedContentBlock rather than the generic
ProviderInvalidRequest. Detection is a heuristic on error.code
(known set: image_content_not_supported,
unsupported_image_media_type, audio_content_not_supported,
video_content_not_supported, unsupported_content_block; plus an
image+not_supported substring fallback), error.type
(image_parse_error, image_content_not_supported), and
error.message ("does not support" + image/audio/video). The spec
is implementation-defined on the detection rule (§8.3); the
heuristic lives inline so it's evolvable as OpenAI's error-code
surface shifts.
_extract_rejected_block_type pulls a best-effort "image" / "audio"
/ "video" identifier out of the error code or message for surfacing
on ProviderUnsupportedContentBlock.block_type.
Removes the 12 deferred-skip rows for content-block fixtures from both _DEFERRED_FIXTURES dicts (test_llm_provider.py runtime + the test_fixture_parsing.py typed parser).

_build_message in test_llm_provider.py extends the user case to pass raw["content"] through (str or list) unchanged; Pydantic's discriminated union on the content-block ``type`` field parses each dict in the list to the right TextBlock / ImageBlock variant automatically.

LlmCallSpec.messages in harness/directives.py is already typed as list[dict[str, Any]] (permissive), so the typed parser accepts the content-block list-of-dicts shape without model extensions. The parsing tests slip past for the 009-020 fixtures via the same path PR-1's 021-028 used.

All 28 llm-provider conformance fixtures now pass (the prior 16 plus the 12 new content-block ones). Full suite: 515 pass, 72 skipped (down from 84; only the 16 deferred fixtures for proposals 0011 / 0014 / 0017 remain).
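To make the pass-through concrete, here is a hand-rolled stand-in for what the discriminated union does with each dict. Pydantic performs this dispatch itself; the sketch only illustrates the idea. The dict keys, the `"text"` / `"image"` tag values, and the `parse_block` and `build_user_content` helpers are assumptions for illustration, not the harness code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageBlock:
    source: Dict[str, Any]  # left as a raw dict to keep the sketch short
    media_type: Optional[str] = None
    detail: Optional[str] = None


def _parse_text(raw: Dict[str, Any]) -> TextBlock:
    return TextBlock(text=raw["text"])


def _parse_image(raw: Dict[str, Any]) -> ImageBlock:
    return ImageBlock(
        source=raw["source"],
        media_type=raw.get("media_type"),
        detail=raw.get("detail"),
    )


# The "type" field is the discriminator: it alone selects the variant.
_PARSERS = {"text": _parse_text, "image": _parse_image}


def parse_block(raw: Dict[str, Any]) -> Union[TextBlock, ImageBlock]:
    tag = raw.get("type")
    if tag not in _PARSERS:
        raise ValueError(f"unknown or missing content-block type: {tag!r}")
    return _PARSERS[tag](raw)


def build_user_content(raw_content: Union[str, List[Dict[str, Any]]]):
    """Strings pass through; lists are parsed block by block."""
    if isinstance(raw_content, str):
        return raw_content
    return [parse_block(item) for item in raw_content]
```

Because the tag is checked first, an unrecognised block fails with one clear error instead of a cascade of per-variant validation failures.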
Adds tests/unit/test_content_blocks.py (24 tests) covering bits the conformance fixtures don't exercise directly:
- TextBlock / ImageBlock construction validation (non-empty text, inline-needs-media_type, detail enum, URL source can skip media_type)
- UserMessage construction from dict-form content blocks (the path the conformance test fixture loader uses)
- _block_to_wire mapping for text, URL with/without detail, inline base64 (RFC 2397 data URI construction)
- classify_http_error 400 routing to ProviderUnsupportedContentBlock via the heuristic; negative cases (unrelated 400 stays ProviderInvalidRequest)
- _extract_rejected_block_type picks up "image" / "audio" from error.code or error.message

Docs:
- docs/concepts/llms.md: new "Content blocks (multimodal user messages)" section between Structured output and Routing, covering the two content shapes, URL vs inline sources, the detail hint, and the new ProviderUnsupportedContentBlock category.
- docs/model-providers/index.md: errors table extended to 9 categories with the new row + a Behaviour-guarantees note that OpenAIProvider does post-receive detection only; pre-send is a userland-middleware pattern.
- docs/model-providers/authoring.md: "Beyond the skeleton" gains a content-blocks entry pointing custom-provider authors at the multimodal wire mapping + the unsupported-content category.

CHANGELOG [Unreleased] gains 3 entries: the user-message content extension, the OpenAI wire mapping, and the new error category. All in the same release as PR-1's 0016 entries per the consolidated-release strategy.
Pull request overview
Implements proposal 0015 (multimodal image content blocks) in openarmature.llm. Extends UserMessage.content to accept either a string or a list of typed content blocks, adds new block/source types and a new non-transient error category, and teaches OpenAIProvider how to map content-block sequences onto OpenAI's content-array wire shape and how to classify 400 content-rejection responses.
Changes:
- New types `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline` and discriminated unions `ContentBlock` / `ImageSource`, with conditional `media_type` validation for inline sources.
- New error category `ProviderUnsupportedContentBlock` (non-transient) with `block_type` / `reason`, plus `OpenAIProvider` HTTP 400 heuristic routing.
- OpenAI `_message_to_wire` content-array branch with `_block_to_wire`, RFC 2397 data URI assembly for inline images, `detail` omitted from wire when `None`; conformance fixtures 009–020 un-deferred and unit tests added.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/openarmature/llm/messages.py` | Adds content-block types, discriminated unions, and updates `UserMessage.content` to dual-shape. |
| `src/openarmature/llm/errors.py` | Adds `ProviderUnsupportedContentBlock` category with `block_type` / `reason`. |
| `src/openarmature/llm/__init__.py` | Re-exports new types and error category. |
| `src/openarmature/llm/providers/openai.py` | Wire mapping for content blocks; 400 content-rejection heuristic. |
| `tests/unit/test_content_blocks.py` | Per-class construction, wire mapping, and 400 classification tests. |
| `tests/conformance/test_llm_provider.py` | Clears 0015 deferred fixtures; user content passes through unchanged for discriminated parsing. |
| `tests/conformance/test_fixture_parsing.py` | Comment update only; no longer carries 0015 fixture entries. |
| `docs/concepts/llms.md` | New "Content blocks" section. |
| `docs/model-providers/index.md` | Errors table updated to 9 categories. |
| `docs/model-providers/authoring.md` | Authoring guidance for the multimodal mapping. |
| `CHANGELOG.md` | Unreleased entry for proposal 0015. |
- audio/video symmetry in the substring fallback of _looks_like_content_rejection
- explicit isinstance(block, ImageBlock) guard in _block_to_wire to surface added union variants as a TypeError instead of an AttributeError on .source
- clarify ImageBlock.media_type docstring: permitted but redundant on URL sources (the URL payload carries content-type), provider implementations MAY consume it as a hint
- reword CHANGELOG qualifier '(proposal X, spec vY.Z)' → '(proposal X, introduced in spec vY.Z)' on the 0015 and 0016 entries so it doesn't read like a per-entry submodule pin change
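A sketch of the wire mapping with the explicit guard from the second bullet. The dataclasses are simplified stand-ins for the real models, and `block_to_wire` / `user_content_to_wire` are illustrative functions written for this example, not the provider's private helpers. The output shapes follow the mapping described in this PR: a `text` entry, or an `image_url` entry whose URL is either the remote URL or an RFC 2397 data URI.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TextBlock:
    text: str


@dataclass
class ImageSourceURL:
    url: str


@dataclass
class ImageSourceInline:
    base64_data: str


@dataclass
class ImageBlock:
    source: Union[ImageSourceURL, ImageSourceInline]
    media_type: Optional[str] = None
    detail: Optional[str] = None


def block_to_wire(block) -> dict:
    if isinstance(block, TextBlock):
        return {"type": "text", "text": block.text}
    if isinstance(block, ImageBlock):
        if isinstance(block.source, ImageSourceInline):
            # RFC 2397: data:<media_type>;base64,<payload>
            url = f"data:{block.media_type};base64,{block.source.base64_data}"
        else:
            url = block.source.url
        image_url = {"url": url}
        if block.detail is not None:  # None stays off the wire entirely
            image_url["detail"] = block.detail
        return {"type": "image_url", "image_url": image_url}
    # Explicit guard: a variant added to the union later fails loudly here
    # rather than with an AttributeError on .source.
    raise TypeError(f"unsupported content block: {type(block).__name__}")


def user_content_to_wire(content):
    """String content maps directly; a block list maps to a content array."""
    if isinstance(content, str):
        return content
    return [block_to_wire(b) for b in content]
```

Ending on an unconditional `raise` instead of an `else` branch that assumes `ImageBlock` is the design point of the review fix: the failure names the offending type at the mapping boundary.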
This was referenced May 16, 2026
Summary
- Implements proposal 0015 (image content blocks) in `openarmature.llm`. PR-2 of the five-PR batch following PR-1 (#42, "feat(llm): structured output", proposal 0016).
- `UserMessage.content` extends to `str | list[ContentBlock]`. New content-block types: `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, plus the `ContentBlock` and `ImageSource` discriminated unions.
- Adds `ProviderUnsupportedContentBlock` (the 9th canonical category). `OpenAIProvider` detects content-rejection 400s via a heuristic on `error.code` / `error.type` / `error.message` and routes there rather than the generic `ProviderInvalidRequest`.
- OpenAI wire mapping: `TextBlock` → `{type: "text", text}`, `ImageBlock` (URL) → `{type: "image_url", image_url: {url, detail?}}`, `ImageBlock` (inline) → RFC 2397 `data:<media_type>;base64,<bytes>` URI routed through the same `image_url` shape. `detail` only goes on the wire when explicitly set; `None` (the class default) omits it and lets providers apply their own `"auto"`.

What's new
- `UserMessage.content`: `str` (the v0.4.0 form) or `list[ContentBlock]`. Other roles (system/assistant/tool) stay text-string-only.
- `TextBlock`, `ImageBlock`, `ImageSource*`: exported from `openarmature.llm`. Discriminated unions on the `type` field. `ImageBlock.detail` defaults to `None` (omitted from wire); `ImageBlock.media_type` is required for inline sources and typed as `str | None` (a provider may accept more than the three guaranteed `image/*` types).
- `ProviderUnsupportedContentBlock`: non-transient; carries `block_type` and `reason`. Distinct from `ProviderInvalidRequest`: capability mismatch vs spec-shape malformation.
- `OpenAIProvider._message_to_wire`: content-block lists map to OpenAI's content-array form.
- `OpenAIProvider.classify_http_error`: content-rejection 400s route to `ProviderUnsupportedContentBlock`.
- Conformance fixtures (`009`–`020`) all green; `_DEFERRED_FIXTURES` row count drops accordingly.

Release gate
PR-2 of a five-PR batch (`0016` → `0015` → `0017` → `0014` → `0011`). Do not tag a release until all five land; the CHANGELOG `[Unreleased]` Notes section carries the gate from PR-1.

Commits
- feat(llm): add ProviderUnsupportedContentBlock error category
- feat(llm): content-block types + UserMessage extension
- feat(llm/openai): content-array wire mapping + content-rejection mapping
- test(conformance): drive 0015 fixtures 009-020
- test+docs: content-block unit tests + docs + CHANGELOG entry

Test plan
- `uv run pytest`: 539 pass, 73 skipped (down from 84 in PR-1; only the 16 fixtures for proposals 0011 / 0014 / 0017 remain), 0 failed.
- `uv run pyright`: clean.
- `uv run ruff check` + `uv run ruff format`: clean.
- `uv run --group docs mkdocs build --strict`: clean.
- All 12 content-block conformance fixtures (`009-content-blocks-text-only-equivalence` through `020-content-blocks-inline-image-missing-media-type`) pass.
- Manual check against `gpt-4o-mini` (or equivalent multimodal model). Worth doing once the PR is up; PR-1 verified the broader provider lifecycle, so this is the multimodal happy-path check.

Pre-1.0 SemVer
Additive change. Existing callers using `UserMessage(content="…")` see no behavior change; the new `list[ContentBlock]` shape is opt-in.