feat(llm): image content blocks (proposal 0015) by chris-colinsky · Pull Request #44 · LunarCommand/openarmature-python

chris-colinsky · 2026-05-16T00:57:54Z

Summary

Implements spec proposal 0015 (image content blocks for user messages) in openarmature.llm. PR-2 of the five-PR batch, following PR-1 (feat(llm): structured output (proposal 0016), #42).
UserMessage.content extends to str | list[ContentBlock]. New content-block types: TextBlock, ImageBlock, ImageSourceURL, ImageSourceInline, plus the ContentBlock and ImageSource discriminated unions.
New non-transient error category ProviderUnsupportedContentBlock (the 9th canonical category). OpenAIProvider detects content-rejection 400s via a heuristic on error.code / error.type / error.message and routes there rather than the generic ProviderInvalidRequest.
OpenAI content-array wire mapping per §8.1.1: TextBlock → {type: "text", text}, ImageBlock (URL) → {type: "image_url", image_url: {url, detail?}}, ImageBlock (inline) → RFC 2397 data:<media_type>;base64,<bytes> URI routed through the same image_url shape. detail only goes on the wire when explicitly set; None (the class default) omits it and lets providers apply their own "auto".
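
The wire mapping in the last bullet can be sketched roughly as follows. This is a minimal standalone illustration, not the provider's actual code: it uses plain dataclasses as stand-ins for the library's Pydantic types, `block_to_wire` is a hypothetical name, and the field names are taken from this description rather than from the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Stand-in types (assumption): the real ones are Pydantic models in openarmature.llm.
@dataclass(frozen=True)
class ImageSourceURL:
    url: str


@dataclass(frozen=True)
class ImageSourceInline:
    base64_data: str


@dataclass(frozen=True)
class TextBlock:
    text: str


@dataclass(frozen=True)
class ImageBlock:
    source: ImageSourceURL | ImageSourceInline
    media_type: str | None = None
    detail: str | None = None  # None means "leave detail off the wire"


def block_to_wire(block: TextBlock | ImageBlock) -> dict[str, Any]:
    """Map one content block to an OpenAI content-array entry (sketch)."""
    if isinstance(block, TextBlock):
        return {"type": "text", "text": block.text}
    if isinstance(block.source, ImageSourceURL):
        url = block.source.url
    else:
        # Inline data travels as an RFC 2397 data URI through the same shape.
        url = f"data:{block.media_type};base64,{block.source.base64_data}"
    image_url: dict[str, Any] = {"url": url}
    if block.detail is not None:
        # Only an explicitly set detail hint is sent to the provider.
        image_url["detail"] = block.detail
    return {"type": "image_url", "image_url": image_url}
```

A URL image with no `detail` therefore produces an `image_url` entry containing only `url`, which leaves the choice of default to the provider.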

What's new

| Surface | Change |
| --- | --- |
| `UserMessage.content` | Accepts `str` (the v0.4.0 form) or `list[ContentBlock]`. Other roles (`system` / `assistant` / `tool`) stay text-string-only. |
| `TextBlock`, `ImageBlock`, `ImageSource*` | New Pydantic types under `openarmature.llm`. Discriminated unions on `type` field. `ImageBlock.detail` defaults to `None` (omitted from wire); `ImageBlock.media_type` is required for inline sources and typed as `str \| None` (provider may accept more than the three guaranteed `image/*` types). |
| `ProviderUnsupportedContentBlock` | 9th canonical error category. Non-transient. Carries `block_type` and `reason`. Distinct from `ProviderInvalidRequest`: capability mismatch vs spec-shape malformation. |
| `OpenAIProvider._message_to_wire` | User case branches on content shape; content-block sequence maps to the content array. |
| `OpenAIProvider.classify_http_error` | HTTP 400 branch routes content-rejection bodies to `ProviderUnsupportedContentBlock`. |
| Conformance | 12 new fixtures (`009`–`020`) all green; `_DEFERRED_FIXTURES` row count drops accordingly. |

Release gate

PR-2 of a five-PR batch (0016 → 0015 → 0017 → 0014 → 0011). Do not tag a release until all five land — the CHANGELOG [Unreleased] Notes section carries the gate from PR-1.

Commits

feat(llm): add ProviderUnsupportedContentBlock error category
feat(llm): content-block types + UserMessage extension
feat(llm/openai): content-array wire mapping + content-rejection mapping
test(conformance): drive 0015 fixtures 009-020
test+docs: content-block unit tests + docs + CHANGELOG entry

Test plan

uv run pytest — 539 pass, 73 skipped (down from 84 in PR-1 — only the 16 fixtures for proposals 0011 / 0014 / 0017 remain), 0 failed.
uv run pyright — clean.
uv run ruff check + uv run ruff format — clean.
uv run --group docs mkdocs build --strict — clean.
All 12 new content-block conformance fixtures (009-content-blocks-text-only-equivalence through 020-content-blocks-inline-image-missing-media-type) pass.
Manual: structured-image-content-block call against live OpenAI gpt-4o-mini (or equivalent multimodal model). Worth doing once the PR is up; PR-1 verified the broader provider lifecycle, so this is the multimodal happy-path check.

Pre-1.0 SemVer

Additive change. Existing callers using UserMessage(content="…") see no behavior change — the new list[ContentBlock] shape is opt-in.

Adds the provider_unsupported_content_block canonical category from llm-provider §7 (introduced by proposal 0015). Raised when the bound model does not support a content block type used in the request (e.g., a text-only model received an image block, or the model supports images but not the requested media_type or source variant).

The exception carries block_type and reason attributes so callers can route on the specific unsupported case; this mirrors the precedent StructuredOutputInvalid set in PR-1 (carry the structured payload the caller needs for diagnostics and recovery).

Non-transient by default: NOT added to TRANSIENT_CATEGORIES. The bound model's capability set doesn't change between calls, so retrying without changing the request, the bound model, or the provider won't succeed. Users who want fallback semantics MAY route on the category in a userland middleware (e.g., switch to a multimodal-capable provider).

Distinct from ProviderInvalidRequest: ProviderInvalidRequest covers spec-shape violations (the request is malformed); this category covers capability mismatches (the request is well-formed but the bound model can't fulfill it).
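
As an illustration of the userland fallback mentioned in this commit message, here is a minimal sketch. Everything in it is an assumption made for the example: `ProviderError` is a hypothetical base class, the entries in `TRANSIENT_CATEGORIES` are placeholder names rather than the library's real categories, and `complete_with_fallback` is not part of openarmature.

```python
from __future__ import annotations


class ProviderError(Exception):
    """Hypothetical base class standing in for the library's error root."""

    category = "provider_error"


class ProviderUnsupportedContentBlock(ProviderError):
    """Stand-in for the new category: carries block_type and reason."""

    category = "provider_unsupported_content_block"

    def __init__(self, message: str, *, block_type: str, reason: str) -> None:
        super().__init__(message)
        self.block_type = block_type
        self.reason = reason


# Placeholder names (assumption); the point is that the new category is absent.
TRANSIENT_CATEGORIES = frozenset({"example_transient_a", "example_transient_b"})


def complete_with_fallback(primary, fallback, request):
    """Call primary; on an image capability mismatch, reroute to fallback."""
    try:
        return primary(request)
    except ProviderUnsupportedContentBlock as exc:
        if exc.block_type == "image":
            # Retrying the same provider cannot succeed, so switch providers.
            return fallback(request)
        raise
```

Because the category is not transient, a generic retry layer would re-raise it immediately; only a wrapper like this one, which changes the provider, has a chance of succeeding.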

Adds the content-block surface from llm-provider §3.1 (proposal 0015):

- TextBlock(type, text) with a non-empty-text validator
- ImageSourceURL(type, url) and ImageSourceInline(type, base64_data), joined by an ImageSource discriminated union over the source's ``type`` field
- ImageBlock(type, source, media_type, detail) with a validator that rejects inline sources missing a media_type. detail defaults to None so the wire omits the field unless explicitly set (providers apply their own conceptual default of "auto"); the docstring spells out the subtle case of an explicit detail="auto"
- ContentBlock discriminated union over TextBlock | ImageBlock

UserMessage.content becomes ``str | list[ContentBlock]``. The existing _check_content validator extends to enforce the non-empty rule on both shapes. Other roles (system, assistant, tool) stay text-string only; content blocks are user-only in v1 per the spec.

media_type is typed as ``str | None`` (not a Literal of the three guaranteed types) so callers can pass additional image/* types providers document support for.
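
The validation rules listed above can be sketched with standard-library dataclasses. This is a stand-in, not the library's code: the real types are Pydantic models, and the discriminator values used here ("text", "image", "url", "inline") are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class TextBlock:
    text: str
    type: Literal["text"] = "text"  # assumed tag value

    def __post_init__(self) -> None:
        if not self.text:
            raise ValueError("TextBlock.text must be non-empty")


@dataclass(frozen=True)
class ImageSourceURL:
    url: str
    type: Literal["url"] = "url"  # assumed tag value


@dataclass(frozen=True)
class ImageSourceInline:
    base64_data: str
    type: Literal["inline"] = "inline"  # assumed tag value


ImageSource = Union[ImageSourceURL, ImageSourceInline]


@dataclass(frozen=True)
class ImageBlock:
    source: ImageSource
    media_type: str | None = None  # plain str, so extra image/* types pass
    detail: str | None = None
    type: Literal["image"] = "image"  # assumed tag value

    def __post_init__(self) -> None:
        # Inline data has no URL to carry a content type, so it must be stated.
        if isinstance(self.source, ImageSourceInline) and self.media_type is None:
            raise ValueError("inline image sources require media_type")


ContentBlock = Union[TextBlock, ImageBlock]


@dataclass(frozen=True)
class UserMessage:
    content: str | list[ContentBlock]

    def __post_init__(self) -> None:
        # The non-empty rule applies to both the string and the list shape.
        if not self.content:
            raise ValueError("UserMessage.content must be non-empty")
```

Under these stand-ins, `UserMessage(content="hello")` and `UserMessage(content=[TextBlock(text="hello")])` both construct, while an empty string, an empty list, or an inline image without `media_type` raises `ValueError`.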

Two extensions in OpenAIProvider for proposal 0015:

- _message_to_wire's user case now branches on content shape: string maps directly (the v0.4.0 form); a content-block sequence maps to OpenAI's content-array form per §8.1.1 via the new _block_to_wire helper. TextBlock → {type: "text", text}. ImageBlock(URL) → {type: "image_url", image_url: {url, detail?}}. ImageBlock(inline) constructs an RFC 2397 data: URI from media_type + base64_data and routes through the same image_url entry shape. The detail hint goes on the wire only when the spec block has it set (None on the spec block omits it from the wire; providers apply their own default of "auto" per §3.1.2).
- classify_http_error's 400 branch now routes content-rejection bodies to ProviderUnsupportedContentBlock rather than the generic ProviderInvalidRequest. Detection is a heuristic on error.code (known set: image_content_not_supported, unsupported_image_media_type, audio_content_not_supported, video_content_not_supported, unsupported_content_block; plus an image+not_supported substring fallback), error.type (image_parse_error, image_content_not_supported), and error.message ("does not support" + image/audio/video). The spec is implementation-defined on the detection rule (§8.3); the heuristic lives inline so it's evolvable as OpenAI's error-code surface shifts.

_extract_rejected_block_type pulls a best-effort "image" / "audio" / "video" identifier out of the error code or message for surfacing on ProviderUnsupportedContentBlock.block_type.
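
The detection heuristic described above could look roughly like this. It is a sketch built only from the codes, types, and message fragments named in this commit message; the function names, the case-insensitive matching, and the exact order of checks are assumptions, not the provider's implementation.

```python
from __future__ import annotations

from typing import Any

# Values listed in the commit message above.
_REJECTION_CODES = frozenset({
    "image_content_not_supported",
    "unsupported_image_media_type",
    "audio_content_not_supported",
    "video_content_not_supported",
    "unsupported_content_block",
})
_REJECTION_TYPES = frozenset({"image_parse_error", "image_content_not_supported"})
_MODALITIES = ("image", "audio", "video")


def looks_like_content_rejection(error: dict[str, Any]) -> bool:
    """Guess whether a 400 error body is a content-block rejection."""
    code = str(error.get("code") or "").lower()
    err_type = str(error.get("type") or "").lower()
    message = str(error.get("message") or "").lower()
    if code in _REJECTION_CODES or err_type in _REJECTION_TYPES:
        return True
    # Substring fallback for codes outside the known set.
    if "image" in code and "not_supported" in code:
        return True
    return "does not support" in message and any(m in message for m in _MODALITIES)


def extract_rejected_block_type(error: dict[str, Any]) -> str | None:
    """Best-effort modality name from the error code or message."""
    haystack = f"{error.get('code') or ''} {error.get('message') or ''}".lower()
    for modality in _MODALITIES:
        if modality in haystack:
            return modality
    return None
```

An unrelated 400, such as an authentication or parameter error, matches none of these checks and would stay in the generic invalid-request category.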

Removes the 12 deferred-skip rows for content-block fixtures from both _DEFERRED_FIXTURES dicts (test_llm_provider.py runtime + the test_fixture_parsing.py typed parser).

_build_message in test_llm_provider.py extends the user case to pass raw["content"] through (str or list) unchanged; Pydantic's discriminated union on the content-block ``type`` field parses each dict in the list to the right TextBlock / ImageBlock variant automatically.

LlmCallSpec.messages in harness/directives.py is already typed as list[dict[str, Any]] (permissive), so the typed parser accepts the content-block list-of-dicts shape without model extensions. The parsing tests slip past for the 009-020 fixtures via the same path PR-1's 021-028 used.

All 28 llm-provider conformance fixtures now pass (the prior 16 plus the 12 new content-block ones). Full suite: 515 pass, 72 skipped (down from 84; only the 16 deferred fixtures for proposals 0011 / 0014 / 0017 remain).
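
To make the pass-through concrete, the sketch below hand-rolls the dispatch on the ``type`` field that Pydantic performs automatically in the real code. The tag values "text" and "image", the tuple return shape, and the function names are all assumptions chosen for a dependency-free example.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical tag values; the real discriminators live on the Pydantic models.
_PARSERS: dict[str, Callable[[dict[str, Any]], tuple[str, dict[str, Any]]]] = {
    "text": lambda raw: ("TextBlock", {"text": raw["text"]}),
    "image": lambda raw: ("ImageBlock", {k: v for k, v in raw.items() if k != "type"}),
}


def parse_block(raw: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Pick a parser from the dict's type tag, as a discriminated union would."""
    try:
        parser = _PARSERS[raw["type"]]
    except KeyError:
        raise ValueError(f"unknown content block type: {raw.get('type')!r}") from None
    return parser(raw)


def build_user_content(raw_content: str | list[dict[str, Any]]):
    """Strings pass through unchanged; lists are parsed item by item."""
    if isinstance(raw_content, str):
        return raw_content
    return [parse_block(item) for item in raw_content]
```

The fixture loader itself does none of this work; it hands the raw value to the message constructor and lets the model layer resolve each variant.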

Adds tests/unit/test_content_blocks.py (24 tests) covering bits the conformance fixtures don't exercise directly:

- TextBlock / ImageBlock construction validation (non-empty text, inline-needs-media_type, detail enum, URL source can skip media_type)
- UserMessage construction from dict-form content blocks (the path the conformance test fixture loader uses)
- _block_to_wire mapping for text, URL with/without detail, inline base64 (RFC 2397 data URI construction)
- classify_http_error 400 routing to ProviderUnsupportedContentBlock via the heuristic; negative cases (unrelated 400 stays ProviderInvalidRequest)
- _extract_rejected_block_type picks up "image" / "audio" from error.code or error.message

Docs:

- docs/concepts/llms.md: new "Content blocks (multimodal user messages)" section between Structured output and Routing, covering the two content shapes, URL vs inline sources, the detail hint, and the new ProviderUnsupportedContentBlock category.
- docs/model-providers/index.md: errors table extended to 9 categories with the new row + a Behaviour-guarantees note that OpenAIProvider does post-receive detection only; pre-send is a userland-middleware pattern.
- docs/model-providers/authoring.md: "Beyond the skeleton" gains a content-blocks entry pointing custom-provider authors at the multimodal wire mapping + the unsupported-content category.

CHANGELOG [Unreleased] gains 3 entries: the user-message content extension, the OpenAI wire mapping, and the new error category. All in the same release as PR-1's 0016 entries per the consolidated-release strategy.

Copilot

Pull request overview

Implements proposal 0015 (multimodal image content blocks) in openarmature.llm. Extends UserMessage.content to accept either a string or a list of typed content blocks, adds new block/source types and a new non-transient error category, and teaches OpenAIProvider how to map content-block sequences onto OpenAI's content-array wire shape and how to classify 400 content-rejection responses.

Changes:

New types TextBlock, ImageBlock, ImageSourceURL, ImageSourceInline and discriminated unions ContentBlock / ImageSource, with conditional media_type validation for inline sources.
New error category ProviderUnsupportedContentBlock (non-transient) with block_type/reason, plus OpenAIProvider HTTP 400 heuristic routing.
OpenAI _message_to_wire content-array branch with _block_to_wire, RFC 2397 data URI assembly for inline images, detail omitted from wire when None; conformance fixtures 009–020 un-deferred and unit tests added.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `src/openarmature/llm/messages.py` | Adds content-block types, discriminated unions, and updates `UserMessage.content` to dual-shape. |
| `src/openarmature/llm/errors.py` | Adds `ProviderUnsupportedContentBlock` category with `block_type`/`reason`. |
| `src/openarmature/llm/__init__.py` | Re-exports new types and error category. |
| `src/openarmature/llm/providers/openai.py` | Wire mapping for content blocks; 400 content-rejection heuristic. |
| `tests/unit/test_content_blocks.py` | Per-class construction, wire mapping, and 400 classification tests. |
| `tests/conformance/test_llm_provider.py` | Clears 0015 deferred fixtures; user content passes through unchanged for discriminated parsing. |
| `tests/conformance/test_fixture_parsing.py` | Comment update only; no longer carries 0015 fixture entries. |
| `docs/concepts/llms.md` | New "Content blocks" section. |
| `docs/model-providers/index.md` | Errors table updated to 9 categories. |
| `docs/model-providers/authoring.md` | Authoring guidance for the multimodal mapping. |
| `CHANGELOG.md` | Unreleased entry for proposal 0015. |

- audio/video symmetry in the substring fallback of _looks_like_content_rejection
- explicit isinstance(block, ImageBlock) guard in _block_to_wire to surface added union variants as a TypeError instead of an AttributeError on .source
- clarify ImageBlock.media_type docstring: permitted but redundant on URL sources (the URL payload carries content-type), provider implementations MAY consume it as a hint
- reword CHANGELOG qualifier '(proposal X, spec vY.Z)' → '(proposal X, introduced in spec vY.Z)' on the 0015 and 0016 entries so it doesn't read like a per-entry submodule pin change

chris-colinsky added 5 commits May 15, 2026 17:42

Copilot AI review requested due to automatic review settings May 16, 2026 00:57

Copilot started reviewing on behalf of chris-colinsky May 16, 2026 00:58

Copilot AI reviewed May 16, 2026

Comment thread src/openarmature/llm/providers/openai.py Outdated

Comment thread src/openarmature/llm/messages.py

Comment thread src/openarmature/llm/providers/openai.py

Comment thread CHANGELOG.md Outdated

chris-colinsky merged commit 5f6f1e1 into main May 16, 2026
6 checks passed

chris-colinsky deleted the feature/0015-multimodal-images branch May 16, 2026 01:19

This was referenced May 16, 2026

feat(prompts): prompt-management core (proposal 0017) #45

Merged

feat(checkpoint): state migration for checkpoints (proposal 0014) #46

Open

chris-colinsky merged 6 commits into main from feature/0015-multimodal-images
