Skip to content

feat(llm): image content blocks (proposal 0015)#44

Merged
chris-colinsky merged 6 commits into
mainfrom
feature/0015-multimodal-images
May 16, 2026
Merged

feat(llm): image content blocks (proposal 0015)#44
chris-colinsky merged 6 commits into
mainfrom
feature/0015-multimodal-images

Conversation

@chris-colinsky
Copy link
Copy Markdown
Member

Summary

  • Implements spec proposal 0015 (image content blocks for user messages) in openarmature.llm. PR-2 of the five-PR batch following PR-1 (feat(llm): structured output (proposal 0016) #42, proposal 0016).
  • UserMessage.content extends to str | list[ContentBlock]. New content-block types: TextBlock, ImageBlock, ImageSourceURL, ImageSourceInline, plus the ContentBlock and ImageSource discriminated unions.
  • New non-transient error category ProviderUnsupportedContentBlock (the 9th canonical category). OpenAIProvider detects content-rejection 400s via a heuristic on error.code / error.type / error.message and routes there rather than the generic ProviderInvalidRequest.
  • OpenAI content-array wire mapping per §8.1.1: TextBlock → {type: "text", text}, ImageBlock (URL) → {type: "image_url", image_url: {url, detail?}}, ImageBlock (inline) → RFC 2397 data:<media_type>;base64,<bytes> URI routed through the same image_url shape. detail only goes on the wire when explicitly set; None (the class default) omits it and lets providers apply their own "auto".

What's new

Surface Change
UserMessage.content Accepts str (the v0.4.0 form) or list[ContentBlock]. Other roles (system / assistant / tool) stay text-string-only.
TextBlock, ImageBlock, ImageSource* New Pydantic types under openarmature.llm. Discriminated unions on type field. ImageBlock.detail defaults to None (omitted from wire); ImageBlock.media_type is required for inline sources and typed as str | None (provider may accept more than the three guaranteed image/* types).
ProviderUnsupportedContentBlock 9th canonical error category. Non-transient. Carries block_type and reason. Distinct from ProviderInvalidRequest: capability mismatch vs spec-shape malformation.
OpenAIProvider._message_to_wire User case branches on content shape; content-block sequence maps to the content array.
OpenAIProvider.classify_http_error HTTP 400 branch routes content-rejection bodies to ProviderUnsupportedContentBlock.
Conformance 12 new fixtures (009020) all green; _DEFERRED_FIXTURES row count drops accordingly.

Release gate

PR-2 of a five-PR batch (00160015001700140011). Do not tag a release until all five land — the CHANGELOG [Unreleased] Notes section carries the gate from PR-1.

Commits

  1. feat(llm): add ProviderUnsupportedContentBlock error category
  2. feat(llm): content-block types + UserMessage extension
  3. feat(llm/openai): content-array wire mapping + content-rejection mapping
  4. test(conformance): drive 0015 fixtures 009-020
  5. test+docs: content-block unit tests + docs + CHANGELOG entry

Test plan

  • uv run pytest — 539 pass, 73 skipped (down from 84 in PR-1 — only the 16 fixtures for proposals 0011 / 0014 / 0017 remain), 0 failed.
  • uv run pyright — clean.
  • uv run ruff check + uv run ruff format — clean.
  • uv run --group docs mkdocs build --strict — clean.
  • All 12 new content-block conformance fixtures (009-content-blocks-text-only-equivalence through 020-content-blocks-inline-image-missing-media-type) pass.
  • Manual: structured-image-content-block call against live OpenAI gpt-4o-mini (or equivalent multimodal model). Worth doing once the PR is up; PR-1 verified the broader provider lifecycle, so this is the multimodal happy-path check.

Pre-1.0 SemVer

Additive change. Existing callers using UserMessage(content="…") see no behavior change — the new list[ContentBlock] shape is opt-in.

Adds the provider_unsupported_content_block canonical category from
llm-provider §7 (introduced by proposal 0015). Raised when the bound
model does not support a content block type used in the request
(e.g., a text-only model received an image block, or the model
supports images but not the requested media_type or source variant).

The exception carries block_type and reason attributes so callers
can route on the specific unsupported case; mirrors the precedent
StructuredOutputInvalid set in PR-1 (carry the structured payload
the caller needs for diagnostics + recovery).

Non-transient by default — NOT added to TRANSIENT_CATEGORIES. The
bound model's capability set doesn't change between calls, so
retrying without changing the request, the bound model, or the
provider won't succeed. Users who want fallback semantics MAY route
on the category in a userland middleware (e.g., switch to a
multimodal-capable provider).

Distinct from ProviderInvalidRequest: ProviderInvalidRequest covers
spec-shape violations (the request is malformed); this category
covers capability mismatches (the request is well-formed but the
bound model can't fulfill it).
Adds the content-block surface from llm-provider §3.1 (proposal 0015):

- TextBlock(type, text) with a non-empty-text validator
- ImageSourceURL(type, url) and ImageSourceInline(type, base64_data),
  joined by an ImageSource discriminated union over the source's
  ``type`` field
- ImageBlock(type, source, media_type, detail) with a validator that
  rejects inline sources missing a media_type. detail defaults to
  None so the wire omits the field unless explicitly set (providers
  apply their own conceptual default of "auto"); the docstring spells
  out the subtle case of an explicit detail="auto"
- ContentBlock discriminated union over TextBlock | ImageBlock

UserMessage.content becomes ``str | list[ContentBlock]``. The existing
_check_content validator extends to enforce the non-empty rule on
both shapes. Other roles (system, assistant, tool) stay text-string
only — content blocks are user-only in v1 per the spec.

media_type is typed as ``str | None`` (not a Literal of the three
guaranteed types) so callers can pass additional image/* types
providers document support for.
Two extensions in OpenAIProvider for proposal 0015:

- _message_to_wire's user case now branches on content shape: string
  maps directly (the v0.4.0 form); a content-block sequence maps to
  OpenAI's content-array form per §8.1.1 via the new _block_to_wire
  helper. TextBlock → {type: "text", text}. ImageBlock(URL) →
  {type: "image_url", image_url: {url, detail?}}. ImageBlock(inline)
  constructs an RFC 2397 data: URI from media_type + base64_data and
  routes through the same image_url entry shape. The detail hint goes
  on the wire only when the spec block has it set (None on the spec
  block omits it from the wire; providers apply their own default of
  "auto" per §3.1.2).

- classify_http_error's 400 branch now routes content-rejection
  bodies to ProviderUnsupportedContentBlock rather than the generic
  ProviderInvalidRequest. Detection is a heuristic on error.code
  (known set: image_content_not_supported,
  unsupported_image_media_type, audio_content_not_supported,
  video_content_not_supported, unsupported_content_block; plus an
  image+not_supported substring fallback), error.type
  (image_parse_error, image_content_not_supported), and
  error.message ("does not support" + image/audio/video). The spec
  is implementation-defined on the detection rule (§8.3); the
  heuristic lives inline so it's evolvable as OpenAI's error-code
  surface shifts.

_extract_rejected_block_type pulls a best-effort "image" / "audio"
/ "video" identifier out of the error code or message for surfacing
on ProviderUnsupportedContentBlock.block_type.
Removes the 12 deferred-skip rows for content-block fixtures from
both _DEFERRED_FIXTURES dicts (test_llm_provider.py runtime + the
test_fixture_parsing.py typed parser).

_build_message in test_llm_provider.py extends the user case to
pass raw["content"] through (str or list) unchanged; Pydantic's
discriminated union on the content-block ``type`` field parses each
dict in the list to the right TextBlock / ImageBlock variant
automatically.

LlmCallSpec.messages in harness/directives.py is already typed as
list[dict[str, Any]] (permissive), so the typed parser accepts the
content-block list-of-dicts shape without model extensions. The
parsing tests slip past for the 009-020 fixtures via the same path
PR-1's 021-028 used.

All 28 llm-provider conformance fixtures now pass (the prior 16
plus the 12 new content-block ones). Full suite: 515 pass, 72
skipped (down from 84 — only the 16 deferred fixtures for
proposals 0011 / 0014 / 0017 remain).
Adds tests/unit/test_content_blocks.py (24 tests) covering bits
the conformance fixtures don't exercise directly:

- TextBlock / ImageBlock construction validation (non-empty text,
  inline-needs-media_type, detail enum, URL source can skip
  media_type)
- UserMessage construction from dict-form content blocks (the path
  the conformance test fixture loader uses)
- _block_to_wire mapping for text, URL with/without detail, inline
  base64 (RFC 2397 data URI construction)
- classify_http_error 400 routing to ProviderUnsupportedContentBlock
  via the heuristic; negative cases (unrelated 400 stays
  ProviderInvalidRequest)
- _extract_rejected_block_type picks up "image" / "audio" from
  error.code or error.message

Docs:

- docs/concepts/llms.md: new "Content blocks (multimodal user
  messages)" section between Structured output and Routing,
  covering the two content shapes, URL vs inline sources, the
  detail hint, and the new ProviderUnsupportedContentBlock category.
- docs/model-providers/index.md: errors table extended to 9
  categories with the new row + a Behaviour-guarantees note that
  OpenAIProvider does post-receive detection only; pre-send is a
  userland-middleware pattern.
- docs/model-providers/authoring.md: "Beyond the skeleton" gains
  a content-blocks entry pointing custom-provider authors at the
  multimodal wire mapping + the unsupported-content category.

CHANGELOG [Unreleased] gains 3 entries: the user-message content
extension, the OpenAI wire mapping, and the new error category. All
in the same release as PR-1's 0016 entries per the consolidated-
release strategy.
Copilot AI review requested due to automatic review settings May 16, 2026 00:57
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements proposal 0015 (multimodal image content blocks) in openarmature.llm. Extends UserMessage.content to accept either a string or a list of typed content blocks, adds new block/source types and a new non-transient error category, and teaches OpenAIProvider how to map content-block sequences onto OpenAI's content-array wire shape and how to classify 400 content-rejection responses.

Changes:

  • New types TextBlock, ImageBlock, ImageSourceURL, ImageSourceInline and discriminated unions ContentBlock / ImageSource, with conditional media_type validation for inline sources.
  • New error category ProviderUnsupportedContentBlock (non-transient) with block_type/reason, plus OpenAIProvider HTTP 400 heuristic routing.
  • OpenAI _message_to_wire content-array branch with _block_to_wire, RFC 2397 data URI assembly for inline images, detail omitted from wire when None; conformance fixtures 009–020 un-deferred and unit tests added.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/openarmature/llm/messages.py Adds content-block types, discriminated unions, and updates UserMessage.content to dual-shape.
src/openarmature/llm/errors.py Adds ProviderUnsupportedContentBlock category with block_type/reason.
src/openarmature/llm/__init__.py Re-exports new types and error category.
src/openarmature/llm/providers/openai.py Wire mapping for content blocks; 400 content-rejection heuristic.
tests/unit/test_content_blocks.py Per-class construction, wire mapping, and 400 classification tests.
tests/conformance/test_llm_provider.py Clears 0015 deferred fixtures; user content passes through unchanged for discriminated parsing.
tests/conformance/test_fixture_parsing.py Comment update only — no longer carries 0015 fixture entries.
docs/concepts/llms.md New "Content blocks" section.
docs/model-providers/index.md Errors table updated to 9 categories.
docs/model-providers/authoring.md Authoring guidance for the multimodal mapping.
CHANGELOG.md Unreleased entry for proposal 0015.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/openarmature/llm/providers/openai.py Outdated
Comment thread src/openarmature/llm/messages.py
Comment thread src/openarmature/llm/providers/openai.py
Comment thread CHANGELOG.md Outdated
- audio/video symmetry in the substring fallback of
  _looks_like_content_rejection
- explicit isinstance(block, ImageBlock) guard in _block_to_wire to
  surface added union variants as a TypeError instead of an
  AttributeError on .source
- clarify ImageBlock.media_type docstring: permitted but redundant on
  URL sources (the URL payload carries content-type), provider
  implementations MAY consume it as a hint
- reword CHANGELOG qualifier '(proposal X, spec vY.Z)' →
  '(proposal X, introduced in spec vY.Z)' on the 0015 and 0016 entries
  so it doesn't read like a per-entry submodule pin change
@chris-colinsky chris-colinsky merged commit 5f6f1e1 into main May 16, 2026
6 checks passed
@chris-colinsky chris-colinsky deleted the feature/0015-multimodal-images branch May 16, 2026 01:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants