
feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation#624

Draft
tombeckenham wants to merge 2 commits into main from 618-image-to-image-and-image-to-video-support

@tombeckenham
Contributor

Closes #618.

Summary

Adds optional imageInputs?: ImagePart[], videoInputs?: VideoPart[], and audioInputs?: AudioPart[] to generateImage() and generateVideo() for image-to-image, multi-reference, mask / inpaint, and image-to-video flows. Each input part may carry an optional metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field.

The three fields reuse the existing ImagePart / VideoPart / AudioPart multimodal primitives; there are no new public types beyond the role convention.

await generateImage({
  adapter: openaiImage('gpt-image-1'),
  prompt: 'Turn this into a cinematic product photo',
  imageInputs: [productRef, styleRef],
})

await generateVideo({
  adapter: falVideo('fal-ai/kling-video/v3/pro/image-to-video'),
  prompt: 'Slow push-in then a hard cut',
  imageInputs: [
    { type: 'image', source: { type: 'url', value: firstFrameUrl } },
    { type: 'image', source: { type: 'url', value: lastFrameUrl }, metadata: { role: 'end_frame' } },
  ],
})
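For reviewers who want the input shapes spelled out, here is a minimal sketch of a mask / inpaint input list. The type declarations are local approximations inferred from the examples above (only the URL source form appears in this description), and `findByRole` and the example.com URLs are hypothetical, not part of the PR.

```typescript
// Local approximations for illustration only. The real definitions live in
// packages/typescript/ai/src/types.ts and may differ in detail.
type MediaInputRole =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

interface MediaInputMetadata {
  role?: MediaInputRole
}

interface ImagePart {
  type: 'image'
  source: { type: 'url'; value: string }
  metadata?: MediaInputMetadata
}

// A source image plus a mask, distinguished only by metadata.role.
const inpaintInputs: ImagePart[] = [
  { type: 'image', source: { type: 'url', value: 'https://example.com/room.png' } },
  {
    type: 'image',
    source: { type: 'url', value: 'https://example.com/room-mask.png' },
    metadata: { role: 'mask' },
  },
]

// Hypothetical helper: select the parts that carry a given role.
function findByRole(parts: ImagePart[], role: MediaInputRole): ImagePart[] {
  return parts.filter((part) => part.metadata?.role === role)
}
```

An adapter could use a lookup like this to pull the mask out before deciding which provider field each remaining part belongs to.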

Design discussion + provider research captured on the issue: #618 (comment)

Provider behavior

| Provider | generateImage | generateVideo |
| --- | --- | --- |
| OpenAI | gpt-image-1 / -mini → images.edit() (up to 16 + mask). dall-e-2 → edit (1). dall-e-3 throws. | Sora-2 / -pro → input_reference (single only). |
| Gemini | Native models → multimodal contents. Imagen throws (text-only). | Veo adapter not yet implemented. |
| fal.ai | 1 → image_url; >1 → image_urls; roles → mask_url / control_image_url / reference_image_urls. Interim mapping until the dedicated fal schemas library lands. | Same plus start_image_url / end_image_url. Video-to-video via videoInputs. |
| Grok | Throws; needs xAI Imagine API rewrite (follow-up). | n/a |
| OpenRouter | Throws; needs multimodal injection (follow-up). | n/a |
| Anthropic | n/a (no image gen). | n/a |
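The fal.ai image mapping described above can be sketched roughly as follows. This is not the code in ai-fal/src/image/image-inputs.ts: the `'data'` source shape with `mimeType`, the function name `mapFalImageInputs`, and the choice to treat every role other than mask / control / reference as a plain source are all assumptions made for illustration.

```typescript
type Role = 'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'

// Assumed source shapes: the 'url' form appears in this PR, the 'data' form is a guess.
type Source =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

interface ImagePart {
  type: 'image'
  source: Source
  metadata?: { role?: Role }
}

interface FalImageFields {
  image_url?: string
  image_urls?: string[]
  mask_url?: string
  control_image_url?: string
  reference_image_urls?: string[]
}

// URLs pass through; base64 payloads become data URIs.
function toUrl(source: Source): string {
  return source.type === 'url'
    ? source.value
    : `data:${source.mimeType};base64,${source.value}`
}

function mapFalImageInputs(parts: ImagePart[]): FalImageFields {
  const fields: FalImageFields = {}
  const plain: string[] = []
  const references: string[] = []

  for (const part of parts) {
    const url = toUrl(part.source)
    switch (part.metadata?.role) {
      case 'mask':
        if (fields.mask_url !== undefined) throw new Error('Only one mask input is allowed')
        fields.mask_url = url
        break
      case 'control':
        fields.control_image_url = url
        break
      case 'reference':
        references.push(url)
        break
      default:
        // Assumption: unlabelled parts and any other role count as plain sources.
        plain.push(url)
    }
  }

  // One plain source goes to image_url, several go to image_urls.
  const [first, ...rest] = plain
  if (first !== undefined) {
    if (rest.length === 0) fields.image_url = first
    else fields.image_urls = plain
  }
  if (references.length > 0) fields.reference_image_urls = references
  return fields
}
```

The video variant would add `start_image_url` / `end_image_url` for the frame roles; that branch is left out here to keep the sketch short.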

Source conversion: URL inputs are fetched + converted to File for OpenAI's upload-only edit endpoint; base64 sources are decoded to a Buffer and wrapped. fal / Gemini receive URLs and base64 data URIs directly.
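A rough sketch of that conversion, assuming a runtime with global `fetch`, `File`, and `atob` (Node 20+ or a browser). The `'data'` source shape and both function names are hypothetical, and this version decodes into a `Uint8Array` rather than a Node `Buffer` for portability, so it is not the helper in ai-openai/src/image/image-input-to-file.ts.

```typescript
// Assumed source shapes: the 'url' form appears in this PR, the 'data' form is a guess.
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

// Decode a base64 payload and wrap the bytes in a File.
function base64ToFile(base64: string, mimeType: string, name: string): File {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return new File([bytes], name, { type: mimeType })
}

// Produce an uploadable File from either source form.
async function imageSourceToFile(source: ImageSource, name = 'input'): Promise<File> {
  if (source.type === 'data') {
    return base64ToFile(source.value, source.mimeType, name)
  }
  // URL sources are downloaded first because the edit endpoint only accepts uploads.
  const response = await fetch(source.value)
  if (!response.ok) {
    throw new Error(`Failed to fetch image input (HTTP ${response.status})`)
  }
  const blob = await response.blob()
  return new File([blob], name, { type: blob.type || 'application/octet-stream' })
}
```

Fetching inside the adapter means the caller's server, not OpenAI, makes the outbound request, so URL inputs need to be reachable from wherever the adapter runs.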

Files

17 modified, 4 new, ~1000 net new lines.

  • Core types: packages/typescript/ai/src/types.ts (new MediaInputRole, MediaInputMetadata; ImageGenerationOptions and VideoGenerationOptions gain the three input fields)
  • Activities: generateImage and generateVideo thread the inputs through to adapters
  • Adapters: OpenAI image+video, Gemini image, fal image+video implement the routing; Grok + OpenRouter throw with a link to issue #618 (image-to-image and image-to-video support)
  • Helpers: ai-openai/src/image/image-input-to-file.ts, ai-fal/src/image/image-inputs.ts

Tests

729 tests passing across 7 touched packages.

  • New OpenAI image-adapter tests: images.edit() routing, mask routing, dall-e-3 rejection, dall-e-2 single-image gate, videoInputs/audioInputs rejection.
  • New fal mapping tests: 11 cases covering single/multi sources, mask, reference, control, start/end frame, base64 data-URI encoding, duplicate-mask rejection.
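The OpenAI gates those tests exercise can be sketched as below. The limits come from the provider table in this description; the function name, the `InputCounts` shape, the error messages, and the decision to reject unknown model IDs are assumptions rather than the adapter's actual behavior.

```typescript
// Maximum number of non-mask image inputs per model, per the provider table.
// A limit of 0 means the model rejects image inputs entirely.
const MAX_IMAGE_INPUTS: Record<string, number> = {
  'gpt-image-1': 16,
  'gpt-image-1-mini': 16,
  'dall-e-2': 1,
  'dall-e-3': 0,
}

// Hypothetical summary of what the caller passed.
interface InputCounts {
  images: number // non-mask image parts
  masks: number
  videos: number
  audios: number
}

function assertOpenAIImageInputs(model: string, counts: InputCounts): void {
  if (counts.videos > 0 || counts.audios > 0) {
    throw new Error('OpenAI image models do not accept videoInputs or audioInputs')
  }
  // Plain text-to-image requests need no further checks.
  if (counts.images === 0 && counts.masks === 0) return

  const max = MAX_IMAGE_INPUTS[model]
  if (max === undefined) {
    // Assumption: models outside the table are rejected rather than passed through.
    throw new Error(`No image-input limit known for model "${model}"`)
  }
  if (max === 0) {
    throw new Error(`${model} does not support imageInputs`)
  }
  if (counts.images > max) {
    throw new Error(`${model} accepts at most ${max} image input(s), got ${counts.images}`)
  }
  if (counts.masks > 1) {
    throw new Error('Only one mask input is allowed')
  }
}
```

Failing before any network call keeps the error message specific to the model, which is what the dall-e-3 manual check in the test plan looks for.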

E2E

'image-to-image' and 'image-to-video' feature flags added to testing/e2e/src/lib/types.ts with empty support sets in feature-support.ts. aimock doesn't currently mock /v1/images/edits or Sora's input_reference upload field, so functional E2E specs are deferred until aimock gains support. Adapter behavior is comprehensively covered by unit tests in the meantime.

Docs / Skill

  • docs/media/image-generation.md: new "Image-Conditioned Generation" section with role table, provider matrix, mask + multi-reference examples.
  • docs/media/video-generation.md: new "Image-to-Video" section.
  • packages/typescript/ai/skills/ai-core/media-generation/SKILL.md: new image-conditioned section + new HIGH common-mistake entry on passing inputs to unsupported models.
  • pnpm test:docs passes.

Follow-ups (separate issues / PRs)

  • Grok Imagine API adapter (image-to-image + image-to-video via xAI's native endpoint, not the OpenAI-compat shim).
  • OpenRouter multimodal injection for image-conditioned models routed through chat completions.
  • Gemini Veo video adapter.
  • Replace fal's interim per-endpoint heuristic with the dedicated fal schemas library once it lands.
  • Type-level gating: make imageInputs show up in the option type only for models that support it (per-model input capability map, mirroring how size and modelOptions resolve today). Sketched in PR conversation — landing as a follow-up to keep this PR scoped to runtime behavior.
  • aimock support for /v1/images/edits and Sora input_reference — unblocks functional E2E specs for the new flags.
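As a rough illustration of the type-level gating follow-up, a per-model capability map can drive a conditional option type. Everything here is hypothetical (the `ImageInputCapability` map, `ImageOptions`, and the two-model list); it is only meant to show the general shape, not the design sketched in the PR conversation.

```typescript
interface ImagePart {
  type: 'image'
  source: { type: 'url'; value: string }
}

// Hypothetical capability map: does the model accept image inputs?
interface ImageInputCapability {
  'gpt-image-1': true
  'dall-e-3': false
}

type ImageModel = keyof ImageInputCapability

// imageInputs is usable only when the capability map says so; for other
// models the property is typed as never, so passing it is a compile error.
type ImageOptions<M extends ImageModel> = { model: M; prompt: string } &
  (ImageInputCapability[M] extends true
    ? { imageInputs?: ImagePart[] }
    : { imageInputs?: never })

const productRef: ImagePart = {
  type: 'image',
  source: { type: 'url', value: 'https://example.com/product.png' },
}

// Compiles: gpt-image-1 is marked as supporting image inputs.
const accepted: ImageOptions<'gpt-image-1'> = {
  model: 'gpt-image-1',
  prompt: 'Turn this into a cinematic product photo',
  imageInputs: [productRef],
}

// Compiles: dall-e-3 without image inputs.
const textOnly: ImageOptions<'dall-e-3'> = {
  model: 'dall-e-3',
  prompt: 'A red fox in the snow',
}

// Adding `imageInputs: [productRef]` to textOnly would be rejected by the
// type checker, because the property is typed as never for dall-e-3.
```

Runtime checks would still be needed for callers who bypass the types, which fits keeping this PR scoped to runtime behavior.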

Test plan

  • CI: pnpm test:lib, pnpm test:types, pnpm test:eslint, pnpm test:build, pnpm test:docs
  • Manual: run an examples/ts-react-chat-style flow with openaiImage('gpt-image-1') + a reference image, confirm round-trip
  • Manual: confirm dall-e-3 + imageInputs throws with a helpful message at runtime
  • Manual: confirm falImage('fal-ai/flux/dev') + a single imageInputs part routes to image_url

🤖 Generated with Claude Code

…tioned generation (closes #618)

Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()`
and `generateVideo()` for image-to-image, multi-reference, mask / inpaint,
image-to-video, and starting-frame flows. Each input part may carry a
`metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' |
'end_frame' | 'character'`) that adapters use to route to the provider-specific
field.

Provider behavior:
- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask);
  dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen
  throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to
  `mask_url` / `control_image_url` / `reference_image_urls`; video adds
  `start_image_url` / `end_image_url`. Interim mapping until the fal schemas
  library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API
  rewrite and multimodal injection work respectively).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tombeckenham tombeckenham linked an issue May 22, 2026 that may be closed by this pull request
@coderabbitai
Contributor

coderabbitai Bot commented May 22, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9163c2a-500d-45b2-818e-ccb9589a2742

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@github-actions
Contributor

github-actions Bot commented May 22, 2026

🚀 Changeset Version Preview

7 package(s) bumped directly, 23 bumped as dependents.

🟥 Major bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai-event-client | 0.3.8 → 1.0.0 | Changeset |
| @tanstack/ai-fal | 0.7.11 → 1.0.0 | Changeset |
| @tanstack/ai-gemini | 0.10.9 → 1.0.0 | Changeset |
| @tanstack/ai-grok | 0.8.6 → 1.0.0 | Changeset |
| @tanstack/ai-openai | 0.9.6 → 1.0.0 | Changeset |
| @tanstack/ai-openrouter | 0.9.6 → 1.0.0 | Changeset |
| @tanstack/ai-anthropic | 0.10.2 → 1.0.0 | Dependent |
| @tanstack/ai-code-mode | 0.1.18 → 1.0.0 | Dependent |
| @tanstack/ai-code-mode-skills | 0.1.18 → 1.0.0 | Dependent |
| @tanstack/ai-elevenlabs | 0.2.9 → 1.0.0 | Dependent |
| @tanstack/ai-groq | 0.2.5 → 1.0.0 | Dependent |
| @tanstack/ai-isolate-node | 0.1.18 → 1.0.0 | Dependent |
| @tanstack/ai-isolate-quickjs | 0.1.18 → 1.0.0 | Dependent |
| @tanstack/ai-ollama | 0.6.20 → 1.0.0 | Dependent |
| @tanstack/ai-preact | 0.6.30 → 1.0.0 | Dependent |
| @tanstack/ai-react | 0.11.5 → 1.0.0 | Dependent |
| @tanstack/ai-react-ui | 0.8.0 → 1.0.0 | Dependent |
| @tanstack/ai-solid | 0.10.5 → 1.0.0 | Dependent |
| @tanstack/ai-solid-ui | 0.7.0 → 1.0.0 | Dependent |
| @tanstack/ai-svelte | 0.10.5 → 1.0.0 | Dependent |
| @tanstack/ai-vue | 0.10.6 → 1.0.0 | Dependent |
| @tanstack/openai-base | 0.3.5 → 1.0.0 | Dependent |

🟨 Minor bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai | 0.21.1 → 0.22.0 | Changeset |

🟩 Patch bumps

| Package | Version | Reason |
| --- | --- | --- |
| @tanstack/ai-client | 0.11.5 → 0.11.6 | Dependent |
| @tanstack/ai-devtools-core | 0.3.35 → 0.3.36 | Dependent |
| @tanstack/ai-isolate-cloudflare | 0.2.9 → 0.2.10 | Dependent |
| @tanstack/ai-vue-ui | 0.2.1 → 0.2.2 | Dependent |
| @tanstack/preact-ai-devtools | 0.1.39 → 0.1.40 | Dependent |
| @tanstack/react-ai-devtools | 0.2.39 → 0.2.40 | Dependent |
| @tanstack/solid-ai-devtools | 0.2.39 → 0.2.40 | Dependent |

@nx-cloud

nx-cloud Bot commented May 22, 2026

View your CI Pipeline Execution ↗ for commit 92857e6

| Command | Status | Duration | Result |
| --- | --- | --- | --- |
| nx run-many --targets=build --exclude=examples/... | ✅ Succeeded | 1m 9s | View ↗ |

☁️ Nx Cloud last updated this comment at 2026-05-22 09:59:09 UTC

@pkg-pr-new

pkg-pr-new Bot commented May 22, 2026

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@624

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@624

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@624

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@624

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@624

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@624

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@624

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@624

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@624

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@624

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@624

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@624

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@624

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@624

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@624

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@624

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@624

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@624

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@624

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@624

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@624

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@624

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@624

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@624

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@624

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@624

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@624

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@624

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@624

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@624

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@624

commit: 92857e6
