feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation by tombeckenham · Pull Request #624 · TanStack/ai

tombeckenham · 2026-05-22T09:55:32Z

Closes #618.

Summary

Adds optional imageInputs?: ImagePart[], videoInputs?: VideoPart[], and audioInputs?: AudioPart[] to generateImage() and generateVideo() for image-to-image, multi-reference, mask / inpaint, and image-to-video flows. Each input part may carry an optional metadata.role hint ('reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character') that adapters use to route to the provider-specific field.

The fields reuse the existing ImagePart / VideoPart / AudioPart multimodal primitives; there are no new public types beyond the role convention.
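A rough sketch of the role convention may help when reading the examples. The type names `MediaInputRole` and `MediaInputMetadata` come from this PR; the exact field shapes below are inferred from the usage examples and are assumptions, and `groupByRole` is a hypothetical helper, not part of the library.

```typescript
// Shapes inferred from this PR description; the real definitions live in
// packages/typescript/ai/src/types.ts and may differ.
type MediaInputRole =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

interface MediaInputMetadata {
  role?: MediaInputRole
}

// Simplified stand-in for ImagePart (URL sources only, for illustration).
interface ImagePart {
  type: 'image'
  source: { type: 'url'; value: string }
  metadata?: MediaInputMetadata
}

// Hypothetical helper: bucket parts by role so an adapter can route each bucket.
function groupByRole(
  parts: ImagePart[],
): Map<MediaInputRole | 'unlabelled', ImagePart[]> {
  const groups = new Map<MediaInputRole | 'unlabelled', ImagePart[]>()
  for (const part of parts) {
    const key = part.metadata?.role ?? 'unlabelled'
    const bucket = groups.get(key) ?? []
    bucket.push(part)
    groups.set(key, bucket)
  }
  return groups
}

const frames: ImagePart[] = [
  { type: 'image', source: { type: 'url', value: 'https://example.com/first.png' } },
  {
    type: 'image',
    source: { type: 'url', value: 'https://example.com/last.png' },
    metadata: { role: 'end_frame' },
  },
]
const grouped = groupByRole(frames)
```

Parts without a role land in the `'unlabelled'` bucket here; how each adapter treats unlabelled parts is provider-specific.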

await generateImage({
  adapter: openaiImage('gpt-image-1'),
  prompt: 'Turn this into a cinematic product photo',
  imageInputs: [productRef, styleRef],
})

await generateVideo({
  adapter: falVideo('fal-ai/kling-video/v3/pro/image-to-video'),
  prompt: 'Slow push-in then a hard cut',
  imageInputs: [
    { type: 'image', source: { type: 'url', value: firstFrameUrl } },
    { type: 'image', source: { type: 'url', value: lastFrameUrl }, metadata: { role: 'end_frame' } },
  ],
})

Design discussion + provider research captured on the issue: #618 (comment)

Provider behavior

Provider	`generateImage`	`generateVideo`
OpenAI	gpt-image-1 / -mini → `images.edit()` (up to 16 + mask). dall-e-2 → edit (1). dall-e-3 throws.	Sora-2 / -pro → `input_reference` (single only).
Gemini	Native models → multimodal `contents`. Imagen throws (text-only).	Veo adapter not yet implemented.
fal.ai	1 → `image_url`; >1 → `image_urls`; roles → `mask_url` / `control_image_url` / `reference_image_urls`. Interim mapping until the dedicated fal schemas library lands.	Same plus `start_image_url` / `end_image_url`. Video-to-video via `videoInputs`.
Grok	Throws — needs xAI Imagine API rewrite (follow-up).	n/a
OpenRouter	Throws — needs multimodal injection (follow-up).	n/a
Anthropic	n/a (no image gen).	n/a
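The fal row of the table can be pictured with a small mapping function. This is a minimal sketch under stated assumptions, not the adapter's actual code: `mapImageInputsToFal`, `FalImageInput`, and `FalImageFields` are hypothetical names, inputs are assumed to be already resolved to URL or data-URI strings, and unlabelled parts are assumed to be the ones that feed `image_url` / `image_urls`.

```typescript
// Hypothetical, simplified input: a resolved URL (or data URI) plus an optional role.
interface FalImageInput {
  url: string
  role?: 'mask' | 'control' | 'reference'
}

// Field names taken from the provider table in this PR.
interface FalImageFields {
  image_url?: string
  image_urls?: string[]
  mask_url?: string
  control_image_url?: string
  reference_image_urls?: string[]
}

function mapImageInputsToFal(inputs: FalImageInput[]): FalImageFields {
  const fields: FalImageFields = {}
  const sources: string[] = []

  for (const input of inputs) {
    switch (input.role) {
      case 'mask':
        // The PR's tests mention duplicate-mask rejection.
        if (fields.mask_url !== undefined) {
          throw new Error('Only one mask input is allowed')
        }
        fields.mask_url = input.url
        break
      case 'control':
        fields.control_image_url = input.url
        break
      case 'reference':
        fields.reference_image_urls = [
          ...(fields.reference_image_urls ?? []),
          input.url,
        ]
        break
      default:
        // Assumption: unlabelled parts are the primary source images.
        sources.push(input.url)
    }
  }

  // 1 source -> image_url; more than 1 -> image_urls.
  if (sources.length === 1) {
    fields.image_url = sources[0]
  } else if (sources.length > 1) {
    fields.image_urls = sources
  }
  return fields
}
```

Keeping the mapping a pure function of the inputs makes it straightforward to unit-test each role in isolation, which matches the style of the fal mapping tests listed under Tests.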

Source conversion: URL inputs are fetched + converted to File for OpenAI's upload-only edit endpoint; base64 sources are decoded to a Buffer and wrapped. fal / Gemini receive URLs and base64 data URIs directly.
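The two conversion paths described above might look roughly like this. It is a sketch under assumptions: the base64 source shape (`type: 'data'` with a `mimeType`) and the helper names `imageSourceToFile` and `imageSourceToDataUri` are hypothetical, and a runtime with global `fetch`, `File`, and `Buffer` (for example Node 20+) is assumed.

```typescript
// Assumed source shapes; only the 'url' variant appears verbatim in this PR.
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string } // base64 payload (assumption)

// Upload-only endpoints need a File: fetch URLs, decode base64.
async function imageSourceToFile(
  source: ImageSource,
  name = 'input.png',
): Promise<File> {
  if (source.type === 'url') {
    const res = await fetch(source.value)
    if (!res.ok) {
      throw new Error(`Failed to fetch image input: ${res.status}`)
    }
    const blob = await res.blob()
    return new File([blob], name, {
      type: blob.type || 'application/octet-stream',
    })
  }
  // Decode base64 to a Buffer, then copy into a plain Uint8Array for File.
  const bytes = new Uint8Array(Buffer.from(source.value, 'base64'))
  return new File([bytes], name, { type: source.mimeType })
}

// Providers that accept URLs and data URIs directly need no download step.
function imageSourceToDataUri(source: ImageSource): string {
  return source.type === 'url'
    ? source.value
    : `data:${source.mimeType};base64,${source.value}`
}
```

The split keeps the network round trip confined to the provider that requires uploads, while fal and Gemini pass the reference through unchanged.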

Files

17 modified, 4 new, ~1000 net new lines.

Core types: packages/typescript/ai/src/types.ts (new MediaInputRole, MediaInputMetadata; ImageGenerationOptions and VideoGenerationOptions gain the three input fields)
Activities: generateImage and generateVideo thread the inputs through to adapters
Adapters: OpenAI image+video, Gemini image, fal image+video implement the routing; Grok + OpenRouter throw with a link to #618 (image-to-image and image-to-video support)
Helpers: ai-openai/src/image/image-input-to-file.ts, ai-fal/src/image/image-inputs.ts

Tests

729 tests passing across 7 touched packages.

New OpenAI image-adapter tests: images.edit() routing, mask routing, dall-e-3 rejection, dall-e-2 single-image gate, videoInputs/audioInputs rejection.
New fal mapping tests: 11 cases covering single/multi sources, mask, reference, control, start/end frame, base64 data-URI encoding, duplicate-mask rejection.
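The OpenAI gates exercised by these tests can be sketched as a single validation step. This is illustrative only: `assertOpenAIImageInputs`, `InputCounts`, and the limits table are hypothetical names, the limits are taken from the provider table above, and the mask is assumed to be counted separately from the source images.

```typescript
// Per-model source-image limits from the provider table (mask counted separately).
const OPENAI_EDIT_LIMITS: Record<string, number> = {
  'gpt-image-1': 16,
  'gpt-image-1-mini': 16,
  'dall-e-2': 1,
}

// Hypothetical summary of what the caller passed in.
interface InputCounts {
  images: number
  videos?: number
  audios?: number
}

function assertOpenAIImageInputs(model: string, counts: InputCounts): void {
  // The image adapter rejects videoInputs / audioInputs outright.
  if ((counts.videos ?? 0) > 0 || (counts.audios ?? 0) > 0) {
    throw new Error('OpenAI image generation does not accept videoInputs or audioInputs')
  }
  if (counts.images === 0) return // plain text-to-image, nothing to gate

  const limit = OPENAI_EDIT_LIMITS[model]
  if (limit === undefined) {
    // e.g. dall-e-3, which has no edit endpoint
    throw new Error(`${model} does not support imageInputs`)
  }
  if (counts.images > limit) {
    throw new Error(`${model} accepts at most ${limit} image input(s), got ${counts.images}`)
  }
}
```

Failing before any upload happens keeps the error message specific to the model and avoids spending a request on an input set the provider would reject anyway.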

E2E

'image-to-image' and 'image-to-video' feature flags added to testing/e2e/src/lib/types.ts with empty support sets in feature-support.ts. aimock doesn't currently mock /v1/images/edits or Sora's input_reference upload field, so functional E2E specs are deferred until aimock gains support. Adapter behavior is comprehensively covered by unit tests in the meantime.

Docs / Skill

docs/media/image-generation.md: new "Image-Conditioned Generation" section with role table, provider matrix, mask + multi-reference examples.
docs/media/video-generation.md: new "Image-to-Video" section.
packages/typescript/ai/skills/ai-core/media-generation/SKILL.md: new image-conditioned section + new HIGH common-mistake entry on passing inputs to unsupported models.
pnpm test:docs passes.

Follow-ups (separate issues / PRs)

Grok Imagine API adapter (image-to-image + image-to-video via xAI's native endpoint, not the OpenAI-compat shim).
OpenRouter multimodal injection for image-conditioned models routed through chat completions.
Gemini Veo video adapter.
Replace fal's interim per-endpoint heuristic with the dedicated fal schemas library once it lands.
Type-level gating: make imageInputs show up in the option type only for models that support it (per-model input capability map, mirroring how size and modelOptions resolve today). Sketched in PR conversation — landing as a follow-up to keep this PR scoped to runtime behavior.
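One way the type-level gating follow-up could be shaped is a per-model capability map resolved through a conditional type. Everything below is a hypothetical sketch: `ImageModelCapabilities`, `OptionsFor`, and `describeRequest` are invented for illustration and are not the design sketched in the PR conversation.

```typescript
// Hypothetical capability map: which models accept image inputs.
interface ImageModelCapabilities {
  'gpt-image-1': { imageInputs: true }
  'dall-e-3': { imageInputs: false }
}
type ModelId = keyof ImageModelCapabilities

// Simplified image part for illustration.
type ImageRef = { type: 'image'; source: { type: 'url'; value: string } }

// imageInputs is only assignable when the model's capability flag is true.
type OptionsFor<M extends ModelId> = { prompt: string } &
  (ImageModelCapabilities[M]['imageInputs'] extends true
    ? { imageInputs?: ImageRef[] }
    : { imageInputs?: never })

function describeRequest<M extends ModelId>(
  model: M,
  options: OptionsFor<M>,
): string {
  const count = options.imageInputs?.length ?? 0
  return `${model}: ${count} image input(s)`
}

const ref: ImageRef = {
  type: 'image',
  source: { type: 'url', value: 'https://example.com/ref.png' },
}

// Compiles: gpt-image-1 is marked as supporting image inputs.
const ok = describeRequest('gpt-image-1', { prompt: 'restyle', imageInputs: [ref] })

// Compiles: dall-e-3 without image inputs is still a valid request.
const plain = describeRequest('dall-e-3', { prompt: 'a cat' })
```

With this shape, passing `imageInputs` alongside `'dall-e-3'` would be a compile-time error rather than the runtime throw this PR implements, which is the trade-off the follow-up is meant to address.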
aimock support for /v1/images/edits and Sora input_reference — unblocks functional E2E specs for the new flags.

Test plan

CI: pnpm test:lib, pnpm test:types, pnpm test:eslint, pnpm test:build, pnpm test:docs
Manual: run an examples/ts-react-chat-style flow with openaiImage('gpt-image-1') + a reference image, confirm round-trip
Manual: confirm dall-e-3 + imageInputs throws with a helpful message at runtime
Manual: confirm falImage('fal-ai/flux/dev') + a single imageInputs part routes to image_url

🤖 Generated with Claude Code

…tioned generation (closes #618)

Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, image-to-video, and starting-frame flows. Each input part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.

Provider behavior:

- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask); dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to `mask_url` / `control_image_url` / `reference_image_urls`; video adds `start_image_url` / `end_image_url`. Interim mapping until the fal schemas library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API rewrite and multimodal injection work respectively).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-22T09:55:39Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f9163c2a-500d-45b2-818e-ccb9589a2742


github-actions · 2026-05-22T09:56:23Z

🚀 Changeset Version Preview

7 package(s) bumped directly, 23 bumped as dependents.

🟥 Major bumps

Package	Version	Reason
`@tanstack/ai-event-client`	0.3.8 → 1.0.0	Changeset
`@tanstack/ai-fal`	0.7.11 → 1.0.0	Changeset
`@tanstack/ai-gemini`	0.10.9 → 1.0.0	Changeset
`@tanstack/ai-grok`	0.8.6 → 1.0.0	Changeset
`@tanstack/ai-openai`	0.9.6 → 1.0.0	Changeset
`@tanstack/ai-openrouter`	0.9.6 → 1.0.0	Changeset
`@tanstack/ai-anthropic`	0.10.2 → 1.0.0	Dependent
`@tanstack/ai-code-mode`	0.1.18 → 1.0.0	Dependent
`@tanstack/ai-code-mode-skills`	0.1.18 → 1.0.0	Dependent
`@tanstack/ai-elevenlabs`	0.2.9 → 1.0.0	Dependent
`@tanstack/ai-groq`	0.2.5 → 1.0.0	Dependent
`@tanstack/ai-isolate-node`	0.1.18 → 1.0.0	Dependent
`@tanstack/ai-isolate-quickjs`	0.1.18 → 1.0.0	Dependent
`@tanstack/ai-ollama`	0.6.20 → 1.0.0	Dependent
`@tanstack/ai-preact`	0.6.30 → 1.0.0	Dependent
`@tanstack/ai-react`	0.11.5 → 1.0.0	Dependent
`@tanstack/ai-react-ui`	0.8.0 → 1.0.0	Dependent
`@tanstack/ai-solid`	0.10.5 → 1.0.0	Dependent
`@tanstack/ai-solid-ui`	0.7.0 → 1.0.0	Dependent
`@tanstack/ai-svelte`	0.10.5 → 1.0.0	Dependent
`@tanstack/ai-vue`	0.10.6 → 1.0.0	Dependent
`@tanstack/openai-base`	0.3.5 → 1.0.0	Dependent

🟨 Minor bumps

Package	Version	Reason
`@tanstack/ai`	0.21.1 → 0.22.0	Changeset

🟩 Patch bumps

Package	Version	Reason
`@tanstack/ai-client`	0.11.5 → 0.11.6	Dependent
`@tanstack/ai-devtools-core`	0.3.35 → 0.3.36	Dependent
`@tanstack/ai-isolate-cloudflare`	0.2.9 → 0.2.10	Dependent
`@tanstack/ai-vue-ui`	0.2.1 → 0.2.2	Dependent
`@tanstack/preact-ai-devtools`	0.1.39 → 0.1.40	Dependent
`@tanstack/react-ai-devtools`	0.2.39 → 0.2.40	Dependent
`@tanstack/solid-ai-devtools`	0.2.39 → 0.2.40	Dependent

nx-cloud · 2026-05-22T09:57:30Z

View your CI Pipeline Execution ↗ for commit 92857e6

Command	Status	Duration	Result
`nx run-many --targets=build --exclude=examples/...`	✅ Succeeded	1m 9s	View ↗

☁️ Nx Cloud last updated this comment at 2026-05-22 09:59:09 UTC

pkg-pr-new · 2026-05-22T09:59:24Z

@tanstack/ai

npm i https://pkg.pr.new/@tanstack/ai@624

@tanstack/ai-anthropic

npm i https://pkg.pr.new/@tanstack/ai-anthropic@624

@tanstack/ai-client

npm i https://pkg.pr.new/@tanstack/ai-client@624

@tanstack/ai-code-mode

npm i https://pkg.pr.new/@tanstack/ai-code-mode@624

@tanstack/ai-code-mode-skills

npm i https://pkg.pr.new/@tanstack/ai-code-mode-skills@624

@tanstack/ai-devtools-core

npm i https://pkg.pr.new/@tanstack/ai-devtools-core@624

@tanstack/ai-elevenlabs

npm i https://pkg.pr.new/@tanstack/ai-elevenlabs@624

@tanstack/ai-event-client

npm i https://pkg.pr.new/@tanstack/ai-event-client@624

@tanstack/ai-fal

npm i https://pkg.pr.new/@tanstack/ai-fal@624

@tanstack/ai-gemini

npm i https://pkg.pr.new/@tanstack/ai-gemini@624

@tanstack/ai-grok

npm i https://pkg.pr.new/@tanstack/ai-grok@624

@tanstack/ai-groq

npm i https://pkg.pr.new/@tanstack/ai-groq@624

@tanstack/ai-isolate-cloudflare

npm i https://pkg.pr.new/@tanstack/ai-isolate-cloudflare@624

@tanstack/ai-isolate-node

npm i https://pkg.pr.new/@tanstack/ai-isolate-node@624

@tanstack/ai-isolate-quickjs

npm i https://pkg.pr.new/@tanstack/ai-isolate-quickjs@624

@tanstack/ai-ollama

npm i https://pkg.pr.new/@tanstack/ai-ollama@624

@tanstack/ai-openai

npm i https://pkg.pr.new/@tanstack/ai-openai@624

@tanstack/ai-openrouter

npm i https://pkg.pr.new/@tanstack/ai-openrouter@624

@tanstack/ai-preact

npm i https://pkg.pr.new/@tanstack/ai-preact@624

@tanstack/ai-react

npm i https://pkg.pr.new/@tanstack/ai-react@624

@tanstack/ai-react-ui

npm i https://pkg.pr.new/@tanstack/ai-react-ui@624

@tanstack/ai-solid

npm i https://pkg.pr.new/@tanstack/ai-solid@624

@tanstack/ai-solid-ui

npm i https://pkg.pr.new/@tanstack/ai-solid-ui@624

@tanstack/ai-svelte

npm i https://pkg.pr.new/@tanstack/ai-svelte@624

@tanstack/ai-utils

npm i https://pkg.pr.new/@tanstack/ai-utils@624

@tanstack/ai-vue

npm i https://pkg.pr.new/@tanstack/ai-vue@624

@tanstack/ai-vue-ui

npm i https://pkg.pr.new/@tanstack/ai-vue-ui@624

@tanstack/openai-base

npm i https://pkg.pr.new/@tanstack/openai-base@624

@tanstack/preact-ai-devtools

npm i https://pkg.pr.new/@tanstack/preact-ai-devtools@624

@tanstack/react-ai-devtools

npm i https://pkg.pr.new/@tanstack/react-ai-devtools@624

@tanstack/solid-ai-devtools

npm i https://pkg.pr.new/@tanstack/solid-ai-devtools@624

commit: 92857e6

tombeckenham linked an issue May 22, 2026 that may be closed by this pull request

image-to-image and image-to-video support #618

Open

ci: apply automated fixes

92857e6

tombeckenham wants to merge 2 commits into main from 618-image-to-image-and-image-to-video-support
