feat(ai): add imageInputs / videoInputs / audioInputs for image-conditioned generation#624
tombeckenham wants to merge 2 commits into
…tioned generation (closes #618)
Adds optional `imageInputs`, `videoInputs`, and `audioInputs` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, image-to-video, and starting-frame flows. Each input part may carry a `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.
Provider behavior:
- OpenAI image: gpt-image-1 / -mini route to `images.edit()` (up to 16 + mask); dall-e-2 routes to `images.edit()` with one source; dall-e-3 throws.
- OpenAI video: Sora-2 / -pro accept a single `input_reference`; throws on >1.
- Gemini: native models receive inputs as multimodal `contents` parts; Imagen throws (text-only).
- fal: 1 input → `image_url`, >1 → `image_urls`; metadata roles map to `mask_url` / `control_image_url` / `reference_image_urls`; video adds `start_image_url` / `end_image_url`. Interim mapping until the fal schemas library lands.
- Grok, OpenRouter: throw with a link back to #618 (pending native Imagine API rewrite and multimodal injection work respectively).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
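The fal mapping in the commit message can be pictured with a small sketch. This is not the adapter's code: `SketchInput`, `FalImageFields`, and `toFalImageFields` are hypothetical names, the flat `url` / `role` input shape is a simplification, and the pairing of each role with a field (`mask` → `mask_url`, `control` → `control_image_url`, `reference` → `reference_image_urls`) is an assumption inferred from the field names. Inputs without one of those roles are assumed to feed `image_url` / `image_urls`.

```typescript
type MediaInputRole =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

// Hypothetical, simplified input: a URL (or data URI) plus an optional role hint.
interface SketchInput {
  url: string
  role?: MediaInputRole
}

// Only the fal field names come from the commit message; the type is illustrative.
interface FalImageFields {
  image_url?: string
  image_urls?: string[]
  mask_url?: string
  control_image_url?: string
  reference_image_urls?: string[]
}

function toFalImageFields(inputs: SketchInput[]): FalImageFields {
  const fields: FalImageFields = {}
  const plain: string[] = []

  for (const input of inputs) {
    switch (input.role) {
      case 'mask':
        fields.mask_url = input.url
        break
      case 'control':
        fields.control_image_url = input.url
        break
      case 'reference':
        fields.reference_image_urls = [
          ...(fields.reference_image_urls ?? []),
          input.url,
        ]
        break
      default:
        // Assumption: inputs with no role (or a video-only role) count as plain sources.
        plain.push(input.url)
    }
  }

  // 1 plain input -> image_url, more than 1 -> image_urls.
  if (plain.length === 1) {
    fields.image_url = plain[0]
  } else if (plain.length > 1) {
    fields.image_urls = plain
  }
  return fields
}
```

Keeping the mapping in one pure function like this would make it straightforward to swap out once the fal schemas library mentioned above lands.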
🚀 Changeset Version Preview: 7 package(s) bumped directly, 23 bumped as dependents.
@tanstack/ai
@tanstack/ai-anthropic
@tanstack/ai-client
@tanstack/ai-code-mode
@tanstack/ai-code-mode-skills
@tanstack/ai-devtools-core
@tanstack/ai-elevenlabs
@tanstack/ai-event-client
@tanstack/ai-fal
@tanstack/ai-gemini
@tanstack/ai-grok
@tanstack/ai-groq
@tanstack/ai-isolate-cloudflare
@tanstack/ai-isolate-node
@tanstack/ai-isolate-quickjs
@tanstack/ai-ollama
@tanstack/ai-openai
@tanstack/ai-openrouter
@tanstack/ai-preact
@tanstack/ai-react
@tanstack/ai-react-ui
@tanstack/ai-solid
@tanstack/ai-solid-ui
@tanstack/ai-svelte
@tanstack/ai-utils
@tanstack/ai-vue
@tanstack/ai-vue-ui
@tanstack/openai-base
@tanstack/preact-ai-devtools
@tanstack/react-ai-devtools
@tanstack/solid-ai-devtools
Closes #618.
Summary
Adds optional `imageInputs?: ImagePart[]`, `videoInputs?: VideoPart[]`, and `audioInputs?: AudioPart[]` to `generateImage()` and `generateVideo()` for image-to-image, multi-reference, mask / inpaint, and image-to-video flows. Each input part may carry an optional `metadata.role` hint (`'reference' | 'mask' | 'control' | 'start_frame' | 'end_frame' | 'character'`) that adapters use to route to the provider-specific field.
The fields reuse the existing `ImagePart` / `VideoPart` / `AudioPart` multimodal primitives, so there are no new public types beyond the role convention.
Design discussion + provider research captured on the issue: #618 (comment)
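For illustration, a mask / inpaint call might assemble its inputs roughly as below. This is a sketch rather than the PR's code: `SketchImagePart` and `splitMask` are hypothetical, and the part shape (`source.type` / `source.value`) is an assumption standing in for the real `ImagePart` type. Only the role names are taken from the summary above.

```typescript
// Role hints as listed in the summary.
type MediaInputRole =
  | 'reference'
  | 'mask'
  | 'control'
  | 'start_frame'
  | 'end_frame'
  | 'character'

// Hypothetical stand-in for ImagePart; the real type lives in @tanstack/ai and may differ.
interface SketchImagePart {
  type: 'image'
  source: { type: 'url' | 'data'; value: string }
  metadata?: { role?: MediaInputRole }
}

// Separate the (at most one) mask from the images it applies to.
function splitMask(parts: SketchImagePart[]): {
  sources: SketchImagePart[]
  mask: SketchImagePart | undefined
} {
  const masks = parts.filter((part) => part.metadata?.role === 'mask')
  if (masks.length > 1) {
    throw new Error(`Expected at most one mask input, got ${masks.length}`)
  }
  const sources = parts.filter((part) => part.metadata?.role !== 'mask')
  return { sources, mask: masks[0] }
}

// A source image plus a mask marking the region to repaint.
const imageInputs: SketchImagePart[] = [
  { type: 'image', source: { type: 'url', value: 'https://example.com/room.png' } },
  {
    type: 'image',
    source: { type: 'url', value: 'https://example.com/room-mask.png' },
    metadata: { role: 'mask' },
  },
]

const { sources, mask } = splitMask(imageInputs)
console.log(sources.length, mask?.source.value)
```

Because the role travels as metadata on the part, the call site would stay the same across providers and each adapter decides which provider field the mask ends up in.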
Provider behavior
| Provider | `generateImage` | `generateVideo` |
| --- | --- | --- |
| OpenAI | gpt-image-1 / -mini → `images.edit()` (up to 16 + mask). dall-e-2 → edit (1). dall-e-3 throws. | Sora-2 / -pro → `input_reference` (single only). |
| Gemini | Native models receive inputs as multimodal `contents`. Imagen throws (text-only). | |
| fal | 1 input → `image_url`; >1 → `image_urls`; roles → `mask_url` / `control_image_url` / `reference_image_urls`. Interim mapping until the dedicated fal schemas library lands. | Adds `start_image_url` / `end_image_url`. Video-to-video via `videoInputs`. |

Source conversion: URL inputs are fetched and converted to `File` for OpenAI's upload-only edit endpoint; base64 sources are decoded to a `Buffer` and wrapped. fal / Gemini receive URLs and base64 data URIs directly.
Files
17 modified, 4 new, ~1000 net new lines.
- `packages/typescript/ai/src/types.ts` (new `MediaInputRole`, `MediaInputMetadata`; `ImageGenerationOptions` and `VideoGenerationOptions` gain the three input fields)
- `generateImage` and `generateVideo` thread the inputs through to adapters
- `ai-openai/src/image/image-input-to-file.ts`, `ai-fal/src/image/image-inputs.ts`
Tests
729 tests passing across 7 touched packages.
- `images.edit()` routing, mask routing, dall-e-3 rejection, dall-e-2 single-image gate, `videoInputs` / `audioInputs` rejection.
E2E
`'image-to-image'` and `'image-to-video'` feature flags added to `testing/e2e/src/lib/types.ts` with empty support sets in `feature-support.ts`. aimock doesn't currently mock `/v1/images/edits` or Sora's `input_reference` upload field, so functional E2E specs are deferred until aimock gains support. Adapter behavior is covered by unit tests in the meantime.
Docs / Skill
- `docs/media/image-generation.md`: new "Image-Conditioned Generation" section with role table, provider matrix, mask + multi-reference examples.
- `docs/media/video-generation.md`: new "Image-to-Video" section.
- `packages/typescript/ai/skills/ai-core/media-generation/SKILL.md`: new image-conditioned section + new HIGH common-mistake entry on passing inputs to unsupported models.
- `pnpm test:docs` passes.
Follow-ups (separate issues / PRs)
- `imageInputs` should show up in the option type only for models that support it (per-model input capability map, mirroring how `size` and `modelOptions` resolve today). Sketched in PR conversation; landing as a follow-up to keep this PR scoped to runtime behavior.
- aimock support for `/v1/images/edits` and Sora `input_reference`: unblocks functional E2E specs for the new flags.
Test plan
- `pnpm test:lib`, `pnpm test:types`, `pnpm test:eslint`, `pnpm test:build`, `pnpm test:docs`
- `examples/ts-react-chat`-style flow with `openaiImage('gpt-image-1')` + a reference image, confirm round-trip
- `dall-e-3` + `imageInputs` throws with a helpful message at runtime
- `falImage('fal-ai/flux/dev')` + a single `imageInputs` part routes to `image_url`
🤖 Generated with Claude Code