image-to-image and image-to-video support #618

@tombeckenham

Thank you, @yiss, for describing this so well.

Problem

generateImage() and generateVideo() are currently built around text-prompt inputs, but several providers and models also support image-conditioned generation workflows.

Examples include:

  • image-to-image generation
  • prompt + reference image generation
  • multi-reference image generation
  • image-to-video generation
  • video generation from a starting frame
  • model-specific image editing / transformation workflows

Today there is no obvious provider-agnostic way to pass image inputs into generateImage() and generateVideo().

TanStack AI already has a clean multimodal abstraction for content parts (ImagePart with source.type: 'data' | 'url'). It would be great if media generation APIs reused that same shape instead of introducing provider-specific one-offs for image-conditioned generation.

Why this matters

Modern image and video models are increasingly multimodal. Generation is no longer only text-to-image or text-to-video.

A unified way to pass image inputs would make it much easier for adapters to support workflows like:

  • image editing
  • reference-guided generation
  • image-to-video
  • multi-image composition

Proposal

Add an optional inputs field to both generateImage() and generateVideo() that accepts reusable multimodal content parts, ideally existing ImagePart values.

This would provide a consistent, provider-agnostic way to pass image-conditioned inputs into media generation APIs.
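As a rough illustration of what the proposed field could look like at the type level, here is a minimal sketch. The ImagePart type below is a local mirror of the shape used in this issue's examples, not the type exported by '@tanstack/ai', and MediaGenerationOptions and resolveInputs are hypothetical names introduced only for this sketch.

```typescript
// Local mirror of the ImagePart shape used in this issue's examples.
// Assumption: the real type exported by '@tanstack/ai' may differ or carry more fields.
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

interface ImagePart {
  type: 'image'
  source: ImageSource
}

// Hypothetical option shape: `inputs` is the field proposed here, not an existing API.
interface MediaGenerationOptions {
  prompt: string
  inputs?: ImagePart[]
}

// Hypothetical helper: treat a missing `inputs` as an empty list so that
// text-only calls and image-conditioned calls share one code path.
function resolveInputs(options: MediaGenerationOptions): ImagePart[] {
  return options.inputs ?? []
}

// A text-only call stays valid because `inputs` is optional.
const textOnly: MediaGenerationOptions = { prompt: 'A red bicycle' }

// An image-conditioned call passes one or more ImagePart values.
const conditioned: MediaGenerationOptions = {
  prompt: 'Turn this into a cinematic product photo',
  inputs: [
    { type: 'image', source: { type: 'url', value: 'https://example.com/reference.png' } },
  ],
}
```

Keeping the field optional means existing text-only callers would not need to change.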

Example API

generateImage()

import { generateImage, type ImagePart } from '@tanstack/ai'
// openaiImage is assumed to come from the OpenAI adapter package (import not shown here).

const reference: ImagePart = {
  type: 'image',
  source: {
    type: 'url',
    value: 'https://example.com/reference.png',
  },
}

await generateImage({
  adapter: openaiImage('gpt-image-1.5'),
  prompt: 'Turn this into a cinematic product photo',
  inputs: [reference],
})

generateVideo()

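import { generateVideo, type ImagePart } from '@tanstack/ai'
// googleVideo is assumed to come from the provider adapter package (import not shown here).

// base64Image is assumed to be a base64-encoded PNG string defined elsewhere.
const startingFrame: ImagePart = {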
  type: 'image',
  source: {
    type: 'data',
    value: base64Image,
    mimeType: 'image/png',
  },
}

await generateVideo({
  adapter: googleVideo('veo-3.1'),
  prompt: 'Animate this still into a slow cinematic push-in with subtle motion',
  inputs: [startingFrame],
})

Multiple reference images

import { generateImage, type ImagePart } from '@tanstack/ai'
// geminiImage is assumed to come from the Gemini adapter package (import not shown here).

const product: ImagePart = {
  type: 'image',
  source: {
    type: 'url',
    value: 'https://example.com/product.png',
  },
}

const style: ImagePart = {
  type: 'image',
  source: {
    type: 'url',
    value: 'https://example.com/style.png',
  },
}

await generateImage({
  adapter: geminiImage('nano-banana'),
  prompt: 'Generate a new image of the product using the style of the second reference',
  inputs: [product, style],
})

Expected behavior

  • generateImage() and generateVideo() should both accept image-conditioned inputs through the same field name.
  • The input format should ideally reuse existing TanStack AI multimodal primitives such as ImagePart.
  • Adapters should map those inputs into the provider-native request shape.
  • Unsupported combinations can be rejected by adapters at runtime or by adapter-specific validation.
  • Providers that only support text prompts should continue to work unchanged.
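The adapter-side behavior listed above could be sketched roughly as follows. Everything here is an assumption made for illustration: AdapterCapabilities, ProviderRequest, and toProviderRequest are hypothetical names, the provider-native request shape is invented, and ImagePart is again a local mirror of the shape used in this issue.

```typescript
// Local mirror of the ImagePart shape used in this issue's examples (assumption).
type ImageSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

interface ImagePart {
  type: 'image'
  source: ImageSource
}

// Hypothetical capability descriptor an adapter might declare.
interface AdapterCapabilities {
  maxInputImages: number // 0 means the provider is text-only
}

// Hypothetical provider-native image reference and request body.
type ProviderImage = { url: string } | { base64: string; mimeType: string }

interface ProviderRequest {
  prompt: string
  images: ProviderImage[]
}

// Map provider-agnostic inputs into the (hypothetical) provider-native shape,
// rejecting unsupported combinations at runtime.
function toProviderRequest(
  prompt: string,
  inputs: ImagePart[] | undefined,
  caps: AdapterCapabilities,
): ProviderRequest {
  const parts = inputs ?? []
  if (parts.length > caps.maxInputImages) {
    throw new Error(
      `Adapter accepts at most ${caps.maxInputImages} input image(s), received ${parts.length}`,
    )
  }
  const images = parts.map((part): ProviderImage => {
    const source = part.source
    return source.type === 'url'
      ? { url: source.value }
      : { base64: source.value, mimeType: source.mimeType }
  })
  return { prompt, images }
}
```

With this shape, a text-only provider would declare maxInputImages: 0 and keep working for calls that omit inputs, while a call that passes images to it would fail with a descriptive error.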

Open design questions

  • Should the field be named inputs, references, or something else?
  • Should it accept only ImagePart[], or broader content parts for future extensibility?
  • Should generateVideo() support multiple input images as well, or only one initially?
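To make the second question more concrete, here is one possible sketch of the "broader content parts" option. It is not a recommendation: VideoPart, GenerationInput, isImagePart, and partitionInputs are hypothetical names, and ImagePart is a local mirror of the shape used in this issue.

```typescript
// Local mirror of the source shape used in this issue's examples (assumption).
type MediaSource =
  | { type: 'url'; value: string }
  | { type: 'data'; value: string; mimeType: string }

interface ImagePart {
  type: 'image'
  source: MediaSource
}

// Hypothetical future part kind, included only to show how the union could grow.
interface VideoPart {
  type: 'video'
  source: MediaSource
}

// Hypothetical union accepted by `inputs` if the field is not limited to ImagePart[].
type GenerationInput = ImagePart | VideoPart

// Type guard that narrows a generic input down to an image part.
function isImagePart(part: GenerationInput): part is ImagePart {
  return part.type === 'image'
}

// Split inputs into the image parts an image-only adapter understands
// and everything else, which that adapter could reject.
function partitionInputs(inputs: GenerationInput[]): {
  images: ImagePart[]
  unsupported: GenerationInput[]
} {
  const images = inputs.filter(isImagePart)
  const unsupported = inputs.filter((part) => !isImagePart(part))
  return { images, unsupported }
}
```

Starting with ImagePart[] and widening to a union later would be a non-breaking change for callers, since every ImagePart[] is also a valid GenerationInput[].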

Summary

Request: add a unified, provider-agnostic way to pass image-conditioned inputs into both generateImage() and generateVideo(), ideally by reusing existing multimodal content-part types such as ImagePart.

Originally posted by @yiss in #481
