
feat(imagegen): support reference image for image-to-image (GPT Image 2 / Nano Banana / Grok Imagine Pro) #12

@KillerQueen-Z


Summary

ImageGen currently only does text-to-image. The gateway catalog has at least three models that natively support image-to-image (using a reference / seed image to guide generation or perform edits):

  • openai/gpt-image-2 — "high-fidelity edits, character consistency"
  • google/nano-banana-pro — supports image input
  • xai/grok-imagine-image-pro — supports image input

But Franklin's ImageGen tool can't use that capability: its schema has no image_url (or equivalent) parameter. So even when the LLM has clearly seen a reference image (vision now works end-to-end after PR #11 and gateway commit 6ac64da), it has to fall back to redrawing from text alone.

Reproduction

  1. Launch franklin (or the VS Code extension)
  2. /model sonnet (or any vision-capable model)
  3. Show the model an image, e.g.: "Read /path/to/reference.png and use it as a style reference to generate a new image of <something>"
  4. Sonnet correctly describes the reference image (vision input ✅)
  5. Sonnet calls ImageGen({ prompt: '…' }) with no image attached, because the schema doesn't accept one
  6. Sonnet often appends a self-aware disclaimer in the assistant text:

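"Note: I didn't actually pass the original image to nano-banana as a reference. The current ImageGen tool only accepts prompt + size and has no image parameter, so this was once again just a text-only redraw."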

The output image therefore loses the reference's style / character / pose entirely.

Where the gap is

src/tools/imagegen.ts (~line 361):

input_schema: {
  type: 'object',
  properties: {
    prompt:      { type: 'string',  ... },
    output_path: { type: 'string',  ... },
    size:        { type: 'string',  ... },
    model:       { type: 'string',  ... },
    contentId:   { type: 'string',  ... },
  },
  required: ['prompt'],
}

There is no image_url field. By comparison, videogen.ts already has one (image-to-video works for Seedance).
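For illustration, a minimal TypeScript sketch of the schema with the proposed field added. The ImageGenInput shape and the image_url description are assumptions based on this issue's proposal, not the current code; the existing fields' descriptions are omitted rather than guessed.

```typescript
// Hypothetical ImageGen input with the proposed image_url field. The other
// fields mirror the existing schema in src/tools/imagegen.ts.
interface ImageGenInput {
  prompt: string;
  output_path?: string;
  size?: string;
  model?: string;
  contentId?: string;
  // Proposed: a remote http(s) URL, a data: URL, or a local file path.
  image_url?: string;
}

const imageGenInputSchema = {
  type: "object",
  properties: {
    // Existing fields (descriptions omitted in this sketch).
    prompt: { type: "string" },
    output_path: { type: "string" },
    size: { type: "string" },
    model: { type: "string" },
    contentId: { type: "string" },
    // Proposed field; wording follows the VideoGen-style description in this issue.
    image_url: {
      type: "string",
      description:
        "Optional reference image for image-to-image: an http(s) URL, a data URL, " +
        "or a local file path (read, base64-encoded, and capped at ~4 MB).",
    },
  },
  // image_url stays optional so plain text-to-image calls are unaffected.
  required: ["prompt"],
};
```

Keeping image_url out of required means existing prompt-only calls validate exactly as before.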

The request body posted to /v1/images/generations also has no place to put a reference image.
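To make the gap concrete, here is a hypothetical sketch of how the client could assemble the request once a reference image can be carried. It assumes the gateway either exposes /v1/images/edits or accepts an image field on /v1/images/generations (the two options proposed in the fix below). buildImageRequest, EditRouting, and the image body field name are illustrative only, not a confirmed gateway contract.

```typescript
// Which of the two routing options the gateway adopts is still open,
// so both are modeled behind a hypothetical flag.
type EditRouting = "edits-endpoint" | "generations-with-image";

interface ImageRequest {
  path: string;
  body: Record<string, unknown>;
}

function buildImageRequest(
  input: { prompt: string; size?: string; model?: string },
  resolvedImage: string | undefined,
  routing: EditRouting,
): ImageRequest {
  const body: Record<string, unknown> = { prompt: input.prompt };
  if (input.size) body.size = input.size;
  if (input.model) body.model = input.model;

  // No reference image: today's text-to-image request, unchanged.
  if (resolvedImage === undefined) {
    return { path: "/v1/images/generations", body };
  }

  // "image" as the body field name is an assumption.
  body.image = resolvedImage;
  return {
    path: routing === "edits-endpoint" ? "/v1/images/edits" : "/v1/images/generations",
    body,
  };
}
```

Under option (b) only the body changes; under option (a) the path changes too, which is why the gateway decision has to come first.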

Proposed fix (two layers, same shape as PR #11)

Client side (Franklin core)

  1. Add image_url to ImageGenInput and the JSON schema. Its description should mirror VideoGen's: it accepts a remote http(s) URL, a data URL, or a local file path (read automatically, base64-encoded, and capped at ~4 MB).
  2. Reuse the resolveSeedImage() helper that VideoGen recently gained; if useful, extract it into a small shared module, tools/_seed-image.ts, so both tools import it.
  3. When image_url is present, either:
    • (a) Post to a new endpoint, e.g. /v1/images/edits, OR
    • (b) Keep /v1/images/generations but include image in the body and let the gateway route by parameter presence.
  4. Update media-router.ts so the proposal preview and cost estimate account for reference-image mode (some models charge differently for edits vs. generations), and so the routing prompt mentions "supports reference image" as a feature.
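A minimal sketch of a resolveSeedImage()-style helper, assuming the behavior described in item 1: http(s) and data URLs pass through unchanged, while local paths are read, checked against a ~4 MB cap, and returned as a base64 data URL. The function name, the extension-to-MIME table, and the exact 4 MiB constant are assumptions; the real VideoGen helper may differ (for example by sniffing file signatures or reading asynchronously).

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Hypothetical cap standing in for the "~4 MB" limit described above.
const MAX_SEED_IMAGE_BYTES = 4 * 1024 * 1024;

// Hypothetical extension-to-MIME map.
const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Sketch only: remote and data URLs pass through; anything else is treated as
// a local path. readFile is injectable so the logic can run without disk I/O.
function resolveSeedImageSketch(
  imageUrl: string,
  readFile: (path: string) => Uint8Array = (p) => readFileSync(p),
): string {
  if (/^https?:\/\//i.test(imageUrl) || imageUrl.startsWith("data:")) {
    return imageUrl;
  }
  const mime = MIME_BY_EXT[extname(imageUrl).toLowerCase()];
  if (!mime) {
    throw new Error(`Unsupported reference image type: ${imageUrl}`);
  }
  const bytes = readFile(imageUrl);
  if (bytes.byteLength > MAX_SEED_IMAGE_BYTES) {
    throw new Error(
      `Reference image is ${bytes.byteLength} bytes; the cap is ${MAX_SEED_IMAGE_BYTES}`,
    );
  }
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}
```

Failing loudly on oversize or unknown files, rather than silently sending text-only, keeps the "no silent strip" rule from the scope section consistent end to end.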

Gateway side (BlockRun)

  • Expose /v1/images/edits (or extend /v1/images/generations to accept an image).
  • Forward to the upstream model's appropriate endpoint:
    • OpenAI gpt-image-2https://api.openai.com/v1/images/edits
    • Nano Banana Pro / Grok Imagine Pro → respective image-to-image endpoints if available
  • Preserve the image payload (data URL or HTTPS URL) through any provider-translation layer. This is the same class of issue as the tool_result image preservation that just shipped in 6ac64da.
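On the gateway side, per-model forwarding could look roughly like this hypothetical sketch. Only the OpenAI edits URL comes from this issue; the Nano Banana Pro and Grok Imagine Pro upstream endpoints are unknown, so they are left as null instead of guessed. Converting the image into each provider's wire format (for example a multipart upload) is out of scope here.

```typescript
// Hypothetical gateway routing table: model id -> upstream image-to-image URL.
// null means "supported in principle, upstream endpoint not wired up yet".
const UPSTREAM_EDIT_ENDPOINTS = new Map<string, string | null>([
  ["openai/gpt-image-2", "https://api.openai.com/v1/images/edits"],
  ["google/nano-banana-pro", null],
  ["xai/grok-imagine-image-pro", null],
]);

interface UpstreamCall {
  url: string;
  prompt: string;
  // Carried through unchanged (data URL or https URL) so no translation
  // layer drops it; provider-specific encoding would happen after this.
  image: string;
}

function routeEditRequest(model: string, prompt: string, image: string): UpstreamCall {
  const url = UPSTREAM_EDIT_ENDPOINTS.get(model);
  if (url === undefined) {
    throw new Error(`${model} does not support reference images`);
  }
  if (url === null) {
    throw new Error(`Image-to-image endpoint for ${model} is not wired up yet`);
  }
  return { url, prompt, image };
}
```

A Map is used instead of a plain object so that lookups like "toString" cannot accidentally match inherited properties.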

Scope-limit suggestions

To keep this tractable:

  • v1 supports a single reference image (most common case). Multi-image grids / mask editing can come later.
  • Mask / inpainting parameters out of scope for v1.
  • v1 only enables the feature for the three models listed above. Other catalog entries that don't support image_url reject it with a clear error rather than silently stripping it.
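The v1 scope rules above could be enforced client-side with a pre-flight check along these lines. This is a sketch: the allowlist is the three models named in this issue, and validateReferenceImage is a hypothetical name.

```typescript
// v1 allowlist: the three models named in this issue.
const REFERENCE_IMAGE_MODELS = new Set<string>([
  "openai/gpt-image-2",
  "google/nano-banana-pro",
  "xai/grok-imagine-image-pro",
]);

// Hypothetical pre-flight check: reject image_url for other models with a
// clear error instead of silently dropping it.
function validateReferenceImage(model: string, imageUrl: unknown): void {
  if (imageUrl === undefined) return; // plain text-to-image, nothing to check
  if (typeof imageUrl !== "string") {
    // v1 accepts exactly one reference image; arrays (multi-image) are out of scope.
    throw new Error("image_url must be a single string in v1; multi-image input is not supported yet");
  }
  if (!REFERENCE_IMAGE_MODELS.has(model)) {
    throw new Error(
      `${model} does not accept image_url; supported models: ${Array.from(REFERENCE_IMAGE_MODELS).join(", ")}`,
    );
  }
}
```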

Willingness to send a PR

Happy to send the client-side PR once the gateway approach is confirmed (which endpoint to use, whether parameter-based routing on the existing endpoint is preferred, etc.). I'd rather not land client code that has nothing to talk to; this is the same coordination model as #10/#11.

Environment
