Summary
ImageGen currently only does text-to-image. The gateway catalog has at least three models that natively support image-to-image (using a reference / seed image to guide generation or perform edits):
- openai/gpt-image-2 — "high-fidelity edits, character consistency"
- google/nano-banana-pro — supports image input
- xai/grok-imagine-image-pro — supports image input
But Franklin's ImageGen tool can't drive that capability — there's no image_url (or equivalent) parameter on the tool schema, so even when the LLM has clearly seen a reference image (now that PR #11 + gateway 6ac64da made vision work end-to-end), it has to fall back to text-only redrawing.
Reproduction
- Launch franklin (or the VS Code extension)
- /model sonnet (or any vision-capable model)
- Show the model an image: Read /path/to/reference.png and use it as style reference to generate a new image of <something>
- Sonnet correctly describes the reference image (vision input ✅)
- Sonnet calls ImageGen({ prompt: '…' }) — no image attached because the schema doesn't accept one
- Sonnet often appends a self-aware disclaimer in the assistant text: "Note: I didn't actually pass the original image to nano-banana as a reference — the current ImageGen tool only accepts prompt + size, with no image parameter, so this was again just a text-only redraw."
The output image therefore loses the reference's style / character / pose entirely.
Where the gap is
src/tools/imagegen.ts (~line 361):

```typescript
input_schema: {
  type: 'object',
  properties: {
    prompt: { type: 'string', ... },
    output_path: { type: 'string', ... },
    size: { type: 'string', ... },
    model: { type: 'string', ... },
    contentId: { type: 'string', ... },
  },
  required: ['prompt'],
}
```
No image_url field. Compare to videogen.ts which already has it (image-to-video works for Seedance).
The request body posted to /v1/images/generations also has no place to put a reference image.
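For reference, the three input forms that VideoGen's seed-image path already accepts (per the description later in this issue) could be classified like this. This is a minimal sketch; classifySeedImage and the return type are illustrative names, not the actual tools/_seed-image.ts API:

```typescript
// Hypothetical sketch of the three seed-image input forms VideoGen accepts:
// remote http(s) URL, data URL, or local file path. Names are illustrative.
type SeedImageKind = 'remote-url' | 'data-url' | 'local-path';

function classifySeedImage(input: string): SeedImageKind {
  if (input.startsWith('data:image/')) return 'data-url';
  if (/^https?:\/\//.test(input)) return 'remote-url';
  // Anything else is treated as a local path: read from disk,
  // base64-encoded, and capped at ~4 MB before upload.
  return 'local-path';
}
```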
Proposed fix (two layers, same shape as PR #11)
Client side (Franklin core)
- Add image_url to ImageGenInput and the JSON schema. The description should mirror VideoGen's: accepts a remote http(s) URL, a data URL, or a local file path (auto-read + base64 + capped at ~4 MB).
- Reuse the same resolveSeedImage() helper VideoGen got recently — DRY a tiny module out of tools/_seed-image.ts if useful.
- When image_url is present, either:
  - (a) post to a new endpoint, e.g. /v1/images/edits, OR
  - (b) keep /v1/images/generations but include image in the body and let the gateway route by parameter presence.
- Update media-router.ts so the proposal preview / cost estimate knows about reference-image mode (some models charge differently for edits vs. generations, and the routing prompt should mention "supports reference image" as a feature).
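Concretely, the schema change might look like this. This is a sketch under this proposal's assumptions: the ImageGenInput shape and field descriptions are inferred from the existing schema above, not copied from Franklin source:

```typescript
// Proposed shape: existing fields unchanged, one optional image_url added.
interface ImageGenInput {
  prompt: string;
  output_path?: string;
  size?: string;
  model?: string;
  contentId?: string;
  image_url?: string; // new: http(s) URL, data URL, or local file path
}

const inputSchema = {
  type: 'object',
  properties: {
    prompt: { type: 'string' },
    output_path: { type: 'string' },
    size: { type: 'string' },
    model: { type: 'string' },
    contentId: { type: 'string' },
    image_url: {
      type: 'string',
      description:
        'Optional reference image: remote http(s) URL, data URL, or local ' +
        'file path (auto-read, base64-encoded, capped at ~4 MB).',
    },
  },
  required: ['prompt'], // image_url stays optional; text-to-image is unchanged
};
```

Keeping image_url optional means existing text-to-image calls are untouched, and the description gives the LLM the same affordance it already has on VideoGen.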
Gateway side (BlockRun)
- Surface /v1/images/edits (or extend /v1/images/generations to accept image).
- Forward to the upstream model's appropriate endpoint:
  - OpenAI gpt-image-2 → https://api.openai.com/v1/images/edits
  - Nano Banana Pro / Grok Imagine Pro → respective image-to-image endpoints if available
- Preserve the image payload (data URL or HTTPS URL) through any provider-translation layer — same class of issue as the tool_result image preservation that just shipped in 6ac64da.
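Option (b), parameter-presence routing on the existing endpoint, could look roughly like this on the gateway side. A hypothetical sketch: BlockRun's internals aren't shown in this issue, and only the OpenAI edits endpoint is named above, so the non-OpenAI branch is a placeholder:

```typescript
// Hypothetical gateway routing sketch: pick the upstream endpoint based on
// whether the /v1/images/generations body carries an `image` field.
type ImagesRequest = {
  model: string;
  prompt: string;
  image?: string; // data URL or https URL, forwarded from the client's image_url
};

function upstreamEndpoint(req: ImagesRequest): string {
  const isEdit = typeof req.image === 'string' && req.image.length > 0;
  if (req.model === 'openai/gpt-image-2') {
    // OpenAI exposes separate generation and edit endpoints.
    return isEdit
      ? 'https://api.openai.com/v1/images/edits'
      : 'https://api.openai.com/v1/images/generations';
  }
  // Placeholder for other providers; their image-to-image endpoints would be
  // resolved per-model here.
  return `/upstream/${req.model}/images`;
}
```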
Scope-limit suggestions
To keep this tractable:
- v1 supports a single reference image (most common case). Multi-image grids / mask editing can come later.
- Mask / inpainting parameters are out of scope for v1.
- v1 only enables the feature for the three models listed above; other catalog entries reject image_url if they don't support it (with a clear error, not a silent strip).
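The "clear error, not a silent strip" rule could be enforced with a small capability check. A sketch: the set restates the three models named in this issue, and the function name is hypothetical:

```typescript
// v1 gating sketch: only the three catalog entries from this issue accept a
// reference image; everything else fails loudly instead of silently dropping it.
const IMAGE_TO_IMAGE_MODELS = new Set([
  'openai/gpt-image-2',
  'google/nano-banana-pro',
  'xai/grok-imagine-image-pro',
]);

function checkReferenceImageSupport(model: string, imageUrl?: string): void {
  if (imageUrl && !IMAGE_TO_IMAGE_MODELS.has(model)) {
    throw new Error(
      `Model ${model} does not support a reference image; remove image_url ` +
      `or pick one of: ${[...IMAGE_TO_IMAGE_MODELS].join(', ')}`,
    );
  }
}
```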
Willingness to send a PR
Happy to send the client-side PR once the gateway approach is confirmed (which endpoint, whether parameter-based routing on the existing endpoint is preferred, etc.). Not landing client code that has nothing to talk to — same coordination model as #10/#11.
Environment