Fix/generate-image#285
Conversation
There was a problem hiding this comment.
Requesting changes - architectural concern + a few correctness issues
Main blocker: provider branching duplicates infrastructure we already have
This change hardcodes if provider == "openai" / elif "gemini" inside the action, with two SDK imports, two client inits, and two error-mapping blocks. We already have provider abstraction elsewhere in the repo:
MODEL_REGISTRY+InterfaceTypeenum (used by VLM/LLM)LLMInterface/VLMInterfacewrappers inapp/llm_interface.pyandapp/vlm_interface.pythat hide the provider-specific SDK callsdescribe_image.py(lines 62–70) is the reference pattern: read the configured provider fromMODEL_REGISTRY[provider][InterfaceType.VLM], then delegate
As-is this PR builds a third parallel provider system that future image providers (Stability, Replicate, xAI, OpenRouter image, etc.) will all have to extend by adding another elif branch here. It also introduces a new image_generation.preferred_provider setting that parallels the existing vlm_provider / llm_provider pattern instead of joining it.
Could we route this through MODEL_REGISTRY with a new InterfaceType.IMAGE_GEN and an ImageGenInterface in agent_core, mirroring how VLMInterface is set up, so generate_image.py ends up looking like describe_image.py? Reusing InterfaceType.VLM directly is tempting since some providers serve both through one endpoint, but the capability sets differ (Claude / ByteDance support VLM but not gen) and users will want to pick providers independently for each.
Other issues worth fixing while you're in here
- OpenAI aspect-ratio map is wrong.
"16:9": "1536x1024"is 3:2,"9:16": "1024x1536"is 2:3. The canvas constraint is real (gpt-image only has 3 sizes), but silently mismapping → at least append towarnings. (I skimmed real quick so please verify) - Silent 4K downgrade for OpenAI.
"4K": "high"returns at most 1536×1024. Either reject 4K for OpenAI or warn. qualitydropped on the edit path.images.generate(..., quality=...)is passed, butimages.edit(...)isn't - reference-image runs silently render at lower quality.images.edit≠ "style reference." The existingreference_imagesfield is documented as style guidance (how Gemini uses them). OpenAI'simages.edittreats inputs as compositional/mask inputs. Same input, very different output between providers.- Provider-selection UX doesn't match the PR description. The description says "asks the user" when both keys are present, but the code silently defaults to Gemini - there's no signal in the response telling the calling LLM that a choice is available. Once
provider_preferenceis saved, there's also no way to clear it.
Happy to pair on the ImageGenInterface refactor if it'd help.
@zfoong What do you think about this change? Worth the effort?
| error_message = f"Content blocked by safety filters: {error_message}. Try modifying your prompt." | ||
| elif "not found" in error_message.lower() or "404" in error_message: | ||
| error_message = f"Model not available: {error_message}. The gemini-3-pro-image-preview model may not be accessible with your API key. Try using Google AI Studio to verify access." | ||
| if provider == "gemini": |
There was a problem hiding this comment.
Follow-up: extract these error strings to a message catalog
Even setting aside this PR, both error-mapping blocks (Gemini + OpenAI) are the same five patterns repeated with provider-specific text spliced in:
| Template key | Triggers (substrings) | Placeholders |
|---|---|---|
provider_rate_limit |
quota, rate, insufficient_quota, billing |
{provider_label}, {error} |
provider_invalid_key |
invalid + key, invalid_api_key |
{provider_label}, {error} |
provider_access_denied |
permission, access |
{provider_label}, {model}, {error} |
provider_safety_block |
safety, blocked, content_policy |
{provider_label}, {error} |
provider_model_not_found |
not found, 404 |
{provider_label}, {model}, {help_url}, {error} |
Suggested shape - anchors on the existing get_os_language() setting in app/config.py:341, which currently isn't wired to anything:
app/i18n/errors.en.json- flat catalog of templates with{placeholder}substitutionapp/i18n/errors.<lang>.json- locale overrides; missing keys fall back toenapp/i18n/__init__.pyexposing two helpers:t(key, **kwargs) -> str- formatted lookup with locale + en fallbackclassify_provider_error(exc, *, provider, model) -> str- does the substring matching against a triggers table in the catalog and returns the formatted message in one call
The error block in this PR collapses to roughly:
except Exception as e:
return {
"status": "error",
...,
"message": classify_provider_error(e, provider=provider, model=model_id),
}
Benefits:
- one mapping table instead of two near-identical blocks per provider
- localization-ready without touching call sites
- adding a new provider = adding a row to the catalog, not another
if/elifladder - avoids writing a bespoke message for every case - templates cover ~90% of them
- i think with the help of claude we can implement this pretty quick and easy
@zfoong What do you think about this too?
Happy to file this as a separate issue once we figure out which agent_core module owns it (since the same templates will be useful for the LLMInterface / VLMInterface paths too).
What
Why
Updated CraftBot to use state-of-the-art image gen models
Closes #219