Fix/generate-image #285

Merged: zfoong merged 2 commits into V1.3.2 from fix-generate-image on May 28, 2026

@makiroll1125 (Collaborator) commented May 26, 2026

What

  • Add image generation via OpenAI Image 2.0
  • Switch the Gemini image generation model to Nano Banana 2

Why

Update CraftBot to use state-of-the-art image generation models.
Closes #219

@makiroll1125 self-assigned this May 26, 2026
Collaborator

Requesting changes: one architectural concern plus a few correctness issues

Main blocker: provider branching duplicates infrastructure we already have

This change hardcodes if provider == "openai" / elif "gemini" inside the action, with two SDK imports, two client inits, and two error-mapping blocks. We already have provider abstraction elsewhere in the repo:

  • MODEL_REGISTRY + InterfaceType enum (used by VLM/LLM)
  • LLMInterface / VLMInterface wrappers in app/llm_interface.py and app/vlm_interface.py that hide the provider-specific SDK calls
  • describe_image.py (lines 62–70) is the reference pattern: read the configured provider from MODEL_REGISTRY[provider][InterfaceType.VLM], then delegate

As written, this PR builds a third parallel provider system, and every future image provider (Stability, Replicate, xAI, OpenRouter image, etc.) will have to extend it by adding another elif branch here. It also introduces a new image_generation.preferred_provider setting that sits alongside the existing vlm_provider / llm_provider pattern instead of joining it.

Could we route this through MODEL_REGISTRY with a new InterfaceType.IMAGE_GEN and an ImageGenInterface in agent_core, mirroring how VLMInterface is set up, so that generate_image.py ends up looking like describe_image.py? Reusing InterfaceType.VLM directly is tempting, since some providers serve both through one endpoint, but the capability sets differ (Claude and ByteDance support VLM but not image generation), and users will want to pick a provider independently for each.
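A minimal sketch of that shape follows. It assumes MODEL_REGISTRY maps provider → InterfaceType → model id, and the real structure in app/ may differ. InterfaceType.IMAGE_GEN, the ImageAdapter callable type, the placeholder model ids and the adapters dict are all hypothetical. Real adapters would wrap the OpenAI and Gemini SDK calls.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class InterfaceType(Enum):
    LLM = "llm"
    VLM = "vlm"
    IMAGE_GEN = "image_gen"  # proposed member; hypothetical, not in the repo yet


# Hypothetical registry shape: provider -> capability -> model id.
# Placeholder ids only; the real values live in the repo's MODEL_REGISTRY.
MODEL_REGISTRY: dict[str, dict[InterfaceType, str]] = {
    "gemini": {
        InterfaceType.VLM: "<gemini-vlm-model>",
        InterfaceType.IMAGE_GEN: "<gemini-image-model>",
    },
    "openai": {
        InterfaceType.VLM: "<openai-vlm-model>",
        InterfaceType.IMAGE_GEN: "<openai-image-model>",
    },
    # VLM-only provider: no IMAGE_GEN entry, so it cannot be picked for generation.
    "anthropic": {InterfaceType.VLM: "<claude-vlm-model>"},
}

# An adapter wraps one provider SDK: (model_id, prompt) -> image bytes.
ImageAdapter = Callable[[str, str], bytes]


class ImageGenInterface:
    """Resolves the configured provider's image model once, then delegates."""

    def __init__(self, provider: str, adapters: dict[str, ImageAdapter]) -> None:
        models = MODEL_REGISTRY.get(provider, {})
        if InterfaceType.IMAGE_GEN not in models:
            raise ValueError(f"{provider!r} does not support image generation")
        if provider not in adapters:
            raise ValueError(f"no image adapter registered for {provider!r}")
        self.provider = provider
        self.model_id = models[InterfaceType.IMAGE_GEN]
        self._adapter = adapters[provider]

    def generate(self, prompt: str) -> bytes:
        # The action never branches on the provider name; it only calls this.
        return self._adapter(self.model_id, prompt)
```

With this split, adding a provider such as Stability becomes one registry entry plus one adapter, with no new elif in the action. Keeping IMAGE_GEN separate from VLM also lets a VLM-only provider be rejected up front.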

Other issues worth fixing while you're in here

  • The OpenAI aspect-ratio map is wrong. "16:9": "1536x1024" is actually 3:2, and "9:16": "1024x1536" is 2:3. The canvas constraint is real (gpt-image only offers three sizes), but instead of silently mismapping, at least append a note to warnings. (I only skimmed this, so please verify.)
  • Silent 4K downgrade for OpenAI. The "4K": "high" mapping still returns at most 1536×1024. Either reject 4K for OpenAI or warn.
  • quality is dropped on the edit path. It is passed to images.generate(..., quality=...) but not to images.edit(...), so runs with reference images silently render at lower quality.
  • images.edit ≠ "style reference." The existing reference_images field is documented as style guidance (which is how Gemini uses it), but OpenAI's images.edit treats its inputs as compositional/mask inputs. The same input produces very different output depending on the provider.
  • Provider-selection UX doesn't match the PR description. The description says it "asks the user" when both keys are present, but the code silently defaults to Gemini, and nothing in the response tells the calling LLM that a choice is available. Once provider_preference is saved, there is also no way to clear it.
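The aspect-ratio, 4K and edit-path points above could be handled together along the lines of the sketch below. It assumes the three fixed gpt-image canvases the review mentions. Only "4K" → "high" comes from the PR; the rest of QUALITY_BY_RESOLUTION, plus resolve_openai_params and OpenAIImageParams, are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: the three fixed canvases the review attributes to gpt-image.
# Verify against the model actually configured before relying on this.
OPENAI_SIZES: dict[str, float] = {
    "1024x1024": 1024 / 1024,  # 1:1
    "1536x1024": 1536 / 1024,  # 3:2 landscape
    "1024x1536": 1024 / 1536,  # 2:3 portrait
}
OPENAI_MAX_SIZE = "1536x1024"

# Hypothetical resolution -> quality map; only "4K" -> "high" is from the PR.
QUALITY_BY_RESOLUTION = {"1K": "low", "2K": "medium", "4K": "high"}


@dataclass
class OpenAIImageParams:
    size: str
    quality: str
    warnings: list[str] = field(default_factory=list)

    def request_kwargs(self) -> dict[str, str]:
        # Build kwargs once and pass them to BOTH images.generate and
        # images.edit, so the reference-image path cannot drop quality.
        return {"size": self.size, "quality": self.quality}


def resolve_openai_params(aspect_ratio: str, resolution: str) -> OpenAIImageParams:
    w, h = (int(part) for part in aspect_ratio.split(":"))
    target = w / h
    # Closest supported canvas by ratio; any inexact match is reported, not hidden.
    size = min(OPENAI_SIZES, key=lambda s: abs(OPENAI_SIZES[s] - target))
    warnings: list[str] = []
    if abs(OPENAI_SIZES[size] - target) > 1e-9:
        warnings.append(
            f"Aspect ratio {aspect_ratio} is not available on OpenAI; "
            f"rendered at {size} instead."
        )
    res = resolution.upper()
    if res not in QUALITY_BY_RESOLUTION:
        raise ValueError(f"unknown resolution {resolution!r}")
    if res == "4K":
        # Warn rather than silently downgrading.
        warnings.append(
            f"4K is not available on OpenAI; the largest output is {OPENAI_MAX_SIZE}."
        )
    return OpenAIImageParams(size, QUALITY_BY_RESOLUTION[res], warnings)
```

The action would then call the SDK with the same **params.request_kwargs() on both the generate and the edit call. This assumes, as the review implies, that images.edit accepts quality and size for the configured model. params.warnings can be appended to the warnings already returned in the response.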
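For the provider-selection bullet, one option is to return an explicit signal instead of silently defaulting. The sketch below is minimal and its names are hypothetical: select_image_provider, ProviderSelection and needs_user_choice. It assumes the saved setting is a flat "provider_preference" key in a settings dict, which may not match how the PR actually stores it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProviderSelection:
    provider: str | None  # None means "no provider chosen yet"
    needs_user_choice: bool  # tells the calling LLM to ask the user
    note: str  # human-readable reason, surfaced in the action response


def select_image_provider(
    configured: set[str], preference: str | None
) -> ProviderSelection:
    # A saved preference wins only if that provider still has a key configured.
    if preference is not None and preference in configured:
        return ProviderSelection(preference, False, f"Using saved preference {preference!r}.")
    if not configured:
        return ProviderSelection(None, False, "No image-generation provider is configured.")
    if len(configured) == 1:
        (only,) = configured
        return ProviderSelection(only, False, f"Using {only!r}, the only configured provider.")
    options = ", ".join(sorted(configured))
    # Surface the choice to the caller instead of defaulting to one provider.
    return ProviderSelection(
        None, True, f"Multiple providers are configured ({options}); ask the user which to use."
    )


def clear_provider_preference(settings: dict[str, object]) -> bool:
    """Remove a saved preference; returns True if one was actually stored."""
    return settings.pop("provider_preference", None) is not None
```

Returning needs_user_choice in the response gives the calling LLM something concrete to act on. A clear function makes the saved preference reversible.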

Happy to pair on the ImageGenInterface refactor if it'd help.

@zfoong What do you think about this change? Is it worth the effort?

    error_message = f"Content blocked by safety filters: {error_message}. Try modifying your prompt."
elif "not found" in error_message.lower() or "404" in error_message:
    error_message = f"Model not available: {error_message}. The gemini-3-pro-image-preview model may not be accessible with your API key. Try using Google AI Studio to verify access."
if provider == "gemini":
Collaborator

Follow-up: extract these error strings to a message catalog

Even setting aside this PR, both error-mapping blocks (Gemini and OpenAI) repeat the same five patterns, with provider-specific text spliced in:

| Template key | Triggers (substrings) | Placeholders |
|---|---|---|
| provider_rate_limit | quota, rate, insufficient_quota, billing | {provider_label}, {error} |
| provider_invalid_key | invalid + key, invalid_api_key | {provider_label}, {error} |
| provider_access_denied | permission, access | {provider_label}, {model}, {error} |
| provider_safety_block | safety, blocked, content_policy | {provider_label}, {error} |
| provider_model_not_found | not found, 404 | {provider_label}, {model}, {help_url}, {error} |
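The sketch below shows one way the triggers column could be matched. It assumes rows are checked top to bottom (first match wins) and that "invalid + key" means both substrings must appear. ERROR_TRIGGERS and classify_error_key are hypothetical names. The proposal keeps the triggers in the JSON catalog rather than in code.

```python
from __future__ import annotations

# Rows are checked top to bottom; the first row with a matching trigger wins.
# Each trigger is a tuple of substrings that must ALL appear in the message,
# so ("invalid", "key") encodes the table's "invalid + key".
ERROR_TRIGGERS: list[tuple[str, list[tuple[str, ...]]]] = [
    ("provider_rate_limit", [("quota",), ("rate",), ("insufficient_quota",), ("billing",)]),
    ("provider_invalid_key", [("invalid", "key"), ("invalid_api_key",)]),
    ("provider_access_denied", [("permission",), ("access",)]),
    ("provider_safety_block", [("safety",), ("blocked",), ("content_policy",)]),
    ("provider_model_not_found", [("not found",), ("404",)]),
]


def classify_error_key(message: str) -> str | None:
    """Return the template key for an error message, or None if nothing matches."""
    text = message.lower()
    for key, triggers in ERROR_TRIGGERS:
        if any(all(part in text for part in trigger) for trigger in triggers):
            return key
    return None
```

Row order matters with plain substring triggers, and so does trigger specificity. The bare "rate" trigger also matches "generate", so a message like "Failed to generate image" is classified as provider_rate_limit. The catalog may want a narrower trigger such as "rate limit" or "429".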

Suggested shape, anchored on the existing get_os_language() setting in app/config.py:341, which currently isn't wired to anything:

  • app/i18n/errors.en.json - flat catalog of templates with {placeholder} substitution
  • app/i18n/errors.<lang>.json - locale overrides; missing keys fall back to en
  • app/i18n/__init__.py exposing two helpers:
    • t(key, **kwargs) -> str - formatted lookup with locale + en fallback
    • classify_provider_error(exc, *, provider, model) -> str - does the substring matching against a triggers table in the catalog and returns the formatted message in one call
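The locale fallback in t() could look roughly like the sketch below. It assumes catalogs are loaded from app/i18n/errors.<lang>.json into a dict keyed by language, and that the language code would come from get_os_language(). ErrorCatalog, load_catalogs and _KeepMissing are hypothetical names. Leaving an unknown placeholder visible instead of raising is a design choice here, not part of the proposal.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_catalogs(i18n_dir: Path) -> dict[str, dict[str, str]]:
    """Read every errors.<lang>.json in i18n_dir into {lang: {key: template}}."""
    catalogs: dict[str, dict[str, str]] = {}
    for path in i18n_dir.glob("errors.*.json"):
        lang = path.stem.split(".", 1)[1]  # "errors.en" -> "en"
        catalogs[lang] = json.loads(path.read_text(encoding="utf-8"))
    return catalogs


class _KeepMissing(dict):
    # str.format_map calls __missing__ for absent keys; echo the placeholder back.
    def __missing__(self, name: str) -> str:
        return "{" + name + "}"


class ErrorCatalog:
    def __init__(self, catalogs: dict[str, dict[str, str]], lang: str) -> None:
        # Start from English, then overlay the locale: missing keys fall back to en.
        self._templates = dict(catalogs.get("en", {}))
        self._templates.update(catalogs.get(lang, {}))

    def t(self, key: str, **kwargs: object) -> str:
        template = self._templates.get(key, key)  # unknown key: return the key itself
        return template.format_map(_KeepMissing(kwargs))
```

classify_provider_error, as proposed, would combine this lookup with the substring matching from the table. It would pick the key from the exception text, then call t() with provider_label, model and error filled in.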

The error block in this PR collapses to roughly:

except Exception as e:
    return {
        "status": "error",
        ...,
        "message": classify_provider_error(e, provider=provider, model=model_id),
    }

Benefits:

  • one mapping table instead of two near-identical blocks per provider
  • localization-ready without touching call sites
  • adding a new provider = adding a row to the catalog, not another if/elif ladder
  • avoids writing a bespoke message for every case; templates cover ~90% of them
  • with Claude's help, I think we could implement this quickly and easily

@zfoong What do you think about this too?
Happy to file this as a separate issue once we figure out which agent_core module owns it (since the same templates will be useful for the LLMInterface / VLMInterface paths too).

@zfoong merged commit f8f888d into V1.3.2 on May 28, 2026
@makiroll1125 deleted the fix-generate-image branch on May 28, 2026 02:53
3 participants