Eval bug: Multimodal model loading fails for Gemma 4 12B (unknown projector type: gemma4uv) #52

@chromascope-x

Name and Version

BeeLlama v0.3.0 (Windows x64, CUDA 12 build)

Operating systems

Windows

GGML backends

CUDA

Hardware

CPU: Ryzen 7950X3D
GPU: RTX 3080 Ti
RAM: 96GB
VRAM: 12GB

Models

Model:

  • gemma-4-12b-it-UD-Q5_K_XL.gguf

Multimodal projector:

  • gemma-4-12b-it-mmproj-BF16.gguf

from: https://huggingface.co/unsloth/gemma-4-12b-it-GGUF

Problem description & steps to reproduce

When loading Gemma 4 12B with its multimodal projector in BeeLlama, model loading fails before inference starts.

The server exits during multimodal / projector initialization with:

load_hparams: unknown projector type: gemma4uv

This appears to be a compatibility issue: the runtime's projector loader does not recognize the gemma4uv projector type declared by the Gemma 4 mmproj file. Support for Gemma 4's multimodal projector still seems to be rolling out across runtimes, so this build may simply predate it.
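To confirm which projector type the mmproj file actually declares, its GGUF metadata can be read directly without loading the model. The following is a minimal sketch, not part of BeeLlama: it assumes a little-endian GGUF file of version 2 or later (64-bit lengths), and it assumes the projector type is stored under a metadata key ending in projector_type (for example clip.projector_type). The helper names are hypothetical.

```python
import struct

# GGUF scalar value types mapped to little-endian struct formats.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one fixed-size value, failing loudly on a truncated file."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    return f.read(length).decode("utf-8", errors="replace")


def _read_value(f, vtype):
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type: {vtype}")


def read_gguf_metadata(path):
    """Return the metadata key/value pairs from a GGUF file header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version = _read(f, "<I")
        _tensor_count = _read(f, "<Q")
        kv_count = _read(f, "<Q")
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            vtype = _read(f, "<I")
            meta[key] = _read_value(f, vtype)
        return meta


def projector_types(path):
    """Assumption: projector type keys end with 'projector_type'."""
    meta = read_gguf_metadata(path)
    return {k: v for k, v in meta.items() if k.endswith("projector_type")}
```

Calling projector_types() on the mmproj path should show whether the file itself declares gemma4uv, which would point at the runtime rather than a corrupted download.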

Steps to reproduce:

  1. Configure BeeLlama to load:
    • gemma-4-12b-it-UD-Q5_K_XL.gguf
    • gemma-4-12b-it-mmproj-BF16.gguf
  2. Start the server / load the model.
  3. Wait for initialization.
  4. Loading fails at clip_init / mtmd_init_from_file and the server exits.

Expected behavior:
The model should load successfully with multimodal support enabled.

Actual behavior:
The server reports the projector loading error, cleans up, and exits before the model becomes available.

Workaround:
If I remove / disable the mmproj entry, the text model can be loaded without multimodal support.

First Bad Commit

No response

Relevant log output

0.44.589.181 I srv ensure_model: waiting until model name=gemma-4-12b-it-UD-Q5_K_XL is fully loaded...
[50179] 0.03.748.921 W llama_context: n_ctx_seq (65536) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
[50179] 0.03.867.438 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[50179] TCQ decode: context-adaptive V alpha enabled
[50179] 0.04.222.949 E clip_init: failed to load model 'F:\AI\LLM\models\gemma-4-12b-it-mmproj-BF16.gguf': load_hparams: unknown projector type: gemma4uv
[50179]
[50179] 0.04.223.000 E mtmd_init_from_file: error: Failed to load CLIP model from F:\AI\LLM\models\gemma-4-12b-it-mmproj-BF16.gguf
[50179]
[50179] 0.04.223.009 E srv load_model: failed to load multimodal model, 'F:\AI\LLM\models\gemma-4-12b-it-mmproj-BF16.gguf'
[50179] 0.04.223.014 I srv operator(): operator(): cleaning up before exit...
[50179] 0.04.223.950 E srv llama_server: exiting due to model loading error
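
For triage, the failing projector type and file can be pulled out of a log like the one above with a small script. This is only a sketch that assumes the message wording shown in this log (clip_init: failed to load model '...': load_hparams: unknown projector type: ...); the function name is hypothetical and other builds may word the error differently.

```python
import re

# Patterns based on the exact wording in the log above (an assumption
# about this build; other versions may phrase the error differently).
_UNKNOWN_PROJ = re.compile(r"load_hparams: unknown projector type: (\S+)")
_FAILED_FILE = re.compile(r"clip_init: failed to load model '([^']+)'")


def find_projector_errors(log_text):
    """Return (projector_type, model_path_or_None) for each matching line."""
    results = []
    for line in log_text.splitlines():
        proj = _UNKNOWN_PROJ.search(line)
        if proj:
            path = _FAILED_FILE.search(line)
            results.append((proj.group(1), path.group(1) if path else None))
    return results


if __name__ == "__main__":
    sample = (
        r"[50179] 0.04.222.949 E clip_init: failed to load model "
        r"'F:\AI\LLM\models\gemma-4-12b-it-mmproj-BF16.gguf': "
        r"load_hparams: unknown projector type: gemma4uv"
    )
    for proj_type, model_path in find_projector_errors(sample):
        print(proj_type, model_path)
```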