Implement Ernie Image model.#13369
Conversation
📝 WalkthroughWalkthroughThis pull request introduces support for ERNIE image diffusion models and integrates a Ministral 3.3B text encoder. The changes include a new ERNIE transformer implementation with rotary position embeddings, patch embeddings, and adaptive layer normalization for diffusion conditioning; a corresponding model class registered in the supported models framework; text encoder integration via new tokenizer and model wrapper classes for Ministral; and model detection logic to identify ERNIE checkpoints by their weight key patterns. The modifications span image model architecture, text encoder configuration, model registry, and checkpoint detection. 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@comfy/sd.py`:
- Around line 1306-1307: The detector currently returns TEModel.MINISTRAL_3_3B
whenever weight.shape[0] == 3072 which is too broad; update the branch to first
check that the tensor key corresponds to a Ministral-specific parameter (e.g.
the key string equals or contains "model.layers.0.linear_attn.in_proj_a.weight")
and only then use the shape check as a secondary guard before returning
TEModel.MINRAL_3_3B (or TEModel.MINISTRAL_3_3B). Locate the check around
weight.shape[0] == 3072 and add the key/name presence test so only tensors from
Ministral's in_proj_a weight path are routed to the Ministral loader.
In `@comfy/text_encoders/ernie.py`:
- Around line 29-38: The wrapper factory te() defines a subclass ErnieTEModel_
that applies dtype_llama and llama_quantization_metadata but currently returns
the original ErnieTEModel; change the return to the configured subclass by
returning ErnieTEModel_ from te() so the overridden __init__ (and its applied
dtype_llama and model_options["quantization_metadata"]) is used when consumers
instantiate the model.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7a2120a5-e41a-48d1-9d29-dfdd68f39c22
📒 Files selected for processing (8)
comfy/ldm/ernie/model.pycomfy/model_base.pycomfy/model_detection.pycomfy/sd.pycomfy/supported_models.pycomfy/text_encoders/ernie.pycomfy/text_encoders/flux.pycomfy/text_encoders/llama.py
| if weight.shape[0] == 3072: | ||
| return TEModel.MINISTRAL_3_3B |
There was a problem hiding this comment.
Use a Ministral-specific key in the detector.
Line 1306 only checks for hidden size 3072, which is too broad for model selection. Any 3072-wide decoder-only TE that reaches this branch will now be routed through the ERNIE loader. Please gate this on a Ministral-specific key such as model.layers.0.linear_attn.in_proj_a.weight and keep the shape check as a secondary guard.
🐛 Proposed fix
- if weight.shape[0] == 3072:
+ if weight.shape[0] == 3072 and "model.layers.0.linear_attn.in_proj_a.weight" in sd:
return TEModel.MINISTRAL_3_3B🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy/sd.py` around lines 1306 - 1307, The detector currently returns
TEModel.MINISTRAL_3_3B whenever weight.shape[0] == 3072 which is too broad;
update the branch to first check that the tensor key corresponds to a
Ministral-specific parameter (e.g. the key string equals or contains
"model.layers.0.linear_attn.in_proj_a.weight") and only then use the shape check
as a secondary guard before returning TEModel.MINRAL_3_3B (or
TEModel.MINISTRAL_3_3B). Locate the check around weight.shape[0] == 3072 and add
the key/name presence test so only tensors from Ministral's in_proj_a weight
path are routed to the Ministral loader.
| def te(dtype_llama=None, llama_quantization_metadata=None): | ||
| class ErnieTEModel_(ErnieTEModel): | ||
| def __init__(self, device="cpu", dtype=None, model_options={}): | ||
| if dtype_llama is not None: | ||
| dtype = dtype_llama | ||
| if llama_quantization_metadata is not None: | ||
| model_options = model_options.copy() | ||
| model_options["quantization_metadata"] = llama_quantization_metadata | ||
| super().__init__(device=device, dtype=dtype, model_options=model_options) | ||
| return ErnieTEModel |
There was a problem hiding this comment.
Return the configured subclass from te().
Line 38 returns ErnieTEModel instead of ErnieTEModel_, so dtype_llama and llama_quantization_metadata are never applied. That makes the new loader ignore the detection info passed in from both comfy/sd.py and comfy/supported_models.py.
🐛 Proposed fix
def te(dtype_llama=None, llama_quantization_metadata=None):
class ErnieTEModel_(ErnieTEModel):
def __init__(self, device="cpu", dtype=None, model_options={}):
if dtype_llama is not None:
dtype = dtype_llama
if llama_quantization_metadata is not None:
model_options = model_options.copy()
model_options["quantization_metadata"] = llama_quantization_metadata
super().__init__(device=device, dtype=dtype, model_options=model_options)
- return ErnieTEModel
+ return ErnieTEModel_🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@comfy/text_encoders/ernie.py` around lines 29 - 38, The wrapper factory te()
defines a subclass ErnieTEModel_ that applies dtype_llama and
llama_quantization_metadata but currently returns the original ErnieTEModel;
change the return to the configured subclass by returning ErnieTEModel_ from
te() so the overridden __init__ (and its applied dtype_llama and
model_options["quantization_metadata"]) is used when consumers instantiate the
model.
|
The model has been removed from Hugging Face, so it can be considered as ultimately not going to be open-sourced. |
|
Seems like it's back. |
No description provided.