Fix(models/siglip): Add compatibility for Gemma models quantized by llm-compressor #19643
Conversation
vllm/model_executor/models/siglip.py
Outdated
@@ -516,6 +516,13 @@ def load_weights(self, weights: Iterable[tuple[str,
                weight_loader(param, loaded_weight, shard_id)
                break
            else:
                if name not in params_dict:
                    potential_name = f"vision_model.{name}"
Can we add some comment to explain this change?
Hi @houseroad, thank you so much for the quick and insightful review!
- Absolutely, that's a great suggestion. I've added a detailed comment to the code to explain the purpose of the remapping logic.
- You've pointed out the core issue perfectly regarding the double prefix. My current fix in `load_weights` is indeed a workaround at the loading stage. A more fundamental fix at the model's `__init__` level would certainly be cleaner. I'm happy to explore that path. Could you provide any pointers on the preferred way to handle this prefixing logic within the vLLM architecture? Or would you prefer to merge the current loader-side fix first to resolve the user-facing bug, and then create a separate issue for the architectural refactoring?
I'll push the commit with the added comments shortly. Thanks again!
Wondering if we can fix the weights loading logic to avoid the `vision_model.vision_model` double prefix?
Force-pushed from bec5429 to 5e0cded
Hi @houseroad, thank you so much for the quick and insightful review! I have changed my approach after referring to other models' implementations, and I have just finished verifying the new method; it passes successfully.
Hi @houseroad, following up on our discussion about the `vision_model.vision_model` double prefix: I've just pushed a new version of the fix that addresses the issue at a more fundamental level, as you suggested. Instead of patching the `load_weights` function in `siglip.py`, the new version remaps the checkpoint prefixes before loading. I have verified locally that this new approach also successfully resolves the `KeyError`. The PR should be ready for review and the final CI run whenever you have a moment. Thanks again for your guidance!
Hi @DarkLight1337, could you merge this? I've updated the PR description with the specific details of these changes. 😄
vllm/model_executor/models/gemma3.py
Outdated
hf_to_vllm_mapper = WeightsMapper(
    orig_to_new_prefix={
        # mapping for new names in checkpoint saved after transformers v4.52
        "vision_tower.vision_model.": "vision_model.",
    })
Hmmm, but `Gemma3ForCausalLM` is a text-only model, so why would it have a ViT in its checkpoint?
Hi @Isotr0py, thank you again for the very insightful question. It helped me clarify the core of the issue, and my apologies for not making the context clearer in the initial PR description.

You are absolutely correct that `gemma3` is a text-only LLM. This fix is specifically designed to support its multi-modal variants, where `gemma-3-4b-it` is used as the language backbone and is combined with a separate vision encoder (like SigLIP, which is a ViT). In these common VLM (vision-language model) architectures, the vision encoder is loaded into a component conventionally named `vision_tower`.

This leads to the exact problem this PR solves: the model checkpoint contains weight names with the prefix `vision_tower.vision_model.`, while vLLM's internal `SiglipVisionModel` loader expects the prefix to be just `vision_model.`. This naming mismatch results in the `KeyError`.

The `WeightsMapper` I've implemented in `gemma3_mm.py` directly and cleanly resolves this by providing a rule that remaps `vision_tower.vision_model.` to `vision_model.`, thus allowing the weights to be loaded correctly. This addresses the bug originally reported in vllm-project/llm-compressor#1546.

I hope this provides the necessary context for the change! Please let me know if you have any further questions.
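To make the remapping concrete, here is a minimal standalone sketch of the kind of prefix rewrite such a rule performs (an illustrative helper, not vLLM's actual `WeightsMapper` implementation; the checkpoint key below is a made-up example):

```python
# Illustrative sketch only -- not vLLM's WeightsMapper implementation.
# It shows how a single orig_to_new_prefix rule turns a checkpoint weight
# name into the name the SiglipVisionModel loader expects.
def remap_prefix(name: str, orig_to_new_prefix: dict) -> str:
    """Rewrite the first matching prefix of a checkpoint weight name."""
    for orig, new in orig_to_new_prefix.items():
        if name.startswith(orig):
            return new + name[len(orig):]
    return name  # no rule matched; keep the name unchanged

rule = {"vision_tower.vision_model.": "vision_model."}

# Hypothetical checkpoint key from a Gemma-3 VLM artifact:
ckpt_name = "vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight"
print(remap_prefix(ckpt_name, rule))
# -> vision_model.encoder.layers.0.self_attn.q_proj.weight
```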
Can you move this mapping from `Gemma3ForCausalLM` to `Gemma3ForConditionalGeneration`? Having this mapping in `Gemma3ForCausalLM` can be confusing.
Hi @Isotr0py,

Thank you for the excellent and clear guidance. Your feedback was incredibly helpful.

I have just pushed an updated commit that implements your suggestion precisely. Specifically, I have reverted the previous changes to `gemma3.py` and have now placed the fix within `gemma3_mm.py`. I located the existing `WeightsMapper` in the `Gemma3ForConditionalGeneration` class and simply added the necessary rule (`"vision_tower.vision_model.": "vision_model."`) to its `orig_to_new_prefix` dictionary.

This is a much cleaner solution that correctly places the vision-related logic within the multi-modal class, fully addressing your concern about keeping `Gemma3ForCausalLM` clean.

The PR should now be ready for another look. Thanks again for your help!
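For reference, a hedged sketch of roughly what the updated attribute in `Gemma3ForConditionalGeneration` looks like after this change (the class's pre-existing mapping rules are elided rather than reproduced; only the added rule is taken from this PR):

```python
# Hedged sketch of the mapper in Gemma3ForConditionalGeneration (gemma3_mm.py).
# Pre-existing rules of the class are elided here, not invented.
hf_to_vllm_mapper = WeightsMapper(
    orig_to_new_prefix={
        # ... existing prefix rules of the class remain unchanged ...
        # Added by this PR: strip the VLM "vision_tower." wrapper so the inner
        # SiglipVisionModel receives keys that start with "vision_model."
        "vision_tower.vision_model.": "vision_model.",
    })
```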
Signed-off-by: Vensenmu <vensenmu@gmail.com>
Signed-off-by: Vensenmu <vensenmu@gmail.com>
Force-pushed from 727cb28 to 4ed2749
Force-pushed from 4ed2749 to e74d681
Hi @Isotr0py, thank you for triggering the final CI run! It seems that one of the CI jobs has failed, and this looks like it might be a transient CI environment or network issue. Would it be possible to re-run the failed job? Please let me know if there's anything I need to do on my end. Thank you!
…lm-compressor (vllm-project#19643) Signed-off-by: Vensenmu <vensenmu@gmail.com> Signed-off-by: juncheoll <th6re8e@naver.com>
…lm-compressor (vllm-project#19643) Signed-off-by: Vensenmu <vensenmu@gmail.com> Signed-off-by: fhl <2410591650@qq.com>
…lm-compressor (vllm-project#19643) Signed-off-by: Vensenmu <vensenmu@gmail.com>
Description
This PR resolves a `KeyError` that occurs when attempting to serve a Gemma-3 model that has been quantized by the vllm-project/llm-compressor library. It makes the SigLIP vision model loader more robust to variations in weight-naming conventions.

This directly addresses the root cause of the bug reported in vllm-project/llm-compressor#1546.
Root Cause
The `KeyError` arises from a mismatch in weight-naming conventions between vLLM's internal `SiglipVisionModel` architecture and the standard model artifacts produced by tools like `llm-compressor`.

Specifically, the `SiglipVisionModel` in vLLM prepends an additional `vision_model.` prefix to its parameters (e.g., expecting `vision_model.vision_model.encoder...`), while the quantized model artifact uses the standard Hugging Face naming convention (e.g., `vision_model.encoder...`).

The existing `load_weights` function in `siglip.py` strictly expected the internal vLLM format, leading to a `KeyError` when it encountered a weight name from the quantized model.
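To make the mismatch concrete, a minimal sketch of the failure mode (the key names are illustrative examples, not taken from a real checkpoint):

```python
# Illustration of the naming mismatch described above; key names are examples.

# What vLLM's SiglipVisionModel registers internally (extra "vision_model." prefix):
params_dict = {
    "vision_model.vision_model.encoder.layers.0.mlp.fc1.weight": object(),
}

# What the llm-compressor artifact actually contains (standard HF naming):
checkpoint_name = "vision_model.encoder.layers.0.mlp.fc1.weight"

try:
    param = params_dict[checkpoint_name]
except KeyError:
    print(f"KeyError: {checkpoint_name!r} is not a registered parameter name")
```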
Solution

This patch introduces flexible remapping logic within the `load_weights` function in `vllm/model_executor/models/siglip.py`: when a weight name from the checkpoint is not found in `params_dict`, the loader retries the lookup with the internal `vision_model.` prefix prepended.

This makes the loader robust to this naming discrepancy, resolving the `KeyError`.
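Restated as a self-contained helper, a hedged sketch of this fallback (based on the diff shown earlier in this thread; the helper name and the logging call are illustrative simplifications, not the literal patch):

```python
import logging

logger = logging.getLogger(__name__)

def resolve_param_name(name: str, params_dict: dict) -> str:
    """Sketch of the loader-side fallback: if the checkpoint name is missing
    from params_dict, retry with vLLM's internal "vision_model." prefix."""
    if name in params_dict:
        return name
    potential_name = f"vision_model.{name}"  # as in the diff shown above
    if potential_name in params_dict:
        logger.info("Remapping weight %s -> %s", name, potential_name)
        return potential_name
    return name  # still missing; let the caller handle the unmatched weight
```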
Testing

The fix was verified in a cloud GPU environment (Colab A100) using the following methodology:

1. The bug was first reproduced, confirming the `KeyError` with the original vLLM code.
2. The official `vllm` package was installed.
3. The `siglip.py` file within the installation was "hot-patched" with the code from this PR.
4. The `vllm serve` command was re-run against the quantized model.

Result:

- `INFO: Remapping weight...` logs were observed, confirming the patch was being correctly executed.
- The `KeyError` was successfully resolved.
- The `vllm` server launched successfully and was able to serve the quantized model.

Thank you for considering this contribution!