Add qwen3 vl #1095
Conversation
Summary of Changes

Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands the model capabilities by adding full support for the Qwen3-VL and Qwen3-VL MoE models. It involves a deep integration of multimodal features, including a new visual encoder, specialized tokenization, and the innovative 'deepstack' mechanism to pass rich visual context through multiple layers. Alongside these major additions, the PR refines existing image processing, enhances Rotary Positional Embeddings for complex scenarios, and optimizes MoE weight handling and Whisper's attention for better performance.
Summary of Changes

Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates the Qwen3 Vision-Language (VL) model into the LightLLM framework, enabling efficient processing of multimodal inputs. It involves adapting the model's architecture, including its Mixture-of-Experts (MoE) components and specialized image processing pipeline, to work seamlessly within the existing inference system. The changes enhance the framework's capability to handle complex visual and textual data, ensuring accurate and performant multimodal inference for Qwen3 VL.
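For readers unfamiliar with the "deepstack" mechanism mentioned in the summaries above, the general idea described for Qwen3-VL-style models is to inject extra visual features into the hidden states of a few early decoder layers at the image-token positions. The toy sketch below is an illustration only, not the PR's actual code; all names, shapes, and layer counts are assumptions.

```python
import torch

# Toy sketch of deepstack-style injection (hypothetical names and shapes).
hidden = torch.randn(1, 32, 4096)                    # (batch, seq_len, hidden_size)
image_token_mask = torch.zeros(1, 32, dtype=torch.bool)
image_token_mask[0, 4:20] = True                     # assume 16 image tokens in the sequence
deepstack_feats = [torch.randn(16, 4096) for _ in range(3)]  # one feature tensor per injected layer

for layer_idx in range(3):                           # inject only into the first few decoder layers
    # ... run decoder layer `layer_idx` on `hidden` here ...
    # Add this layer's visual features onto the hidden states at the image-token positions.
    hidden[image_token_mask] = hidden[image_token_mask] + deepstack_feats[layer_idx]
```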
Code Review
This pull request introduces support for the qwen3_vl and qwen3_vl_moe multimodal models. The changes are extensive, including new model definitions, custom layer implementations for handling vision features and Mixture-of-Experts, and specific weight loading logic. Notably, it adds support for deepstack features and mRoPE. While the implementation is comprehensive, I've identified a critical bug in the model initialization logic that could affect other models, along with a few high-severity issues related to potential runtime errors and several medium-severity issues for code cleanup. My feedback focuses on ensuring correctness, robustness, and maintainability.
```python
self.model = (
    Qwen2_5_VisionTransformerPretrainedModel(kvargs, **model_cfg["vision_config"]).eval().bfloat16()
)
elif self.model_type == "qwen3_vl" or "qwen3_vl_moe":
```
The condition `or "qwen3_vl_moe"` is logically incorrect and will always evaluate to `True` because a non-empty string is truthy. This will cause this block to execute for any model type that doesn't match the preceding `if`/`elif` conditions, preventing other models from being initialized correctly. You should use `in` to check for membership in a list of model types.
```diff
- elif self.model_type == "qwen3_vl" or "qwen3_vl_moe":
+ elif self.model_type in ["qwen3_vl", "qwen3_vl_moe"]:
```
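As a quick illustration of the truthiness issue (a toy snippet, not code from the PR):

```python
model_type = "whisper"  # any value that does not match the earlier branches

# Original form: `or "qwen3_vl_moe"` is a non-empty string, so the whole expression is always truthy.
print(model_type == "qwen3_vl" or "qwen3_vl_moe")  # prints "qwen3_vl_moe", which is truthy
# The membership test behaves as intended.
print(model_type in ["qwen3_vl", "qwen3_vl_moe"])  # prints False
```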
lightllm/models/llama/model.py (Outdated)
```python
        Model-specific initialization
        """
        rope_scaling = self.config.get("rope_scaling", None)
        if "mrope_section" in rope_scaling:
```
This condition will raise a `TypeError` if `rope_scaling` is `None`, which is the default value assigned on the preceding line. You should add a check to ensure `rope_scaling` is not `None` before attempting to access it as a dictionary.
| if "mrope_section" in rope_scaling: | |
| if rope_scaling and "mrope_section" in rope_scaling: |
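A minimal reproduction of the failure mode and the guarded form (toy snippet, not the PR's code):

```python
rope_scaling = None  # the default returned when the config has no "rope_scaling" entry

# "mrope_section" in rope_scaling  # would raise: TypeError: argument of type 'NoneType' is not iterable

# The guarded condition short-circuits on None before the membership test runs.
if rope_scaling and "mrope_section" in rope_scaling:
    mrope_section = rope_scaling["mrope_section"]
```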
```python
h_bar = round(height / factor) * factor
w_bar = round(width / factor) * factor
```
The calculation for `h_bar` and `w_bar` could result in zero if `height` or `width` is very small compared to `factor` (e.g., if `height / factor < 0.5`). The previous implementation prevented this by using `max(factor, ...)`. That guard has been moved into one branch of the conditional but is missing from the initial calculation, which could lead to errors or incorrect resizing for small images. It's safer to restore the `max` guard for the initial calculation.
```diff
- h_bar = round(height / factor) * factor
- w_bar = round(width / factor) * factor
+ h_bar = max(factor, round(height / factor) * factor)
+ w_bar = max(factor, round(width / factor) * factor)
```
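A small sketch of why the `max` guard matters for tiny images, assuming a patch-alignment factor in the spirit of `IMAGE_FACTOR` (the value 28 here is an assumption, not taken from the PR):

```python
IMAGE_FACTOR = 28  # assumed alignment factor, for illustration only

def round_to_factor(height: int, width: int, factor: int = IMAGE_FACTOR) -> tuple[int, int]:
    # Clamp to at least one factor so tiny inputs never round down to zero.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    return h_bar, w_bar

# Without the max() guard, a 10x10 input would round to 0x0; with it, we get a valid size.
print(round_to_factor(10, 10))  # -> (28, 28)
```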
lightllm/models/llama/model.py (Outdated)
```python
freqs = torch.outer(t, inv_freq)

self.freqs = freqs.cuda()
print(f"model.freqs is {self.freqs}")
```
```python
self.position_cos = torch.cat([cos_half, cos_half], dim=-1).contiguous()  # [L, d]
self.position_sin = torch.cat([sin_half, sin_half], dim=-1).contiguous()
if self.is_prefill:
    pos = None
```
Code Review
This pull request introduces support for the Qwen3-VL and Qwen3-VL-MoE models. The changes are extensive, covering new model definitions, a custom vision tower with deepstack feature injection, multi-modal RoPE (mRoPE), and updates to weight loading for fused MoE layers. The implementation also includes a mechanism for passing deepstack features via shared memory. Additionally, there's a performance improvement for the Whisper model by integrating SDPA.
The overall approach is sound, but I've found a few critical issues that need to be addressed, particularly a potential TypeError in RoPE initialization and a buggy conditional in the visual server. I've also noted some areas for code cleanup, such as removing debug prints and dead code.
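Regarding the Whisper SDPA change mentioned above, the general pattern is to replace a hand-rolled softmax(QKᵀ)V computation with PyTorch's fused kernel. The snippet below is a generic sketch, not the PR's Whisper code; shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim) layout expected by scaled_dot_product_attention.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Fused attention; PyTorch selects a flash/memory-efficient kernel when one is available.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```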
lightllm/models/llama/model.py (Outdated)
| if "mrope_section" in rope_scaling: | ||
| self.mrope_section = rope_scaling["mrope_section"] |
The check `if "mrope_section" in rope_scaling:` will raise a `TypeError` if `rope_scaling` is `None`, which is a possible value assigned on the preceding line. You should ensure `rope_scaling` is not `None` before attempting to access it.
| if "mrope_section" in rope_scaling: | |
| self.mrope_section = rope_scaling["mrope_section"] | |
| if rope_scaling and "mrope_section" in rope_scaling: |
```python
self.model = (
    Qwen2_5_VisionTransformerPretrainedModel(kvargs, **model_cfg["vision_config"]).eval().bfloat16()
)
elif self.model_type == "qwen3_vl" or "qwen3_vl_moe":
```
The condition `self.model_type == "qwen3_vl" or "qwen3_vl_moe"` is always true because the non-empty string `"qwen3_vl_moe"` evaluates to `True`. This will cause incorrect model loading for any subsequent model types. This should be corrected to check for membership in a list of model types.
```diff
- elif self.model_type == "qwen3_vl" or "qwen3_vl_moe":
+ elif self.model_type in ["qwen3_vl", "qwen3_vl_moe"]:
```
```python
if self.fused_gate_up:
    raise ValueError("qwen3_vl_moe not support quant now")
```
This is a good safeguard to prevent running quantization on a model variant that does not support it yet. However, instead of raising a ValueError, it would be more informative to log a warning and skip quantization for this case, allowing the model to load and run in a non-quantized mode. This would provide more flexibility.
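A minimal sketch of the warn-and-skip alternative suggested here; the helper name and return convention are hypothetical, not attributes or functions from the PR:

```python
import logging

logger = logging.getLogger(__name__)

def maybe_quantize(fused_gate_up: bool, quant_cfg):
    # Hypothetical helper: warn and fall back to unquantized weights instead of raising.
    if fused_gate_up:
        logger.warning(
            "qwen3_vl_moe does not support quantization yet; loading unquantized MoE weights instead."
        )
        return None  # caller treats None as "no quantization"
    return quant_cfg
```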
lightllm/models/llama/model.py (Outdated)
```python
freqs = torch.outer(t, inv_freq)

self.freqs = freqs.cuda()
print(f"model.freqs is {self.freqs}")
```
```python
def resize_image(
    image_file: Image.Image, factor: int = IMAGE_FACTOR, min_pixels: int = MIN_PIXELS, max_pixels: int = MAX_PIXELS
) -> tuple[Image.Image, int, int]:
```
The function `resize_image` is type-hinted to return `tuple[Image.Image, int, int]`, but the implementation only returns a single `Image.Image` object. The type hint should be corrected to match the actual return value.
```diff
  def resize_image(
      image_file: Image.Image, factor: int = IMAGE_FACTOR, min_pixels: int = MIN_PIXELS, max_pixels: int = MAX_PIXELS
- ) -> tuple[Image.Image, int, int]:
+ ) -> Image.Image:
```
```
args:
    x: (3, bs, seq_len, head_dim // 2)
    mrope_section: (3,)
returns:
    x_t: (bs, seq_len, head_dim // 2)
```
The docstring for `apply_interleaved_mrope` is inaccurate. The argument is `freqs` with a shape of `(3, seq_len, head_dim // 2)`, but the docstring refers to `x` with a `bs` (batch size) dimension, which is not present. The return value shape is also `(seq_len, head_dim // 2)`. Please update the docstring for clarity.
```diff
  args:
-     x: (3, bs, seq_len, head_dim // 2)
+     freqs: (3, seq_len, head_dim // 2)
      mrope_section: (3,)
  returns:
-     x_t: (bs, seq_len, head_dim // 2)
+     freqs_t: (seq_len, head_dim // 2)
```
```python
axis_map = []
for i, n in enumerate(self.mrope_section * 2):
    axis_map += [i % 3] * n
self.axis_map = torch.tensor(axis_map, dtype=torch.int32, device="cuda")
```
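For context on how such an axis map can be used, a toy gather is sketched below: each rotary channel picks its frequency from one of the three positional axes (temporal/height/width). This is an illustration under assumed section sizes and shapes, not the PR's implementation.

```python
import torch

mrope_section = [16, 24, 24]                 # assumed sections; sums to head_dim // 2 = 64
axis_map = []
for i, n in enumerate(mrope_section * 2):    # list is duplicated to cover the full head_dim
    axis_map += [i % 3] * n
axis_map = torch.tensor(axis_map)            # shape: (head_dim,), values in {0, 1, 2}

seq_len, head_dim = 8, 2 * sum(mrope_section)
freqs = torch.randn(3, seq_len, head_dim)    # toy per-axis frequencies (t, h, w)

# For every channel, take the frequency of the axis that channel is assigned to.
channel_idx = torch.arange(head_dim)
merged = freqs[axis_map, :, channel_idx]     # shape: (head_dim, seq_len)
merged = merged.transpose(0, 1).contiguous() # shape: (seq_len, head_dim)
print(merged.shape)  # torch.Size([8, 128])
```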
```python
)


class Qwen3VLTransformerLayerWeight(Qwen3TransformerLayerWeight):  # 后面看要不要改
```
This comment `# 后面看要不要改` (which translates to "check later if this needs to be changed") appears to be a temporary note and should be removed from the code.
```diff
- class Qwen3VLTransformerLayerWeight(Qwen3TransformerLayerWeight):  # 后面看要不要改
+ class Qwen3VLTransformerLayerWeight(Qwen3TransformerLayerWeight):
```
No description provided.