fix mm slicer #1104
Conversation
Summary of Changes (from gemini-code-assist): Hello @hiworldwzj, I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request addresses a bug in the handling of AWQ (Activation-aware Weight Quantization) weights within the tensor-parallel slicing logic.
Code Review
This pull request refactors the tensor-parallel slicing logic, particularly to add support for AWQ quantized weights which have a different memory layout. The changes simplify bias handling by removing the bias_div_world_size flag, which is a good improvement. However, I've found two critical issues where the new AWQ slicer classes are swapped between the column-parallel and row-parallel weight implementations, which will cause the model to fail. Additionally, the naming convention for all slicer classes is counter-intuitive and is likely the root cause of these bugs; I've suggested a refactoring to improve clarity and maintainability.
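To make the layout issue concrete before diving into the diffs, here is a minimal sketch (shapes and variable names are illustrative, not code from this PR) of how the slicing dimension flips between the standard `out x in` layout and AWQ's `in x out` layout for a column-parallel shard:

```python
import torch

tp_rank, tp_world_size = 0, 2
out_features, in_features = 8, 4

# Standard layout: weight is (out_features, in_features).
# Column-parallelism splits the out dimension -> slice dim 0.
w_std = torch.randn(out_features, in_features)
shard = out_features // tp_world_size
w_std_col = w_std[shard * tp_rank : shard * (tp_rank + 1), :]   # shape (4, 4)

# AWQ layout: the weight is stored transposed, (in_features, out_features).
# The same column-parallel split now lands on dim 1.
w_awq = torch.randn(in_features, out_features)
w_awq_col = w_awq[:, shard * tp_rank : shard * (tp_rank + 1)]   # shape (4, 4)
```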
In `colmm_weight.py`:

```diff
-        self.param_slicer = QuantizedRowSliceMixin(
-            tp_rank=tp_rank, tp_world_size=tp_world_size, bias_div_world_size=True
-        )
+        self.param_slicer = AwqQuantizedColSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
```
There appears to be a mix-up in the slicer class used here. `AWQCOLMMWeight` implements a column-parallel linear layer. For AWQ's `in x out` weight layout, this requires slicing along dimension 1 (the out dimension). The correct slicer for this is `AwqQuantizedRowSliceMixin`, which correctly handles slicing for weights, scales, zero-points, and bias for column-parallelism with AWQ weights.

However, `AwqQuantizedColSliceMixin` is used, which is designed for row-parallelism. This will result in incorrect tensor slicing and will break the model. You'll also need to update the import on line 11 to bring in `AwqQuantizedRowSliceMixin`.
Suggested change:

```diff
-        self.param_slicer = AwqQuantizedColSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
+        self.param_slicer = AwqQuantizedRowSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
```
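For reference, a hedged sketch of what a column-parallel shard looks like across all three AWQ tensors. The `pack_factor` and `group_size` values and the exact packed shapes follow the common AWQ GEMM convention and are assumptions, not taken from this repository:

```python
import torch

tp_rank, tp_world_size = 1, 2
in_features, out_features = 64, 128
group_size, pack_factor = 32, 8  # assumed AWQ GEMM packing parameters

# Assumed AWQ GEMM shapes: all three tensors carry the out dimension along dim 1.
qweight = torch.zeros(in_features, out_features // pack_factor, dtype=torch.int32)
scales = torch.zeros(in_features // group_size, out_features, dtype=torch.float16)
qzeros = torch.zeros(in_features // group_size, out_features // pack_factor, dtype=torch.int32)

def dim1_shard(t: torch.Tensor) -> torch.Tensor:
    # Column-parallelism splits the out dimension, which is dim 1 in AWQ's layout.
    shard = t.shape[1] // tp_world_size
    return t[:, shard * tp_rank : shard * (tp_rank + 1)]

qweight_tp, scales_tp, qzeros_tp = map(dim1_shard, (qweight, scales, qzeros))
# Per-rank shapes: (64, 8), (2, 64), (2, 8)
```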
In `rowmm_weight.py`:

```diff
-            tp_rank=tp_rank, tp_world_size=tp_world_size, bias_div_world_size=False
-        )
+        self.param_slicer = AwqQuantizedRowSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
```
There appears to be a mix-up in the slicer class used here. `AWQROWMMWeight` implements a row-parallel linear layer. For AWQ's `in x out` weight layout, this requires slicing along dimension 0 (the in dimension). The correct slicer for this is `AwqQuantizedColSliceMixin`, which correctly handles slicing for weights, scales, zero-points, and bias for row-parallelism with AWQ weights.

However, `AwqQuantizedRowSliceMixin` is used, which is designed for column-parallelism. This will result in incorrect tensor slicing and will break the model. You'll also need to update the import on line 12 to bring in `AwqQuantizedColSliceMixin`.
Suggested change:

```diff
-        self.param_slicer = AwqQuantizedRowSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
+        self.param_slicer = AwqQuantizedColSliceMixin(tp_rank=tp_rank, tp_world_size=tp_world_size)
```
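To see why the dim-0 slice (together with the divided bias in `_slice_bias` in the mixin code below) is the right combination for row-parallelism, here is a self-contained sketch using float stand-ins for the dequantized weight. All names, shapes, and values here are illustrative, not from the PR:

```python
import torch

torch.manual_seed(0)
tp_world_size = 2
in_features, out_features = 8, 6

# Hypothetical float stand-ins for a dequantized AWQ weight (in x out layout).
x = torch.randn(3, in_features)
w = torch.randn(in_features, out_features)
bias = torch.randn(out_features)

full = x @ w + bias

# Row-parallelism: shard the in dimension (dim 0 of w, dim 1 of x).
shard = in_features // tp_world_size
partials = []
for rank in range(tp_world_size):
    x_r = x[:, shard * rank : shard * (rank + 1)]
    w_r = w[shard * rank : shard * (rank + 1), :]
    # Each rank adds bias / world_size so the all-reduce sum adds it exactly once.
    partials.append(x_r @ w_r + bias / tp_world_size)

reduced = torch.stack(partials).sum(dim=0)  # stands in for the all-reduce
assert torch.allclose(reduced, full, atol=1e-5)
```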
```python
# AWQ quantized weights are stored in an in x out layout, so slicing needs a custom implementation.
class AwqQuantizedRowSliceMixin(QuantizedColSliceMixin):
    def __init__(self, tp_rank: int = None, tp_world_size: int = None):
        super().__init__(tp_rank, tp_world_size)

    def _slice_bias(self, bias: torch.Tensor) -> torch.Tensor:
        assert bias.shape[0] % self.tp_world_size_ == 0, f"tp slice error {bias.shape[0]} % {self.tp_world_size_}"
        tp_size = bias.shape[0] // self.tp_world_size_
        return bias[tp_size * self.tp_rank_ : tp_size * (self.tp_rank_ + 1)]


class AwqQuantizedColSliceMixin(QuantizedRowSliceMixin):
    def __init__(self, tp_rank: int = None, tp_world_size: int = None):
        super().__init__(tp_rank, tp_world_size)

    def _slice_bias(self, bias: torch.Tensor) -> torch.Tensor:
        return bias / self.tp_world_size_
```
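One subtlety worth calling out in the code above: `AwqQuantizedColSliceMixin._slice_bias` divides the bias by the world size instead of slicing it. That is consistent with row-parallelism, where every rank produces a partial sum over its shard of the in dimension and the outputs are then all-reduced; if each rank adds `bias / tp_world_size_`, the reduced result carries the full bias exactly once, as the verification sketch earlier in this review demonstrates. This reading is inferred from the code shown here, not stated in the PR.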
The naming of these new AWQ slicer classes, and the base slicer classes they inherit from, is counter-intuitive and has likely led to the bugs in `colmm_weight.py` and `rowmm_weight.py`.

- `AwqQuantizedRowSliceMixin` is for column-parallelism.
- `AwqQuantizedColSliceMixin` is for row-parallelism.

This is because they are named based on the dimension they slice for a standard `out x in` weight, but for AWQ's `in x out` weights, the slicing dimension is swapped for a given parallelism type. This makes the code very difficult to reason about.

To improve clarity and prevent future bugs, I strongly recommend renaming all slicer classes to reflect the parallelism strategy they implement, rather than the dimension they slice. For example (see the sketch after this list):

- `RowSliceMixin` -> `ColumnParallelSliceMixin`
- `ColSliceMixin` -> `RowParallelSliceMixin`
- `AwqQuantizedRowSliceMixin` -> `AwqColumnParallelSliceMixin`
- `AwqQuantizedColSliceMixin` -> `AwqRowParallelSliceMixin`
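A minimal, names-only sketch of how the suggested renaming might read (bodies elided; this is an illustration of the naming scheme, not existing code in the repository):

```python
class ColumnParallelSliceMixin:     # was: RowSliceMixin (slices dim 0 of an out x in weight)
    ...

class RowParallelSliceMixin:        # was: ColSliceMixin (slices dim 1 of an out x in weight)
    ...

class AwqColumnParallelSliceMixin:  # was: AwqQuantizedRowSliceMixin (slices dim 1 of an in x out weight)
    ...

class AwqRowParallelSliceMixin:     # was: AwqQuantizedColSliceMixin (slices dim 0 of an in x out weight)
    ...
```

With these names, the class used at each call site can be checked against the parallelism of the surrounding layer at a glance, rather than by reasoning about which dimension happens to be sliced in which storage layout.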