
Lack of exclude_layers_to_not_quantize after get_named_linears #3

Merged 2 commits on Feb 29, 2024

Conversation

noah-kim-theori (Contributor)

Some models, such as Mixtral, have linear layers whose hidden size is smaller than 128:

  • 4096x8 weight matrix of block_sparse_moe.gate (number of experts = 8)

This causes an integer modulo-by-zero exception when quantizing Mixtral.

Traceback (most recent call last):
  File "/home/noah/AIOS-demo/demo/test.script/test.quant.py", line 23, in <module>
    model.quantize(
  File "/home/noah/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/QUICK/quick/awq/models/base.py", line 119, in quantize
    self.quantizer.quantize()
  File "/QUICK/quick/awq/quantize/quantizer.py", line 135, in quantize
    self._apply_quant(self.modules[i], named_linears)
  File "/QUICK/quick/awq/quantize/quantizer.py", line 184, in _apply_quant
    q_linear = q_linear_module.from_linear(
  File "/QUICK/quick/awq/modules/linear/quick.py", line 127, in from_linear
    ((x // 32) % (intweight.shape[1] // 128)) * 128 + \
     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: integer modulo by zero
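
The last frame shows the root cause: when a layer is this narrow, intweight.shape[1] // 128 evaluates to 0, so the modulo has a zero divisor. A minimal illustration with assumed values (the exact packed shape depends on QUICK's from_linear, so treat the numbers as hypothetical):

    # Assumed, illustrative values: the packed gate weight ends up with far
    # fewer than 128 columns, unlike ordinary attention/MLP projections.
    intweight_cols = 8                  # hypothetical packed column count for the 4096x8 gate
    x = 0                               # index used inside from_linear's reordering
    groups = intweight_cols // 128      # 8 // 128 == 0
    idx = ((x // 32) % groups) * 128    # raises ZeroDivisionError: integer modulo by zero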

To prevent this, the layers that should not be quantized need to be excluded after get_named_linears, as is already done at quantizer.py#L112.
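
Below is a minimal sketch of the intended exclusion; the helper is re-implemented here for illustration only, and the real exclude_layers_to_not_quantize / modules_to_not_convert in QUICK may differ in signature and matching rules:

    import torch.nn as nn

    def exclude_layers_to_not_quantize(named_linears, modules_to_not_convert):
        # Drop every linear layer whose name contains an excluded pattern,
        # so layers like block_sparse_moe.gate never reach from_linear().
        if not modules_to_not_convert:
            return named_linears
        return {
            name: layer
            for name, layer in named_linears.items()
            if not any(pattern in name for pattern in modules_to_not_convert)
        }

    # Illustrative usage with a Mixtral-style block: the 8-expert router gate
    # is filtered out before quantization, while the attention projection is kept.
    named_linears = {
        "block_sparse_moe.gate": nn.Linear(4096, 8, bias=False),
        "self_attn.q_proj": nn.Linear(4096, 4096, bias=False),
    }
    named_linears = exclude_layers_to_not_quantize(named_linears, ["block_sparse_moe.gate"])
    assert "block_sparse_moe.gate" not in named_linears

With the gate filtered out, from_linear only ever sees layers wide enough for the 128-column grouping.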

JHLEE17 (Contributor) commented Feb 29, 2024

Hi @noah-kim-theori

Thank you so much for your contribution to our project through the pull request!
We appreciate your effort in identifying the issue with models like Mixtral and the integer modulo-by-zero exception during quantization.

I wanted to let you know that we have addressed the issue you pointed out in pull request #6, where we have implemented the necessary fixes based on your suggestions. Please refer to this for the changes we made.

Please let us know if there are any other enhancements you believe could benefit the project or if you have any more feedback.

Best regards,

JHLEE17 merged commit 6029de9 into SqueezeBits:main on Feb 29, 2024