
Lack of exclude_layers_to_not_quantize after get_named_linears #3

Merged 2 commits on Feb 29, 2024

Conversation

noah-kim-theori (Contributor)

Some models, such as Mixtral, have linear layers whose hidden size is smaller than 128:

  • 4096x8 weight matrix of block_sparse_moe.gate (number of experts = 8)

This causes an integer modulo-by-zero exception when quantizing Mixtral.

Traceback (most recent call last):
  File "/home/noah/AIOS-demo/demo/test.script/test.quant.py", line 23, in <module>
    model.quantize(
  File "/home/noah/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/QUICK/quick/awq/models/base.py", line 119, in quantize
    self.quantizer.quantize()
  File "/QUICK/quick/awq/quantize/quantizer.py", line 135, in quantize
    self._apply_quant(self.modules[i], named_linears)
  File "/QUICK/quick/awq/quantize/quantizer.py", line 184, in _apply_quant
    q_linear = q_linear_module.from_linear(
  File "/QUICK/quick/awq/modules/linear/quick.py", line 127, in from_linear
    ((x // 32) % (intweight.shape[1] // 128)) * 128 + \
     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: integer modulo by zero
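
The last frame shows the root cause: when a layer is this narrow, intweight.shape[1] // 128 evaluates to 0, so the modulo has a zero divisor. A minimal illustration with assumed values (the exact packed shape depends on QUICK's from_linear, so treat the numbers as hypothetical):

    # Assumed, illustrative values: the packed gate weight ends up with far
    # fewer than 128 columns, unlike ordinary attention/MLP projections.
    intweight_cols = 8                  # hypothetical packed column count for the 4096x8 gate
    x = 0                               # index used inside from_linear's reordering
    groups = intweight_cols // 128      # 8 // 128 == 0
    idx = ((x // 32) % groups) * 128    # raises ZeroDivisionError: integer modulo by zero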

To prevent this, the layers that should not be quantized need to be excluded after get_named_linears, as is already done at quantizer.py#L112.
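
Below is a minimal sketch of the intended exclusion; the helper is re-implemented here for illustration only, and the real exclude_layers_to_not_quantize / modules_to_not_convert in QUICK may differ in signature and matching rules:

    import torch.nn as nn

    def exclude_layers_to_not_quantize(named_linears, modules_to_not_convert):
        # Drop every linear layer whose name contains an excluded pattern,
        # so layers like block_sparse_moe.gate never reach from_linear().
        if not modules_to_not_convert:
            return named_linears
        return {
            name: layer
            for name, layer in named_linears.items()
            if not any(pattern in name for pattern in modules_to_not_convert)
        }

    # Illustrative usage with a Mixtral-style block: the 8-expert router gate
    # is filtered out before quantization, while the attention projection is kept.
    named_linears = {
        "block_sparse_moe.gate": nn.Linear(4096, 8, bias=False),
        "self_attn.q_proj": nn.Linear(4096, 4096, bias=False),
    }
    named_linears = exclude_layers_to_not_quantize(named_linears, ["block_sparse_moe.gate"])
    assert "block_sparse_moe.gate" not in named_linears

With the gate filtered out, from_linear only ever sees layers wide enough for the 128-column grouping.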

JHLEE17 (Contributor) commented Feb 29, 2024

Hi @noah-kim-theori

Thank you so much for your contribution to our project through the pull request!
We appreciate your effort in identifying the issue with models like Mixtral and the integer modulo-by-zero exception during quantization.

I wanted to let you know that we have addressed the issue you pointed out in pull request #6, where we have implemented the necessary fixes based on your suggestions. Please refer to this for the changes we made.

Please let us know if there are any other enhancements you believe could benefit the project or if you have any more feedback.

Best regards,

JHLEE17 merged commit 6029de9 into SqueezeBits:main on Feb 29, 2024