
Fix result dtype conversion in QuantLinear.forward() #390

Closed

Conversation

vivekkhandelwal1
Contributor

Fixes: AutoGPTQ#385 (comment)

Signed-Off By: Vivek Khandelwal <vivek@nod-labs.com>
@vivekkhandelwal1
Contributor Author

@fxmarty, can you please review this PR?

@@ -268,8 +268,8 @@ def forward(self, x: torch.Tensor):
 g_idx_i = self.g_idx[i*num_dim:(i+1)*num_dim]
 weights.append(scale_i[g_idx_i.long()] * (weight_i - zeros_i[g_idx_i.long()]))
 weights = torch.cat(weights,dim=1)
-out = torch.matmul(x, weights)
-out = out.to(dtype=weights.dtype).reshape(out_shape)
+out = torch.matmul(x, weights).to(dtype=weights.dtype)
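
For context on the hunk above: the visible change folds the dtype cast into the matmul result instead of applying it together with the reshape. A minimal standalone sketch of that pattern follows; the shapes, the 2-D flattening, and the trailing reshape are illustrative assumptions about the surrounding forward() code, not lines taken from this PR.

import torch

x = torch.randn(2, 3, 4)                         # (batch, seq, in_features), same dtype as weights
weights = torch.randn(4, 8)                      # dequantized weight, (in_features, out_features)
out_shape = x.shape[:-1] + (weights.shape[-1],)  # leading dims of x plus out_features

x2d = x.reshape(-1, x.shape[-1])                          # flatten leading dims for the matmul
out = torch.matmul(x2d, weights).to(dtype=weights.dtype)  # cast the result right after the matmul
out = out.reshape(out_shape)                              # then restore the original leading dims
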
Collaborator

I don't remember why these casts are needed in the first place. Shouldn't the activation & weight be of the same dtype (either both fp32 or both fp16)?
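
For reference, a small standalone snippet (not part of this PR) illustrating the question above: torch.matmul does not promote mixed float dtypes, so if dequantization yields fp16 weights while the activations are fp32 (or vice versa), one side has to be cast explicitly. The exact error text varies by PyTorch version.

import torch

x = torch.randn(2, 4, dtype=torch.float32)   # activation in fp32
w = torch.randn(4, 8, dtype=torch.float16)   # dequantized weight in fp16

try:
    torch.matmul(x, w)                       # mismatched dtypes: matmul raises
except RuntimeError as e:
    print(e)

out = torch.matmul(x.to(w.dtype), w)         # casting one operand makes the dtypes match
print(out.dtype)                             # torch.float16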

@vivekkhandelwal1
Contributor Author

vivekkhandelwal1 commented Nov 1, 2023

EDIT: The error is happening because of this change: a7d61ca#diff-c4c2bf0dd8440248a29510131f06affa3c2ab00d1bd7ca507dc0b7125a04f825R20

@fxmarty, I'm getting the following error:

File "/home/vivek/work/vivek-AutoGPTQ/repro_gptq.py", line 13, in <module>
    model = AutoModelForCausalLM.from_pretrained(checkpoint, low_cpu_mem_usage=True, device_map="cpu", quantization_config=quantization_config, torch_dtype=torch.float32)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vivek/work/shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vivek/work/shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2713, in from_pretrained
    from optimum.gptq import GPTQQuantizer
  File "/home/vivek/work/shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/optimum/gptq/__init__.py", line 15, in <module>
    from .quantizer import GPTQQuantizer, load_quantized_model
  File "/home/vivek/work/shark-vivekkhandelwal1/shark.venv/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 44, in <module>
    from auto_gptq import exllama_set_max_input_length
  File "/home/vivek/work/vivek-AutoGPTQ/auto_gptq/__init__.py", line 4, in <module>
    from .utils.peft_utils import get_gptq_peft_model
  File "/home/vivek/work/vivek-AutoGPTQ/auto_gptq/utils/peft_utils.py", line 20, in <module>
    from ..nn_modules.qlinear.qlinear_exllama import QuantLinear as QuantLinearExllama
  File "/home/vivek/work/vivek-AutoGPTQ/auto_gptq/nn_modules/qlinear/qlinear_exllama.py", line 14, in <module>
    from exllama_kernels import make_q4, q4_matmul
ModuleNotFoundError: No module named 'exllama_kernels'

For the following code:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

checkpoint = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
quantization_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(checkpoint, low_cpu_mem_usage=True, device_map="cpu", quantization_config=quantization_config, torch_dtype=torch.float32)

inputs = tokenizer.encode("Hello how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=4, do_sample=False)
print(tokenizer.decode(outputs[0]))

Is this happening because of this commit: bcd1406?
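
The traceback above stops in auto_gptq/nn_modules/qlinear/qlinear_exllama.py, which imports exllama_kernels unconditionally at module import time, so any CPU-only install without the compiled extension fails before the model is even loaded. A minimal sketch of one common way to guard such an optional extension import is shown below; the EXLLAMA_KERNELS_AVAILABLE flag is a name introduced here for illustration, not the fix adopted in the repository.

# Hypothetical guard around the optional CUDA extension import.
try:
    from exllama_kernels import make_q4, q4_matmul
    EXLLAMA_KERNELS_AVAILABLE = True
except ImportError:
    make_q4 = None
    q4_matmul = None
    EXLLAMA_KERNELS_AVAILABLE = False

# Callers can check EXLLAMA_KERNELS_AVAILABLE before constructing the exllama
# QuantLinear instead of failing at import time on CPU-only installs.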

@fxmarty
Collaborator

fxmarty commented Nov 2, 2023

Hi, thank you - superseded by #393

Note this bug in accelerate: huggingface/accelerate#2116

fxmarty closed this Nov 2, 2023