I quantized a LLaMA 33B model with decoupleQ, and inference fails with the following error:
```
Traceback (most recent call last):
  File "/mnt/afs/quantization/test/decoupleQ/llama.py", line 476, in <module>
    model_output = model.generate(input_token_ids_tensor, max_length=40, do_sample=False)
  ...
  File "/mnt/afs/quantization/test/decoupleQ/decoupleQ/linear_w2a16.py", line 36, in forward
    output = dQ_asymm_qw2_gemm(input, self.weight, self.scale, self.zp, self.bias, self.group_size)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Unsupported compute type Float
```
Have you run into this issue before?
Thanks!
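
For reference, the error message suggests the fused W2A16 GEMM kernel rejects fp32 activations, so one possible workaround is to force the model into half precision before generation. This is only a sketch under that assumption; `run_inference`, `model`, and `tokenizer` are placeholders, not decoupleQ APIs:

```python
import torch

# Sketch of a possible workaround, assuming "Unsupported compute type Float" means
# the fused W2A16 kernel only accepts half-precision activations. `model` and
# `tokenizer` are placeholders for however the decoupleQ checkpoint is loaded.
def run_inference(model, tokenizer, prompt: str):
    model = model.half().cuda().eval()  # force fp16 activations on GPU
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
    with torch.no_grad():
        return model.generate(input_ids, max_length=40, do_sample=False)
```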
@ChuanhongLi Did the inputs at some intermediate step of inference become all NaN?
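
One way to check this is to register forward hooks that flag NaN activations. A rough sketch, not part of the decoupleQ code; adapt the module filtering to your model:

```python
import torch

def install_nan_hooks(model):
    """Print a message whenever a module sees or produces NaN tensors."""
    def hook(module, inputs, output):
        for i, t in enumerate(inputs):
            if torch.is_tensor(t) and t.is_floating_point() and torch.isnan(t).all():
                print(f"[NaN] input {i} of {module.__class__.__name__} is all NaN")
        if torch.is_tensor(output) and output.is_floating_point() and torch.isnan(output).any():
            print(f"[NaN] output of {module.__class__.__name__} contains NaN")
    # Register the hook on every submodule; keep the handles so they can be removed later.
    return [m.register_forward_hook(hook) for m in model.modules()]
```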
I'm not entirely sure; the quantized models all have some issues at the moment. #8