Int4 quantization fails with "RuntimeError: CUDA Error: no kernel image is available for execution on the device" #56
Comments
I ran into this problem too: quantizing to int4 the official way, inference fails.

Could you paste your code?

My code is about the same as yours, and it runs fine.

Was this ever solved? I'm hitting the same problem.

The kernel in quantizer.py is the problem; you can replace it with the corresponding code from ChatGLM2.
Quantizing to int4.
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0
Running "streamlit run web_demo.py" starts up normally, but as soon as I ask a question the error below is raised.
[user] 你是谁?
2023-07-13 12:53:14.567 Uncaught app exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/root/Baichuan-13B/web_demo.py", line 72, in <module>
main()
File "/root/Baichuan-13B/web_demo.py", line 61, in main
for response in model.chat(tokenizer, messages, stream=True):
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 527, in stream_generator
for token in self.generate(input_ids, generation_config=stream_config):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/usr/local/lib/python3.8/dist-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 382, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 325, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 178, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 113, in forward
proj = self.W_pack(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 116, in forward
rweight = dequant4(self.weight, self.scale, input).T
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 82, in dequant4
kernels.int4_to_fp16(
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 48, in __call__
func = self._prepare_func()
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
self._module.get_module(), self._func_name
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 24, in get_module
self._module[curr_device] = cuda.cuModuleLoadData(self._code)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 233, in cuModuleLoadData
checkCUStatus(cuda.cuModuleLoadData(ctypes.byref(module), data))
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 216, in checkCUStatus
raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
RuntimeError: CUDA Error: no kernel image is available for execution on the device
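This error usually means the GPU's CUDA compute capability is not among the architectures the precompiled cpm_kernels fatbin was built for. A quick way to check which architecture your GPU reports is the sketch below (a diagnostic only, using PyTorch's standard device-query API; `report_device_capability` is a hypothetical helper name, not part of any library here):

```python
import torch

def report_device_capability():
    """Print each visible GPU's name and compute capability (sm_XY)."""
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return []
    caps = []
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        caps.append((major, minor))
        print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
    return caps

report_device_capability()
```

If the printed `sm_XY` value is not one of the architectures the quantization kernels were compiled for, the driver cannot load a matching kernel image, which matches the `cuModuleLoadData` failure in the traceback above.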