
Int4 quantization throws "RuntimeError: CUDA Error: no kernel image is available for execution on the device" #56

Closed
landxman opened this issue Jul 13, 2023 · 7 comments

@landxman

Quantized to int4.
NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0
Running "streamlit run web_demo.py" starts up fine, but as soon as I ask a question it throws the error below.

[user] 你是谁? (Who are you?)
2023-07-13 12:53:14.567 Uncaught app exception
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
exec(code, module.__dict__)
File "/root/Baichuan-13B/web_demo.py", line 72, in <module>
main()
File "/root/Baichuan-13B/web_demo.py", line 61, in main
for response in model.chat(tokenizer, messages, stream=True):
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 527, in stream_generator
for token in self.generate(input_ids, generation_config=stream_config):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/usr/local/lib/python3.8/dist-packages/transformers_stream_generator/main.py", line 931, in sample_stream
outputs = self(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 382, in forward
outputs = self.model(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 325, in forward
layer_outputs = decoder_layer(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 178, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/modeling_baichuan.py", line 113, in forward
proj = self.W_pack(hidden_states)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 116, in forward
rweight = dequant4(self.weight, self.scale, input).T
File "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py", line 82, in dequant4
kernels.int4_to_fp16(
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 48, in call
func = self._prepare_func()
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
self._module.get_module(), self._func_name
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/kernels/base.py", line 24, in get_module
self._module[curr_device] = cuda.cuModuleLoadData(self._code)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 233, in cuModuleLoadData
checkCUStatus(cuda.cuModuleLoadData(ctypes.byref(module), data))
File "/usr/local/lib/python3.8/dist-packages/cpm_kernels/library/cuda.py", line 216, in checkCUStatus
raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
RuntimeError: CUDA Error: no kernel image is available for execution on the device
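The "no kernel image is available" error at the bottom of the trace usually means the precompiled int4 kernels do not cover this GPU's compute capability. A minimal diagnostic sketch (not from this thread) to record what the failing machine actually has:

import torch

# Print the CUDA toolkit version PyTorch was built against, the GPU model,
# and its compute capability, e.g. (8, 6) for sm_86, to compare against the
# architectures the quantization kernels were compiled for.
print("torch CUDA:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))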

@sun1092469590

I'm hitting the same problem: quantizing to int4 the official way, inference fails.

@jameswu2014
Collaborator

Could you paste your code?

@landxman
Author

def init_model():
    print("init model ...")
    model = AutoModelForCausalLM.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
    model = model.quantize(4).cuda()
    model.generation_config = GenerationConfig.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
    #    use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

@jameswu2014
Collaborator

def init_model():
    print("init model ...")
    model = AutoModelForCausalLM.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
    model = model.quantize(4).cuda()
    model.generation_config = GenerationConfig.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
    #    use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

My code is about the same as yours:
def init_model():
    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        # device_map="auto",
        trust_remote_code=True
    )
    model = model.quantize(4).cuda()
    model.generation_config = GenerationConfig.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat",
        use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

It runs fine on my side.

@bxjxxyy

bxjxxyy commented Jul 20, 2023

def init_model():
    print("init model ...")
    model = AutoModelForCausalLM.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        trust_remote_code=True
    )
    model = model.quantize(4).cuda()
    model.generation_config = GenerationConfig.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "/data/baichuan/Baichuan-13B-Chat",
    #    use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

My code is about the same as yours:

def init_model():
    model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat",
        torch_dtype=torch.float16,
        # device_map="auto",
        trust_remote_code=True
    )
    model = model.quantize(4).cuda()
    model.generation_config = GenerationConfig.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "baichuan-inc/Baichuan-13B-Chat",
        use_fast=False,
        trust_remote_code=True
    )
    return model, tokenizer

It runs fine on my side.

Mine is the same as yours, but it won't run; the error is the same as the OP's.
[screenshot of the same error]
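Since the same init_model code works on one machine and fails on another, the difference is likely in the environment rather than the code. A small sketch (an assumption, not posted in the thread) for comparing the relevant package versions on both machines:

from importlib.metadata import version

# Compare these (plus the GPU details noted earlier) between the machine that
# works and the one that fails. Names are the PyPI distribution names.
for pkg in ("torch", "transformers", "cpm-kernels", "transformers-stream-generator"):
    try:
        print(pkg, version(pkg))
    except Exception:
        print(pkg, "not installed")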

@dalong2hongmei

Has this been solved? I'm running into the same problem.

@shesung

shesung commented Aug 9, 2023

The kernel in quantizer.py has a problem; you can replace it with the code from chatglm2.
https://gist.github.com/shesung/3acd80c22a19d3e019553ad7e497a707
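A minimal sketch of applying that replacement, assuming the gist file has been saved locally as quantizer_from_gist.py (a hypothetical filename) and that the module cache path matches the one in the traceback above:

import shutil

# Overwrite the cached quantizer.py with the version from the gist.
# The destination path is taken from the traceback; adjust it to your setup.
shutil.copy(
    "quantizer_from_gist.py",
    "/root/.cache/huggingface/modules/transformers_modules/Baichuan-13B-Chat/quantizer.py",
)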
