
[BUG/Help] RuntimeError: Library cuda is not initialized when loading the model #839

Closed
1 task done
TJJ120635 opened this issue Apr 27, 2023 · 4 comments

Comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am testing with code based on the official chatglm-6b-int4 example:

# Test CUDA availability
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
from torch.backends import cudnn
print(cudnn.is_available())

# Test the GLM model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).half().cuda()
print("First:")
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
print("Second:")
response, history = model.chat(tokenizer, "请介绍一下你自己", history=history)
print(response)

I have run object detection with the YOLOv5 project on this machine, so CUDA itself is verified working.
The CUDA checks added at the top of the GLM test script also print the expected results.
However, when the model actually runs, the line response, history = model.chat(tokenizer, "你好", history=[]) fails: the cpm_kernels library raises RuntimeError: Library cuda is not initialized.

(I searched for related issues online but could not find the same situation, so I'm asking for help here.)
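A minimal diagnostic sketch of mine (not part of the original report, assuming cpm_kernels resolves the driver library itself via ctypes rather than going through PyTorch), probing libcuda.so directly:

# Probe the CUDA driver library directly, independent of torch.
# torch.cuda.is_available() can succeed while a separate dlopen of
# libcuda.so still fails, which is effectively what cpm_kernels does.
import ctypes
import ctypes.util

print(ctypes.util.find_library("cuda"))  # where the dynamic linker sees libcuda

try:
    ctypes.CDLL("libcuda.so.1")  # installed by the NVIDIA driver package
    print("libcuda.so.1: loadable")
except OSError as e:
    print("libcuda.so.1:", e)

try:
    ctypes.CDLL("/usr/local/cuda/lib64/libcuda.so")  # where cpm_kernels looks (see below)
    print("/usr/local/cuda/lib64/libcuda.so: loadable")
except OSError as e:
    print("/usr/local/cuda/lib64/libcuda.so:", e)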

The full error output is as follows:

(chat) root@ubuntu-virtual-machine:/data1/chat_model# python test.py
2.0.0
11.7
True
True
----------------------------------------------
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 2
Parallel kernel is not recommended when parallel num < 4.
Using quantization cache
Applying quantization to glm layers
First:
The dtype of attention mask (torch.int64) is not bool
Traceback (most recent call last):
  File "/data1/chat_model/test.py", line 15, in <module>
    response, history = model.chat(tokenizer, "你好", history=[])
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1286, in chat
    outputs = self.generate(**inputs, **gen_kwargs)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/transformers/generation/utils.py", line 2468, in sample
    outputs = self(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1191, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 997, in forward
    layer_ret = layer(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 627, in forward
    attention_outputs = self.attention(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 445, in forward
    mixed_raw_layer = self.query_key_value(hidden_states)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 375, in forward
    output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 53, in forward
    weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 274, in extract_weight_to_half
    func(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 48, in __call__
    func = self._prepare_func()
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
    self._module.get_module(), self._func_name
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 24, in get_module
    self._module[curr_device] = cuda.cuModuleLoadData(self._code)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/library/base.py", line 72, in wrapper
    raise RuntimeError("Library %s is not initialized" % self.__name)
RuntimeError: Library cuda is not initialized

Expected Behavior

No response

Steps To Reproduce

Directory structure:
test.py (or cli_demo.py)
chatglm-6b-int4
---- the full repo and weights cloned from huggingface/THUDM/ChatGLM-6B-int4

Test code:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).half().cuda()
print("First:")
response, history = model.chat(tokenizer, "你好", history=[])

Environment

- OS: Ubuntu 20.04.5 LTS
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 2.0.0

- GPU: Nvidia Tesla P40
- Driver: 515.65.01
- CUDA: 11.7
- CuDNN: cudnn-linux-x86_64-8.9.0.131_cuda11
- CUDA Support: True

Anything else?

No response

TJJ120635 (Author) commented Apr 27, 2023

Looking through the source code, I found the following in envs/chat/lib/python3.9/site-packages/cpm_kernels/library/base.py:

# Line 21
def unix_find_lib(name):
    cuda_path = os.environ.get("CUDA_PATH", None)
    if cuda_path is not None:
        lib_name = os.path.join(cuda_path, "lib64", "lib%s.so" % name)
        if os.path.exists(lib_name):
            return lib_name

    cuda_path = "/usr/local/cuda"
    if cuda_path is not None:
        lib_name = os.path.join(cuda_path, "lib64", "lib%s.so" % name)
        if os.path.exists(lib_name):
            return lib_name

# Line 41
class Lib:
    def __init__(self, name):
        self.__name = name
        if sys.platform.startswith("win"):
            lib_path = windows_find_lib(self.__name)
            self.__lib_path = lib_path
            if lib_path is not None:
                self.__lib = ctypes.WinDLL(lib_path)
            else:
                self.__lib = None
        elif sys.platform.startswith("linux"):
            lib_path = unix_find_lib(self.__name)
            self.__lib_path = lib_path
            if lib_path is not None:
                self.__lib = ctypes.cdll.LoadLibrary(lib_path)
            else:
                self.__lib = None
        else:
            raise RuntimeError("Unknown platform: %s" % sys.platform)

I modified the code at line 52 to add a debug print:

lib_path = unix_find_lib(self.__name)
# Edit Here
print(name, ':', lib_path)
self.__lib_path = lib_path

Running the program again, the print showed that the cuda library path came back empty (why?).
It looks like a .so file is missing; this needs further investigation.
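A small sketch of mine, following the unix_find_lib logic quoted above, to list which candidate paths actually contain libcuda.so:

# Reproduce cpm_kernels' unix_find_lib("cuda") search and report what exists.
import os

candidates = []
cuda_path = os.environ.get("CUDA_PATH")
if cuda_path:
    candidates.append(os.path.join(cuda_path, "lib64", "libcuda.so"))
candidates.append("/usr/local/cuda/lib64/libcuda.so")

for path in candidates:
    print(path, "->", "exists" if os.path.exists(path) else "MISSING")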

TJJ120635 (Author) commented

On inspection, the cause was that libcuda.so is missing from the CUDA directory /usr/local/cuda/lib64.
I later found libcuda.so.515.65.01* under /usr/lib/x86_64-linux-gnu.
After copying that file into the CUDA directory and creating a symlink libcuda.so -> libcuda.so.515.65.01*,
the library loads successfully and the problem is solved.
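After the copy and symlink, a quick check (a sketch of mine, assuming the driver file was placed as described above) that the library now loads and initializes from the path cpm_kernels searches:

# Verify the driver library loads from the CUDA toolkit directory and
# that cuInit() succeeds (0 == CUDA_SUCCESS).
import ctypes

lib = ctypes.CDLL("/usr/local/cuda/lib64/libcuda.so")
print("cuInit:", lib.cuInit(0))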

zsscpr commented Jun 9, 2023

I also found that libcuda.so was missing from /usr/local/cuda/lib64. I created the symlink like this: ln -s /usr/lib/x86_64-linux-gnu/libcuda.so /usr/local/cuda/lib64/libcuda.so. Once the symlink was in place, the problem was solved.

fairylulu commented

This worked for me, thanks!
