
[BUG/Help] RuntimeError: Library cuda is not initialized when loading the model #839

Closed
1 task done
TJJ120635 opened this issue Apr 27, 2023 · 4 comments

Comments

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I am testing with code based on the official chatglm-6b-int4 example:

# Test CUDA availability
import torch
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
from torch.backends import cudnn
print(cudnn.is_available())

# Test the GLM model
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).half().cuda()
print("First:")
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
print("Second:")
response, history = model.chat(tokenizer, "请介绍一下你自己", history=history)
print(response)

I have run object detection with the YOLOv5 project on this machine, so CUDA itself is verified working.
The CUDA checks added at the top of the GLM test script also print the expected results.
However, when the model actually runs, the line response, history = model.chat(tokenizer, "你好", history=[]) fails: the cpm_kernels library raises RuntimeError: Library cuda is not initialized.

(I searched for related issues online but could not find the same situation, so I'm asking for help here.)
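A minimal diagnostic sketch of mine (not part of the original report, assuming cpm_kernels resolves the driver library itself via ctypes rather than going through PyTorch), probing libcuda.so directly:

# Probe the CUDA driver library directly, independent of torch.
# torch.cuda.is_available() can succeed while a separate dlopen of
# libcuda.so still fails, which is effectively what cpm_kernels does.
import ctypes
import ctypes.util

print(ctypes.util.find_library("cuda"))  # where the dynamic linker sees libcuda

try:
    ctypes.CDLL("libcuda.so.1")  # installed by the NVIDIA driver package
    print("libcuda.so.1: loadable")
except OSError as e:
    print("libcuda.so.1:", e)

try:
    ctypes.CDLL("/usr/local/cuda/lib64/libcuda.so")  # where cpm_kernels looks (see below)
    print("/usr/local/cuda/lib64/libcuda.so: loadable")
except OSError as e:
    print("/usr/local/cuda/lib64/libcuda.so:", e)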

The full error output is as follows:

(chat) root@ubuntu-virtual-machine:/data1/chat_model# python test.py
2.0.0
11.7
True
True
----------------------------------------------
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.c -shared -o /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Load kernel : /root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 2
Parallel kernel is not recommended when parallel num < 4.
Using quantization cache
Applying quantization to glm layers
First:
The dtype of attention mask (torch.int64) is not bool
Traceback (most recent call last):
  File "/data1/chat_model/test.py", line 15, in <module>
    response, history = model.chat(tokenizer, "你好", history=[])
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1286, in chat
    outputs = self.generate(**inputs, **gen_kwargs)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/transformers/generation/utils.py", line 1452, in generate
    return self.sample(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/transformers/generation/utils.py", line 2468, in sample
    outputs = self(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 1191, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 997, in forward
    layer_ret = layer(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 627, in forward
    attention_outputs = self.attention(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/modeling_chatglm.py", line 445, in forward
    mixed_raw_layer = self.query_key_value(hidden_states)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 375, in forward
    output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 53, in forward
    weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b-int4/quantization.py", line 274, in extract_weight_to_half
    func(
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 48, in __call__
    func = self._prepare_func()
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
    self._module.get_module(), self._func_name
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/kernels/base.py", line 24, in get_module
    self._module[curr_device] = cuda.cuModuleLoadData(self._code)
  File "/root/anaconda3/envs/chat/lib/python3.9/site-packages/cpm_kernels/library/base.py", line 72, in wrapper
    raise RuntimeError("Library %s is not initialized" % self.__name)
RuntimeError: Library cuda is not initialized

Expected Behavior

No response

Steps To Reproduce

Directory structure:
test.py (or cli_demo.py)
chatglm-6b-int4
---- the full repo and weights cloned from huggingface/THUDM/ChatGLM-6B-int4

Test code:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm-6b-int4", trust_remote_code=True).half().cuda()
print("First:")
response, history = model.chat(tokenizer, "你好", history=[])

Environment

- OS: Ubuntu 20.04.5 LTS
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 2.0.0

- GPU: Nvidia Tesla P40
- Driver: 515.65.01
- CUDA: 11.7
- CuDNN: cudnn-linux-x86_64-8.9.0.131_cuda11
- CUDA Support: True

Anything else?

No response

TJJ120635 (Author) commented Apr 27, 2023

Looking through the source code, I found the following in envs/chat/lib/python3.9/site-packages/cpm_kernels/library/base.py:

# Line 21
def unix_find_lib(name):
    cuda_path = os.environ.get("CUDA_PATH", None)
    if cuda_path is not None:
        lib_name = os.path.join(cuda_path, "lib64", "lib%s.so" % name)
        if os.path.exists(lib_name):
            return lib_name

    cuda_path = "/usr/local/cuda"
    if cuda_path is not None:
        lib_name = os.path.join(cuda_path, "lib64", "lib%s.so" % name)
        if os.path.exists(lib_name):
            return lib_name

# Line 41
class Lib:
    def __init__(self, name):
        self.__name = name
        if sys.platform.startswith("win"):
            lib_path = windows_find_lib(self.__name)
            self.__lib_path = lib_path
            if lib_path is not None:
                self.__lib = ctypes.WinDLL(lib_path)
            else:
                self.__lib = None
        elif sys.platform.startswith("linux"):
            lib_path = unix_find_lib(self.__name)
            self.__lib_path = lib_path
            if lib_path is not None:
                self.__lib = ctypes.cdll.LoadLibrary(lib_path)
            else:
                self.__lib = None
        else:
            raise RuntimeError("Unknown platform: %s" % sys.platform)

I modified the code at line 52 to add a debug print:

lib_path = unix_find_lib(self.__name)
# Edit Here
print(name, ':', lib_path)
self.__lib_path = lib_path

Running the program again, the print showed that the cuda library path came back empty (why?).
It looks like a .so file is missing; this needs further investigation.
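A small sketch of mine, following the unix_find_lib logic quoted above, to list which candidate paths actually contain libcuda.so:

# Reproduce cpm_kernels' unix_find_lib("cuda") search and report what exists.
import os

candidates = []
cuda_path = os.environ.get("CUDA_PATH")
if cuda_path:
    candidates.append(os.path.join(cuda_path, "lib64", "libcuda.so"))
candidates.append("/usr/local/cuda/lib64/libcuda.so")

for path in candidates:
    print(path, "->", "exists" if os.path.exists(path) else "MISSING")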

TJJ120635 (Author) commented

On inspection, the cause was that libcuda.so is missing from the CUDA directory /usr/local/cuda/lib64.
I later found libcuda.so.515.65.01* under /usr/lib/x86_64-linux-gnu.
After copying that file into the CUDA directory and creating a symlink libcuda.so -> libcuda.so.515.65.01*,
the library loads successfully and the problem is solved.
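After the copy and symlink, a quick check (a sketch of mine, assuming the driver file was placed as described above) that the library now loads and initializes from the path cpm_kernels searches:

# Verify the driver library loads from the CUDA toolkit directory and
# that cuInit() succeeds (0 == CUDA_SUCCESS).
import ctypes

lib = ctypes.CDLL("/usr/local/cuda/lib64/libcuda.so")
print("cuInit:", lib.cuInit(0))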

zsscpr commented Jun 9, 2023

I also found that libcuda.so was missing from /usr/local/cuda/lib64. I created the symlink like this: ln -s /usr/lib/x86_64-linux-gnu/libcuda.so /usr/local/cuda/lib64/libcuda.so. Once the symlink was in place, the problem was solved.

fairylulu commented

This worked for me, thanks!
