[BUG/Help] <title> RuntimeError: Library cudart is not initialized #115

Closed
1 task done
rogerrojur opened this issue Mar 17, 2023 · 38 comments

Comments

@rogerrojur

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.14s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/text2music/ChatGLM-6B/cli_demo1.py:5 in <module> │
│ │
│ 2 from transformers import AutoTokenizer, AutoModel │
│ 3 │
│ 4 tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_rem │
│ ❱ 5 model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code │
│ 6 model = model.eval() │
│ 7 │
│ 8 history = [] │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:1154 in │
│ quantize │
│ │
│ 1151 │ │
│ 1152 │ def quantize(self, bits: int): │
│ 1153 │ │ from .quantization import quantize │
│ ❱ 1154 │ │ self.transformer = quantize(self.transformer, bits) │
│ 1155 │ │ return self │
│ 1156 │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:147 in │
│ quantize │
│ │
│ 144 │ """Replace fp16 linear with quantized linear""" │
│ 145 │ │
│ 146 │ for layer in model.layers: │
│ ❱ 147 │ │ layer.attention.query_key_value = QuantizedLinear( │
│ 148 │ │ │ weight_bit_width=weight_bit_width, │
│ 149 │ │ │ weight_tensor=layer.attention.query_key_value.weight.to(torch.cuda.current_d │
│ 150 │ │ │ bias_tensor=layer.attention.query_key_value.bias, │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:130 in │
│ __init__ │
│ │
│ 127 │ │ │ self.weight_scale = (weight_tensor.abs().max(dim=-1).values / ((2 ** (weight │
│ 128 │ │ │ self.weight = torch.round(weight_tensor / self.weight_scale[:, None]).to(tor │
│ 129 │ │ │ if weight_bit_width == 4: │
│ ❱ 130 │ │ │ │ self.weight = compress_int4_weight(self.weight) │
│ 131 │ │ │
│ 132 │ │ self.weight = Parameter(self.weight.to(kwargs["device"]), requires_grad=False) │
│ 133 │ │ self.weight_scale = Parameter(self.weight_scale.to(kwargs["device"]), requires_g │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:71 in │
│ compress_int4_weight │
│ │
│ 68 │ │ gridDim = (n, 1, 1) │
│ 69 │ │ blockDim = (min(round_up(m, 32), 1024), 1, 1) │
│ 70 │ │ │
│ ❱ 71 │ │ kernels.int4WeightCompression( │
│ 72 │ │ │ gridDim, │
│ 73 │ │ │ blockDim, │
│ 74 │ │ │ 0, │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:48 in __call__ │
│ │
│ 45 │ │ │ sharedMemBytes : int, stream : cudart.cudaStream_t, params : List[Any] ) -> │
│ 46 │ │ assert len(gridDim) == 3 │
│ 47 │ │ assert len(blockDim) == 3 │
│ ❱ 48 │ │ func = self._prepare_func() │
│ 49 │ │ │
│ 50 │ │ cuda.cuLaunchKernel(func, │
│ 51 │ │ │ gridDim[0], gridDim[1], gridDim[2], │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:36 in │
│ _prepare_func │
│ │
│ 33 │ │ self._func_name = func_name │
│ 34 │ │
│ 35 │ def _prepare_func(self): │
│ ❱ 36 │ │ curr_device = cudart.cudaGetDevice() │
│ 37 │ │ cudart.cudaSetDevice(curr_device) # ensure cudart context │
│ 38 │ │ if curr_device not in self._funcs: │
│ 39 │ │ │ self._funcs[curr_device] = cuda.cuModuleGetFunction( │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/library/base.py:72 in wrapper │
│ │
│ 69 │ │ │ def decorator(f): │
│ 70 │ │ │ │ @wraps(f) │
│ 71 │ │ │ │ def wrapper(*args, **kwargs): │
│ ❱ 72 │ │ │ │ │ raise RuntimeError("Library %s is not initialized" % self.__name) │
│ 73 │ │ │ │ return wrapper │
│ 74 │ │ │ return decorator │
│ 75 │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Library cudart is not initialized

Expected Behavior

I only call the quantize function to convert the model to int4, but this exception appears. How can I fix this so that ChatGLM-6B quantizes successfully?

Steps To Reproduce

import os
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True)
model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True).half().quantize(4).cuda(device=2)

Environment

- OS: Ubuntu 20.04
- Python: 3.7
- Transformers: 4.26.1
- PyTorch: 1.13
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

@Adenialzz

Same problem. Have you solved this?

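@188080501

Check whether CUDA is installed correctly on your machine, or try adding the CUDA bin directory to PATH.
I reinstalled CUDA and set PATH; that solved the problem and everything runs normally.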

@Chenny0808

When you say to add a path to the CUDA bin directory, which path do you mean? The project path?

@mh739025250

Same problem.

@AnduFalaH

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the operating system's PATH, you can add it at the top of your script, right after import os, for example:
    os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. Run it again and it should work.
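
Step 1 can be automated. The following is a minimal sketch, assuming you already know a directory that contains the library somewhere below it (for example the environment's site-packages). `find_library_dirs` and `append_to_path` are hypothetical helper names, not part of torch or cpm_kernels. On Linux the file names usually start with "lib", so a prefix such as "libnvrtc" would be needed there.

```python
import os


def find_library_dirs(root, prefix="nvrtc"):
    """Return the sorted directories under `root` that contain at least one
    file whose name starts with `prefix` (case-insensitive)."""
    wanted = prefix.lower()
    matches = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.lower().startswith(wanted) for name in filenames):
            matches.add(dirpath)
    return sorted(matches)


def append_to_path(directory):
    """Append `directory` to PATH for the current process only."""
    current = os.environ.get("PATH", "")
    if directory in current.split(os.pathsep):
        return  # already present, nothing to do
    os.environ["PATH"] = (current + os.pathsep + directory) if current else directory
```

Calling `append_to_path` on each result of `find_library_dirs` before loading the model changes PATH only for the running Python process, which matches the intent of step 2.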

@mjysci

mjysci commented Mar 24, 2023

If you manage your environment with conda:
First run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7.
Then install cudatoolkit from the nvidia channel:

conda install cudatoolkit=11.7 -c nvidia

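@LucienShui

If you manage your environment with conda: first run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7. Then install cudatoolkit from the nvidia channel:

conda install cudatoolkit=11.7 -c nvidia

Tested, and it solves the problem. Environment: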

Windows 11 + WSL2 Debian
pytorch==2.0.0
transformers==4.26.1

@RRRoger

RRRoger commented Mar 28, 2023

If you manage your environment with conda: first run conda list | grep cuda to find the CUDA runtime version of that environment, e.g. 11.7. Then install cudatoolkit from the nvidia channel:

conda install cudatoolkit=11.7 -c nvidia

it works, :)

@judgementc

I hit the same problem in WSL2. Following Microsoft's recommendation, I had not set up any CUDA toolkit inside WSL, and the error above appeared: "RuntimeError: Library cudart is not initialized".

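@gg22mm

gg22mm commented Apr 3, 2023

I have the same problem. The suggestions above look like nonsense to me; this is not an environment problem at all. How do I fix it?
So frustrating: a few lines of code and this many compatibility problems.
[image]
[image]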

@flyingtimes

I ran into this problem too and could not find a way to solve it. For now, removing the --quantization_bit 4 option when training, i.e. giving up 4-bit quantization, lets it run.

@weiliswen

Same issue. How do I fix it on Ubuntu?

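@gg22mm

gg22mm commented Apr 14, 2023

For now, removing the --quantization_bit 4 option when training, i.e. giving up 4-bit quantization, lets it run.

That's right. Removing --quantization_bit 4 does make this error go away. I wonder whether the maintainers have noticed?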

@gg22mm

gg22mm commented Apr 14, 2023

Also, prediction has the same problem, and prediction does not even have this parameter.

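@yuquant

yuquant commented Apr 18, 2023

Most likely the installed CUDA version differs from the CUDA version your PyTorch build targets. On Windows I had CUDA 12 installed while my PyTorch build targets CUDA 11.8, and I got this error. Uninstalling CUDA and installing CUDA 11.8 fixed it.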

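@SeekPoint

I have the same problem. The suggestions above look like nonsense to me; this is not an environment problem at all. How do I fix it? So frustrating: a few lines of code and this many compatibility problems. [image] [image]

Me too.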

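@529106896

I have the same problem. The suggestions above look like nonsense to me; this is not an environment problem at all. How do I fix it? So frustrating: a few lines of code and this many compatibility problems. [image] [image]

Me too.

Try removing --quantization_bit 4.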

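@bingoohe

Also, prediction has the same problem, and prediction does not even have this parameter.

This problem does occur during inference. I installed cudatoolkit and it still did not work.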

@Richard-Ni

Richard-Ni commented Apr 19, 2023

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

@l3yx

l3yx commented Apr 26, 2023

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the operating system's PATH, you can add it at the top of your script, right after import os, for example:
    os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. Run it again and it should work.

This method works in my environment. Here is a more general snippet as well:

import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')
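
pkg_resources is deprecated in recent setuptools releases, so the same idea can also be written with the standard library only. This is a sketch, assuming the package keeps its libraries in a lib subdirectory as torch does in the paths quoted above. `package_subdir` is a hypothetical helper name.

```python
import importlib.util
import os


def package_subdir(package, subdir):
    """Return <package directory>/<subdir> without importing the package,
    or None if the package cannot be located or is not a package."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], subdir)


# Append torch's lib directory to PATH only if it really exists.
torch_lib = package_subdir("torch", "lib")
if torch_lib is not None and os.path.isdir(torch_lib):
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + torch_lib
```

Because `find_spec` only locates the package, this can run before torch itself is imported, and it silently does nothing when torch is not installed.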

@siyuan163

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

This works.

@linuxdevopscn

In your conda environment, install the cuda-toolkit version that matches your CUDA. For example, I am on the latest CUDA, 12.1:
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit
https://anaconda.org/nvidia/cuda-toolkit

@GoldExperience

GoldExperience commented May 25, 2023

@weiliswen

I tried the same approach on Ubuntu:
conda install cudatoolkit=11.8 -c nvidia

It works for me.

@murainwood

Exactly, this fixed it right away.
In addition, on my Ubuntu 22.04 I hit a problem at gcc compile time: crti.o no such file or directory
I fixed it with:
sudo apt install libc6=2.35-0ubuntu3
sudo apt install libc6-dev

@codingfun2022

On Linux this may be a way to solve it; see:
Support loading cuda libraries from nvidia package.
OpenBMB/cpm_kernels#8

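@jushe

jushe commented Jun 15, 2023

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

It works. Thank you very much.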

@ablozhou

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

I have the same environment and hit the same problem, and this works for me.
Thanks.

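@njutsiang

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

This is the right answer!
On Ubuntu 20.04, run: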
sudo apt install libcudart10.1 libcublaslt10

@KelvinJhu

  1. First, find the library file whose name starts with "nvrtc" inside the torch package in your environment. For example, on Windows with miniconda, mine is at C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll. The path will differ between platforms.
  2. Add the directory containing that file to PATH. If you would rather not modify the operating system's PATH, you can add it at the top of your script, right after import os, for example:
    os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
  3. Run it again and it should work.

This method works in my environment. Here is a more general snippet as well:

import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')

This solved my problem.

@yzbx

yzbx commented Jul 11, 2023

This problem is caused by missing shared libraries. On Ubuntu 22.04, run:

sudo apt install libcudart11.0 libcublaslt11

On other Linux distributions, look up and install the corresponding libraries.

This is the right answer! On Ubuntu 20.04, run: sudo apt install libcudart10.1 libcublaslt10

The versions must match; otherwise nvidia-smi fails with: Failed to initialize NVML: Driver/library version mismatch

@doudoutiantian

it works!

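@zhangch9

zhangch9 commented Jul 20, 2023

Thanks, everyone, for the solutions.

Model quantization depends on cpm-kernels, and cpm-kernels calls libcudart.so. You can check whether libcudart.so can be found with:

python -c "import ctypes.util; print(ctypes.util.find_library('cudart'))"

If this prints None, you need to install cudatoolkit manually, and you may also need to modify the LD_LIBRARY_PATH environment variable.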
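
The same check can be run for several libraries at once. This is a sketch, assuming the names cudart, nvrtc, cuda and cublasLt are the relevant ones; they are taken from another comment in this thread, not from cpm-kernels documentation. `check_cuda_libraries` is a hypothetical helper name.

```python
import ctypes.util


def check_cuda_libraries(names=("cudart", "nvrtc", "cuda", "cublasLt")):
    """Map each library name to what ctypes.util.find_library resolves it to.
    A value of None means the system loader could not find that library."""
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in check_cuda_libraries().items():
        print(f"{name}: {found if found else 'NOT FOUND'}")
```

Any name reported as not found is a candidate for the missing-library fixes discussed in this thread.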

@Vvegetables

On Linux this may be a way to solve it; see: Support loading cuda libraries from nvidia package. OpenBMB/cpm_kernels#8

It works for me! Thank you very much.

@YunfengCUI

On Linux this may be a way to solve it; see: Support loading cuda libraries from nvidia package. OpenBMB/cpm_kernels#8

That solved the problem, thanks!

@LuckyFanpu

I use Arch, and the system CUDA version is 12.2.
With cudatoolkit-11.7.0 in conda I got this error; the latest version in conda is currently cudatoolkit-11.8.0.
After running conda install cudatoolkit=11.8 -c nvidia, it worked!

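@MikuGhoul

A supplementary note.
The root cause is that cpm_kernels cannot find the cudart library. The same can happen for nvrtc, cuda, or cublasLt, in which case you get one of the errors
RuntimeError: Library cudart/nvrtc/cuda/cublasLt is not initialized
To investigate, debug library/base.py in cpm_kernels: print lib_name inside the unix_find_lib function to confirm which directory it searches for the library file. Then compare that with the directory where your libraries are installed. If they differ, creating a symlink is enough.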
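
The comparison and the symlink step can be sketched as follows. It assumes, without having verified it against the cpm_kernels source, that the lookup expects a file named lib<name>.so inside a lib64 directory under a CUDA root such as $CUDA_PATH or /usr/local/cuda; check unix_find_lib in your installed copy for the actual locations. `candidate_paths` and `link_library` are hypothetical helper names.

```python
import os


def candidate_paths(name, cuda_roots):
    """Return (path, exists) pairs for lib<name>.so under each root's lib64.
    The lib64 layout is an assumption, not confirmed cpm_kernels behaviour."""
    paths = [os.path.join(root, "lib64", f"lib{name}.so") for root in cuda_roots]
    return [(path, os.path.exists(path)) for path in paths]


def link_library(real_file, expected_path):
    """Create a symlink at expected_path pointing to real_file.
    Returns False and changes nothing if expected_path already exists."""
    if os.path.lexists(expected_path):
        return False
    os.makedirs(os.path.dirname(expected_path), exist_ok=True)
    os.symlink(real_file, expected_path)
    return True
```

For example, `candidate_paths("cudart", ["/usr/local/cuda"])` shows whether the expected file is present. Creating a link under a system directory normally requires root privileges.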

@0neday

0neday commented Oct 21, 2023

sudo apt install libcudart12 libcublaslt12

For CUDA 12+.

@jadegong

I ran into this problem too. My operating system is Arch Linux, and quantize(4) raised the error. I solved it by installing the cuda package separately:
sudo pacman -S cuda
packages: cuda-12.3.0-6
