[BUG/Help] <title> RuntimeError: Library cudart is not initialized #115

rogerrojur · 2023-03-17T02:10:06Z

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.14s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/text2music/ChatGLM-6B/cli_demo1.py:5 in │
│ │
│ 2 from transformers import AutoTokenizer, AutoModel │
│ 3 │
│ 4 tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_rem │
│ ❱ 5 model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code │
│ 6 model = model.eval() │
│ 7 │
│ 8 history = [] │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:1154 in │
│ quantize │
│ │
│ 1151 │ │
│ 1152 │ def quantize(self, bits: int): │
│ 1153 │ │ from .quantization import quantize │
│ ❱ 1154 │ │ self.transformer = quantize(self.transformer, bits) │
│ 1155 │ │ return self │
│ 1156 │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:147 in │
│ quantize │
│ │
│ 144 │ """Replace fp16 linear with quantized linear""" │
│ 145 │ │
│ 146 │ for layer in model.layers: │
│ ❱ 147 │ │ layer.attention.query_key_value = QuantizedLinear( │
│ 148 │ │ │ weight_bit_width=weight_bit_width, │
│ 149 │ │ │ weight_tensor=layer.attention.query_key_value.weight.to(torch.cuda.current_d │
│ 150 │ │ │ bias_tensor=layer.attention.query_key_value.bias, │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:130 in │
│ init │
│ │
│ 127 │ │ │ self.weight_scale = (weight_tensor.abs().max(dim=-1).values / ((2 ** (weight │
│ 128 │ │ │ self.weight = torch.round(weight_tensor / self.weight_scale[:, None]).to(tor │
│ 129 │ │ │ if weight_bit_width == 4: │
│ ❱ 130 │ │ │ │ self.weight = compress_int4_weight(self.weight) │
│ 131 │ │ │
│ 132 │ │ self.weight = Parameter(self.weight.to(kwargs["device"]), requires_grad=False) │
│ 133 │ │ self.weight_scale = Parameter(self.weight_scale.to(kwargs["device"]), requires_g │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:71 in │
│ compress_int4_weight │
│ │
│ 68 │ │ gridDim = (n, 1, 1) │
│ 69 │ │ blockDim = (min(round_up(m, 32), 1024), 1, 1) │
│ 70 │ │ │
│ ❱ 71 │ │ kernels.int4WeightCompression( │
│ 72 │ │ │ gridDim, │
│ 73 │ │ │ blockDim, │
│ 74 │ │ │ 0, │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:48 in call │
│ │
│ 45 │ │ │ sharedMemBytes : int, stream : cudart.cudaStream_t, params : List[Any] ) -> │
│ 46 │ │ assert len(gridDim) == 3 │
│ 47 │ │ assert len(blockDim) == 3 │
│ ❱ 48 │ │ func = self._prepare_func() │
│ 49 │ │ │
│ 50 │ │ cuda.cuLaunchKernel(func, │
│ 51 │ │ │ gridDim[0], gridDim[1], gridDim[2], │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:36 in │
│ _prepare_func │
│ │
│ 33 │ │ self._func_name = func_name │
│ 34 │ │
│ 35 │ def _prepare_func(self): │
│ ❱ 36 │ │ curr_device = cudart.cudaGetDevice() │
│ 37 │ │ cudart.cudaSetDevice(curr_device) # ensure cudart context │
│ 38 │ │ if curr_device not in self._funcs: │
│ 39 │ │ │ self._funcs[curr_device] = cuda.cuModuleGetFunction( │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/library/base.py:72 in wrapper │
│ │
│ 69 │ │ │ def decorator(f): │
│ 70 │ │ │ │ @wraps(f) │
│ 71 │ │ │ │ def wrapper(*args, **kwargs): │
│ ❱ 72 │ │ │ │ │ raise RuntimeError("Library %s is not initialized" % self.__name) │
│ 73 │ │ │ │ return wrapper │
│ 74 │ │ │ return decorator │
│ 75 │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Library cudart is not initialized

Expected Behavior

I just use the quantize function, to convert the model into int4. However, this exception appear. How could I fix this bug to successfully quantize this ChatGLM-6B?

Steps To Reproduce

import os
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True)
model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True).half().quantize(4).cuda(device=2)

Environment

- OS: Ubuntu 20.04
- Python: 3.7
- Transformers: 4.26.1
- PyTorch: 1.13
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

The text was updated successfully, but these errors were encountered:

Adenialzz · 2023-03-17T03:47:40Z

same problem. have you solved this?

188080501 · 2023-03-17T10:34:26Z

检查本机cuda的安装是否正确，或者尝试添加下path到cuda的bin目录
我重装了cuda，设置了path后，问题解决，正常运行

Chenny0808 · 2023-03-18T05:23:15Z

添加下path到cuda的bin目录，请问是什么path，项目path吗？

mh739025250 · 2023-03-23T04:10:38Z

同样的问题

AnduFalaH · 2023-03-23T08:05:23Z

首先，在环境里找到torch库内nvrtc开头的一个链接库文件，比如我的是在windows平台、miniconda的环境里的C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll路径。不同平台应该都有所不同。
把这个文件所在目录加到PATH里。如果不希望污染操作系统的PATH，可以直接在开头import os之后直接加进去，例如：
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
然后打开应该就可以了。

English version(Translated by ChatGPT):

First, find a library file starting with "nvrtc" in the torch library in your environment. For example, mine is located at the path C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll in a Windows platform with miniconda installed. The path may differ for different platforms.
Add the directory where the file is located to your PATH. If you don't want to modify the PATH of your operating system, you can directly add it after importing os. For example:
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'
After doing this, it should work fine.

mjysci · 2023-03-24T13:39:06Z

如果用的是conda管理环境：
首先用conda list | grep cuda确定该环境cuda运行时版本，如11.7。
然后从nvidia源安装cudatoolkit：

conda install cudatoolkit=11.7 -c nvidia

LucienShui · 2023-03-24T17:10:22Z

如果用的是conda管理环境：首先用conda list | grep cuda确定该环境cuda运行时版本，如11.7。然后从nvidia源安装cudatoolkit：
conda install cudatoolkit=11.7 -c nvidia

实测可以解决问题，环境

Windows 11 + WSL2 Debian
pytorch==2.0.0
transformers==4.26.1

RRRoger · 2023-03-28T14:18:11Z

如果用的是conda管理环境：首先用conda list | grep cuda确定该环境cuda运行时版本，如11.7。然后从nvidia源安装cudatoolkit：
conda install cudatoolkit=11.7 -c nvidia

it works, :)

judgementc · 2023-04-01T12:28:49Z

我在wsl2里面也遇到了相同的问题，按照微软的推荐未在wsl中设置任何cuda tookit，出现了上述错误“[RuntimeError: Library cudart is not initialized]"

gg22mm · 2023-04-03T07:41:49Z

我也是一样的问题，上面讲我看都是扯淡，压根就不是环境问题好么，怎么解决？？？？？？？？？？？：
好郁闷，写了几行代码这么多兼容问题~~

flyingtimes · 2023-04-03T22:28:54Z

我也遇到这个问题，找不到解决思路。目前通过在train的时候去掉 --quantization_bit 4 这个选项，放弃4bit量化可以跑通。

weiliswen · 2023-04-04T11:35:25Z

The same issue. How to fix it in ubuntu OS?

gg22mm · 2023-04-14T08:35:53Z

目前通过在train的时候去掉 --quantization_bit 4 这个选项，放弃4bit量化可以跑通。

说得对去掉--quantization_bit 4 确实是没这个报错了，不知道官方有没有发现？

gg22mm · 2023-04-14T08:36:54Z

还有就是预测也是一样的问题，预测还没没有这个参数

yuquant · 2023-04-18T00:45:40Z

很肯能是cuda版本和pytorch对应的cuda版本不同，我在windows安装的cuda版本是12，安装pytorch对应的cuda版本是11.8，然后就报了错，卸载cuda后安装11.8的cuda就可以了

SeekPoint · 2023-04-18T05:14:34Z

我也是一样的问题，上面讲我看都是扯淡，压根就不是环境问题好么，怎么解决？？？？？？？？？？？：好郁闷，写了几行代码这么多兼容问题~~

me too

529106896 · 2023-04-18T08:23:00Z

我也是一样的问题，上面讲我看都是扯淡，压根就不是环境问题好么，怎么解决？？？？？？？？？？？：好郁闷，写了几行代码这么多兼容问题~~

me too

把--quantization_bit 4去掉试试

bingoohe · 2023-04-19T02:23:26Z

还有就是预测也是一样的问题，预测还没没有这个参数

推理时确实出现这个问题，我装了cudatoolkit也不行

Richard-Ni · 2023-04-19T14:42:46Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行

sudo apt install libcudart11.0 libcublaslt11

其他 Linux 环境可以参考查找对应的库解决

l3yx · 2023-04-26T01:42:03Z

首先，在环境里找到torch库内nvrtc开头的一个链接库文件，比如我的是在windows平台、miniconda的环境里的C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll路径。不同平台应该都有所不同。

把这个文件所在目录加到PATH里。如果不希望污染操作系统的PATH，可以直接在开头import os之后直接加进去，例如：
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'

然后打开应该就可以了。

English version(Translated by ChatGPT):

First, find a library file starting with "nvrtc" in the torch library in your environment. For example, mine is located at the path C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll in a Windows platform with miniconda installed. The path may differ for different platforms.

Add the directory where the file is located to your PATH. If you don't want to modify the PATH of your operating system, you can directly add it after importing os. For example:
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'

After doing this, it should work fine.

这个方法对我环境管用的，另外顺便提供一个通用代码：

import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')

siyuan163 · 2023-05-15T03:20:46Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行
sudo apt install libcudart11.0 libcublaslt11
其他 Linux 环境可以参考查找对应的库解决

这个管用

linuxdevopscn · 2023-05-25T06:29:29Z

conda环境里安装你cuda 对应版本的 cuda-toolkit ，比如我是最新的cuda 12.1
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit
https://anaconda.org/nvidia/cuda-toolkit

GoldExperience · 2023-05-25T10:58:58Z

@weiliswen

I tried the same way on ubuntu
conda install cudatoolkit=11.8 -c nvidia

working for me

murainwood · 2023-06-01T10:54:29Z

是这样的，直接搞定。
另外我的ubuntu 22.04还遇到了gcc编译时候问题 crti.o no such file or directory
用这样:
sudo apt install libc6=2.35-0ubuntu3
sudo apt install libc6-dev

codingfun2022 · 2023-06-14T04:04:51Z

Linux 下可能可以这样解决，参考：
Support loading cuda libraries from nvidia package.
OpenBMB/cpm_kernels#8

jushe · 2023-06-15T16:03:35Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行
sudo apt install libcudart11.0 libcublaslt11
其他 Linux 环境可以参考查找对应的库解决

有效，十分感谢

ablozhou · 2023-06-26T09:19:46Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行
sudo apt install libcudart11.0 libcublaslt11
其他 Linux 环境可以参考查找对应的库解决

The same env and encounter the same problem, and it works for me.
Thanks.

njutsiang · 2023-07-01T05:13:56Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行
sudo apt install libcudart11.0 libcublaslt11
其他 Linux 环境可以参考查找对应的库解决

正解！
如果是 Ubuntu 20.04，执行：
sudo apt install libcudart10.1 libcublaslt10

KelvinJhu · 2023-07-07T03:58:50Z

首先，在环境里找到torch库内nvrtc开头的一个链接库文件，比如我的是在windows平台、miniconda的环境里的C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll路径。不同平台应该都有所不同。

把这个文件所在目录加到PATH里。如果不希望污染操作系统的PATH，可以直接在开头import os之后直接加进去，例如：
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'

然后打开应该就可以了。

English version(Translated by ChatGPT):

First, find a library file starting with "nvrtc" in the torch library in your environment. For example, mine is located at the path C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib\nvrtc64_112_0.dll in a Windows platform with miniconda installed. The path may differ for different platforms.

Add the directory where the file is located to your PATH. If you don't want to modify the PATH of your operating system, you can directly add it after importing os. For example:
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + r'C:\ProgramData\miniconda3\envs\ChatGLM-6B\Lib\site-packages\torch\lib'

After doing this, it should work fine.

这个方法对我环境管用的，另外顺便提供一个通用代码：
import pkg_resources
import os
os.environ['PATH'] = os.environ.get("PATH", "") + os.pathsep + pkg_resources.resource_filename('torch', 'lib')

这个解决了我的问题

yzbx · 2023-07-11T02:38:17Z

这个问题是因为缺少必要的动态库导致的，Ubuntu 22.04 下执行
sudo apt install libcudart11.0 libcublaslt11
其他 Linux 环境可以参考查找对应的库解决
正解！如果是 Ubuntu 20.04，执行： sudo apt install libcudart10.1 libcublaslt10

版本要匹配，否则nvidia-smi 会出现 Failed to initialize NVML: Driver/library version mismatch 的问题

doudoutiantian · 2023-07-19T12:51:29Z

it works!

zhangch9 · 2023-07-20T04:52:15Z

感谢大家提供的解决方案。

模型量化依赖cpm-kernels，cpm-kernels调用了libcudart.so。可以通过以下代码检查libcudart.so是否存在：

python -c "import ctypes.util; print(ctypes.util.find_library('cudart'))"

如果返回None，需要手动安装cudatooklit，并可能需要修改环境变量LD_LIBRARY_PATH。

Vvegetables · 2023-08-06T15:36:28Z

Linux 下可能可以这样解决，参考： Support loading cuda libraries from nvidia package. OpenBMB/cpm_kernels#8

it works for me！非常感谢

YunfengCUI · 2023-08-23T12:25:10Z

Linux 下可能可以这样解决，参考： Support loading cuda libraries from nvidia package. OpenBMB/cpm_kernels#8

解决了问题，多谢！

LuckyFanpu · 2023-08-28T08:34:25Z

我用的是Arch，系统的Cuda版本是12.2
conda中使用cudatoolkit-11.7.0时出现了这个错误，conda中最新的现在是cudatoolkit-11.8.0
使用 conda install cudatoolkit=11.8 -c nvidia后 work了！

MikuGhoul · 2023-09-20T07:01:07Z

补充一个说明。
根因原因是cudart库没有被cpm_kernels找到，还可能是nvrtc、cuda、cublasLt出错，也会报
RuntimeError: Library cudart/nvrtc/cuda/cublasLt is not initialized 这其中的一种错误。
具体可以在cpm_kernels的library/base.py文件中debug一下，在unix_find_lib函数中print一下lib_name，确认一下是去哪个目录寻找的lib文件。然后和自己安装的库目录对比一下是不是同一个目录，不在同一个目录的话创建一个软连接就可以了。

0neday · 2023-10-21T07:14:15Z

sudo apt install libcudart12 libcublaslt12

for cuda 12 +

jadegong · 2023-11-29T02:56:31Z

我也遇到此问题，我的操作系统是Archlinux，使用quantize(4)报错，我通过单独安装cuda包解决问题：
sudo pacman -S cuda
packages: cuda-12.3.0-6

duzx16 mentioned this issue Apr 12, 2023

[BUG/Help] 错误 Library cudart is not initialized #489

Closed

1 task

zhangch9 closed this as completed Jul 20, 2023

[BUG/Help] <title> RuntimeError: Library cudart is not initialized #115

[BUG/Help] <title> RuntimeError: Library cudart is not initialized #115

Comments

rogerrojur commented Mar 17, 2023

Is there an existing issue for this?

Current Behavior

Expected Behavior

Steps To Reproduce

Environment

Anything else?

Adenialzz commented Mar 17, 2023

188080501 commented Mar 17, 2023

Chenny0808 commented Mar 18, 2023

mh739025250 commented Mar 23, 2023

AnduFalaH commented Mar 23, 2023

mjysci commented Mar 24, 2023

LucienShui commented Mar 24, 2023

RRRoger commented Mar 28, 2023

judgementc commented Apr 1, 2023

gg22mm commented Apr 3, 2023 • edited Loading

flyingtimes commented Apr 3, 2023

weiliswen commented Apr 4, 2023

gg22mm commented Apr 14, 2023

gg22mm commented Apr 14, 2023

yuquant commented Apr 18, 2023

SeekPoint commented Apr 18, 2023

529106896 commented Apr 18, 2023

bingoohe commented Apr 19, 2023

Richard-Ni commented Apr 19, 2023 • edited Loading

l3yx commented Apr 26, 2023

siyuan163 commented May 15, 2023

linuxdevopscn commented May 25, 2023

GoldExperience commented May 25, 2023 • edited Loading

murainwood commented Jun 1, 2023

codingfun2022 commented Jun 14, 2023

jushe commented Jun 15, 2023

ablozhou commented Jun 26, 2023

njutsiang commented Jul 1, 2023

KelvinJhu commented Jul 7, 2023

yzbx commented Jul 11, 2023

doudoutiantian commented Jul 19, 2023

zhangch9 commented Jul 20, 2023 • edited Loading

Vvegetables commented Aug 6, 2023

YunfengCUI commented Aug 23, 2023

LuckyFanpu commented Aug 28, 2023

MikuGhoul commented Sep 20, 2023

0neday commented Oct 21, 2023

jadegong commented Nov 29, 2023

gg22mm commented Apr 3, 2023 •

edited

Loading

Richard-Ni commented Apr 19, 2023 •

edited

Loading

GoldExperience commented May 25, 2023 •

edited

Loading

zhangch9 commented Jul 20, 2023 •

edited

Loading