[BUG/Help] RuntimeError: Library cudart is not initialized #115
Comments
Same problem. Have you solved this?
Check whether CUDA is installed correctly on your machine, or try adding the CUDA bin directory to your PATH.
When you say add the path to CUDA's bin directory, which path is that? The project path?
Same problem here.
If you manage your environment with conda:
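The command itself was truncated above; the fix commonly reported in this thread is to install a cudatoolkit matching your PyTorch build into the active environment (e.g. conda install cudatoolkit=11.7, where the exact version is an assumption). A minimal sketch to verify the runtime is then discoverable:

# Minimal sketch: after installing cudatoolkit into the active conda env,
# check that libcudart can now be resolved. CONDA_PREFIX is the standard
# variable conda sets; the 11.7 above is an assumed example version.
import ctypes.util
import os

print("conda env:", os.environ.get("CONDA_PREFIX"))
print("cudart resolves to:", ctypes.util.find_library("cudart"))  # None means still not found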
Tested and confirmed: this solves the problem in my environment.
it works, :)
I hit the same problem in WSL2. Following Microsoft's recommendation I had not installed any CUDA toolkit inside WSL, and got the error above: "RuntimeError: Library cudart is not initialized".
I ran into this too and couldn't find a fix. For now I can get training to run by dropping the --quantization_bit 4 option, i.e. giving up 4-bit quantization.
The same issue. How can I fix it on Ubuntu?
Right, removing --quantization_bit 4 does make this error go away. I wonder whether the maintainers have noticed?
Inference hits the same problem, and inference doesn't even have this option.
Most likely the installed CUDA version differs from the CUDA version your PyTorch build expects. On Windows I had CUDA 12 installed while PyTorch was built against CUDA 11.8, and I got this error; uninstalling CUDA and installing 11.8 fixed it.
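A quick way to check for this mismatch (a diagnostic sketch, not part of the original comment) is to compare the CUDA version PyTorch was built against with what the system toolkit reports:

# Diagnostic sketch: a mismatch such as system CUDA 12 vs. a PyTorch built
# for CUDA 11.8 matches the symptom described above.
import subprocess
import torch

print("PyTorch built with CUDA:", torch.version.cuda)
try:
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print(out.stdout.strip())
except FileNotFoundError:
    print("nvcc not on PATH; no system-wide CUDA toolkit found")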
This problem does occur at inference time; installing cudatoolkit didn't help in my case.
This problem is caused by a missing shared library. On Ubuntu 22.04 it can be fixed by installing that library; other Linux environments can look up the corresponding library for their distribution and fix it the same way.
This method works in my environment. Incidentally, here is a generic snippet as well (see the sketch below):
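The snippet did not survive in this thread, so the following is only a hedged reconstruction of what such a generic helper might look like; the fallback search paths are assumptions and should be adjusted per machine:

# Hedged reconstruction (the original "generic" snippet was truncated above):
# try the standard ctypes lookup first, then a few common install locations.
import ctypes
import ctypes.util
import glob
import os

def find_cudart():
    name = ctypes.util.find_library("cudart")  # returns None when not found
    if name:
        return name
    # Fallback paths are assumptions; adjust for your machine.
    patterns = [
        "/usr/local/cuda/lib64/libcudart.so*",
        os.path.join(os.environ.get("CONDA_PREFIX", "/nonexistent"), "lib", "libcudart.so*"),
    ]
    for pattern in patterns:
        hits = glob.glob(pattern)
        if hits:
            return hits[0]
    return None

lib = find_cudart()
if lib is None:
    print("libcudart not found; install a CUDA toolkit matching your PyTorch build")
else:
    print("libcudart located:", lib)
    ctypes.CDLL(lib)  # raises OSError if the library cannot actually be loaded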
This works.
In your conda environment, install the cuda-toolkit matching your CUDA version; in my case that's the latest, CUDA 12.1.
I tried the same approach on Ubuntu and it works for me.
Exactly, that sorted it straight away.
On Linux it can probably be solved this way; see:
It works, many thanks.
Same environment and same problem here; this fix works for me.
Exactly right!
This solved my problem.
The versions have to match, otherwise nvidia-smi will show an error.
it works!
Thanks everyone for the solutions. Model quantization depends on the CUDA runtime (cudart); you can check whether it can be located with python -c "import ctypes.util; print(ctypes.util.find_library('cudart'))". If this returns None, the runtime library cannot be found.
It works for me! Thanks a lot.
Solved the problem, thanks!
I'm on Arch; the system CUDA version is 12.2.
One more note, for CUDA 12+:
I hit this problem too. My OS is Arch Linux and quantize(4) raised this error; I solved it by installing the CUDA package separately.
Is there an existing issue for this?
Current Behavior
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.14s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/text2music/ChatGLM-6B/cli_demo1.py:5 in <module>                                            │
│ │
│ 2 from transformers import AutoTokenizer, AutoModel │
│ 3 │
│ 4 tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_rem │
│ ❱ 5 model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code │
│ 6 model = model.eval() │
│ 7 │
│ 8 history = [] │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:1154 in │
│ quantize │
│ │
│ 1151 │ │
│ 1152 │ def quantize(self, bits: int): │
│ 1153 │ │ from .quantization import quantize │
│ ❱ 1154 │ │ self.transformer = quantize(self.transformer, bits) │
│ 1155 │ │ return self │
│ 1156 │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:147 in │
│ quantize │
│ │
│ 144 │ """Replace fp16 linear with quantized linear""" │
│ 145 │ │
│ 146 │ for layer in model.layers: │
│ ❱ 147 │ │ layer.attention.query_key_value = QuantizedLinear( │
│ 148 │ │ │ weight_bit_width=weight_bit_width, │
│ 149 │ │ │ weight_tensor=layer.attention.query_key_value.weight.to(torch.cuda.current_d │
│ 150 │ │ │ bias_tensor=layer.attention.query_key_value.bias, │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:130 in │
│ __init__                                                                                          │
│ │
│ 127 │ │ │ self.weight_scale = (weight_tensor.abs().max(dim=-1).values / ((2 ** (weight │
│ 128 │ │ │ self.weight = torch.round(weight_tensor / self.weight_scale[:, None]).to(tor │
│ 129 │ │ │ if weight_bit_width == 4: │
│ ❱ 130 │ │ │ │ self.weight = compress_int4_weight(self.weight) │
│ 131 │ │ │
│ 132 │ │ self.weight = Parameter(self.weight.to(kwargs["device"]), requires_grad=False) │
│ 133 │ │ self.weight_scale = Parameter(self.weight_scale.to(kwargs["device"]), requires_g │
│ │
│ /home/user_00/.cache/huggingface/modules/transformers_modules/local/quantization.py:71 in │
│ compress_int4_weight │
│ │
│ 68 │ │ gridDim = (n, 1, 1) │
│ 69 │ │ blockDim = (min(round_up(m, 32), 1024), 1, 1) │
│ 70 │ │ │
│ ❱ 71 │ │ kernels.int4WeightCompression( │
│ 72 │ │ │ gridDim, │
│ 73 │ │ │ blockDim, │
│ 74 │ │ │ 0, │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:48 in __call__  │
│ │
│ 45 │ │ │ sharedMemBytes : int, stream : cudart.cudaStream_t, params : List[Any] ) -> │
│ 46 │ │ assert len(gridDim) == 3 │
│ 47 │ │ assert len(blockDim) == 3 │
│ ❱ 48 │ │ func = self._prepare_func() │
│ 49 │ │ │
│ 50 │ │ cuda.cuLaunchKernel(func, │
│ 51 │ │ │ gridDim[0], gridDim[1], gridDim[2], │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/kernels/base.py:36 in │
│ _prepare_func │
│ │
│ 33 │ │ self._func_name = func_name │
│ 34 │ │
│ 35 │ def _prepare_func(self): │
│ ❱ 36 │ │ curr_device = cudart.cudaGetDevice() │
│ 37 │ │ cudart.cudaSetDevice(curr_device) # ensure cudart context │
│ 38 │ │ if curr_device not in self._funcs: │
│ 39 │ │ │ self._funcs[curr_device] = cuda.cuModuleGetFunction( │
│ │
│ /data/miniconda3/envs/GLM/lib/python3.8/site-packages/cpm_kernels/library/base.py:72 in wrapper │
│ │
│ 69 │ │ │ def decorator(f): │
│ 70 │ │ │ │ @wraps(f) │
│ 71 │ │ │ │ def wrapper(*args, **kwargs): │
│ ❱ 72 │ │ │ │ │ raise RuntimeError("Library %s is not initialized" % self.__name) │
│ 73 │ │ │ │ return wrapper │
│ 74 │ │ │ return decorator │
│ 75 │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Library cudart is not initialized
Expected Behavior
I just used the quantize function to convert the model to int4, but this exception appeared. How can I fix this so that ChatGLM-6B quantizes successfully?
Steps To Reproduce
import os
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True)
# .quantize(4) triggers the cpm_kernels int4 path where the RuntimeError is raised
model = AutoModel.from_pretrained("/data/text2music/ChatGLM-6B/local", trust_remote_code=True).half().quantize(4).cuda(device=2)
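A quick pre-flight check, drawn from the maintainer's comment earlier in this thread: quantize(4) goes through cpm_kernels, which needs the CUDA runtime to be discoverable by ctypes before the snippet above can work.

# Pre-flight check from the maintainer's suggestion: if this prints None,
# the RuntimeError above is expected.
import ctypes.util
print(ctypes.util.find_library("cudart"))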
Environment
Anything else?
No response