RuntimeError: Library cublasLt is not initialized[BUG/Help] <title> #465

Closed
1 task done
z1968357787 opened this issue Apr 8, 2023 · 16 comments

Comments

@z1968357787

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Running train.sh fails with this error:
RuntimeError: Library cublasLt is not initialized

Expected Behavior

No response

Steps To Reproduce

Run train.sh

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
Same as requirements.txt

Anything else?

No response

@sdhou

sdhou commented Apr 15, 2023

I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB.
Windows environment, pytorch==2.0.0

@gg22mm

gg22mm commented Apr 15, 2023

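> I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB. Windows environment, pytorch==2.0.0

This is an old problem. I have been following it for a long time and still have not seen any explanation.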

@JingyangXu1991

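> I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB. Windows environment, pytorch==2.0.0

I have the same problem, with CUDA version 11.2 and pytorch==1.13.1.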

@duzx16
Member

duzx16 commented Apr 17, 2023

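> I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB. Windows environment, pytorch==2.0.0

> I have the same problem, with CUDA version 11.2 and pytorch==1.13.1.

Try installing cudatoolkit.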

@JingyangXu1991

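> I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB. Windows environment, pytorch==2.0.0

> I have the same problem, with CUDA version 11.2 and pytorch==1.13.1.

> Try installing cudatoolkit.

Is a specific version required?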

@JingyangXu1991

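> I ran into the same problem. cli_demo and web_demo work fine and do use the GPU, but running train raises this error. GPU: 3090 with 24 GB. Windows environment, pytorch==2.0.0

> I have the same problem, with CUDA version 11.2 and pytorch==1.13.1.

> Try installing cudatoolkit.

Solved. It was indeed a cudatoolkit problem: I had specified the wrong CUDA version in my conda environment.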

@z1968357787
Author

I already had cudatoolkit installed. The server's CUDA version is 11.3 and the cudatoolkit I installed is also 11.3, but it still does not run and reports the same error.
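When the installed versions look consistent but the error persists, one thing worth checking is whether the system's library loader can actually see cublasLt from inside the Python environment. The sketch below uses only the standard library's `ctypes.util.find_library`. It is an assumption that cpm_kernels performs a comparable lookup on Linux (the thread only shows a call to `unix_find_lib`), so treat the result as a hint rather than a verdict. `loader_can_find` is a helper name invented for this example.

```python
import ctypes.util


def loader_can_find(name):
    """Return what the system loader resolves for a library name, or None."""
    return ctypes.util.find_library(name)


# Library names as they appear in the error message, without prefix or suffix.
for lib in ("cudart", "cublas", "cublasLt"):
    print(lib, "->", loader_can_find(lib) or "not found")
```

If cublasLt shows up as "not found" while cudatoolkit is installed, the library probably lives in a directory the loader does not search.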

@mxzgn

mxzgn commented Apr 20, 2023

CUDA 11.2, with a virtual environment set up through conda: same error.

@JingyangXu1991

> CUDA 11.2, with a virtual environment set up through conda: same error.

Run vim ~/.bashrc and check whether the CUDA version specified there is correct.
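The settings exported from ~/.bashrc end up in the process environment, so the same check can be made from Python by listing every CUDA-related entry the interpreter actually sees. This is a minimal sketch: `cuda_env_report` is a name made up here, and `CUDA_HOME` / `CUDA_PATH` are common conventions, not variables the thread confirms this project reads.

```python
import os


def cuda_env_report(env):
    """Collect CUDA-related settings from an environment mapping."""
    report = {}
    # Variables that conventionally point at a CUDA installation root.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            report[var] = [env[var]]
    # Search-path variables: keep only the entries that mention "cuda".
    for var in ("PATH", "LD_LIBRARY_PATH"):
        entries = [e for e in env.get(var, "").split(os.pathsep) if "cuda" in e.lower()]
        if entries:
            report[var] = entries
    return report


for var, values in cuda_env_report(os.environ).items():
    for value in values:
        print(f"{var}: {value}")
```

Entries that point at different CUDA versions (for example one variable naming 11.3 and another naming 11.2) are the kind of mismatch to look for.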

@JingyangXu1991

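> I already had cudatoolkit installed. The server's CUDA version is 11.3 and the cudatoolkit I installed is also 11.3, but it still does not run and reports the same error.

Run vim ~/.bashrc and check whether the CUDA version specified there is correct.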

@z1968357787
Author

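My problem is solved. I followed the dependencies listed in this blog post; using the dependencies bundled with the project produces all kinds of CUDA errors: https://blog.csdn.net/v_JULY_v/article/details/129880836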

@runeq99

runeq99 commented May 8, 2023

Same problem with the latest packages on win10. My solution: add the directory containing the cublasLt DLL to the system environment variables.

Detailed debug process

OS: win10
Python: 3.10
Packages: (all latest versions instead of the versions mentioned in the repo)

Run the code in cmd: python web_demo.py

Got:

│ D:\software\miniconda3\envs\py310\lib\site-packages\cpm_kernels\library\base.py:72 in wrapper    │
│                                                                                                  │
│   69 │   │   │   def decorator(f):                                                               │
│   70 │   │   │   │   @wraps(f)                                                                   │
│   71 │   │   │   │   def wrapper(*args, **kwargs):                                               │
│ ❱ 72 │   │   │   │   │   raise RuntimeError("Library %s is not initialized" % self.__name)       │
│   73 │   │   │   │   return wrapper                                                              │
│   74 │   │   │   return decorator                                                                │
│   75 │   │   else:                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Library cublasLt is not initialized

Locate the code:

# D:\software\miniconda3\envs\py310\lib\site-packages\cpm_kernels\library\base.py
# (excerpt; the print() calls are the debug statements added for this investigation)
class Lib:
    def __init__(self, name):
        self.__name = name
        print(self.__name)  # debug: which library is being loaded
        if sys.platform.startswith("win"):
            lib_path = windows_find_lib(self.__name)
            self.__lib_path = lib_path
            print(self.__lib_path)  # debug: None means no matching DLL was found
            if lib_path is not None:
                self.__lib = ctypes.WinDLL(lib_path)
                print(self.__lib)  # debug: the loaded DLL handle
            else:
                # A missing library is not an error here; bind() raises later.
                self.__lib = None
        elif sys.platform.startswith("linux"):
            lib_path = unix_find_lib(self.__name)
            self.__lib_path = lib_path
            if lib_path is not None:
                self.__lib = ctypes.cdll.LoadLibrary(lib_path)
            else:
                self.__lib = None
        else:
            raise RuntimeError("Unknown platform: %s" % sys.platform)

    ...
    def bind(self, name, arg_types, ret_type) -> Callable[[LibCall], LibCall]:
        if self.__lib is None:
            # Library was never loaded: every bound function raises when called.
            def decorator(f):
                @wraps(f)
                def wrapper(*args, **kwargs):
                    raise RuntimeError("Library %s is not initialized" % self.__name)
                return wrapper
            return decorator

I used simple print statements to debug (debugging in an IDE consumes too much memory). It is easy to see that the program cannot find the cublasLt DLL, so I simply added the directory containing that DLL to the system environment variables. Specifically, the DLL is named cublasLt64_11.dll and the directory I added is D:\software\miniconda3\envs\py310\Lib\site-packages\torch\lib.
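The reason the error only appears at call time, and not when the package is imported, is the deferred-failure pattern in the excerpt above: a missing library is stored as None, and every function bound to it is replaced by a wrapper that raises. Here is a stripped-down sketch of that pattern. `LazyLib` and the library name are invented for illustration, and this is not the real cpm_kernels code.

```python
import ctypes
import ctypes.util
from functools import wraps


class LazyLib:
    """Simplified stand-in for the Lib class quoted above."""

    def __init__(self, name):
        self.name = name
        path = ctypes.util.find_library(name)
        # A library that cannot be found does not raise here; it is stored as None.
        self.handle = ctypes.CDLL(path) if path else None

    def bind(self, f):
        if self.handle is not None:
            return f

        @wraps(f)
        def wrapper(*args, **kwargs):
            # The failure surfaces only when the bound function is called.
            raise RuntimeError("Library %s is not initialized" % self.name)

        return wrapper


missing = LazyLib("hypothetical_missing_library")  # constructing it succeeds


@missing.bind
def matmul():
    return "called"


try:
    matmul()
except RuntimeError as err:
    print(err)  # prints: Library hypothetical_missing_library is not initialized
```

This also explains why the demos can import everything successfully and still fail later, once a code path first calls into cublasLt.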

Theoretically, the following approach should work in other setups too (but I was too lazy to try it).

In short,

  1. Search for the missing DLL (e.g. cublasLt) with Everything.
  2. If it exists, add the directory that contains it to the system environment variables.
  3. If it does not exist, install cudatoolkit.
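Step 1 can also be done without a desktop search tool. The sketch below scans a list of directories for files that look like the cublasLt shared library, defaulting to the entries of PATH. `find_library_files` is a hypothetical helper, and the name matching (a `.dll` suffix or `.so` in the file name) is an assumption about how the files are named, based on the cublasLt64_11.dll example above.

```python
import os
from pathlib import Path


def find_library_files(stem, search_dirs):
    """Return files in search_dirs that look like shared libraries named after stem."""
    stem = stem.lower()
    hits = []
    for directory in search_dirs:
        try:
            entries = sorted(Path(directory).iterdir())
        except OSError:  # missing, unreadable, or not a directory
            continue
        for entry in entries:
            name = entry.name.lower()
            is_shared_lib = name.endswith(".dll") or ".so" in name
            if is_shared_lib and name.startswith((stem, "lib" + stem)):
                hits.append(entry)
    return hits


# Directories currently on PATH (empty entries are skipped).
path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
for hit in find_library_files("cublasLt", path_dirs):
    print("found on PATH:", hit)
```

If nothing is printed, passing other candidate directories (such as the torch\lib directory mentioned above) shows whether the file exists somewhere that is simply not on PATH, which is the case step 2 addresses.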

@zhaozhiming

I am on Ubuntu 22.04 and get this error too, but as far as I can tell CUDA is installed.

$ apt list --installed | grep cuda
cuda-compat-11-7/now 515.86.01-1 amd64 [installed,local]
cuda-cudart-11-7/now 11.7.99-1 amd64 [installed,local]
cuda-keyring/now 1.0-1 all [installed,local]
cuda-toolkit-11-7-config-common/now 11.7.99-1 all [installed,local]
cuda-toolkit-11-config-common/now 11.8.89-1 all [installed,local]
cuda-toolkit-config-common/now 12.0.107-1 all [installed,local]

@joe0327

joe0327 commented May 23, 2023

  1. Run apt list | grep cuda to check the CUDA version.
  2. If the CUDA version is wrong, reinstall CUDA. Reinstalling cuda-11.7 made it work for me.
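Reading the `apt list` output by eye is error-prone when several CUDA versions are mixed, as in the listing posted earlier in this thread. The sketch below parses that listing into a package-to-version mapping and checks whether any package name mentions cublas. `parse_apt_list` is a helper invented here, and the idea that a missing cublas package explains the error is an assumption, not something the thread confirms.

```python
SAMPLE = """\
cuda-compat-11-7/now 515.86.01-1 amd64 [installed,local]
cuda-cudart-11-7/now 11.7.99-1 amd64 [installed,local]
cuda-keyring/now 1.0-1 all [installed,local]
cuda-toolkit-11-7-config-common/now 11.7.99-1 all [installed,local]
cuda-toolkit-11-config-common/now 11.8.89-1 all [installed,local]
cuda-toolkit-config-common/now 12.0.107-1 all [installed,local]
"""


def parse_apt_list(text):
    """Map package name to version for lines shaped like 'name/suite version arch [status]'."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if "/" not in line:
            continue  # skip headers and blank lines
        name, rest = line.split("/", 1)
        fields = rest.split()
        if len(fields) >= 2:
            packages[name] = fields[1]
    return packages


packages = parse_apt_list(SAMPLE)
for name, version in packages.items():
    print(f"{name:35s} {version}")
print("cublas package listed:", any("cublas" in name for name in packages))
```

For this listing the last line reports False: the entries cover the runtime, compatibility, and config packages, and none is named after cublas. Note that `grep cuda` would also hide any package whose name does not contain "cuda".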

@twang2218

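I ran into this problem too, while running THUDM/chatglm-6b-int4. It turned out that my CUDA version was too old: I had been running on cuda-10.0, and after upgrading to cuda-12.0 the problem went away. Earlier comments mention that 11.2 and 11.3 did not work, and a later one says that reinstalling 11.7 fixed it. So could this error be caused by the code requiring a system CUDA version newer than some 11.x release? Everyone could try upgrading their CUDA version to see whether that solves the problem.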
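To test a version hypothesis like this one, reports need comparable version numbers. The sketch below turns version strings into tuples that sort correctly and reads the CUDA version PyTorch was built against from `torch.version.cuda` when PyTorch is installed. The helper names are invented for this example, and the comparison says nothing about which minimum version, if any, the project actually requires.

```python
import importlib


def parse_cuda_version(text):
    """Turn a string such as '11.7' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def torch_cuda_version():
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return None
    return torch.version.cuda  # None for CPU-only builds


# Versions reported in this thread, sorted numerically rather than as text.
reported = ["11.7", "10.0", "12.0", "11.2", "11.3"]
print(sorted(reported, key=parse_cuda_version))
print("PyTorch built with CUDA:", torch_cuda_version())
```

Keep in mind that `torch.version.cuda` describes the build PyTorch ships with, which can differ from the toolkit installed system-wide.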


13 participants