RuntimeError: Unable to proceed, no GPU resources available #33

Open
louxingrui opened this issue Jan 31, 2022 · 2 comments

Comments

@louxingrui

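When I run bash scripts/full_model/finetune_cpm2_math.sh, I get RuntimeError: Unable to proceed, no GPU resources available. My GPU is an RTX 2080 Ti with CUDA 10.2 installed, and the program runs without problems outside the Docker environment. Is this caused by a mismatch between my CUDA version and the version inside the Docker environment? Here are the main error messages from the terminal: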
[2022-01-31 14:07:27,900] [WARNING] [runner.py:117:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
  File "/opt/conda/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/launcher/runner.py", line 264, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available
I would appreciate a reply!

@XiaoqingNLP

Check the hostfile, and also check that the GPU is installed correctly.
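
One way to run the GPU check is sketched below. The UserWarning in the log above ("Found no NVIDIA driver on your system") suggests that PyTorch cannot see the driver from inside the container, so the script is meant to be run there. It assumes `nvidia-smi` is on the PATH whenever the driver is exposed, and it treats PyTorch as optional. `gpu_diagnostics` is a hypothetical helper name, not part of DeepSpeed or this repository.

```python
import shutil
import subprocess


def gpu_diagnostics():
    """Collect basic facts about GPU visibility in the current environment."""
    report = {
        "nvidia_smi_found": False,
        "nvidia_smi_ok": False,
        "torch_cuda_available": None,  # stays None if PyTorch is not installed
        "torch_device_count": None,
    }

    # nvidia-smi ships with the NVIDIA driver; if it is missing, the driver
    # is probably not visible in this environment.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            # "nvidia-smi -L" lists the GPUs the driver can see.
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            report["nvidia_smi_ok"] = False

    # Ask PyTorch directly; the DeepSpeed traceback above is raised when
    # no GPU resources are found.
    try:
        import torch

        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_device_count"] = torch.cuda.device_count()
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for key, value in gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If the report shows no GPU inside the container but a working GPU on the host, one possible cause (an assumption, not confirmed in this thread) is that the container was started without GPU access, for example without Docker's `--gpus all` option.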

@t1101675
Contributor

The hostfile needs to contain the hostname or IP address that you use when connecting to the host over ssh.
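
As an illustration, DeepSpeed's documentation describes hostfile lines of the form `<hostname> slots=<number of GPUs>`. The sketch below is a simplified, hypothetical validator for that format. It is not DeepSpeed's own parser, and the host names and slot counts in the example are made up.

```python
def parse_hostfile(text):
    """Parse hostfile text into a {hostname_or_ip: slots} dictionary.

    Simplified sketch: expects one '<host> slots=<n>' entry per line
    and skips blank lines.
    """
    resources = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines

        parts = line.split()
        if len(parts) != 2 or not parts[1].startswith("slots="):
            raise ValueError(
                f"line {line_no}: expected '<host> slots=<n>', got {raw!r}"
            )

        host = parts[0]
        slots_text = parts[1][len("slots="):]
        if not slots_text.isdigit():
            raise ValueError(f"line {line_no}: slots must be an integer")
        if host in resources:
            raise ValueError(f"line {line_no}: duplicate host {host!r}")

        resources[host] = int(slots_text)
    return resources


if __name__ == "__main__":
    # Hypothetical hosts: one given by ssh name, one by IP address.
    example = "worker-1 slots=4\n192.168.0.12 slots=4\n"
    print(parse_hostfile(example))
```

Each host entry should match the name or IP address used for ssh, as noted above, and the slot count should match the number of GPUs available on that machine.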


3 participants