RuntimeError: Not compiled with GPU support #82

Open
KiedaTamashi opened this issue Sep 22, 2020 · 11 comments

@KiedaTamashi

I get this error when running testcuda.py on a Linux server.

torch.cuda.is_available() returns True.
My CUDA version: 10.1
My torch version: 1.4
My Python version: 3.6.9

The build seems to succeed:
copying build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so ->


#!/bin/bash
Creating /NAS/home01/tanzhenwei/.pyenv/versions/3.6.9/envs/tzwpy/lib/python3.6/site-packages/DCNv2.egg-link (link to .)
DCNv2 0.1 is already the active version in easy-install.pth


Installed /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new
Processing dependencies for DCNv2==0.1
Finished processing dependencies for DCNv2==0.1

But I get an error when testing:
True /usr/local/cuda
Traceback (most recent call last):
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/testcuda.py", line 255, in <module>
    example_dconv()
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/testcuda.py", line 175, in example_dconv
    output = dcn(input)
  File "/NAS/home01/tanzhenwei/.pyenv/versions/tzwpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/dcn_v2.py", line 128, in forward
    self.deformable_groups)
  File "/NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/dcn_v2.py", line 31, in forward
    ctx.deformable_groups)
RuntimeError: Not compiled with GPU support (dcn_v2_forward at /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/src/dcn_v2.h:35)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f5f96ec4193 in /NAS/home01/tanzhenwei/.pyenv/versions/tzwpy/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: dcn_v2_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x157 (0x7f5f91a755d7 in /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/_ext.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x17504 (0x7f5f91a82504 in /NAS/project01/rzimmerm_substitles/FairMot_compressing/src/lib/models/networks/DCNv2_new/_ext.cpython-36m-x86_64-linux-gnu.so)
...

Could you help me solve this or give some ideas?

@shafu0x

shafu0x commented Oct 3, 2020

I have the same problem. Did anyone fix it?

@wenjiey2

Were you able to solve this issue? I am facing the same problem with pytorch 1.4.0-py3.6_cuda101_cudnn7_0 and torchvision 0.5.0-py36_cu101. It is invoked by _backend.dcn_v2_forward where _backend should be _ext built from make.sh. I'm not sure if _ext refers to this _ext.cp36-win_amd64.pyd file. Not sure how to proceed from here.

@KiedaTamashi
Author

@wenjiey2 @SharifElfouly Hi, I have fixed it. In my case, I was using a virtual env and running the code on a compute node by submitting a job to the server. But I had set up the env on a node without a GPU, and that is what caused this error.

I solved it by installing everything, including the virtual env, on a node with a GPU, and now it works.
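
One way to catch this before building is to check whether the node you are on actually exposes a GPU. The sketch below is only an illustration and is not part of DCNv2. It assumes the NVIDIA driver's nvidia-smi tool is available on GPU nodes, and the helper name list_visible_gpus is made up for this example.

```python
import shutil
import subprocess


def list_visible_gpus():
    """Return one line per GPU reported by `nvidia-smi -L`, or [] if none."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        # No driver tools on this node: very likely a CPU-only node.
        return []
    try:
        result = subprocess.run(
            [smi, "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Lines describing a device start with "GPU <index>:".
    return [line for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    gpus = list_visible_gpus()
    if gpus:
        print("GPUs visible on this node:")
        for line in gpus:
            print("  " + line)
    else:
        print("No GPU visible here; build the extension on a GPU node instead.")
```

Running this on the login node and again inside a submitted job should show whether the two environments differ.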

@suniash

suniash commented Feb 10, 2021

@XiaoSanGit Thank you. Can you please explain in detail how you solved this issue?

@allenwu5

allenwu5 commented Feb 25, 2021

I resolved this issue by forcing python setup.py build develop to go through this branch:

DCNv2/setup.py

Lines 34 to 42 in c7f778f

extension = CUDAExtension
sources += source_cuda
define_macros += [("WITH_CUDA", None)]
extra_compile_args["nvcc"] = [
    "-DCUDA_HAS_FP16=1",
    "-D__CUDA_NO_HALF_OPERATORS__",
    "-D__CUDA_NO_HALF_CONVERSIONS__",
    "-D__CUDA_NO_HALF2_OPERATORS__",
]
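
To make the effect of that branch concrete, here is a minimal sketch of the decision it appears to guard. It is a simplified stand-in, not the real setup.py: the function name choose_build is hypothetical, and the CPU-only fallback ("CppExtension" with no WITH_CUDA macro) is an assumption about what happens when the branch is skipped.

```python
def choose_build(cuda_available, cuda_home):
    """Pick the extension type the way the quoted branch seems to.

    cuda_available: result of torch.cuda.is_available() at build time.
    cuda_home: the CUDA_HOME value seen at build time (may be None).
    """
    if cuda_available and cuda_home is not None:
        # CUDA sources are compiled and WITH_CUDA is defined.
        return {"extension": "CUDAExtension", "define_macros": [("WITH_CUDA", None)]}
    # Assumed fallback: only CPU sources are compiled, so any GPU call
    # would later fail with an error such as "Not compiled with GPU support".
    return {"extension": "CppExtension", "define_macros": []}


# A GPU is visible but CUDA_HOME is missing: the CUDA branch is skipped.
print(choose_build(True, None)["extension"])               # CppExtension
print(choose_build(True, "/usr/local/cuda")["extension"])  # CUDAExtension
```

Under this reading, both conditions must hold on the machine where the build runs, which is why a build on a CPU-only node can finish without errors and still fail at run time.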

@fabrizioschiano

@allenwu5, thanks for posting your solution. I tried to replicate it and found that the problem, at least for me, is the following:

torch.cuda.is_available():  True
CUDA_HOME:  None

So if I force the code into that branch (by removing the "and CUDA_HOME is not None" condition), I get another error:

OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

I am currently trying to work out how to set the CUDA_HOME variable correctly. If you have time in the meantime to give us more details, it would be helpful.
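
For anyone wondering why CUDA_HOME can end up as None, the sketch below shows the kind of lookup a build helper might perform. It is a rough approximation written for this thread, not PyTorch's actual implementation; the function name guess_cuda_home, its parameters, and the exact search order are assumptions.

```python
import os
import shutil


def guess_cuda_home(environ=None, which=shutil.which, default="/usr/local/cuda"):
    """Approximate a CUDA_HOME lookup; returns a path or None."""
    environ = os.environ if environ is None else environ
    # 1. An explicit environment variable wins.
    home = environ.get("CUDA_HOME") or environ.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise infer it from nvcc: <home>/bin/nvcc -> <home>.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional location, if it exists.
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    print("Guessed CUDA_HOME:", guess_cuda_home())
```

If all three steps come up empty, the result is None, which matches the situation described above: the GPU and driver are present, but no CUDA toolkit can be found for compiling.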

@fabrizioschiano

fabrizioschiano commented Oct 7, 2021

After some research, I realized that the problem was that I did not actually have CUDA installed.

You can check this by running:
nvcc -V

If the command is not found, the CUDA toolkit is not installed (or nvcc is not on your PATH).

I followed the installation guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

And I installed CUDA from the following official link:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

After that, I installed the nvidia-cuda-toolkit package with:

sudo apt install nvidia-cuda-toolkit

Then you can do:

export CUDA_HOME=/usr/local/cuda-11

(Before doing this, check that this is the folder in which CUDA has been installed on your machine.)

I hope this helps someone else in the same situation.
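
The check mentioned in the parenthesis above can be done by hand, or with a few lines of Python. This is a minimal sketch under the assumption that a usable CUDA install has nvcc at bin/nvcc inside CUDA_HOME; the helper name cuda_home_problems is hypothetical.

```python
import os


def cuda_home_problems(cuda_home):
    """Return a list of human-readable problems with a CUDA_HOME value."""
    problems = []
    if not cuda_home:
        problems.append("CUDA_HOME is not set")
        return problems
    if not os.path.isdir(cuda_home):
        problems.append("%s is not a directory" % cuda_home)
        return problems
    # Assumption: the compiler lives at <CUDA_HOME>/bin/nvcc.
    nvcc = os.path.join(cuda_home, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        problems.append("no nvcc found at %s" % nvcc)
    return problems


if __name__ == "__main__":
    found = cuda_home_problems(os.environ.get("CUDA_HOME"))
    print("CUDA_HOME looks usable" if not found else "; ".join(found))
```

An empty list means the path at least looks like a toolkit install; it does not prove the toolkit version matches your PyTorch build.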

@jatinkatyal

I am in something deeper, can you help?

$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

$ conda activate compvis36
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0

With or without the environment active, when I run the install command I get:

$ sudo apt install nvidia-cuda-toolkit
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
                       Recommends: nsight-compute (= 10.1.243-3)
                       Recommends: nsight-systems (= 10.1.243-3)
E: Unable to correct problems, you have held broken packages.

Any tips on how I can get nvidia-cuda-toolkit version 11.4?
The versions listed on https://packages.ubuntu.com/search?keywords=nvidia-cuda-toolkit are all older.
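
When two toolkits are in play (here 11.4 inside the conda environment and 10.1 from apt), it can help to extract the release number from the nvcc output and compare it with the CUDA version your PyTorch build reports, for example torch.version.cuda. The sketch below only does the parsing part; the helper name parse_nvcc_release is hypothetical, and the sample text is copied from the output above.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc -V` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


SAMPLE = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0
"""

print(parse_nvcc_release(SAMPLE))  # (11, 4)
```

If the parsed release differs from the version PyTorch was built against, the extension may still fail to build or import, even with nvcc present.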

@jatinkatyal

I fixed this by reinstalling CUDA 11.4 using the runfile installer from NVIDIA. But now I am facing different issues that are already reported on the repo, such as an import error for _ext. Switching to those issue threads now.

@TaQuangTu

For others coming later, remember to set the CUDA_HOME environment variable:
export CUDA_HOME=/path/to/your/cuda/
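
If you drive the build from a Python script rather than a shell, the same idea can be expressed by passing a modified environment to the build process. This is a sketch under assumptions: the helper name env_with_cuda_home is hypothetical, and prepending the toolkit's bin directory to PATH is an extra step not mentioned above.

```python
import os


def env_with_cuda_home(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME set for a build."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = cuda_home
    # Put <cuda_home>/bin first so the matching nvcc is picked up.
    bin_dir = os.path.join(cuda_home, "bin")
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return env


# Possible usage (left commented out so nothing is built by accident):
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "build", "develop"],
#                  env=env_with_cuda_home("/path/to/your/cuda"), check=True)
```

The returned dictionary leaves your current shell untouched, so the setting applies only to the build you launch with it.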

@tranngocphuong89

@allenwu5's fix (forcing python setup.py build develop to go through lines 34 to 42 of DCNv2/setup.py) works for me. Thanks.
