Failure after installation - torchsearchsorted on CUDA device is asked, but it seems that it is not available #17

Closed
qway opened this issue May 4, 2020 · 15 comments

Comments

@qway

qway commented May 4, 2020

I'm currently trying to get it to run, but I got the following error:

Traceback (most recent call last):
  File "train_nerf.py", line 405, in <module>
    main()
  File "train_nerf.py", line 240, in main
    encode_direction_fn=encode_direction_fn,
  File "/home/nerfteam/nerfmeshes/src/nerf/nerf/train_utils.py", line 180, in run_one_iter_of_nerf
    for batch in batches
  File "/home/nerfteam/nerfmeshes/src/nerf/nerf/train_utils.py", line 180, in <listcomp>
    for batch in batches
  File "/home/nerfteam/nerfmeshes/src/nerf/nerf/train_utils.py", line 101, in predict_and_render_radiance
    det=(getattr(options.nerf, mode).perturb == 0.0),
  File "/home/nerfteam/nerfmeshes/src/nerf/nerf/nerf_helpers.py", line 288, in sample_pdf_2
    inds = torchsearchsorted.searchsorted(cdf, u, side="right")
  File "/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torchsearchsorted/searchsorted.py", line 41, in searchsorted
    raise Exception('torchsearchsorted on CUDA device is asked, but it seems '
Exception: torchsearchsorted on CUDA device is asked, but it seems that it is not available. Please install it

I use Poetry, and I tried both adding torchsearchsorted as a dependency and installing it through pip as advised in the installation part of the README. In which situations can this exception be thrown? I would be happy if you could point me in the right direction.

@aliutkus
Owner

aliutkus commented May 4, 2020

Hi, are you sure that it compiled correctly?
What happens if you run 'python setup.py install' from the root directory of the package on the command line?

@qway
Author

qway commented May 4, 2020

I redirected the output to a file with

python setup.py install > setup.log

This is the output:

running install
running bdist_egg
running egg_info
writing src/torchsearchsorted.egg-info/PKG-INFO
writing dependency_links to src/torchsearchsorted.egg-info/dependency_links.txt
writing requirements to src/torchsearchsorted.egg-info/requires.txt
writing top-level names to src/torchsearchsorted.egg-info/top_level.txt
reading manifest file 'src/torchsearchsorted.egg-info/SOURCES.txt'
writing manifest file 'src/torchsearchsorted.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/torchsearchsorted
copying src/torchsearchsorted/utils.py -> build/lib.linux-x86_64-3.7/torchsearchsorted
copying src/torchsearchsorted/searchsorted.py -> build/lib.linux-x86_64-3.7/torchsearchsorted
copying src/torchsearchsorted/__init__.py -> build/lib.linux-x86_64-3.7/torchsearchsorted
running build_ext
building 'torchsearchsorted.cpu' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/src
creating build/temp.linux-x86_64-3.7/src/cpu
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/TH -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/THC -I/home/nerfteam/nerfmeshes/.venv/include -I/home/nerfteam/.pyenv/versions/3.7.7/include/python3.7m -c src/cpu/searchsorted_cpu_wrapper.cpp -o build/temp.linux-x86_64-3.7/src/cpu/searchsorted_cpu_wrapper.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cpu -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
g++ -pthread -shared -L/home/nerfteam/.pyenv/versions/3.7.7/lib -L/home/nerfteam/.pyenv/versions/3.7.7/lib build/temp.linux-x86_64-3.7/src/cpu/searchsorted_cpu_wrapper.o -o build/lib.linux-x86_64-3.7/torchsearchsorted/cpu.cpython-37m-x86_64-linux-gnu.so
building 'torchsearchsorted.cuda' extension
creating build/temp.linux-x86_64-3.7/src/cuda
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/TH -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/nerfteam/nerfmeshes/.venv/include -I/home/nerfteam/.pyenv/versions/3.7.7/include/python3.7m -c src/cuda/searchsorted_cuda_wrapper.cpp -o build/temp.linux-x86_64-3.7/src/cuda/searchsorted_cuda_wrapper.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
/usr/local/cuda/bin/nvcc -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/TH -I/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/nerfteam/nerfmeshes/.venv/include -I/home/nerfteam/.pyenv/versions/3.7.7/include/python3.7m -c src/cuda/searchsorted_cuda_kernel.cu -o build/temp.linux-x86_64-3.7/src/cuda/searchsorted_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_61,code=sm_61 -std=c++11
g++ -pthread -shared -L/home/nerfteam/.pyenv/versions/3.7.7/lib -L/home/nerfteam/.pyenv/versions/3.7.7/lib build/temp.linux-x86_64-3.7/src/cuda/searchsorted_cuda_wrapper.o build/temp.linux-x86_64-3.7/src/cuda/searchsorted_cuda_kernel.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.7/torchsearchsorted/cuda.cpython-37m-x86_64-linux-gnu.so
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/torchsearchsorted
copying build/lib.linux-x86_64-3.7/torchsearchsorted/utils.py -> build/bdist.linux-x86_64/egg/torchsearchsorted
copying build/lib.linux-x86_64-3.7/torchsearchsorted/cpu.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/torchsearchsorted
copying build/lib.linux-x86_64-3.7/torchsearchsorted/cuda.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg/torchsearchsorted
copying build/lib.linux-x86_64-3.7/torchsearchsorted/searchsorted.py -> build/bdist.linux-x86_64/egg/torchsearchsorted
copying build/lib.linux-x86_64-3.7/torchsearchsorted/__init__.py -> build/bdist.linux-x86_64/egg/torchsearchsorted
byte-compiling build/bdist.linux-x86_64/egg/torchsearchsorted/utils.py to utils.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/torchsearchsorted/searchsorted.py to searchsorted.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/torchsearchsorted/__init__.py to __init__.cpython-37.pyc
creating stub loader for torchsearchsorted/cpu.cpython-37m-x86_64-linux-gnu.so
creating stub loader for torchsearchsorted/cuda.cpython-37m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/torchsearchsorted/cpu.py to cpu.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/torchsearchsorted/cuda.py to cuda.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying src/torchsearchsorted.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/torchsearchsorted.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/torchsearchsorted.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/torchsearchsorted.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying src/torchsearchsorted.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
creating dist
creating 'dist/torchsearchsorted-1.1-py3.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing torchsearchsorted-1.1-py3.7-linux-x86_64.egg
removing '/home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torchsearchsorted-1.1-py3.7-linux-x86_64.egg' (and everything under it)
creating /home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torchsearchsorted-1.1-py3.7-linux-x86_64.egg
Extracting torchsearchsorted-1.1-py3.7-linux-x86_64.egg to /home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages
torchsearchsorted 1.1 is already the active version in easy-install.pth

Installed /home/nerfteam/nerfmeshes/.venv/lib/python3.7/site-packages/torchsearchsorted-1.1-py3.7-linux-x86_64.egg
Processing dependencies for torchsearchsorted==1.1
Finished processing dependencies for torchsearchsorted==1.1

It did not log the C++ compilation output, but what I could see looked good.
The error is still the same. Could anything else cause this error?

@aliutkus
Owner

aliutkus commented May 5, 2020

Sorry about this.
Does the CPU version work?
Which versions of PyTorch and CUDA are you using?
Which OS are you on?
Do you see anything installed in the torchsearchsorted package?

@qway
Author

qway commented May 8, 2020

The CPU version does seem to work. It's enough for me right now, but I would be interested in getting the CUDA version to work too.

I'm using CUDA 10.1 and PyTorch 1.4 on Linux.
There are two shared libraries, cpu.cpython-37m-x86_64-linux-gnu.so and cuda.cpython-37m-x86_64-linux-gnu.so, in the lib/python3.7/site-packages/torchsearchsorted folder (next to the .py files).
I do not have sudo privileges on the machine because I'm working on a shared remote server, so I'm not able to follow your advice regarding the linking of different gcc/g++ versions.
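
The presence of both .so files does not by itself show that the CUDA extension can be loaded. A minimal sketch, assuming the extension module names `torchsearchsorted.cpu` and `torchsearchsorted.cuda` seen in the build log (the helper `probe_imports` is hypothetical), that tries each import and reports the underlying error instead of hiding it:

```python
import importlib


def probe_imports(module_names):
    """Try to import each module; map name -> None on success, else the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # e.g. ImportError from a missing shared library
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from the build log; adjust if the package layout differs.
    report = probe_imports(["torchsearchsorted.cpu", "torchsearchsorted.cuda"])
    for name, err in report.items():
        print(name, "->", "OK" if err is None else err)
```

If the CUDA extension was linked against a runtime library that is not on the loader path, the reported error should name the missing file.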

@XiaotengLu

XiaotengLu commented May 13, 2020

I also have a similar issue!
I am using Python 3.7, CUDA 10.1, and PyTorch 1.5 on Linux.
I ran test.py and found that the CPU version works but the CUDA version does not:

Looking for 50000x1000 values in 50000x300 entries
NUMPY: searchsorted in 7896.878ms
CPU: searchsorted in 3916.315ms
difference between CPU and NUMPY: 0.000
Traceback (most recent call last):
  File "test.py", line 58, in <module>
    test_GPU = searchsorted(a, v, test_GPU, side)
    raise Exception('torchsearchsorted on CUDA device is asked, but it seems '
Exception: torchsearchsorted on CUDA device is asked, but it seems that it is not available. Please install it

Could you help me? Thank you.

@aliutkus
Owner

Hi all,

OK, it looks like there's an issue and some people are unable to compile the CUDA code.
Which version of nvcc do you have?

@aliutkus
Owner

Concerning the linking of g++/gcc, which version do you get when you type g++ --version?

@XiaotengLu

I got:
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

@aliutkus
Owner

Does it work with the latest commit?

@qway
Author

qway commented May 15, 2020

Sadly, the new commit did not work, but I did some more digging:

SEARCHSORTED_GPU_AVAILABLE = True
try:
    from torchsearchsorted.cuda import searchsorted_cuda_wrapper
except ImportError:
    SEARCHSORTED_GPU_AVAILABLE = False

This part of searchsorted.py hides certain errors. (I don't know of a way to fix this so that the import still does not fail but the correct error is shown when searchsorted is called.) After changing it to display the error when the import fails, I get this:

ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

This leads me to believe that the default nvcc version used for the compilation was wrong: it was actually 10.0 instead of 10.1, which is the one that PyTorch should be using.

Changing the CUDA version using CUDA_HOME=/usr/local/cuda-10.1 leads to compilation errors (over a hundred). If you want to, you can have a look at them here, but I don't know if they're actually helpful in any way.
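
One way the import guard quoted above could keep the package importable on CPU-only machines while still surfacing the real cause is to store the ImportError and re-raise it at call time. This is a minimal sketch, not the library's actual code: the helper names `optional_import` and `require_cuda_extension` are hypothetical, and only the module name `torchsearchsorted.cuda` comes from the snippet above.

```python
import importlib


def optional_import(module_name):
    """Return (module, None) on success, or (None, error) if the import fails."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        return None, exc


# Hypothetical guard: keep the original error instead of only a True/False flag.
_cuda_ext, _cuda_import_error = optional_import("torchsearchsorted.cuda")


def require_cuda_extension():
    """Return the CUDA extension, or raise with the original import error attached."""
    if _cuda_ext is None:
        raise RuntimeError(
            "torchsearchsorted CUDA extension could not be imported: "
            f"{_cuda_import_error}"
        ) from _cuda_import_error
    return _cuda_ext
```

With a guard like this, a failure such as the missing libcudart.so.10.0 would appear in the message raised when the CUDA path is first requested, rather than being replaced by a generic "not available" exception.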

@aliutkus
Owner

OK, I had a quick look at the log. I don't really see anything concerning torchsearchsorted, or am I mistaken?
Investigating a bit, I also see that PyTorch doesn't look compatible with CUDA 10.0.

@DomainFlag

DomainFlag commented May 20, 2020

@aliutkus We have multiple CUDA versions installed on the remote server that @qway and I have been working on. PyTorch uses cuda-10.1 by default, while the package was built with cuda-10.0 by default. The easy fix was to explicitly select the version that matches the one used by PyTorch, and now it works:

export PATH="/usr/local/cuda-10.1/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH"
export LIBRARY_PATH="/usr/local/cuda-10.1/lib64:$LIBRARY_PATH"
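
A mismatch like this can be checked before rebuilding. The following is a minimal sketch, assuming `nvcc` is on PATH and that `torch.version.cuda` reports the CUDA version PyTorch was built with; the helper names are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def nvcc_release():
    """Run the nvcc found on PATH and return its release string, or None."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_nvcc_release(result.stdout)


def toolkits_match(torch_cuda, nvcc_cuda):
    """True only if both versions are known and identical."""
    return torch_cuda is not None and torch_cuda == nvcc_cuda


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_cuda = None
    nvcc_cuda = nvcc_release()
    print(f"PyTorch built with CUDA: {torch_cuda}, nvcc on PATH: {nvcc_cuda}")
    if not toolkits_match(torch_cuda, nvcc_cuda):
        print("Versions differ or are unknown; adjust PATH before building.")
```

Running it before and after the exports above should show whether the nvcc picked up from PATH changed from 10.0 to 10.1.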

@aliutkus
Owner

haaa I'm happy you found your way through this, congrats
feel free to close the issue

@qway
Author

qway commented May 21, 2020

Thanks for your help, as @DomainFlag said, we were able to resolve the problem.

@qway qway closed this as completed May 21, 2020
@SaadatKhan

Hi, I get this error:

  File "train.py", line 180, in <module>
    trainer.fit(system)
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 765, in fit
    self.single_gpu_train(model)
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 492, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 896, in run_pretrain_routine
    eval_results = self._evaluate(model,
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 278, in _evaluate
    output = self.evaluation_forward(model, batch, batch_idx, dataloader_idx, test_mode)
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 447, in evaluation_forward
    output = model.validation_step(*args)
  File "train.py", line 123, in validation_step
    results = self(rays)
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "train.py", line 55, in forward
    render_rays(self.models,
  File "/home/saadat/Documents/projects/nerf_pl/nerf_pl/models/rendering.py", line 225, in render_rays
    z_vals_ = sample_pdf(z_vals_mid, weights_coarse[:, 1:-1],
  File "/home/saadat/Documents/projects/nerf_pl/nerf_pl/models/rendering.py", line 42, in sample_pdf
    inds = searchsorted(cdf, u, side='right')
  File "/home/saadat/anaconda3/envs/sss/lib/python3.8/site-packages/torchsearchsorted/searchsorted.py", line 41, in searchsorted
    raise Exception('torchsearchsorted on CUDA device is asked, but it seems '
Exception: torchsearchsorted on CUDA device is asked, but it seems that it is not available. Please install it

Can you tell me what to do?

5 participants