RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal #29

Open
ViktorAlm opened this issue Aug 22, 2021 · 6 comments

Comments

@ViktorAlm

Hello! Nice work!

I'm trying to get this running on an RTX 3090. During installation I get warnings recommending that I launch with -std=c++14.

Other than that, I'm not seeing anything out of the ordinary. Has anyone else managed to get this running on newer RTX cards?

  File "/opt/conda/envs/pytorch_venv/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/envs/pytorch_venv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
  File "/opt/conda/envs/pytorch_venv/lib/python3.7/site-packages/torch/autograd/function.py", line 87, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore[attr-defined]
  File "/opt/conda/envs/pytorch_venv/lib/python3.7/site-packages/diffvg-0.0.1-py3.7-linux-x86_64.egg/pydiffvg/render_pytorch.py", line 709, in backward
    eval_positions.shape[0])
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDevice: invalid device ordinal
@josephrocca

josephrocca commented Aug 31, 2021

Seeing a similar/related error on a Tesla K80:

/usr/local/lib/python3.7/dist-packages/diffvg-0.0.1-py3.7-linux-x86_64.egg/pydiffvg/render_pytorch.py in backward(ctx, grad_img)
    707                       use_prefiltering,
    708                       diffvg.float_ptr(eval_positions.data_ptr()),
--> 709                       eval_positions.shape[0])
    710         time_elapsed = time.time() - start
    711         global print_timing

RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function

@jmsancho

jmsancho commented Sep 7, 2021

Having the same issue as @josephrocca with a Tesla K80.

@rvorias

rvorias commented Sep 11, 2021

Same here. I also see it come up more often with newer GPUs (RTX 30x0).

@bfrasure

bfrasure commented Nov 2, 2021

Please provide the code leading up to the error. I need more context.

@IzhanVarsky

IzhanVarsky commented Dec 15, 2021

Hi! I also ran into this problem. As I understand it, this is a compatibility issue. Changing this line in CMakeLists.txt

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++11")

to this:

set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++14 -gencode=arch=compute_37,code=sm_37")

helped me on a Tesla K80 on Google Colab. Use -gencode=arch=compute_86,code=sm_86 for an RTX 3090 and -gencode=arch=compute_75,code=sm_75 for a Tesla T4. I found the info about matching CUDA architectures here: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
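For anyone scripting this, the GPU-to-flag mapping above could be sketched in Python roughly as follows. This is only an illustration: the table covers just the three cards named in this thread, and the names `SM_BY_GPU`, `gencode_flag`, and `gencode_flag_for` are hypothetical helpers, not part of diffvg.

```python
# Compute capabilities for the GPUs named in this thread only.
# Other cards are not covered; extend the table from the linked page.
SM_BY_GPU = {
    "K80": "37",
    "RTX 3090": "86",
    "T4": "75",
}


def gencode_flag(sm):
    """Build the nvcc -gencode flag for a compute capability such as '86'."""
    return f"-gencode=arch=compute_{sm},code=sm_{sm}"


def gencode_flag_for(nvidia_smi_output):
    """Return the flag for the first known GPU name found in the text
    printed by `nvidia-smi -L`, or None if no known name appears."""
    for name, sm in SM_BY_GPU.items():
        if name in nvidia_smi_output:
            return gencode_flag(sm)
    return None
```

Returning None for an unknown card is a deliberate choice here, so the caller can fall back to the unmodified build instead of guessing an architecture.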

@pschaldenbrand

@IzhanVarsky Thank you so much!! I updated the install section of my notebooks that use diffvg with the following code, and now they work when Colab assigns me K80 machines.

%cd /content/
!git clone https://github.com/BachiLi/diffvg
%cd diffvg
import subprocess
# On a K80, build with -std=c++14 and target compute capability 3.7 (sm_37).
if 'K80' in str(subprocess.check_output(['nvidia-smi', '-L'])):
    !sed -i 's/set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++11")/set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++14 -gencode=arch=compute_37,code=sm_37")/' /content/diffvg/CMakeLists.txt
!git submodule update --init --recursive
!python setup.py install
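The same patch can also be done in plain Python rather than sed, which makes it easier to reuse for other architectures. The sketch below assumes CMakeLists.txt still contains the exact `-std=c++11` line quoted earlier in this thread; `OLD_LINE` and `patch_nvcc_flags` are hypothetical names, not part of diffvg.

```python
# The flags line quoted earlier in this thread; assumed to be unchanged upstream.
OLD_LINE = 'set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++11")'


def patch_nvcc_flags(cmake_text, sm):
    """Replace the -std=c++11 flags line with a -std=c++14 line targeting `sm`.

    Raises ValueError if the expected line is missing, so a changed
    CMakeLists.txt is noticed instead of being silently left unpatched.
    """
    new_line = (
        'set(CUDA_NVCC_FLAGS "${CUDA_NVCC_FLAGS} -std=c++14 '
        f'-gencode=arch=compute_{sm},code=sm_{sm}")'
    )
    if OLD_LINE not in cmake_text:
        raise ValueError("expected CUDA_NVCC_FLAGS line not found")
    return cmake_text.replace(OLD_LINE, new_line)
```

In a notebook, this would be applied by reading /content/diffvg/CMakeLists.txt, calling `patch_nvcc_flags(text, "37")` (or "75" for a T4, "86" for an RTX 3090), and writing the result back before running `python setup.py install`.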
