Support for newer CUDA capability (e.g. sm_86) #359

Closed
MMelQin opened this issue Sep 29, 2022 Discussed in #358 · 0 comments · Fixed by #381


MMelQin commented Sep 29, 2022

Discussed in #358

Originally posted by Leengit September 28, 2022
I successfully built a Docker image with the monai-deploy package, and it runs on the computer on which I built it. However, when I try to run the same image on a computer with a significantly newer and more powerful GPU, it fails. It appears that the underlying Docker image nvcr.io/nvidia/pytorch:21.07-py3 uses a version of CUDA 11.3 and torch that does not support sm_86. Upgrading to torch==1.12.1 within the Docker image that I create (and committing the change to a new image that I then use) does not help. Despite my attempts with apt-get, I have been unable to install a newer version of CUDA within the created image.

Your help with getting support for an NVIDIA RTX A5000 would be much appreciated! The error from running the Docker image that I created with monai deploy includes:

```
NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A5000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
...
  File "~/venv/lung/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
                            weight, bias, self.stride,
                            _pair(0), self.dilation, self.groups)
        return F.conv2d(input, weight, bias, self.stride,
               ~~~~~~~~ <--- HERE
                        self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
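For anyone hitting the same message, a quick way to confirm the mismatch from inside the container might look like the sketch below. It compares the GPU's compute capability with the architectures the installed torch build was compiled for, using `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names `parse_sm` and `is_supported` are hypothetical, made up for illustration. The compatibility rule is also a simplification: it assumes a compiled `sm_XY` kernel runs only on devices with the same major version and an equal or higher minor version, and it ignores any PTX (`compute_XY`) entries that could be JIT-compiled.

```python
import re


def parse_sm(arch):
    """Return (major, minor) for a name like 'sm_86', or None for other entries.

    Hypothetical helper for illustration; not part of torch.
    """
    match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
    if match is None:
        return None
    # 'sm_86' -> (8, 6); 'sm_100' -> (10, 0)
    return divmod(int(match.group(1)), 10)


def is_supported(capability, arch_list):
    """Simplified check: does any compiled sm_XY kernel run on this device?

    Assumes a binary kernel for X.y runs on devices X.z with z >= y.
    PTX ('compute_XY') entries are deliberately ignored in this sketch.
    """
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is None:
            continue
        if parsed[0] == major and parsed[1] <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        archs = torch.cuda.get_arch_list()
        print("device capability:", cap)
        print("torch built for:", archs)
        print("supported (simplified check):", is_supported(cap, archs))
    else:
        print("torch with a visible CUDA device is required for the live check")
```

With the values from the error above, `is_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` returns `False`, which matches the "no kernel image is available" failure. A build whose list includes `sm_80` or `sm_86` would pass this check.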
MMelQin added this to Needs Triage in Backlog via automation Oct 16, 2022
MMelQin added this to To do in v1.0.0 via automation Oct 26, 2022
MMelQin removed this from Needs Triage in Backlog Oct 26, 2022
MMelQin moved this from To do to In progress in v1.0.0 Oct 26, 2022
MMelQin moved this from In progress to Done in v1.0.0 Nov 3, 2022
MMelQin removed this from Done in v1.0.0 Oct 19, 2023