RuntimeError: CUDA error: no kernel image is available for execution on the device #73

sijin-dm · 2022-12-16T02:47:49Z

Describe the bug
RuntimeError: CUDA error: no kernel image is available for execution on the device

To Reproduce
Steps to reproduce the behavior:

Go to webui
Click on 'Run Model'
See the error

Additional context
After exec-ing into the Docker container, I found that the error is due to a mismatched torch version. The installed version is torch==1.10.1+cu102, which only supports up to sm_70, but my 3090 needs sm_86.
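For anyone who wants to confirm the same mismatch, here is a minimal sketch of the check. The helper names (`parse_arch`, `build_supports`, `check_local_gpu`) are made up for illustration, and the rule is a simplification: a compiled `sm_XY` kernel is assumed to run on a device with the same major version and an equal or higher minor version, and a `compute_XY` (PTX) entry on any equal or newer device. `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are the PyTorch calls that supply the real values.

```python
import re


def parse_arch(name):
    """Split an arch name such as 'sm_86' or 'compute_75' into (kind, major, minor)."""
    m = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", name)
    if m is None:
        raise ValueError(f"unrecognized arch name: {name!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def build_supports(capability, arch_list):
    """Simplified check: can a build with these archs run on this device?"""
    major, minor = capability
    for name in arch_list:
        kind, a_major, a_minor = parse_arch(name)
        # Compiled kernels: same major version, minor not newer than the device.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer device.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def check_local_gpu(device=0):
    """Run the check against the local torch build and GPU (needs torch + CUDA)."""
    import torch  # imported lazily so the helpers above work without torch

    capability = torch.cuda.get_device_capability(device)
    return build_supports(capability, torch.cuda.get_arch_list())
```

With the values from this issue, `build_supports((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` is `False`, which matches the error.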


jonbakerfish · 2023-03-09T12:35:46Z

same problem here:

/opt/conda/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
test data convert to vgpu BEGIN
kk : tensor([0., 0., 0., 0.])
/opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py:143: UserWarning: 
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
test data convert to vgpu END
Namespace(conf_thres=0.5, device='0', img_size=1280, iou_thres=0.45, port=5000, weights=['best_overall.pt'])
Traceback (most recent call last):
  File "server_abroad.py", line 297, in <module>
    model = attempt_load(opt.weights, map_location=device)  # load FP32 model
  File "/home/models/experimental.py", line 137, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 735, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 735, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Is it possible for us to get the Dockerfiles for these two images so that we can build them locally?

basicai/xtreme1-image-object-detection
basicai/xtreme1-point-cloud-object-detection

ClementLeBihan · 2023-09-29T14:58:21Z

Hi @jonbakerfish and @sijin-dm,

I managed to update the Dockerfiles of the image and point cloud detection services to rebuild them with updated CUDA and torch versions. You can find the original Dockerfiles here:
https://github.com/xtreme1-io/point-cloud-object-detection
https://github.com/xtreme1-io/image-object-detection/

ManonCortial · 2024-02-16T14:19:42Z

I have the same error.

When I hit 'Run model' on my dataset, I get:

```
point-cloud-object-detection-1 | Traceback (most recent call last):
point-cloud-object-detection-1 | File "app.py", line 80, in process_data
point-cloud-object-detection-1 | results, _ = self.predictor(points=pc, full_nms=True)
point-cloud-object-detection-1 | File "/app/pcdet_open/src/predictor.py", line 80, in __call__
point-cloud-object-detection-1 | pred_dicts, _ = self.model.forward(data_dict)
point-cloud-object-detection-1 | File "/src/pcdet/models/detectors/centerpoint.py", line 11, in forward
point-cloud-object-detection-1 | batch_dict = cur_module(batch_dict)
point-cloud-object-detection-1 | File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
point-cloud-object-detection-1 | return forward_call(*input, **kwargs)
point-cloud-object-detection-1 | File "/src/pcdet/models/backbones_3d/vfe/mean_vfe.py", line 26, in forward
point-cloud-object-detection-1 | points_mean = voxel_features[:, :, :].sum(dim=1, keepdim=False)
point-cloud-object-detection-1 | RuntimeError: CUDA error: no kernel image is available for execution on the device
point-cloud-object-detection-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
point-cloud-object-detection-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
point-cloud-object-detection-1 | 2024-02-16 13:49:17 INFO log_request: 200 POST /pointCloud/recognition (172.18.0.10) 3281.17ms
```
My dataset is a set of .pcd files with only x, y, z data. Update: I just tested with the data provided by Xtreme1 and the error is the same.
I am on Ubuntu 22.

I tested with your new image and the result is the same.

ClementLeBihan · 2024-02-16T15:16:28Z

Hi Manon, did you check that the CUDA version in the point-cloud-object-detection Docker image is compatible with your local machine? On my side I upgraded to 11.5.2 (and so upgraded torch, torchvision and torchaudio).
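One quick way to see which CUDA runtime a torch wheel was built against is the local version tag, as in the torch==1.10.1+cu102 reported at the top of this issue. A minimal sketch, with a hypothetical helper name (`cuda_from_torch_version`); inside a live Python session `torch.version.cuda` gives the same information directly:

```python
def cuda_from_torch_version(version):
    """Map '1.10.1+cu102' to '10.2'; return None for CPU-only or untagged builds."""
    _, _, local = version.partition("+")
    digits = local[2:]
    # Expect a tag of the form 'cu' followed by at least two digits.
    if not local.startswith("cu") or len(digits) < 2 or not digits.isdigit():
        return None
    # The last digit is the minor version, everything before it is the major.
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_from_torch_version("1.10.1+cu102"))
```

The result can then be compared with the highest CUDA version the host driver supports, which `nvidia-smi` reports in its header.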

ManonCortial · 2024-02-20T14:39:02Z

Indeed, inside the point-cloud-object-detection container, the command 'torch.cuda.get_device_name()' gives me:

```
/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA GeForce RTX 3050 Ti Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
'NVIDIA GeForce RTX 3050 Ti Laptop GPU'
```

If I find some time, I may try to build an image based on ubuntu 22.04 instead of 18.04.
Will keep you updated if I get some results.

ClementLeBihan · 2024-02-20T14:50:51Z

Add your architecture to TORCH_CUDA_ARCH_LIST ;)
ENV TORCH_CUDA_ARCH_LIST="5.0 5.2 6.0 6.1 7.0 7.5+PTX 8.0 8.6" in base_image/Dockerfile
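TORCH_CUDA_ARCH_LIST controls which architectures CUDA code is compiled for when it is built from source (for example, extensions built through `torch.utils.cpp_extension`); it does not change what a prebuilt torch wheel supports. A minimal sketch of extending such a list for a given GPU, assuming a space- or semicolon-separated value; the helper name `add_arch` and the starting list are made up for illustration:

```python
def add_arch(arch_list, capability):
    """Append 'major.minor' to a TORCH_CUDA_ARCH_LIST string if it is missing."""
    major, minor = capability
    wanted = f"{major}.{minor}"
    entries = arch_list.replace(";", " ").split()
    # Compare while ignoring a '+PTX' suffix, so '8.6+PTX' counts as present.
    if wanted not in [entry.split("+")[0] for entry in entries]:
        entries.append(wanted)
    return " ".join(entries)


# Hypothetical starting value; (8, 6) is the capability of an RTX 3090 / 3050 Ti.
print(add_arch("5.0 5.2 6.0 6.1 7.0 7.5+PTX", (8, 6)))
```

The CUDA toolkit in the image also has to know the target architecture: compiling for 8.6 needs CUDA 11.1 or newer, so a CUDA 10.2 base image cannot build it.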

ManonCortial · 2024-02-22T09:24:18Z

Oh, I didn't know about that ! (I'm still on the learning curve for both docker and CUDA).

But it seems that NVIDIA removed nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 from Docker Hub, so I haven't managed to rebuild the base image so far.

ClementLeBihan · 2024-02-22T09:26:07Z

Feel free to change the base image, for example to nvidia/cuda:11.5.2-cudnn8-devel-ubuntu20.04.

ManonCortial · 2024-02-22T12:20:16Z

It works!
I had to adapt the Dockerfile, but I managed to get object detection working!

What I did:
download https://github.com/xtreme1-io/point-cloud-object-detection/tree/main
cd point-cloud-object-detection/base-image

in the Dockerfile:

replace line 1 by
FROM nvidia/cuda:11.5.2-cudnn8-devel-ubuntu20.04
replace lines 19 to 23 by :
RUN pip install -U numpy==1.23.5
RUN pip install Pillow==6.2.2
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
RUN pip install spconv-cu118

(not sure the -U is needed for numpy, but I was testing it from the child docker file to avoid recompiling pytorch)

replace line 28 by
ENV TORCH_CUDA_ARCH_LIST="5.0 5.2 6.0 6.1 7.0 7.5+PTX 8.0 8.6"

In your terminal:
docker build --tag 'my-xtrem1-point-cloud-object-detection-base' .

It takes a while (and lots of storage space too)

then go back to the repository root:
cd ..
In the Dockerfile there:
replace line 1 by
FROM my-xtrem1-point-cloud-object-detection-base

In your terminal
docker build --tag 'updated-point-cloud-object-detection' .

cd /path/to/xtrem1
In the docker-compose.yml, in the point-cloud-object-detection section, replace
image: basicai/xtreme1-point-cloud-object-detection
by
image: updated-point-cloud-object-detection

then
docker compose --profile model up
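To confirm that the rebuilt image really has kernels for the GPU, a small smoke test can be run in a Python session inside the container. This is only a sketch and the function name `cuda_smoke_test` is made up; it launches one trivial kernel and synchronizes so that a "no kernel image" error surfaces immediately instead of at a later call.

```python
def cuda_smoke_test():
    """Launch one tiny CUDA kernel and return (ok, message)."""
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"
    if not torch.cuda.is_available():
        return False, "no CUDA device visible"
    try:
        x = torch.zeros(4, device="cuda")
        total = (x + 1).sum()
        # Kernel errors are reported asynchronously, so force them to surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return False, f"kernel launch failed: {exc}"
    return True, f"ok on {torch.cuda.get_device_name(0)} (sum={total.item()})"


ok, message = cuda_smoke_test()
print(ok, message)
```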

Hope this helps! Thank you Clément for your help and responsiveness.

ClementLeBihan · 2024-02-22T12:41:47Z

You can let docker compose build your docker images, just set the build directory in the point-cloud-object-detection section of the docker-compose.yml!

ManonCortial · 2024-02-22T14:58:13Z

oh, yes, good to know !

sijin-dm assigned jaggerwang and nicozhan Dec 16, 2022

jaggerwang assigned liu4lin, jidasheng and zhangxuefengtest and unassigned jaggerwang Mar 10, 2023

jaggerwang closed this as completed Mar 6, 2024
