RuntimeError: CUDA error: no kernel image is available for execution on the device #73

Closed
sijin-dm opened this issue Dec 16, 2022 · 11 comments

@sijin-dm

Describe the bug
RuntimeError: CUDA error: no kernel image is available for execution on the device

To Reproduce
Steps to reproduce the behavior:

  1. Go to webui
  2. Click on 'Run Model'
  3. See the error

Additional context
After exec-ing into the Docker container, I found that the cause is a mismatched torch version. The installed version is torch==1.10.1+cu102, which only supports up to sm_70, but my RTX 3090 needs sm_86.
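
A minimal sketch of how that mismatch can be checked from inside the container, assuming the usual binary-compatibility rule that a kernel built for sm_XY runs on a device with the same major version and a minor version of at least Y. `is_supported` is a hypothetical helper, not part of torch; `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()` are the real calls it is fed from. PTX entries (`compute_XX`), which can be JIT-compiled for newer devices, are ignored, so treat this as a rough check only.

```python
import re


def is_supported(capability, arch_list):
    """Return True if any compiled arch in arch_list can run on the device.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 3090.
    arch_list:  strings such as "sm_70", as returned by torch.cuda.get_arch_list().
    Hypothetical helper; PTX entries ("compute_XX") are deliberately ignored.
    """
    cap_major, cap_minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)", arch)
        if match is None:
            continue  # skip compute_XX and anything unrecognised
        number = int(match.group(1))
        major, minor = number // 10, number % 10
        if major == cap_major and minor <= cap_minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        print("device capability:", capability)
        print("compiled archs:   ", arch_list)
        print("supported:        ", is_supported(capability, arch_list))
    else:
        print("torch with CUDA is not available in this environment")
```

With the values reported in this thread, `is_supported((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` is `False`, which matches the warning torch prints.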

@jonbakerfish

jonbakerfish commented Mar 9, 2023

Same problem here:

/opt/conda/lib/python3.6/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
test data convert to vgpu BEGIN
kk : tensor([0., 0., 0., 0.])
/opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py:143: UserWarning: 
NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
test data convert to vgpu END
Namespace(conf_thres=0.5, device='0', img_size=1280, iou_thres=0.45, port=5000, weights=['best_overall.pt'])
Traceback (most recent call last):
  File "server_abroad.py", line 297, in <module>
    model = attempt_load(opt.weights, map_location=device)  # load FP32 model
  File "/home/models/experimental.py", line 137, in attempt_load
    model.append(torch.load(w, map_location=map_location)['model'].float().fuse().eval())  # load FP32 model
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 735, in float
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 570, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 593, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 735, in <lambda>
    return self._apply(lambda t: t.float() if t.is_floating_point() else t)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Is it possible for us to get the Dockerfiles for these two images so that we can build them locally?

basicai/xtreme1-image-object-detection
basicai/xtreme1-point-cloud-object-detection
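
On the `CUDA_LAUNCH_BLOCKING=1` hint at the end of that log: the variable needs to be in the environment before CUDA is initialised, so one rough way to set it from Python (instead of on the command line) is to do so before importing torch. `enable_blocking_launches` is a hypothetical helper name. This would not fix the error; it would only make the reported stack trace point at the call that actually failed.

```python
import os
import sys


def enable_blocking_launches():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process.

    Returns True if torch had not been imported yet, in which case the
    variable is in place before CUDA can have been initialised.
    Returns False if torch was already imported, so it may be too late.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return "torch" not in sys.modules


if not enable_blocking_launches():
    print("warning: torch was already imported; set the variable earlier")

# import torch  # import torch only after the variable is set
```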

@ClementLeBihan

ClementLeBihan commented Sep 29, 2023

Hi @jonbakerfish and @sijin-dm,

I managed to update the Dockerfiles for image and point cloud detection and rebuild them with updated CUDA and torch versions. You can find the original Dockerfiles here:
https://github.com/xtreme1-io/point-cloud-object-detection
https://github.com/xtreme1-io/image-object-detection/

@ManonCortial

ManonCortial commented Feb 16, 2024

I have the same error.

When I hit 'Run model' on my dataset, I get:

`
point-cloud-object-detection-1 | Traceback (most recent call last):
point-cloud-object-detection-1 | File "app.py", line 80, in process_data
point-cloud-object-detection-1 | results, _ = self.predictor(points=pc, full_nms=True)
point-cloud-object-detection-1 | File "/app/pcdet_open/src/predictor.py", line 80, in __call__
point-cloud-object-detection-1 | pred_dicts, _ = self.model.forward(data_dict)
point-cloud-object-detection-1 | File "/src/pcdet/models/detectors/centerpoint.py", line 11, in forward
point-cloud-object-detection-1 | batch_dict = cur_module(batch_dict)
point-cloud-object-detection-1 | File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
point-cloud-object-detection-1 | return forward_call(*input, **kwargs)
point-cloud-object-detection-1 | File "/src/pcdet/models/backbones_3d/vfe/mean_vfe.py", line 26, in forward
point-cloud-object-detection-1 | points_mean = voxel_features[:, :, :].sum(dim=1, keepdim=False)
point-cloud-object-detection-1 | RuntimeError: CUDA error: no kernel image is available for execution on the device
point-cloud-object-detection-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
point-cloud-object-detection-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
point-cloud-object-detection-1 | 2024-02-16 13:49:17 INFO log_request: 200 POST /pointCloud/recognition (172.18.0.10) 3281.17ms

`
My dataset is a set of .pcd files with only x, y, z data. Update: I just tested with the data provided by Xtreme1 and the error is the same.
I am on Ubuntu 22.

I tested with your new image and the result is the same.

@ClementLeBihan

Hi Manon, did you check that the CUDA version in the point-cloud-object-detection Docker image is compatible with the one on your local machine? On my side I upgraded to CUDA 11.5.2 (and so upgraded torch, torchvision and torchaudio).

@ManonCortial

Indeed, inside the point-cloud-object-detection container, the command 'torch.cuda.get_device_name()' gives me:

'
/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py:143: UserWarning:
NVIDIA GeForce RTX 3050 Ti Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3050 Ti Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
'NVIDIA GeForce RTX 3050 Ti Laptop GPU'
'

If I find some time, I may try to build an image based on ubuntu 22.04 instead of 18.04.
Will keep you updated if I get some results.

@ClementLeBihan

ClementLeBihan commented Feb 20, 2024

Add your architecture to TORCH_CUDA_ARCH_LIST ;)
Set ENV TORCH_CUDA_ARCH_LIST="5.0 5.2 6.0 6.1 7.0 7.5+PTX 8.0 8.6" in base_image/Dockerfile
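
For context, `TORCH_CUDA_ARCH_LIST` is read when CUDA code is compiled (PyTorch itself or extensions built from source), and each numeric entry is a compute capability written as `major.minor`, optionally followed by `+PTX`. A small sketch of checking whether a given list covers a device; `parse_arch_list` and `covers` are hypothetical helper names, and only numeric entries are handled.

```python
def parse_arch_list(value):
    """Parse a TORCH_CUDA_ARCH_LIST-style string into (major, minor, ptx) tuples.

    Only numeric entries such as "8.6" or "7.5+PTX" are handled; named
    architectures are out of scope for this sketch.
    """
    entries = []
    for token in value.replace(";", " ").split():
        ptx = token.endswith("+PTX")
        number = token[:-len("+PTX")] if ptx else token
        major, minor = number.split(".")
        entries.append((int(major), int(minor), ptx))
    return entries


def covers(value, capability):
    """True if the list has an exact entry for capability, e.g. (8, 6)."""
    return any((major, minor) == tuple(capability)
               for major, minor, _ in parse_arch_list(value))


ARCH_LIST = "5.0 5.2 6.0 6.1 7.0 7.5+PTX 8.0 8.6"  # value from the comment above
print(covers(ARCH_LIST, (8, 6)))      # True
print(covers("5.0 6.0 7.0", (8, 6)))  # False
```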

@ManonCortial

Oh, I didn't know about that! (I'm still on the learning curve for both Docker and CUDA.)

But it seems that NVIDIA removed nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04 from Docker Hub, so I haven't managed to rebuild the base image so far.

@ClementLeBihan

Feel free to change the starting image, for example to nvidia/cuda:11.5.2-cudnn8-devel-ubuntu20.04.

@ManonCortial

It works!
I had to adapt the Dockerfile, but I managed to get object detection working!

What I did:
Download https://github.com/xtreme1-io/point-cloud-object-detection/tree/main
cd point-cloud-object-detection/base-image

In the Dockerfile:

  • replace line 1 with
    FROM nvidia/cuda:11.5.2-cudnn8-devel-ubuntu20.04

  • replace lines 19 to 23 with:
    RUN pip install -U numpy==1.23.5
    RUN pip install Pillow==6.2.2
    RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
    RUN pip install spconv-cu118

(I am not sure the -U is needed for numpy, but I was testing it from the child Dockerfile to avoid recompiling PyTorch.)

  • replace line 28 with
    ENV TORCH_CUDA_ARCH_LIST="5.0 5.2 6.0 6.1 7.0 7.5+PTX 8.0 8.6"

In your terminal:
docker build --tag 'my-xtrem1-point-cloud-object-detection-base' .

It takes a while (and lots of storage space too)

Then:
cd ..
In the Dockerfile, replace line 1 with
FROM my-xtrem1-point-cloud-object-detection-base

In your terminal:
docker build --tag 'updated-point-cloud-object-detection' .

cd /path/to/xtrem1
In docker-compose.yml, in the point-cloud-object-detection section, replace
image: basicai/xtreme1-point-cloud-object-detection
with
image: updated-point-cloud-object-detection

Then:
docker compose --profile model up
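
A quick way to confirm that a rebuilt image really fixed the problem, sketched under the assumption that it is run with Python inside the updated container: execute one small reduction on the GPU, similar to the `sum` call that failed in the traceback earlier in this thread. `gpu_smoke_test` is a hypothetical name, not something shipped with the project.

```python
def gpu_smoke_test():
    """Run a tiny reduction on the GPU and report what happened as a string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    try:
        x = torch.ones(4, 3, 2, device="cuda")
        total = x.sum(dim=1).sum().item()  # forces a kernel launch and a sync
    except RuntimeError as err:
        return f"CUDA kernel failed: {err}"
    return f"ok on {torch.cuda.get_device_name(0)} (sum={total})"


print(gpu_smoke_test())
```

If the installed torch build still lacks kernels for the device, the "no kernel image is available" message should show up in the returned string instead of crashing the script.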

Hope this helps! Thank you Clément for your help and responsiveness.

@ClementLeBihan

ClementLeBihan commented Feb 22, 2024

You can let Docker Compose build your Docker images: just set the build directory in the point-cloud-object-detection section of docker-compose.yml!

@ManonCortial

Oh, yes, good to know!
