
UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR #126605

Open
tjasmin111 opened this issue May 18, 2024 · 4 comments
Labels
module: cudnn (Related to torch.backends.cudnn, and CuDNN support)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tjasmin111 commented May 18, 2024

I'm trying to train a model with YOLOv8. Everything was fine, but today I started getting this warning:

site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

The training seems to be progressing, though I'm not sure whether this has any negative effect on the results.

What is the problem, and how can I address it?
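
For what it's worth, this is the kind of check I was planning to run to see whether the cuDNN path matters. It's just a sketch; it assumes the warning comes from the cuDNN convolution backend and only uses the standard torch.backends.cudnn.enabled flag and Python's warnings filter:

```python
import warnings
import torch

# Diagnostic only: disable the cuDNN backend and rerun a short training run.
# If the warning disappears and the losses look comparable, the non-cuDNN
# fallback PyTorch is already taking is most likely fine.
torch.backends.cudnn.enabled = False

# Alternatively, keep cuDNN enabled but silence this specific UserWarning
# while waiting for a proper fix.
warnings.filterwarnings(
    "ignore",
    message="Plan failed with a cudnnException",
    category=UserWarning,
)
```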

Here is the system info:

Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.31
Python version: 3.9.7 | packaged by conda-forge | (default, Sep  2 2021, 17:58:34)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 515.105.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture:                    x86_64

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] onnx==1.16.0
[pip3] onnxruntime==1.17.3
[pip3] onnxruntime-gpu==1.17.1
[pip3] onnxsim==0.4.36
[pip3] optree==0.11.0
[pip3] torch==2.3.0+cu118
[pip3] torchaudio==2.3.0+cu118
[pip3] torchvision==0.18.0+cu118
[pip3] triton==2.3.0
[conda] numpy                     1.24.4                   pypi_0    pypi
[conda] pytorch-quantization      2.2.1                    pypi_0    pypi
[conda] torch                     2.1.1+cu118              pypi_0    pypi
[conda] torchaudio                2.1.1+cu118              pypi_0    pypi
[conda] torchmetrics              0.8.0                    pypi_0    pypi
[conda] torchvision               0.16.1+cu118             pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi

cc @csarofeen @ptrblck @xwang233

@mikaylagawarecki added the module: cudnn (Related to torch.backends.cudnn, and CuDNN support) label on May 20, 2024
@mikaylagawarecki (Contributor)

Could you share a minimal repro of this, please?

@mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label on May 20, 2024
@tjasmin111 (Author)

I updated my Python from 3.8 to 3.9, then simply tried to run a YOLOv8 training via yolo detect train model=yolov8n.pt data=data.yaml imgsz=640 device=1
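
The Python equivalent of that command is roughly the following (a sketch, assuming the ultralytics package that provides the yolo CLI; it uses the same checkpoint, dataset config, and arguments as the command above):

```python
# Rough Python equivalent of the CLI command above; assumes the `ultralytics`
# package that ships the `yolo` CLI is installed and data.yaml points at my dataset.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # same pretrained checkpoint as model=yolov8n.pt
model.train(
    data="data.yaml",        # same dataset config as data=data.yaml
    imgsz=640,               # image size, as in imgsz=640
    device=1,                # GPU index, as in device=1
)
```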

@adajscas

I also had this problem. Have you solved it?


kubicol commented May 30, 2024

I have the same error with ComfyUI...
