
UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR #126605

Open
tjasmin111 opened this issue May 18, 2024 · 4 comments
Labels
module: cudnn (Related to torch.backends.cudnn, and CuDNN support)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments


tjasmin111 commented May 18, 2024

I'm trying to train a model with YOLOv8. Everything was fine, but today I started getting this warning:

site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

The training seems to be progressing, though I'm not sure whether this has any negative effect on the results.

What is the problem, and how can I address it?
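
For what it's worth, this is the kind of check I was planning to run to see whether the cuDNN path matters. It's just a sketch; it assumes the warning comes from the cuDNN convolution backend and only uses the standard torch.backends.cudnn.enabled flag and Python's warnings filter:

```python
import warnings
import torch

# Diagnostic only: disable the cuDNN backend and rerun a short training run.
# If the warning disappears and the losses look comparable, the non-cuDNN
# fallback PyTorch is already taking is most likely fine.
torch.backends.cudnn.enabled = False

# Alternatively, keep cuDNN enabled but silence this specific UserWarning
# while waiting for a proper fix.
warnings.filterwarnings(
    "ignore",
    message="Plan failed with a cudnnException",
    category=UserWarning,
)
```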

Here is the system info:

Collecting environment information...
PyTorch version: 2.3.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.29.3
Libc version: glibc-2.31
Python version: 3.9.7 | packaged by conda-forge | (default, Sep  2 2021, 17:58:34)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA A100 80GB PCIe
Nvidia driver version: 515.105.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.8.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.8.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture:                    x86_64

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] onnx==1.16.0
[pip3] onnxruntime==1.17.3
[pip3] onnxruntime-gpu==1.17.1
[pip3] onnxsim==0.4.36
[pip3] optree==0.11.0
[pip3] torch==2.3.0+cu118
[pip3] torchaudio==2.3.0+cu118
[pip3] torchvision==0.18.0+cu118
[pip3] triton==2.3.0
[conda] numpy                     1.24.4                   pypi_0    pypi
[conda] pytorch-quantization      2.2.1                    pypi_0    pypi
[conda] torch                     2.1.1+cu118              pypi_0    pypi
[conda] torchaudio                2.1.1+cu118              pypi_0    pypi
[conda] torchmetrics              0.8.0                    pypi_0    pypi
[conda] torchvision               0.16.1+cu118             pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi

cc @csarofeen @ptrblck @xwang233

@mikaylagawarecki added the module: cudnn (Related to torch.backends.cudnn, and CuDNN support) label on May 20, 2024
@mikaylagawarecki (Contributor)

Could you share a minimal repro of this, please?

@mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label on May 20, 2024
@tjasmin111 (Author)

I updated my Python from 3.8 to 3.9, then simply tried to run a YOLOv8 training via yolo detect train model=yolov8n.pt data=data.yaml imgsz=640 device=1
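
The Python equivalent of that command is roughly the following (a sketch, assuming the ultralytics package that provides the yolo CLI; it uses the same checkpoint, dataset config, and arguments as the command above):

```python
# Rough Python equivalent of the CLI command above; assumes the `ultralytics`
# package that ships the `yolo` CLI is installed and data.yaml points at my dataset.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # same pretrained checkpoint as model=yolov8n.pt
model.train(
    data="data.yaml",        # same dataset config as data=data.yaml
    imgsz=640,               # image size, as in imgsz=640
    device=1,                # GPU index, as in device=1
)
```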

@adajscas

I also had this problem. Have you solved it?


kubicol commented May 30, 2024

I have the same error with ComfyUI...
