Error when building TensorRT Engine. #76

Open · 2 tasks done
0zl opened this issue May 6, 2023 · 15 comments

0zl commented May 6, 2023

Describe the bug

When attempting to build the TensorRT engine, an error is raised indicating that cuDNN is not initialized (CUDNN_STATUS_NOT_INITIALIZED).

Error Log
[I]     Total Nodes | Original:  1015, After Folding:   842 |   173 Nodes Folded
[I] Folding Constants | Pass 3
[I]     Total Nodes | Original:   842, After Folding:   842 |     0 Nodes Folded
[INFO] Exporting model: models/accelerate/tensorrt/runwayml/stable-diffusion-v1-5/onnx/unet.onnx

Warning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):

Traceback (most recent call last):
  File "/env/lib/python3.10/site-packages/gradio/routes.py", line 399, in run_predict
    output = await app.get_blocks().process_api(
  File "/env/lib/python3.10/site-packages/gradio/blocks.py", line 1299, in process_api
    result = await self.call_function(
  File "/env/lib/python3.10/site-packages/gradio/blocks.py", line 1036, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/env/lib/python3.10/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/env/lib/python3.10/site-packages/gradio/utils.py", line 488, in async_iteration
    return next(iterator)
  File "/Radiata/modules/tabs/tensorrt.py", line 173, in build_engine
    builder.build()
  File "/Radiata/modules/acceleration/tensorrt/engine.py", line 72, in build
    export_onnx(
  File "/Radiata/lib/tensorrt/utilities.py", line 431, in export_onnx
    torch.onnx.export(
  File "/env/lib/python3.10/site-packages/torch/onnx/utils.py", line 504, in export
    _export(
  File "/env/lib/python3.10/site-packages/torch/onnx/utils.py", line 1529, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/env/lib/python3.10/site-packages/torch/onnx/utils.py", line 1111, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/env/lib/python3.10/site-packages/torch/onnx/utils.py", line 987, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/env/lib/python3.10/site-packages/torch/onnx/utils.py", line 891, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/env/lib/python3.10/site-packages/torch/jit/_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/env/lib/python3.10/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/env/lib/python3.10/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/env/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 718, in forward
    sample = self.conv_in(sample)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/env/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Reproduction

  1. Run launch.sh with the --share and --tensorrt command-line arguments.
  2. Open the web UI and go to the TensorRT tab.
  3. Click the Build button.
  4. The error occurs while exporting the first unet.onnx.

Expected behavior

It should be exported without throwing any errors.

System Info

  • Ubuntu 20
  • Python 3.10
  • CUDA 1.13/1.14
  • TensorRT 8.6.0 (auto-installed by launch.py)
  • Torch 1.13.1 (auto-installed by launch.py)
  • Inside a Jupyter Notebook.

Additional context

No response

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.

0zl commented May 6, 2023

It seems that this issue is related to issue #30. I apologize for opening duplicate issues.

0zl commented May 6, 2023

I discovered another issue that might be related to this problem. When I installed TensorRT 8.6.0 and ran a simple test provided by NVIDIA, I received the following error message:

import tensorrt
print(tensorrt.__version__)
assert tensorrt.Builder(tensorrt.Logger())

8.6.0
[05/06/2023-17:34:17] [TRT] [W] Unable to determine GPU memory usage
[05/06/2023-17:34:17] [TRT] [W] Unable to determine GPU memory usage
[05/06/2023-17:34:17] [TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

However, when I used TensorRT 8.5.3.1, the error disappeared, and I received the following message instead:

import tensorrt
print(tensorrt.__version__)
assert tensorrt.Builder(tensorrt.Logger())

8.5.3.1
[05/06/2023-17:38:45] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See `CUDA_MODULE_LOADING` in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

According to the NVIDIA TensorRT Installation Guide, this error usually indicates an incorrect NVIDIA driver installation. However, since I am using a managed Docker image, the driver it ships may simply not be compatible with TensorRT 8.6.0. If that is the case, it would be helpful to support an older TensorRT version if possible.
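
For what it's worth, CUDA initialization error 35 is cudaErrorInsufficientDriver, meaning the installed driver is older than the CUDA runtime this TensorRT build expects. Here is a minimal sketch to confirm that from inside the container, assuming libcuda.so.1 is on the loader path (it only queries the driver and is not Radiata-specific):

import ctypes

# Ask the NVIDIA driver for the highest CUDA version it supports.
libcuda = ctypes.CDLL("libcuda.so.1")
assert libcuda.cuInit(0) == 0, "driver could not be initialized"

version = ctypes.c_int()
libcuda.cuDriverGetVersion(ctypes.byref(version))

# e.g. 11040 -> CUDA 11.4; a CUDA 12.x runtime generally needs a 12.x-capable driver.
major, minor = version.value // 1000, (version.value % 1000) // 10
print(f"driver supports up to CUDA {major}.{minor}")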

Stax124 commented May 6, 2023

TensorRT recommends CUDA 12.x, so try to update the drivers if you have the option.

0zl commented May 6, 2023

TensorRT recommends CUDA 12.x, so try to update the drivers if you have the option.

Yes, this must be the main problem. Unfortunately, I don't have the option to upgrade the CUDA version. It appears that Radiata TensorRT is incompatible with CUDA < 12.x. Therefore, I will close this issue. Thanks Stax!

@0zl 0zl closed this as completed May 6, 2023
@DepressiveDude

I seem to have this issue as well. My CUDA version is 12.1. Does anyone have a fix?

@0zl 0zl reopened this May 9, 2023

ddPn08 commented May 9, 2023

Is cudnn properly installed on the machine being used?

@DepressiveDude

I never downloaded it specifically. I just followed the setup steps for Linux on an Ubuntu WSL distribution.

0zl commented May 9, 2023

Is cudnn properly installed on the machine being used?

Yes, it is likely installed properly. I use a managed Docker image, and I am able to generate normal images using the default diffusers.

$: nvidia-smi

Tue May  9 14:34:12 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

$: nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

import tensorflow as tf

sys_details = tf.sysconfig.get_build_info()
cuda_version = sys_details["cuda_version"]
cudnn_version = sys_details["cudnn_version"]  

print(f'cuda: {cuda_version}', f'cudnn: {cudnn_version}')

$: cuda: 11.2 cudnn: 8

I'm not sure why the CUDA version reported by TensorFlow doesn't match nvidia-smi. However, when I tested with the older TensorRT 8.5.*, it was able to detect cuDNN.
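
For anyone else debugging this, a minimal check like the one below (plain PyTorch, nothing Radiata-specific; it assumes a CUDA-enabled torch build) exercises the same cuDNN conv2d path that fails in the traceback above:

import torch

print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("cuDNN available:", torch.backends.cudnn.is_available())
print("cuDNN version:", torch.backends.cudnn.version())

# A tiny convolution on the GPU goes through the same cuDNN call (F.conv2d)
# that raises CUDNN_STATUS_NOT_INITIALIZED during the ONNX export.
x = torch.randn(1, 4, 64, 64, device="cuda")
conv = torch.nn.Conv2d(4, 8, kernel_size=3, padding=1).to("cuda")
print("conv2d output shape:", tuple(conv(x).shape))

If this snippet fails with the same CUDNN_STATUS_NOT_INITIALIZED, the problem is in the Torch/cuDNN installation rather than in Radiata's TensorRT code.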

@DepressiveDude

I am able to generate normal images as well, but the error shows up when I use the --tensorrt flag. I should mention that I am using Anaconda with Ubuntu WSL. Should I just reinstall everything from scratch?

@DepressiveDude

0zl, you mentioned a Docker image. Is it the one with TensorRT 8.5?

0zl commented May 11, 2023

0zl, you mentioned a Docker image. Is it the one with TensorRT 8.5?

The public PyTorch Docker image I am using does not come with TensorRT pre-installed. For convenience, you can use the NVIDIA-provided Docker container that has TensorRT pre-installed.

@DepressiveDude

That sounds like a good idea, thanks. I'll try it out.

@Jon-Zbw

Jon-Zbw commented Jun 20, 2023

When attempting to build the TensorRT engine, an error is raised indicating that cuDNN is not initialized (CUDNN_STATUS_NOT_INITIALIZED). [...]

Have you solved it? I ran into the same problem, but I don't get the detailed error output. The log only shows:
Downloading (…)_encoder/config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 617/617 [00:00<00:00, 1.63MB/s]
Downloading model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 492M/492M [01:16<00:00, 6.46MB/s]
[INFO] Exporting model: models/accelerate/tensorrt/runwayml/stable-diffusion-v1-5/onnx/unet.onnx
/Radiata/venv/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py:650: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
That's all the output I get.

In addition:
my base image is from: docker pull 11.8.0-cudnn8-devel-ubuntu20.04
so CUDA = 11.8
cuDNN = 8
TensorRT = 8.5.1.7
How can I solve this bug? @ddPn08

@Jon-Zbw

Jon-Zbw commented Jun 20, 2023

That sounds like a good idea, thanks. I'll try it out.

Have you solved it? @0zl

Stax124 commented Jul 8, 2023

Is cudnn properly installed on the machine being used?

Yes, it is likely installed properly. I use a managed Docker image, and I am able to generate normal images using the default diffusers. [...] I'm not sure why the CUDA version reported by TensorFlow doesn't match nvidia-smi. However, when I tested with the older TensorRT 8.5.*, it was able to detect cuDNN.

There seems to be a version mismatch everywhere, and that will cause problems. This usually happens with poorly installed NVIDIA drivers and CUDA. As for how to fix it, if you are on a Linux host it's not exactly simple: you are usually better off purging everything with cuda or nvidia in the name, but that is usually done from a TTY2, as the DE and WM will usually crash.
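
As a starting point, here is a rough sketch that gathers the versions this thread keeps comparing in one place (it only reads versions and assumes nvidia-smi, torch, and tensorrt are all available in the same environment):

import subprocess

import tensorrt
import torch

# Driver version as reported by nvidia-smi.
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()

print("NVIDIA driver:", driver)
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("cuDNN (seen by torch):", torch.backends.cudnn.version())
print("TensorRT:", tensorrt.__version__)

If the driver cannot serve the CUDA version that torch and TensorRT were built against, that mismatch is the first thing to fix.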

Can you tell me what exact environment you are running in (OS, Docker image, GPU, ...)?
