Engine building failure of TensorRT 10.2.0 (pip install) when building a custom diffusion model on RTX 4090 #3983

Open
ifeherva opened this issue Jul 4, 2024 · 13 comments

ifeherva commented Jul 4, 2024

Description

Fresh install via pip install tensorrt==10.2.0.

The following engine build crashes on Ubuntu 22.04.4 LTS:

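from polygraphy.backend.trt import CreateConfig, EngineFromNetwork

# network, p (an optimization profile), timing_cache, extra_build_args and the
# precision flags are defined elsewhere in the calling code.
EngineFromNetwork(
    network,
    config=CreateConfig(
        fp16=fp16,
        tf32=tf32,
        int8=int8,
        refittable=enable_refit,
        profiles=[p],
        load_timing_cache=timing_cache,
        builder_optimization_level=3,
        **extra_build_args,
    ),
    save_timing_cache=timing_cache,
)()  # the trailing call triggers the actual engine build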

Error message:

IBuilder::buildSerializedNetwork: Error Code 6: API Usage Error (Unable to load library: libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

The same build works fine on 10.1.0 and 10.0.0.

Environment

TensorRT Version: 10.2.0

NVIDIA GPU: RTX 4090

NVIDIA Driver Version: 550

CUDA Version: 12.1.r12.1

CUDNN Version: 8.9.7

Operating System: Ubuntu 22.04.4 LTS

Python Version (if applicable):

PyTorch Version (if applicable): 2.3.1

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

This is the latest release.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Yes, the above command completes successfully, the ONNX file is correct.

@lautaropaske

+1. Same issue: TensorRT fails on a Linux distro because it tries to load a Windows library that does not exist there (libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory).

lanyuer commented Jul 5, 2024

How can this be fixed?

thefoxfarmer commented Jul 5, 2024

Almost exactly the same setup here, same problem. Using it via ComfyUI.

TensorRT Version: 10.2.0
NVIDIA GPU: RTX 4090
CUDA Version: 12.1.105
CUDNN Version: 8.9.2.26
Operating System: Ubuntu 22.04.3
Python Version (if applicable): 3.10
PyTorch Version (if applicable): 2.3.1+cu121

I did a little research on this and determined that the non-Windows library (libnvinfer_builder_resource.so.10.2.0) was already opened by the process, so it's a real mystery to me why it was trying to open the Windows version. The dlopen (or whatever it uses) happens inside the compiled tensorrt.so code, not in the Python wrapper around it, so it's hard to debug further.

I made a symlink from the proper DSO to the Windows filename, but that fixed nothing: the symbols it then looks for inside the library are also suffixed with _win.

I asked in the discussion forum for the Comfy nodes...
comfyanonymous/ComfyUI_TensorRT#49
But clearly they have nothing to do with it.
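The "already opened by the process" observation above can be reproduced on Linux by reading the process memory map. The following is a minimal sketch, not the method the commenter used: `loaded_libraries` is a hypothetical helper, and it assumes the standard `/proc/<pid>/maps` layout in which the sixth whitespace-separated field is the mapped file path.

```python
from pathlib import Path


def loaded_libraries(maps_text: str, needle: str = "libnvinfer") -> list[str]:
    """Return the unique mapped file paths whose file name contains `needle`."""
    seen = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [path]; path may be absent.
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and needle in Path(parts[5]).name and parts[5] not in seen:
            seen.append(parts[5])
    return seen


if __name__ == "__main__":
    # Linux only. Run this in the same process after `import tensorrt`
    # (and ideally after a build attempt) to see which libraries are mapped.
    maps = Path("/proc/self/maps")
    if maps.exists():
        for lib in loaded_libraries(maps.read_text()):
            print(lib)
```

If only libnvinfer_builder_resource.so.10.2.0 shows up and no _win variant does, that matches the behaviour described in this comment.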

lanyuer commented Jul 5, 2024

I also have this problem on Windows WSL2.

@thefoxfarmer

10.1.0 is also working for me on the setup outlined above where 10.2.0 did not.

@BuffMcBigHuge

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall
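After a forced reinstall like this, it can help to confirm that all four distributions ended up on the same version. A minimal sketch using only the standard library; the distribution names are copied from the command above, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the downgrade command in this comment.
TENSORRT_DISTS = (
    "tensorrt",
    "tensorrt-cu12",
    "tensorrt-cu12-bindings",
    "tensorrt-cu12-libs",
)


def installed_versions(names=TENSORRT_DISTS):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

A mix of versions in the output would suggest that one of the packages was not actually replaced.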

@online2311

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

@glenn-jocher

Resolved in the Ultralytics package by pinning tensorrt<=10.1.0, but unfortunately this does not resolve the underlying issue.
ultralytics/ultralytics#14239
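Where a pin in the requirements is not an option, a project could fail fast at runtime instead. This is only a sketch based on the reports in this thread (10.0.0 and 10.1.0 build fine, 10.2.0 fails on Linux); `KNOWN_BAD`, `is_affected` and `check_installed` are hypothetical names, and the match is deliberately an exact string comparison so that other releases are not flagged without evidence.

```python
# Versions reported as failing in this thread (assumption: exactly "10.2.0").
KNOWN_BAD = frozenset({"10.2.0"})


def is_affected(trt_version: str) -> bool:
    """True if the given TensorRT version string is one reported as failing."""
    return trt_version.strip() in KNOWN_BAD


def check_installed():
    """Return True/False for the installed TensorRT, or None if it is absent."""
    try:
        import tensorrt
    except ImportError:
        return None
    return is_affected(tensorrt.__version__)


if __name__ == "__main__":
    if check_installed():
        raise RuntimeError(
            "This TensorRT version is reported to fail engine builds on Linux; "
            "consider installing a different version."
        )
```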

@RONNYKHALIL

Downgrading by running this command fixed the issue for me.

pip install tensorrt==10.1.0 tensorrt-cu12==10.1.0 tensorrt-cu12-bindings==10.1.0 tensorrt-cu12-libs==10.1.0 --force-reinstall

Solved my problem, thanks.

Same here, thank you!

lix19937 commented Jul 7, 2024

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Can you find libnvinfer_builder_resource_win.so.10.2.0 on your system?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

Author

ifeherva commented Jul 7, 2024

libnvinfer_builder_resource_win.so.10.2.0: libnvinfer_builder_resource_win.so.10.2.0: cannot open shared object file: No such file or directory)

Can you find libnvinfer_builder_resource_win.so.10.2.0 on your system?

The tensorrt Python wheel files only support Python versions 3.8 to 3.12 at this time and will not work with other Python versions. Only the Linux and Windows operating systems and the x86_64 CPU architecture are currently supported. These Python wheel files are expected to work on RHEL 8 or newer, Ubuntu 20.04 or newer, and Windows 10 or newer.

No, those _win files don't exist on Ubuntu.
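To answer the "can you find it" question reproducibly, one can list the builder resource libraries that the wheel actually shipped. A minimal sketch: `builder_resource_files` and `find_tensorrt_libs_dir` are hypothetical helpers, and the assumption that the libraries live in a top-level `tensorrt_libs` package directory may not hold for every install layout.

```python
import importlib.util
from pathlib import Path


def builder_resource_files(lib_dir):
    """Return the sorted names of builder resource libraries found in lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("libnvinfer_builder_resource*"))


def find_tensorrt_libs_dir():
    """Locate the `tensorrt_libs` package directory (assumed layout), or None."""
    spec = importlib.util.find_spec("tensorrt_libs")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


if __name__ == "__main__":
    lib_dir = find_tensorrt_libs_dir()
    if lib_dir is None:
        print("tensorrt_libs package not found")
    else:
        for name in builder_resource_files(lib_dir):
            print(name)
```

On an affected Linux install, a listing without any `_win` entry would be consistent with the "No such file or directory" error.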

zolero commented Jul 9, 2024

For me, installing tensorrt_llm==0.12.0.dev2024070200 works!

@yorickvP

Upgrading to tensorrt==10.2.0.post1 fixes the problem.
