TensorRT 8.4.2 #2218

Open
sc199505 opened this issue Aug 8, 2022 · 24 comments
Comments

@sc199505

sc199505 commented Aug 8, 2022

Description

TensorRT 8.4.0.6 does not have this problem, but TensorRT 8.4.2.4 fails with:
6: [libLoader.h::DynamicLibrary::50] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.4.2)

Environment

TensorRT Version: TensorRT-8.4.2.4
NVIDIA GPU : RTX-5000
NVIDIA Driver Version: 460.73.01
CUDA Version: 11.1
CUDNN Version: 8.2.1
Operating System: 18.04.1-Ubuntu

@zerollzeng
Collaborator

Can you check your LD_LIBRARY_PATH? This file should be under /usr/lib/x86_64-linux-gnu/ or a similar library directory.

@zerollzeng zerollzeng self-assigned this Aug 8, 2022
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label Aug 8, 2022
@sc199505
Author

sc199505 commented Aug 8, 2022

  • This file can be found under /usr/local/TensorRT-8.4.2.4/lib

@zerollzeng
Collaborator

export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib:$LD_LIBRARY_PATH and try again?

@sc199505
Author

sc199505 commented Aug 8, 2022

The .so file can be found, but it cannot be loaded.

@zerollzeng
Collaborator

I would guess you have more than 1 tensorrt version installed on your system. Can you try export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib and try again?

@sc199505
Author

sc199505 commented Aug 8, 2022

OK, I'll try it.

@WelY1

WelY1 commented Aug 20, 2022

I would guess you have more than 1 tensorrt version installed on your system. Can you try export LD_LIBRARY_PATH=/usr/local/TensorRT-8.4.2.4/lib and try again?

Hi, I installed TensorRT with python3 -m pip install --upgrade nvidia-tensorrt. Where can I find the directory to use for LD_LIBRARY_PATH? I can't find the library in /usr/lib/x86_64-linux-gnu. Thanks!
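
For a pip install, the shared libraries usually live inside the installed Python package rather than in a system directory. The following is a minimal sketch for locating them. It assumes the wheel provides an importable package called tensorrt that ships its .so files in its own directory; both the package name and the layout are assumptions and may differ between releases.

```python
import importlib.util
import pathlib


def find_shared_libs(package_name):
    """Return sorted paths of .so files found under an installed package's directory."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return []
    if spec is None or not spec.submodule_search_locations:
        return []
    libs = []
    for location in spec.submodule_search_locations:
        # Match both "libfoo.so" and versioned names such as "libfoo.so.8".
        for path in pathlib.Path(location).rglob("*.so*"):
            if path.is_file():
                libs.append(str(path))
    return sorted(libs)


# "tensorrt" is an assumed package name; adjust it to whatever pip installed.
for lib in find_shared_libs("tensorrt"):
    print(lib)
```

The directory that contains the printed files is the one to put in LD_LIBRARY_PATH.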

@jakepoz

jakepoz commented Aug 26, 2022

Hey, I ran into the same error today. Exporting LD_LIBRARY_PATH can work, but on Ubuntu you should really create a file such as /etc/ld.so.conf.d/tensorrt.conf containing the path of your install location. However, due to some bug, that doesn't work. There is likely an error somewhere in how TensorRT searches for its dependent shared objects.

@Anas-liu

Anas-liu commented Aug 27, 2022

Hi, I got the same problem, but simply removing sudo (running ./yolov5 -s yolov5s.wts yolov5s.engine s directly) works! Why?
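
One likely explanation, which is an assumption about this setup and not something confirmed in the thread: sudo normally does not pass LD_LIBRARY_PATH through to the command it runs, so a path exported in the user's shell is invisible to a program launched through sudo. A small sketch to check this is shown below; run it once normally and once under sudo and compare the output.

```python
import os


def library_path_entries(env=None):
    """Return the directories listed in LD_LIBRARY_PATH, or [] if it is unset."""
    env = os.environ if env is None else env
    value = env.get("LD_LIBRARY_PATH", "")
    return [entry for entry in value.split(":") if entry]


print(library_path_entries())
```

If the list is empty under sudo but contains the TensorRT directory otherwise, the difference in behaviour comes from the environment rather than from permissions.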

@dev0x13

dev0x13 commented Sep 26, 2022

For those affected: apparently libnvinfer uses a dlopen call to load the libnvinfer_builder_resource library. However, the libnvinfer library does not have its rpath attribute set, so dlopen only looks for the library in system folders, even though libnvinfer_builder_resource is located next to libnvinfer in the same folder. In order to make things work without setting LD_LIBRARY_PATH, one can properly set libnvinfer's rpath to $ORIGIN.
@zerollzeng Perhaps this should be done by default in TRT distribution?
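
To illustrate what $ORIGIN means in an rpath: the dynamic loader replaces it with the directory that contains the object whose rpath is being consulted. The sketch below only mimics that substitution in Python and does not query the loader. The function name and the install path are hypothetical.

```python
import posixpath


def expand_origin(rpath, library_path):
    """Expand $ORIGIN in a colon-separated rpath, relative to the library's own directory."""
    origin = posixpath.dirname(library_path)
    entries = []
    for entry in rpath.split(":"):
        if entry:
            entries.append(entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin))
    return entries


# Hypothetical install location, for illustration only.
print(expand_origin("$ORIGIN", "/opt/TensorRT/lib/libnvinfer.so.8"))
```

With an rpath of $ORIGIN, a library sitting next to libnvinfer would be searched for in that same directory, wherever the package is unpacked.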

@zerollzeng
Collaborator

@kevinch-nv for visibility

@mkaivs

mkaivs commented Nov 3, 2022

Is there any update on this? I have the same error and was forced to use LD_LIBRARY_PATH instead of using ldconfig. Setting LD_LIBRARY_PATH is fine in development but is considered bad practice in production.

@Bidski

Bidski commented Nov 27, 2022

I am also experiencing this issue with version 8.5.1

[E] [TRT] 6: [libLoader.h::DynamicLibrary::54] Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.5.1)

All of the TensorRT libraries are installed in /usr/local/lib and I have a config file for ldconfig set up to look in /usr/local/lib. However, TensorRT will only work if I set LD_LIBRARY_PATH=/usr/local/lib which, as @mkaivs points out, is really poor practice for a production environment.

@Bidski

Bidski commented Nov 28, 2022

Setting the RPath of libnvinfer.so.8.5.1 to the install location of libnvinfer.so.8.5.1 seems to be a good workaround. For example, if libnvinfer.so.8.5.1 is located in /usr/local/lib then

patchelf --set-rpath "/usr/local/lib" "/usr/local/lib/libnvinfer.so.8.5.1"
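
A relocatable variant of the same workaround, combining it with the $ORIGIN suggestion made earlier in this thread, can be sketched as follows. The helper only builds the command and does not run it. The function name is hypothetical, and the library path is the example location from this comment.

```python
import shlex


def patchelf_set_rpath_command(library_path, rpath="$ORIGIN"):
    """Build the argv list for setting an rpath with patchelf (nothing is executed)."""
    return ["patchelf", "--set-rpath", rpath, library_path]


cmd = patchelf_set_rpath_command("/usr/local/lib/libnvinfer.so.8.5.1")
# shlex.join quotes $ORIGIN so the shell does not expand it as a variable.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once a backup of the library exists, and patchelf --print-rpath on the same file shows the result.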

@ShuaiShao93

ShuaiShao93 commented Dec 13, 2022

For some reason, setting LD_LIBRARY_PATH doesn't work for me.

I also had a minimal repro to confirm that "libnvinfer_builder_resource.so.8.5.1" can't be opened with dlopen.

  // Requires <dlfcn.h>, <cstdlib>, <filesystem>, <string>; CHECK is assumed to come from glog.
  std::string r = PATH_TO_TENSORRT;
  setenv("LD_LIBRARY_PATH", r.c_str(), 1);

  std::string filename = "libnvinfer.so.8";
  CHECK(std::filesystem::exists(r+"/"+filename));
  void * handle = dlopen(filename.c_str(), RTLD_NOW | RTLD_LOCAL);
  // Pass
  CHECK(handle);

  filename = "libnvinfer_builder_resource.so.8.5.1";
  CHECK(std::filesystem::exists(r+"/"+filename));
  handle = dlopen(filename.c_str(), RTLD_NOW | RTLD_LOCAL);
  // Failed
  CHECK(handle) << dlerror();

This failed at the last line

F20221212 16:06:45.507669 2931454 test.cc:24] Check failed: handle libnvinfer_builder_resource.so.8.5.1: cannot open shared object file: No such file or directory

ls -l shows the files are there

lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so.8 -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvcaffe_parser.so.8.5.1 -> libnvparsers.so.8.5.1
-rwxr-xr-x 1 shshao shshao  373747000 Oct 27 15:38 libnvinfer_builder_resource.so.8.5.1
lrwxrwxrwx 1 shshao shshao         26 Dec 12 15:20 libnvinfer_plugin.so -> libnvinfer_plugin.so.8.5.1
lrwxrwxrwx 1 shshao shshao         26 Dec 12 15:20 libnvinfer_plugin.so.8 -> libnvinfer_plugin.so.8.5.1
-rwxr-xr-x 1 shshao shshao   43399840 Oct 27 15:38 libnvinfer_plugin.so.8.5.1
lrwxrwxrwx 1 shshao shshao         19 Dec 12 15:20 libnvinfer.so -> libnvinfer.so.8.5.1
lrwxrwxrwx 1 shshao shshao         19 Dec 12 15:20 libnvinfer.so.8 -> libnvinfer.so.8.5.1
-rwxr-xr-x 1 shshao shshao  487512744 Oct 27 15:37 libnvinfer.so.8.5.1
lrwxrwxrwx 1 shshao shshao         20 Dec 12 15:20 libnvonnxparser.so -> libnvonnxparser.so.8
lrwxrwxrwx 1 shshao shshao         24 Dec 12 15:20 libnvonnxparser.so.8 -> libnvonnxparser.so.8.5.1
-rwxr-xr-x 1 shshao shshao    2838832 Oct 27 15:35 libnvonnxparser.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvparsers.so -> libnvparsers.so.8.5.1
lrwxrwxrwx 1 shshao shshao         21 Dec 12 15:20 libnvparsers.so.8 -> libnvparsers.so.8.5.1
-rwxr-xr-x 1 shshao shshao    3424720 Oct 27 15:38 libnvparsers.so.8.5.1
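
A note on the repro above: with glibc, the loader reads LD_LIBRARY_PATH once at process startup, so calling setenv() inside an already running process does not change where dlopen() searches. That may be why setting the variable appears to have no effect here. The variable has to be in place before the process starts, for example through a launcher that re-executes the program. Below is a minimal sketch of that pattern in Python; the function names are hypothetical.

```python
import os
import sys


def with_library_dir(env, lib_dir):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH (no duplicates)."""
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    if lib_dir in entries:
        return dict(env)
    new_env = dict(env)
    new_env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + entries)
    return new_env


def reexec_with_library_dir(lib_dir):
    """Restart the current Python program once so the loader sees lib_dir at startup."""
    new_env = with_library_dir(os.environ, lib_dir)
    # Only re-execute when the variable actually changed, which avoids an endless loop.
    if new_env.get("LD_LIBRARY_PATH") != os.environ.get("LD_LIBRARY_PATH"):
        os.execve(sys.executable, [sys.executable] + sys.argv, new_env)
```

Calling reexec_with_library_dir() with the TensorRT lib directory at the very top of a script, before anything loads libnvinfer, restarts the interpreter with the search path already set.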

@gcp

gcp commented Mar 11, 2023

In order to make things work without setting LD_LIBRARY_PATH, one can properly set libnvinfer's rpath to $ORIGIN.

@dev0x13 and @Bidski, thanks for this suggestion. I was afraid it would not work because we can't patch in the origin processing flag (-Wl,-z,origin) but in practice this fixes the problem nevertheless.

I agree this is a serious regression for deploying TensorRT stuff in production, and it also affects the TensorRT 8.5 releases.

@hch-baobei

hch-baobei commented Mar 31, 2023

I had the same problem:
Error Code 6: Internal Error (Unable to load library: libnvinfer_builder_resource.so.8.4.3)

But after I copied this file to /usr/lib/x86_64-linux-gnu, the problem was solved. I don't know why; could someone explain? I searched the container for this file and found it only in /usr/local/src/TensorRT-8.4.3.1/targets/x86_64-linux-gnu and here, so there should be no conflicting installations, which makes this even stranger to me.

@gcp

gcp commented Mar 31, 2023

Throwing libraries into system dirs needs root permissions for one, so this doesn't really "solve" anything.

@hzwhl

hzwhl commented Aug 30, 2023

I also encountered this problem. Just close PyCharm and then rerun the program.

@Arunass

Arunass commented Sep 11, 2023

This is still a problem in 8.5.3.

The release notes indicate that the runpath is no longer used as of TensorRT 8.4.1 (https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-8-4-0-EA):

The TensorRT shared library files no longer have RUNPATH set to $ORIGIN. This setting was causing unintended behavior for some users. If you relied on this setting before you may have trouble with missing library dependencies when loading TensorRT. It is preferred that you manage your own library search path using LD_LIBRARY_PATH or a similar method.

So setting the LD_LIBRARY_PATH, adding to /etc/ld.so.conf, or adding a suitable file in /etc/ld.so.conf.d are apparently the intended approaches. We have added a path to ld.so.conf.d since we don't install cuDNN or libnvinfer in the 'default' location.
Our application now runs fine until it tries to dynamically load the builder resource library. Adding a link to the installed library from /usr/lib/x86_64-linux-gnu 'fixes' it.

I continued to troubleshoot even though our application now works, since the library can be found.
Even after running ldconfig to rebuild the cache, the library is not in the cache:

root§HSM1:# ldconfig -p | grep build
	do_not_link_against_nvinfer_builder_resource (libc6,x86-64) => /opt/PrivateLibs/lib/do_not_link_against_nvinfer_builder_resource
	do_not_link_against_nvinfer_builder_resource (libc6,x86-64) => /lib/x86_64-linux-gnu/do_not_link_against_nvinfer_builder_resource

where, funnily enough, do_not_link_against_nvinfer_builder_resource is a link to libnvinfer_builder_resource.so.8.5.3, which is a link to /opt/PrivateLibs/lib/libnvinfer_builder_resource.so.8.5.3
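
The same check can be scripted. The sketch below parses text in the format printed by ldconfig -p and tests whether a given file name appears as a cache entry. The sample line is copied from the output above; the function name is hypothetical.

```python
import re

# Matches lines such as "\tlibfoo.so.1 (libc6,x86-64) => /usr/lib/libfoo.so.1".
_CACHE_LINE = re.compile(r"^\s*(\S+) \(([^)]*)\) => (\S+)$")


def parse_ldconfig_cache(text):
    """Return a list of (name, path) pairs from the output of ldconfig -p."""
    pairs = []
    for line in text.splitlines():
        match = _CACHE_LINE.match(line)
        if match:
            pairs.append((match.group(1), match.group(3)))
    return pairs


sample = (
    "\tdo_not_link_against_nvinfer_builder_resource (libc6,x86-64) => "
    "/opt/PrivateLibs/lib/do_not_link_against_nvinfer_builder_resource\n"
)
names = [name for name, _ in parse_ldconfig_cache(sample)]
print("libnvinfer_builder_resource.so.8.5.3" in names)
```

For real use, the text would come from running ldconfig -p, for instance through subprocess.run with capture_output=True and text=True.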

I now suspect that there's something special about the builder_resource library, especially since there's this weird "do not link against" link floating around.
Indeed, this explains where these strange links come from:

root§HSM1:/usr/lib/x86_64-linux-gnu# readelf -a libnvinfer_builder_resource.so.8.5.3 | grep builder
 0x000000000000000e (SONAME)             Library soname: [do_not_link_against_nvinfer_builder_resource]
  000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: do_not_link_against_nvinfer_builder_resource

I'll go out on a limb and suggest that this is what is messing things up. ldconfig is making these links and adding them to its cache, but doesn't realize that this is not the name of the library:

root§HSM1:/usr/lib/x86_64-linux-gnu# rm do_not_link_against_nvinfer_builder_resource 
root§HSM1:/usr/lib/x86_64-linux-gnu# ldconfig
root§HSM1:/usr/lib/x86_64-linux-gnu# ls *build*
do_not_link_against_nvinfer_builder_resource  libnvinfer_builder_resource.so.8.5.3

So this odd hack breaks the normal search methods of dlopen(), and the library cannot be opened unless it's in one of the default lib paths.
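
The SONAME mismatch described above can be detected automatically. The sketch below extracts the SONAME from readelf output and compares it with the file name. The sample line is copied from the readelf output in this comment; the function name is hypothetical.

```python
import re


def soname_from_readelf(text):
    """Extract the SONAME from readelf -d (or readelf -a) output, or None if absent."""
    match = re.search(r"\(SONAME\)\s+Library soname: \[([^\]]*)\]", text)
    return match.group(1) if match else None


sample = (
    " 0x000000000000000e (SONAME)             "
    "Library soname: [do_not_link_against_nvinfer_builder_resource]"
)
soname = soname_from_readelf(sample)
# ldconfig indexes libraries by SONAME, so a mismatch with the file name matters.
print(soname == "libnvinfer_builder_resource.so.8.5.3")
```

When the SONAME differs from the name passed to dlopen(), the ldconfig cache has no entry under the requested name, which is consistent with the behaviour reported here.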

@gcp

gcp commented Sep 11, 2023

So setting the LD_LIBRARY_PATH, adding to /etc/ld.so.conf, or adding a suitable file in /etc/ld.so.conf.d are apparently the intended approaches.

Thanks for finding the release note. Ugh, so NVIDIA broke this intentionally and you're supposed to use a wrapper or bootstrap executable before launching the real TensorRT application.

Again, adding to /etc is not a reasonable suggestion because it requires root permissions.

I managed to get this working by hacking the libraries to fix back the $ORIGIN/rpath, but seriously the suggested solutions are not reasonable for deployment and this is a major regression.

@chenxinfeng4

Setting the RPath of libnvinfer.so.8.5.1 to the install location of libnvinfer.so.8.5.1 seems to be a good workaround. For example, if libnvinfer.so.8.5.1 is located in /usr/local/lib then

patchelf --set-rpath "/usr/local/lib" "/usr/local/lib/libnvinfer.so.8.5.1"

Really works for me. Thanks.

@sherlockchou86

Hi, I got the same problem, but simply removing sudo (running ./yolov5 -s yolov5s.wts yolov5s.engine s directly) works! Why?

Works for me: just remove sudo from the front of the command.

@cx-333

cx-333 commented Jul 31, 2024

Hi, I got the same problem, but simply removing sudo (running ./yolov5 -s yolov5s.wts yolov5s.engine s directly) works! Why?

It works! Good!
