
🐛 [Bug] Torch Tensor RT crash when trying to compile a script module on Windows (C++) #1144

Closed
andreabonvini opened this issue Jun 23, 2022 · 31 comments
Labels: bug (Something isn't working), channel: windows (bugs, questions, & RFEs around Windows), No Activity

@andreabonvini

andreabonvini commented Jun 23, 2022

Bug Description

I can't compile a script module with TorchTensorRT.
This is my code:

#include <iostream>
#include <vector>
#include <ATen/Context.h>
#include <torch/torch.h>
#include <torch/script.h>
#include "torch_tensorrt/torch_tensorrt.h"

void compile(std::string model_path) {

    const torch::Device device = torch::Device(torch::kCUDA, 0);
    torch::jit::script::Module model;

    std::cout << "Trying to load the model" << std::endl;
    try {
        model = torch::jit::load(model_path, device);
        model.to(device);
        model.eval();
        std::cout << "AI model loaded successfully." << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << e.what() << std::endl;
    }

    auto input = torch_tensorrt::Input(std::vector<int64_t>{ 1, 3, 512, 512 });
    std::cout << "Creating compile settings" << std::endl;
    auto compile_settings = torch_tensorrt::ts::CompileSpec({ input });
    // Compile module
    std::cout << "Compiling..." << std::endl;
    auto trt_mod = torch_tensorrt::ts::compile(model, compile_settings);  // <-- CRASHES HERE
    // Run like normal
    std::cout << "Create tensor" << std::endl;
    auto in = torch::randn({ 1, 3, 512, 512 }, device);
    std::cout << "Forward pass..." << std::endl;
    auto results = trt_mod.forward({ in });
    // Save module for later
    trt_mod.save("output/model/path.ts");

}

int main() {

    compile("path/to/traced_script_module.pt");

    return 0;
}

This is the error I get:

[screenshot of the exception]

First a warning gets printed: "WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected",
and then, as you can see from the screenshot, I get a "read access violation. creator was nullptr." exception when running the following lines:

auto creator = getPluginRegistry()->getPluginCreator("Interpolate", "1", "torch_tensorrt");
auto interpolate_plugin = creator->createPlugin(name, &fc);

The file interpolate.cpp is located at path/to/Torch-TensorRT/core/conversion/converters/impl.
What am I doing wrong?
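As a sanity check (just a sketch, assuming the TensorRT headers are on the include path and torchtrt_plugins is linked), the same lookup can be reproduced on the application side before calling compile:

#include <NvInferRuntime.h>  // declares getPluginRegistry()
#include <iostream>

// Returns true if the torch_tensorrt "Interpolate" plugin creator is registered.
bool interpolate_plugin_registered() {
    auto* creator = getPluginRegistry()->getPluginCreator("Interpolate", "1", "torch_tensorrt");
    if (creator == nullptr) {
        std::cerr << "Interpolate plugin creator not found; torchtrt_plugins was probably never loaded." << std::endl;
        return false;
    }
    return true;
}

If this prints the error, the plugin library was never loaded into the process.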

This is my CMakeLists.txt:

cmake_minimum_required (VERSION 3.8)

project(example-app)

find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(example-app create_trt_module.cpp)

target_include_directories(example-app PRIVATE "path/to/Torch-TensorRT/cpp/include")

target_link_libraries(example-app "${TORCH_LIBRARIES}")
target_link_libraries(example-app  path/to/Torch-TensorRT/out/build/x64-Release/lib/torchtrt.lib) 
target_link_libraries(example-app  path/to/Torch-TensorRT/out/build/x64-Release/lib/torchtrt_plugins.lib)

I exported the traced script module with the following code:

# Import model
# ...
model.to("cuda")
model.eval()

# Create dummy data for tracing and benchmarking purposes.
shape = (1, 3, 512, 512)
input_data = torch.randn(shape).to("cuda")

# Convert model to script module
print("Tracing PyTorch model...")
traced_script_module = torch.jit.trace(model, input_data)
torch.jit.save(traced_script_module, "traced_script_module.pt")

Environment

  • Torch-TensorRT Version: built from source from this branch (that is currently being merged): Add CMake support to build the libraries #1058
  • TensorRT Version: 8.4.1.5
  • CUDNN: 8.3.1
  • CPU Architecture: x86-64
  • OS : Windows 11
  • Libtorch: 1.11.0
  • CUDA version: 11.5.2
  • GPU model: NVIDIA RTX 3080 Mobile
@andreabonvini andreabonvini added the bug Something isn't working label Jun 23, 2022
@gcuendet
Contributor

The release notes for version 1.1.0 indicate:

Torch-TensorRT 1.1.0 targets PyTorch 1.11, CUDA 11.3, cuDNN 8.2 and TensorRT 8.2. Due to recent JetPack upgrades, this release does not support Jetson (Jetpack 5.0DP or otherwise). Jetpack 5.0DP support will arrive in a mid-cycle release (Torch-TensorRT 1.1.x) along with support for TensorRT 8.4.

I cannot guarantee it is related, but I would try with TensorRT 8.2.

@andreabonvini
Author

Thanks @gcuendet! I will try with TensorRT 8.2

@andreabonvini
Author

Unfortunately, I'm facing the same problem even using TensorRT 8.2.5.1.

@gcuendet
Contributor

gcuendet commented Jun 24, 2022

Now that I have a second look at it, I think it's just your CMake. There are a few problems, but typically, when you do

target_link_libraries(example-app  path/to/Torch-TensorRT/out/build/x64-Release/lib/torchtrt.lib)

you explicitly link a library (torchtrt.lib / dll) into your executable. That's a pretty old-fashioned way of using CMake.

The main idea of what is sometimes referred to as "modern" CMake is to use targets instead. A CMake target can encapsulate more information about what to link and how (the list of libraries, obviously, but also the headers that are exposed, dependencies to link against, specific flags to use, etc.).
I strongly suspect that in your case you are missing a whole bunch of dependencies (at least TensorRT, to start with), since you are not using the torch-tensorrt target and you are also not explicitly linking to the TensorRT (and possibly other) libraries.

The branch that you are using generates a CMake finder for torch-tensorRT that allows you to do

find_package(torchtrt REQUIRED)

in the exact same way you are doing for libtorch.
With that, you'll get torch-tensorRT cmake targets usable in your own CMakeLists.txt and you should link your executable in the following way:

target_link_libraries(example-app  PRIVATE torchtrt)

Note that now torchtrt is not the name of a library anymore, but really a CMake target. Linking against that target will also link against torch-tensorRT dependencies (typically TensorRT). As a side note, you can actually do the same for libtorch and replace:

-target_link_libraries(example-app "${TORCH_LIBRARIES}")
+target_link_libraries(example-app torch)

If you want to see an example of that, check in the example directory on that "CMake" branch, that's exactly how these are linked to torch-tensorRT, using CMake.

The one small problem that you might encounter when doing find_package(torchtrt REQUIRED) is that CMake doesn't find that CMake finder I am mentioning above, depending on where you installed torch-tensorRT when you compiled it from sources. In that case, the error message is pretty explicit and should tell you what to do, but basically you can tell CMake where to find that finder by setting the CMAKE_MODULE_PATH cmake variable to the path of your torch-tensorRT install folder.
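To make that concrete, here is a minimal sketch of such a CMakeLists.txt (the install path is illustrative and depends on where you installed torch-tensorRT):

cmake_minimum_required (VERSION 3.8)
project(example-app)

# Illustrative install location of torch-tensorRT; adjust to your setup.
list(APPEND CMAKE_PREFIX_PATH "path/to/torch-tensorrt/install")
list(APPEND CMAKE_MODULE_PATH "path/to/torch-tensorrt/install/cmake/Modules")

find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)
# Linking against the targets also pulls in headers and dependencies such as TensorRT.
target_link_libraries(example-app PRIVATE torch torchtrt)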

Again, as a conclusion, please have a look at how the torch-tensorRT examples are compiled and linked to torch-tensorRT.
Let me know if that helps.

@andreabonvini
Author

andreabonvini commented Jun 24, 2022

Ok, I tried to change my CMakeLists.txt file in:

cmake_minimum_required (VERSION 3.8)

project(example-app)

find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)
target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed")

And it fails with:

1> [CMake] CMake Error at C:/src/vcpkg/scripts/buildsystems/vcpkg.cmake:335 (_find_package):
1> [CMake]   By not providing "Findtorchtrt.cmake" in CMAKE_MODULE_PATH this project has
1> [CMake]   asked CMake to find a package configuration file provided by "torchtrt",
1> [CMake]   but CMake did not find one.
1> [CMake] 
1> [CMake]   Could not find a package configuration file provided by "torchtrt" with any
1> [CMake]   of the following names:
1> [CMake] 
1> [CMake]     torchtrtConfig.cmake
1> [CMake]     torchtrt-config.cmake
1> [CMake] 
1> [CMake]   Add the installation prefix of "torchtrt" to CMAKE_PREFIX_PATH or set
1> [CMake]   "torchtrt_DIR" to a directory containing one of the above files.  If
1> [CMake]   "torchtrt" provides a separate development package or SDK, be sure it has
1> [CMake]   been installed.

So I tried with

cmake_minimum_required (VERSION 3.8)

project(example-app)

set(torchtrt_DIR C:/src/Torch-TensorRT/out/build/x64-Release)  # That contains torchtrtConfig.cmake
find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)
target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed")

And now it fails with

1> [CMake] CMake Error at C:/src/vcpkg/scripts/buildsystems/vcpkg.cmake:335 (_find_package):
1> [CMake]   By not providing "FindTensorRT.cmake" in CMAKE_MODULE_PATH this project has
1> [CMake]   asked CMake to find a package configuration file provided by "TensorRT",
1> [CMake]   but CMake did not find one.
1> [CMake] 
1> [CMake]   Could not find a package configuration file provided by "TensorRT" with any
1> [CMake]   of the following names:
1> [CMake] 
1> [CMake]     TensorRTConfig.cmake
1> [CMake]     tensorrt-config.cmake
1> [CMake] 
1> [CMake]   Add the installation prefix of "TensorRT" to CMAKE_PREFIX_PATH or set
1> [CMake]   "TensorRT_DIR" to a directory containing one of the above files.  If
1> [CMake]   "TensorRT" provides a separate development package or SDK, be sure it has

Since there's no TensorRTConfig.cmake file in the TensorRT installation directory, I added your (@gcuendet) Modules directory to CMAKE_MODULE_PATH:

cmake_minimum_required (VERSION 3.8)

project(example-app)

set(CMAKE_MODULE_PATH C:/src/Torch-TensorRT/cmake/Modules) # here
set(torchtrt_DIR C:/src/Torch-TensorRT/out/build/x64-Release)  
find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)
target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed")

And I get

1> [CMake] CMake Error at C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
1> [CMake]   Could NOT find TensorRT (missing: TensorRT_LIBRARY TensorRT_INCLUDE_DIR)

So I checked FindTensorRT.cmake and at the beginning it said:

# Hints
# ^^^^^
# A user may set ``TensorRT_ROOT`` to an installation root to tell this module where to look.

So I added the following:

cmake_minimum_required (VERSION 3.8)

project(example-app)

set(TensorRT_ROOT C:/src/TensorRT-8.2.5.1/) # here
set(CMAKE_MODULE_PATH C:/src/Torch-TensorRT/cmake/Modules) 
set(torchtrt_DIR C:/src/Torch-TensorRT/out/build/x64-Release)  
find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)
target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed")

And I got another error:

1> [CMake] CMake Error at C:/src/Torch-TensorRT/out/build/x64-Release/torchtrtConfig.cmake:35 (include):
1> [CMake]   include could not find requested file:
1> [CMake] 
1> [CMake]     C:/src/Torch-TensorRT/out/build/x64-Release/torchtrtTargets.cmake

But at this point I guess I'm just doing something wrong.

@gcuendet
Contributor

I think the main problem here is that you didn't install torchtrt. I am assuming that based on:

1> [CMake] CMake Error at C:/src/Torch-TensorRT/out/build/x64-Release/torchtrtConfig.cmake:35 (include):

which looks like your build folder.

My assumption is that you did something like:

cmake -S. -Bbuild [and possibly set other options here]
cmake --build build

but not the install step:

cmake --build build --target install

Is that correct?

You should install it (last command above); when everything gets copied into the install folder, some paths get corrected (typically the finders in cmake/Modules get staged at the right place). Then the path C:/src/Torch-TensorRT/out/build/x64-Release in

set(torchtrt_DIR C:/src/Torch-TensorRT/out/build/x64-Release)  # That contains torchtrtConfig.cmake

becomes the path to the install folder:

list(APPEND CMAKE_PREFIX_PATH <install folder path>)

Actually, writing this answer reminded me of a similar recent issue. Maybe you could have a look at the answer there; it might be helpful.

@andreabonvini
Author

andreabonvini commented Jun 25, 2022

Oh right, in VS I added the install option as a build command argument in the CMakeSettings for x64-Release.

[screenshot of the CMakeSettings build command arguments]

And it built an install directory with almost everything needed by CMake to work properly (it still wanted me to specify TensorRT_ROOT).

This is my new CMakeLists.txt

cmake_minimum_required (VERSION 3.8)

project(example-app)

set(TensorRT_ROOT "C:/src/TensorRT-8.2.5.1/")
set(torchtrt_DIR "C:/src/Torch-TensorRT/out/install/x64-Release/lib/cmake/torchtrt/")

find_package(Torch REQUIRED)
find_package(torchtrt REQUIRED)

add_executable(example-app create_trt_module.cpp)

target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed" torchtrt_plugins "-Wl,--no-as-needed" torchtrt "-Wl,--no-as-needed" )
# I linked against all targets even though I don't think it's necessary

Unfortunately it still crashes at the same point; creator is still nullptr.

auto creator = getPluginRegistry()->getPluginCreator("Interpolate", "1", "torch_tensorrt");

@narendasan Is this something that has ever happened before, or am I the first to experience this kind of behaviour? At this point I don't understand what I could possibly be doing wrong.

@andreabonvini
Author

andreabonvini commented Jun 25, 2022

As an additional note, I was able to optimize the model on the same PC by using WSL. Then I tried to create a super trivial executable that just loads the model:

test.cpp

#include <ATen/Context.h>
#include <torch/torch.h>
#include <torch/script.h>


int main(){
    std::string model_path = "path/to/trt_script_module.pt";
    const torch::Device device = torch::Device(torch::kCUDA, 0);
    torch::jit::script::Module model;
    try {
        model = torch::jit::load(model_path, device);
        model.eval();
        std::cout << "AI model loaded successfully." << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << e.what() << std::endl;
    }
  return 0;
}

And this is the corresponding CMakeLists.txt:

cmake_minimum_required (VERSION 3.8)

project(example-app)

set(TensorRT_ROOT "C:/src/TensorRT-8.2.5.1/")
set(torchtrt_DIR "C:/src/Torch-TensorRT/out/install/x64-Release/lib/cmake/torchtrt/")

find_package(torchtrt REQUIRED)
find_package(Torch REQUIRED)

add_executable(example-app test.cpp)
target_link_libraries(example-app PRIVATE torch "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed") 

It compiles, and when I try to run the executable it asks me to copy the torch .dll files into the build directory (a known problem when using torch on Windows), but it doesn't ask me to copy torchtrt_runtime.dll, so I'm not even sure it's properly linking the torchtrt_runtime target!

It crashes when it tries to load the model.

@andreabonvini
Author

andreabonvini commented Jun 27, 2022

Update: by catching the exception as a torch::jit::ErrorReport

try {
    model = torch::jit::load(model_path, device);
    model.eval();
    std::cout << "AI model loaded successfully." << std::endl;
}
catch (const torch::jit::ErrorReport& e) {
    std::cerr << e.what() << std::endl;
}

I was able to get the following message:

Unknown type name '__torch__.torch.classes.tensorrt.Engine':
Serialized   File "code/__torch__/segmentation_models_pytorch/decoders/deeplabv3/model.py", line 4
  __parameters__ = []
  __buffers__ = []
  __torch___segmentation_models_pytorch_decoders_deeplabv3_model_DeepLabV3Plus_trt_engine_ : __torch__.torch.classes.tensorrt.Engine
                                                                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  def forward(self_1: __torch__.segmentation_models_pytorch.decoders.deeplabv3.model.DeepLabV3Plus_trt,
    input_0: Tensor) -> Tensor:

So I guess, based on #642, that maybe VS isn't linking torchtrt_runtime at all; that's probably because the "-Wl,--no-as-needed" flag isn't working as expected (it is a GNU ld flag, which the MSVC linker does not understand).

This is the full output from VS (compilation + linking)

[1/2] C:\PROGRA~2\MICROS~2\2019\COMMUN~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe   /TP -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -IC:\src\libtorch\include -IC:\src\libtorch\include\torch\csrc\api\include -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\include" -I"C:\Program Files\NVIDIA Corporation\NvToolsExt\include" -I"C:\Program Files\NVIDIA\CUDNN\v8.4\include" -IC:\src\TensorRT-8.2.5.1\include /DWIN32 /D_WINDOWS /W3 /GR /EHsc  /MD /Zi /O2 /Ob1 /DNDEBUG /Z7 /EHsc /DNOMINMAX /wd4267 /wd4251 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 /bigobj /showIncludes /FoTorchTensorRT-Optimization-DEMO\CMakeFiles\example-app.dir\test.cpp.obj /FdTorchTensorRT-Optimization-DEMO\CMakeFiles\example-app.dir\ /FS -c ..\..\..\TorchTensorRT-Optimization-DEMO\test.cpp
  Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30145 for x64
  Copyright (C) Microsoft Corporation.  All rights reserved.
  
...
...
  [2/2] cmd.exe /C "cd . && "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin\cmake.exe" -E vs_link_exe --intdir=TorchTensorRT-Optimization-DEMO\CMakeFiles\example-app.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100190~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2019\COMMUN~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\link.exe  TorchTensorRT-Optimization-DEMO\CMakeFiles\example-app.dir\test.cpp.obj  /out:TorchTensorRT-Optimization-DEMO\example-app.exe /implib:TorchTensorRT-Optimization-DEMO\example-app.lib /pdb:TorchTensorRT-Optimization-DEMO\example-app.pdb /version:0.0 /machine:x64 /debug /INCREMENTAL /subsystem:console  C:\src\Torch-TensorRT\out\install\x64-Release\lib\torchtrt_runtime.lib  C:\src\libtorch\lib\torch.lib  C:\src\libtorch\lib\torch_cuda.lib  C:\src\libtorch\lib\torch_cuda_cu.lib  C:\src\libtorch\lib\torch_cuda_cpp.lib  C:\src\libtorch\lib\torch_cpu.lib  -INCLUDE:?warp_size@cuda@at@@YAHXZ  C:\src\libtorch\lib\c10_cuda.lib  C:\src\libtorch\lib\c10.lib  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\lib\x64\cudart_static.lib"  "C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\lib\x64\cufft.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\lib\x64\curand.lib"  "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.5\lib\x64\cublas.lib"  "C:\Program Files\NVIDIA\CUDNN\v8.4\lib\cudnn.lib"  -INCLUDE:?_torch_cuda_cu_linker_symbol_op_cuda@native@at@@YA?AVTensor@2@AEBV32@@Z  C:\src\TensorRT-8.2.5.1\lib\nvinfer.lib  C:\src\TensorRT-8.2.5.1\lib\nvinfer_plugin.lib  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cmd.exe /C "cd /D C:\Users\a.bonvini\source\repos\TorchTensorRT-Optimization-DEMO\out\build\x64-Release\TorchTensorRT-Optimization-DEMO && powershell -noprofile -executionpolicy Bypass -file C:/src/vcpkg/scripts/buildsystems/msbuild/applocal.ps1 -targetBinary C:/Users/a.bonvini/source/repos/TorchTensorRT-Optimization-DEMO/out/build/x64-Release/TorchTensorRT-Optimization-DEMO/example-app.exe -installedDir C:/src/vcpkg/installed/x64-windows/bin -OutVariable out""

Rebuild All succeeded.

And if you check the linking step ([2/2]), it seems that everything's fine, since "C:\src\Torch-TensorRT\out\install\x64-Release\lib\torchtrt_runtime.lib" appears. But still, it doesn't ask me to copy the .dll when I run the program.

@gcuendet
Contributor

That's a very good point! If my understanding is correct, on Linux, torch-tensorRT relies on static initialisation to register the Engine class with torch (see here). That works if you force linking to that library, even when no symbols are used (that's always the case for torchtrt_runtime), so that it's automatically loaded by the consumer executable/library.
Not sure what the equivalent is on Windows (or even if there is an equivalent, or if you have to "manually" dlopen the library).

Anyway, a possible (ugly) workaround is to link to torchtrt (and not torchtrt_runtime) and use at least one symbol from that lib (just instantiate a CompileSpec or an Input).

Not sure what the proper design will be, but I guess this will need to be addressed at some point if Windows support becomes a bit more official, right @narendasan?

@narendasan narendasan added the channel: windows bugs, questions, & RFEs around Windows label Jun 27, 2022
@andreabonvini
Author

andreabonvini commented Jun 27, 2022

Update:

By manually loading the .dlls the script does work! This of course isn't a proper solution and should be properly addressed in #1058 (maybe it would be beneficial to change the CMake example too, @gcuendet).

If you want to optimize a model through Torch-TensorRT on Windows, make sure to manually load torchtrt_plugins.dll (you should probably manually copy torchtrt_plugins.dll to your executable's folder first):

#include <windows.h>
// ...
int main(){
    HMODULE hLib = LoadLibrary(TEXT("torchtrt_plugins"));
    if (hLib == NULL) {
        std::cerr << "Library torchtrt_plugins.dll not found" << std::endl;
        exit(1);
    }
    // ...
    return 0;
}

If you want to run an already optimized model on Windows, make sure to manually load torchtrt_runtime.dll (you should probably manually copy torchtrt_runtime.dll to your executable's folder first):

#include <windows.h>
// ...
int main(){
    HMODULE hLib = LoadLibrary(TEXT("torchtrt_runtime"));
    if (hLib == NULL) {
        std::cerr << "Library torchtrt_runtime.dll not found" << std::endl;
        exit(1);
    }
    // ...
    return 0;
}

@noman-anjum-retro

Hello @andreabonvini, I set up the code and fixed all compile-time errors, but now I'm getting weird linking errors for each use of torch_tensorrt:: in my code. Can you please explain the steps that made torch_tensorrt work on Windows? I'll be extremely grateful to you.

@andreabonvini
Author

Hi @noman-anjum-retro, can you share your CMakeLists.txt, your code, and the linking errors you're receiving?

@gcuendet
Contributor

Great that you had it working @andreabonvini! 😄

In my opinion, that issue is independent of the CMake support itself. What I mean by that is:

  1. The solution is probably not going to be at the CMake level
  2. Something also needs to be done when compiling with Bazel (given that compiling with Bazel works on Windows)

Still, that problem puzzled me and I might have a proposal.
I was wondering how that mechanism works in pytorch/vision (aka TorchVision), since there is at least one situation that works similarly to torchtrt_plugins: the library provides additional operators to PyTorch that need to be registered and potentially nothing else (no symbols to be used by the consumer of the library).
From my understanding they rely on "linker" pragmas:

VISION_API int64_t cuda_version();

 namespace detail {
 extern "C" VISION_INLINE_VARIABLE auto _register_ops = &cuda_version;
 #ifdef HINT_MSVC_LINKER_INCLUDE_SYMBOL
 #pragma comment(linker, "/include:_register_ops")
 #endif

 } // namespace detail

So basically, by just including that header (as documented in the README), the library is linked since one symbol is required.

So that could easily be done in torch-tensorRT as well. Moreover that would be consistent with torchvision. What do you think @narendasan ?

@noman-anjum-retro

noman-anjum-retro commented Jun 28, 2022

Hello @andreabonvini. I started with this basic CMakeLists.txt:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 14)

I ran CMake with the command cmake -G "Visual Studio 16 2019" -DCMAKE_PREFIX_PATH=D:\Codes\libtorch .
Then I tried to load the script module via the following code:

torch::jit::Module module;
torch::Device device(torch::cuda::is_available() ? torch::kCUDA : torch::kCPU);
std::cout << torch::cuda::is_available() << std::endl;
// Deserialize the ScriptModule from a file using torch::jit::load().
module = torch::jit::load("\\scriptmodule.pt", torch::kCUDA);
std::cout << "Loading Complete";
std::vector<int64_t> input_shape = { 1,3,8,290,290 };
torch_tensorrt::core::ir::Input inputs = { { 1,3,8,290,290 } };
std::vector<torch_tensorrt::core::ir::Input> innn = { inputs };
auto compile_spec = torch_tensorrt::core::CompileSpec(innn);
torch_tensorrt::core::CompileGraph(module, compile_spec);

Initially the code showed compilation errors, since I copied it from the documentation, but those are fixed now. When I build the code, each reference to torch_tensorrt produces a linking error; I'm sharing one below:

Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol "public: __cdecl torch_tensorrt::core::ir::Input::Input(class std::vector<__int64,class std::allocator<__int64> >,enum nvinfer1::DataType,enum nvinfer1::TensorFormat,bool)" (??0Input@ir@core@torch_tensorrt@@qeaa@V?$vector@_JV?$allocator@_J@std@@@std@@W4DataType@nvinfer1@@W4TensorFormat@7@_N@Z) referenced in function "void __cdecl compile(class std::basic_string<char,struct std::char_traits,class std::allocator >)" (?compile@@yaxv?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@std@@@z) example-app D:\Codes\C++\TorchCmake\build\example-app.obj 1

@andreabonvini
Author

You aren't linking to torch_tensorrt. Supposing that you correctly compiled and installed torch_tensorrt (as explained above) you should have something like this:

target_link_libraries(example-app PRIVATE torch torchtrt) 

@noman-anjum-retro

Thanks, I'll take a look at it.

@noman-anjum-retro

How did you build torch_tensorrt on Windows? Can you please explain it? When I run bazel build with the default workspace file it throws an error:

bazel build //:libtorchtrt --compilation_mode opt --distdir D:\BazelDep
The BazelDep folder contains zip files of cuDNN 8.2 and TensorRT 8.4.
Error in download_and_extract: java.io.IOException: Error downloading [https://developer.nvidia.com/compute/machine-learning/tensorrt/secure/8.2.4/tars/tensorrt-8.2.4.2.linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz] to C:/users/nomananjum/_bazel_nomananjum/ebmkavax/external/tensorrt/temp17912101049819068034/tensorrt-8.2.4.2.linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz: GET returned 403 Forbidden

@andreabonvini
Author

andreabonvini commented Jun 28, 2022

You have to build and install the library through CMake; you can do that only from this branch.

@noman-anjum-retro

Alright, that makes sense now. Thanks a lot.

@noman-anjum-retro

noman-anjum-retro commented Jun 28, 2022

Hey @andreabonvini, I tried compiling with the above-mentioned steps. The code runs well with this script:
int main(){
    std::string model_path = "path/to/trt_script_module.pt";
    const torch::Device device = torch::Device(torch::kCUDA, 0);
    torch::jit::script::Module model;
    try {
        model = torch::jit::load(model_path, device);
        model.eval();
        std::cout << "AI model loaded successfully." << std::endl;
    }
    catch (const c10::Error& e) {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}

However, when I try to compile the script module it throws a linking error.
Code:

model = torch::jit::load(model_path, device);

std::vector<int64_t> input_shape = { 1,3,8,290,290 };
std::vector<torch_tensorrt::Input> innn = { input_shape };
auto compile_spec = torch_tensorrt::torchscript::CompileSpec(innn);
torch_tensorrt::torchscript::compile(model, compile_spec);

Link error:
[screenshot of the link error]

Any idea what's wrong?

@andreabonvini
Author

It still seems that you are not linking against torch_tensorrt. You should provide more information about your environment and, above all, your CMakeLists.txt file.

@noman-anjum-retro

noman-anjum-retro commented Jun 29, 2022

Environment
Windows 11
GPU 3080 mobile
Cuda 11.4
Cudnn 8.2

So the steps are as follows: I copied this PR onto my system and compiled it with the command
cmake --build . --target install
This created a Torch-TensorRT folder in Program Files (x86). Then I created a torch project using the following CMakeLists.txt:
cmake_minimum_required (VERSION 3.8)

project(example-app)

set(TensorRT_ROOT "C:\Program Files\TensorRT\TensorRT-8.4.1.5\")

set(torchtrt_DIR "C:\Program Files (x86)\Torch-TensorRT\lib\cmake\torchtrt")

find_package(torchtrt REQUIRED)

find_package(Torch REQUIRED)

add_executable(example-app create_trt_module.cpp)

target_link_libraries(example-app PRIVATE torch "${TORCH_LIBRARIES}" "-Wl,--no-as-needed" torchtrt_runtime "-Wl,--no-as-needed")

I then used the command cmake -G "Visual Studio 16 2019" -DCMAKE_PREFIX_PATH=D:\Codes\libtorch .
which created a Visual Studio project in the folder.

Additional step:

When it didn't work, I explicitly added the include and lib paths of libtorch to the project using this
@andreabonvini
Author

andreabonvini commented Jun 29, 2022

@noman-anjum-retro, as already said, you are not linking against torch_tensorrt. You are linking against torchtrt_runtime instead, which is a lightweight library that is only needed if you just want to load an already optimized model through torch::jit::load.

Moreover, using TensorRT 8.4.1.5 is maybe not the best idea because of this.

@noman-anjum-retro

Thank you so much @andreabonvini. It is working now.

@andreabonvini
Author

andreabonvini commented Jul 11, 2022

Update: to correctly load torchtrt_runtime.dll on Windows in a VS project, it is necessary to copy the following DLLs to your executable's directory (in addition to torchtrt_runtime.dll, of course):

nvinfer_plugin.dll  // From TensorRT
torch_cuda_cu.dll
torch_cuda_cpp.dll
torch_cpu.dll
c10_cuda.dll
c10.dll
nvinfer.dll   // From TensorRT

(This is the output from running dumpbin /dependents torchtrt_runtime.dll)
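If you don't want to copy them by hand, a post-build step in CMake can do it. This is just a sketch; the DLL paths are illustrative and depend on where TensorRT, libtorch, and Torch-TensorRT are installed on your machine:

# Copy the runtime DLLs next to the executable after every build (illustrative paths).
add_custom_command(TARGET example-app POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
        "C:/src/Torch-TensorRT/out/install/x64-Release/lib/torchtrt_runtime.dll"
        "C:/src/TensorRT-8.2.5.1/lib/nvinfer.dll"
        "C:/src/TensorRT-8.2.5.1/lib/nvinfer_plugin.dll"
        "C:/src/libtorch/lib/torch_cpu.dll"
        "C:/src/libtorch/lib/torch_cuda_cu.dll"
        "C:/src/libtorch/lib/torch_cuda_cpp.dll"
        "C:/src/libtorch/lib/c10.dll"
        "C:/src/libtorch/lib/c10_cuda.dll"
        $<TARGET_FILE_DIR:example-app>)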

@narendasan
Collaborator

narendasan commented Aug 11, 2022

Great that you had it working @andreabonvini! 😄

In my opinion, that issue is independent of the CMake support itself. What I mean by that is:

1. The solution is probably not going to be at the CMake level

2. Something also needs to be done when compiling with Bazel (given that compiling with Bazel works on Windows)

Still, that problem puzzled me and I might have a proposal. I was wondering how that mechanism works in pytorch/vision (aka TorchVision), since there is at least one situation that works similarly to torchtrt_plugins: the library provides additional operators to PyTorch that need to be registered and potentially nothing else (no symbols to be used by the consumer of the library). From my understanding they rely on "linker" pragmas:

VISION_API int64_t cuda_version();

 namespace detail {
 extern "C" VISION_INLINE_VARIABLE auto _register_ops = &cuda_version;
 #ifdef HINT_MSVC_LINKER_INCLUDE_SYMBOL
 #pragma comment(linker, "/include:_register_ops")
 #endif

 } // namespace detail

So basically, by just including that header (as documented in the README), the library is linked since one symbol is required.

So that could easily be done in torch-tensorRT as well. Moreover that would be consistent with torchvision. What do you think @narendasan ?

@gcuendet At first glance that seems reasonable. We could make a torch_tensorrt/runtime.h or something with this in it.
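Something along these lines, purely as a sketch (TORCHTRT_API and the symbol name are illustrative, not an existing API, and the inline variable assumes C++17):

// torch_tensorrt/runtime.h (hypothetical sketch, mirroring the torchvision trick)
#pragma once

namespace torch_tensorrt {

// Any exported symbol from torchtrt_runtime would do; it only has to exist.
TORCHTRT_API int runtime_version();

namespace detail {
// Taking the address of an exported symbol forces the linker to keep the import,
// so the DLL gets loaded and its static registration code runs.
extern "C" inline auto _register_trt_runtime = &runtime_version;
#ifdef _MSC_VER
#pragma comment(linker, "/include:_register_trt_runtime")
#endif
} // namespace detail

} // namespace torch_tensorrt

Consumers would then just include torch_tensorrt/runtime.h and the runtime DLL would be pulled in even if no other symbol is referenced.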

@noman-anjum-retro

Hello, it is working fine in Debug mode in Visual Studio; however, when I switched to Release mode (I also changed libtorch to the release build), it is unable to load the TRT-compiled engine, and the following line throws this error:
model = torch::jit::load(model_path, device);
Unknown type name '__torch__.torch.classes.tensorrt.Engine':
  File "code/__torch__/movinets/models.py", line 4
  __parameters__ = []
  __buffers__ = []
  __torch___movinets_models_MoViNet_trt_engine_ : __torch__.torch.classes.tensorrt.Engine
                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  def forward(self_1: __torch__.movinets.models.MoViNet_trt,
    input_0: Tensor) -> Tensor:

Any idea about it? I need to switch to Release mode because OpenCV's debug mode is very slow, and it's ruining the gains obtained via TRT.

@narendasan
Collaborator

@noman-anjum-retro Seems like the runtime is not properly registering. Are you running this in C++? Can you try the latest master?

@noman-anjum-retro

Yeah, it's working now with the latest master. Thanks!

@github-actions

github-actions bot commented Dec 7, 2022

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days
