This is example inference code for TensorRT.
I verified it in the following environment.
- reComputer J4012 (Jetson Orin NX 16GB)
- JetPack 5.1.2
- TensorRT 8.5.2
I also used ONNX Runtime to compare its output with the TensorRT result.
- onnxruntime-gpu 1.15.1
I created model/model_bn.onnx. The model was generated by following the steps in:
https://github.com/NVIDIA-AI-IOT/jetson_dla_tutorial
Build the engine with TensorRT's trtexec:
trtexec --verbose --profilingVerbosity=detailed --buildOnly --memPoolSize=workspace:8192MiB --onnx=model/model_bn.onnx --saveEngine=model/model_bn.onnx.engine > model_bn.onnx.engine.build.log
To run on the DLA (Deep Learning Accelerator), add the --useDLACore option; --allowGPUFallback lets layers the DLA cannot handle fall back to the GPU:
trtexec --verbose --profilingVerbosity=detailed --buildOnly --memPoolSize=workspace:8192MiB --onnx=model/model_bn.onnx --saveEngine=model/model_bn.onnx.engine --useDLACore=0 --allowGPUFallback > model_bn.onnx.engine.build.log
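As a quick sanity check (not part of the original steps), a built engine can be loaded back into trtexec to time it; for the DLA build, also pass --useDLACore=0:

trtexec --loadEngine=model/model_bn.onnx.engine

Since the build ran with --verbose, the saved build log also records which layers were placed on the DLA and which fell back to the GPU.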
I created trt_infer.cpp to run inference with the TensorRT engine. The key API calls are:
#include <NvInfer.h>

// Deserialize the engine file built by trtexec
nvinfer1::ICudaEngine* engine = runtime->deserializeCudaEngine((const void*)engine_data.get(), engine_size);
nvinfer1::IExecutionContext* context = engine->createExecutionContext();

// Bind the device buffers to the tensor names defined in the ONNX model
context->setTensorAddress("input", d_input);
context->setTensorAddress("output", d_output);

// Launch inference asynchronously on the CUDA stream (enqueueV3 is the TensorRT 8.5 API)
bool status = context->enqueueV3(stream);
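For context, here is a minimal end-to-end sketch of how these calls fit together. This is an outline under assumptions, not the repository's actual trt_infer.cpp: the Logger class is a stand-in, the tensor names come from the snippet above, the input shape (1x3x32x32) is an assumption while the output size of 10 matches the values printed below, and error handling is omitted.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Minimal logger required by the TensorRT runtime.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kINFO) std::cout << "[TRT] " << msg << std::endl;
    }
};

int main() {
    // Read the serialized engine built by trtexec.
    std::ifstream file("model/model_bn.onnx.engine", std::ios::binary);
    std::vector<char> engine_data((std::istreambuf_iterator<char>(file)),
                                  std::istreambuf_iterator<char>());

    Logger logger;
    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(logger);
    nvinfer1::ICudaEngine* engine =
        runtime->deserializeCudaEngine(engine_data.data(), engine_data.size());
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();

    // Assumed shapes: 1x3x32x32 input, 10 scores out.
    const size_t input_size = 1 * 3 * 32 * 32, output_size = 10;
    std::vector<float> input(input_size, 0.0f), output(output_size);

    void *d_input = nullptr, *d_output = nullptr;
    cudaMalloc(&d_input, input_size * sizeof(float));
    cudaMalloc(&d_output, output_size * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(d_input, input.data(), input_size * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    // Bind device buffers by tensor name and run (TensorRT 8.5 API).
    context->setTensorAddress("input", d_input);
    context->setTensorAddress("output", d_output);
    context->enqueueV3(stream);

    cudaMemcpyAsync(output.data(), d_output, output_size * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    for (float v : output) std::cout << v << " ";
    std::cout << std::endl;

    cudaFree(d_input);
    cudaFree(d_output);
    delete context;
    delete engine;
    delete runtime;
    return 0;
}
```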
Build the program, and copy the model directory into the build directory so the binary can find the engine:
cmake -Bbuild -DCMAKE_BUILD_TYPE=Release
cp -r model build
cd build
make
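For reference, a minimal CMakeLists.txt supporting this flow could look like the sketch below. The target name and the use of find_package(CUDA) are assumptions, not necessarily the repository's actual build file:

```cmake
cmake_minimum_required(VERSION 3.10)
project(trt_infer LANGUAGES CXX)

# CUDA headers and cudart; found at the default paths on JetPack.
find_package(CUDA REQUIRED)

add_executable(trt_infer trt_infer.cpp)
target_include_directories(trt_infer PRIVATE ${CUDA_INCLUDE_DIRS})
# nvinfer is installed with TensorRT as part of JetPack.
target_link_libraries(trt_infer nvinfer ${CUDA_LIBRARIES})
```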
Run the ONNX Runtime script first to get a reference result:
$ python3 ort_infer.py
[[-0.00628578 -0.02112402 -0.00283293 0.01181907 0.02438403 0.00028906
-0.03561208 0.02654092 0.0145703 0.00279154]]
Then run the TensorRT version with the GPU engine:
$ ./trt_infer
[TRT] Loaded engine size: 6 MiB
[TRT] Deserialization required 4967 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +5, now: CPU 0, GPU 5 (MiB)
[TRT] Total per-runner device persistent memory is 0
[TRT] Total per-runner host persistent memory is 19872
[TRT] Allocated activation device memory of size 131584
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 6 (MiB)
-0.00628827 -0.0211218 -0.00284093 0.0118219 0.0243848 0.000285783 -0.0356103 0.0265424 0.014573 0.00278646
And with the engine built for the DLA (--useDLACore=0):
$ ./trt_infer
[TRT] Loaded engine size: 3 MiB
[TRT] Deserialization required 2182 microseconds.
[TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +3, GPU +0, now: CPU 3, GPU 0 (MiB)
[TRT] Total per-runner device persistent memory is 0
[TRT] Total per-runner host persistent memory is 1472
[TRT] Allocated activation device memory of size 12800
[TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 3, GPU 0 (MiB)
-0.00628281 -0.0211182 -0.00282669 0.0118103 0.0243835 0.000305653 -0.0355835 0.0265503 0.0145798 0.00279236

All three runs agree to roughly four decimal places, so both TensorRT engines reproduce the ONNX Runtime result; the residual differences are the kind expected from floating-point execution differences between backends.
References:
- https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html
- https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/index.html
- https://github.com/dusty-nv/jetson-inference/blob/master/c/tensorNet.cpp
- https://github.com/MrLaki5/TensorRT-onnx-dockerized-inference/blob/main/src/trt_engine.cpp
- https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution/blob/main/src/TRTEngine.cpp
- https://github.com/cyrusbehr/tensorrt-cpp-api