
[TensorRT EP] Load precompiled TRT engine file directly #18217

Merged: 53 commits merged into main from chi/trt_engine_wrapper on Jan 12, 2024

Conversation

@chilo-ms (Contributor) commented on Nov 1, 2023

When the TRT engine cache (precompiled engine) is present, there is no need to go through model verification, model optimization, TRT EP's GetCapability(), TRT EP's model proto reconstruction, the TRT parser, and engine compilation.
This PR makes TRT EP skip those steps and load the engine directly to perform inference.

The feature request: #18072

Features:

  • Replace the original model with a TRT-engine-wrapped ONNX model. This saves a lot of time, as mentioned above.

  • How to get a TRT-engine-wrapped ONNX model? (See the sketch after this list.)

    1. Set the trt_dump_ep_context_model provider option to "true" and run inference. You will find "xxx_wrapper.onnx" at the engine cache path (the same logic as generating the engine cache).
    2. Use gen_trt_engine_wrapper_onnx_model.py.
  • Four provider options are added:
    trt_dump_ep_context_model: Enables dumping the wrapped ONNX model by TRT EP.
    trt_ep_context_embed_mode: Adds embed_mode as an attribute. 0 means the engine cache path is stored; 1 means the engine binary data is embedded.
    trt_ep_context_compute_capability_enable: Adds hardware_arch as an attribute. When running the model, TRT EP checks consistency between the model's hardware_arch and the GPU's compute capability.
    trt_ep_context_file_path: Please see this PR.

  • When an engine cache path is given in the wrapped model, TRT EP first searches for the engine file using that path relative to the model path; if it can't find it, it falls back to using the path as-is (which, depending on the user, could be relative to the working directory or an absolute path).
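Below is a minimal sketch (not taken from this PR's tests) of how these provider options might be used from the Python API to dump the engine-wrapped model once and then load it directly afterwards. The file and directory names ("model.onnx", "./trt_cache", the "_wrapper.onnx" name) are placeholders.

```python
import onnxruntime as ort

# Step 1: regular run with trt_dump_ep_context_model enabled.
# TRT EP compiles the engine, caches it, and dumps "xxx_wrapper.onnx"
# next to the engine cache.
dump_options = {
    "trt_engine_cache_enable": "true",
    "trt_engine_cache_path": "./trt_cache",   # placeholder cache dir
    "trt_dump_ep_context_model": "true",
    "trt_ep_context_embed_mode": "0",         # 0: store engine cache path, 1: embed engine binary
}
sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    providers=[("TensorrtExecutionProvider", dump_options)],
)
# ... run inference once with representative inputs, e.g.
# sess.run(None, {"input": input_array}), so the engine and wrapped model are produced.

# Step 2: later sessions load the wrapped model; TRT EP skips parsing and
# compilation and deserializes the precompiled engine directly.
sess2 = ort.InferenceSession(
    "./trt_cache/model_wrapper.onnx",         # placeholder name for the dumped "xxx_wrapper.onnx"
    providers=["TensorrtExecutionProvider"],
)
```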

Note:

  1. This PR includes the changes from [TensorRT EP] Switch to enqueueV3 with support DDS output (#17751).

Constraints:

  1. The whole model should be fully supported by TRT.
  2. Users need to make sure the engine is built with min/max/opt optimization profiles that are large enough to cover the range of all inputs. TRT EP will simply fail and won't rebuild the engine if an input shape is out of range at runtime (see the sketch below).
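As a companion to constraint 2, the sketch below shows one way to make sure the cached engine covers the full input range when it is first built. The trt_profile_min_shapes / trt_profile_opt_shapes / trt_profile_max_shapes provider options are an assumption here (they exist in recent ORT releases but are not part of this PR), and the input name and shapes are placeholders.

```python
import onnxruntime as ort

build_options = {
    "trt_engine_cache_enable": "true",
    "trt_engine_cache_path": "./trt_cache",
    "trt_dump_ep_context_model": "true",
    # min/opt/max must bracket every shape the wrapped model will ever see,
    # since TRT EP won't rebuild a precompiled engine at runtime.
    "trt_profile_min_shapes": "input:1x3x224x224",
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
}
sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    providers=[("TensorrtExecutionProvider", build_options)],
)
```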

@github-advanced-security (bot) left a comment:

lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@gedoensmax (Contributor) commented:

@jywu-msft Since enqueueV3 is now merged, what are the plans for this PR? Should I work on a rebase, or do we let it rest until Chi returns?

@jywu-msft (Member) replied:

> @jywu-msft Since enqueueV3 is now merged, what are the plans for this PR? Should I work on a rebase, or do we let it rest until Chi returns?

The plan is to merge this by the end of the week. @chilo-ms will work on it.

@chilo-ms (Contributor, Author) commented on Jan 9, 2024

@jywu-msft @gedoensmax

GitHub isn't smart enough to merge main into this PR smoothly, so I had to resolve a lot of conflicts...

The key change to the code path inside TensorrtExecutionProvider::Compile() is that I added two functions: CreateNodeComputeInfoFromPrecompiledEngine() and CreateNodeComputeInfoFromGraph().
I moved almost all the code that was originally in Compile() into CreateNodeComputeInfoFromGraph().
CreateNodeComputeInfoFromPrecompiledEngine() is the streamlined function that keeps only the essential setup for directly loading the engine.
Hope this explanation helps with the review.

@gedoensmax (Contributor) left a comment:

The PR looks good to me; the few comments I left could be considered nitpicks, to be honest.

@jywu-msft (Member) replied:

> The PR looks good to me; the few comments I left could be considered nitpicks, to be honest.

Thanks. These suggestions will be added in a follow-up PR. We will merge this one as soon as the CIs pass.

@jywu-msft merged commit 46dd0d3 into main on Jan 12, 2024
92 checks passed
@jywu-msft deleted the chi/trt_engine_wrapper branch on January 12, 2024 at 06:20
mszhanyi pushed a commit that referenced this pull request Jan 15, 2024
@chilo-ms (Contributor, Author) commented:

> @jywu-msft As discussed in our last sync, the idea was to quickly load an engine if the engine path is not a folder but a file. Do we plan to include this here, or should this be another PR?

Please see the PR here for naming the "EP context" model. Feel free to provide feedback.

jywu-msft pushed a commit that referenced this pull request Jan 21, 2024
…der options (#19154)

Several changes:

1. To align with other EPs' setting of EP context configs in session options, for example [QNN EP](#18877), EP context configs for TRT EP can be configured through:
   1. Session options: `ep.context_enable`, `ep.context_file_path` and `ep.context_embed_mode`
   2. Provider options: `trt_dump_ep_context_model`, `trt_ep_context_file_path` and `trt_dump_ep_context_embed_mode`
   3. The above settings have a 1:1 mapping, and provider options have higher priority than session options.

```
Please note that there are rules for using the following context-model-related provider options:

1. In the case of dumping the context model and loading the context model,
   for security reasons, TRT EP doesn't allow the "ep_cache_context" node attribute of the EP context node to be
   an absolute path or a relative path that is outside of the context model directory.
   This means the engine cache needs to be in the same directory as, or a sub-directory of, the context model.

2. In the case of dumping the context model, the engine cache path will be changed to be relative to the context model directory.
   For example:
   If "trt_dump_ep_context_model" is enabled and "trt_engine_cache_enable" is enabled, and
   if "trt_ep_context_file_path" is "./context_model_dir":
   - if "trt_engine_cache_path" is "" -> the engine cache will be saved to "./context_model_dir"
   - if "trt_engine_cache_path" is "engine_dir" -> the engine cache will be saved to "./context_model_dir/engine_dir"
```

2. The user can decide the naming of the dumped "EP context" model by using `trt_ep_context_file_path`; please see GetCtxModelPath() for more details.

3. Added suggested comments from #18217
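For reference, here is a minimal sketch of the session-option route described in the commit message above, using the ep.context_* keys it lists; the values and paths are illustrative, and per the commit message the corresponding provider options take precedence if both are set.

```python
import onnxruntime as ort

so = ort.SessionOptions()
# EP context configs via session options (keys taken from the commit message above).
so.add_session_config_entry("ep.context_enable", "1")
so.add_session_config_entry("ep.context_file_path", "./context_model_dir")
so.add_session_config_entry("ep.context_embed_mode", "0")

sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    sess_options=so,
    providers=["TensorrtExecutionProvider"],
)
```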
YUNQIUGUO pushed a commit that referenced this pull request Jan 23, 2024
…der options (#19154)

Several changes:

1. To align with other EPs' setting of EP context configs in session
options, for example [QNN
EP](#18877), EP context
configs for TRT EP can be configured through:
1. Session Options: `ep.context_enable`, `ep.context_file_path` and
`ep.context_embed_mode`
2. Provider Options: `trt_dump_ep_context_model`,
`trt_ep_context_file_path` and `trt_dump_ep_context_embed_mode`
3. Above setting has 1:1 mapping and provider options has higher
priority over session options.
    
```
    Please note that there are rules for using following context model related provider options:

     1. In the case of dumping the context model and loading the context model,
        for security reason, TRT EP doesn't allow the "ep_cache_context" node attribute of EP context node to be
        the absolute path or relative path that is outside of context model directory.
        It means engine cache needs to be in the same directory or sub-directory of context model.

     2. In the case of dumping the context model, the engine cache path will be changed to the relative path of context model directory.
        For example:
        If "trt_dump_ep_context_model" is enabled and "trt_engine_cache_enable" is enabled,
           if "trt_ep_context_file_path" is "./context_model_dir",
           - if "trt_engine_cache_path" is "" -> the engine cache will be saved to "./context_model_dir"
           - if "trt_engine_cache_path" is "engine_dir" -> the engine cache will be saved to "./context_model_dir/engine_dir"
```    

2. User can decide the naming of the dumped "EP context" model by using
`trt_ep_context_file_path`, please see GetCtxModelPath() for more
details.

3. Added suggested comments from
#18217