
[TensorRT EP] Load precompiled TRT engine file directly #18217

Merged: 53 commits merged into main from chi/trt_engine_wrapper on Jan 12, 2024

Conversation

@chilo-ms (Contributor) commented on Nov 1, 2023

When the TRT engine cache (precompiled engine) is present, there is no need to go through model verification, model optimization, TRT EP's GetCapability(), TRT EP's model proto reconstruction, the TRT parser, and engine compilation.
This PR makes TRT EP skip those steps and load the engine directly to perform inference.

The feature request: #18072

Features:

  • Replace the original model with a TRT-engine-wrapped ONNX model. This saves a lot of time, as mentioned above.

  • How to get a TRT-engine-wrapped ONNX model? (See the sketch after this list.)

    1. Set the trt_dump_ep_context_model provider option to "true" and run inference. You will find "xxx_wrapper.onnx" at the engine cache path (the same logic as generating the engine cache).
    2. Use gen_trt_engine_wrapper_onnx_model.py.
  • Four provider options are added:
    trt_dump_ep_context_model: Enables dumping the wrapped ONNX model by TRT EP.
    trt_ep_context_embed_mode: Adds embed_mode as an attribute. 0 means the engine cache path is stored; 1 means the engine binary data is embedded.
    trt_ep_context_compute_capability_enable: Adds hardware_arch as an attribute. When running the model, TRT EP checks consistency between the model's hardware_arch and the GPU's compute capability.
    trt_ep_context_file_path: Please see this PR.

  • When an engine cache path is given in the wrapped model, TRT EP first searches for the engine file using that path relative to the model path; if it can't find it, it falls back to using the path as-is (which, depending on the user, could be relative to the working directory or an absolute path).
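Below is a minimal sketch (not taken from this PR's tests) of how these provider options might be used from the Python API to dump the engine-wrapped model once and then load it directly afterwards. The file and directory names ("model.onnx", "./trt_cache", the "_wrapper.onnx" name) are placeholders.

```python
import onnxruntime as ort

# Step 1: regular run with trt_dump_ep_context_model enabled.
# TRT EP compiles the engine, caches it, and dumps "xxx_wrapper.onnx"
# next to the engine cache.
dump_options = {
    "trt_engine_cache_enable": "true",
    "trt_engine_cache_path": "./trt_cache",   # placeholder cache dir
    "trt_dump_ep_context_model": "true",
    "trt_ep_context_embed_mode": "0",         # 0: store engine cache path, 1: embed engine binary
}
sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    providers=[("TensorrtExecutionProvider", dump_options)],
)
# ... run inference once with representative inputs, e.g.
# sess.run(None, {"input": input_array}), so the engine and wrapped model are produced.

# Step 2: later sessions load the wrapped model; TRT EP skips parsing and
# compilation and deserializes the precompiled engine directly.
sess2 = ort.InferenceSession(
    "./trt_cache/model_wrapper.onnx",         # placeholder name for the dumped "xxx_wrapper.onnx"
    providers=["TensorrtExecutionProvider"],
)
```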

Note:

  1. This PR includes the changes from [TensorRT EP] Switch to enqueueV3 with support DDS output (#17751).

Constraints:

  1. The whole model should be fully supported by TRT.
  2. Users need to make sure the engine is built with min/max/opt optimization profiles that are large enough to cover the range of all inputs. TRT EP will simply fail and won't rebuild the engine if an input shape is out of range at runtime (see the sketch below).
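As a companion to constraint 2, the sketch below shows one way to make sure the cached engine covers the full input range when it is first built. The trt_profile_min_shapes / trt_profile_opt_shapes / trt_profile_max_shapes provider options are an assumption here (they exist in recent ORT releases but are not part of this PR), and the input name and shapes are placeholders.

```python
import onnxruntime as ort

build_options = {
    "trt_engine_cache_enable": "true",
    "trt_engine_cache_path": "./trt_cache",
    "trt_dump_ep_context_model": "true",
    # min/opt/max must bracket every shape the wrapped model will ever see,
    # since TRT EP won't rebuild a precompiled engine at runtime.
    "trt_profile_min_shapes": "input:1x3x224x224",
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
}
sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    providers=[("TensorrtExecutionProvider", build_options)],
)
```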

@github-advanced-security (bot) left a comment:

lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@gedoensmax (Contributor) commented:

@jywu-msft Since enqueueV3 is now merged, what are the plans for this PR? Should I work on a rebase, or do we let it rest until Chi returns?

@jywu-msft (Member) replied:

> @jywu-msft Since enqueueV3 is now merged, what are the plans for this PR? Should I work on a rebase, or do we let it rest until Chi returns?

The plan is to merge this by the end of the week. @chilo-ms will work on it.

@chilo-ms (Contributor, Author) commented on Jan 9, 2024

@jywu-msft @gedoensmax

GitHub isn't smart enough to merge main into this PR smoothly, so I had to resolve a lot of conflicts...

The key change to the code path inside TensorrtExecutionProvider::Compile() is that I added two functions: CreateNodeComputeInfoFromPrecompiledEngine() and CreateNodeComputeInfoFromGraph().
I moved almost all the code that was originally in Compile() into CreateNodeComputeInfoFromGraph().
CreateNodeComputeInfoFromPrecompiledEngine() is the streamlined function that keeps only the essential setup for directly loading the engine.
Hope this explanation helps with the review.

@gedoensmax (Contributor) left a comment:

The PR looks good to me; the few comments I left could be considered nitpicks, to be honest.

@jywu-msft (Member) replied:

> The PR looks good to me; the few comments I left could be considered nitpicks, to be honest.

Thanks. These suggestions will be added in a follow-up PR. We will merge this one as soon as the CIs pass.

@jywu-msft merged commit 46dd0d3 into main on Jan 12, 2024
92 checks passed
@jywu-msft deleted the chi/trt_engine_wrapper branch on January 12, 2024 at 06:20
mszhanyi pushed a commit that referenced this pull request Jan 15, 2024
@chilo-ms (Contributor, Author) commented:

> @jywu-msft As discussed in our last sync, the idea was to quickly load an engine if the engine path is not a folder but a file. Do we plan to include this here, or should this be another PR?

Please see the PR here for naming the "EP context" model. Feel free to provide feedback.

jywu-msft pushed a commit that referenced this pull request Jan 21, 2024
…der options (#19154)

Several changes:

1. To align with other EPs' setting of EP context configs in session options, for example [QNN EP](#18877), EP context configs for TRT EP can be configured through:
   1. Session options: `ep.context_enable`, `ep.context_file_path` and `ep.context_embed_mode`
   2. Provider options: `trt_dump_ep_context_model`, `trt_ep_context_file_path` and `trt_dump_ep_context_embed_mode`
   3. The above settings have a 1:1 mapping, and provider options have higher priority than session options.

```
Please note that there are rules for using the following context-model-related provider options:

1. In the case of dumping the context model and loading the context model,
   for security reasons, TRT EP doesn't allow the "ep_cache_context" node attribute of the EP context node to be
   an absolute path or a relative path that is outside of the context model directory.
   This means the engine cache needs to be in the same directory as, or a sub-directory of, the context model.

2. In the case of dumping the context model, the engine cache path will be changed to be relative to the context model directory.
   For example:
   If "trt_dump_ep_context_model" is enabled and "trt_engine_cache_enable" is enabled, and
   if "trt_ep_context_file_path" is "./context_model_dir":
   - if "trt_engine_cache_path" is "" -> the engine cache will be saved to "./context_model_dir"
   - if "trt_engine_cache_path" is "engine_dir" -> the engine cache will be saved to "./context_model_dir/engine_dir"
```

2. The user can decide the naming of the dumped "EP context" model by using `trt_ep_context_file_path`; please see GetCtxModelPath() for more details.

3. Added suggested comments from #18217
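For reference, here is a minimal sketch of the session-option route described in the commit message above, using the ep.context_* keys it lists; the values and paths are illustrative, and per the commit message the corresponding provider options take precedence if both are set.

```python
import onnxruntime as ort

so = ort.SessionOptions()
# EP context configs via session options (keys taken from the commit message above).
so.add_session_config_entry("ep.context_enable", "1")
so.add_session_config_entry("ep.context_file_path", "./context_model_dir")
so.add_session_config_entry("ep.context_embed_mode", "0")

sess = ort.InferenceSession(
    "model.onnx",                             # placeholder model path
    sess_options=so,
    providers=["TensorrtExecutionProvider"],
)
```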
YUNQIUGUO pushed a commit that referenced this pull request Jan 23, 2024
…der options (#19154)

Several changes:

1. To align with other EPs' setting of EP context configs in session
options, for example [QNN
EP](#18877), EP context
configs for TRT EP can be configured through:
1. Session Options: `ep.context_enable`, `ep.context_file_path` and
`ep.context_embed_mode`
2. Provider Options: `trt_dump_ep_context_model`,
`trt_ep_context_file_path` and `trt_dump_ep_context_embed_mode`
3. Above setting has 1:1 mapping and provider options has higher
priority over session options.
    
```
    Please note that there are rules for using following context model related provider options:

     1. In the case of dumping the context model and loading the context model,
        for security reason, TRT EP doesn't allow the "ep_cache_context" node attribute of EP context node to be
        the absolute path or relative path that is outside of context model directory.
        It means engine cache needs to be in the same directory or sub-directory of context model.

     2. In the case of dumping the context model, the engine cache path will be changed to the relative path of context model directory.
        For example:
        If "trt_dump_ep_context_model" is enabled and "trt_engine_cache_enable" is enabled,
           if "trt_ep_context_file_path" is "./context_model_dir",
           - if "trt_engine_cache_path" is "" -> the engine cache will be saved to "./context_model_dir"
           - if "trt_engine_cache_path" is "engine_dir" -> the engine cache will be saved to "./context_model_dir/engine_dir"
```    

2. User can decide the naming of the dumped "EP context" model by using
`trt_ep_context_file_path`, please see GetCtxModelPath() for more
details.

3. Added suggested comments from
#18217