[multimodal] add tensorrt tutorial #2987
Conversation
trt_predictor = MultiModalPredictor.load(path=model_path)
trt_predictor.optimize_for_inference()

# Agagin, use first prediction for initialization (e.g., allocating memory)
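The warm-up pattern discussed in this thread can be sketched with a minimal, self-contained example. `DummyPredictor` below is a stand-in for `MultiModalPredictor`, not the real API: its first call simulates the one-time setup cost (memory allocation for PyTorch, engine compilation for ONNX Runtime/TensorRT), which is why the tutorial excludes the first prediction from timing.

```python
import time


class DummyPredictor:
    """Stand-in for MultiModalPredictor: the first predict() call
    simulates one-time setup (memory allocation / engine compilation)."""

    def __init__(self):
        self._warmed_up = False

    def predict(self, batch):
        if not self._warmed_up:
            time.sleep(0.05)  # simulate the one-time setup cost
            self._warmed_up = True
        return [0] * len(batch)


predictor = DummyPredictor()

# First prediction triggers initialization, so run it once before timing.
predictor.predict([1, 2, 3])

start = time.perf_counter()
predictor.predict([1, 2, 3])
elapsed = time.perf_counter() - start
print(f"steady-state latency: {elapsed:.4f}s")
```

Without the warm-up call, the first timed prediction would include setup cost and overstate the steady-state latency.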
Typo agagin?
Maybe give more explanation of why the first prediction is used for initialization. Otherwise, users may wonder why this is necessary.
Typo agagin?
Nice catch. Fixed typo.
Maybe give more explanation of why the first prediction is used for initialization. Otherwise, users may wonder why this is necessary.
This is indicated with an example: allocating memory.
The first call to the model's forward takes more time than the following calls. I'm not sure we can explain this observation as model initialization, since PyTorch uses a dynamic graph and eager execution.
What if the batch size changes between predictions?
In my observation, it automatically re-compiles when the batch_size dimension is larger than the one used at initialization.
Is there any PyTorch documentation regarding this phenomenon?
The re-compile only happens with the TensorRT backend in ONNX Runtime; it is not directly related to PyTorch.
This behavior is not well documented; it may be related to Shape Inference for TensorRT Subgraphs.
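The re-compile behavior described above can be illustrated with a toy model. `FakeTrtEngine` is a hypothetical class that only mimics the observed pattern (it is not the ONNX Runtime or TensorRT API): the engine covers the largest batch size seen so far, and a larger batch forces a rebuild, while smaller batches reuse the existing engine.

```python
class FakeTrtEngine:
    """Toy model of the observed behavior: the engine is built for the
    largest batch size seen so far; a larger batch forces a rebuild."""

    def __init__(self):
        self.max_batch = 0
        self.rebuilds = 0

    def run(self, batch_size):
        if batch_size > self.max_batch:
            self.rebuilds += 1          # re-compile for the new shape
            self.max_batch = batch_size
        return batch_size


engine = FakeTrtEngine()
engine.run(4)   # initial build
engine.run(2)   # smaller batch: reuses the engine
engine.run(8)   # larger batch: triggers a rebuild
print(engine.rebuilds)  # → 2
```

This is why initializing with the largest expected batch size avoids re-compilation during later predictions.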
The above PyTorch module also uses the first prediction as initialization. That's why we use "Again" here.
So, this behavior exists for both PyTorch and ONNX Runtime/TensorRT?
For PyTorch, the first prediction is used for memory allocation; for ONNX Runtime/TensorRT, it is used for 1) a fair comparison and 2) model compilation. (Model compilation actually happens when calling optimize_for_inference().)
LGTM!
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.