
[multimodal] add tensorrt tutorial #2987

Merged
merged 10 commits into from
Mar 16, 2023
Conversation

@liangfu (Collaborator) commented Mar 1, 2023

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@review-notebook-app commented:

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

@github-actions

Job PR-2987-3abbc5a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/3abbc5a/index.html

@liangfu liangfu added the model list checked You have updated the model list after modifying multimodal unit tests/docs label Mar 13, 2023
@liangfu liangfu marked this pull request as ready for review March 13, 2023 21:26
@github-actions

Job PR-2987-92bba50 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/92bba50/index.html

@github-actions

Job PR-2987-5cba8f8 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/5cba8f8/index.html

trt_predictor = MultiModalPredictor.load(path=model_path)
trt_predictor.optimize_for_inference()

# Agagin, use first prediction for initialization (e.g., allocating memory)
Contributor

Typo agagin?

Contributor

Maybe give more explanations of using the first prediction for initialization. Otherwise, users may wonder why this is necessary.

Collaborator Author

> Typo agagin?

Nice catch. Fixed typo.

> Maybe give more explanations of using the first prediction for initialization. Otherwise, users may wonder why this is necessary.

This is indicated with an example in the comment, i.e., allocating memory.

Contributor

The first call to the model's forward takes more time than the following calls. I'm not sure whether we can explain this observation as model initialization, since PyTorch uses dynamic graphs and eager execution.

Contributor

What if the batch size is changeable during predictions?

Collaborator Author

In my observation, it automatically re-compiles when the batch_size dimension is larger than it was at initialization.
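The rebuild-on-larger-batch behavior described above can be illustrated with a toy sketch. This is a hypothetical model of how a backend keyed on a maximum supported shape might behave; it is not the actual onnxruntime/TensorRT implementation, and `EngineCache` is an invented name.

```python
class EngineCache:
    """Toy model of a backend that compiles an engine for a maximum
    batch size and rebuilds it when a larger batch arrives.
    (Hypothetical sketch, not the real onnxruntime/TensorRT code.)"""

    def __init__(self):
        self.max_batch = 0  # largest batch the current engine supports
        self.builds = 0     # how many times we (re-)compiled

    def run(self, batch):
        if len(batch) > self.max_batch:
            # Batch exceeds the compiled shape: re-compile the engine.
            self.max_batch = len(batch)
            self.builds += 1
        return [x * 2 for x in batch]

engine = EngineCache()
engine.run([1, 2])        # first call compiles (builds == 1)
engine.run([1])           # smaller batch: reuses the existing engine
engine.run([1, 2, 3, 4])  # larger batch: triggers a re-compile (builds == 2)
```

Smaller batches reuse the engine; only a batch larger than the compiled shape pays the rebuild cost, which matches the observation that re-compilation happens when batch_size exceeds the initialization batch.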

Contributor

Any PyTorch documentation regarding this phenomenon?

Collaborator Author

The re-compile only happens with the TensorRT backend in onnxruntime; it is not directly related to PyTorch.

This behavior is not well documented; it may be related to Shape Inference for TensorRT Subgraphs.

Contributor

The above PyTorch module also uses the first prediction as initialization; that's why we use "Again" here. So, does this behavior exist for both PyTorch and onnxruntime/tensorrt?

Collaborator Author (@liangfu, Mar 16, 2023)

For PyTorch, it is used for memory allocation; for onnxruntime/tensorrt, it is used for (1) fair comparison and (2) model compilation. (Model compilation actually happens when calling optimize_for_inference().)
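The "fair comparison" point above can be sketched as a small benchmark pattern: run one warm-up prediction first so that one-time setup cost (memory allocation, compilation) is excluded from the timed region. This is a toy stand-in using a dummy predictor with an artificial setup delay, not the actual MultiModalPredictor; `make_predictor` and `timed` are invented names for illustration.

```python
import time

def make_predictor(setup_cost=0.05):
    """Return a dummy predict() whose first call pays a one-time
    setup cost, mimicking memory allocation / engine compilation."""
    state = {"ready": False}
    def predict(batch):
        if not state["ready"]:
            time.sleep(setup_cost)  # one-time initialization cost
            state["ready"] = True
        return [x * 2 for x in batch]
    return predict

def timed(predict, batches, warmup):
    """Time the prediction loop, optionally excluding a warm-up call."""
    if warmup:
        predict(batches[0])  # absorb setup cost outside the timer
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    return time.perf_counter() - start

batches = [[1, 2, 3]] * 10
cold = timed(make_predictor(), batches, warmup=False)  # includes setup
warm = timed(make_predictor(), batches, warmup=True)   # steady-state only
```

With warm-up, both backends are measured in their steady state, so the comparison is not skewed by whichever backend has the more expensive first call.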

@github-actions

Job PR-2987-4a30526 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/4a30526/index.html

@github-actions

Job PR-2987-133854a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/133854a/index.html

@github-actions

Job PR-2987-75ae324 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2987/75ae324/index.html

Contributor
@zhiqiangdon zhiqiangdon left a comment

LGTM!

@liangfu liangfu merged commit 19f4db7 into autogluon:master Mar 16, 2023
@liangfu liangfu deleted the trt-2 branch March 16, 2023 18:28
Labels
model list checked You have updated the model list after modifying multimodal unit tests/docs