This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
[MXNET-1417][Performance] Caching Dynamic Shape Checking Result #15262
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
(Please see appendix for experiment details)
PR #14192 that enables dynamic shapes slows down a model that originally runs in 235.65 ms by 7.26 ms (to 242.91 ms).
Also noted that a seemingly relevant PR #14665 suggesting itself to be improving "[performance]", does not change performance number in any means - It still runs in 242.35 ms.
This PR fixes this by caching the checking result of whether dynamic shape exists. The mechanism itself is quick simple: if the dynamic shape existence has been checked, let's simply don't do it again, because the graph does not change.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
dynamic_shape_checked
in CachedOp, which indicates whether dynamic shape existence has been checked.Comments
Experiment environment: EC2 p2.8xlarge, CUDA 10 and cuDNN 7.5. The model itself is confidential.
The detailed benchmark is as below (mean ± stdev). The experiment is conducted in 20 runs, warmup run is excluded.
On commit 39412b3 (right before PR [MXNET-1324] Add NaiveRunGraph to imperative utils #14192 is merge):
Hybridize w/ static_alloc: 235.65 ± 0.22246 ms
On commit 83d2c2d (where PR [MXNET-1324] Add NaiveRunGraph to imperative utils #14192 is merged):
Hybridize w/ static_alloc: 242.91 ms ± 0.71125 ms
PR [performance] Avoid uneccesary vector copies in imperative_utils.cc #14665 patched to commit 83d2c2d
Hybridize w/ static_alloc: 242.35 ± 0.25124 ms
After this patch applied to commit 83d2c2d
Hybridize w/ static_alloc: 234.95 ± 0.39334 ms
CC: @szha @zheng-da please review :-)