Fix Cached_op with static_shape=true #15298
LGTM. Thanks for the fix :-)
Force-pushed from ecf77b3 to ac55c06.
#15297 should be fixed by this PR.
@anirudh2290 @roywei please review.
The segfault and core dump are fixed.
All remaining failures seem to occur on the sockeye side.
}
CHECK_EQ(outputs.size(), in_grad_.size());
for (size_t i = 0; i < outputs.size(); ++i) in_grad_[i] = outputs[i];
bwd_init_ = true;
This caching was first removed in #14738. I think removing it has performance implications, since we are no longer caching the TBlobs. Is the use case similar here? Is this caused by the split operator?
When legacy ops are used in Cached_op, this caching is not correct: even with static_alloc=true and static_shape=true, the input or output TBlobs may change if they are also the inputs or outputs of the Cached_op itself.
Consider a small case where the end user hybridizes only one legacy op. Its input is then the Cached_op's input, and likewise for its output. The end user may pass different NDArrays to this Cached_op on each call, so the cached TBlobs are no longer correct.
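To make the failure mode concrete, here is a minimal standalone sketch. It is not MXNet code: Blob is a toy stand-in for TBlob, and CachingOp, RefreshingOp, Fill and SecondCallReachesNewBuffer are hypothetical names invented for illustration. It only assumes that a TBlob behaves like a non-owning view of a caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a TBlob: a non-owning view of a caller-provided buffer.
struct Blob {
  float* data;
  std::size_t size;
};

// Writes `value` into every element that the given views point at.
inline void Fill(const std::vector<Blob>& blobs, float value) {
  for (const Blob& b : blobs)
    for (std::size_t i = 0; i < b.size; ++i) b.data[i] = value;
}

// Hypothetical op state that binds its output views once and reuses them.
class CachingOp {
 public:
  void Forward(const std::vector<Blob>& outputs) {
    if (!init_) {
      cached_ = outputs;  // views captured on the first call only
      init_ = true;
    }
    assert(outputs.size() == cached_.size());
    Fill(cached_, 1.0f);  // later calls still write to the first buffers
  }

 private:
  bool init_ = false;
  std::vector<Blob> cached_;
};

// Hypothetical op state that rebinds its output views on every call.
class RefreshingOp {
 public:
  void Forward(const std::vector<Blob>& outputs) {
    views_ = outputs;  // always follow the buffers passed by the caller
    Fill(views_, 1.0f);
  }

 private:
  std::vector<Blob> views_;
};

// Calls the op twice with two different buffers and reports whether the
// second call actually wrote into the second buffer.
template <typename Op>
bool SecondCallReachesNewBuffer() {
  std::vector<float> first(4, 0.0f), second(4, 0.0f);
  Op op;
  op.Forward({Blob{first.data(), first.size()}});
  op.Forward({Blob{second.data(), second.size()}});
  return second[0] == 1.0f;
}
```

In this sketch, SecondCallReachesNewBuffer<CachingOp>() returns false because the second call still writes through the views captured on the first call, while SecondCallReachesNewBuffer<RefreshingOp>() returns true. Rebinding on every call costs a small copy of the view list per call, which is the performance trade-off raised above.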
okay, thanks for clarifying!
@roywei test/unit/test_data_io.py::test_parallel_sample_iter FAILED. Those failures are also reproducible before the problematic PR (09202f7) was merged, so I think the sockeye failures are not related to that PR.
I have verified that this resolves the sockeye failure. The remaining test failures should be fixed on the sockeye side; they are not related to cached op. Thanks for the fix!
LGTM
@ZhennanQin Please verify the performance of this PR with our internal tests and NLP tests.
Thanks for the quick fix
@pengzhao-intel I tested symbolic and gluon inference speed, as well as BERT, and everything seems to work fine.
Thanks, merging now.
Please pick up this fix on the r1.5 branch.
* Fix
* run ci
@pengzhao-intel - I would be very interested to learn more about the internal tests and benchmark setup you have. The main motivation is to see whether some of these tests should be brought into the Nightly CI.
@sandeep-krishnamurthy It's a good idea :) We have a bunch of models and test the latency and throughput in each CI run, so we can guarantee the performance of FP32 and INT8. Currently, the 2nd generation scalable processor is available on EC2 as C5.12xlarge and C5.24xlarge.
Thanks @pengzhao-intel - I will create a GitHub issue to discuss this with the community members who help with CI and other benchmark and performance-test activities.
Description
Should address #15281
@pengzhao-intel @TaoLv @junrushao1994 @zheng-da