This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Integrating the MKL VML functions into MXNet to speed up (element-wise) mathematical computation #14893
Changes from 36 commits
Will all of these functions be mapped automatically when MKL is enabled?
No. We just put all the VML functions here. We think these functions can be leveraged by MXNet in the future, but currently the registration of each operator would need to change to use them. In this PR we only optimized some operators that are used in BERT. We propose to optimize others when we encounter performance problems with them.
Thanks for the explanation. We can add it back when we use it; otherwise, it is a little confusing for other developers.
Any chance to fuse some of these operations to reduce the memory bandwidth?
How much faster is this version compared to the mshadow one?
After reading the code, I think the current implementation, which relies on vectorized operations, should be fast at scaling and shifting the data (data * gamma and data + beta). One possible improvement is to use Welford's online algorithm (https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance) to calculate the mean/variance in one pass; the code will look like this:
@pengzhao-intel @sxjscience Loops are fused in the latest commit. I also removed the required workspace, but that means we cannot leverage the VML functions and need to rely on the compiler for vectorization.