MKL-DNN integration: request for reviews #7931
Conversation
elemwise sum bug fixes
@piiswrong We have fixed the convergence issue in ResNet. There were some problems in the Conv and Batch Norm layers. We have also added some more optimizations for further speed-ups. The MKL-DNN version is now, on average, 15% faster than MKLML (MKL2017) for both inference and training.
@szha @piiswrong MKL-DNN doesn't support the fp64 (double) data type. Do you think this is an issue? The library team is focusing more on adding lower precisions.
```cpp
 * @param req
 * @param out_data
 */
template<typename xpu, typename DType>
```
Why do you need xpu for any MKLDNN functions? Doesn't the code always run on CPU?
This is a good point. I was following the convention of other MKLDNN operators to be consistent. I can change this to cpu only.
Having also discussed this with Young and Ashok, we would like to keep this template parameter to support future Intel devices; we may support devices other than the traditional CPU in the future.
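For illustration, here is a minimal sketch of the convention under discussion, with simplified stand-in types (the device tag, function name, and signature here are hypothetical, not the actual MXNet ones): the `xpu` template parameter is kept for forward compatibility, but only a `cpu` tag is instantiated today.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

struct cpu {};  // stand-in device tag, playing the role of mshadow::cpu

// Simplified stand-in for an MKL-DNN compute function: the xpu tag is kept
// for forward compatibility, but the kernel itself always runs on the host.
template <typename xpu, typename DType>
void ElemwiseAddSketch(const std::vector<DType>& a,
                       const std::vector<DType>& b,
                       std::vector<DType>* out) {
  out->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) (*out)[i] = a[i] + b[i];
}

int main() {
  std::vector<float> a{1.f, 2.f}, b{3.f, 4.f}, out;
  ElemwiseAddSketch<cpu, float>(a, b, &out);  // cpu is the only tag in use today
  std::cout << out[0] << " " << out[1] << std::endl;  // prints: 4 6
  return 0;
}
```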
```cpp
if (req[0] == kNullOp) return;

Stream<xpu> *s = ctx.get_stream<xpu>();
```
It doesn't seem the stream is used anywhere.
This is true right now. We may need to use the stream when we try to support tensor shapes other than NCHW. I can remove it for now.
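For context on the `kNullOp` early return in the diff above, here is a minimal sketch of MXNet's write-request convention; the enum mirrors `OpReqType` from `mxnet/op_attr_types.h`, while the compute function is a simplified stand-in, not the operator's actual body:

```cpp
// Simplified mirror of MXNet's OpReqType (see mxnet/op_attr_types.h).
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// Sketch of an FCompute-style body: when the caller requests no output,
// the operator can return before doing any work, including stream setup.
template <typename DType>
void ComputeSketch(OpReqType req, const DType* in, DType* out, int n) {
  if (req == kNullOp) return;  // output not needed; skip everything
  for (int i = 0; i < n; ++i) {
    if (req == kAddTo) out[i] += in[i];  // accumulate into existing output
    else               out[i]  = in[i];  // kWriteTo / kWriteInplace
  }
}

int main() {
  float in[3] = {1.f, 2.f, 3.f};
  float out[3] = {10.f, 10.f, 10.f};
  ComputeSketch(kAddTo, in, out, 3);  // out becomes {11, 12, 13}
  return 0;
}
```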
```cpp
.set_attr<FInferStorageType>("FInferStorageType",
                             ElemwiseStorageType<2, 1, true, false, false>) \
.set_attr<FCompute>("FCompute<cpu>", MKLDNNElementWiseAddCompute<cpu>) \
.set_attr<FComputeEx>("FComputeEx<cpu>", ElemwiseBinaryOp::ComputeEx<cpu, mshadow::op::plus>)
```
MKLDNN implementation is only defined for FCompute?
Yes. FComputeEx is for NDArray inputs, and all current MKLDNN operators support only TBlobs; the main benefit of the MKLDNN elemwise sum operator comes from working with other MKLDNN operators. With the upcoming sparse tensor support, we may need to make some adjustments.
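For readers unfamiliar with the two dispatch paths, below is a sketch of the distinction; the stub types are placeholders, and the functor shapes only approximate MXNet's real `FCompute`/`FComputeEx` attributes (declared in `mxnet/op_attr_types.h`):

```cpp
#include <functional>
#include <vector>

// Stub types standing in for the real MXNet classes.
struct NodeAttrs {};
struct OpContext {};
struct TBlob {};    // dense tensor view: the default-storage path
struct NDArray {};  // may carry non-default storage (e.g. sparse)
enum OpReqType { kNullOp, kWriteTo, kWriteInplace, kAddTo };

// FCompute receives plain TBlobs, so it only ever sees dense data.
using FCompute = std::function<void(const NodeAttrs&, const OpContext&,
                                    const std::vector<TBlob>&,
                                    const std::vector<OpReqType>&,
                                    const std::vector<TBlob>&)>;

// FComputeEx receives NDArrays, which is the path sparse (and, later,
// MKL-specific) storage types flow through.
using FComputeEx = std::function<void(const NodeAttrs&, const OpContext&,
                                      const std::vector<NDArray>&,
                                      const std::vector<OpReqType>&,
                                      const std::vector<NDArray>&)>;

int main() {
  FCompute dense_fn;  // would hold e.g. MKLDNNElementWiseAddCompute<cpu>
  FComputeEx ex_fn;   // would hold e.g. ElemwiseBinaryOp::ComputeEx<...>
  return (dense_fn == nullptr && ex_fn == nullptr) ? 0 : 1;
}
```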
@szha Sure, I am looking into it.
@ykim362 BTW, is the fix in mklml_lnx_2018.0.20170908.tgz? Does it make sense to upgrade the library for the MKL2017 use case? Many people are using the MKL version (with MKL2017 and the experimental flag on).
@szha MKL-DNN also utilizes MKL2017, so it would be useful to update MKL2017 (2018) as well. The fix would be in the MXNet code; I am still investigating it.
Is there an update on this issue? We are keen to include MKL in an upcoming project.
@piiswrong @szha @sbodenstein MKL-DNN now officially supports OS X, so we don't need to worry about this issue.
@ykim362: thanks! I saw that. BTW, do you have any estimate for when this PR will be ready?
@sbodenstein From my understanding, this PR is not going to be merged directly. It's going to be merged as part of another revision with sparse tensor (MKL storage) support. @piiswrong Is this correct?
@ykim362: do you know if bugs, like the ResNet convergence bug, are still unsolved with v0.11 MKL-DNN?
@sbodenstein MKL with v0.11 is quite buggy (i.e., even without MKL-DNN, MKL itself is buggy).
Closing, since @zheng-da is making a new PR for this.
This PR is a beta version for code reviews and experiments. There are several known issues which are being debugged.
If this version is built with the 'USE_MKL2017' and 'USE_MKL2017_EXPERIMENTAL' flags, it provides the same functionality and performance as the current MKLML release. If it is built with the 'USE_MKLDNN' flag, it goes through the new code path (the MKL-DNN integration).
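For illustration, build flags like these typically surface in the source as preprocessor guards along the following lines; the macro names below are assumptions for the sketch, not verified against this PR's diff:

```cpp
#include <cstdio>

// Hypothetical macro names; the actual spellings used by the build flags
// in this PR may differ.
const char* BackendName() {
#if defined(MXNET_USE_MKLDNN)
  return "mkldnn";       // new code path: MKL-DNN integration
#elif defined(MXNET_USE_MKL2017)
  return "mklml";        // existing MKLML (MKL2017) code path
#else
  return "default cpu";  // plain CPU implementation
#endif
}

int main() { std::printf("%s\n", BackendName()); }
```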
MKL-DNN
A new open-source deep learning library providing IA-optimized DNN kernels.
https://github.com/01org/mkl-dnn
Advantages
More functionalities
New functionalities will mainly be added to MKL-DNN rather than to the MKLML library.
Below are two examples.
Performance optimization
As of Sep. 18, 2017:
- AlexNet inference (BS: 256): 1474 (MKLML) --> 1568 (MKL-DNN), on a Skylake 20-core machine (6148)
- Inception-BN inference (BS: 32): 454 (MKLML) --> 483 (MKL-DNN), on a Skylake 20-core machine (6148)
- ResNet-50 inference (BS: 32): 99 (MKLML) --> 116 (MKL-DNN), on KNL 7250
Known issues
Contributors for this PR
@ashokei @karkadad @louisfeng @adstraw