[MXNET-359] fix checks on convolution parameters in MKLDNN. #10666

zheng-da · 2018-04-24T01:35:11Z

Description

As I explained in #10663, there is a mismatch between MXNet and ONNX in how convolution padding is specified. This is a temporary fix on the MKLDNN side: MKLDNN conv now follows the behavior of MXNet conv, which always uses the first two elements of the tuple as padding.
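For context, here is a rough sketch of the mismatch. It assumes the ONNX `Conv` convention that 2-D `pads` are listed as `[h_begin, w_begin, h_end, w_end]`, while MXNet's `pad` holds one symmetric value per spatial axis. Both helper names are hypothetical and are not part of MXNet or its ONNX importer.

```python
def onnx_pads_to_mxnet_pad(pads):
    """Convert 2-D ONNX `pads` to an MXNet-style (pad_h, pad_w) tuple.

    ONNX lists every begin padding followed by every end padding,
    i.e. [h_begin, w_begin, h_end, w_end]. MXNet pads each spatial axis
    symmetrically, so only symmetric ONNX padding can be converted.
    """
    if len(pads) != 4:
        raise ValueError("expected 4 pad values for a 2-D conv, got %d" % len(pads))
    begin, end = tuple(pads[:2]), tuple(pads[2:])
    if begin != end:
        raise ValueError("asymmetric padding %r has no MXNet equivalent" % (list(pads),))
    return begin


def pad_as_read_by_mxnet_conv(pad):
    """Models the behavior described above: only the first two elements are used."""
    return tuple(pad[:2])


# Symmetric padding: passing the raw 4-tuple happens to give the right answer.
print(pad_as_read_by_mxnet_conv((1, 2, 1, 2)))  # (1, 2)
print(onnx_pads_to_mxnet_pad([1, 2, 1, 2]))      # (1, 2)
# Asymmetric padding: the end paddings are silently dropped.
print(pad_as_read_by_mxnet_conv((0, 0, 1, 1)))  # (0, 0)
```

This is why a 4-element tuple is harmless when the padding is symmetric but silently wrong when it is not.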

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set and a reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

anirudh2290

Can you add a test case for symmetric padding with a tuple of length 4?

anirudh2290 · 2018-04-24T07:24:52Z

src/operator/nn/mkldnn/mkldnn_convolution.cc

@@ -88,26 +91,23 @@ static mkldnn::convolution_backward_data::primitive_desc GetConvBwdData(
  auto weight_md = GetWeightDesc(weights, param.num_group);
  auto out_md = GetMemDesc(output);
  auto engine = CpuEngine::Get()->get_engine();
+  CHECK_GE(param.stride.ndim(), 2U);


Should this be CHECK_EQ?

After ONNX is fixed, this should be CHECK_EQ. I didn't know whether ONNX would be fixed when I submitted the PR.

Please disregard my comment. I think this change shouldn't depend on whether ONNX is fixed or not. CHECK_GE looks good because it keeps the behavior consistent with the existing one.
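To make the trade-off concrete, here is a small Python model (not MXNet code) of what the two checks accept for a 2-D convolution. `CHECK_GE(param.stride.ndim(), 2U)` lets a longer tuple through, and the MKLDNN path then uses only its first two entries. `CHECK_EQ` would reject it. The function name and error messages are illustrative assumptions.

```python
def mkldnn_strides(stride, require_exact=False):
    """Hypothetical model of the stride check in the MKLDNN conv path.

    require_exact=False mirrors CHECK_GE(stride.ndim(), 2U): any tuple with
    at least two entries passes and only the first two are used.
    require_exact=True mirrors CHECK_EQ(stride.ndim(), 2U).
    """
    ndim = len(stride)
    ok = (ndim == 2) if require_exact else (ndim >= 2)
    if not ok:
        expected = "exactly" if require_exact else "at least"
        raise ValueError("stride needs %s 2 elements, got %d" % (expected, ndim))
    return (stride[0], stride[1])


print(mkldnn_strides((2, 2)))      # (2, 2)
print(mkldnn_strides((2, 3, 4)))   # (2, 3): extra entries are ignored
```

With `require_exact=True`, the second call would raise instead. That stricter behavior only makes sense once every producer of these tuples, ONNX included, emits exactly two entries.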

anirudh2290 · 2018-04-24T07:25:09Z

src/operator/nn/mkldnn/mkldnn_convolution.cc

@@ -123,16 +123,15 @@ static mkldnn::convolution_backward_weights::primitive_desc GetConvBwdWeights(
  auto weight_md = GetWeightDesc(weights, param.num_group);
  auto out_md = GetMemDesc(output);
  auto engine = CpuEngine::Get()->get_engine();
+  CHECK_GE(param.stride.ndim(), 2U);


Should this be CHECK_EQ?

anirudh2290 · 2018-04-24T07:25:50Z

src/operator/nn/mkldnn/mkldnn_deconvolution.cc

-    strides[0] = param.stride[0];
-    strides[1] = param.stride[1];
-  } else if (param.stride.ndim() == 1) {
-    strides[0] = param.stride[0];


So param.pad.ndim() == 1 will no longer use MKLDNN?

MXNet always assumes 2 elements in the tuple. In Python, if the input has one element, it is converted to a 2-element tuple, so in practice we don't get a stride with one element.

Python extends a one-element tuple to a two-element tuple. What about other frontend languages, or someone calling the C API to build their model?
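A sketch of the normalization under discussion follows. It assumes a layer that expands a scalar or one-element value to one entry per spatial axis before the parameters reach the operator. `normalize_spatial_param` is a hypothetical helper, not an MXNet API. The open question above is what happens when nothing performs this step, for example for a C API caller.

```python
def normalize_spatial_param(value, num_spatial_dims, name="stride"):
    """Expand an int or 1-element tuple to one entry per spatial axis.

    Hypothetical helper: it models the normalization a frontend (or the
    operator's parameter parser) could apply so the backend always sees a
    tuple whose length matches the number of spatial dimensions.
    """
    if isinstance(value, int):
        value = (value,)
    value = tuple(value)
    if len(value) == 1:
        return value * num_spatial_dims
    if len(value) != num_spatial_dims:
        raise ValueError("%s must have 1 or %d elements, got %r"
                         % (name, num_spatial_dims, value))
    return value


# Note that (1) in Python is the integer 1, not a tuple.
print(normalize_spatial_param((1), 2))     # (1, 1)
print(normalize_spatial_param((2,), 2))    # (2, 2)
print(normalize_spatial_param((2, 3), 2))  # (2, 3)
```

Doing this once in the parameter parser, rather than in each frontend, would cover every language binding and direct C API use alike.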

anirudh2290 · 2018-04-24T07:27:55Z

src/operator/nn/mkldnn/mkldnn_deconvolution.cc

@@ -32,6 +32,12 @@
 namespace mxnet {
 namespace op {

+bool SupportMKLDNNConv(const DeconvolutionParam& params, const NDArray &input) {
+  if (params.kernel.ndim() != 2)


Do we need to add checks for strides and dilate too?

I think we should have a check in the parameter parser of mxnet conv, so we don't need to check it in the MKLDNN code.

If we are checking that the ndim of stride and dilate is greater than or equal to 2, can we fall back to the default implementation and return false here when the ndim of stride, pad or dilate is less than 2?
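The proposed fallback can be modeled as a dispatch predicate. The idea is to use MKLDNN only for 2-D kernels whose stride, dilate and pad tuples each have at least two elements, and otherwise return false so the default implementation runs. This is a Python sketch of the suggestion, not the C++ `SupportMKLDNNConv` from the PR. The dict-based parameters and the `choose_impl` dispatcher are assumptions for illustration.

```python
def support_mkldnn_conv(params):
    """Return True if the MKLDNN path should handle this 2-D (de)convolution.

    `params` is a plain dict standing in for ConvolutionParam or
    DeconvolutionParam. Instead of failing a check, a stride, dilate or pad
    tuple with fewer than two entries makes the operator use the default
    (non-MKLDNN) implementation.
    """
    if len(params["kernel"]) != 2:
        return False
    return all(len(params[name]) >= 2 for name in ("stride", "dilate", "pad"))


def choose_impl(params):
    # Hypothetical dispatcher, used only to show where the fallback happens.
    return "mkldnn" if support_mkldnn_conv(params) else "default"


base = {"kernel": (3, 3), "stride": (1, 1), "dilate": (1, 1), "pad": (0, 0)}
print(choose_impl(base))                     # mkldnn
print(choose_impl(dict(base, stride=(1,))))  # default
```

The design choice here is that a short tuple never raises in the MKLDNN path. It simply gets whatever behavior the default implementation has, which keeps builds with and without MKLDNN consistent.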

zheng-da · 2018-04-24T19:05:56Z

@piiswrong @eric-haibin-lin @ashokei @pengzhao-intel @TaoLv
Could you help review the code and merge it, so it gets into the release?

eric-haibin-lin · 2018-04-24T21:20:47Z

Do we still want this change if ONNX correctly handles padding?

anirudh2290 · 2018-04-24T21:26:04Z

@eric-haibin-lin Yes, we still need this change to make the behavior consistent with and without MKLDNN enabled. We can add this back later, when we add support for raising an error in MXNet conv (without MKLDNN enabled).

zheng-da · 2018-04-25T02:22:13Z

I'm not sure about other frontends. What I see is that the MXNet conv operator always assumes a two-element tuple in this case. Ideally, we should fix the tuple when the parameters are parsed.

(Sent by email in reply to Tao Lv's review comment of Apr 24, 2018 on src/operator/nn/mkldnn/mkldnn_deconvolution.cc, which asked about other frontend languages and about users who build their model through the C API.)

anirudh2290 · 2018-04-25T20:47:37Z

I ran the following simple script with the code pulled from your branch:

import mxnet as mx
arr = mx.nd.random.uniform(shape=(10, 10, 32, 32))
weight1 = mx.nd.random.uniform(shape=(10, 10, 3, 3))
# stride=(1) is just the integer 1 in Python, i.e. a one-element stride
arr = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1), num_filter=10)
print(arr.asnumpy())

MXNet conv seems to read a stride with ndim = 1 correctly here: https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/convolution-inl.h#L400.
This creates an inconsistency between MXNet conv with and without MKLDNN enabled: with MKLDNN enabled the script throws an exception, while without MKLDNN it doesn't.

To avoid this inconsistency for now, we can fall back to the default compute if any of pad, stride or dilate has ndim < 2. Let me know what you think.

zheng-da · 2018-04-25T21:34:50Z

import mxnet as mx
arr = mx.nd.random.uniform(shape=(10, 10, 32, 32))
weight1 = mx.nd.random.uniform(shape=(10, 10, 3, 3))
# one-element stride (the integer 1) vs. an explicit two-element stride
arr1 = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1), num_filter=10)
arr2 = mx.nd.Convolution(data=arr, weight=weight1, no_bias=True, kernel=(3, 3), stride=(1, 1), num_filter=10)
print((arr1 == arr2).asnumpy().sum())

This prints 2616.0, while we expect 3000 (10*10*30*1) because the output shape is (10L, 10L, 30L, 1L).
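There is a quick way to see why an output width of 1 is suspicious. Under the standard convolution output-length formula (the one MXNet's Convolution documentation also gives), a 32-wide input with a 3-wide kernel and no padding has width 30 at stride 1. The width only drops to 1 when the stride along that axis is 30 or more. The sketch below checks only the arithmetic, not MXNet itself.

```python
def conv_out_len(x, k, p=0, s=1, d=1):
    """Output length along one spatial axis: floor((x + 2p - d(k-1) - 1) / s) + 1."""
    return (x + 2 * p - d * (k - 1) - 1) // s + 1


# Input 32x32, kernel (3, 3), no padding, as in the scripts above.
print(conv_out_len(32, 3, s=1))   # 30
print(conv_out_len(32, 3, s=29))  # 2
print(conv_out_len(32, 3, s=30))  # 1

# Smallest stride that collapses the width to 1.
print(min(s for s in range(1, 100) if conv_out_len(32, 3, s=s) == 1))  # 30
```

So an output shape of (10, 10, 30, 1) suggests the width stride was not taken to be 1 when only a one-element stride was given.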

piiswrong · 2018-05-02T17:49:36Z

src/operator/nn/convolution.cc

@@ -363,6 +365,9 @@ static void ConvolutionParamParser(nnvm::NodeAttrs* attrs) {
    if (param_.dilate.ndim() == 0) param_.dilate = Shape3(1, 1, 1);
    if (param_.pad.ndim() == 0) param_.pad = Shape3(0, 0, 0);
  }
+  CHECK_EQ(param_.kernel.ndim(), param_.stride.ndim());


These checks need to have error messages, e.g.:
"stride must have the same number of dimensions as kernel_size, but kernel_size is set to (x,x,x) while stride is (x,x)"
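Here is a Python sketch of the requested check. The parser first fills empty dilate and pad tuples with defaults, as the `ConvolutionParamParser` hunk above does for the 3-D case, and then compares each tuple's length with the kernel's. The stride default of 1s, the function name and the exact message wording are assumptions of this sketch, not the merged C++ code.

```python
def check_conv_params(kernel, stride=(), dilate=(), pad=()):
    """Fill empty tuples with defaults, then require len(x) == len(kernel).

    Defaults follow the parser hunk above (dilate -> 1s, pad -> 0s); the
    stride default of 1s is an assumption of this sketch.
    """
    ndim = len(kernel)
    params = {
        "stride": tuple(stride) or (1,) * ndim,
        "dilate": tuple(dilate) or (1,) * ndim,
        "pad": tuple(pad) or (0,) * ndim,
    }
    for name, value in params.items():
        if len(value) != ndim:
            raise ValueError(
                "%s must have the same number of dimensions as kernel_size, "
                "but kernel_size is set to %s while %s is %s"
                % (name, tuple(kernel), name, value))
    return params


print(check_conv_params((3, 3)))  # {'stride': (1, 1), 'dilate': (1, 1), 'pad': (0, 0)}
```

For example, `check_conv_params((3, 3, 3), stride=(1, 1))` raises with "stride must have the same number of dimensions as kernel_size, but kernel_size is set to (3, 3, 3) while stride is (1, 1)". Naming both values in the message lets the user see which argument is off.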


fix check on tuples of conv.

82234d4

zheng-da changed the title from "fix checks on convolution parameters in MKLDNN." to "[MXNET-359] fix checks on convolution parameters in MKLDNN." Apr 24, 2018

anirudh2290 reviewed Apr 24, 2018


anirudh2290 approved these changes Apr 24, 2018


zheng-da added 2 commits May 1, 2018 18:24

check params in (de)conv.

e2c64d5

rename.

099f235

piiswrong reviewed May 2, 2018


add messages.

9150928

piiswrong merged commit 1420697 into apache:master May 2, 2018

marcoabreu added a commit that referenced this pull request May 3, 2018

Revert "[MXNET-359] fix checks on convolution parameters in MKLDNN. (#…

54e7b6f

…10666)" This reverts commit 1420697.

anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request May 7, 2018

[MXNET-359] fix checks on convolution parameters in MKLDNN. (apache#1…

c679fcd

…0666) * fix check on tuples of conv. * check params in (de)conv. * rename. * add messages.

jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018

[MXNET-359] fix checks on convolution parameters in MKLDNN. (apache#1…

64c1a23

…0666) * fix check on tuples of conv. * check params in (de)conv. * rename. * add messages.

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018

[MXNET-359] fix checks on convolution parameters in MKLDNN. (apache#1…

8e587bb

…0666) * fix check on tuples of conv. * check params in (de)conv. * rename. * add messages.

zheng-da added a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018

[MXNET-359] fix checks on convolution parameters in MKLDNN. (apache#1…

50ee8b1

…0666) * fix check on tuples of conv. * check params in (de)conv. * rename. * add messages.

zheng-da mentioned this pull request Jul 2, 2018

MXNet conv should check the number of elements in strides, padding, etc. #10689

Closed

zheng-da deleted the fix_conv branch July 5, 2018 06:20
