This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request #11129

Merged: 69 commits merged into apache:master from test-kAddTo on Jul 8, 2018

Conversation

azai91 (Contributor) commented Jun 1, 2018

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Add MKLDNN tests to support the kAddTo request type (a sketch of what such a test verifies follows below)
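
A hedged sketch (not the PR's actual test code) of what such a kAddTo test verifies: under req = kAddTo, an operator must accumulate into the output rather than overwrite it. The helper name CheckReluAddTo is hypothetical.

    #include <gtest/gtest.h>
    #include <cmath>
    #include <vector>

    // Verifies kAddTo semantics for a ReLU forward pass:
    // new_out[i] == old_out[i] + max(in[i], 0).
    void CheckReluAddTo(const std::vector<float> &in,
                        const std::vector<float> &old_out,
                        const std::vector<float> &new_out) {
      ASSERT_EQ(in.size(), new_out.size());
      for (size_t i = 0; i < in.size(); i++)
        EXPECT_EQ(old_out[i] + std::fmax(in[i], 0.f), new_out[i]);
    }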

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

zheng-da (Contributor) commented Jun 2, 2018

I don't think all operators support kAddTo.

azai91 force-pushed the test-kAddTo branch 8 times, most recently from 71b0db4 to 50befd6 on June 11, 2018 at 16:21
CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
true, nd.dtype());
Contributor

Sparse arrays don't have kAddTo? @eric-haibin-lin

Member

Yes. The executor won't generate kAddTo for sparse outputs. Sparse operators don't support that.

CHECK(mem != nullptr);
if (mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
CopyMKLDNNMem(*res_memory, tmp_memory);
Contributor

I think you can just call GetMKLDNNDataReorder. @pengzhao-intel tried to fix this bug in #11095, but doesn't have a test for his fix.

Contributor Author

The purpose of this block is to coerce the input memory (res) to the same size as the output ndarray's memory. The issue here is that we do not have the ndarray of the input, only the memory in res.second, so we don't have access to the GetMKLDNNDataReorder member function.

Contributor

I see. When there is a format mismatch between arr and res, you use the format of arr. In this case, you may remove lines 214 and 216 and have the sum written to mem directly. WriteInplace works in MKLDNN as long as the inputs and outputs have the same shape and format.
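
For reference, a minimal sketch of the reorder pattern discussed in this thread, assuming the MKL-DNN 0.x C++ API (the helper name ReorderTo and its structure are illustrative, not the PR's exact code):

    #include <mkldnn.hpp>
    #include <vector>

    // Reorder src into a new memory object with dst_pd's layout so the two
    // buffers can later be summed element-wise.
    mkldnn::memory ReorderTo(const mkldnn::memory &src,
                             const mkldnn::memory::primitive_desc &dst_pd,
                             void *dst_handle) {
      mkldnn::memory dst(dst_pd, dst_handle);
      std::vector<mkldnn::primitive> net;
      net.push_back(mkldnn::reorder(src, dst));
      mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
      return dst;
    }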

const_cast<NDArray &>(output).InvalidateMKLDNNData();
else if (req[i] == kAddTo)
output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
Contributor

I wonder if it's necessary that, in the fallback case, the output is still an MKLDNN array. I know the unit test can trigger the problem.

Contributor Author

If the output is not MKLDNN, then we do not need to enter this block, because we will be able to call the FCompute function on the output directly.

else if (req[i] == kAddTo)
output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
NDArray temp = outputs[0].Reorder2Default();
Contributor

It should be outputs[i].

fn(attrs, ctx, in_blobs, req, out_blobs);
if (req[0] == kAddTo && outputs[0].IsMKLDNNData())
Contributor

Why is the index always 0?

if (in_mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
auto input_memory = inputs[i].GetMKLDNNData();
CopyMKLDNNMem(*input_memory, tmp_memory);
Contributor

I think you can just call GetMKLDNNDataReorder.

EXPECT_EQ(d1[i], std::fmax(d2[i], 0));
}
for (size_t i = 0; i < tmp1.shape().Size(); i++)
ASSERT_EQ(d2[i], std::fmax(d1[i], 0));
Contributor

Shouldn't we use a Google Test check?

Contributor Author

ASSERT_EQ is from gtest? I am using ASSERT_EQ over EXPECT_EQ here because I want the test to fail as soon as one comparison fails (otherwise we get an error message for every incorrect element in the vector).

Contributor

ASSERT_EQ is defined by mxnet. We should use EXPECT_EQ in the unit test because we want to see all failures.
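
For context, a small self-contained sketch of the gtest semantics being debated (illustrative test only): ASSERT_* aborts the current test function on the first failure, while EXPECT_* records the failure and continues, so every mismatching element gets reported.

    #include <gtest/gtest.h>
    #include <vector>

    TEST(AssertVsExpect, Semantics) {
      std::vector<int> expected{1, 2, 3}, actual{1, 2, 3};
      // ASSERT_EQ stops this test on a size mismatch, preventing the
      // out-of-bounds indexing the loop below could otherwise perform.
      ASSERT_EQ(expected.size(), actual.size());
      for (size_t i = 0; i < expected.size(); i++)
        // EXPECT_EQ reports every mismatching element instead of
        // stopping at the first one.
        EXPECT_EQ(expected[i], actual[i]);
    }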

mshadow::default_real_t *d1 = in1.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *d2 = in2.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *o = out.data().dptr<mshadow::default_real_t>();
for (size_t i = 0; i < in1.shape().Size(); i++)
EXPECT_EQ(d1[i] + d2[i], o[i]);
ASSERT_EQ(d1[i] + d2[i], o[i]);
Contributor

The same here?

pengzhao-intel (Contributor)

@azai91 Many thanks for your efforts to improve the quality of the MKL-DNN backend.

@azai91 @zheng-da @eric-haibin-lin @marcoabreu

Two suggestions:

1) Avoid changing the MKL-DNN implementations when adding test cases.
It will be hard to track back the changes otherwise. If a change is really needed (for a bugfix or other reasons), I think we can file another PR.

One example is #11026: that PR was marked as adding test cases, so we didn't pay much attention, since more test cases are always a good thing. But the implementation was changed and led to a performance regression for resnet-50 inference (a drop from 238 to 179 images/sec); see below.

2) Run benchmark_score.py to verify performance in each CI run.

commit 92286c9106dd63d2bfd062f9abb0e53b071a46e4
Author: Alexander Zai <azai91@gmail.com>
Date:   Tue May 29 17:48:33 2018 -0700

[lvtao@mlt-skx052 image-classification]$ python benchmark_score.py
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/lvtao/Workspace/mxnet-official/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[22:09:59] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 83.039246
INFO:root:batch size  2, image/sec: 118.022742
INFO:root:batch size  4, image/sec: 149.483457
INFO:root:batch size  8, image/sec: 162.208255
INFO:root:batch size 16, image/sec: 166.800091
INFO:root:batch size 32, image/sec: 165.821744
INFO:root:batch size 64, image/sec: 175.934763
INFO:root:batch size 128, image/sec: 179.160898
INFO:root:batch size 256, image/sec: 178.767242


commit 9514a1e39f8356f8fee6202cd86c8f20fbf301b6
Author: kpmurali <37911926+kpmurali@users.noreply.github.com>
Date:   Tue May 29 17:36:35 2018 -0700

[lvtao@mlt-skx052 image-classification]$ python benchmark_score.py
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/lvtao/Workspace/mxnet-official/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 89.205967
INFO:root:batch size  2, image/sec: 128.775569
INFO:root:batch size  4, image/sec: 156.363125
INFO:root:batch size  8, image/sec: 195.993911
INFO:root:batch size 16, image/sec: 219.215664
INFO:root:batch size 32, image/sec: 224.414152
INFO:root:batch size 64, image/sec: 238.657344
INFO:root:batch size 128, image/sec: 230.266770
INFO:root:batch size 256, image/sec: 225.139635

azai91 (Contributor Author) commented Jun 13, 2018

Okay, will add MKLDNN benchmark tests in a separate PR.

azai91 (Contributor Author) commented Jun 13, 2018

@pengzhao-intel Is anyone working on a PR to revert the regression, or is someone already reverting it?

pengzhao-intel (Contributor)

@azai91, you can try to figure out the root cause and fix it in a new PR. I don't think we need to revert the previous PR.

zheng-da (Contributor)

@pengzhao-intel I think we should fix bugs when adding tests. We can change the title of this PR to something like "fix bugs in MKLDNN operators to handle the kAddTo request".

#if MXNET_USE_MKLDNN == 1
NDArray temp = bufs != nullptr ? bufs->at(i) : nd.IsMKLDNNData() ?
nd.Reorder2Default() : NDArray(nd.shape(), nd.ctx(), true, nd.dtype());
Contributor

When MXNET_USE_MKLDNN == 1, isn't is_default the same as nd.IsMKLDNNData()?
Why do you still need to check nd.IsMKLDNNData() here?

Contributor Author

is_default could be false because the array is a sparse ndarray, in which case we cannot call Reorder2Default and will just make a copy like we previously did. This does mean that kAddTo won't work, as we are not preserving the data in tmp.

Contributor

The logic here looks very complex. I guess you might need to check IsMKLDNNData first, before checking bufs.
Also, what happens if nd is a default dense array and req is kAddTo? Should we use nd directly? Should you check kAddTo more explicitly?
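
A minimal sketch of the restructuring suggested here, assuming MXNet's NDArray API as it appears in this PR (IsMKLDNNData, Reorder2Default, the shape/ctx/dtype constructor, CopyFromTo); the helper name MakeTempBuffer and the explicit kAddTo copy are hypothetical:

    // Choose the temporary buffer for array i; illustrative only.
    NDArray MakeTempBuffer(const NDArray &nd, std::vector<NDArray> *bufs,
                           size_t i, OpReqType req) {
      if (nd.IsMKLDNNData())
        return nd.Reorder2Default();  // MKLDNN layout: reorder, data preserved
      if (bufs != nullptr)
        return bufs->at(i);           // reuse a preallocated buffer
      NDArray temp(nd.shape(), nd.ctx(), true, nd.dtype());
      if (req == kAddTo)
        CopyFromTo(nd, &temp);        // preserve data so kAddTo can accumulate
      return temp;
    }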

azai91 (Contributor Author) commented Jun 13, 2018

The regression issue is addressed in #11262.

pengzhao-intel (Contributor)

@zheng-da Agreed. It's nice to keep the title and description consistent with the real changes.

azai91 (Contributor Author) commented Jun 14, 2018

Working on breaking this diff into smaller diffs; will postpone merging this for a couple of days.

pengzhao-intel (Contributor)

@azai91 I have closed my PR since you have already fixed the issue and added the tests, thanks a lot.
Please give the A3C example a try and make sure it works :)

azai91 force-pushed the test-kAddTo branch 2 times, most recently from 8b909d5 to c055940 on June 26, 2018 at 01:08
azai91 (Contributor Author) commented Jun 26, 2018

@zheng-da @pengzhao-intel Updated the PR. Please take a look when you have time.

if (mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
MKLDNNCopy(*res_memory, tmp_memory);
res_memory = tmp_memory;
Contributor

As I understand it, MKLDNNCopy already reorders res_memory into tmp_memory, so why do we need to assign tmp_memory back to res_memory?

Contributor Author

In line 224 we use res_memory and add it to mem.
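
A hedged sketch of the flow being described, based on the diff above (the primitive-descriptor comparison and the MKLDNNSum call are illustrative, not the PR's exact code):

    // If the layouts differ, reorder the input into a temporary first.
    if (mem->get_primitive_desc() != res_memory->get_primitive_desc()) {
      auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
      MKLDNNCopy(*res_memory, tmp_memory);  // reorder res_memory into tmp_memory
      res_memory = tmp_memory;              // the sum below must read the copy
    }
    // Accumulate for kAddTo: out = out + input. res_memory is read again
    // here, which is why the reassignment above is needed.
    MKLDNNSum(*mem, *res_memory, *mem);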

azai91 (Contributor Author) commented Jun 28, 2018

@pengzhao-intel Does my answer above address the issue?

CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
true, nd.dtype());
Contributor

indent.

@@ -412,6 +422,8 @@ OpAttrs GetReluBackwardsOp() {
attrs.dispatches.resize(2);
attrs.dispatches[0] = DispatchMode::kFCompute;
attrs.dispatches[1] = DispatchMode::kFComputeEx;
attrs.requests.insert(OpReqType::kWriteTo);
attrs.requests.insert(OpReqType::kWriteInplace);
Contributor

Why isn't there a kAddTo test here?

Contributor Author

Done.
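
Presumably the fix mirrors the pattern visible in the diff above; a sketch (the exact change is not shown in this thread):

    attrs.requests.insert(OpReqType::kWriteTo);
    attrs.requests.insert(OpReqType::kWriteInplace);
    attrs.requests.insert(OpReqType::kAddTo);  // new kAddTo coverage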

eric-haibin-lin merged commit 0d5ebe1 into apache:master on Jul 8, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request (apache#11129)

* fix lint

* requests added to opattr

* comment out addto

* can invalidate kAddTo request mkldarrays

* revert adding kAddTo to invalidate

* use copy of output instead of creating new array

* convert output to default if fallback

* do not make copy when init

* copyex fallback copies to old array with kAddTo

* change input mem desc to output mem desc if not equal

* reorder memory in commitoutput

* allocate temp memory

* fix var names

* create helper reorder function to handle diff format/shapes

* fix typos

* fix typos

* remove unused code

* fix param

* fix header files

* force input memory to output

* reorder2default keeps pointer to mkldnn memory

* pass reference

* remove extra lines

* do not get raw mem from ptr

* remove isView check

* fallback writes back to output

* remove redundant line

* remove commented out code

* use fallback in copy (refactor)

* remove unused header

* fix lint

* reorder2default only if mkldnn flag

* only reorder if mkldnn

* does not assume 1 output

* sum compares input and output shape

* compare address and pd in sum

* refactor mkldnnsum

* fix const param

* fix header

* improve control flow when setting output blob

* fix merge

* remove kaddto comment

* add reqests to operators

* fix spacing

* do sum in place

* fix conditionals

* remove redundant reqs

* use wait to read all

* fix lint

* create multiple outputs

* create multiple copies for kaddto

* retrigger

* retriggrer

* retrigger

* retrigger

* another retrigger

* retrigger

* retrigger

* another another retrigger

* fix merge

* retrigger

* add kAddto to relu op

* retrigger