This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request #11129

Merged: 69 commits merged into apache:master from test-kAddTo on Jul 8, 2018

Conversation

azai91 (Contributor) commented Jun 1, 2018

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Add MKLDNN tests to support the kAddTo request type (a sketch of what such a test verifies follows below)
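
A hedged sketch (not the PR's actual test code) of what such a kAddTo test verifies: under req = kAddTo, an operator must accumulate into the output rather than overwrite it. The helper name CheckReluAddTo is hypothetical.

    #include <gtest/gtest.h>
    #include <cmath>
    #include <vector>

    // Verifies kAddTo semantics for a ReLU forward pass:
    // new_out[i] == old_out[i] + max(in[i], 0).
    void CheckReluAddTo(const std::vector<float> &in,
                        const std::vector<float> &old_out,
                        const std::vector<float> &new_out) {
      ASSERT_EQ(in.size(), new_out.size());
      for (size_t i = 0; i < in.size(); i++)
        EXPECT_EQ(old_out[i] + std::fmax(in[i], 0.f), new_out[i]);
    }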

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

zheng-da (Contributor) commented Jun 2, 2018

I don't think all operators support kAddTo.

azai91 force-pushed the test-kAddTo branch 8 times, most recently from 71b0db4 to 50befd6 on June 11, 2018 at 16:21
CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
true, nd.dtype());
Contributor

Sparse arrays don't have kAddTo? @eric-haibin-lin

Member

Yes. The executor won't generate kAddTo for sparse outputs. Sparse operators don't support that.

CHECK(mem != nullptr);
if (mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
CopyMKLDNNMem(*res_memory, tmp_memory);
Contributor

I think you can just call GetMKLDNNDataReorder. @pengzhao-intel tried to fix this bug in #11095, but doesn't have a test for his fix.

Contributor Author

The purpose of this block is to coerce the input memory (res) to the same size as the output ndarray's memory. The issue here is that we do not have the ndarray of the input, only the memory in res.second, so we don't have access to the GetMKLDNNDataReorder member function.

Contributor

I see. When there is a format mismatch between arr and res, you use the format of arr. In this case, you may remove lines 214 and 216 and have the sum written to mem directly. WriteInplace works in MKLDNN as long as the inputs and outputs have the same shape and format.
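
For reference, a minimal sketch of the reorder pattern discussed in this thread, assuming the MKL-DNN 0.x C++ API (the helper name ReorderTo and its structure are illustrative, not the PR's exact code):

    #include <mkldnn.hpp>
    #include <vector>

    // Reorder src into a new memory object with dst_pd's layout so the two
    // buffers can later be summed element-wise.
    mkldnn::memory ReorderTo(const mkldnn::memory &src,
                             const mkldnn::memory::primitive_desc &dst_pd,
                             void *dst_handle) {
      mkldnn::memory dst(dst_pd, dst_handle);
      std::vector<mkldnn::primitive> net;
      net.push_back(mkldnn::reorder(src, dst));
      mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
      return dst;
    }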

const_cast<NDArray &>(output).InvalidateMKLDNNData();
else if (req[i] == kAddTo)
output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
Contributor

I wonder if it's necessary that, in the fallback case, the output is still an MKLDNN array. I know the unit test can trigger the problem.

Contributor Author

If the output is not MKLDNN, then we do not need to enter this block, because we will be able to call the FCompute function on the output directly.

else if (req[i] == kAddTo)
output = outputs[i].Reorder2Default();
} else if (req[0] == kAddTo && output.IsMKLDNNData()) {
NDArray temp = outputs[0].Reorder2Default();
Contributor

It should be outputs[i].

fn(attrs, ctx, in_blobs, req, out_blobs);
if (req[0] == kAddTo && outputs[0].IsMKLDNNData())
Contributor

Why is the index always 0?

if (in_mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
auto input_memory = inputs[i].GetMKLDNNData();
CopyMKLDNNMem(*input_memory, tmp_memory);
Contributor

I think you can just call GetMKLDNNDataReorder.

EXPECT_EQ(d1[i], std::fmax(d2[i], 0));
}
for (size_t i = 0; i < tmp1.shape().Size(); i++)
ASSERT_EQ(d2[i], std::fmax(d1[i], 0));
Contributor

Shouldn't we use a Google Test check?

Contributor Author

ASSERT_EQ is from gtest? I am using ASSERT_EQ over EXPECT_EQ here because I want the test to fail as soon as one comparison fails (otherwise we get an error message for every incorrect element in the vector).

Contributor

ASSERT_EQ is defined by mxnet. We should use EXPECT_EQ in the unit test because we want to see all failures.
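
For context, a small self-contained sketch of the gtest semantics being debated (illustrative test only): ASSERT_* aborts the current test function on the first failure, while EXPECT_* records the failure and continues, so every mismatching element gets reported.

    #include <gtest/gtest.h>
    #include <vector>

    TEST(AssertVsExpect, Semantics) {
      std::vector<int> expected{1, 2, 3}, actual{1, 2, 3};
      // ASSERT_EQ stops this test on a size mismatch, preventing the
      // out-of-bounds indexing the loop below could otherwise perform.
      ASSERT_EQ(expected.size(), actual.size());
      for (size_t i = 0; i < expected.size(); i++)
        // EXPECT_EQ reports every mismatching element instead of
        // stopping at the first one.
        EXPECT_EQ(expected[i], actual[i]);
    }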

mshadow::default_real_t *d1 = in1.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *d2 = in2.data().dptr<mshadow::default_real_t>();
mshadow::default_real_t *o = out.data().dptr<mshadow::default_real_t>();
for (size_t i = 0; i < in1.shape().Size(); i++)
EXPECT_EQ(d1[i] + d2[i], o[i]);
ASSERT_EQ(d1[i] + d2[i], o[i]);
Contributor

The same here?

pengzhao-intel (Contributor)

@azai91 Many thanks for your efforts to improve the quality of the MKL-DNN backend.

@azai91 @zheng-da @eric-haibin-lin @marcoabreu

Two suggestions:

1) Avoid changing the MKL-DNN implementations when adding test cases.
It will be hard to track back the changes otherwise. If a change is really needed (for a bugfix or other reasons), I think we can file another PR.

One example is #11026: that PR was marked as adding test cases, so we didn't pay much attention, since more test cases are always a good thing. But the implementation was changed and led to a performance regression for resnet-50 inference (a drop from 238 to 179 images/sec); see below.

2) Run benchmark_score.py to verify performance in each CI run.

commit 92286c9106dd63d2bfd062f9abb0e53b071a46e4
Author: Alexander Zai <azai91@gmail.com>
Date:   Tue May 29 17:48:33 2018 -0700

[lvtao@mlt-skx052 image-classification]$ python benchmark_score.py
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/lvtao/Workspace/mxnet-official/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[22:09:59] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 83.039246
INFO:root:batch size  2, image/sec: 118.022742
INFO:root:batch size  4, image/sec: 149.483457
INFO:root:batch size  8, image/sec: 162.208255
INFO:root:batch size 16, image/sec: 166.800091
INFO:root:batch size 32, image/sec: 165.821744
INFO:root:batch size 64, image/sec: 175.934763
INFO:root:batch size 128, image/sec: 179.160898
INFO:root:batch size 256, image/sec: 178.767242


commit 9514a1e39f8356f8fee6202cd86c8f20fbf301b6
Author: kpmurali <37911926+kpmurali@users.noreply.github.com>
Date:   Tue May 29 17:36:35 2018 -0700

[lvtao@mlt-skx052 image-classification]$ python benchmark_score.py
INFO:root:network: resnet-50
INFO:root:device: cpu(0)
/home/lvtao/Workspace/mxnet-official/python/mxnet/module/base_module.py:66: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. ['softmax_label'])
  warnings.warn(msg)
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 147456 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 589824 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 2359296 bytes with malloc directly
[22:19:23] src/operator/nn/mkldnn/mkldnn_base.cc:72: Allocate 9437184 bytes with malloc directly
INFO:root:batch size  1, image/sec: 89.205967
INFO:root:batch size  2, image/sec: 128.775569
INFO:root:batch size  4, image/sec: 156.363125
INFO:root:batch size  8, image/sec: 195.993911
INFO:root:batch size 16, image/sec: 219.215664
INFO:root:batch size 32, image/sec: 224.414152
INFO:root:batch size 64, image/sec: 238.657344
INFO:root:batch size 128, image/sec: 230.266770
INFO:root:batch size 256, image/sec: 225.139635

azai91 (Contributor Author) commented Jun 13, 2018

Okay, will add MKLDNN benchmark tests in a separate PR.

azai91 (Contributor Author) commented Jun 13, 2018

@pengzhao-intel Is anyone working on a PR to revert the regression, or is someone already reverting it?

pengzhao-intel (Contributor)

@azai91, you can try to figure out the root cause and fix it in a new PR. I don't think we need to revert the previous PR.

zheng-da (Contributor)

@pengzhao-intel I think we should fix bugs when adding tests. We can change the title of this PR to something like "fix bugs in MKLDNN operators to handle the kAddTo request".

#if MXNET_USE_MKLDNN == 1
NDArray temp = bufs != nullptr ? bufs->at(i) : nd.IsMKLDNNData() ?
nd.Reorder2Default() : NDArray(nd.shape(), nd.ctx(), true, nd.dtype());
Contributor

When MXNET_USE_MKLDNN == 1, isn't is_default the same as nd.IsMKLDNNData()?
Why do you still need to check nd.IsMKLDNNData() here?

Contributor Author

is_default could be false because the array is a sparse ndarray, in which case we cannot call Reorder2Default and will just make a copy like we previously did. This does mean that kAddTo won't work, as we are not preserving the data in tmp.

Contributor

The logic here looks very complex. I guess you might need to check IsMKLDNNData first, before checking bufs.
Also, what happens if nd is a default dense array and req is kAddTo? Should we use nd directly? Should you check kAddTo more explicitly?
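
A minimal sketch of the restructuring suggested here, assuming MXNet's NDArray API as it appears in this PR (IsMKLDNNData, Reorder2Default, the shape/ctx/dtype constructor, CopyFromTo); the helper name MakeTempBuffer and the explicit kAddTo copy are hypothetical:

    // Choose the temporary buffer for array i; illustrative only.
    NDArray MakeTempBuffer(const NDArray &nd, std::vector<NDArray> *bufs,
                           size_t i, OpReqType req) {
      if (nd.IsMKLDNNData())
        return nd.Reorder2Default();  // MKLDNN layout: reorder, data preserved
      if (bufs != nullptr)
        return bufs->at(i);           // reuse a preallocated buffer
      NDArray temp(nd.shape(), nd.ctx(), true, nd.dtype());
      if (req == kAddTo)
        CopyFromTo(nd, &temp);        // preserve data so kAddTo can accumulate
      return temp;
    }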

azai91 (Contributor Author) commented Jun 13, 2018

The regression issue is addressed in #11262.

pengzhao-intel (Contributor)

@zheng-da Agreed. It's nice to keep the title and description consistent with the real changes.

azai91 (Contributor Author) commented Jun 14, 2018

Working on breaking this diff into smaller diffs; will postpone merging this for a couple of days.

pengzhao-intel (Contributor)

@azai91 I have closed my PR since you have already fixed the issue and added the tests, thanks a lot.
Please give the A3C example a try and make sure it works :)

azai91 force-pushed the test-kAddTo branch 2 times, most recently from 8b909d5 to c055940 on June 26, 2018 at 01:08
azai91 (Contributor Author) commented Jun 26, 2018

@zheng-da @pengzhao-intel Updated the PR. Please take a look when you have time.

if (mem == nullptr) {
auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
MKLDNNCopy(*res_memory, tmp_memory);
res_memory = tmp_memory;
Contributor

As I understand it, MKLDNNCopy already reorders res_memory into tmp_memory, so why do we need to assign tmp_memory back to res_memory?

Contributor Author

In line 224 we use res_memory and add it to mem.
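
A hedged sketch of the flow being described, based on the diff above (the primitive-descriptor comparison and the MKLDNNSum call are illustrative, not the PR's exact code):

    // If the layouts differ, reorder the input into a temporary first.
    if (mem->get_primitive_desc() != res_memory->get_primitive_desc()) {
      auto tmp_memory = TmpMemMgr::Get()->Alloc(target_pd);
      MKLDNNCopy(*res_memory, tmp_memory);  // reorder res_memory into tmp_memory
      res_memory = tmp_memory;              // the sum below must read the copy
    }
    // Accumulate for kAddTo: out = out + input. res_memory is read again
    // here, which is why the reassignment above is needed.
    MKLDNNSum(*mem, *res_memory, *mem);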

azai91 (Contributor Author) commented Jun 28, 2018

@pengzhao-intel Does my answer above address the issue?

CHECK(temp.IsDefaultData());
#else
NDArray temp = bufs != nullptr ? bufs->at(i) : NDArray(nd.shape(), nd.ctx(),
true, nd.dtype());
Contributor

indent.

@@ -412,6 +422,8 @@ OpAttrs GetReluBackwardsOp() {
attrs.dispatches.resize(2);
attrs.dispatches[0] = DispatchMode::kFCompute;
attrs.dispatches[1] = DispatchMode::kFComputeEx;
attrs.requests.insert(OpReqType::kWriteTo);
attrs.requests.insert(OpReqType::kWriteInplace);
Contributor

Why isn't there a kAddTo test here?

Contributor Author

Done.
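
Presumably the fix mirrors the pattern visible in the diff above; a sketch (the exact change is not shown in this thread):

    attrs.requests.insert(OpReqType::kWriteTo);
    attrs.requests.insert(OpReqType::kWriteInplace);
    attrs.requests.insert(OpReqType::kAddTo);  // new kAddTo coverage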

eric-haibin-lin merged commit 0d5ebe1 into apache:master on Jul 8, 2018
XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
[MXNET-497] fix bugs in MKLDNN operators to handle the kAddTo request (apache#11129)

* fix lint

* requests added to opattr

* comment out addto

* can invalidate kAddTo request mkldarrays

* revert adding kAddTo to invalidate

* use copy of output instead of creating new array

* convert output to default if fallback

* do not make copy when init

* copyex fallback copies to old array with kAddTo

* change input mem desc to output mem desc if not equal

* reorder memory in commitoutput

* allocate temp memory

* fix var names

* create helper reorder function to handle diff format/shapes

* fix typos

* fix typos

* remove unused code

* fix param

* fix header files

* force input memory to output

* reorder2default keeps pointer to mkldnn memory

* pass reference

* remove extra lines

* do not get raw mem from ptr

* remove isView check

* fallback writes back to output

* remove redundant line

* remove commented out code

* use fallback in copy (refactor)

* remove unused header

* fix lint

* reorder2default only if mkldnn flag

* only reorder if mkldnn

* does not assume 1 output

* sum compares input and output shape

* compare address and pd in sum

* refactor mkldnnsum

* fix const param

* fix header

* improve control flow when setting output blob

* fix merge

* remove kaddto comment

* add reqests to operators

* fix spacing

* do sum in place

* fix conditionals

* remove redundant reqs

* use wait to read all

* fix lint

* create multiple outputs

* create multiple copies for kaddto

* retrigger

* retriggrer

* retrigger

* retrigger

* another retrigger

* retrigger

* retrigger

* another another retrigger

* fix merge

* retrigger

* add kAddto to relu op

* retrigger