Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-365] handle inplace in mkldnn FallBackCompute #10591

Merged
merged 11 commits into from
May 11, 2018

Conversation

ashokei
Copy link
Contributor

@ashokei ashokei commented Apr 18, 2018

Description

handle inplace in mkldnn FallBackCompute. This bug is noticed in mkldnn sum, if one of the inputs is cpu layout

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

if (req[i] == kWriteTo)
// ensure output does not use mkldnn mem.
// for inplace, we already converted & copied input above.
if ((req[i] == kWriteTo) || (req[i] == kWriteInplace))
const_cast<NDArray &>(outputs[i]).InvalidateMKLDNNData();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we need to handle kAddTo properly here as well. If it's kAddTo, we can convert the layout of the output array to the default format directly. since it's output array, we expect its memory will be written.

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from ba9de4d to 4a25536 Compare April 19, 2018 19:31
// for inplace, we already converted & copied input above.
if ((req[i] == kWriteTo) || (req[i] == kWriteInplace))
const_cast<NDArray &>(output).InvalidateMKLDNNData();
if (req[i] == kAddTo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you make this "else if"?

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from 4a25536 to 3c792b9 Compare April 25, 2018 20:19
@ashokei
Copy link
Contributor Author

ashokei commented Apr 26, 2018

@zheng-da added your feedback comments, @piiswrong can you please merge if ok. thanks.

@zheng-da
Copy link
Contributor

can you add a test for this?

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from 3c792b9 to 7a4c26d Compare April 27, 2018 03:36
@ashokei
Copy link
Contributor Author

ashokei commented Apr 27, 2018

@zheng-da added unittest for this, please accept review if ok , thanks

@ashokei ashokei changed the title handle inplace in mkldnn FallBackCompute [MXNET-365] handle inplace in mkldnn FallBackCompute Apr 27, 2018
@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch 4 times, most recently from 7d16a4b to 8f8b792 Compare May 1, 2018 22:59
@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from 8f8b792 to 0a8ce05 Compare May 2, 2018 18:22
@ashokei
Copy link
Contributor Author

ashokei commented May 2, 2018

@piiswrong can you please merge, thanks.

@ashokei
Copy link
Contributor Author

ashokei commented May 4, 2018

@piiswrong ping...

@ashokei
Copy link
Contributor Author

ashokei commented May 5, 2018

@marcoabreu is this something you can merge. thanks.

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MKL tests are currently not run. Please remove --build from https://github.com/apache/incubator-mxnet/blob/master/Jenkinsfile#L112

@zheng-da
Copy link
Contributor

zheng-da commented May 5, 2018

Why do MKL tests not run? We need to enable them to test MKLDNN-specific scripts, right?

@marcoabreu
Copy link
Contributor

There is a mistake in our launch command

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch 3 times, most recently from 36f0367 to fdc17d4 Compare May 7, 2018 18:42
@ashokei
Copy link
Contributor Author

ashokei commented May 8, 2018

@marcoabreu made requested changes, can we please merge

# output should have 0.3376348
assert_almost_equal(y[0, 0, 0, 0], 0.3376348)
assert_almost_equal(y[0, 0, 0, 0], 0.016711406)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you remind me what this is? why do you need to change it?
BTW, you need to update the comment as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it just computes with fixed 1s as input, and this is the value i see as a result with GPU or cpu (non-mkldnn mode).

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch 2 times, most recently from 2511534 to bd9d4e1 Compare May 8, 2018 17:07
@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from bd9d4e1 to e2ac42f Compare May 8, 2018 19:13
@ashokei
Copy link
Contributor Author

ashokei commented May 8, 2018

@marcoabreu this one is ready for merge too, made changes suggested by you. thanks.

@marcoabreu
Copy link
Contributor

Let me just retrigger CI

// ensure output does not use mkldnn mem.
// for inplace, we already converted & copied input above.
if ((req[i] == kWriteTo) || (req[i] == kWriteInplace))
const_cast<NDArray &>(output).InvalidateMKLDNNData();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this threadsafe? What happens if multiple threads try to read/write or they call it in sequence?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

called in sequence, this PR only adds kWriteInplace option to existing kWriteTo handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is whether this operation does have any side effects that could cause problems if they are called from multiple threads in parallel or multiple times in sequence.
We have seen quite a few regressions in terms of race conditions related to MKLDNN and I would like to highlight threadsafety a bit more in our reviews

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are output arrays. If the dependency works correctly in the execution engine, no other threads will read or modify them when this function runs. InvalidateMKLDNNData has a side effect. so is writing data to the output arrays.

I guess you refer to the bug I reported in my other PR. As I said, the race condition isn't MKLDNN-specific. It's just MKLDNN that makes it noticeable. A lot of our tests just run the computation and don't verify the correctness of the data.

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please assess the thread safety before merging

@ashokei
Copy link
Contributor Author

ashokei commented May 8, 2018

@marcoabreu we now have CI running though these test with -- build change you suggested above.

@zheng-da
Copy link
Contributor

zheng-da commented May 9, 2018

Could you please change the name of the tests? Thanks

@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from e2ac42f to 4b19536 Compare May 9, 2018 20:08
@ashokei ashokei force-pushed the mkldnn_fallback_inplace_bug branch from 4b19536 to df08174 Compare May 10, 2018 16:31
@ashokei
Copy link
Contributor Author

ashokei commented May 10, 2018

@zheng-da updated PR to use testname, @marcoabreu can you merge now, thanks.

@marcoabreu marcoabreu merged commit 5858a4f into apache:master May 11, 2018
jinhuang415 pushed a commit to jinhuang415/incubator-mxnet that referenced this pull request May 29, 2018
* handle inplace in mkldnn FallBackCompute

* add comments

* handle kAddTo in mkldnn FallBackCompute

* add PR feedback

* add unittest for mkldnn inplace sum with cpu data

* add back mkldnn engine threading unittest

* separate mkldnn install test and fix pylint issue

* remove --build from mkldnn jenkins test

* update mkldnn unittests

* update comments for mkldnn test

* remove python test doc string so unittest name is used
rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* handle inplace in mkldnn FallBackCompute

* add comments

* handle kAddTo in mkldnn FallBackCompute

* add PR feedback

* add unittest for mkldnn inplace sum with cpu data

* add back mkldnn engine threading unittest

* separate mkldnn install test and fix pylint issue

* remove --build from mkldnn jenkins test

* update mkldnn unittests

* update comments for mkldnn test

* remove python test doc string so unittest name is used
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 8, 2018
* handle inplace in mkldnn FallBackCompute

* add comments

* handle kAddTo in mkldnn FallBackCompute

* add PR feedback

* add unittest for mkldnn inplace sum with cpu data

* add back mkldnn engine threading unittest

* separate mkldnn install test and fix pylint issue

* remove --build from mkldnn jenkins test

* update mkldnn unittests

* update comments for mkldnn test

* remove python test doc string so unittest name is used
anirudh2290 pushed a commit that referenced this pull request Jun 13, 2018
* handle inplace in mkldnn FallBackCompute

* add comments

* handle kAddTo in mkldnn FallBackCompute

* add PR feedback

* add unittest for mkldnn inplace sum with cpu data

* add back mkldnn engine threading unittest

* separate mkldnn install test and fix pylint issue

* remove --build from mkldnn jenkins test

* update mkldnn unittests

* update comments for mkldnn test

* remove python test doc string so unittest name is used
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* handle inplace in mkldnn FallBackCompute

* add comments

* handle kAddTo in mkldnn FallBackCompute

* add PR feedback

* add unittest for mkldnn inplace sum with cpu data

* add back mkldnn engine threading unittest

* separate mkldnn install test and fix pylint issue

* remove --build from mkldnn jenkins test

* update mkldnn unittests

* update comments for mkldnn test

* remove python test doc string so unittest name is used
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants