Fix deconvolution / PR 13421 #13433

azai91 · 2018-11-28T01:19:37Z

Description

PR to fix issue #13421. Added a unit test for deconvolution inference and reverted the refactor change to mkldnn deconvolution inference committed in #11778.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Revert the refactor change to mkldnn deconvolution inference.

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

azai91 · 2018-11-28T01:19:53Z

@TaoLv @anirudh2290

vandanavk · 2018-11-28T01:40:59Z

@mxnet-label-bot add [MKLDNN, pr-awaiting-review]

pengzhao-intel · 2018-11-28T07:07:17Z

tests/python/mkl/test_mkldnn.py

@@ -398,6 +398,24 @@ def softmax_forward(input_data, true_output):
    softmax_forward(mx.nd.array([[[[-3.4e38,-3.4e38]]]]), np.array([1.0,1.0]))
    softmax_forward(mx.nd.array([[[[3.4e38,3.4e38]]]]), np.array([1.0,1.0]))

+def test_deconvolution_inference():
+    np.random.seed(12345)


Set the random seed with the @with_seed() decorator.

TaoLv · 2018-11-28T08:52:47Z

@juliusshufan Could you help to check if the issue of DCGAN can be fixed via this PR? Thank you.

TaoLv

@azai91 Could you help check why this issue was not exposed in tests/python/gpu/test_operator_gpu.py? I think there are several deconv tests.

TaoLv · 2018-11-28T14:53:27Z

tests/python/mkl/test_mkldnn.py

@@ -398,6 +398,24 @@ def softmax_forward(input_data, true_output):
    softmax_forward(mx.nd.array([[[[-3.4e38,-3.4e38]]]]), np.array([1.0,1.0]))
    softmax_forward(mx.nd.array([[[[3.4e38,3.4e38]]]]), np.array([1.0,1.0]))

+@with_seed(12345)


No need to use a fixed seed here.
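The fixed-versus-random seed question above can be pictured with a small sketch. This is not MXNet's `with_seed` implementation. It is a hypothetical stand-in built on Python's standard `random` module, and the helper names `draw_fixed` and `draw_fresh` are made up for illustration. It shows a decorator that uses a fixed seed when one is passed, draws a fresh seed per run otherwise, and reports the seed when the test fails so the run can be reproduced.

```python
import functools
import random


def with_seed(seed=None):
    """Hypothetical seeding decorator (a sketch, not MXNet's implementation)."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Use the fixed seed if one was given, otherwise draw a fresh one per run.
            if seed is not None:
                this_seed = seed
            else:
                this_seed = random.SystemRandom().randrange(2 ** 31)
            random.seed(this_seed)
            try:
                return test_fn(*args, **kwargs)
            except Exception:
                # Report the seed so a failing run can be replayed with with_seed(this_seed).
                print("test %s failed with seed %d" % (test_fn.__name__, this_seed))
                raise
        return wrapper
    return decorator


@with_seed(12345)
def draw_fixed():
    # Always sees the same random stream because the seed is pinned.
    return random.random()


@with_seed()
def draw_fresh():
    # Sees a different random stream on each call.
    return random.random()
```

With a pinned seed the test only ever exercises one set of values. Leaving the seed unset explores more inputs over time while still allowing a failure to be replayed from the reported seed.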

@TaoLv do you mean tests/python/unittest/test_operator.py? The tests in tests/python/gpu/test_operator_gpu.py are only meant to run with gpu context.
I see that there are tests in tests/python/unittest/test_operator.py for deconv, but they are also skipped because of flakiness. Doesn't seem like we have any coverage on the Deconvolution operator for CPU.

nvm. forgot about the check_consistency tests.

azai91 · 2018-11-28T17:22:16Z

@TaoLv I think you're referring to this test? https://github.com/apache/incubator-mxnet/blob/dc3648bd910092c2ded2d67b6fb66d48d91f55ce/tests/python/gpu/test_operator_gpu.py#L628

looking into

azai91 · 2018-11-28T19:35:04Z

The difference is that the unit test does not use a filter length that is divisible by 8. When the input shape has a filter size that is divisible by 8, mkldnn reorders the data and thus the check fails. The gpu tests use a filter length of 8.
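One way to picture why a layout-dependent code path only shows up for such shapes is sketched below. The sketch assumes a channel-blocked layout of the form [N][C/8][H][W][8] and a simple "divisible by 8" rule for when it is used. Both are illustrative assumptions, not taken from the MKL-DNN source, and the function names are hypothetical. It shows that the same logical element lives at a different offset once the buffer is blocked, so code that reads a reordered buffer as if it were plain NCHW gets the wrong values.

```python
def plain_offset(n, c, h, w, C, H, W):
    """Offset of element (n, c, h, w) in a plain NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def blocked_offset(n, c, h, w, C, H, W, block=8):
    """Offset of the same element in a channel-blocked buffer.

    Assumed layout: [N][C/block][H][W][block], which requires C % block == 0.
    """
    if C % block != 0:
        raise ValueError("channel count must be divisible by the block size")
    outer = (n * (C // block) + c // block) * H + h
    return (outer * W + w) * block + c % block


def uses_blocked_layout(num_filter, block=8):
    """Hypothetical rule of thumb from the thread: block only when divisible."""
    return num_filter % block == 0


# Same logical element, different physical position in the two layouts.
C, H, W = 8, 2, 2
print(plain_offset(0, 1, 0, 0, C, H, W), blocked_offset(0, 1, 0, 0, C, H, W))
```

Under these assumptions, a test with 3 or 4 filters never reaches the blocked path, which is consistent with the observation that only shapes divisible by 8 exposed the problem.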

anirudh2290 · 2018-11-28T20:28:45Z

tests/python/mkl/test_mkldnn.py

+    exe = y.simple_bind(ctx=mx.cpu(), x=shape, grad_req='null')
+    exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
+    exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
+    for i in range(10):


nit: do we need this 10 times.

anirudh2290 · 2018-11-28T20:30:37Z

tests/python/mkl/test_mkldnn.py

+    for i in range(10):
+        exe.forward(is_train=False)
+        o = exe.outputs[0]
+        t = o.asnumpy()


can we just do exe.outputs[0].wait_to_read()

you're right. this is only dependent on the shape and not values.

TaoLv · 2018-11-29T00:51:18Z

The difference is that the unit test does not use a filter length that is divisible by 8. When the input shape has a filter size that is divisible by 8, mkldnn reorders the data and thus the check fails. The gpu tests use a filter length of 8.

Nice catch. Do you mind adding new test shapes to the tests in test_operator_gpu.py? Then there is no need for us to add and maintain a new test case for MKL-DNN specific.


juliusshufan · 2018-11-29T08:10:22Z

@azai91 After applying your PR to the MXNet repo, the previously encountered issue can no longer be reproduced.

azai91 · 2018-11-29T08:54:21Z

what do you mean? the test changes don't catch the issue on the master branch which currently has the bug?

juliusshufan · 2018-11-29T10:37:15Z

@azai91 I can also reproduce the issue with an internal test case I am tracking that involves deconv, and that test case passes with your fix.

TaoLv · 2018-11-29T12:26:05Z

@juliusshufan Please approve if this PR fixes your issue. Thanks.

anirudh2290 · 2018-11-29T19:59:11Z

@TaoLv from @juliusshufan 's comment this PR fixed his failing test.

TaoLv · 2018-11-30T02:13:57Z

Thanks, @azai91 @anirudh2290 @juliusshufan . Now I will close the corresponding issue.

TaoLv · 2018-11-30T02:15:22Z

@azai91 Do you mind porting this PR to v1.4.x?

* add test case
* revert refactor
* use with seed decorator
* retrigger
* remove seed
* remove iteration
* remove old test
* update deconvolution test to have filter length that triggers mkldnn reorder

azai91 added 2 commits November 27, 2018 16:51

add test case

b713021

revert refactor

c742913

azai91 requested a review from anirudh2290 as a code owner November 28, 2018 01:19

marcoabreu added MKLDNN pr-awaiting-review PR is waiting for code review labels Nov 28, 2018

pengzhao-intel reviewed Nov 28, 2018


azai91 added 2 commits November 28, 2018 01:00

use with seed decorator

6eb5b06

retrigger

1f32ac1

TaoLv reviewed Nov 28, 2018


remove seed

0d4160c

anirudh2290 reviewed Nov 28, 2018


remove iteration

34c8ef7

azai91 added 2 commits November 28, 2018 16:59

remove old test

e310258

update deconvolution test to have filter length that triggers mkldnn …

d0c6d13

…reorder

anirudh2290 approved these changes Nov 29, 2018


juliusshufan approved these changes Nov 30, 2018


anirudh2290 merged commit 8a3bd9b into apache:master Nov 30, 2018

TaoLv mentioned this pull request Nov 30, 2018

MKL-DNN deconvolution runs into crash #13421

Closed

azai91 mentioned this pull request Nov 30, 2018

[v1.4.x] Apply deconv FIx #13497

Merged

