Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix deconvolution / PR 13421 #13433

Merged
merged 8 commits into from Nov 30, 2018
Merged

Conversation

azai91
Copy link
Contributor

@azai91 azai91 commented Nov 28, 2018

Description

PR to fix issues - #13421. Added unit test for deconvolution inference and reverted refactor change to mkldnn deconvolution inference commmited in #11778.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • refactor change to mkldnn deconvolution inference.

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@azai91
Copy link
Contributor Author

azai91 commented Nov 28, 2018

@TaoLv @anirudh2290

@vandanavk
Copy link
Contributor

vandanavk commented Nov 28, 2018

@mxnet-label-bot add [MKLDNN, pr-awaiting-review]

@marcoabreu marcoabreu added MKLDNN pr-awaiting-review PR is waiting for code review labels Nov 28, 2018
@@ -398,6 +398,24 @@ def softmax_forward(input_data, true_output):
softmax_forward(mx.nd.array([[[[-3.4e38,-3.4e38]]]]), np.array([1.0,1.0]))
softmax_forward(mx.nd.array([[[[3.4e38,3.4e38]]]]), np.array([1.0,1.0]))

def test_deconvolution_inference():
np.random.seed(12345)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set random seed @with_seed()

@TaoLv
Copy link
Member

TaoLv commented Nov 28, 2018

@juliusshufan Could you help to check if the issue of DCGAN can be fixed via this PR? Thank you.

Copy link
Member

@TaoLv TaoLv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azai91 Could help to check why this issue was not exposed in tests/python/gpu/test_operator_gpu.py? I think there are several deconv tests.

@@ -398,6 +398,24 @@ def softmax_forward(input_data, true_output):
softmax_forward(mx.nd.array([[[[-3.4e38,-3.4e38]]]]), np.array([1.0,1.0]))
softmax_forward(mx.nd.array([[[[3.4e38,3.4e38]]]]), np.array([1.0,1.0]))

@with_seed(12345)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use a fixed seed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TaoLv do you mean tests/python/unittest/test_operator.py. The tests in tests/python/gpu/test_operator_gpu.py are only meant to run with gpu context.
I see that there are tests in tests/python/unittest/test_operator.py for deconv but they are also skipped because of flakiness. Doesnt seem like we have any coverage on the Deconvolution operator for CPU

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm. forgot about the check_consistency tests.

@azai91
Copy link
Contributor Author

azai91 commented Nov 28, 2018

@azai91
Copy link
Contributor Author

azai91 commented Nov 28, 2018

the difference is that the unit test does not use filter length that is divisible by 8. When the the input shape has a filter size that is divisible by 8, then mkldnn reorders the data and thus the check fails. the gpu tests uses filter length of 8.

exe = y.simple_bind(ctx=mx.cpu(), x=shape, grad_req='null')
exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)
for i in range(10):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this 10 times.

for i in range(10):
exe.forward(is_train=False)
o = exe.outputs[0]
t = o.asnumpy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just do exe.outputs[0].wait_to_read()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. this is only dependent on the shape and not values.

@TaoLv
Copy link
Member

TaoLv commented Nov 29, 2018

the difference is that the unit test does not use filter length that is divisible by 8. When the the input shape has a filter size that is divisible by 8, then mkldnn reorders the data and thus the check fails. the gpu tests uses filter length of 8.

Nice catch. Do you mind adding new test shapes to the tests in test_operator_gpu.py? Then there is no need for us to add and maintain a new test case for MKL-DNN specific.

@juliusshufan
Copy link
Contributor

@azai91 The previously encountered issue can't be reproduced on applying your PR on the MXNET repo.

@azai91
Copy link
Contributor Author

azai91 commented Nov 29, 2018

what do you mean? the test changes don't catch the issue on the master branch which currently has the bug?

@juliusshufan
Copy link
Contributor

juliusshufan commented Nov 29, 2018 via email

@TaoLv
Copy link
Member

TaoLv commented Nov 29, 2018

@juliusshufan Please approve if this PR fix your issue. Thanks.

@anirudh2290
Copy link
Member

anirudh2290 commented Nov 29, 2018

@TaoLv from @juliusshufan 's comment this PR fixed his failing test.

@anirudh2290 anirudh2290 merged commit 8a3bd9b into apache:master Nov 30, 2018
@TaoLv
Copy link
Member

TaoLv commented Nov 30, 2018

Thanks, @azai91 @anirudh2290 @juliusshufan . Now I will close the corresponding issue.

@TaoLv
Copy link
Member

TaoLv commented Nov 30, 2018

@azai91 Do you mind porting this PR to v1.4.x?

azai91 added a commit to azai91/incubator-mxnet that referenced this pull request Nov 30, 2018
* add test case

* revert refactor

* use with seed decorator

* retrigger

* remove seed

* remove iteration

* remove old test

* update deconvolution test to have filter length that triggers mkldnn reorder
@azai91 azai91 mentioned this pull request Nov 30, 2018
6 tasks
anirudh2290 pushed a commit that referenced this pull request Dec 1, 2018
* add test case

* revert refactor

* use with seed decorator

* retrigger

* remove seed

* remove iteration

* remove old test

* update deconvolution test to have filter length that triggers mkldnn reorder
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
MKLDNN pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants