Add axes support to Dropout for variational dropout in NLP #9931
Conversation
…mkl part hasn't been updated
What's the speed difference between the old and new default-axes implementations (CPU, GPU, MKL)? It can be measured with the dropout_perf test.
src/operator/nn/dropout-inl.h
Outdated
LOG(FATAL) << "NDim too large "; \
}

inline int BinaryBroadcastShapeCompact(const TShape& lshape, const TShape& rshape,
These look like a copy from src/operator/tensor/elemwise_binary_broadcast_op.h. Can we avoid copying the code?
src/operator/nn/dropout-inl.h
Outdated
*new_lshape = TShape(odim);
*new_rshape = TShape(odim);
*new_oshape = TShape(odim);
index_t bl = oshape.ndim() - lshape.ndim();
nit: const index_t
Should we do a CHECK() for oshape.ndim() >= lshape.ndim()?
If MKL doesn't support the non-default axes behavior, it should skip using MKL for non-default axes, similarly to how batch norm doesn't use MKL or CUDNN for a non-default channel axis.
src/operator/nn/dropout.cc
Outdated
if (dshape.ndim() == 0) return false;
out_shape->clear();
out_shape->push_back(dshape);
if (param.axes.ndim() != 0) {
can be removed
The axes parameter can be empty for normal dropout :)
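(For context, a minimal sketch of the shape rule being discussed, assuming axes lists the dimensions along which the mask is shared; mask_shape is an illustrative helper, not MXNet code:)

```python
# Dims listed in `axes` collapse to 1 in the mask shape, so the same
# sampled mask broadcasts along those axes; empty axes = ordinary dropout.
def mask_shape(data_shape, axes=()):
    return tuple(1 if ax in axes else d for ax, d in enumerate(data_shape))

print(mask_shape((3, 2, 4)))             # (3, 2, 4): ordinary dropout
print(mask_shape((3, 2, 4), axes=(0,)))  # (1, 2, 4): mask shared over axis 0
```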
src/operator/nn/dropout-inl.h
Outdated
for (int i = 1; i < length; ++i) {
inc(&coord, oshape, &lidx, lstride, &ridx, rstride);
// When tuning, don't actually run the op, since it's not going to be tuned against
// the actual op we'll eventually be using
I cannot quite get the point.
in_data[dropout::kData].dptr<DType>(),
mask.dptr<DType>(), out.dptr<DType>());
});
}
}
If MKL is enabled, the broadcast will not happen?
I haven't updated the MKL code for variational dropout (enabling axes). I need help with the MKL part.
For non-default axes, you can fall back to this op instead of the MKL op.
Thx @cjolivier01. I added the condition check here: https://github.com/apache/incubator-mxnet/pull/9931/files#diff-4aea2cc24c0bb4e8e48face9faf4aa26R249
src/operator/nn/dropout-inl.h
Outdated
mask.dptr<DType>(),
this->pkeep_);
if (req[0] != kNullOp) {
// broardcast mul
typo: broadcast
src/operator/nn/dropout-inl.h
Outdated
const real_t pkeep) {
RNG_KERNEL_LOOP(xpu, DType, id, gen, N, step, {
const real_t rand_num = static_cast<real_t>(genImpl.uniform());
mask_out[i] = mshadow_op::threshold::Map<real_t>(rand_num, pkeep) * (1.0f / pkeep);
dropout_out[i] = input_data[i] * mask_out[i];
I am not saying it needs to be done, but have you considered merging this operation with the other kernel, perhaps by deriving from broadcast_kernel or by passing a modified version of the mul op to broadcast_kernel?
Making two full passes over the memory is going to cause a performance hit, due to both OMP overhead and CPU cache misses.
I am not sure I understand you clearly.
I separated the original dropout kernel into two parts: 1) BernoulliKernel, 2) broadcast mul, so that we can enable axes support for variational dropout.
Thx
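(A minimal NumPy sketch of that two-part split; the names are illustrative, not the actual kernels:)

```python
import numpy as np

def variational_dropout(x, pkeep, axes, rng=np.random):
    mshape = tuple(1 if ax in axes else d for ax, d in enumerate(x.shape))
    # 1) Bernoulli kernel: sample the mask at the reduced shape,
    #    pre-scaled by 1/pkeep (inverted dropout)
    mask = (rng.uniform(size=mshape) < pkeep) / pkeep
    # 2) broadcast mul: apply the shared mask to the full input
    return x * mask

x = np.ones((5, 2, 3))                          # (seq_len, batch, hidden)
y = variational_dropout(x, 0.5, axes=(0,))      # same mask at every time step
```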
Right. What's the performance impact of using two kernels instead of one?
Thx @cjolivier01. I get your point about efficiency. I have added a condition check for standard dropout, which has the same efficiency as before when no axes are provided:
https://github.com/apache/incubator-mxnet/pull/9931/files#diff-4aea2cc24c0bb4e8e48face9faf4aa26R252
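(Conceptually, that check dispatches roughly like this; a NumPy sketch under assumed names, not the actual C++:)

```python
import numpy as np

def dropout_forward(x, pkeep, axes=()):
    if len(axes) == 0:
        # standard dropout: full-shape mask, matching the original
        # single fused kernel's cost profile
        mask = (np.random.uniform(size=x.shape) < pkeep) / pkeep
        return x * mask
    # variational dropout: reduced-shape mask, then a broadcast multiply
    mshape = tuple(1 if ax in axes else d for ax, d in enumerate(x.shape))
    mask = (np.random.uniform(size=mshape) < pkeep) / pkeep
    return x * mask
```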
src/operator/nn/dropout-inl.h
Outdated
@@ -337,6 +336,7 @@ class DropoutOp {
real_t pkeep_;
/*! \brief Dropout mode */
dropout::DropoutOpMode mode_;
TShape axes;
nit: member variable name should end in an underscore
Thx 👍
Looks good to me.
@zhanghang1989 could you provide some performance numbers for the speed difference before and after the change?
Pinging @cjolivier01 and @yzhliu for a final review. I intend to merge this as soon as possible, so I will wait for either your approval or three days for lazy consensus, whichever is earlier.
@szha The performance of dropout should be the same as before when no axes are given.
If you have a request for changes from a committer, you can't merge per Apache guidelines.
What is the performance impact of these changes for the default axes behavior compared to the older code?
Hang suggested that if
Did the guidelines explain how committers deal with stale reviews?
Apache says: "A code-modification proposal may be stopped dead in its tracks by a -1 vote by a qualified voter. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetos stand until and unless withdrawn by their casters." I am guessing for a "stale" review (stale I would imagine > 2 months old?), a death certificate of said committer would be useful.
Haha OK. Jokes aside, did Hang sufficiently address your concern?
Yeah, I'm good.
@zhanghang1989 is working on it.
I am creating another PR for the unit tests.
Next time please make sure to have code changes and tests within the same PR instead of splitting them.
@yzhliu @zhanghang1989
👍 Got it. My bad.
* add axes support to dropout for variational dropout, test pending, mkl part hasn't been updated
* avoid copy code
* fix typo
* consider non broadcast case
* fix backward
* avoid mkl
* rm redundent
* condition check for standard dropout
@szha Could you test this implementation? Pinging @yzhliu for the MKL implementation.
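(For reference, the user-facing behavior this PR enables should look roughly as follows; a hedged sketch, with mode='always' used only so the imperative calls apply dropout outside of a training pass:)

```python
import mxnet as mx

x = mx.nd.ones((5, 2, 4))   # (seq_len, batch, hidden)

# Standard dropout: an independent mask for every element
y0 = mx.nd.Dropout(x, p=0.5, mode='always')

# Variational dropout: one mask per (batch, hidden) position,
# reused across the time axis
y1 = mx.nd.Dropout(x, p=0.5, axes=(0,), mode='always')
```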