This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1234] Fix shape inference problems in Activation backward #13409

Merged
merged 5 commits into apache:master from relu_activation on Dec 4, 2018

Conversation

larroy
Contributor

@larroy larroy commented Nov 26, 2018

Description

Exposes a ReLU activation shape inference bug via a C++ unit test.
Fixes Activation gradient shape inference problems caused by different activation functions taking different numbers of inputs in backward, which also depend on whether the build uses GPU / MKLDNN / CPU.
The logic is now easier to maintain and safer.
This bug was uncovered when running the C++ unit tests without MKL and without GPU, which we were not doing before.

Fixes #13333
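
To make the input-count issue concrete, here is a minimal, self-contained sketch (hypothetical code, not this PR's implementation; the CUDNN/MKLDNN counts mirror the CHECK_EQ(inputs.size(), relu ? 2U : 3U) guards visible in the patches further down, and the plain-CPU branch is an assumption for illustration only):

// Hypothetical sketch: the expected number of inputs to _backward_Activation
// depends on the activation type and on compile-time backend flags, so
// shape/type inference has to derive the count instead of hard-coding one arity.
#include <cstddef>
#include <string>

std::size_t ExpectedGradInputs(const std::string& act_type,
                               bool use_cudnn, bool use_mkldnn) {
  const bool relu = (act_type == "relu");
  if (use_cudnn || use_mkldnn) {
    // Mirrors the CHECK_EQ(inputs.size(), relu ? 2U : 3U) guards in the patch.
    return relu ? 2u : 3u;
  }
  // Plain CPU build (assumption for illustration): ReLU's gradient can be
  // computed from out_grad and out_data alone, while softsign's gradient is a
  // function of the forward input, hence the per-activation difference.
  return (act_type == "softsign") ? 3u : 2u;
}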

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

@stu1130
Contributor

stu1130 commented Nov 26, 2018

@mxnet-label-bot add [pr-work-in-progress]
Thanks for doing this @larroy

@marcoabreu marcoabreu added the pr-work-in-progress PR is still work in progress label Nov 26, 2018
@larroy larroy force-pushed the relu_activation branch 4 times, most recently from e3e5973 to c03d0ef Compare November 27, 2018 20:48
@larroy larroy changed the title [WIP] Provide a failing test for ReLU activation shape inference bug Fix shape inference problems in Activation backward Nov 27, 2018
@larroy
Contributor Author

larroy commented Nov 27, 2018

@eric-haibin-lin @KellenSunderland Please review

@eric-haibin-lin
Member

Thanks for fixing this! Do we have a unit test that can be used to verify this bug?

@larroy
Contributor Author

larroy commented Nov 27, 2018

@eric-haibin-lin It's in the patch, as stated in the description and commit messages. The first commit introduces a failing test. If you have a better idea or suggestion, let me know.

https://github.com/apache/incubator-mxnet/pull/13409/files#diff-91b46355311af5a0cdff26d6e706403fR43

@@ -128,29 +128,33 @@ inline bool ElemwiseAttr(const nnvm::NodeAttrs& attrs,
if (n_out != -1)
out_size = static_cast<size_t>(n_out);

auto deduce = [&](std::vector<AttrType> *vec, size_t size, const char *name) {
Member

Can we revert this change? The common practice is that if we change the elements in the vector, we use a pointer instead of a const reference.

Contributor Author

@larroy larroy Nov 27, 2018

Right, but if you look closely, we are not changing the vector in this function, so that convention doesn't apply here.
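
For context, a tiny generic example of the convention being discussed (hypothetical names, not MXNet code): a parameter taken by pointer signals that the callee may mutate the container, while a const reference signals read-only access, which is why a lambda that only reads the vector does not need the pointer form.

#include <vector>

// Mutates the container -> take a pointer parameter.
void FillDefaults(std::vector<int>* vec, int value) {
  for (int& v : *vec) v = value;
}

// Only reads the container -> take a const reference.
int SumAll(const std::vector<int>& vec) {
  int sum = 0;
  for (int v : vec) sum += v;
  return sum;
}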

@eric-haibin-lin
Member

Oh right, I missed the C++ unit test. I thought it was going to be a Python test. Nice work! One comment regarding coding style; otherwise looks good.

Contributor

@lebeg lebeg left a comment

Nice fix

@larroy larroy force-pushed the relu_activation branch 2 times, most recently from 9f404f3 to 148fd9a Compare November 28, 2018 00:50
@larroy larroy changed the title Fix shape inference problems in Activation backward [MXNET-1234] Fix shape inference problems in Activation backward Nov 28, 2018
@@ -27,10 +27,10 @@
#include "./activation-inl.h"
#include "../mshadow_op.h"
#include "../tensor/elemwise_unary_op.h"
#if MXNET_USE_MKLDNN == 1
#if MXNET_USE_CUDNN == 0 && MXNET_USE_MKLDNN == 1
Member

If USE_CUDNN=1, then MKL-DNN code in this file will not be compiled?

Member

@TaoLv TaoLv Nov 29, 2018

If MXNet is built with both CUDNN and MKL-DNN enabled, I would expect the activation to be dispatched to CUDNN when the model context is GPU and to the MKL-DNN implementation otherwise. But with this change, the MKL-DNN implementation will not be compiled at all.

Contributor Author

@larroy larroy Nov 29, 2018

This is a good point; the latest commit is just a test. What I observed is that we calculate the backward pass in MKLDNN even when we have CUDNN, via ActivationGradComputeExCPU, as per:

https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/activation.cc#L189

Shouldn't backward go through ActivationGradCompute and be dispatched to CPU or GPU as you suggest? In that case we would need further changes to call MKLDNN from ActivationGradCompute. Or is there something I'm missing?

Member

I'm not sure. But I think that if MKL-DNN is enabled, execution should be dispatched to FComputeEx according to the inferred storage; however, it seems FComputeEx is not registered for GPU Activation. We need input from an execution-engine expert.

Contributor Author

My commit is passing all the unit tests, as it doesn't use MKLDNN if CUDNN is enabled. I think this is the right way to go for now, since we can't selectively dispatch to MKLDNN depending on the context without a bigger refactor.
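
As a rough, hypothetical sketch of what the preprocessor guard in activation.cc achieves (stand-in functions, not the real operator registration): when CUDNN is enabled, the MKL-DNN code path is simply not compiled, so backward always goes through the regular implementation.

#include <iostream>

// Simplified illustration of the compile-time selection discussed above.
void PlainBackward()  { std::cout << "plain CPU backward\n"; }
#if MXNET_USE_CUDNN == 0 && MXNET_USE_MKLDNN == 1
void MkldnnBackward() { std::cout << "MKL-DNN backward\n"; }
#endif

int main() {
#if MXNET_USE_CUDNN == 0 && MXNET_USE_MKLDNN == 1
  MkldnnBackward();   // MKL-DNN-only build: the Ex path exists and is used
#else
  PlainBackward();    // CUDNN (or plain CPU) build: use the regular path
#endif
  return 0;
}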

@marcoabreu marcoabreu added pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress labels Nov 29, 2018
@larroy
Contributor Author

larroy commented Nov 29, 2018

@TaoLv Can you approve, or do you have any other concerns or comments?

@TaoLv
Member

TaoLv commented Nov 30, 2018

If it's truly a problem with the execution engine, I'm afraid the same issue would also exist in other MKL-DNN operators. Also, this workaround will hurt performance when users run a model with CPU context and the mxnet-cuxxmkl package.

@pengzhao-intel thoughts?

@pengzhao-intel
Contributor

@larroy Thanks for the fix :) The current solution works and resolves the problem.

However, I want to highlight two things:

  • Divergence
    This kind of workaround leads to divergent code paths that are hard to understand and maintain later.
  • Performance degradation
    As @TaoLv mentioned, performance will definitely drop in a CUXXMKL build compared with a pure MKL build, and that is really hard for end users to debug too.

So, I think we can work together on:

  • A test case/example to reproduce the issue.
    We still need further analysis to see if there are other solutions (maybe inside the MKLDNN op).
  • A plan for the executor improvements.
    I agree with @larroy's comments; this would be a big refactor/challenge, so it's better to have a plan early.

Feel free to correct me :)

@TaoLv
Member

TaoLv commented Nov 30, 2018

@larroy Could you provide a simple reproducer for this problem? We can help take a look. Actually, I cannot reproduce it with the code snippet below:

import mxnet as mx
import numpy as np
np.random.seed(12345)

num_filter = 256
num_group = 1
kernel = (3, 3)
pad = (1, 1)
shape = (1, 256, 200, 233)

x = mx.sym.Variable('x')
w = mx.sym.Variable('w')

conv = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, num_group=num_group, kernel=kernel, no_bias=True, pad=pad)
relu = mx.sym.Activation(data=conv, act_type='relu', name='relu')
exe = relu.simple_bind(ctx=mx.gpu(), x=shape)

exe.arg_arrays[0][:] = np.random.normal(size=exe.arg_arrays[0].shape)
exe.arg_arrays[1][:] = np.random.normal(size=exe.arg_arrays[1].shape)

for i in range(10):
    exe.forward(is_train=True)
    exe.backward(exe.outputs[0])
    o = exe.grad_arrays[0]
    t = o.asnumpy()

@larroy
Contributor Author

larroy commented Nov 30, 2018

@TaoLv the test that reproduces the problem is in the first commit of this PR. I propose to fix this now, as the bug affects builds without MKL and on ARM, and to address the corner-case performance problem in a subsequent PR.

@larroy
Contributor Author

larroy commented Nov 30, 2018

Just to be clear: compile for CPU without MKL and run the C++ unit tests to exercise the bug. Also see my first commit.

@larroy
Contributor Author

larroy commented Nov 30, 2018

Do you have a better suggestion? I don't think there's a better solution with the current architecture, as there's no context information in functions like mxnet::op::ActivationGrad::operator() (activation.cc:148) to know whether we are in a CPU or GPU context. Since MKLDNN, CUDNN, and plain CPU all have different sets of inputs for Activation backward, I don't see any possibility other than this fix. It's also a bit of a corner case to have a CPU activation backward running when CUDNN is available; I don't see the concern when most users won't have such a case. When you have a GPU available, why would you run it on the CPU?
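
A minimal, self-contained sketch of this constraint (hypothetical types and names, not the PR's code): the backward node's inputs are assembled at graph-construction time, with no runtime Context available, so the choice can only depend on the activation type and on build flags, and it must agree with what the registered kernels and shape/type inference expect.

#include <string>
#include <vector>

struct Entry { std::string name; };  // stands in for an nnvm::NodeEntry

std::vector<Entry> MakeBackwardInputs(const std::string& act_type,
                                      const Entry& out_grad,
                                      const Entry& out_data,
                                      const Entry& in_data) {
  std::vector<Entry> inputs{out_grad, out_data};
#if MXNET_USE_CUDNN == 1 || MXNET_USE_MKLDNN == 1
  // CUDNN/MKLDNN kernels also take the forward input for non-ReLU activations
  // (per the CHECK_EQ(inputs.size(), relu ? 2U : 3U) guards in the patch below).
  if (act_type != "relu") inputs.push_back(in_data);
#else
  // Assumption for illustration: on plain CPU only softsign's gradient needs
  // the forward input.
  if (act_type == "softsign") inputs.push_back(in_data);
#endif
  return inputs;  // must match the count the registered inference expects
}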

test::op::CoreOperatorRunner<float> runner;
runner.RunBidirectional(false, { shape }, test::op::CoreOpExecutor<float>::ArgsWithOpName(
activation_args, "Activation", "_backward_Activation"), 1);
}
Member

@TaoLv TaoLv Dec 1, 2018

I tried this test case on my machine with both MKL-DNN and CUDNN enabled, without the last commit. All 5 activations run through the MXNet CPU implementation (ActivationCompute<cpu> and ActivationGradCompute<cpu>). If isGPU=true is set, all 5 activations run through the GPU implementation (ActivationCompute<gpu> and ActivationGradCompute<gpu>). It seems no MKL-DNN implementation was triggered for either isGPU=false or isGPU=true.

Member

@larroy How can I reproduce "What I observed is that we are calculating the backward in MKLDNN even if we have CUDNN"?

Contributor Author

@larroy larroy Dec 3, 2018

@TaoLv you are right, this statement is not true in the master branch. I retract it.

We can use the patch below to reproduce the bug and see what runs and what doesn't; then run:

build/tests/mxnet_unit_tests --gtest_filter="ACTIVATION_PERF.ExecuteBidirectional"

diff --git a/src/operator/nn/activation-inl.h b/src/operator/nn/activation-inl.h
index 2705177..c656e33 100644
--- a/src/operator/nn/activation-inl.h
+++ b/src/operator/nn/activation-inl.h
@@ -88,6 +88,7 @@ void ActivationForward(const OpContext &ctx, const TBlob &in_data,
                        const OpReqType &req, const TBlob &out_data) {
   using namespace mshadow;
   using namespace mshadow::expr;
+  std::cout << "ActivationForward" << std::endl;
   Stream<xpu> *s = ctx.get_stream<xpu>();
   const size_t sz = in_data.shape_.Size();
   if (sz) {
@@ -104,6 +105,7 @@ template<typename xpu, typename ForwardOp, typename BackwardOp>
 void ActivationBackward(const OpContext &ctx, const TBlob &out_grad,
                         const TBlob &out_data, const OpReqType &req,
                         const TBlob &in_grad) {
+  std::cout << "ActivationBackward" << std::endl;
   using namespace mshadow;
   using namespace mshadow::expr;
   Stream<xpu> *s = ctx.get_stream<xpu>();
@@ -123,6 +125,7 @@ template<typename xpu>
 void ActivationComputeImpl(const nnvm::NodeAttrs& attrs, const OpContext &ctx,
                            const std::vector<TBlob>& inputs, const std::vector<OpReqType>& req,
                            const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationComputeImpl" << std::endl;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
   switch (param.act_type) {
     case activation::kReLU:
@@ -154,6 +157,7 @@ template<typename xpu>
 void ActivationGradComputeImpl(const nnvm::NodeAttrs& attrs, const OpContext &ctx,
                            const std::vector<TBlob>& inputs, const std::vector<OpReqType>& req,
                            const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationGradComputeImpl" << std::endl;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
   switch (param.act_type) {
     case activation::kReLU:
@@ -187,6 +191,7 @@ void ActivationCompute(const nnvm::NodeAttrs& attrs,
                        const std::vector<TBlob>& inputs,
                        const std::vector<OpReqType>& req,
                        const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationCompute" << std::endl;
   CHECK_EQ(inputs.size(), 1U);
   CHECK_EQ(outputs.size(), 1U);
   ActivationComputeImpl<xpu>(attrs, ctx, inputs, req, outputs);
@@ -198,6 +203,7 @@ void ActivationGradCompute(const nnvm::NodeAttrs& attrs,
                            const std::vector<TBlob>& inputs,
                            const std::vector<OpReqType>& req,
                            const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationGradCompute" << std::endl;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
 #if (MXNET_USE_CUDNN == 1 || MXNET_USE_MKLDNN == 1)
   bool relu = param.act_type == activation::kReLU;
diff --git a/src/operator/nn/activation.cc b/src/operator/nn/activation.cc
index ba44ebd..cb5d5ab 100644
--- a/src/operator/nn/activation.cc
+++ b/src/operator/nn/activation.cc
@@ -72,6 +72,7 @@ static void ActivationComputeExCPU(const nnvm::NodeAttrs& attrs,
                                    const std::vector<NDArray>& inputs,
                                    const std::vector<OpReqType>& req,
                                    const std::vector<NDArray>& outputs) {
+  std::cout << "ActivationComputeExCPU" << std::endl;
   CHECK_EQ(inputs.size(), 1U);
   CHECK_EQ(outputs.size(), 1U);
   if (SupportMKLDNN(inputs[0])) {
@@ -88,6 +89,7 @@ void ActivationGradComputeExCPU(const nnvm::NodeAttrs& attrs,
                                 const std::vector<NDArray>& inputs,
                                 const std::vector<OpReqType>& req,
                                 const std::vector<NDArray>& outputs) {
+  std::cout << "ActivationGradComputeExCPU" << std::endl;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
   bool relu = param.act_type == activation::kReLU;
   CHECK_EQ(inputs.size(), relu ? 2U : 3U);
diff --git a/src/operator/nn/activation.cu b/src/operator/nn/activation.cu
index 8892cc3..5aaeb78 100644
--- a/src/operator/nn/activation.cu
+++ b/src/operator/nn/activation.cu
@@ -51,6 +51,7 @@ void ActivationCompute<gpu>(const nnvm::NodeAttrs& attrs,
     const std::vector<TBlob>& inputs,
     const std::vector<OpReqType>& req,
     const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationCompute (GPU)" << std::endl;
   CHECK_EQ(inputs.size(), 1U);
   CHECK_EQ(outputs.size(), 1U);
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
@@ -75,6 +76,7 @@ void ActivationGradCompute<gpu>(const nnvm::NodeAttrs& attrs,
                                 const std::vector<TBlob>& inputs,
                                 const std::vector<OpReqType>& req,
                                 const std::vector<TBlob>& outputs) {
+  std::cout << "ActivationGradCompute (GPU)" << std::endl;
   const ActivationParam& param = nnvm::get<ActivationParam>(attrs.parsed);
   bool relu = param.act_type == activation::kReLU;
   CHECK_EQ(inputs.size(), relu ? 2U : 3U);
diff --git a/tests/cpp/operator/activation_perf.cc b/tests/cpp/operator/activation_perf.cc
index 1bd8ca8..bba8a3e 100644
--- a/tests/cpp/operator/activation_perf.cc
+++ b/tests/cpp/operator/activation_perf.cc
@@ -38,13 +38,27 @@ const kwargs_t basic_activation_args = { };
  * \brief Generic bidirectional sanity test
  */
 TEST(ACTIVATION_PERF, ExecuteBidirectional) {
+  using namespace std;
   TShape shape({5, 5});
-  kwargs_t kwargs = basic_activation_args;
-  kwargs.push_back({"act_type", "tanh"});
-
-  test::op::CoreOperatorRunner<float> runner;
-  runner.RunBidirectional(false, { shape }, test::op::CoreOpExecutor<float>::ArgsWithOpName(
-          kwargs, "Activation", "_backward_Activation"), 1);
+  vector<string> activations = {
+    "relu",
+    "sigmoid",
+    "tanh",
+    "softrelu",
+    "softsign"
+  };
+  for (const string& activation : activations) {
+    kwargs_t activation_args = {{"act_type", activation}};
+    test::op::CoreOperatorRunner<float> runner;
+    runner.RunBidirectional(false, { shape }, test::op::CoreOpExecutor<float>::ArgsWithOpName(
+            activation_args, "Activation", "_backward_Activation"), 1);
+  }
+  for (const string& activation : activations) {
+    kwargs_t activation_args = {{"act_type", activation}};
+    test::op::CoreOperatorRunner<float> runner;
+    runner.RunBidirectional(true, { shape }, test::op::CoreOpExecutor<float>::ArgsWithOpName(
+            activation_args, "Activation", "_backward_Activation"), 1);
+  }
 }
 
 /*!

Member

I applied this patch to the MXNet master branch and got the error below:

[lvtao@mlt-gpu207 mxnet-official]$ ./mxnet_unit_tests --gtest_filter="ACTIVATION_PERF.ExecuteBidirectional"
Found CUDA Device #: 0 properties: 6.0

Note: Google Test filter = ACTIVATION_PERF.ExecuteBidirectional
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ACTIVATION_PERF
[ RUN      ] ACTIVATION_PERF.ExecuteBidirectional
unknown file: Failure
C++ exception with description "[10:31:44] src/operator/tensor/./../elemwise_op_common.h:176: Check failed: in_attrs->size() == static_cast<size_t>(n_in) (2 vs. 3)  in operator

Stack trace returned 10 entries:
[bt] (0) ./mxnet_unit_tests(dmlc::StackTrace()+0x4a) [0x42f64a]
[bt] (1) ./mxnet_unit_tests(dmlc::LogMessageFatal::~LogMessageFatal()+0x21) [0x42fca1]
[bt] (2) lib/libmxnet.so(bool mxnet::op::ElemwiseType<3l, 1l>(nnvm::NodeAttrs const&, std::vector<int, std::allocator<int> >*, std::vector<int, std::allocator<int> >*)+0xe2) [0x7f1f0691bd52]
[bt] (3) ./mxnet_unit_tests() [0x477593]
[bt] (4) ./mxnet_unit_tests() [0x43a007]
[bt] (5) ./mxnet_unit_tests() [0x479a20]
[bt] (6) ./mxnet_unit_tests() [0x5e4838]
[bt] (7) ./mxnet_unit_tests() [0x5e5bc8]
[bt] (8) ./mxnet_unit_tests() [0x5e6119]
[bt] (9) ./mxnet_unit_tests() [0x632904]

" thrown in the test body.
[  FAILED  ] ACTIVATION_PERF.ExecuteBidirectional (27 ms)
[----------] 1 test from ACTIVATION_PERF (27 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (27 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] ACTIVATION_PERF.ExecuteBidirectional

 1 FAILED TEST
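
For readers of the log: this is the shape/type inference mismatch the PR targets. A minimal sketch of the fixed-arity check that fires here (assert() standing in for the CHECK_EQ in elemwise_op_common.h; not the real helper): ElemwiseType<3, 1> expects exactly 3 inputs and 1 output, while the backward node was built with only 2 inputs, so the check trips before any types are deduced.

#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a fixed-arity inference helper, e.g. FixedArityInference<3, 1>(in, out);
template <int n_in, int n_out>
bool FixedArityInference(const std::vector<int>& in_attrs,
                         const std::vector<int>& out_attrs) {
  assert(in_attrs.size() == static_cast<std::size_t>(n_in));    // 2 vs. 3 -> fails
  assert(out_attrs.size() == static_cast<std::size_t>(n_out));
  return true;
}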

@TaoLv
Member

TaoLv commented Dec 1, 2018

"I don't see the concern when most users won't have such a case. When you have a GPU available, why would you run it on the CPU?"

Not necessarily true. Otherwise, there would be no need for MXNet to provide a mxnet-cuxxmkl pip package.

@larroy
Contributor Author

larroy commented Dec 3, 2018

@TaoLv When I run your Python reproducer on master with my patch (which prints which activation implementation runs), everything runs on the CPU:

ActivationForward
ActivationGradCompute
ActivationGradComputeImpl
ActivationBackward

Is this what you would expect?

I'm on commit 9979c3c with my previous patch applied to print the activations.

@larroy
Contributor Author

larroy commented Dec 3, 2018

This never executes on MKL-DNN; it only does if you bind the relu to the cpu() context.

import mxnet as mx
import numpy as np
np.random.seed(12345)

num_filter = 256
num_group = 1
kernel = (3, 3)
pad = (1, 1)
shape = (1, 256, 200, 233)

x = mx.sym.Variable('x')
w = mx.sym.Variable('w')

conv = mx.sym.Convolution(data=x, weight=w, num_filter=num_filter, num_group=num_group, kernel=kernel, no_bias=True, pad=pad)
relu = mx.sym.Activation(data=conv, act_type='relu', name='relu')
exe = relu.simple_bind(ctx=mx.gpu(), x=shape)

exe.arg_arrays[0][:] = mx.nd.array(np.random.normal(size=exe.arg_arrays[0].shape), ctx=mx.cpu())
exe.arg_arrays[1][:] = mx.nd.array(np.random.normal(size=exe.arg_arrays[1].shape), ctx=mx.cpu())

for i in range(10):
    exe.forward(is_train=True)
    exe.backward(exe.outputs[0])
    o = exe.grad_arrays[0]
    t = o.asnumpy()

Output is:

ActivationGradCompute
ActivationGradComputeImpl
ActivationBackward
ActivationCompute
ActivationComputeImpl
ActivationForward
ActivationGradCompute
ActivationGradComputeImpl
ActivationBackward

@larroy
Contributor Author

larroy commented Dec 3, 2018

@TaoLv @pengzhao-intel please check if the latest patch addresses your concerns.

@larroy
Contributor Author

larroy commented Dec 3, 2018

One more clarification: shape inference doesn't fail in Python because the reproducer doesn't call the operator through the imperative path (Imperative::Invoke -> MXImperativeInvokeImpl), which would call SetShapeType and thus fail on shape inference.

@TaoLv
Member

TaoLv commented Dec 4, 2018

I'm validating benchmark performance with this PR. Will update soon.

Member

@TaoLv TaoLv left a comment

I built MXNet from source with both CUDNN and MKL-DNN enabled and tested the performance of AlexNet/VGG-16/ResNet-50/ResNet-152 from benchmark_score.py; I didn't observe any regression. So I think my concern was addressed. Thank you for the contribution, @larroy.

@larroy
Contributor Author

larroy commented Dec 4, 2018

@mxnet-label-bot [pr-awaiting-merge]

@eric-haibin-lin eric-haibin-lin merged commit 7dde0eb into apache:master Dec 4, 2018
eric-haibin-lin pushed a commit that referenced this pull request Dec 4, 2018
* Provide a failing test for ReLU activation shape inference bug

* Fix Activation backward shape inference

fixes: #13333

* Add softsign Activation to test_gluon.py

* Don't disable MKLDNN
sergeykolychev pushed a commit that referenced this pull request Dec 5, 2018
…ile (#13478)

* updated to v1.5.0

* Bumped minor version from 1.4.0 to 1.5.0 on master

* added Anirudh as maintainer for R package

... adding something useful and re-trigger PR check

* Updated license file for clojure, onnx-tensorrt, gtest, R-package

* Get the correct include path in pip package (#13452)

* add find_include_path API

* address reviewer comment

* change return type from list to string

* add unit test

* address reviewer comment

* address reviewer comment

* address reviewer comment

* address reviewer comment

* fix include path problem in pip package

* add comment

* fix lint error

* address reviewer comment

* address reviewer comment

* Use ~/.ccache as default ccache directory so is not cache is not erased on reboot (#13431)

* Skip flaky test #13446 (#13480)

* Rewrite dataloader with process pool, improves responsiveness and reliability (#13447)

* fix recordio.py

* rewrite dataloader with pool

* fix batch as tuple

* fix prefetching

* fix pylint

* picklable function

* use pickle

* add missing commit

* Fix errors in docstrings for subgraph op; use code directive (#13463)

* [MXNET-1158] JVM Memory Management Documentation (#13105)

* update train_mnist

* Add documentation for JVM Memory Management

* update doc

* address nit picks

* address nit picks

* Grammar and clarity edits for memory management doc

* Edits for scala memory management

* Update memory-management.md

* Update memory-management.md

* Update memory-management.md

* capitalization fix

* Update row_sparse tutorial (#13414)

Update row_sparse tutorial

* Add resiliency to onnx export code (#13426)

* Added resiliency to onnx export code

- With previous infer-shape implementation, if input shape was list instead of tuple or if extra non-existent parameters were provided, the code would still work. The fixes in this commit make sure that behavior is restored to prevent any compatibility issues with existing export code.

* Fixed name of net in unittest

* Fix pylint

* [MXNET-1185] Support large array in several operators (part 1) (#13418)

* fix a few operators with large arrays (# of elements)

* fix bug in broadcast_div and add tests

* address reviewer comment

* add unit test

* add empty line

* retrigger CI

* [MXNET-1210 ] Gluon Audio - Example (#13325)

* Initialized the example

* Addressed PR comments, about existing synset.txt file - no overwrite

* RST - docstring issues fixed

* added README

* Addressed PR comments

* Addressed PR comments, checking Divide by 0

* Raising error if format is not supported.

* changed a line for ndarray of labels

* Trigger CI

* Trigger CI

* PR comments addressed around skip_header argument

* Addressed PR comments around librosa import

* PR Comments

* Passing lazy=lazy from argument

* Added PR comments, labels to README.MD

* Trigger CI

* Addressing PR Comments in README

* Modified README.md

* Added example under audio folder

* Retrigger CI

* Retrigger CI

* ONNX export: Instance normalization, Shape (#12920)

* ONNX import/export: Make backend_rep common

* ONNX export: Instance Normalization

* ONNX export: Shape operator

* Clarify dependency on OpenCV in CNN Visualization tutorial. (#13495)

* clarify ops faq regarding docs strings (#13492)

* Add graph_compact operator. (#13436)

* add graph_compact.

* fix.

* add doc.

* add tests for graph_compact.

* address comments.

* update docs.

* trigger CI

* Deprecate Jenkinsfile (#13474)

* update github location for sampled_block.py (#13508)

Updated to https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/sampled_block.py

* #13453 [Clojure] - Add Spec Validations to the Optimizer namespace (#13499)

* ONNX export: Logical operators (#12852)

* Fix cmake options parsing in dev_menu (#13458)

Add GPU+MKLDNN unittests to dev_menu

* Revert "Manually track num_max_thread (#12380)" (#13501)

This reverts commit 7541021.

* Feature/mkldnn static 2 (#13503)

* build mkldnn as static lib

* update makefile to statically build mkldnn

* build static mkldnn

* fix static name

* fix static name

* update static for mac

* rename mkldnn dep in ci

* remove moving mkldnn dynamic lib

* remove commented code

* remove mkldnn dnaymic for unitest

* force static for mkldnn lib

* remove dynamic mkldnn bind

* only link windows

* add mkldnn.mk

* try force linking

* remove mkldnn dynanmic check

* remove test mkldnn install

* fix spacing

* fix index

* add artifacts

* add comment about windows

* remove static

* update makefile

* fix toctree Sphinx errors (#13489)

* fix toctree errors

* nudging file for CI

* Disabled flaky test test_gluon_data.test_recordimage_dataset_with_data_loader_multiworker (#13527)

* [MXNET-1234] Fix shape inference problems in Activation backward (#13409)

* Provide a failing test for ReLU activation shape inference bug

* Fix Activation backward shape inference

fixes: #13333

* Add softsign Activation to test_gluon.py

* Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now

* Don't disable MKLDNN
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this pull request Dec 13, 2018
…che#13409)

* Provide a failing test for ReLU activation shape inference bug

* Fix Activation backward shape inference

fixes: apache#13333

* Add softsign Activation to test_gluon.py

* Use activation in GPU if we are using CUDNN and not MKLDNN as it's happening right now

* Don't disable MKLDNN
zhaoyao73 pushed a commit to zhaoyao73/incubator-mxnet that referenced this pull request Dec 13, 2018
…ile (apache#13478)

Labels
pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two C++ unit test failing ACTIVATION_PERF.ExecuteBidirectional ACTIVATION_PERF.TimingCPU
7 participants