This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Gluon PReLU, ELU, SELU, Swish #9662

Merged: 5 commits, Feb 10, 2018

Conversation

szha (Member) commented Feb 1, 2018:

Description

Picking up #8912 (@joeddav), #9111 (@anjishnu)

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
    • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
    • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
    • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
    • For user-facing API changes, the API doc string has been updated.
    • For new C++ functions in header files, their functionality and arguments are documented.
    • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable.
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it.

Changes

  • Fix LeakyReLU(act_type='prelu')
  • nn.PReLU and test
  • nn.ELU, nn.SELU, nn.Swish and tests

Comments

  • Fixed operator LeakyReLU(act_type='prelu'), since it didn't have the proper parameter or logic.

szha changed the title from "[WIP] Gluon PReLU" to "Gluon PReLU, ELU, SELU, Swish" on Feb 1, 2018.
Outputs:
- **out**: output tensor with the same shape as `data`.
"""
def __init__(self, alpha_initializer='zeros', *args):
Contributor:
Is 0 initialization standard?

Member Author:
TensorFlow/Keras uses zeros; PyTorch uses 0.25.

Do we have a constant initializer to achieve the latter?
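A constant initializer should cover that; a minimal sketch, assuming the alpha_initializer argument accepts any MXNet initializer object (the 0.25 value is purely illustrative):

    from mxnet import init
    from mxnet.gluon import nn

    # start the learnable alpha at 0.25 (the PyTorch default) instead of zero
    prelu = nn.PReLU(alpha_initializer=init.Constant(0.25))
    prelu.initialize()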

"""
def __init__(self, **kwargs):
super(SELU, self).__init__(**kwargs)
self.scale = 1.0507009873554804934193349852946
Contributor:
self._scale


def __init__(self, beta=1.0, **kwargs):
super(Swish, self).__init__(**kwargs)
self.beta = beta
Contributor:
_beta
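For reference, beta only scales the sigmoid gate; a minimal sketch of a Swish forward pass under that definition, written against the attribute name suggested above (not necessarily the exact code in this PR):

    def hybrid_forward(self, F, x):
        # swish(x) = x * sigmoid(beta * x)
        return x * F.sigmoid(self._beta * x)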

grad_weight = sumall_except_dim<1>(F<prelu_grad>(data) * grad);
gdata = F<mshadow_op::xelu_grad>(data, mshadow::expr::broadcast<1>(weight, data.shape_))
* grad;
if (weight.shape_[0] == 1) {
Contributor:
Could you explain? I thought prelu was already supported

Contributor:
Since you are adding those, I would suggest also adding a learnable ISRLU:
https://arxiv.org/abs/1710.09967

Member Author:
Existing PReLU had two problems:

  1. The gamma parameter was never documented and couldn't be passed in via kwargs.
  2. It didn't support scalar broadcast, as was attempted in #8912 (add Gluon PReLU activation layer); see the sketch below.
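A minimal sketch of what the fix enables, assuming the operator now accepts a (1,)-shaped gamma that is broadcast over the whole input (shapes and values below are purely illustrative):

    import mxnet as mx

    x = mx.nd.random.normal(shape=(2, 3, 4))
    gamma = mx.nd.array([0.25])   # a single shared slope, shape (1,)
    y = mx.nd.LeakyReLU(x, gamma=gamma, act_type='prelu')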

Contributor:
If you meant to test for a scalar, then you should use weight.shape_.Size().

Member Author:
It's equivalent because weight is Tensor<xpu, 1>. I think Size() is the safer choice in case the definition of weight changes.

Contributor:
Infershape sets it to data.shape[1].
When would weight's shape be (1,)?

Member Author:
If the weight parameter is shared across all axes, then a single scalar value is shared everywhere, in which case the weight shape should be (1,).

Contributor:
But infershape doesn't allow this

Member Author:
@bradcar sorry that I missed your comment earlier, and thanks for sharing your work. In this PR I'd like to first focus on wrapping up the previous two PRs for activations. Since you wrote the paper, would you like to implement that in mxnet?

Member Author:
There are two options: either write it in Gluon by defining hybrid_forward in Python, or extend the leaky relu operator in C++ for better performance.
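For the first option, a minimal, untested sketch of what a Gluon-level ISRLU could look like; the class name is hypothetical, and unlike the paper, alpha is kept as a fixed scalar here rather than a learnable parameter:

    from mxnet.gluon import HybridBlock

    class ISRLU(HybridBlock):
        """isrlu(x) = x if x >= 0 else x / sqrt(1 + alpha * x**2)."""
        def __init__(self, alpha=1.0, **kwargs):
            super(ISRLU, self).__init__(**kwargs)
            self._alpha = alpha

        def hybrid_forward(self, F, x):
            # apply the inverse square root only to the negative part
            isr = (1.0 + self._alpha * x * x) ** -0.5
            return F.where(x >= 0, x, x * isr)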

szha force-pushed the prelu branch 7 times, most recently from 38e1e0e to f149553 on February 3, 2018.
szha (Member Author) commented Feb 3, 2018:

@piiswrong addressed the infer shape issue. Let me know if you have more comments.

return F.LeakyReLU(x, gamma=alpha, act_type='prelu', name='fwd')

def __repr__(self):
s = '{name}'
Contributor:
Can we handle these trivial cases in the base class?

Member Author:
Yes


prelu = mx.gluon.nn.PReLU()
prelu.initialize()
x = point_to_validate.reshape((1, 1, 2))
Contributor:
Use a different input shape that can catch the infershape problem.
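For illustration, an input whose channel dimension is larger than 1 would force the (1,)-shaped alpha to actually broadcast; the shape below is arbitrary:

    import mxnet as mx

    prelu = mx.gluon.nn.PReLU()
    prelu.initialize()
    x = mx.nd.random.uniform(shape=(1, 3, 4, 5))   # channel dimension > 1
    y = prelu(x)
    assert y.shape == x.shape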

@@ -225,7 +242,11 @@ class LeakyReLUProp : public OperatorProperty {
const TShape &dshape = in_shape->at(leakyrelu::kData);
if (dshape.ndim() == 0) return false;
if (param_.act_type == leakyrelu::kPReLU) {
in_shape->at(leakyrelu::kGamma) = TShape(Shape1(dshape[1]));
const TShape &gshape = in_shape->at(leakyrelu::kGamma);
if (gshape.Size() != 1)
Contributor:
gshape could be empty, in which case Size() is undefined.
Also, if gshape.Size() is 1, the shape could still be (1, 1), which is invalid.

Member Author:
So, I should check for both ndim and shape_[0] then. How do I check whether it’s undefined?

Contributor:
If gshape is empty, gshape.ndim() would be 0.

szha (Member Author) commented Feb 7, 2018:

@piiswrong, I addressed the latest comments. Let me know if any further change is needed.

szha (Member Author) commented Feb 9, 2018:

ping @piiswrong

@lvyong1943 commented:
Hello, when I run mlp.cpp there is an error like this:
Symbol.Compose: Keyword argument name gamma not found.
Candidate arguments:
[0] data
It happens when this line runs: outputs[i] = LeakyReLU(string("act") + istr, fc, null_sym, LeakyReLUActType::kLeaky);
How can I solve this problem?
I changed Symbol null_sym to Symbol null_sym("null_sym") to work around the error, but then this->blob_ptr_._Ptr is nullptr.

@lvyong1943 commented:
@szha I just ran the mlp.cpp file. When it runs this line:
Executor* exe = new Executor(sym_out, ctx_dev, in_args, arg_grad_store, grad_req_type, aux_states);
an error appears in incubator-mxnet-master\src\executor\graph_executor.cc at line 273:
xs.emplace_back(NodeEntry{args[i], 0, 0});
The parameter 'i' is out of range for "args".

bradcar (Contributor) commented Apr 13, 2018:

@piiswrong @szha what is the status of #9662 and PReLU? When I naively put PReLU into a hybrid block (mxnet 1.2.0) and look at the source (activations.py), it seems that PReLU only has one learnable alpha per layer. Shouldn't each 'neuron' have its own learnable alpha?

szha (Member Author) commented Apr 13, 2018:

@bradcar the leaky relu operator in 'prelu' mode supports any broadcastable alpha shape. Since it's impossible to infer the shape of the parameter until it sees the first input, we chose to put the simplest case in the constructor.

For your use case, where you need more than one alpha parameter, you can simply use the operator directly; a sketch follows.
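A minimal sketch of that route, assuming a per-channel alpha for an NCHW input (the shapes and the 0.25 values are purely illustrative):

    import mxnet as mx

    x = mx.nd.random.normal(shape=(1, 3, 4, 4))   # NCHW input
    alpha = mx.nd.full((3,), 0.25)                # one slope per channel
    y = mx.nd.LeakyReLU(x, gamma=alpha, act_type='prelu')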

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* prelu, elu, selu, swish

* update

* fix infer shape

* update infer shape

* update
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* prelu, elu, selu, swish

* update

* fix infer shape

* update infer shape

* update