This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Gluon PReLU, ELU, SELU, Swish #9662

Merged: 5 commits, Feb 10, 2018

Conversation

szha (Member) commented Feb 1, 2018:

Description

Picking up #8912 (@joeddav), #9111 (@anjishnu)

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
    • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
    • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
    • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
    • For user-facing API changes, the API doc string has been updated.
    • For new C++ functions in header files, their functionality and arguments are documented.
    • For new examples, a README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable.
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it.

Changes

  • Fix LeakyReLU(act_type='prelu')
  • nn.PReLU and test
  • nn.ELU, nn.SELU, nn.Swish and tests

Comments

  • Fixed operator LeakyReLU(act_type='prelu'), since it didn't have the proper parameter or logic.

szha changed the title from "[WIP] Gluon PReLU" to "Gluon PReLU, ELU, SELU, Swish" on Feb 1, 2018.
Outputs:
- **out**: output tensor with the same shape as `data`.
"""
def __init__(self, alpha_initializer='zeros', *args):
Contributor:
Is 0 initialization standard?

Member Author:
TensorFlow/Keras uses zeros; PyTorch uses 0.25.

Do we have a constant initializer to achieve the latter?
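A constant initializer should cover that; a minimal sketch, assuming the alpha_initializer argument accepts any MXNet initializer object (the 0.25 value is purely illustrative):

    from mxnet import init
    from mxnet.gluon import nn

    # start the learnable alpha at 0.25 (the PyTorch default) instead of zero
    prelu = nn.PReLU(alpha_initializer=init.Constant(0.25))
    prelu.initialize()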

"""
def __init__(self, **kwargs):
super(SELU, self).__init__(**kwargs)
self.scale = 1.0507009873554804934193349852946
Contributor:
self._scale


def __init__(self, beta=1.0, **kwargs):
super(Swish, self).__init__(**kwargs)
self.beta = beta
Contributor:
_beta
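For reference, beta only scales the sigmoid gate; a minimal sketch of a Swish forward pass under that definition, written against the attribute name suggested above (not necessarily the exact code in this PR):

    def hybrid_forward(self, F, x):
        # swish(x) = x * sigmoid(beta * x)
        return x * F.sigmoid(self._beta * x)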

grad_weight = sumall_except_dim<1>(F<prelu_grad>(data) * grad);
gdata = F<mshadow_op::xelu_grad>(data, mshadow::expr::broadcast<1>(weight, data.shape_))
* grad;
if (weight.shape_[0] == 1) {
Contributor:
Could you explain? I thought prelu was already supported

Contributor:
Since you are adding those, I would suggest also adding a learnable ISRLU:
https://arxiv.org/abs/1710.09967

Member Author:
Existing PReLU had two problems:

  1. The gamma parameter was never documented and couldn't be passed in via kwargs.
  2. It didn't support scalar broadcast, as was attempted in #8912 (add Gluon PReLU activation layer); see the sketch below.
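A minimal sketch of what the fix enables, assuming the operator now accepts a (1,)-shaped gamma that is broadcast over the whole input (shapes and values below are purely illustrative):

    import mxnet as mx

    x = mx.nd.random.normal(shape=(2, 3, 4))
    gamma = mx.nd.array([0.25])   # a single shared slope, shape (1,)
    y = mx.nd.LeakyReLU(x, gamma=gamma, act_type='prelu')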

Contributor:
If you meant to test for a scalar, then you should use weight.shape_.Size().

Member Author:
It's equivalent because weight is Tensor<xpu, 1>. I think Size() is the safer choice in case the definition of weight changes.

Contributor:
Infershape sets it to data.shape[1].
When would weight's shape be (1,)?

Member Author:
If the weight parameter is shared across all axes, then a single scalar value is shared everywhere, in which case the weight shape should be (1,).

Contributor:
But infershape doesn't allow this

Member Author:
@bradcar sorry that I missed your comment earlier, and thanks for sharing your work. In this PR I'd like to first focus on wrapping up the previous two PRs for activations. Since you wrote the paper, would you like to implement that in mxnet?

Member Author:
There are two options: either write it in Gluon by defining hybrid_forward in Python, or extend the leaky relu operator in C++ for better performance.
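For the first option, a minimal, untested sketch of what a Gluon-level ISRLU could look like; the class name is hypothetical, and unlike the paper, alpha is kept as a fixed scalar here rather than a learnable parameter:

    from mxnet.gluon import HybridBlock

    class ISRLU(HybridBlock):
        """isrlu(x) = x if x >= 0 else x / sqrt(1 + alpha * x**2)."""
        def __init__(self, alpha=1.0, **kwargs):
            super(ISRLU, self).__init__(**kwargs)
            self._alpha = alpha

        def hybrid_forward(self, F, x):
            # apply the inverse square root only to the negative part
            isr = (1.0 + self._alpha * x * x) ** -0.5
            return F.where(x >= 0, x, x * isr)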

szha force-pushed the prelu branch 7 times, most recently from 38e1e0e to f149553 on February 3, 2018.
szha (Member Author) commented Feb 3, 2018:

@piiswrong addressed the infer shape issue. Let me know if you have more comments.

return F.LeakyReLU(x, gamma=alpha, act_type='prelu', name='fwd')

def __repr__(self):
s = '{name}'
Contributor:
Can we handle these trivial cases in the base class?

Member Author:
Yes


prelu = mx.gluon.nn.PReLU()
prelu.initialize()
x = point_to_validate.reshape((1, 1, 2))
Contributor:
Use a different input shape that can catch the infershape problem.
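For illustration, an input whose channel dimension is larger than 1 would force the (1,)-shaped alpha to actually broadcast; the shape below is arbitrary:

    import mxnet as mx

    prelu = mx.gluon.nn.PReLU()
    prelu.initialize()
    x = mx.nd.random.uniform(shape=(1, 3, 4, 5))   # channel dimension > 1
    y = prelu(x)
    assert y.shape == x.shape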

@@ -225,7 +242,11 @@ class LeakyReLUProp : public OperatorProperty {
const TShape &dshape = in_shape->at(leakyrelu::kData);
if (dshape.ndim() == 0) return false;
if (param_.act_type == leakyrelu::kPReLU) {
in_shape->at(leakyrelu::kGamma) = TShape(Shape1(dshape[1]));
const TShape &gshape = in_shape->at(leakyrelu::kGamma);
if (gshape.Size() != 1)
Contributor:
gshape could be empty, in which case Size() is undefined.
Also, if gshape.Size() is 1, the shape could still be (1, 1), which is invalid.

Member Author:
So, I should check for both ndim and shape_[0] then. How do I check whether it’s undefined?

Contributor:
If gshape is empty, gshape.ndim() would be 0.

szha (Member Author) commented Feb 7, 2018:

@piiswrong, I addressed the latest comments. Let me know if any further change is needed.

szha (Member Author) commented Feb 9, 2018:

ping @piiswrong

@lvyong1943 commented:
Hello, when I run mlp.cpp there is an error like this:
Symbol.Compose: Keyword argument name gamma not found.
Candidate arguments:
[0] data
It happens when this line runs: outputs[i] = LeakyReLU(string("act") + istr, fc, null_sym, LeakyReLUActType::kLeaky);
How can I solve this problem?
I changed Symbol null_sym to Symbol null_sym("null_sym") to work around the error, but then this->blob_ptr_._Ptr is nullptr.

@lvyong1943 commented:
@szha I just ran the mlp.cpp file. When it runs this line:
Executor* exe = new Executor(sym_out, ctx_dev, in_args, arg_grad_store, grad_req_type, aux_states);
an error appears in incubator-mxnet-master\src\executor\graph_executor.cc at line 273:
xs.emplace_back(NodeEntry{args[i], 0, 0});
The parameter 'i' is out of range for "args".

bradcar (Contributor) commented Apr 13, 2018:

@piiswrong @szha what is the status of #9662 and PReLU? When I naively put PReLU into a hybrid block (mxnet 1.2.0) and look at the source (activations.py), it seems that PReLU only has one learnable alpha per layer. Shouldn't each 'neuron' have its own learnable alpha?

szha (Member Author) commented Apr 13, 2018:

@bradcar the leaky relu operator in 'prelu' mode supports any broadcastable alpha shape. Since it's impossible to infer the shape of the parameter until it sees the first input, we chose to put the simplest case in the constructor.

For your use case, where you need more than one alpha parameter, you can simply use the operator directly; a sketch follows.
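A minimal sketch of that route, assuming a per-channel alpha for an NCHW input (the shapes and the 0.25 values are purely illustrative):

    import mxnet as mx

    x = mx.nd.random.normal(shape=(1, 3, 4, 4))   # NCHW input
    alpha = mx.nd.full((3,), 0.25)                # one slope per channel
    y = mx.nd.LeakyReLU(x, gamma=alpha, act_type='prelu')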

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018
* prelu, elu, selu, swish

* update

* fix infer shape

* update infer shape

* update
zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018
* prelu, elu, selu, swish

* update

* fix infer shape

* update infer shape

* update