Thanks for the comments.
As you mentioned, the scaled exponential linear units paper (https://arxiv.org/abs/1706.02515, page 6) recommends not using regular dropout, since the extra variance it introduces hinders convergence when relying on the normalization.
We observed some convergence issues when exploring the hyperparameter space, although with the optimal model configurations the training procedure was stable.
One thing to keep in mind is that the two best regularization techniques we found in our experiments are early stopping and, second, ensembling. Since ensembling gains accuracy from the diversity and variance of the individual models, the interaction of AlphaDropout with the ensemble might be interesting to explore. Still, we will try AlphaDropout regularization to test the SELU paper's recommendation in this regression setting.
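For reference, a minimal sketch (not the actual model code; the layer sizes and dropout rate below are placeholders) of how `nn.AlphaDropout` would slot into a SELU block in PyTorch:

```python
import torch.nn as nn

# Regular dropout block, for comparison.
relu_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),
)

# SELU block: AlphaDropout is designed to preserve the self-normalizing
# property (zero mean / unit variance) that regular Dropout would break.
selu_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.SELU(),
    nn.AlphaDropout(p=0.1),
)
```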
Hi,
My name is Pablo Navarro. Your team and I have already exchanged a few emails about the wonderful paper you've written. Thanks again for the contribution.
Now that the code is released, I have a couple of questions about the implementation of the SELU activation function.
Weight init

For SELU, you force `lecun_normal`, which is in turn a `pass` in the `init_weights()` function. How come the weights are initialized as `lecun_normal` simply by passing? On my machine, default PyTorch initializes weights uniformly, not normally.
Dropout on SELU

I believe that in order to make SELU useful, you need to use `AlphaDropout()` instead of regular `Dropout()` layers (PyTorch docs). I can't find anything wrapping `AlphaDropout()` in your code. Can you point me in the right direction or give the rationale behind it?

Cheers and keep up the good work!