[WIP] Add Layer Normalization #7251

merged 21 commits into from Mar 7, 2019



commented Mar 6, 2019

See also #7175

This adds two new ops to SameDiff: Standardize and LayerNorm.

Standardize transforms the examples in the given ndarray so that they have zero mean and unit variance (as calculated along the given dimensions).

LayerNorm will then use Standardize and additionally apply the gain multiplication and optionally add a bias.
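
To make the relationship between the two ops concrete, here is a minimal sketch in plain Python. It is not the SameDiff implementation: the function names `standardize` and `layer_norm` are illustrative, the input is a single example (one row of features), and the use of the population variance (dividing by n) is an assumption. It also does not guard against a standard deviation of zero.

```python
import math


def standardize(row):
    """Shift and scale one example so it has zero mean and unit variance."""
    n = len(row)
    mean = sum(row) / n
    # Assumption: population variance (divide by n), not the sample variance.
    var = sum((x - mean) ** 2 for x in row) / n
    std = math.sqrt(var)
    # No guard for std == 0 here; this sketch assumes a non-constant row.
    return [(x - mean) / std for x in row]


def layer_norm(row, gain, bias=None):
    """Standardize the example, multiply by the gain, optionally add a bias."""
    z = standardize(row)
    out = [g * v for g, v in zip(gain, z)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


example = [1.0, 2.0, 3.0, 4.0]
print(standardize(example))
print(layer_norm(example, gain=[2.0] * 4, bias=[0.5] * 4))
print(layer_norm(example, gain=[1.0] * 4))  # bias omitted
```

With a gain of all ones and no bias, `layer_norm` reduces to `standardize`, which is why the two can share one code path.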

As the layer normalization paper says that it isn't really suitable for CNNs, this will only be available directly on DenseLayer, SimpleRNN and LSTM.


treo added 13 commits Feb 21, 2019
* Make Bias optional for layer norm
* drop support for axis as a NDArray param
* drop support for empty axis
Allow ReplaceNans to be used as a Pairwise transform op
Useful if NaN should be replaced with a value from an equally shaped array.
Handle stdev = 0 case
In the forward pass, treating 0/0 as 0 is good enough. In the backward
pass, the gradient at that point is undefined, because the limit is -inf
as x -> -0 and +inf as x -> +0. Instead of allowing inf or NaN to
propagate, we set the gradient to 0 in that case, as it should provide a
more reasonable overall gradient in the common case.
Remove accidentally committed LSTM configuration option for Layer nor…

Drop mention of LSTM in hasLayerNorm().
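
The ReplaceNans and stdev = 0 commits above can be sketched together in plain Python. This is only an illustration under stated assumptions: `replace_nans`, `standardize_safe` and `standardize_backward` are hypothetical names, the backward formula is the usual closed-form gradient of standardization rather than code taken from this PR, and the 0/0 = NaN behaviour of array libraries is emulated by hand because plain Python raises an error on division by zero.

```python
import math


def replace_nans(values, replacements):
    """Pairwise variant: wherever `values` holds NaN, take the element
    at the same position from the equally shaped `replacements`."""
    return [r if math.isnan(v) else v for v, r in zip(values, replacements)]


def _mean_std(row):
    n = len(row)
    mean = sum(row) / n
    # Assumption: population standard deviation (divide by n).
    return mean, math.sqrt(sum((x - mean) ** 2 for x in row) / n)


def standardize_safe(row):
    """Forward pass: a constant row gives 0/0, which is mapped to 0."""
    mean, std = _mean_std(row)
    # Emulate the NaN an array library would produce for 0/0.
    raw = [(x - mean) / std if std != 0.0 else math.nan for x in row]
    return replace_nans(raw, [0.0] * len(row))


def standardize_backward(row, grad_out):
    """Backward pass: the gradient is set to 0 when the stdev is 0."""
    n = len(row)
    _, std = _mean_std(row)
    if std == 0.0:
        # Undefined gradient; return zeros instead of propagating inf or NaN.
        return [0.0] * n
    z = standardize_safe(row)
    g_mean = sum(grad_out) / n
    gz_mean = sum(g * v for g, v in zip(grad_out, z)) / n
    return [(g - g_mean - v * gz_mean) / std for g, v in zip(grad_out, z)]


print(standardize_safe([3.0, 3.0, 3.0]))
print(standardize_backward([3.0, 3.0, 3.0], [1.0, 2.0, 3.0]))
print(standardize_backward([1.0, 2.0, 4.0], [0.3, -0.1, 0.7]))
```

For a constant input, both the forward output and the gradient come out as zeros, so a single degenerate example does not poison the rest of the batch with inf or NaN.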

@treo treo requested a review from AlexDBlack Mar 6, 2019



commented Mar 6, 2019

After discussing it with @AlexDBlack, I've decided to skip layer normalization support for LSTMs in this PR.

Because neither CuDNN nor MKL-DNN supports layer normalization for LSTMs, the performance hit from using it would be large enough that no one would use it anyway. It would also add a lot of complexity to what is already a pretty complex piece of code.

We can revisit layer normalization support for LSTMs once we start moving the layer implementations over to SameDiff.

@treo treo requested a review from AlexDBlack Mar 7, 2019


@treo treo merged commit b3345bc into master Mar 7, 2019

0 of 2 checks passed

Codacy/PR Quality Review Hang in there, Codacy is reviewing your Pull request.
codeclimate Code Climate is analyzing this code.

@treo treo deleted the treo/layer_norm branch Mar 7, 2019
