
The LayerNorm implementation #30

Closed
egg-west opened this issue Oct 23, 2018 · 4 comments
Labels
invalid (This doesn't seem right), question (Further information is requested)

Comments

@egg-west

I am wondering why you don't use the standard nn version of LayerNorm?
I notice the difference is in the denominator: nn.LayerNorm uses sqrt(variance + epsilon) rather than (standard deviation + epsilon).

Could you clarify the difference between these two approaches?
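
To make the two denominators concrete, here they are in my own notation (not copied from either codebase):

$$
y_{\text{nn.LayerNorm}} = \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}
\qquad\text{vs.}\qquad
y_{\text{repo}} = \frac{x - \mu}{\sigma + \epsilon}
$$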

@codertimo
Owner

codertimo commented Oct 23, 2018

@egg-west Well, the reason I used this layer norm is that the Annotated Transformer implementation of "Attention Is All You Need" used this code, and I just copied it from there. So if anyone can answer this question, that would be seriously awesome.
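
For reference, this is roughly the LayerNorm from the Annotated Transformer that I copied (paraphrased from memory, so names may differ slightly from the current notebook):

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Layer normalization as in the Annotated Transformer: epsilon is added to the std."""
    def __init__(self, features, eps=1e-6):
        super().__init__()
        self.a_2 = nn.Parameter(torch.ones(features))   # gain
        self.b_2 = nn.Parameter(torch.zeros(features))  # bias
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
```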

@codertimo codertimo added the invalid (This doesn't seem right) and question (Further information is requested) labels Oct 23, 2018
@briandw

briandw commented Oct 29, 2018

I believe they should do similar things; however, there is a difference in implementation.

For a given input:
x = torch.tensor([1.,0.,0.,0.])
The Annotated Transformer version gives the output:
tensor([ 1.5000, -0.5000, -0.5000, -0.5000], grad_fn=<ThAddBackward>)

While torch.nn.LayerNorm gives:
tensor([ 1.7320, -0.5773, -0.5773, -0.5773], grad_fn=<AddcmulBackward>)

The layer_norm implementation in PyTorch is here:
https://github.com/pytorch/pytorch/blob/cca247635c6edb323176eeac7a18d3e9ab71c558/caffe2/python/helpers/normalization.py
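
A minimal snippet to reproduce both numbers (eps chosen small enough not to affect the comparison):

```python
import torch
import torch.nn as nn

x = torch.tensor([1., 0., 0., 0.])
eps = 1e-6

# Annotated Transformer style: (x - mean) / (std + eps).
# Note that torch.std uses the unbiased (n - 1) estimator by default.
mean = x.mean(-1, keepdim=True)
std = x.std(-1, keepdim=True)
print((x - mean) / (std + eps))   # ≈ [ 1.5000, -0.5000, -0.5000, -0.5000]

# torch.nn.LayerNorm style: (x - mean) / sqrt(var + eps), with biased variance.
# elementwise_affine=False so the output is just the normalized values.
layer_norm = nn.LayerNorm(4, eps=eps, elementwise_affine=False)
print(layer_norm(x))              # ≈ [ 1.7321, -0.5774, -0.5774, -0.5774]
```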

@codertimo
Owner

@egg-west Is your question solved? 👍

@egg-west
Author

Thank you for the clarification. I guess pulling the epsilon out of the sqrt may speed up the computation.
But yes, they do the same thing.
