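&emsp;&emsp;Batch normalization normalizes each neuron of an intermediate layer individually, using statistics computed across the mini-batch. The mini-batch therefore cannot be too small,
or the statistics of a single neuron become hard to estimate. In addition, if the distribution of a neuron's net input changes dynamically within the network, as in a recurrent neural network,
batch normalization cannot be applied.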
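
The batch-size dependence described above can be sketched as follows. This is a minimal illustration, not part of the original notebook: the tensor shapes, the seed, and the hand-picked values in `tiny` are all assumptions chosen for the demonstration. It checks that `nn.BatchNorm1d` in training mode normalizes each feature across the batch, and shows that with only two samples every normalized value collapses to roughly ±1, whatever the input was.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# (batch, features); sizes are illustrative
x = torch.randn(8, 4)
bn = nn.BatchNorm1d(4, affine=False)
bn.train()  # training mode: use statistics of the current mini-batch

# Manual version: one mean and one biased variance per feature, taken over the batch
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)
bn_matches = torch.allclose(bn(x), manual, atol=1e-5)

# A mini-batch of two samples: each feature has only two values, so after
# normalization they are always approximately +1 and -1
tiny = torch.tensor([[1.0, -3.0, 0.5, 10.0],
                     [2.0, 5.0, 0.0, -10.0]])
tiny_out = nn.BatchNorm1d(4, affine=False)(tiny)

print(bn_matches)
print(tiny_out)
```

With so few samples the per-neuron statistics carry almost no information about the underlying distribution, which is the practical reason the mini-batch cannot be too small.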

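&emsp;&emsp;Layer normalization is closely related to batch normalization. The difference is that layer normalization normalizes over all neurons of an intermediate layer,
so its statistics are computed separately for each sample rather than across the mini-batch.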
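
As a rough check of this definition, the sketch below computes layer normalization by hand and compares it with `nn.LayerNorm`. The input shape, the seed, and the names `manual` and `matches` are illustrative assumptions; the affine parameters are switched off so that only the normalization itself is compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# (batch, channels, height, width); sizes are illustrative
x = torch.randn(4, 3, 5, 5)

# Built-in layer normalization over every non-batch dimension, without weight/bias
ln = nn.LayerNorm(x.shape[1:], elementwise_affine=False)

# Manual version: one mean and one biased variance per sample,
# taken over all neurons of the layer (dimensions 1, 2 and 3)
dims = (1, 2, 3)
mean = x.mean(dim=dims, keepdim=True)
var = x.var(dim=dims, unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + ln.eps)

matches = torch.allclose(ln(x), manual, atol=1e-5)
print(matches)
```

Because the mean and variance are taken per sample, the result for one sample does not depend on which other samples share its mini-batch.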

In [4]:
import torch
import torch.nn as nn

In [5]:
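entry = torch.ones(54, 12, 28, 29)
gn = nn.LayerNorm(normalized_shape=entry.size()[1:],  # every dimension except the mini-batch dimension (the dimensions to normalize over)
                  eps=1e-5,  # added to the variance for numerical stability, as in BatchNorm; eps=0 would give 0/0 = NaN on this constant input
                  elementwise_affine=True)  # like BatchNorm's affine argument: whether to apply a learnable weight and bias
# Note: LayerNorm has no track_running_stats parameter; \mu and \sigma are always computed from the current input, per sample, so calling gn.eval() before inference changes nothing
gn(entry).shape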

torch.Size([54, 12, 28, 29])
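
The note above, that LayerNorm keeps no running statistics, can be checked with a small sketch. The input shape, the scaling and shift applied to the random input, and the variable names are assumptions made for illustration. The same input is passed through `nn.LayerNorm` and `nn.BatchNorm2d` in both training and evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Input deliberately shifted away from zero mean and unit variance, so that
# BatchNorm's running statistics differ clearly from the batch statistics
x = torch.randn(6, 3, 4, 4) * 3 + 5

ln = nn.LayerNorm(x.shape[1:])
ln.train()
ln_train = ln(x)
ln.eval()
ln_eval = ln(x)
ln_same = torch.allclose(ln_train, ln_eval)  # LayerNorm: mode makes no difference

bn = nn.BatchNorm2d(3)
bn.train()
bn_train = bn(x)  # normalizes with batch statistics and updates the running ones
bn.eval()
bn_eval = bn(x)   # normalizes with the stored running statistics instead
bn_same = torch.allclose(bn_train, bn_eval, atol=1e-4)

print(ln_same, bn_same)
```

Under these assumptions the LayerNorm outputs agree in both modes while the BatchNorm outputs do not, since BatchNorm switches to its running mean and variance in evaluation mode.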

In [6]:
# If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size.
nn.LayerNorm(normalized_shape=29)(entry)  # apply LayerNorm over the last dimension only

tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         ...,

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          ...

In [8]:
# NLP Example
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
print(embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)  # normalize over the last dimension (embedding_dim)
# Activate module
layer_norm(embedding)


10


tensor([[[-4.9530e-01,  7.4686e-01, -1.0241e+00, -9.0744e-01,  9.3312e-01,
          -5.2040e-02,  1.2955e+00,  1.4172e+00, -2.7389e-01, -1.6399e+00],
         [-1.5209e-01,  1.1404e+00, -1.5581e+00,  3.9314e-01, -1.6713e+00,
          -4.4993e-01,  1.3674e+00,  3.0263e-01,  9.9940e-01, -3.7155e-01],
         [ 1.8817e-01, -2.2581e-01,  1.0689e+00, -5.3114e-01,  1.8473e+00,
          -9.1772e-01, -2.9300e-01, -1.7392e+00,  9.8749e-01, -3.8510e-01],
         [ 1.4070e-01, -6.6758e-01, -7.0075e-01,  3.4564e-01, -8.2954e-01,
           9.2049e-01,  7.9682e-01,  2.0441e+00, -1.5118e+00, -5.3814e-01],
         [ 4.2115e-01, -4.4939e-01, -1.3672e+00,  2.1022e+00,  8.2032e-01,
          -3.7659e-01,  9.6302e-01, -4.8389e-01, -9.3230e-01, -6.9737e-01]],

        [[ 1.5967e+00,  9.6163e-01, -1.7277e-01,  3.1451e-01,  9.0941e-01,
          -8.2004e-02, -2.2241e+00, -2.5194e-01, -5.3763e-01, -5.1383e-01],
         [ 1.2047e+00, -8.0541e-01,  3.6869e-01,  4.6472e-01,  8.2345e-01,
          -1.9574