### PyTorch normalization layer parameters explained

* num_features:
    * BatchNorm1d: $C$ from an expected input of size $(N, C, L)$ or $L$ from input of size $(N, L)$
    * BatchNorm2d: $C$ from an expected input of size $(N, C, H, W)$. For such an input, the mean and variance of each channel are computed over the $N$, $H$ and $W$ dimensions as follows:

    $$ E(X_C) = \frac{1}{N \times H \times W} \sum_{N,H,W} X_C $$

    $$ \mathrm{var}(X_C) = \frac{1}{N \times H \times W} \sum_{N,H,W} \left( X_C - E(X_C) \right)^2 $$

    * BatchNorm3d: $C$ from an expected input of size $(N, C, D, H, W)$

* eps: a small floating-point constant $\epsilon$ added to the variance in the denominator to prevent division by zero

* affine:
    * If affine=False, the scale and shift are fixed at $\gamma=1$, $\beta=0$.
    * Otherwise, this module has learnable affine parameters $\gamma$ (weight) and $\beta$ (bias).

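* momentum: the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1. In the update below, $\hat{x}_t$ is the mean computed from the current batch; running_var is updated the same way from the batch variance.

    $$ running\_mean_{t} = (1 - momentum) \times running\_mean_{t-1} + momentum \times \hat{x}_t $$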

* track_running_stats: a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
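
The per-channel formulas for BatchNorm2d above can be checked by hand. The following is a minimal sketch, assuming a random input of shape $(5, 6, 4, 4)$ and the default `eps=1e-5`; the names `x`, `manual`, `bn_check` and `reference` are illustrative and not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 6, 4, 4)  # (N, C, H, W), an assumed example shape

# Per-channel statistics: reduce over N, H, W and keep the channel axis.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
# Biased variance (divides by N*H*W), matching the formula in the text.
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)

eps = 1e-5
manual = (x - mean) / torch.sqrt(var + eps)

# affine=False keeps gamma=1 and beta=0, so the layer output should
# match the manual normalization in training mode.
bn_check = nn.BatchNorm2d(num_features=6, eps=eps, affine=False)
bn_check.train()
reference = bn_check(x)

max_abs_diff = (manual - reference).abs().max().item()
print(max_abs_diff)
```

The printed difference should be at the level of floating-point rounding error.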

In [11]:
import torch
import torch.nn as nn


class bn(nn.Module):
    def __init__(self, C):
        super(bn, self).__init__()

        self.bn1 = nn.BatchNorm1d(C, eps=0, momentum=0.3,
                                  # If track_running_stats=False, running_mean and running_var are None, and \mu and \sigma are computed from the current batch
                                  track_running_stats=True)

    def forward(self, x):
        return self.bn1(x)


net = bn(C=4) # num_features is the number of channels C
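
As a side check on the comment about `track_running_stats=False`, here is a small sketch. It assumes a 2D input of shape $(8, 4)$, and the names `no_stats` and `per_feature_mean` are illustrative. It suggests that the buffers are `None` and that the layer keeps normalizing with batch statistics even after `eval()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
no_stats = nn.BatchNorm1d(4, track_running_stats=False)
print(no_stats.running_mean, no_stats.running_var)  # both buffers are None

no_stats.eval()  # eval mode cannot fall back on running statistics here
with torch.no_grad():
    out = no_stats(torch.randn(8, 4) * 3 + 5)

# Each feature is normalized with the statistics of this batch,
# so the per-feature mean of the output is close to zero.
per_feature_mean = out.mean(dim=0)
print(per_feature_mean)
```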

In [12]:
entry = torch.ones((2, 4, 1))

for i in range(4):
    print(net(entry)) # Training: \mu and \sigma are always computed from the current batch
    print(net.bn1.running_mean) # If track_running_stats=True, running_mean and running_var are updated after every forward pass (the module stores them as buffers)
    print(net.bn1.running_var)

tensor([[[0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.]]], grad_fn=<NativeBatchNormBackward>)
tensor([0.3000, 0.3000, 0.3000, 0.3000])
tensor([0.7000, 0.7000, 0.7000, 0.7000])
tensor([[[0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.]]], grad_fn=<NativeBatchNormBackward>)
tensor([0.5100, 0.5100, 0.5100, 0.5100])
tensor([0.4900, 0.4900, 0.4900, 0.4900])
tensor([[[0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.]]], grad_fn=<NativeBatchNormBackward>)
tensor([0.6570, 0.6570, 0.6570, 0.6570])
tensor([0.3430, 0.3430, 0.3430, 0.3430])
tensor([[[0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.]]], grad_fn=<NativeBatchNormBackward>)
tensor([0.7599, 0.7599, 0.7599, 0.7599])
tensor([0.2401, 0.2401, 0.2401, 0.2401])


In [13]:
net.eval()
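'''
Test time:
With self.training=False:
a test mini-batch is only a very small part of the whole dataset, so its statistics can deviate considerably from the global statistics; this shifts the normalization and makes predictions inaccurate.
Therefore mu and sigma are replaced by the running_mean and running_var accumulated during training (self.training=False, track_running_stats=True).
'''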
entry1 = torch.ones((2, 4, 1))
t_out = net(entry1)
t_out

tensor([[[0.4900],
         [0.4900],
         [0.4900],
         [0.4900]],

        [[0.4900],
         [0.4900],
         [0.4900],
         [0.4900]]], grad_fn=<NativeBatchNormBackward>)

In [14]:
entry2 = torch.ones((2, 4, 1))
t_out2 = net(entry2)
t_out2

tensor([[[0.4900],
         [0.4900],
         [0.4900],
         [0.4900]],

        [[0.4900],
         [0.4900],
         [0.4900],
         [0.4900]]], grad_fn=<NativeBatchNormBackward>)
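
The value 0.49 can be reproduced from the stored buffers: $(1 - 0.7599) / \sqrt{0.2401 + 0} = 0.2401 / 0.49 = 0.49$. The sketch below rebuilds the same setup in a standalone layer and compares the eval-mode output with this manual formula. The names `layer`, `ones_batch`, `out` and `manual` are illustrative.

```python
import torch
import torch.nn as nn

layer = nn.BatchNorm1d(4, eps=0.0, momentum=0.3)
ones_batch = torch.ones(2, 4, 1)

# Four training-mode forward passes, as in the loop above,
# to accumulate running_mean and running_var.
layer.train()
for _ in range(4):
    layer(ones_batch)

layer.eval()
with torch.no_grad():
    out = layer(ones_batch)
    # Eval mode normalizes with the running statistics, not the batch ones.
    rm = layer.running_mean.view(1, -1, 1)
    rv = layer.running_var.view(1, -1, 1)
    manual = (ones_batch - rm) / torch.sqrt(rv + layer.eps)
    manual = manual * layer.weight.view(1, -1, 1) + layer.bias.view(1, -1, 1)

print(out[0, 0, 0].item(), manual[0, 0, 0].item())
```

Because eval mode does not update the buffers, repeating the forward pass returns the same values, which is why the next cell prints 0.49 again.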

In [15]:
net.train()
entry3 = torch.ones((2, 4, 1))
t_out3 = net(entry3) # At test time, if training=True, \mu and \sigma are still computed from the current batch
t_out3

tensor([[[0.],
         [0.],
         [0.],
         [0.]],

        [[0.],
         [0.],
         [0.],
         [0.]]], grad_fn=<NativeBatchNormBackward>)

In [16]:
net.bn1.weight

Parameter containing:
tensor([1., 1., 1., 1.], requires_grad=True)

In [17]:
net.bn1.bias

Parameter containing:
tensor([0., 0., 0., 0.], requires_grad=True)
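
For contrast with the learnable `weight` and `bias` shown above, here is a short sketch of the `affine=False` case. The names `plain` and `learnable` are illustrative. With `affine=False` the module should hold no learnable parameters at all.

```python
import torch
import torch.nn as nn

# affine=False: gamma and beta are fixed, so no parameters are registered.
plain = nn.BatchNorm1d(4, affine=False)
print(plain.weight, plain.bias)
n_plain = len(list(plain.parameters()))

# affine=True (the default): one gamma and one beta per feature.
learnable = nn.BatchNorm1d(4, affine=True)
n_params = sum(p.numel() for p in learnable.parameters())
print(n_plain, n_params)
```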

In [18]:
entry_2d = torch.ones(5, 6, 4, 4)
bn_2d = nn.BatchNorm2d(num_features=6) # num_features=C=6
bn_2d(entry_2d).shape

torch.Size([5, 6, 4, 4])

In [19]:
entry_3d = torch.ones(3, 7, 3, 4, 4)
bn_3d = nn.BatchNorm3d(num_features=7) # num_features=C=7
bn_3d(entry_3d).shape

torch.Size([3, 7, 3, 4, 4])