
Large accuracy discrepancy after deployment #23

Closed · MaeThird opened this issue Feb 3, 2021 · 8 comments


@MaeThird commented Feb 3, 2021

Thanks for this great work.
When I train small models (under ~10 MFLOPs), the accuracy loss at deployment is negligible, but with a large model (~2 GFLOPs) the outputs no longer line up:
LOG:
deploy param: stage0.rbr_reparam.weight torch.Size([64, 1, 3, 3]) -0.048573527
deploy param: stage0.rbr_reparam.bias torch.Size([64]) 0.23182523
deploy param: stage1.0.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -0.0054542203
deploy param: stage1.0.rbr_reparam.bias torch.Size([128]) 1.0140312
deploy param: stage1.1.rbr_reparam.weight torch.Size([128, 64, 3, 3]) 0.0006282824
deploy param: stage1.1.rbr_reparam.bias torch.Size([128]) 0.32761782
deploy param: stage1.2.rbr_reparam.weight torch.Size([128, 128, 3, 3]) 0.0023862773
deploy param: stage1.2.rbr_reparam.bias torch.Size([128]) 0.34976208
deploy param: stage1.3.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -9.027165e-05
deploy param: stage1.3.rbr_reparam.bias torch.Size([128]) 0.0063683093
deploy param: stage2.0.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -8.460902e-05
deploy param: stage2.0.rbr_reparam.bias torch.Size([256]) 0.11033552
deploy param: stage2.1.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -0.00010023986
deploy param: stage2.1.rbr_reparam.bias torch.Size([256]) -0.15826604
deploy param: stage2.2.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -5.3966836e-05
deploy param: stage2.2.rbr_reparam.bias torch.Size([256]) -0.15924689
deploy param: stage2.3.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -6.7551824e-05
deploy param: stage2.3.rbr_reparam.bias torch.Size([256]) -0.37404576
deploy param: stage2.4.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -0.00012947948
deploy param: stage2.4.rbr_reparam.bias torch.Size([256]) -0.6853457
deploy param: stage2.5.rbr_reparam.weight torch.Size([256, 128, 3, 3]) 7.473848e-05
deploy param: stage2.5.rbr_reparam.bias torch.Size([256]) -0.16874048
deploy param: stage3.0.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.000433887
deploy param: stage3.0.rbr_reparam.bias torch.Size([512]) 0.18602118
deploy param: stage3.1.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00048246872
deploy param: stage3.1.rbr_reparam.bias torch.Size([512]) -0.7235512
deploy param: stage3.2.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00021061227
deploy param: stage3.2.rbr_reparam.bias torch.Size([512]) -0.5657553
deploy param: stage3.3.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00081703335
deploy param: stage3.3.rbr_reparam.bias torch.Size([512]) -0.37847003
deploy param: stage3.4.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00033185782
deploy param: stage3.4.rbr_reparam.bias torch.Size([512]) -0.57922906
deploy param: stage3.5.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0007206367
deploy param: stage3.5.rbr_reparam.bias torch.Size([512]) -0.56909364
deploy param: stage3.6.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.0003344199
deploy param: stage3.6.rbr_reparam.bias torch.Size([512]) -0.5628111
deploy param: stage3.7.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00021987755
deploy param: stage3.7.rbr_reparam.bias torch.Size([512]) -0.34248477
deploy param: stage3.8.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00010127398
deploy param: stage3.8.rbr_reparam.bias torch.Size([512]) -0.5895205
deploy param: stage3.9.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0005824505
deploy param: stage3.9.rbr_reparam.bias torch.Size([512]) -0.37577158
deploy param: stage3.10.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00012262027
deploy param: stage3.10.rbr_reparam.bias torch.Size([512]) -0.6199002
deploy param: stage3.11.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 1.503076e-06
deploy param: stage3.11.rbr_reparam.bias torch.Size([512]) -0.7054796
deploy param: stage3.12.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.0006349176
deploy param: stage3.12.rbr_reparam.bias torch.Size([512]) -1.0350925
deploy param: stage3.13.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00037807773
deploy param: stage3.13.rbr_reparam.bias torch.Size([512]) -1.1399512
deploy param: stage3.14.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00025178236
deploy param: stage3.14.rbr_reparam.bias torch.Size([512]) -0.27695537
deploy param: stage3.15.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00074805244
deploy param: stage3.15.rbr_reparam.bias torch.Size([512]) -0.8776718
deploy param: stage4.0.rbr_reparam.weight torch.Size([1024, 512, 3, 3]) -0.00013951868
deploy param: stage4.0.rbr_reparam.bias torch.Size([1024]) 0.021552037
deploy param: linear.weight torch.Size([372, 1024]) 0.0051029953
deploy param: linear.bias torch.Size([372]) 0.17604762

The code that prints this:

    # converted_weights maps parameter names to the fused numpy arrays
    deploy_model = build_func(deploy=True, **kwargs)
    for name, param in deploy_model.named_parameters():
        print('deploy param: ', name, param.size(), np.mean(converted_weights[name]))
        param.data = torch.from_numpy(converted_weights[name]).float()
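Note that np.mean of each converted tensor only summarizes the weights themselves; to quantify the deployment gap one would compare the two models' outputs directly. A minimal sketch, assuming `train_model` is a hypothetical handle to the original (deploy=False) network that produced converted_weights, with a made-up input shape:

    # train_model is a hypothetical name; the input shape (1 channel, 32x32) is illustrative only.
    x = torch.randn(1, 1, 32, 32)
    train_model.eval()
    deploy_model.eval()
    print('max abs diff:', (train_model(x) - deploy_model(x)).abs().max().item())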
@DingXiaoH (Owner)

In our tests the equivalence of RepVGG before and after conversion holds to about ten decimal places, and this has nothing to do with the FLOPs of the model. This looks like a customized model (the final layer has 372 classes). Could you run the check described in the first item of the FAQs?

@MaeThird (Author) commented Feb 3, 2021

    # model / model_func / shape come from the surrounding training setup
    x = torch.from_numpy(np.random.randn(1, *shape)).float()
    y = model(x)
    model_d = repvgg_model_convert(model, model_func, out_c=186*2, num_blocks=[4,6,16,1], in_c=1)
    y_d = model_d(x)
    print('diff abs: max {},\n**2:{}'.format(abs(y - y_d).max(), ((y - y_d) ** 2).sum()))

Output:
diff abs: max 6.67572021484375e-06,
**2:1.419987460948846e-09
This looks normal, but after actual training the exported model shows the large discrepancy posted above. I haven't worked through the conversion details, so I don't want to jump to conclusions.

@zjykzj commented Feb 4, 2021

(Quoting @MaeThird's test code and output above.)

While implementing RepVGG I observed two things:

  1. Both the training-stage and test-stage models need to be put into eval() before comparing outputs, otherwise there will be a large discrepancy;
  2. When the weights are initialized as follows:
def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

there is a large mismatch, whereas the following initialization keeps the outputs consistent:

    def _init_weights(self, gamma=0.01):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, gamma)
                nn.init.constant_(m.bias, gamma)

Here is the test code:

def test_regvgg():
    model = RepVGGRecognizer()
    model.eval()
    print(model)

    data = torch.randn(1, 3, 224, 224)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)[KEY_OUTPUT]
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)[KEY_OUTPUT]
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)

Hope this helps.

@MaeThird (Author) commented Feb 4, 2021

@zjykzj Thanks 🙏 I've confirmed that eval() is called before converting. The conversion seems to be just BN fusion plus zero-padding the 1x1 kernel into a 3x3 one; I'll find time to read through it again.
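For reference, a minimal sketch of that arithmetic (my own illustration, not the repository's code; `fuse_conv_bn` and `pad_1x1_to_3x3` are made-up names):

    import torch.nn.functional as F

    def fuse_conv_bn(conv_w, bn):
        # Fold BN into the conv kernel: w' = w * gamma/std, b' = beta - mean * gamma/std
        std = (bn.running_var + bn.eps).sqrt()
        w = conv_w * (bn.weight / std).reshape(-1, 1, 1, 1)
        b = bn.bias - bn.running_mean * bn.weight / std
        return w, b

    def pad_1x1_to_3x3(w_1x1):
        # Zero-pad a (out_c, in_c, 1, 1) kernel so it sits at the center of a 3x3 kernel
        return F.pad(w_1x1, [1, 1, 1, 1])

The fused 3x3 branch, padded 1x1 branch, and identity branch are then summed into a single 3x3 conv.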

@DingXiaoH (Owner)

In all cases, any model containing BN or dropout should call eval() before inference; this has nothing to do with RepVGG. @zjykzj, could you give an example that produces a large mismatch? I've tried all kinds of initializations, with weights of arbitrary magnitude, and in my tests the outputs were always exactly identical.
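(A minimal illustration of the eval() point, not code from this thread: in train mode BN normalizes with the current batch's statistics, so the same input gives different outputs than in eval mode.)

    import torch
    import torch.nn as nn

    bn = nn.BatchNorm2d(4)
    x = torch.randn(2, 4, 8, 8)
    y_train = bn(x)          # train mode: uses the batch's own mean/var
    bn.eval()
    y_eval = bn(x)           # eval mode: uses the running statistics
    print((y_train - y_eval).abs().max())  # typically far from zero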

@mcmingchang

The test code above runs directly on freshly initialized weights, and in that case there is no problem. But once trained weights are loaded, it no longer holds.
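A sketch of that check with trained weights, assuming a hypothetical checkpoint 'trained.pth' and the same model_func / repvgg_model_convert helpers used earlier in the thread:

    # 'trained.pth' is a placeholder path; the input shape is illustrative.
    model = model_func(deploy=False)
    model.load_state_dict(torch.load('trained.pth'))
    model.eval()                          # freeze BN statistics before converting
    x = torch.randn(1, 3, 224, 224)
    y = model(x)
    deploy_model = repvgg_model_convert(model, model_func)
    y_d = deploy_model(x)
    print('max abs diff:', (y - y_d).abs().max().item())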

@zjykzj commented Feb 4, 2021

(Quoting @DingXiaoH's reply above.)

My implementation and tests were done in my own repository (ZJCV/ZCls). The key code all comes from the owner's implementation, so the overall implementation is the same. I'll paste the concrete model definitions below; comparing them should make things clear.

The test is split into two parts:

  1. test a single RepVGGBlock, i.e. the inserted module;
  2. test RepVGG_B2G4

Testing RepVGGBlock

Test code

The test code is as follows (see test_repvgg_block.py):

# -*- coding: utf-8 -*-

"""
@date: 2021/2/1 8:01 PM
@file: test_asymmetric_convolution_block.py
@author: zj
@description: 
"""

import torch
import torch.nn as nn

from zcls.config.key_word import KEY_OUTPUT
from zcls.model.recognizers.repvgg_recognizer import RepVGGRecognizer
from zcls.model.layers.repvgg_block import RepVGGBlock
from zcls.model.conv_helper import insert_repvgg_block, fuse_repvgg_block

...
...

def test_conv_helper():
    in_channels = 32
    out_channels = 64
    dilation = 1

    # downsampling + grouped convolution
    kernel_size = 3
    stride = 2
    padding = 1
    groups = 8

    model = nn.Sequential(
        nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size,
                      stride=stride, padding=padding, dilation=dilation, groups=groups),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        ),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True)
    )
    model.eval()
    print(model)

    data = torch.randn(1, in_channels, 56, 56)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)

if __name__ == '__main__':
    test_conv_helper()
    print('*' * 10)

Test results

Initialization scheme 1

Initialize as follows:

def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

The test output:

############################################### the model as originally defined
Sequential(
  (0): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
)
#################################################### the model after inserting RepVGGBlock
Sequential(
  (0): Sequential(
    (0): RepVGGBlock(
      (rbr_dense): Sequential(
        (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8, bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (rbr_1x1): Sequential(
        (conv): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2), groups=8, bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Identity()
    (2): ReLU(inplace=True)
  )
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
)
##################################################### the model after fusing RepVGGBlock
Sequential(
  (0): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=8)
    (1): Identity()
    (2): ReLU(inplace=True)
  )
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU(inplace=True)
)
################################################## the result of torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2))
tensor(1.1523e-05, grad_fn=<SqrtBackward>)
################################################## comparing the two outputs with torch.allclose(train_outputs, eval_outputs, atol=1e-8)
False
Traceback (most recent call last):
  File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 149, in <module>
    test_conv_helper()
  File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 121, in test_conv_helper
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)
AssertionError

Process finished with exit code 1

Initialization scheme 2

Initialize as follows:

    def _init_weights(self, gamma=0.01):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, gamma)
                nn.init.constant_(m.bias, gamma)

The test output:

...
...
tensor(2.3963e-07, grad_fn=<SqrtBackward>)
True

Testing RepVGG_B2G4

Similar to the implementation above.

Test code

def test_regvgg():
    model = RepVGGRecognizer()
    model.eval()
    print(model)

    data = torch.randn(1, 3, 224, 224)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)[KEY_OUTPUT]
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)[KEY_OUTPUT]
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)


if __name__ == '__main__':
    print('*' * 10)
    test_regvgg()

Test results

Initialization scheme 1

Initialize as follows:

def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

The test output:

############################################ the original RepVGG model
RepVGGRecognizer(
  (backbone): RepVGGBackbone(
    (stage0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (1): ReLU(inplace=True)
    )
    (stage1): Sequential(
      (0): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage2): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage3): Sequential(
      (0): Sequential(
        (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (6): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (7): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (8): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (9): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (10): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (11): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (12): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (13): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (14): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (15): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage4): Sequential(
      (0): Sequential(
        (0): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
  )
  (head): GeneralHead2D(
    (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    (dropout): Dropout(p=0.0, inplace=True)
    (fc): Linear(in_features=2048, out_features=1000, bias=True)
  )
)
############################################ the model after inserting RepVGGBlock
RepVGGRecognizer(
  (backbone): RepVGGBackbone(
    (stage0): Sequential(
      (0): RepVGGBlock(
        (rbr_dense): Sequential(
          (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (rbr_1x1): Sequential(
          (conv): Conv2d(3, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): ReLU(inplace=True)
    )
    (stage1): Sequential(
      (0): Sequential(
        (0): RepVGGBlock(
          (rbr_dense): Sequential(
            (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
    )
    (stage2): Sequential(
      (0): Sequential(
        (0): RepVGGBlock(
          (rbr_dense): Sequential(
            (conv): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
    )
    (stage3): Sequential(
      (0): Sequential(
        (0): RepVGGBlock(
          (rbr_dense): Sequential(
            (conv): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (6): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (7): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (8): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (9): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (10): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (11): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (12): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (13): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (14): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), groups=4, bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
      (15): Sequential(
        (0): RepVGGBlock(
          (rbr_identity): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (bn): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
    )
    (stage4): Sequential(
      (0): Sequential(
        (0): RepVGGBlock(
          (rbr_dense): Sequential(
            (conv): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
            (bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
          (rbr_1x1): Sequential(
            (conv): Conv2d(512, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
            (bn): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          )
        )
        (1): ReLU(inplace=True)
      )
    )
  )
  (head): GeneralHead2D(
    (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    (dropout): Dropout(p=0.0, inplace=True)
    (fc): Linear(in_features=2048, out_features=1000, bias=True)
  )
)
############################################## the model after fusing RepVGGBlock
RepVGGRecognizer(
  (backbone): RepVGGBackbone(
    (stage0): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (1): ReLU(inplace=True)
    )
    (stage1): Sequential(
      (0): Sequential(
        (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage2): Sequential(
      (0): Sequential(
        (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage3): Sequential(
      (0): Sequential(
        (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (1): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (2): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (3): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (4): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (5): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (6): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (7): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (8): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (9): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (10): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (11): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (12): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (13): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
      (14): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=4)
        (1): ReLU(inplace=True)
      )
      (15): Sequential(
        (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
    (stage4): Sequential(
      (0): Sequential(
        (0): Conv2d(512, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): ReLU(inplace=True)
      )
    )
  )
  (head): GeneralHead2D(
    (pool): AdaptiveAvgPool2d(output_size=(1, 1))
    (dropout): Dropout(p=0.0, inplace=True)
    (fc): Linear(in_features=2048, out_features=1000, bias=True)
  )
)
tensor(0.0046, grad_fn=<SqrtBackward>)      ################ the computed result now shows a large deviation
False
Traceback (most recent call last):
  File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 151, in <module>
    test_regvgg()
  File "/home/zj/repos/ZCls/tests/test_model/test_layer/test_repvgg_block.py", line 142, in test_regvgg
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)
AssertionError

Initialization scheme 2

Initialize as follows:

    def _init_weights(self, gamma=0.01):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, gamma)
                nn.init.constant_(m.bias, gamma)

The test output:

...
...
################################ now the test passes
tensor(5.2068e-08, grad_fn=<SqrtBackward>)
True

Summary

The phenomenon is the same as @MaeThird's: small models show a small output error, while large models show a large one.
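One plausible explanation (my illustration, not from the thread) is that float32 rounding error compounds layer by layer, so the deeper and wider the fused model, the further its outputs drift. A sketch comparing a float32 stack of convolutions against a float64 reference:

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 64, 8, 8)
    a, b = x.double(), x.clone()             # float64 reference vs. float32
    for _ in range(20):
        w = torch.randn(64, 64, 3, 3) / 24   # scaled so activations stay O(1)
        a = F.conv2d(a, w.double(), padding=1)
        b = F.conv2d(b, w, padding=1)
    print((a.float() - b).abs().max())       # grows with the number of layers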

@DingXiaoH (Owner)

In our experience, floating-point error at this level makes no observable difference on any task. Also, it's natural that the second initialization yields a smaller error: gamma went from 1 to 0.01, so the activations, and with them the absolute error, are simply scaled down...
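(A quick illustration of that scaling argument, my sketch rather than the thread's code: shrinking the values by a factor s shrinks the absolute error by the same factor.)

    import torch

    a = torch.randn(10000)
    b = a + 1e-6 * torch.randn(10000)          # a plus simulated rounding noise
    for s in (1.0, 0.01):
        print(s, (s * a - s * b).abs().max())  # absolute error scales linearly with s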
