### Execution flow of TransT's feature extractor

1. run_training.py creates the settings object and calls the run function in train_settings/transt/transt.py.
2. The run function initializes the fields of the settings object and calls the transt_resnet50 function in models/tracking/transt.py.
3. transt_resnet50 calls the build_backbone function in models/backbone/transt_backbone.py to load the backbone.
4. build_backbone calls the Backbone class constructor (instantiation) to define the backbone.
5. The Backbone constructor calls the resnet50 function in models/backbone/resnet.py to create the backbone.
6. resnet50 builds the model according to its parameters (output_layers, pretrained) and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_resnet50)
- models/backbone/transt_backbone.py (build_backbone, Backbone class <inherits from BackboneBase>)
- models/backbone/resnet.py (resnet50)
- All paths are located under the TransT/ltr directory

Load libraries

In [6]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary

Create the backbone network

In [7]:
# models/backbone/transt_backbone.py (build_backbone-Backbone, line:80)
output_layers=['layer3']
pretrained=True
frozen_layers=()
backbone = backbones.resnet50(output_layers=output_layers, pretrained=pretrained,
                                      frozen_layers=frozen_layers)

ResNet50 (modified)
- The last stage (layer4) is removed
- The Conv2D layers in layer3 are changed to dilated convolutions (dilation 2) -> larger receptive field
- Final output: 1024 x W/8 x H/8 [type: collections.OrderedDict]

collections.OrderedDict: a data structure made up of [output_layer_name, tensor] pairs
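
The W/8 x H/8 output size can be checked from the stage strides. The sketch below is a minimal one, assuming the usual ResNet-50 strides (conv1 2, maxpool 2, layer1 1, layer2 2) and assuming layer3 runs at stride 1 once its convolutions are dilated. `STAGE_STRIDES`, `total_stride`, and `feature_shape` are illustrative names, and shape tuples stand in for the tensors.

```python
from collections import OrderedDict

# Assumed per-stage strides of the modified ResNet50 up to layer3.
# layer3 is taken to use stride 1 with dilation 2 (an assumption based on
# the W/8 x H/8 output size stated above).
STAGE_STRIDES = OrderedDict([
    ("conv1", 2),
    ("maxpool", 2),
    ("layer1", 1),
    ("layer2", 2),
    ("layer3", 1),
])


def total_stride(strides):
    """Multiply the per-stage strides to get the overall output stride."""
    result = 1
    for stride in strides.values():
        result *= stride
    return result


def feature_shape(image_size, channels=1024):
    """Return (C, H, W) of the layer3 feature map for a square input."""
    side = image_size // total_stride(STAGE_STRIDES)
    return (channels, side, side)


# The backbone output is keyed by output layer name; shapes stand in for tensors.
search_out = OrderedDict([("layer3", feature_shape(256))])
template_out = OrderedDict([("layer3", feature_shape(128))])

print(total_stride(STAGE_STRIDES))  # 8
print(search_out["layer3"])         # (1024, 32, 32)
print(template_out["layer3"])       # (1024, 16, 16)
```

The 16x16 result for a 128x128 input agrees with the 16x16 feature maps that appear in the summary output of the next cell.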

In [8]:
# Input: search region (256x256x3), batch_size: 38
# Output: search feature (32x32x1024)
# Note: the summary below is run on a single 128x128 dummy input.
input=torch.zeros(1, 3, 128, 128)
print(pytorch_model_summary.summary(backbone, input, show_input=True))
print(backbone.parameters)
print(type(backbone(input)))

-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
          Conv2d-1      [1, 3, 128, 128]           9,408           9,408
     BatchNorm2d-2       [1, 64, 64, 64]             128             128
            ReLU-3       [1, 64, 64, 64]               0               0
       MaxPool2d-4       [1, 64, 64, 64]               0               0
      Bottleneck-5       [1, 64, 32, 32]          75,008          75,008
      Bottleneck-6      [1, 256, 32, 32]          70,400          70,400
      Bottleneck-7      [1, 256, 32, 32]          70,400          70,400
      Bottleneck-8      [1, 256, 32, 32]         379,392         379,392
      Bottleneck-9      [1, 512, 16, 16]         280,064         280,064
     Bottleneck-10      [1, 512, 16, 16]         280,064         280,064
     Bottleneck-11      [1, 512, 16, 16]         280,064         280,064
     Bottleneck-12      [1, 512, 16, 16]       1,5

ResNet-RS

In [1]:
from pytorch_resnet_rs.model import ResnetRS
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary

In [34]:
class MyResNetRS(nn.Module):
    def __init__(self,model):
        super().__init__()
        self.model=model
        self.upsample=nn.Upsample(scale_factor=2, mode='bicubic', align_corners=False) 

    def forward(self,x):
        x=self.model(x)
        x=self.upsample(x)
        return x

In [35]:
model=ResnetRS.create_pretrained('resnetrs50', in_ch=3, num_classes=1024,
                           drop_rate=0.)

# layer4 through fc are not used
model.layer4=nn.Identity()
model.avg_pool=nn.Identity()
model.fc=nn.Identity()

# apply upsampling
mymodel=MyResNetRS(model)

input=torch.zeros(1, 3, 128, 128)
print(pytorch_model_summary.summary(mymodel, input, show_input=True))
print(mymodel.parameters)

------------------------------------------------------------------------
      Layer (type)          Input Shape         Param #     Tr. Param #
          Resnet-1     [1, 3, 128, 128]      12,379,040      12,379,040
        Upsample-2      [1, 1024, 8, 8]               0               0
Total params: 12,379,040
Trainable params: 12,379,040
Non-trainable params: 0
------------------------------------------------------------------------
<bound method Module.parameters of MyResNetRS(
  (model): Resnet(
    (conv1): StemBlock(
      (conv1): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
        (6):

Load libraries for the EfficientNet experiments

In [1]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary
import math,copy
from torchvision_edit import models_e
from torchvision_edit.ops.misc import Conv2dNormActivation
from efficientnet_pytorch_edit import EfficientNet,utils

EfficientNet B7 - first experiment
: EfficientNet (resolution R, width W, and depth D kept) + upsampling + bottleneck

In [2]:
class MyEfficientNet(nn.Module):
    def __init__(self,efficientnet,eff_in_size=600,eff_out_chnl=2560):
        super().__init__()

        # EfficientNet Input Size (default:600x600)
        self.EffInSize=eff_in_size
        # EfficientNet output size (default: 19x19 = ceil(600/32))
        self.EffOutSize=math.ceil(eff_in_size/32)

        # Search Region Size
        self.SearchImageSize=256
        self.SearchFeatureSize=32
        # Template Size
        self.TemplateImageSize=128
        self.TemplateFeatureSize=16
        
        # upsampling for identical resolution
        self.upsampleSin=nn.Upsample(scale_factor=(self.EffInSize/self.SearchImageSize), mode='bilinear', align_corners=False) 
        self.upsampleSout=nn.Upsample(scale_factor=(self.SearchFeatureSize/self.EffOutSize), mode='bilinear', align_corners=False) 
        self.upsampleTin=nn.Upsample(scale_factor=(self.EffInSize/self.TemplateImageSize), mode='bilinear', align_corners=False) 
        self.upsampleTout=nn.Upsample(scale_factor=(self.TemplateFeatureSize/self.EffOutSize), mode='bilinear', align_corners=False) 

        # EfficientNet Feature Extractor
        self.effConv = efficientnet

        # output channel number is identical with resnet50 
        self.stage1=nn.Sequential(
            nn.Conv2d(in_channels=eff_out_chnl,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )

        # BottleNeck
        self.stage2=nn.Sequential(
            nn.Conv2d(in_channels=1024,out_channels=512,kernel_size=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024)
        )
        self.relu=nn.ReLU()

    def forward(self,x):
        # tensor x_width size 
        y=x.size(dim=3) 

        # Search region
        if y == self.SearchImageSize:
            x=self.upsampleSin(x) # (256x256) -> (600x600)
            x = self.effConv(x) # (600x600) -> (19x19)
            x=self.upsampleSout(x) # (19x19) -> (32x32)
        # Template
        elif y==self.TemplateImageSize:
            x=self.upsampleTin(x) # (128x128) -> (600x600)
            x = self.effConv(x) # (600x600) -> (19x19)
            x=self.upsampleTout(x) # (19x19) -> (16x16)
        else:
            # any other width would reach stage1 with the wrong channel count
            raise ValueError(f"unsupported input width: {y}")
            
        # output channel number is identical with resnet50 
        # (32x32x2560) -> (32x32x1024), (16x16x2560) -> (16x16x1024)
        x=self.stage1(x)
        
        # Bottleneck with residual connection
        fx=self.stage2(x) # F(x) 
        x=fx+x  # F(x)+x
        x=self.relu(x)
        
        return x
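
The scale factors used by the four `nn.Upsample` layers above follow directly from the sizes set in the constructor. A minimal sketch of that arithmetic, assuming EfficientNet's overall stride of 32 (as in the `eff_in_size/32` expression above); `upsample_plan` is an illustrative helper, not part of the notebook:

```python
import math


def upsample_plan(eff_in_size, image_size, feature_size, eff_stride=32):
    """Compute the scale factors applied before and after the EfficientNet trunk."""
    eff_out_size = math.ceil(eff_in_size / eff_stride)
    return {
        "eff_out_size": eff_out_size,
        "scale_in": eff_in_size / image_size,       # image -> EfficientNet input
        "scale_out": feature_size / eff_out_size,   # EfficientNet output -> target feature map
    }


search = upsample_plan(600, image_size=256, feature_size=32)
template = upsample_plan(600, image_size=128, feature_size=16)
print(search)
print(template)
```

For the template branch the output factor is below 1 (16/19), so `upsampleTout` actually shrinks the 19x19 map. Since both output factors are non-integer, passing `size=` to `nn.Upsample` instead of `scale_factor=` would be one way to pin the target resolution exactly; that is a suggestion, not something the notebook does.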

In [3]:
model=EfficientNet.from_pretrained('efficientnet-b7')
mymodel=MyEfficientNet(model)
input=torch.zeros(1, 3, 128, 128)
print(pytorch_model_summary.summary(mymodel, input, show_input=True))
print(mymodel.parameters)

Loaded pretrained weights for efficientnet-b7




-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
        Upsample-1      [1, 3, 128, 128]               0               0
    EfficientNet-2      [1, 3, 600, 600]      63,786,960      63,786,960
        Upsample-3     [1, 2560, 19, 19]               0               0
          Conv2d-4     [1, 2560, 16, 16]       2,622,464       2,622,464
     BatchNorm2d-5     [1, 1024, 16, 16]           2,048           2,048
            ReLU-6     [1, 1024, 16, 16]               0               0
          Conv2d-7     [1, 1024, 16, 16]         524,800         524,800
     BatchNorm2d-8      [1, 512, 16, 16]           1,024           1,024
            ReLU-9      [1, 512, 16, 16]               0               0
         Conv2d-10      [1, 512, 16, 16]       2,359,808       2,359,808
    BatchNorm2d-11      [1, 512, 16, 16]           1,024           1,024
           ReLU-12      [1, 512, 16, 16]          
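
The parameter counts in the summary can be reproduced by hand, which is a quick check that the added layers are wired as intended. A minimal sketch, assuming standard (groups=1) convolutions with bias and BatchNorm with learnable scale and shift; the helper names are illustrative:

```python
def conv2d_params(in_ch, out_ch, kernel_size, bias=True):
    """Parameter count of a standard (groups=1) 2D convolution."""
    weights = in_ch * out_ch * kernel_size * kernel_size
    return weights + (out_ch if bias else 0)


def batchnorm2d_params(channels):
    """One learnable scale and one learnable shift per channel."""
    return 2 * channels


print(conv2d_params(2560, 1024, 1))  # 2622464 (Conv2d-4)
print(batchnorm2d_params(1024))      # 2048    (BatchNorm2d-5)
print(conv2d_params(1024, 512, 1))   # 524800  (Conv2d-7)
print(batchnorm2d_params(512))       # 1024    (BatchNorm2d-8)
print(conv2d_params(512, 512, 3))    # 2359808 (Conv2d-10)
```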

EfficientNet B7 - second experiment
: Modify EfficientNet itself (256x256x3 -> 32x32x1024): R and W are changed, and two of the downsampling steps are skipped in favor of atrous (dilated) convolution
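
To make this trade concrete, here is a small sketch of the arithmetic, assuming EfficientNet's overall stride of 32 comes from five stride-2 steps and that 'same' padding is used. The function names are illustrative and not part of efficientnet_pytorch.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def output_size(input_size, num_stride2_steps):
    """Spatial size after a number of stride-2 steps with 'same' padding."""
    size = input_size
    for _ in range(num_stride2_steps):
        size = (size + 1) // 2  # ceil(size / 2)
    return size


print(output_size(256, 5))          # 8:  all five stride-2 steps kept
print(output_size(256, 3))          # 32: two stride-2 steps skipped
print(effective_kernel_size(3, 2))  # 5
print(effective_kernel_size(5, 2))  # 9
print(effective_kernel_size(1, 2))  # 1
```

A 1x1 kernel covers a single pixel whatever the dilation, so dilation only widens the receptive field for kernels of size 3 or more.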

In [2]:
model=EfficientNet.from_pretrained('efficientnet-b7')

# conv2d (same padding)
Conv2dSame = utils.get_same_padding_conv2d()

# Reconfigure EfficientNet-B7: replace the depthwise convs of blocks 4 and 11
# (the two downsampling steps that are skipped).
# Note: with kernel_size=1, dilation=2 has no effect on the receptive field.
for ct, child in enumerate(model.children()):
    if type(child) == nn.modules.container.ModuleList:
        for gct, gchild in enumerate(child.children()):
            if gct==4:
                gchild._depthwise_conv=Conv2dSame(in_channels=192, out_channels=192, kernel_size=1, dilation=2, bias=False)
            elif gct==11:
                gchild._depthwise_conv=Conv2dSame(in_channels=288, out_channels=288, kernel_size=1, dilation=2, bias=False)
model._conv_head=Conv2dSame(in_channels=640, out_channels=1024, kernel_size=1, bias=False)
model._bn1=nn.BatchNorm2d(1024, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)

input=torch.zeros(1, 3, 128, 128)
print(pytorch_model_summary.summary(model, input, show_input=True))
print(model.parameters)

Loaded pretrained weights for efficientnet-b7
-------------------------------------------------------------------------------------
                  Layer (type)           Input Shape         Param #     Tr. Param #
     Conv2dStaticSamePadding-1      [1, 3, 128, 128]           1,728           1,728
                 BatchNorm2d-2       [1, 64, 64, 64]             128             128
        MemoryEfficientSwish-3       [1, 64, 64, 64]               0               0
                 MBConvBlock-4       [1, 64, 64, 64]           4,944           4,944
                 MBConvBlock-5       [1, 32, 64, 64]           1,992           1,992
                 MBConvBlock-6       [1, 32, 64, 64]           1,992           1,992
                 MBConvBlock-7       [1, 32, 64, 64]           1,992           1,992
                 MBConvBlock-8       [1, 32, 64, 64]          56,360          56,360
                 MBConvBlock-9       [1, 48, 64, 64]          38,700          38,700
                MB

EfficientNet B7 - third experiment
: Split EfficientNet into two parts and add skip connections (only R is changed)

In [6]:
class MyEfficientNet(nn.Module):
    def __init__(self,model1,model2):
        super().__init__()

        # EfficientNet Feature Extractor
        self.effConv1 = model1
        self.effConv2 = model2

        # output channel number is identical with resnet50 
        self.Conv1x1_0=nn.Sequential(
            nn.Conv2d(in_channels=2560,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )

        # upsampling for identical resolution
        self.upsample=nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False) 

        # BottleNeck
        self.Conv1x1_1=nn.Sequential(
            nn.Conv2d(in_channels=1024,out_channels=80,kernel_size=1),
            nn.BatchNorm2d(80),
            nn.ReLU()
        )
        self.Conv3x3_2=nn.Sequential(
            nn.Conv2d(in_channels=80,out_channels=80,kernel_size=3,padding=1),
            nn.BatchNorm2d(80),
            nn.ReLU()
        )
        self.Conv1x1_3=nn.Sequential(
            nn.Conv2d(in_channels=80,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )

    def forward(self,x):
        # efficientnet feature extractor
        x=self.effConv1(x)
        save1=x.detach().clone() # save1: 32x32x80 (detached copy: no gradient flows back through this skip)
        x=self.effConv2(x)
            
        # output channel number is identical with resnet50 
        x=self.Conv1x1_0(x)
        
        # upsample
        x=self.upsample(x) 
        save2=x.detach().clone() # save2: 32x32x1024 (detached copy: no gradient flows back through this skip)

        # BottleNeck with Residual Connection 
        x=self.Conv1x1_1(x)
        x=x+save1
        x=self.Conv3x3_2(x)
        x=self.Conv1x1_3(x)
        x=x+save2 
        
        return x
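
Both skip additions in `forward` only work if the saved tensor and the current tensor have identical shapes. The sketch below traces (C, H, W) through the forward pass in plain Python, assuming the first part of the network runs at stride 8 and the full network at stride 32, as the summary printed for a 256x256 input shows. `check_add` and `trace_shapes` are illustrative helpers.

```python
def check_add(a, b):
    """Element-wise addition needs identical (C, H, W) shapes here."""
    if a != b:
        raise ValueError(f"shape mismatch: {a} vs {b}")
    return a


def trace_shapes(image_size=256):
    """Follow (C, H, W) through the forward pass of the class above."""
    s8, s32 = image_size // 8, image_size // 32
    x = (80, s8, s8)                # effConv1 output, kept as save1
    save1 = x
    x = (2560, s32, s32)            # effConv2 output
    x = (1024, s32, s32)            # Conv1x1_0
    x = (1024, s32 * 4, s32 * 4)    # upsample x4, kept as save2
    save2 = x
    x = (80, x[1], x[2])            # Conv1x1_1
    x = check_add(x, save1)         # first skip connection
    x = (80, x[1], x[2])            # Conv3x3_2 (padding=1 keeps H and W)
    x = (1024, x[1], x[2])          # Conv1x1_3
    x = check_add(x, save2)         # second skip connection
    return x


print(trace_shapes(256))  # (1024, 32, 32)
print(trace_shapes(128))  # (1024, 16, 16)
```

The fixed x4 upsampling is what brings the stride-32 map back to the stride-8 resolution of `save1`, so the two additions line up for any input size divisible by 32.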

In [8]:
model=EfficientNet.from_pretrained('efficientnet-b7')
model1=copy.deepcopy(model)
model2=copy.deepcopy(model)

# model1
model1._blocks = nn.Sequential(*list(model1._blocks.children())[0:18])
model1._conv_head=nn.Identity()
model1._bn1=nn.Identity()

# model2
model2._conv_stem=nn.Identity()
model2._bn0=nn.Identity()
model2._blocks = nn.Sequential(*list(model2._blocks.children())[18:55])

mymodel=MyEfficientNet(model1, model2)
input=torch.zeros(1, 3, 256, 256)
print(pytorch_model_summary.summary(mymodel, input, show_input=True))
print(mymodel.parameters)

Loaded pretrained weights for efficientnet-b7
-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
    EfficientNet-1      [1, 3, 256, 256]         982,268         982,268
    EfficientNet-2       [1, 80, 32, 32]      62,804,692      62,804,692
          Conv2d-3       [1, 2560, 8, 8]       2,622,464       2,622,464
     BatchNorm2d-4       [1, 1024, 8, 8]           2,048           2,048
            ReLU-5       [1, 1024, 8, 8]               0               0
        Upsample-6       [1, 1024, 8, 8]               0               0
          Conv2d-7     [1, 1024, 32, 32]          82,000          82,000
     BatchNorm2d-8       [1, 80, 32, 32]             160             160
            ReLU-9       [1, 80, 32, 32]               0               0
         Conv2d-10       [1, 80, 32, 32]          57,680          57,680
    BatchNorm2d-11       [1, 80, 32, 32]             160             160
    

### Execution flow of TransT's modified feature extractor

1. run_training.py creates the settings object and calls the run function in train_settings/transt/transt.py.
2. The run function initializes the fields of the settings object and calls the transt_effnet function in models/tracking/transt.py.
3. transt_effnet calls the build_backbone function in models/backbone/transt_backbone.py to load the backbone.
4. build_backbone calls the MyBackbone class constructor (instantiation) to define the backbone.
5. The Backbone constructor calls the effnet function in models/backbone/efficientnet.py to create the backbone.
6. effnet builds the model and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_effnet)
- models/backbone/transt_backbone.py (build_backbone, Backbone class <inherits from MyBackboneBase>)
- models/backbone/efficientnet.py (effnet)
- All paths are located under the TransT/ltr directory

P.S. Because the output type of MyEfficientNet differs from that of the original network, the BackboneBase class was modified to create the MyBackboneBase class.
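
As an illustration of the kind of mismatch involved, the sketch below wraps a bare output so that callers can still index it by layer name, the way the ResNet50 backbone's OrderedDict output is used. This is a hypothetical adapter, not the author's MyBackboneBase: the helper name `to_feature_dict` and the default key "layer3" are assumptions, and a placeholder string stands in for the tensor.

```python
from collections import OrderedDict


def to_feature_dict(output, layer_name="layer3"):
    """Wrap a bare backbone output so it can be indexed by layer name.

    Hypothetical helper: `layer_name` and the wrapping policy are
    assumptions, not taken from MyBackboneBase.
    """
    if isinstance(output, dict):
        # Already keyed by layer name (OrderedDict is a dict subclass).
        return output
    return OrderedDict([(layer_name, output)])


bare = "feature-tensor-placeholder"            # stands in for a torch.Tensor
wrapped = to_feature_dict(bare)
already_keyed = to_feature_dict(OrderedDict([("layer3", bare)]))

print(list(wrapped.keys()))                    # ['layer3']
print(wrapped["layer3"] is bare)               # True
```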