### Execution flow of the TransT feature extractor

1. run_training.py creates a settings object and calls the run function in train_settings/transt/transt.py.
2. run initializes the fields of the settings object and calls the transt_resnet50 function in models/tracking/transt.py.
3. transt_resnet50 calls the build_backbone function in models/backbone/transt_backbone.py to load the backbone.
4. build_backbone calls the Backbone class constructor (instantiation) to define the backbone.
5. The Backbone constructor calls the resnet50 function in models/backbone/resnet.py to build the network.
6. resnet50 builds the model according to its parameters (output_layers, pretrained) and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_resnet50)
- models/backbone/transt_backbone.py (build_backbone, Backbone class <inherits from BackboneBase>)
- models/backbone/resnet.py (resnet50) 
- All paths are relative to the TransT/ltr directory
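The delegation order above can be made explicit with plain Python stubs that record each call. This is a hypothetical mock: the names mirror the files listed, but the signatures and bodies are assumptions, not TransT's actual code.

```python
# Hypothetical stubs mirroring the TransT call chain; signatures are assumptions.
CALLS = []

def resnet50(output_layers, pretrained):
    # models/backbone/resnet.py: would build the truncated ResNet50
    CALLS.append('resnet50')
    return {'arch': 'resnet50', 'output_layers': output_layers, 'pretrained': pretrained}

class Backbone:
    # models/backbone/transt_backbone.py: wraps the network returned by resnet50
    def __init__(self, output_layers, pretrained):
        CALLS.append('Backbone.__init__')
        self.body = resnet50(output_layers=output_layers, pretrained=pretrained)

def build_backbone(settings):
    # models/backbone/transt_backbone.py
    CALLS.append('build_backbone')
    return Backbone(output_layers=['layer3'], pretrained=True)

def transt_resnet50(settings):
    # models/tracking/transt.py: the real function also builds the rest of the tracker
    CALLS.append('transt_resnet50')
    return {'backbone': build_backbone(settings)}

def run(settings):
    # train_settings/transt/transt.py
    CALLS.append('run')
    return transt_resnet50(settings)

net = run(settings={})  # run_training.py would pass its settings object here
print(' -> '.join(CALLS))
# run -> transt_resnet50 -> build_backbone -> Backbone.__init__ -> resnet50
```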

Load libraries

In [13]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary

Build the backbone network

In [5]:
# models/backbone/transt_backbone.py (build_backbone-Backbone, line:80)
output_layers=['layer3']
pretrained=True
frozen_layers=()
backbone = backbones.resnet50(output_layers=output_layers, pretrained=pretrained,
                              frozen_layers=frozen_layers)

ResNet50 (modified)
- Removes the last stage (layer4)
- Changes the stride of the downsampling unit in layer3 from 2 to 1 -> higher feature resolution
- Replaces the 3x3 Conv2D in layer3 with a dilated convolution (dilation rate 2) -> larger receptive field
- Final output: 1024 x W/8 x H/8 [type: collections.OrderedDict]

collections.OrderedDict: a mapping from each output layer name to its feature tensor (here a single entry, 'layer3')
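A small PyTorch check of the two layer3 changes: a stride-2 3x3 convolution halves the feature map, while a stride-1, dilation-2 3x3 convolution keeps the size and widens the kernel's span from 3x3 to 5x5. These are stand-in layers with reduced channel counts, not the layers inside ltr.models.backbone.resnet50.

```python
import torch
import torch.nn as nn

def effective_kernel(k, dilation):
    # span (in pixels) covered by a k x k kernel with the given dilation rate
    return dilation * (k - 1) + 1

# a 32x32 map entering layer3 for a 256x256 input (channels reduced to 16 for speed)
x = torch.zeros(1, 16, 32, 32)

# original downsampling unit: 3x3 conv with stride 2 -> halves the resolution
strided = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
# modified unit: 3x3 conv with stride 1 and dilation 2; padding=2 preserves the size
dilated = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=2, dilation=2)

print(strided(x).shape)  # torch.Size([1, 16, 16, 16])
print(dilated(x).shape)  # torch.Size([1, 16, 32, 32])
print(effective_kernel(3, 1), effective_kernel(3, 2))  # 3 5

# total stride of the modified network: conv1, maxpool, layer1, layer2, layer3 (layer4 removed)
total = 1
for s in [2, 2, 1, 2, 1]:
    total *= s
print(total, 256 // total)  # 8 32
```

The total stride of 8 is why a 256x256 search region yields the 32x32 feature map (W/8 x H/8) listed above.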

In [6]:
# Input: SearchRegion (256x256x3), batch_size:38
# Output: SearchFeature (32x32x1024)
input=torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(backbone, input, show_input=True))
print(backbone)  # print the layer structure
print(type(backbone(input)))

--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
          Conv2d-1      [38, 3, 256, 256]           9,408           9,408
     BatchNorm2d-2     [38, 64, 128, 128]             128             128
            ReLU-3     [38, 64, 128, 128]               0               0
       MaxPool2d-4     [38, 64, 128, 128]               0               0
      Bottleneck-5       [38, 64, 64, 64]          75,008          75,008
      Bottleneck-6      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-7      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-8      [38, 256, 64, 64]         379,392         379,392
      Bottleneck-9      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-10      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-11      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-12      [38, 512, 32,

EfficientNet B4
- Removes the final classification layers (AdaptiveAvgPool2d, Dropout, Linear) -> only the feature extractor remains
- Adds bilinear interpolation around the modified EfficientNet (to match EfficientNet's native input resolution and the original backbone's output resolution)
- The bottleneck layer is optional
- Final output: 1024 x W/8 x H/8 [type: torch.Tensor]
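The resize-run-resize idea can be sketched independently of EfficientNet. In this hypothetical ResizeWrapper, the input crop is resized to the backbone's native resolution, the backbone runs, the features are resized to input/8, and a 1x1 convolution sets the channel count. A tiny stand-in module replaces the real EfficientNet (380x380 -> 12x12), and the channel counts are shrunk so the example runs quickly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResizeWrapper(nn.Module):
    """Hypothetical sketch: run a fixed-resolution backbone, then resize features to input/stride."""

    def __init__(self, backbone, backbone_in=380, backbone_out_ch=1792, out_ch=1024, stride=8):
        super().__init__()
        self.backbone = backbone
        self.backbone_in = backbone_in
        self.stride = stride
        # 1x1 projection so the channel count matches the ResNet50 layer3 output
        self.proj = nn.Sequential(
            nn.Conv2d(backbone_out_ch, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        # resize the crop to the backbone's native input resolution
        x = F.interpolate(x, size=(self.backbone_in, self.backbone_in), mode='bilinear', align_corners=False)
        x = self.backbone(x)
        # resize features to the resolution the ResNet50 backbone would produce
        x = F.interpolate(x, size=(h // self.stride, w // self.stride), mode='bilinear', align_corners=False)
        return self.proj(x)

# stand-in for the truncated EfficientNet: 380x380 input -> 12x12 features, 64 channels
stub = nn.Sequential(nn.AdaptiveAvgPool2d(12), nn.Conv2d(3, 64, kernel_size=1))
net = ResizeWrapper(stub, backbone_in=380, backbone_out_ch=64, out_ch=32).eval()

with torch.no_grad():
    search = net(torch.zeros(2, 3, 256, 256))
    template = net(torch.zeros(2, 3, 128, 128))
print(search.shape, template.shape)  # torch.Size([2, 32, 32, 32]) torch.Size([2, 32, 16, 16])
```

Passing an explicit size derived from the input, instead of a fixed scale_factor, handles both search regions and templates without separate branches. This is a design alternative to the MyEfficientNet class in this notebook, which hard-codes both sizes.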

In [2]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary
import math
from efficientnet_pytorch_edit import EfficientNet

In [2]:
class MyEfficientNet(nn.Module):
    def __init__(self,efficientnet,eff_in_size=380,eff_out_chnl=1792):
        super().__init__()

        # EfficientNet Input Size (default:380x380)
        self.EffInSize=eff_in_size
        # EfficientNet Output Size (default:12x12)
        self.EffOutSize=math.ceil(eff_in_size/32)

        # Search Region Size
        self.SearchImageSize=256
        self.SearchFeatureSize=32
        # Template Size
        self.TemplateImageSize=128
        self.TemplateFeatureSize=16
        
        # resize with explicit output sizes: with scale_factor the output size is
        # floor(input * scale), so ratios such as 32/12 may come out one pixel short
        self.upsampleSin=nn.Upsample(size=(self.EffInSize, self.EffInSize), mode='bilinear', align_corners=False)
        self.upsampleSout=nn.Upsample(size=(self.SearchFeatureSize, self.SearchFeatureSize), mode='bilinear', align_corners=False)
        self.upsampleTin=nn.Upsample(size=(self.EffInSize, self.EffInSize), mode='bilinear', align_corners=False)
        self.upsampleTout=nn.Upsample(size=(self.TemplateFeatureSize, self.TemplateFeatureSize), mode='bilinear', align_corners=False)

        # EfficientNet Feature Extractor
        self.effConv = efficientnet

        # 1x1 conv: map to 1024 channels to match the ResNet50 layer3 output
        self.stage1=nn.Sequential(
            nn.Conv2d(in_channels=eff_out_chnl,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )

        # BottleNeck
        self.stage2=nn.Sequential(
            nn.Conv2d(in_channels=1024,out_channels=512,kernel_size=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,out_channels=512,kernel_size=3,padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(),
            nn.Conv2d(in_channels=512,out_channels=1024,kernel_size=1),
            nn.BatchNorm2d(1024)
        )
        self.relu=nn.ReLU()

    def forward(self,x):
        # input width (N, C, H, W): 256 for a search region, 128 for a template
        y=x.size(dim=3)

        # Search Region
        if y == self.SearchImageSize:
            x=self.upsampleSin(x) # (256x256) -> (380x380)
            x = self.effConv(x) # (380x380) -> (12x12)
            x=self.upsampleSout(x) # (12x12) -> (32x32)
        # Template
        elif y==self.TemplateImageSize:
            x=self.upsampleTin(x) # (128x128) -> (380x380)
            x = self.effConv(x) # (380x380) -> (12x12)
            x=self.upsampleTout(x) # (12x12) -> (16x16)
        else:
            raise ValueError(f'unsupported input width {y}; expected {self.SearchImageSize} or {self.TemplateImageSize}')

        # match ResNet50's 1024 output channels
        # (32x32x1792) -> (32x32x1024), (16x16x1792) -> (16x16x1024)
        x=self.stage1(x)
        
        # BottleNeck with Residual Connection 
        fx=self.stage2(x) # F(x) 
        x=fx+x  # F(x)+x
        x=self.relu(x)
        
        return x

In [120]:
class MyEfficientNet2(nn.Module):
    def __init__(self,efficientnet):
        super().__init__()
        # EfficientNet Feature Extractor
        self.effConv = efficientnet

        # expand 80 -> 1024 channels to match the ResNet50 layer3 output
        self.stage=nn.Sequential(
            nn.Conv2d(in_channels=80,out_channels=160,kernel_size=1),
            nn.BatchNorm2d(160),
            nn.ReLU(),
            nn.Conv2d(in_channels=160,out_channels=320,kernel_size=3,padding=1),
            nn.BatchNorm2d(320),
            nn.ReLU(),
            nn.Conv2d(in_channels=320,out_channels=640,kernel_size=1),
            nn.BatchNorm2d(640),
            nn.ReLU(),
            nn.Conv2d(in_channels=640,out_channels=1024,kernel_size=3,padding=1),
            nn.BatchNorm2d(1024),
            nn.ReLU()
        )
    
    def forward(self,x):
        x=self.effConv(x)
        x=self.stage(x)
        return x

In [136]:
def freeze_model(model):
    # Freeze the top-level BatchNorm2d (_bn0) and the Sequential of MBConv blocks (_blocks);
    # the stem conv (_conv_stem) stays trainable.
    # Note: requires_grad=False does not stop BatchNorm running-stat updates in train mode.
    for child in model.children():
        print(type(child))
        if isinstance(child, (nn.BatchNorm2d, nn.Sequential)):
            for param in child.parameters():
                param.requires_grad = False
    return model
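The effect of this kind of freezing can be checked on a toy model with the same kinds of top-level children (a stem conv, a BatchNorm2d, a Sequential of blocks). ToyNet and the helper names are hypothetical. The count shows why only the stem conv stays trainable, and the last part shows that frozen BatchNorm layers still update their running statistics until they are put in eval mode.

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    # hypothetical stand-in with the same kinds of top-level children as the EfficientNet
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)                # 216 params
        self.bn = nn.BatchNorm2d(8)                                                       # 16 params
        self.blocks = nn.Sequential(nn.Conv2d(8, 8, kernel_size=1), nn.BatchNorm2d(8))   # 72 + 16 params

    def forward(self, x):
        return self.blocks(self.bn(self.stem(x)))

def freeze_bn_and_blocks(model):
    for child in model.children():
        if isinstance(child, (nn.BatchNorm2d, nn.Sequential)):
            for p in child.parameters():
                p.requires_grad = False
    return model

def trainable_params(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

net = freeze_bn_and_blocks(ToyNet())
print(trainable_params(net))  # 216: only the stem conv is trainable

# requires_grad=False does not stop BatchNorm from updating running statistics in train mode
net.train()
before = net.bn.running_mean.clone()
net(torch.randn(4, 3, 8, 8))
updated_in_train = not torch.equal(before, net.bn.running_mean)

# putting the BatchNorm layers in eval mode keeps the statistics fixed
for m in net.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
before = net.bn.running_mean.clone()
net(torch.randn(4, 3, 8, 8))
fixed_in_eval = torch.equal(before, net.bn.running_mean)
print(updated_in_train, fixed_in_eval)  # True True
```

This matches the summary of the truncated EfficientNet, where the only trainable parameters (1,728) belong to the stem convolution. Note that a later call to model.train() switches the BatchNorm layers back to train mode.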

In [137]:
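# MyEfficientNet settings per variant (not used in this cell):
# EfficientNet B2 -> eff_in_size=260, eff_out_chnl=1408
# EfficientNet B3 -> eff_in_size=300, eff_out_chnl=1536
# EfficientNet B4 -> eff_in_size=380, eff_out_chnl=1792 [default]

# This cell uses EfficientNet-B7 truncated after its third stage, wrapped in MyEfficientNet2
model=EfficientNet.from_pretrained('efficientnet-b7')
# keep the first 18 of 55 MBConv blocks -> 80 channels at stride 8 (32x32 for a 256x256 input)
model._blocks=nn.Sequential(*list(model._blocks.children())[:-37])
model._conv_head=nn.Identity()
model._bn1=nn.Identity()
model=freeze_model(model)
myefficientnet=MyEfficientNet2(model)
input=torch.zeros(1, 3, 256, 256)  # search region, batch size 1
print(pytorch_model_summary.summary(myefficientnet, input, show_input=True))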

Loaded pretrained weights for efficientnet-b7
<class 'efficientnet_pytorch_edit.utils.Conv2dStaticSamePadding'>
<class 'torch.nn.modules.batchnorm.BatchNorm2d'>
<class 'torch.nn.modules.container.Sequential'>
<class 'torch.nn.modules.linear.Identity'>
<class 'torch.nn.modules.linear.Identity'>
<class 'efficientnet_pytorch_edit.utils.MemoryEfficientSwish'>
-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
    EfficientNet-1      [1, 3, 256, 256]         982,268           1,728
          Conv2d-2       [1, 80, 32, 32]          12,960          12,960
     BatchNorm2d-3      [1, 160, 32, 32]             320             320
            ReLU-4      [1, 160, 32, 32]               0               0
          Conv2d-5      [1, 160, 32, 32]         461,120         461,120
     BatchNorm2d-6      [1, 320, 32, 32]             640             640
            ReLU-7      [1, 320, 32, 32]               0 
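The [:-37] slice and in_channels=80 can be checked against EfficientNet's compound-scaling arithmetic. The stage table and rounding rules below are restated from the efficientnet_pytorch reference implementation and are assumed to be unchanged in the edited efficientnet_pytorch_edit copy.

```python
import math

# B0 baseline stage table (repeats, output channels, stride per stage)
BASE_REPEATS = [1, 2, 2, 3, 3, 4, 1]
BASE_OUT_CH = [16, 24, 40, 80, 112, 192, 320]
STAGE_STRIDES = [1, 2, 2, 2, 1, 2, 1]

def round_repeats(repeats, depth_coef):
    return int(math.ceil(depth_coef * repeats))

def round_filters(filters, width_coef, divisor=8):
    # same rounding rule as efficientnet_pytorch's round_filters (min_depth = divisor)
    filters *= width_coef
    new = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    if new < 0.9 * filters:
        new += divisor
    return int(new)

# EfficientNet-B7 coefficients: width 2.0, depth 3.1
repeats = [round_repeats(r, 3.1) for r in BASE_REPEATS]
print(repeats, sum(repeats))   # [4, 7, 7, 10, 10, 13, 4] 55

kept = sum(repeats) - 37       # model._blocks[:-37]
print(kept, sum(repeats[:3]))  # 18 18 -> exactly the first three stages survive

out_ch = round_filters(BASE_OUT_CH[2], 2.0)
stride = 2 * math.prod(STAGE_STRIDES[:3])  # stem stride 2 times stages 1-3
print(out_ch, stride, 256 // stride)       # 80 8 32

# head channels (eff_out_chnl) for the B2/B3/B4 settings used with MyEfficientNet
print([round_filters(1280, w) for w in (1.1, 1.2, 1.4)])  # [1408, 1536, 1792]
```

The result agrees with the summary above, where the first layer after the EfficientNet receives a [1, 80, 32, 32] tensor.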

In [131]:
# Input: SearchRegion (256x256x3), batch_size:1
# Compare the full pretrained EfficientNet-B7 with the truncated MyEfficientNet2 (output: 32x32x1024)
input=torch.zeros(1, 3, 256, 256)
print(pytorch_model_summary.summary(EfficientNet.from_pretrained('efficientnet-b7'), input, show_input=True))
print(pytorch_model_summary.summary(myefficientnet, input, show_input=True))

Loaded pretrained weights for efficientnet-b7
------------------------------------------------------------------------------------
                 Layer (type)           Input Shape         Param #     Tr. Param #
    Conv2dStaticSamePadding-1      [1, 3, 256, 256]           1,728           1,728
                BatchNorm2d-2     [1, 64, 128, 128]             128             128
       MemoryEfficientSwish-3     [1, 64, 128, 128]               0               0
                MBConvBlock-4     [1, 64, 128, 128]           4,944           4,944
                MBConvBlock-5     [1, 32, 128, 128]           1,992           1,992
                MBConvBlock-6     [1, 32, 128, 128]           1,992           1,992
                MBConvBlock-7     [1, 32, 128, 128]           1,992           1,992
                MBConvBlock-8     [1, 32, 128, 128]          21,224          21,224
                MBConvBlock-9       [1, 48, 64, 64]          38,700          38,700
               MBConvBlock-10

### Execution flow of the modified TransT feature extractor

1. run_training.py creates a settings object and calls the run function in train_settings/transt/transt.py.
2. run initializes the fields of the settings object and calls the transt_effnet function in models/tracking/transt.py.
3. transt_effnet calls the build_backbone function in models/backbone/transt_backbone.py to load the backbone.
4. build_backbone calls the MyBackbone class constructor (instantiation) to define the backbone.
5. The MyBackbone constructor calls the effnet function in models/backbone/efficientnet.py to build the network.
6. effnet builds the model and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_effnet)
- models/backbone/transt_backbone.py (build_backbone, MyBackbone class <inherits from MyBackboneBase>)
- models/backbone/efficientnet.py (effnet) 
- All paths are relative to the TransT/ltr directory

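P.S. Because MyEfficientNet returns a different output type from the original network (torch.Tensor instead of collections.OrderedDict), the BackboneBase class was modified into a new MyBackboneBase class.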
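An alternative to modifying BackboneBase is to wrap the new network so it returns the same container type as the ResNet backbone. This is a hypothetical sketch: TensorToDict and the 'layer3' key are assumptions chosen to match the ResNet output described earlier, and this is not the MyBackboneBase class itself.

```python
from collections import OrderedDict

import torch
import torch.nn as nn

class TensorToDict(nn.Module):
    """Hypothetical adapter: give a Tensor-returning backbone the ResNet-style OrderedDict output."""

    def __init__(self, body, layer_name='layer3'):
        super().__init__()
        self.body = body
        self.layer_name = layer_name

    def forward(self, x):
        out = self.body(x)
        if isinstance(out, torch.Tensor):
            out = OrderedDict([(self.layer_name, out)])
        return out

# stand-in for MyEfficientNet: any module mapping (N, 3, 256, 256) -> (N, C, 32, 32)
body = nn.Conv2d(3, 8, kernel_size=3, stride=8, padding=1)
backbone = TensorToDict(body)
feats = backbone(torch.zeros(1, 3, 256, 256))
print(type(feats).__name__, list(feats.keys()), feats['layer3'].shape)
# OrderedDict ['layer3'] torch.Size([1, 8, 32, 32])
```

With this approach, code that indexes the backbone output by layer name keeps working unchanged, at the cost of one extra wrapper module.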