### Execution flow of TransT's feature extractor

1. run_training.py creates a settings object and calls the run function in train_settings/transt/transt.py.
2. run initializes the fields of the settings object and calls transt_resnet50 in models/tracking/transt.py.
3. To load the backbone, transt_resnet50 calls build_backbone in models/backbone/transt_backbone.py.
4. To create the backbone, build_backbone calls resnet50 in models/backbone/resnet.py.
5. resnet50 builds the model according to its parameters (output_layers, pretrained) and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_resnet50)
- models/backbone/transt_backbone.py (build_backbone) (output_layers is set to 'layer3')
- models/backbone/resnet.py (resnet50)
- All paths are relative to the TransT/ltr directory.
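
Setting output_layers to ['layer3'] means that only that stage's output is requested. A common way for a backbone to honor such a parameter works like this:

- run the stages in order;
- record each requested output as it is produced;
- stop as soon as every requested output has been collected.

The sketch below assumes that behavior. It models each stage only by its effect on the (C, H, W) shape. The stage list, the strides, and the helpers make_stage and run_until are hypothetical and are not taken from the TransT source.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)
executed: List[str] = []      # records which stages actually ran


def make_stage(name: str, out_channels: int, stride: int) -> Callable[[Shape], Shape]:
    """Hypothetical stage: models only the effect on the feature-map shape."""
    def stage(shape: Shape) -> Shape:
        executed.append(name)
        _, h, w = shape
        return (out_channels, h // stride, w // stride)
    return stage


def run_until(stages: Sequence[Tuple[str, Callable[[Shape], Shape]]],
              x: Shape, output_layers: List[str]) -> Dict[str, Shape]:
    """Run stages in order, keep requested outputs, stop once all are collected."""
    outputs: Dict[str, Shape] = {}
    for name, stage in stages:
        x = stage(x)
        if name in output_layers:
            outputs[name] = x
        if len(outputs) == len(output_layers):
            break  # remaining stages (here: layer4) are never executed
    return outputs


stages = [
    ("stem", make_stage("stem", 64, 4)),        # conv1 (stride 2) + maxpool (stride 2)
    ("layer1", make_stage("layer1", 256, 1)),
    ("layer2", make_stage("layer2", 512, 2)),
    ("layer3", make_stage("layer3", 1024, 1)),  # stride 1 in the modified ResNet50
    ("layer4", make_stage("layer4", 2048, 2)),
]

out = run_until(stages, (3, 256, 256), output_layers=["layer3"])
print(out)       # {'layer3': (1024, 32, 32)}
print(executed)  # ['stem', 'layer1', 'layer2', 'layer3']
```

In this sketch, layer4 is listed but never runs, so its output cannot reach the tracker.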

Load the libraries

```python
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary
```

Create the backbone network

```python
# From models/backbone/transt_backbone.py (build_backbone / Backbone, line 80)
output_layers = ['layer3']
pretrained = True
frozen_layers = ()
backbone = backbones.resnet50(output_layers=output_layers, pretrained=pretrained,
                              frozen_layers=frozen_layers)
```

ResNet50 (modified)
- Removed the last stage (layer4).
- Changed the stride of the downsampling unit in layer3 from 2 to 1, which increases the feature resolution.
- Replaced the 3x3 convolutions in layer3 with dilated convolutions (dilation rate 2), which enlarges the receptive field.
- Final output: 1024 x H/8 x W/8
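
The resolution claims above can be checked with the standard convolution output-size formula, the one given in the PyTorch Conv2d documentation. The sketch below rests on these assumptions:

- the usual ResNet50 stem: a 7x7 conv with stride 2, then a 3x3 max-pool with stride 2;
- downsampling on the 3x3 conv of each stage's first block;
- padding equal to the dilation rate in the modified layer3.

The helpers conv_out and effective_kernel are illustrative.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length along one axis for Conv2d/MaxPool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def effective_kernel(kernel: int, dilation: int) -> int:
    """Spatial span covered by a dilated kernel: d * (k - 1) + 1."""
    return dilation * (kernel - 1) + 1


h = 256
h = conv_out(h, kernel=7, stride=2, padding=3)  # stem conv1 -> 128
h = conv_out(h, kernel=3, stride=2, padding=1)  # maxpool    -> 64
# layer1 keeps the resolution (stride 1)        -> 64
h = conv_out(h, kernel=3, stride=2, padding=1)  # layer2 downsampling 3x3 conv -> 32

h_orig = conv_out(h, kernel=3, stride=2, padding=1)              # original layer3: stride 2 -> 16
h_mod = conv_out(h, kernel=3, stride=1, padding=2, dilation=2)   # modified layer3: stride 1, dilation 2 -> 32

print(h_orig, h_mod, 256 // h_mod)                     # 16 32 8
print(effective_kernel(3, 1), effective_kernel(3, 2))  # 3 5
```

Removing the stride keeps layer3 at 32x32 for a 256x256 input, which is an overall factor of 1/8. The dilation widens each 3x3 kernel's span from 3 to 5 pixels without adding parameters.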

```python
# Input: search region (256x256x3), batch size 38
# Output: search feature (32x32x1024)
x = torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(backbone, x, show_input=True))
# Printing the bound method also prints the module's repr, i.e. the full layer list
print(backbone.parameters)
```

```text
--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
          Conv2d-1      [38, 3, 256, 256]           9,408           9,408
     BatchNorm2d-2     [38, 64, 128, 128]             128             128
            ReLU-3     [38, 64, 128, 128]               0               0
       MaxPool2d-4     [38, 64, 128, 128]               0               0
      Bottleneck-5       [38, 64, 64, 64]          75,008          75,008
      Bottleneck-6      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-7      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-8      [38, 256, 64, 64]         379,392         379,392
      Bottleneck-9      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-10      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-11      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-12      [38, 512, 32,
```

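EfficientNet B2 (modified)
- Removed the final classification head (AdaptiveAvgPool2d, Dropout, Linear), leaving only the feature extractor.
- Changed the feature extractor's output channels from 1408 to 1024.
- Added bilinear interpolation after the modified EfficientNet so that its output matches the original backbone's. The features must be downsampled by 1/8 overall, but EfficientNet downsamples by 1/32.
- Final output: 1024 x H/8 x W/8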
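
The upsampling factor follows from the two strides. A 256x256 input gives an 8x8 map at stride 32, while stride 8 requires 32x32, so the factor is 4.

The sketch below also estimates how many parameters the 1408 -> 1024 channel change removes. It assumes the efficientnet_pytorch head layout: a bias-free 1x1 conv from 352 input channels, followed by BatchNorm with one weight and one bias per channel. The helper names are hypothetical.

```python
def upsample_factor(input_size: int, backbone_stride: int, target_stride: int) -> int:
    """Factor that turns a stride-`backbone_stride` map into a stride-`target_stride` map.

    Assumes input_size is a multiple of backbone_stride (true for 256 and 32).
    """
    feat = input_size // backbone_stride   # 256 // 32 = 8
    target = input_size // target_stride   # 256 // 8 = 32
    if target % feat != 0:
        raise ValueError("target size is not an integer multiple of the feature size")
    return target // feat


def head_params(in_ch: int, out_ch: int) -> int:
    """Assumed head layout: bias-free 1x1 conv + BatchNorm (weight and bias per channel)."""
    return in_ch * out_ch + 2 * out_ch


factor = upsample_factor(256, backbone_stride=32, target_stride=8)
saved = head_params(352, 1408) - head_params(352, 1024)  # 352 = assumed B2 last-block channels
print(factor)  # 4
print(saved)   # 135936
```

The factor of 4 is the scale_factor passed to nn.Upsample in the wrapper class. The parameter estimate covers only the head conv and its BatchNorm.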

```python
# Locally edited copy of the efficientnet_pytorch package (head modified as described above)
from efficientnet_pytorch_edit import EfficientNet
```

```python
class MyEfficientNet(nn.Module):
    def __init__(self, efficientnet):
        super().__init__()
        # Modified EfficientNet (feature extractor only, 1024 output channels)
        self.effConv = efficientnet
        # The features should be downsampled by 1/8 but are downsampled by 1/32,
        # so scale_factor=4 enlarges the width and height by 4x each
        self.upsample = nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False)
        # No-op layer; its "Input Shape" row in the summary shows the upsampled shape
        self.none = nn.Identity()

    def forward(self, x):
        x = self.effConv(x)
        x = self.upsample(x)
        x = self.none(x)
        return x

efficientnet = EfficientNet.from_pretrained('efficientnet-b2')
myefficientnet = MyEfficientNet(efficientnet)
```

```text
Loaded pretrained weights for efficientnet-b2
```

```python
# Input: search region (256x256x3), batch size 38
# Output: search feature (32x32x1024)
x = torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(myefficientnet, x, show_input=True))
# Printing the bound method also prints the module's repr, i.e. the full layer list
print(myefficientnet.parameters)
```

```text
--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
    EfficientNet-1      [38, 3, 256, 256]       7,565,058       7,565,058
        Upsample-2       [38, 1024, 8, 8]               0               0
        Identity-3     [38, 1024, 32, 32]               0               0
Total params: 7,565,058
Trainable params: 7,565,058
Non-trainable params: 0
--------------------------------------------------------------------------
<bound method Module.parameters of MyEfficientNet(
  (effConv): EfficientNet(
    (_conv_stem): Conv2dStaticSamePadding(
      3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False
      (static_padding): ZeroPad2d(padding=(0, 1, 0, 1), value=0.0)
    )
    (_bn0): BatchNorm2d(32, eps=0.001, momentum=0.010000000000000009, affine=True, track_running_stats=True)
    (_blocks): ModuleList(
      (0): MBConvBlock(
        (_depthwise_conv): Conv2dStaticSamePadding(
          32,
```
### Execution flow of TransT's modified feature extractor

1. run_training.py creates a settings object and calls the run function in train_settings/transt/transt.py.
2. run initializes the fields of the settings object and calls transt_effnet in models/tracking/transt.py.
3. To load the backbone, transt_effnet calls build_backbone in models/backbone/transt_backbone.py.
4. To create the backbone, build_backbone calls effnet in models/backbone/efficientnet.py.
5. effnet builds the model and returns it.

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_effnet)
- models/backbone/transt_backbone.py (build_backbone) 
- models/backbone/efficientnet.py (effnet) 
- All paths are relative to the TransT/ltr directory.
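
Swapping the backbone works because both backbones present the same interface to the rest of TransT: 1024 channels at stride 8. The sketch below illustrates this with a hypothetical name-based dispatcher. BackboneSpec, build_backbone_sketch, and the stub factories are assumptions made for illustration. They do not reproduce the actual build_backbone, resnet50, or effnet signatures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class BackboneSpec:
    """What downstream code is assumed to rely on: channel count and output stride."""
    name: str
    num_channels: int
    stride: int


def resnet50_stub() -> BackboneSpec:
    # Hypothetical stand-in for resnet50 in models/backbone/resnet.py
    return BackboneSpec("resnet50", num_channels=1024, stride=8)


def effnet_stub() -> BackboneSpec:
    # Hypothetical stand-in for effnet in models/backbone/efficientnet.py
    return BackboneSpec("effnet", num_channels=1024, stride=8)


BACKBONES: Dict[str, Callable[[], BackboneSpec]] = {
    "resnet50": resnet50_stub,
    "effnet": effnet_stub,
}


def build_backbone_sketch(name: str) -> BackboneSpec:
    """Hypothetical dispatcher; not the real build_backbone signature."""
    if name not in BACKBONES:
        raise ValueError(f"unknown backbone: {name!r}")
    return BACKBONES[name]()


def feature_shape(spec: BackboneSpec, h: int, w: int) -> Tuple[int, int, int]:
    """(C, H, W) of the backbone output for an h x w input."""
    return (spec.num_channels, h // spec.stride, w // spec.stride)


for name in ("resnet50", "effnet"):
    print(name, feature_shape(build_backbone_sketch(name), 256, 256))
# resnet50 (1024, 32, 32)
# effnet (1024, 32, 32)
```

Because both entries produce the same (C, H, W), everything downstream of build_backbone can stay unchanged.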