### Execution flow of the TransT feature extractor

1. run_training.py creates the settings object and calls the run function in train_settings/transt/transt.py
2. The run function initializes the fields of the settings object and calls the transt_resnet50 function in models/tracking/transt.py
3. To load the backbone, transt_resnet50 calls the build_backbone function in models/backbone/transt_backbone.py
4. To define the backbone, build_backbone calls the Backbone class constructor (instantiation)
5. To create the backbone, the Backbone constructor calls the resnet50 function in models/backbone/resnet.py
6. resnet50 builds the model according to its parameters (output_layers, pretrained) and returns it

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_resnet50)
- models/backbone/transt_backbone.py (build_backbone, Backbone class <inherits BackboneBase>)
- models/backbone/resnet.py (resnet50)
- All paths are relative to the TransT/ltr directory
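
The delegation chain in the steps above can be pictured with a toy sketch. Every function body here is a hypothetical stand-in that only records the call order; none of it is the real TransT code, and the argument lists are simplified assumptions.

```python
# Toy stand-ins for the call chain; the bodies are NOT the real TransT code.
calls = []

def resnet50(output_layers, pretrained):
    # Stand-in for models/backbone/resnet.py: returns a description, not a network.
    calls.append("resnet.resnet50")
    return {"output_layers": list(output_layers), "pretrained": pretrained}

class Backbone:
    # Stand-in for the Backbone class in models/backbone/transt_backbone.py.
    def __init__(self, output_layers, pretrained):
        calls.append("transt_backbone.Backbone")
        self.body = resnet50(output_layers, pretrained)

def build_backbone(settings, backbone_pretrained=True):
    calls.append("transt_backbone.build_backbone")
    return Backbone(output_layers=["layer3"], pretrained=backbone_pretrained)

def transt_resnet50(settings):
    calls.append("transt.transt_resnet50")
    return build_backbone(settings, backbone_pretrained=True)

def run(settings):
    calls.append("train_settings.transt.run")
    return transt_resnet50(settings)

backbone = run(settings={})
print(" -> ".join(calls))
```

Each level only knows about the next one, which is why swapping the backbone later touches just the last links of the chain.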

Loading libraries

In [1]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
from ltr.models.tracking.transt import *
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary

Prediction Head Network

In [3]:
input=torch.zeros(1,38,1024,256)
print(pytorch_model_summary.summary(Attention_and_MLP(256, 256, 2, 3), input, show_input=True))

--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
     SpatialGate-1     [1, 38, 1024, 256]             100             100
          Linear-2     [1, 38, 1024, 256]          65,792          65,792
          Linear-3     [1, 38, 1024, 256]          65,792          65,792
          Linear-4     [1, 38, 1024, 256]             514             514
Total params: 132,198
Trainable params: 132,198
Non-trainable params: 0
--------------------------------------------------------------------------
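
The parameter counts in the summary above can be checked by hand. The sketch below assumes the three Linear layers map 256 to 256, 256 to 256, and 256 to 2 with bias, which is what the counts 65,792 and 514 suggest; the SpatialGate count is copied from the table rather than derived.

```python
def linear_params(in_features, out_features, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

spatial_gate = 100                   # taken from the summary table as-is
hidden_1 = linear_params(256, 256)   # Linear-2
hidden_2 = linear_params(256, 256)   # Linear-3
output = linear_params(256, 2)       # Linear-4 (assumed 256 -> 2)

total = spatial_gate + hidden_1 + hidden_2 + output
print(hidden_1, output, total)
```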


Building the full network

In [4]:
settings = ws_settings.Settings()

# Transformer
settings.position_embedding = 'sine'
settings.hidden_dim = 256
settings.dropout = 0.1
settings.nheads = 8
settings.dim_feedforward = 2048
settings.featurefusion_layers = 4
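
As a quick sanity check on these values, the sketch below derives the per-head dimension and the size of one feed-forward block. It uses a plain `SimpleNamespace` as a stand-in for the real `Settings` object, and the helper `linear_params` is hypothetical.

```python
from types import SimpleNamespace

# Stand-in for ws_settings.Settings(); only the fields used here are set.
settings = SimpleNamespace(hidden_dim=256, nheads=8, dim_feedforward=2048)

def linear_params(in_features, out_features):
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features

# Multi-head attention splits hidden_dim evenly across the heads.
head_dim, remainder = divmod(settings.hidden_dim, settings.nheads)
assert remainder == 0, "hidden_dim must be divisible by nheads"

# One feed-forward block: hidden_dim -> dim_feedforward -> hidden_dim.
ffn_params = (linear_params(settings.hidden_dim, settings.dim_feedforward)
              + linear_params(settings.dim_feedforward, settings.hidden_dim))
print(head_dim, ffn_params)
```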

In [5]:
backbone_net = my_build_backbone(settings, backbone_pretrained=True)
featurefusion_network = build_featurefusion_network(settings)
num_classes = 1

/home/seinkwon/ahnsunghyun/TransT2


  init.kaiming_normal(self.state_dict()[key], mode='fan_out')


pretrained resnet_cbam...ok!


In [6]:
model = TransT(
        backbone_net,
        featurefusion_network,
        num_classes=num_classes
    )
print(model.parameters)

<bound method Module.parameters of TransT(
  (featurefusion_network): FeatureFusionNetwork(
    (encoder): Encoder(
      (layers): ModuleList(
        (0): FeatureFusionLayer(
          (self_attn1): MultiheadAttention(
            (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
          )
          (self_attn2): MultiheadAttention(
            (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn1): MultiheadAttention(
            (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
          )
          (multihead_attn2): MultiheadAttention(
            (out_proj): _LinearWithBias(in_features=256, out_features=256, bias=True)
          )
          (linear11): Linear(in_features=256, out_features=2048, bias=True)
          (dropout1): Dropout(p=0.1, inplace=False)
          (linear12): Linear(in_features=2048, out_features=256, bias=True)
          (linear21): Linear(in_features=

Building the backbone network (ResNet_CBAM)

In [2]:
backbone = backbones.myresnet(pretrained=True)

/home/seinkwon/ahnsunghyun/TransT2


  init.kaiming_normal(self.state_dict()[key], mode='fan_out')


In [3]:
# Input: SearchRegion (256x256x3), batch_size:38
# Output: SearchFeature (32x32x1024)
input=torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(backbone, input, show_input=True))
print(backbone.parameters)
print(type(backbone(input)))



-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
          ResNet-1     [38, 3, 256, 256]       9,496,196       9,496,196
Total params: 9,496,196
Trainable params: 9,496,196
Non-trainable params: 0
-------------------------------------------------------------------------
<bound method Module.parameters of Model(
  (module): ResNet(
    (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (avgpool): AvgPool2d(kernel_size=7, stride=7, padding=0)
    (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu): ReLU(inplace=True)
    (layer1): Sequential(
      (0): Bottleneck(
        (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_ru

Building the backbone network

In [3]:
# models/backbone/transt_backbone.py (build_backbone-Backbone, line:80)
output_layers=['layer3']
pretrained=True
frozen_layers=()
backbone = backbones.resnet50(output_layers=output_layers, pretrained=pretrained,
                                      frozen_layers=frozen_layers)

ResNet50 (modified)

- The last stage (layer4) is removed
- The Conv2D layers in layer3 are changed to dilated convolutions (dilation 2) to enlarge the receptive field
- Final output: 1024 x W/8 x H/8 [type: collections.OrderedDict]

collections.OrderedDict: a mapping made up of [output_layer_name, tensor] pairs
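
The W/8 x H/8 output size follows from the standard convolution output-size formula. The sketch below traces only the spatial size through one strided operation per stage. The kernel, stride, and padding values are the usual ResNet50 ones, with layer3 assumed to use stride 1 and dilation 2 as described above; the result is returned in the same OrderedDict shape as the backbone output, with a shape tuple standing in for the tensor.

```python
from collections import OrderedDict

def conv_out(n, kernel, stride=1, padding=0, dilation=1):
    """Output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def layer3_features(size):
    n = conv_out(size, 7, stride=2, padding=3)            # conv1
    n = conv_out(n, 3, stride=2, padding=1)               # maxpool
    n = conv_out(n, 3, stride=1, padding=1)               # layer1 (no downsampling)
    n = conv_out(n, 3, stride=2, padding=1)               # layer2 (stride 2)
    n = conv_out(n, 3, stride=1, padding=2, dilation=2)   # layer3 (dilated, stride 1)
    return OrderedDict([("layer3", (1024, n, n))])

# A 3x3 kernel with dilation 2 covers the same extent as a 5x5 kernel.
effective_kernel = 2 * (3 - 1) + 1

print(layer3_features(256))   # search region
print(layer3_features(128))   # template
```

Because the dilated layer keeps stride 1, the overall stride stays at 8 instead of the 16 a standard layer3 would give.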

In [4]:
# Input: SearchRegion (256x256x3), batch_size:38
# Output: SearchFeature (32x32x1024)
input=torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(backbone, input, show_input=True))
print(backbone.parameters)
print(type(backbone(input)))

--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
          Conv2d-1      [38, 3, 256, 256]           9,408           9,408
     BatchNorm2d-2     [38, 64, 128, 128]             128             128
            ReLU-3     [38, 64, 128, 128]               0               0
       MaxPool2d-4     [38, 64, 128, 128]               0               0
      Bottleneck-5       [38, 64, 64, 64]          75,008          75,008
      Bottleneck-6      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-7      [38, 256, 64, 64]          70,400          70,400
      Bottleneck-8      [38, 256, 64, 64]         379,392         379,392
      Bottleneck-9      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-10      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-11      [38, 512, 32, 32]         280,064         280,064
     Bottleneck-12      [38, 512, 32,

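ResNetRS experiment

- ResNet-RS is a ResNet with revised training and regularization strategies and small structural changes; it reaches EfficientNet-level accuracy while training faster
- As with the modified ResNet, layer4 is not used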
In [5]:
from pytorch_resnet_rs.model import ResnetRS
import torch.nn as nn

In [6]:
class MyResNetRS(nn.Module):
    def __init__(self,model):
        super().__init__()
        self.model=model
        self.upsample=nn.Upsample(scale_factor=2, mode='bicubic', align_corners=False)

    def forward(self,x):
        x=self.model(x)
        x=self.upsample(x)
        return x
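
The scale factor of 2 compensates for the missing dilation. Assuming the ResNetRS features after layer3 sit at stride 16 (consistent with the 16 x 16 Upsample input in the summary for a 256 x 256 image), doubling them brings the map back to the 32 x 32 grid that the modified ResNet50 produces. A minimal size check, with hypothetical helper names:

```python
import math

def upsampled(n, scale_factor):
    """Spatial size after nn.Upsample-style scaling: floor(n * scale_factor)."""
    return math.floor(n * scale_factor)

def resnetrs_feature_size(size, stride=16, scale_factor=2):
    # stride=16 is an assumption about the truncated ResNetRS (layer4 removed).
    return upsampled(size // stride, scale_factor)

search = resnetrs_feature_size(256)
template = resnetrs_feature_size(128)
print(search, template)
```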

In [7]:
model=ResnetRS.create_pretrained('resnetrs50', in_ch=3, num_classes=1024,
                        drop_rate=0.)

# layer4 through fc are not used
model.layer4=nn.Identity()
model.avg_pool=nn.Identity()
model.fc=nn.Identity()

# apply upsampling
mymodel=MyResNetRS(model)

# test
input=torch.zeros(38, 3, 256, 256)
print(pytorch_model_summary.summary(mymodel, input, show_input=True))
print(mymodel.parameters)
print(type(mymodel(input)))

--------------------------------------------------------------------------
      Layer (type)            Input Shape         Param #     Tr. Param #
          Resnet-1      [38, 3, 256, 256]      12,379,040      12,379,040
        Upsample-2     [38, 1024, 16, 16]               0               0
Total params: 12,379,040
Trainable params: 12,379,040
Non-trainable params: 0
--------------------------------------------------------------------------
<bound method Module.parameters of MyResNetRS(
  (model): Resnet(
    (conv1): StemBlock(
      (conv1): Sequential(
        (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
        (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): ReLU(inplace=True)
  

Loading libraries for the EfficientNet experiments

In [None]:
import ltr.admin.settings as ws_settings
import ltr.models.tracking.transt as transt_models
import ltr.models.backbone as backbones
import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_model_summary
import math,copy
from torchvision_edit import models_e
from torchvision_edit.ops.misc import Conv2dNormActivation
from efficientnet_pytorch_edit import EfficientNet,utils

Simple UNet + EfficientNetV2

In [22]:
class MyEfficientNet2(nn.Module):
    def __init__(self,model):
        super().__init__()

        def CBR2d(in_channels,out_channels,kernel_size,stride=1,padding=0,bias=True):
            layers=[]
            layers+=[nn.Conv2d(in_channels=in_channels,out_channels=out_channels,
            kernel_size=kernel_size,stride=stride,padding=padding,bias=bias)]
            layers+=[nn.BatchNorm2d(num_features=out_channels)]
            layers+=[nn.ReLU()]
            conv=nn.Sequential(*layers)
            return conv
        
        # encoder
        self.enc1=CBR2d(3,16,3,padding=1)
        self.pool1=nn.MaxPool2d(kernel_size=2)
        self.enc2=CBR2d(16,32,3,padding=1)
        self.pool2=nn.MaxPool2d(kernel_size=2)
        self.enc3=CBR2d(32,64,3,padding=1)
        self.pool3=nn.MaxPool2d(kernel_size=2)
        self.enc4=CBR2d(64,128,3,padding=1)

        # decoder
        self.dec1=CBR2d(128,64,1)
        self.upsample1=nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec2_1=CBR2d(2*64,64,1)
        self.dec2_2=CBR2d(64,32,3,padding=1)
        self.upsample2=nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec3_1=CBR2d(2*32,32,1)
        self.dec3_2=CBR2d(32,16,3,padding=1)
        self.upsample3=nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec4_1=CBR2d(2*16,16,1)
        self.dec4_2=CBR2d(16,3,3,padding=1)

        # effnet
        self.effconv=model
        self.upsample_x4=nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False) 
        self.last3=CBR2d(1280,128,1)
        self.last2=CBR2d(2*128,256,3,padding=1)
        self.last1=CBR2d(256,1024,1)

    def forward(self,x): 
        # Simple UNet
        enc1=self.enc1(x)
        pool1=self.pool1(enc1)
        enc2=self.enc2(pool1)
        pool2=self.pool2(enc2)
        enc3=self.enc3(pool2)
        pool3=self.pool3(enc3)
        enc4=self.enc4(pool3)

        dec1=self.dec1(enc4)
        upsample1=self.upsample1(dec1)
        cat1=torch.cat((upsample1,enc3),dim=1)
        dec2_1=self.dec2_1(cat1)
        dec2_2=self.dec2_2(dec2_1)
        upsample2=self.upsample2(dec2_2)
        cat2=torch.cat((upsample2,enc2),dim=1)
        dec3_1=self.dec3_1(cat2)
        dec3_2=self.dec3_2(dec3_1)
        upsample3=self.upsample3(dec3_2)
        cat3=torch.cat((upsample3,enc1),dim=1)
        dec4_1=self.dec4_1(cat3)
        dec4_2=self.dec4_2(dec4_1)

        # EfficientNet and Skip Connection
        eff=self.effconv(dec4_2)
        effup=self.upsample_x4(eff)
        last3=self.last3(effup)
        cat4=torch.cat((last3,enc4),dim=1)
        last2=self.last2(cat4)
        last1=self.last1(last2)
        x=last1

        return x  
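
To see why the skip connections line up, the sketch below traces channel counts and spatial sizes through the forward pass without building any layers. It assumes the wrapped EfficientNet returns a 1280-channel map at stride 32, which matches the `CBR2d(1280,128,1)` layer and the x4 upsampling but is not confirmed by this cell; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(size, eff_channels=1280, eff_stride=32):
    """Return (channels, height/width) of key tensors for a square input."""
    enc1 = (16, size)
    enc2 = (32, size // 2)    # after pool1
    enc3 = (64, size // 4)    # after pool2
    enc4 = (128, size // 8)   # after pool3

    # Decoder: each upsample doubles the size, each concat adds the skip channels.
    cat1 = (64 + enc3[0], enc3[1])   # upsample1 output joined with enc3
    cat2 = (32 + enc2[0], enc2[1])   # upsample2 output joined with enc2
    cat3 = (16 + enc1[0], enc1[1])   # upsample3 output joined with enc1
    dec4_2 = (3, size)               # 3-channel map at the input resolution

    # EfficientNet branch (assumed stride and width), then x4 upsampling.
    eff = (eff_channels, dec4_2[1] // eff_stride)
    effup = (eff_channels, eff[1] * 4)
    last3 = (128, effup[1])
    assert last3[1] == enc4[1], "cat4 needs matching spatial sizes"
    cat4 = (last3[0] + enc4[0], enc4[1])
    last1 = (1024, cat4[1])
    return {"cat1": cat1, "cat2": cat2, "cat3": cat3, "cat4": cat4, "last1": last1}

print(trace_shapes(256)["last1"])   # search region
print(trace_shapes(128)["last1"])   # template
```

The concatenated channel counts (128, 64, 32, 256) match the `2*64`, `2*32`, `2*16`, and `2*128` inputs of the layers that consume them.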

BasicNet

In [12]:
class MyEfficientNet(nn.Module):
    def __init__(self,model):
        super().__init__()

        def CBR2d(in_channels,out_channels,kernel_size,stride=1,padding=0,bias=True):
            layers=[]
            layers+=[nn.Conv2d(in_channels=in_channels,out_channels=out_channels,
            kernel_size=kernel_size,stride=stride,padding=padding,bias=bias)]
            layers+=[nn.BatchNorm2d(num_features=out_channels)]
            layers+=[nn.ReLU()]
            conv=nn.Sequential(*layers)
            return conv

        self.downsample=nn.Upsample(scale_factor=1/8, mode='bilinear', align_corners=False) 

        self.upsampleSin=nn.Upsample(scale_factor=(300/256), mode='bilinear', align_corners=False) 
        self.upsampleSout=nn.Upsample(scale_factor=(32/10), mode='bilinear', align_corners=False) 
        self.upsampleTin=nn.Upsample(scale_factor=(128/128), mode='bilinear', align_corners=False) 
        self.upsampleTout=nn.Upsample(scale_factor=(16/4), mode='bilinear', align_corners=False) 

        self.btn1=CBR2d(1280,3,1)
        self.btn2=CBR2d(2*3,6,3,padding=1)
        self.btn3=CBR2d(6,1280,1)

        self.relu=nn.ReLU()
        self.effconv=model

    def forward(self,x):
        # tensor x_width size 
        y=x.size(dim=3) 

        # downsampling (1/8)
        ds=self.downsample(x)

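        # Search region (width 256): rescale by 300/256 before EfficientNet and by 32/10 after
        if y==256:
            x=self.upsampleSin(x)
            x=self.effconv(x)
            x=self.upsampleSout(x)
        # Template (width 128): input size unchanged, output rescaled by 16/4
        elif y==128:
            x=self.upsampleTin(x)
            x=self.effconv(x)
            x=self.upsampleTout(x)
        # Any other width would reach btn1 with the wrong channel count
        else:
            raise ValueError(f"expected input width 128 or 256, got {y}")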
    
        # skip connection with bottleneck
        btn1=self.btn1(x)
        cat=torch.cat((btn1,ds),dim=1)
        btn2=self.btn2(cat)
        btn3=self.btn3(btn2)
        x=x+btn3
        x=self.relu(x)

        return x
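
The hard-coded scale factors only make sense for the two supported input sizes. The sketch below reproduces the size arithmetic under two assumptions: the EfficientNet trunk applies five stride-2 stages that each round the size up, as 3x3 convolutions with padding 1 do, and `nn.Upsample` produces `floor(size * scale_factor)`. The helper names are hypothetical.

```python
import math

def upsampled(n, scale_factor):
    """Spatial size after nn.Upsample-style scaling."""
    return math.floor(n * scale_factor)

def effnet_out(n, num_stride2_stages=5):
    """Assumed trunk behaviour: each stride-2 stage maps n to ceil(n / 2)."""
    for _ in range(num_stride2_stages):
        n = (n + 1) // 2
    return n

def basicnet_sizes(width):
    scale_in, scale_out = {256: (300 / 256, 32 / 10), 128: (128 / 128, 16 / 4)}[width]
    resized = upsampled(width, scale_in)      # size fed to EfficientNet
    feat = effnet_out(resized)                # EfficientNet output size
    main = upsampled(feat, scale_out)         # main branch after upsampling
    skip = upsampled(width, 1 / 8)            # downsampled input for the skip path
    return resized, feat, main, skip

print(basicnet_sizes(256))
print(basicnet_sizes(128))
```

In both cases the main branch and the skip path end at the same spatial size, which is what the channel-wise `torch.cat` in the forward pass requires.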

In [14]:
model=models_e.efficientnet_v2_m(weights=models_e.EfficientNet_V2_M_Weights.IMAGENET1K_V1)
mymodel=MyEfficientNet(model)
input=torch.zeros(1, 3, 128, 128)
print(pytorch_model_summary.summary(mymodel, input, show_input=True))
print(mymodel.parameters)

-------------------------------------------------------------------------
      Layer (type)           Input Shape         Param #     Tr. Param #
        Upsample-1      [1, 3, 128, 128]               0               0
        Upsample-2      [1, 3, 128, 128]               0               0
    EfficientNet-3      [1, 3, 128, 128]      54,139,356      54,139,356
        Upsample-4       [1, 1280, 4, 4]               0               0
          Conv2d-5     [1, 1280, 16, 16]           3,843           3,843
     BatchNorm2d-6        [1, 3, 16, 16]               6               6
            ReLU-7        [1, 3, 16, 16]               0               0
          Conv2d-8        [1, 6, 16, 16]             330             330
     BatchNorm2d-9        [1, 6, 16, 16]              12              12
           ReLU-10        [1, 6, 16, 16]               0               0
         Conv2d-11        [1, 6, 16, 16]           8,960           8,960
    BatchNorm2d-12     [1, 1280, 16, 16]          

### Execution flow of the modified TransT feature extractor
1. run_training.py creates the settings object and calls the run function in train_settings/transt/transt.py
2. The run function initializes the fields of the settings object and calls the transt_effnet function in models/tracking/transt.py
3. To load the backbone, transt_effnet calls the build_backbone function in models/backbone/transt_backbone.py
4. To define the backbone, build_backbone calls the MyBackbone class constructor (instantiation)
5. To create the backbone, the constructor calls the effnet function in models/backbone/efficientnet.py
6. effnet builds the model and returns it

- run_training.py
- train_settings/transt/transt.py (run)
- models/tracking/transt.py (transt_effnet)
- models/backbone/transt_backbone.py (build_backbone, Backbone class <inherits MyBackboneBase>)
- models/backbone/efficientnet.py (effnet)
- All paths are relative to the TransT/ltr directory

P.S. MyEfficientNet returns a different output type than the original network, so the BackboneBase class was modified to create the MyBackboneBase class.
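
One way to picture the mismatch: the original ResNet returns an OrderedDict keyed by layer name, while MyEfficientNet returns a bare tensor. The sketch below shows a small adapter that would let both kinds of output pass through the same downstream code. `as_feature_dict` is a hypothetical helper for illustration, not the actual MyBackboneBase implementation, and strings stand in for tensors.

```python
from collections import OrderedDict
from collections.abc import Mapping

def as_feature_dict(output, name="layer3"):
    """Wrap a bare feature map in the name -> tensor mapping used by the ResNet backbone."""
    if isinstance(output, Mapping):
        return OrderedDict(output)          # already keyed by layer name
    return OrderedDict([(name, output)])    # single tensor: give it a name

resnet_style = OrderedDict([("layer3", "tensor_from_resnet")])
effnet_style = "tensor_from_efficientnet"

print(as_feature_dict(resnet_style))
print(as_feature_dict(effnet_style))
```

Modifying the base class, as done here, handles the difference on the consumer side instead; an adapter like this one handles it on the producer side.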