### SiamFC 
https://arxiv.org/pdf/1606.09549v2.pdf  
: A model that raises tracking speed (FPS) by training a fully-convolutional Siamese network offline

### 0. Abstract
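- Published in 2016. Earlier trackers learned a model of the object's appearance online, during tracking itself, which limits the richness of the model they can learn.
- To address this, the paper proposes a novel fully-convolutional Siamese network that can be trained end-to-end.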

### 1. Introduction
- A Siamese network is trained to locate an exemplar image within a larger search image.
- Computing the cross-correlation of the two inputs (search, exemplar) enables dense and efficient sliding-window evaluation.

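### 2. Deep similarity learning for tracking
- Learn a function $f(z, x)$ that scores the similarity between an exemplar image z and a candidate image x, then select the candidate with the highest score.
- The object's initial appearance is used as the exemplar, and a deep conv-net is used as f.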

<img src='img/siamfc_1.png' width='200'>
<img src='img/siamfc_2.png' width='400'>  
> - g : simple distance or similarity metric
> - $\varphi$ : embedding
> - \* : cross-correlation

### __2.1 Fully-convolutional Siamese architecture__
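- The cross-correlation of the two image embeddings can be written as shown below.
- The advantage of a fully-convolutional network is that it can be evaluated on a much larger search image, not only on a candidate the same size as the exemplar.
- Because the network is symmetric ($f(x, z) = f(z, x)$), exemplar images of different sizes can in principle be used as well.
- The output is a score map rather than a single score.  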

<img src='img/siamfc_3.png' width='150'>
<img src='img/siamfc_4.png' width='150'>
> - $L_\tau$ : translation operator  
(here, translation means shifting the signal by a fixed amount along the x and y axes)  
> - $h$ : a function that maps signals to signals
<br/>
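
As a sanity check on the definition above: as far as I recall from the paper, a function $h$ with integer stride $k$ is called fully-convolutional if $h(L_{k\tau} x) = L_\tau h(x)$ for any translation $\tau$. The 1-D NumPy sketch below illustrates this for a strided correlation. The helper `strided_corr1d` and all sizes are illustrative assumptions, not part of the paper or the repository.

```python
import numpy as np

def strided_corr1d(x, w, k):
    """Valid 1-D correlation of signal x with filter w, using stride k."""
    n_out = (len(x) - len(w)) // k + 1
    return np.array([np.dot(w, x[i * k: i * k + len(w)]) for i in range(n_out)])

rng = np.random.default_rng(0)
x = rng.standard_normal(40)   # input signal
w = rng.standard_normal(5)    # filter
k, tau = 2, 3                 # stride and output-side shift

# Shift the input by k * tau samples (circular shift, for simplicity).
shifted_x = np.roll(x, k * tau)

lhs = strided_corr1d(shifted_x, w, k)   # h(L_{k tau} x)
rhs = strided_corr1d(x, w, k)           # h(x)

# Away from the wrap-around region, the output is shifted by exactly tau.
commutes = np.allclose(lhs[tau:], rhs[:-tau])
print(commutes)
```

Shifting the input by `k * tau` samples shifts the output by `tau` samples, which is why a single pass over a large search image is equivalent to evaluating every translated sub-window separately.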



- The exact expression for $f(z, x)$ is as follows.  
<img src='img/siamfc_5.png' width='200'>    


- During tracking, the search image is centered on the target's position in the previous frame, and the position with the maximum score is taken as the target's position in the current frame.
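
The score-map computation and the arg-max step described in this section can be sketched roughly as follows, assuming the form $f(z, x) = \varphi(z) * \varphi(x) + b$ from the paper. Random arrays stand in for the embeddings $\varphi(z)$ and $\varphi(x)$. The function name `cross_correlate`, the feature sizes (6x6 and 22x22) and the planted target are illustrative assumptions, not the repository's implementation.

```python
import numpy as np

def cross_correlate(z_feat, x_feat, bias=0.0):
    """Slide z_feat (C, hz, wz) over x_feat (C, hx, wx) and return the score map."""
    _, hz, wz = z_feat.shape
    _, hx, wx = x_feat.shape
    out = np.empty((hx - hz + 1, wx - wz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product between the exemplar and one search sub-window.
            out[i, j] = np.sum(z_feat * x_feat[:, i:i + hz, j:j + wz]) + bias
    return out

rng = np.random.default_rng(0)
z_feat = rng.standard_normal((4, 6, 6))            # stand-in for phi(z)
x_feat = 0.1 * rng.standard_normal((4, 22, 22))    # stand-in for phi(x)
x_feat[:, 9:15, 5:11] = z_feat                     # plant the target at (9, 5)

score = cross_correlate(z_feat, x_feat)
peak = np.unravel_index(np.argmax(score), score.shape)
print(score.shape, peak)
```

The offset of the peak from the center of the score map, multiplied by the network stride, gives the target's displacement in the search image.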

### __2.2 Training__
- Logistic loss is used as the loss function, and the loss of a score map is defined as shown below.  
<img src='img/siamfc_6.png' width='200'>
<img src='img/siamfc_7.png' width='200'>   


- The parameters that minimize this loss are learned with SGD.
- During training, the class of the object is ignored.
- The label of each score-map element is defined as shown below.   
<img src='img/siamfc_8.png' width='200'>    


- To eliminate class imbalance, the losses of positive and negative positions are weighted.
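
A minimal NumPy sketch of the training objective described in this section: per-position logistic loss $\ell(y, v) = \log(1 + \exp(-yv))$, labels set to $+1$ within radius $R$ of the center (measured in input pixels through the stride $k$), and class-balanced weights. The values `size=17`, `stride=8`, `radius=16` are what I recall from the paper's experiments, and giving each class a total weight of 0.5 is one plausible way to balance the classes. Both should be read as assumptions.

```python
import numpy as np

def make_labels(size=17, stride=8, radius=16):
    """y[u] = +1 if stride * ||u - c|| <= radius, else -1 (c = map center)."""
    c = (size - 1) / 2
    ys, xs = np.mgrid[0:size, 0:size]
    dist = stride * np.sqrt((ys - c) ** 2 + (xs - c) ** 2)
    return np.where(dist <= radius, 1.0, -1.0)

def balanced_logistic_loss(scores, labels):
    """Weighted sum of log(1 + exp(-y * v)); each class gets total weight 0.5."""
    per_position = np.logaddexp(0.0, -labels * scores)  # numerically stable
    pos = labels > 0
    weights = np.where(pos, 0.5 / pos.sum(), 0.5 / (~pos).sum())
    return float(np.sum(weights * per_position))

labels = make_labels()
num_pos = int((labels > 0).sum())

# With an all-zero score map every position contributes log(2).
loss_zero = balanced_logistic_loss(np.zeros_like(labels), labels)
print(num_pos, loss_zero)
```

Without the weights, the few positive positions near the center would be swamped by the many negative positions elsewhere in the map.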

### models

#### builder.py

In [None]:
import math
import torch
from torchvision import models
import torch.nn.functional as F
import torch.nn.init as init
import torch.nn as nn
from torch.autograd import Variable
from .heads import Corr_Up, MultiRPN, DepthwiseRPN
from .backbones import AlexNet, Vgg, ResNet22, Incep22, ResNeXt22, ResNet22W, resnet50, resnet34, resnet18
from .neck import AdjustLayer, AdjustAllLayer
from .utils import load_pretrain

__all__ = ['SiamFC_', 'SiamFC', 'SiamVGG', 'SiamFCRes22', 'SiamFCIncep22', 'SiamFCNext22', 'SiamFCRes22W',
           'SiamRPN', 'SiamRPNVGG', 'SiamRPNRes22', 'SiamRPNIncep22', 'SiamRPNResNeXt22', 'SiamRPNPP']


class SiamFC_(nn.Module):
    def __init__(self):
        super(SiamFC_, self).__init__()
        self.features = None
        # self.head = None

    def head(self, z, x):
        n, c, h, w = x.size()
        x = x.view(1, n * c, h, w)
        out = F.conv2d(x, z, groups=n)
        out = out.view(n, 1, out.size(-2), out.size(-1))
        return out

    def feature_extractor(self, x):
        return self.features(x)

    def connector(self, template_feature, search_feature):
        pred_score = self.head(template_feature, search_feature)
        return pred_score

    def branch(self, allin):
        allout = self.feature_extractor(allin)
        return allout

    def forward(self, template, search):
        zf = self.feature_extractor(template)
        xf = self.feature_extractor(search)
        score = self.connector(zf, xf)
        return score


class SiamFC(SiamFC_):

    def __init__(self):
        super(SiamFC, self).__init__()
        self.features = AlexNet()
        self._initialize_weights()

    def forward(self, z, x):
        zf = self.features(z)
        xf = self.features(x)
        score = self.head(zf, xf)
        return score

    def head(self, z, x):
        # fast cross correlation
        n, c, h, w = x.size()
        x = x.view(1, n * c, h, w)
        out = F.conv2d(x, z, groups=n)
        out = out.view(n, 1, out.size(-2), out.size(-1))

        # adjust the scale of responses
        out = 0.001 * out + 0.0

        return out

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                init.kaiming_normal_(m.weight.data, mode='fan_out',
                                     nonlinearity='relu')
                m.bias.data.fill_(0)
                
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
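
The `head` method above folds the batch into the channel dimension so that a single grouped convolution correlates each search feature map with its own template. The sketch below checks that this matches a plain per-sample loop. The function names `grouped_xcorr` and `looped_xcorr` and the tensor sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def grouped_xcorr(z, x):
    """Batched cross-correlation via one grouped conv, as in SiamFC_.head."""
    n, c, h, w = x.size()
    x = x.view(1, n * c, h, w)          # fold the batch into channels
    out = F.conv2d(x, z, groups=n)      # group i pairs x[i] with z[i]
    return out.view(n, 1, out.size(-2), out.size(-1))

def looped_xcorr(z, x):
    """Reference version: correlate each pair separately."""
    maps = [F.conv2d(x[i:i + 1], z[i:i + 1]) for i in range(x.size(0))]
    return torch.cat(maps, dim=0)

torch.manual_seed(0)
z = torch.randn(3, 8, 6, 6)      # template features
x = torch.randn(3, 8, 22, 22)    # search features

fast = grouped_xcorr(z, x)
slow = looped_xcorr(z, x)
same = torch.allclose(fast, slow, atol=1e-3)
print(fast.shape, same)
```

The grouped form avoids a Python loop over the batch, which matters because this operation runs on every training step and every tracked frame.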

### train.py

In [None]:
import sys
import os
import time
import json
import random
import math
import numpy as np
import argparse
import cv2
import h5py

import torch
import torch.nn as nn
from torch.autograd import Variable
from torchvision import datasets, transforms
import torch.nn.functional as F
from models.loss import *
from image import load_data, generate_anchor, load_data_rpn
from models.builder import *
import dataset
from mmcv import Config
from utils import save_checkpoint, is_valid_number, bbox_iou
from models.utils import load_pretrain
from models.lr_scheduler import *


parser = argparse.ArgumentParser(description='PyTorch SiameseX')

parser.add_argument('--config', metavar='model', default='configs/SiamRPN.py', type=str,
                    help='which model to use.')

parser.add_argument('--pre', '-p', metavar='PRETRAINED', default=None, type=str,
                    help='path to the pretrained model')

parser.add_argument('--gpu', metavar='GPU', default='0', type=str,
                    help='GPU id to use.')

In [None]:
def main():
    
    global args, best_prec1, weight, segmodel
    
    best_prec1 = 0
    prec1 = 0
    coco = 0
    temp_args = parser.parse_args()

    args = Config.fromfile(temp_args.config)
    args.pre = temp_args.pre
    args.gpu = temp_args.gpu

    with open('./data/ilsvrc_vid_new.txt', 'r') as outfile:
        args.ilsvrc = json.load(outfile)
    with open('./data/vot2018_new.txt', 'r') as outfile:
        args.vot2018 = json.load(outfile)
    if os.path.isfile('youtube_final_new.txt'):
        with open('youtube_final_new.txt', 'r') as outfile:
            args.youtube = json.load(outfile)
    else:
        args.youtube = None
    # with open('vot2018.txt', 'r') as outfile:
    #     args.vot2018 = json.load(outfile)

    os.environ['CUDA_VISIBLE_DEVICES'] = args.gpu  # allocate memory only on the selected GPU;
                                        # without this, memory is allocated on every GPU of a multi-GPU system
    torch.cuda.manual_seed(args.seed)  # seed the CUDA random number generator for reproducibility

    if args.model == 'SiamFC':  # branches for the other models are not shown in this excerpt
        model = SiamFC()

    model = model.cuda()
    model = model.eval()  # eval mode (so that batchnorm and dropout layers run in eval mode)


    # reduction='sum' replaces the deprecated size_average=False
    criterion = nn.SoftMarginLoss(reduction='sum').cuda()  # for SiamFC and SiamVGG

    # -------------------------------------------------------
    # cf) soft margin loss 
    #   : two-class classification logistic loss between input tensor x and target tensor y (1/-1)

    #     - loss = sum_i (log(1 + exp(-y[i] * x[i])) / n)   (with reduction='mean')


    # nn.SoftMarginLoss option
    #   reduction='mean' : average the loss over all n elements
    #   reduction='sum'  : sum the loss without dividing by n
    # -------------------------------------------------------

    if 'SiamRPNPP' in args.model:
        optimizer, lr_scheduler = build_opt_lr(model, args.start_epoch)
    else:
        optimizer = torch.optim.SGD(model.parameters(), args.lr,
                                    momentum=args.momentum,
                                    weight_decay=args.decay)

    if args.pre:
        if os.path.isfile(args.pre):
            print("=> loading checkpoint '{}'".format(args.pre))
            checkpoint = torch.load(args.pre)
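            args.start_epoch = checkpoint['epoch']
            args.start_epoch = 0  # overrides the stored epoch: training restarts from epoch 0

            best_prec1 = checkpoint['best_prec1']
            best_prec1 = 0  # likewise discards the stored best score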
            
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.pre, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.pre))
    
    # -------------------------------------------------------
    # cf) checkpoint
    #   - state_dict : dictionary that holds the stored state
    #                  (weights for the model; params such as lr for the optimizer)
    # -------------------------------------------------------

    # prec1 = 0

    if not os.path.isdir('./cp/temp'):
        os.makedirs('./cp/temp')
    
    for epoch in range(args.start_epoch, args.epochs+1):

        if 'SiamRPNPP' in args.model:
            if args.backbone_train_epoch == epoch:
                print('start training backbone.')
                optimizer, lr_scheduler = build_opt_lr(model, epoch)

            lr_scheduler.step(epoch)  # update lr
            cur_lr = lr_scheduler.get_cur_lr()

        else:
            cur_lr = adjust_learning_rate(optimizer, epoch)

        print('current learning rate : {}'.format(cur_lr))

        if args.model in ['SiamFC', 'SiamVGG', 'SiamFCRes22', 'SiamFCIncep22', 'SiamFCNext22']:
            train(model, criterion, optimizer, epoch, coco)
        elif args.model in ['SiamRPN', 'SiamRPNVGG', 'SiamRPNRes22', 'SiamRPNIncep22', 'SiamRPNResNeXt22']:
            trainRPN(model, optimizer, epoch, coco)
        elif args.model in ['SiamRPNPP']:
            trainRPNPP(model, optimizer, epoch, coco)

        # is_best = False
        
        is_best = prec1 > best_prec1
        
        best_prec1 = max(prec1, best_prec1)

        if epoch % 100 == 0:
            torch.save(model.state_dict(), './cp/temp/{}_{}.pth'.format(args.model, epoch))
        
        # print(' * best MAE {mae:.3f} '
        #       .format(mae=best_prec1))
        # print(' * MAE {mae:.3f} '
        #       .format(mae=prec1))

        # if prec1 exceeds best_prec1, the current model is saved as the best; otherwise the regular args.model checkpoint is saved
        save_checkpoint({
            'epoch': epoch + 1,
            'arch': args.pre,
            'state_dict': model.state_dict(),
            'best_prec1': best_prec1,
            'optimizer': optimizer.state_dict(),
        }, is_best, args.model)

            
def train(model, criterion, optimizer, epoch, coco):
    losses = AverageMeter()   # Computes and stores the average and current value (using update)
    batch_time = AverageMeter()
    data_time = AverageMeter()

    # build a DataLoader over listDataset, which loads the training datasets
    train_loader = torch.utils.data.DataLoader(
        dataset.listDataset(args.ilsvrc, args.youtube, args.data_type,
                            shuffle=True,
                            transform=transforms.Compose([
                                                        transforms.ToTensor(),
                                                        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                                             std=[0.229, 0.224, 0.225])]),
                            train=True,
                            batch_size=args.batch_size,
                            num_workers=args.workers, coco=coco),
        batch_size=args.batch_size)

    model.train()
    end = time.time()
    
    for i, (z, x, template, gt_box) in enumerate(train_loader):

        # z : exemplar image
        # x : search image
        # template : ground-truth label {1, -1} for each score-map position

        data_time.update(time.time() - end)
        
        z = z.cuda()
        z = Variable(z)
        x = x.cuda()
        x = Variable(x)
        template = template.type(torch.FloatTensor).cuda()  
        template = Variable(template)
        
        oup = model(z, x)

        if isinstance(model, SiamFC) or isinstance(model, SiamVGG):
            loss = criterion(oup, template)
        elif isinstance(model, SiamFCRes22):
            loss = model.train_loss(oup, template)

        losses.update(loss.item(), x.size(0))

        optimizer.zero_grad()
        loss.backward()

        if isinstance(model, SiamFCRes22):
            torch.nn.utils.clip_grad_norm_(model.parameters(), 10)  # gradient clip

        if is_valid_number(loss.item()):
            optimizer.step()  # update parameters

        # optimizer.step()
        
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses))


LRs = {
    'log': LogScheduler,
    'step': StepScheduler,
    'multi-step': MultiStepScheduler,
    'linear': LinearStepScheduler,
    'cos': CosStepScheduler}


def _build_lr_scheduler(optimizer, lr_type, epochs=50, last_epoch=-1):
    return LRs[lr_type](optimizer, last_epoch=last_epoch,
                            epochs=epochs, new_allowed=True)


def _build_warm_up_scheduler(optimizer, epochs=50, last_epoch=-1):
    warmup_epoch = args.lr_warm_epoch
    sc1 = _build_lr_scheduler(optimizer, 'step', warmup_epoch, last_epoch)
    sc2 = _build_lr_scheduler(optimizer, 'log', epochs - warmup_epoch, last_epoch)
    return WarmUPScheduler(optimizer, sc1, sc2, epochs, last_epoch)


def build_lr_scheduler(optimizer, epochs=50, last_epoch=-1):
    if args.warmup:
        return _build_warm_up_scheduler(optimizer, epochs, last_epoch)
    else:
        return _build_lr_scheduler(optimizer, args.original_lr, epochs, last_epoch)


def build_opt_lr(model, current_epoch=0):
    if current_epoch >= 20:
        for layer in ['layer2', 'layer3', 'layer4']:
            for param in getattr(model.features, layer).parameters():
                param.requires_grad = True
            for m in getattr(model.features, layer).modules():
                if isinstance(m, nn.BatchNorm2d):
                    m.train()
    else:
        for param in model.features.parameters():
            param.requires_grad = False
        for m in model.features.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()

    trainable_params = []
    trainable_params += [{'params': filter(lambda x: x.requires_grad,
                                           model.features.parameters()),
                          'lr': 0.1 * args.original_lr}]

    trainable_params += [{'params': model.neck.parameters(),
                        'lr': args.original_lr}]

    trainable_params += [{'params': model.head.parameters(),
                          'lr': args.original_lr}]

    optimizer = torch.optim.SGD(trainable_params,
                                momentum=args.momentum,
                                weight_decay=args.decay)

    lr_scheduler = build_lr_scheduler(optimizer, epochs=args.epochs)
    lr_scheduler.step(args.start_epoch)
    return optimizer, lr_scheduler


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count    


if __name__ == '__main__':
    main()        
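
The training loop above only calls `optimizer.step()` when `is_valid_number(loss.item())` holds. That helper is imported from `utils` and its body is not shown here, so the version below is a hypothetical stand-in that simply rejects NaN and infinite values. The repository's actual check may differ, for example by also rejecting very large losses.

```python
import math

def is_valid_number(value):
    """Hypothetical stand-in: accept only finite floats (no NaN, no +/-inf)."""
    return math.isfinite(value)

def count_applied_updates(loss_values):
    """Mimic the guard in train(): skip the update when the loss is not finite."""
    applied = 0
    for loss_value in loss_values:
        if is_valid_number(loss_value):
            applied += 1   # stands in for optimizer.step()
    return applied

applied = count_applied_updates([0.7, float('nan'), 0.5, float('inf')])
print(applied)
```

Skipping the step for a non-finite loss keeps a single bad batch from writing NaN values into the model parameters.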

### demo.py

In [None]:
import torch
import argparse
import sys
import cv2
import numpy as np
import time
import demo_utils.vot as vot
from demo_utils.siamvggtracker import SiamVGGTracker


# *****************************************
# VOT: Create VOT handle at the beginning
#      Then get the initialization region
#      and the first image
# *****************************************

parser = argparse.ArgumentParser(description='PyTorch SiameseX demo')

parser.add_argument('--model', metavar='model', default='SiamFCNext22', type=str,
                    help='which model to use.')

args = parser.parse_args()

handle = vot.VOT("rectangle")
selection = handle.region()

# Process the first frame
imagefile = handle.frame()  # get frame from client
if not imagefile:
    sys.exit(0)

# build the tracker only after confirming that a first frame exists
tracker = SiamVGGTracker(args.model, imagefile, selection)

toc = 0

while True:
    # *****************************************
    # VOT: Call frame method to get path of the 
    #      current image frame. If the result is
    #      null, the sequence is over.
    # *****************************************

    tic = cv2.getTickCount()  # elapsed time = difference in tick counts divided by the tick frequency

    imagefile = handle.frame()
    if not imagefile:
        break
    image = cv2.imread(imagefile)  # read only after confirming the sequence is not over
    region, confidence = tracker.track(imagefile)
    toc += cv2.getTickCount() - tic

    region = vot.Rectangle(region.x, region.y, region.width, region.height)
    
    # *****************************************
    # VOT: Report the position of the object
    #      every frame using report method.
    # *****************************************
    handle.report(region, confidence)
    cv2.rectangle(image, (int(region.x), int(region.y)), (int(region.x + region.width), int(region.y + region.height)), (0, 255, 255), 3)
    cv2.imshow('SiameseX', image)
    cv2.waitKey(1)
    # if cv2.waitKey() == 27:
    #     break

    print('Tracking Speed {:.1f}fps'.format((len(handle) - 1) / (toc / cv2.getTickFrequency())))
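
The speed measurement in the demo accumulates only the time spent processing frames and divides the frame count by that total. Below is a minimal sketch of the same idea using the standard library's `time.perf_counter` in place of OpenCV ticks. `measure_fps` and the dummy workload are illustrative assumptions, not part of the demo script.

```python
import time

def measure_fps(process_frame, frames):
    """Return frames per second, timing only the calls to process_frame."""
    elapsed = 0.0
    for frame in frames:
        tic = time.perf_counter()
        process_frame(frame)
        elapsed += time.perf_counter() - tic
    return len(frames) / elapsed if elapsed > 0 else float('inf')

def dummy_tracker(frame):
    # Stand-in for tracker.track(): any deterministic workload will do.
    return sum(range(1000))

fps = measure_fps(dummy_tracker, list(range(50)))
print('Tracking Speed {:.1f}fps'.format(fps))
```

With OpenCV, the elapsed seconds would instead be the tick difference divided by `cv2.getTickFrequency()`, as in the script above.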
