#### FINETUNING TORCHVISION MODELS

<Contents>
- Intro
- Inputs
- Model training/validation code
- Model parameter .requires_grad attribute
- Initialization & reshaping the network

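#### Intro
- We finetune models that were pretrained on the 1000-class ImageNet dataset.
- No boilerplate code works well in every scenario, so the researcher has to look at each existing model and make custom adjustments per model.
- Two kinds of transfer learning are covered: finetuning and feature extraction.
    - Finetuning starts from the pretrained weights and retrains all model parameters.
    - Feature extraction updates only the reshaped output layer and keeps the remaining pretrained weights fixed.

- This tutorial covers the following four steps.
    - Initializing the pretrained model
    - Reshaping the output layer to match the custom dataset
    - Defining the optimizer over the parameters we want to update during training
    - Running the training step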

#### Inputs
- We use the hymenoptera_data dataset here (two classes: bees and ants).
- We set num_classes, num_epochs, the feature_extract flag, and so on.

In [1]:
from __future__ import print_function
from __future__ import division
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

# Top level data directory. Here we assume the format of the directory conforms
#   to the ImageFolder structure
data_dir = "./data/hymenoptera_data"

# Models to choose from [resnet, alexnet, vgg, squeezenet, densenet, inception]
model_name = "squeezenet"

# Number of classes in the dataset
num_classes = 2

# Batch size for training (change depending on how much memory you have)
batch_size = 8

# Number of epochs to train for
num_epochs = 15

# Flag for feature extracting. When False, we finetune the whole model,
#   when True we only update the reshaped layer params
feature_extract = True


PyTorch Version:  1.10.0+cu113
Torchvision Version:  0.11.1+cu113


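#### Model training/validation code
- We write a helper function. train_model takes (model, dataloaders, loss function, optimizer, number of epochs) as arguments.
- In every epoch the function runs a training phase followed by a validation phase.
- It also tracks the best-performing model based on validation accuracy.
- After each phase of every epoch, the train/val loss and accuracy are printed.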

In [None]:
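# Data augmentation and normalization for training
# Just normalization for validation
# NOTE: input_size is returned by initialize_model (defined in a later cell), so that cell must run first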
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(input_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(input_size),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

print("Initializing Datasets and Dataloaders...")

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}

# Detect if we have a GPU available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def train_model(model, dataloaders, criterion, optimizer, num_epochs=25, is_inception=False):
    since = time.time()

    val_acc_history = []

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    # Get model outputs and calculate loss
                    # Special case for inception because in training it has an auxiliary output. In train
                    #   mode we calculate the loss by summing the final output and the auxiliary output
                    #   but in testing we only consider the final output.
                    if is_inception and phase == 'train':
                        # From https://discuss.pytorch.org/t/how-to-optimize-inception-model-with-auxiliary-classifiers/7958
                        outputs, aux_outputs = model(inputs)
                        loss1 = criterion(outputs, labels)
                        loss2 = criterion(aux_outputs, labels)
                        loss = loss1 + 0.4*loss2
                    else:
                        outputs = model(inputs)
                        loss = criterion(outputs, labels)

                    _, preds = torch.max(outputs, 1)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            
            epoch_loss = running_loss / len(dataloaders[phase].dataset)
            epoch_acc = running_corrects.double() / len(dataloaders[phase].dataset)
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
            if phase == 'val':
                val_acc_history.append(epoch_acc)

        print()
    
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))
    
    # load best model weights
    model.load_state_dict(best_model_wts)
    return model, val_acc_history
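
The `copy.deepcopy(model.state_dict())` calls in train_model matter because `state_dict()` returns tensors that share storage with the live parameters. The following is a minimal sketch of that behavior, assuming a tiny `nn.Linear` as a hypothetical stand-in for the real model and an in-place update as a stand-in for a later optimizer step.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)  # hypothetical stand-in for the real network

alias = model.state_dict()                    # shares storage with the live parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy, as train_model keeps

# Simulate a later training step that changes the weights in place
with torch.no_grad():
    for p in model.parameters():
        p.add_(1.0)

alias_follows = torch.equal(alias["weight"], model.weight)         # alias changed too
snapshot_kept = not torch.equal(snapshot["weight"], model.weight)  # snapshot did not
print(alias_follows, snapshot_kept)

# Restoring the snapshot brings back the earlier weights
model.load_state_dict(snapshot)
restored = torch.equal(model.weight, snapshot["weight"])
print(restored)
```

Without the deep copy, the "best" weights would silently follow every later update, and the final `load_state_dict` would restore nothing.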

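#### Model parameter .requires_grad attribute
- When feature extracting, gradients are computed only for the newly initialized layer, so the other parameters do not need gradients. The helper below sets their .requires_grad to False.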

In [None]:
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
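
To see the effect of this helper, here is a small sketch that assumes a hypothetical three-layer `nn.Sequential` in place of a torchvision model. It freezes every parameter, swaps in a new output layer, and lists which parameters remain trainable. Newly created layers have `requires_grad=True` by default, which is why the helper is called before the layer is replaced.

```python
import torch.nn as nn

def set_parameter_requires_grad(model, feature_extracting):
    # Same helper as above, repeated so the sketch is self-contained
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

# Hypothetical stand-in: a "backbone" layer plus a 1000-class head
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1000))

set_parameter_requires_grad(model, feature_extracting=True)  # freeze everything
model[2] = nn.Linear(8, 2)  # new 2-class head, trainable by default

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print(trainable)  # ['2.weight', '2.bias']
print(frozen)     # ['0.weight', '0.bias']
```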


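#### Initialization & reshaping the network
- This step differs from model to model.
- We reshape the final layer so that its number of outputs equals the number of classes in the dataset.
- inception_v3 expects input of size (299, 299), while the other models expect (224, 224).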


In [None]:
# The commented lines show the printed 1000-class output layer of each model;
# `model` stands for the loaded network.

# Resnet
# (fc): Linear(in_features=512, out_features=1000, bias=True)
model.fc = nn.Linear(512, num_classes)

# Alexnet
# (classifier): Sequential(
#     ...
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#  )
model.classifier[6] = nn.Linear(4096, num_classes)

# SqueezeNet
# (classifier): Sequential(
#     (0): Dropout(p=0.5)
#     (1): Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1))
#     (2): ReLU(inplace)
#     (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
#  )
model.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))

# DenseNet
# (classifier): Linear(in_features=1024, out_features=1000, bias=True)
model.classifier = nn.Linear(1024, num_classes)
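
A quick way to confirm a reshape is to push a dummy tensor through the modified head and check the output shape. The sketch below rebuilds only the SqueezeNet classifier layout printed above as a stand-in (it is not the real pretrained model), and the `(4, 512, 13, 13)` feature-map shape is an assumption chosen to match the `AvgPool2d(kernel_size=13)` layer.

```python
import torch
import torch.nn as nn

num_classes = 2

# Stand-in with the same layout as the printed SqueezeNet classifier
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Conv2d(512, 1000, kernel_size=(1, 1), stride=(1, 1)),
    nn.ReLU(inplace=True),
    nn.AvgPool2d(kernel_size=13, stride=1, padding=0),
)

# Reshape: replace the 1000-channel 1x1 conv with a num_classes-channel one
classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1, 1), stride=(1, 1))
classifier.eval()

features = torch.randn(4, 512, 13, 13)  # assumed feature-map shape, batch of 4
with torch.no_grad():
    logits = torch.flatten(classifier(features), 1)

print(logits.shape)  # torch.Size([4, 2])
```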



In [None]:
def initialize_model(model_name, num_classes, feature_extract, use_pretrained=True):
    # Initialize these variables which will be set in this if statement. Each of these
    #   variables is model specific.
    model_ft = None
    input_size = 0

    if model_name == "resnet":
        """ Resnet18
        """
        model_ft = models.resnet18(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "alexnet":
        """ Alexnet
        """
        model_ft = models.alexnet(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "vgg":
        """ VGG11_bn
        """
        model_ft = models.vgg11_bn(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier[6].in_features
        model_ft.classifier[6] = nn.Linear(num_ftrs,num_classes)
        input_size = 224

    elif model_name == "squeezenet":
        """ Squeezenet
        """
        model_ft = models.squeezenet1_0(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        model_ft.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
        model_ft.num_classes = num_classes
        input_size = 224

    elif model_name == "densenet":
        """ Densenet
        """
        model_ft = models.densenet121(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        num_ftrs = model_ft.classifier.in_features
        model_ft.classifier = nn.Linear(num_ftrs, num_classes)
        input_size = 224

    elif model_name == "inception":
        """ Inception v3
        Be careful, expects (299,299) sized images and has auxiliary output
        """
        model_ft = models.inception_v3(pretrained=use_pretrained)
        set_parameter_requires_grad(model_ft, feature_extract)
        # Handle the auxilary net
        num_ftrs = model_ft.AuxLogits.fc.in_features
        model_ft.AuxLogits.fc = nn.Linear(num_ftrs, num_classes)
        # Handle the primary net
        num_ftrs = model_ft.fc.in_features
        model_ft.fc = nn.Linear(num_ftrs,num_classes)
        input_size = 299

    else:
        print("Invalid model name, exiting...")
        exit()

    return model_ft, input_size

# Initialize the model for this run
model_ft, input_size = initialize_model(model_name, num_classes, feature_extract, use_pretrained=True)

# Print the model we just instantiated
print(model_ft)
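
The remaining step listed in the Intro is to define the optimizer over the parameters that should be updated. A minimal sketch follows, assuming a small hypothetical stand-in for `model_ft`. The choice of SGD and the `lr`/`momentum` values are also assumptions, not settings taken from this notebook. Only parameters with `requires_grad=True` are handed to the optimizer, so one step changes the head and leaves the frozen layer untouched.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
feature_extract = True

# Hypothetical stand-in for model_ft: a "backbone" layer plus a 2-class head
model_ft = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
if feature_extract:
    for p in model_ft[0].parameters():
        p.requires_grad = False  # freeze the backbone

# Collect only the parameters that should be updated
params_to_update = [p for p in model_ft.parameters() if p.requires_grad]
optimizer_ft = optim.SGD(params_to_update, lr=0.001, momentum=0.9)  # assumed values
criterion = nn.CrossEntropyLoss()

backbone_before = model_ft[0].weight.clone()
head_before = [p.clone() for p in model_ft[2].parameters()]

# One training step on random data
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))
optimizer_ft.zero_grad()
loss = criterion(model_ft(inputs), labels)
loss.backward()
optimizer_ft.step()

backbone_unchanged = torch.equal(model_ft[0].weight, backbone_before)
head_changed = any(
    not torch.equal(p, before)
    for p, before in zip(model_ft[2].parameters(), head_before)
)
print(len(params_to_update), backbone_unchanged, head_changed)
```

With the real model, the same `params_to_update`, `criterion`, and `optimizer_ft` objects would be passed to train_model together with `dataloaders_dict`.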

#### torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)
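- input: shape (C), (N, C), or (N, C, d1, d2, ..., dK) with K >= 1. The last form is the K-dimensional loss case, for example a per-pixel loss over 2D images.
- target: for class indices, shape (), (N), or (N, d1, d2, ..., dK) with K >= 1. For class probabilities, the same shape as input.
- In other words, the target can be given either as class indices (labels) or as per-class probabilities.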

In [9]:
import torch
import torch.nn.functional as F

# Example of target with class indices
input = torch.randn(3, 5, requires_grad=True) # batch size 3, 5 classes
target = torch.randint(5, (3,), dtype=torch.int64)
print(input,'\n' ,target)
print(input.shape, target.shape)
loss = F.cross_entropy(input, target)
print(loss)
loss.backward()
print('\n---------------\n')
# Example of target with class probabilities
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).softmax(dim=1)
# When target has the same shape as input, it is read as per-class probabilities
# (soft labels) rather than class indices; a one-hot target is the special case.
print(input,'\n' ,target)
print(input.shape, target.shape)
loss = F.cross_entropy(input, target)
print(loss)
loss.backward()

tensor([[ 0.7814,  2.1470, -1.3755, -0.3688,  0.2774],
        [ 0.4995,  1.0507, -0.4892, -0.5633,  0.5913],
        [ 0.0443, -0.0070, -0.0354,  0.5470,  1.2681]], requires_grad=True) 
 tensor([3, 1, 2])
torch.Size([3, 5]) torch.Size([3])
tensor(2.0160, grad_fn=<NllLossBackward0>)

---------------

tensor([[ 0.5883, -1.7306,  0.4168, -1.4277, -0.1250],
        [-0.8024,  0.1363,  0.0963, -1.0247,  0.1175],
        [-0.8248,  0.1321, -0.8134, -0.1165,  0.0946]], requires_grad=True) 
 tensor([[0.0117, 0.1871, 0.3576, 0.3298, 0.1139],
        [0.6213, 0.1484, 0.0763, 0.0207, 0.1333],
        [0.0689, 0.1283, 0.1559, 0.5043, 0.1427]])
torch.Size([3, 5]) torch.Size([3, 5])
tensor(1.8969, grad_fn=<DivBackward1>)
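
To make the relation between the two target forms concrete, the sketch below checks two things under the default `reduction='mean'`. First, a one-hot probability target gives the same loss as the matching class indices. Second, a soft target gives the batch mean of `-sum(target * log_softmax(input))`. The variable names are illustrative, and probability targets assume PyTorch 1.10 or later, the version printed earlier in this notebook.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)            # batch size 3, 5 classes
target_idx = torch.tensor([3, 1, 2])  # class indices

loss_idx = F.cross_entropy(logits, target_idx)

# One-hot probabilities: a special case of a probability target
target_onehot = F.one_hot(target_idx, num_classes=5).float()
loss_onehot = F.cross_entropy(logits, target_onehot)

# Soft target: each row is a probability distribution over the classes
target_soft = torch.randn(3, 5).softmax(dim=1)
loss_soft = F.cross_entropy(logits, target_soft)
manual_soft = -(target_soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

same_as_indices = torch.allclose(loss_idx, loss_onehot)
matches_formula = torch.allclose(loss_soft, manual_soft)
print(same_as_indices, matches_formula)
```

Soft targets of this kind typically come from techniques such as label smoothing or mixing two labeled samples, which is how a target can be a probability distribution instead of a single class.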
