# [09] Training the Model

With the data, labels, model, and loss function all in place, we can now train the model. To do so, we define the following components.

- Dataloader
- Model
- Criterion
- Optimizer
- Scheduler

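Beyond these, training requires setting a number of hyperparameters. Using the figure below, let's review the concepts of batch, epoch, and iteration, and then look at the training components configured for this run.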

In [1]:
from IPython.display import HTML, display

# Image from https://www.slideshare.net/w0ong/ss-82372826
display(HTML("<img src='img/[09]epoch.png'>"))
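
The relationship among batch, epoch, and iteration can be checked with a little arithmetic. The sketch below assumes a dataset of 16,551 images, a hypothetical figure chosen only because it is consistent with the 2,069 iterations per epoch that appear in the training log later in this notebook. `batch_size=8` and `num_epochs=20` match the settings in the next cell.

```python
import math

n_samples = 16551   # assumed dataset size (hypothetical, not read from the dataset)
batch_size = 8      # samples processed in one iteration
num_epochs = 20     # full passes over the dataset

# One iteration processes one batch; the last batch of an epoch may be smaller.
iters_per_epoch = math.ceil(n_samples / batch_size)
total_iters = iters_per_epoch * num_epochs

print(iters_per_epoch)  # 2069
print(total_iters)      # 41380
```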

In [2]:
import torch, time
from torch.utils.data.dataloader import DataLoader
import torch.optim as optim
import torch.optim.lr_scheduler as scheduler

from materials.DetectionNet import DetectionNet, create_prior_boxes
from materials.MultiboxLoss import MultiBoxLoss
from materials.datasets import PascalVOCDataset
from materials.utils import *

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

dataset = PascalVOCDataset(data_folder='./data/VOC', split='TRAIN')
dataloader = DataLoader(dataset=dataset, batch_size=8, shuffle=True, 
                              collate_fn=dataset.collate_fn, num_workers=4)

model = DetectionNet(n_classes=21, unfreeze_keys=['15', 'head', 'bn1'], use_bias=True).to(device)
criterion = MultiBoxLoss(priors_cxcy=create_prior_boxes(), threshold=0.5, neg_pos_ratio=0.3, alpha=1.0)

num_epochs = 20
lr, momentum, weight_decay = 1e-3, 0.9, 5e-4

# Split the trainable parameters so that biases can use their own learning rate.
biases, not_biases = [], []

for param_name, param in model.named_parameters():
    if param.requires_grad:
        if param_name.endswith('.bias'):
            biases.append(param)
        else:
            not_biases.append(param)
# Biases train at twice the base learning rate; all other weights use `lr`.
optimizer = torch.optim.SGD(params=[{'params': biases, 'lr': 2 * lr}, {'params': not_biases}],
                            lr=lr, momentum=momentum, weight_decay=weight_decay)

#optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum, weight_decay=weight_decay, nesterov=False)
scheduler = scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0.00005)


Loaded pretrained weights for efficientnet-b0
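
To get a feel for what the scheduler does to the learning rate, the sketch below evaluates the closed-form cosine annealing formula given in the PyTorch documentation for `CosineAnnealingLR`, in plain Python rather than by calling the scheduler itself. The helper name `cosine_lr` is hypothetical. The values `T_max=20`, `eta_min=5e-5`, and the base rate `1e-3` are taken from the cell above.

```python
import math

def cosine_lr(base_lr, epoch, t_max=20, eta_min=5e-5):
    """Closed-form cosine annealing from base_lr down to eta_min over t_max epochs."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

lr = 1e-3
for epoch in (0, 5, 10, 15, 20):
    # Each parameter group anneals from its own starting rate; biases start at 2 * lr.
    print(epoch, f"{cosine_lr(lr, epoch):.6f}", f"{cosine_lr(2 * lr, epoch):.6f}")
```

Both groups end at the same `eta_min`, so the factor-of-two gap between the bias rate and the weight rate shrinks as training proceeds.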


-------------------
## [Task 1] Building the Training Loop
Let's get training running by specifying what happens for each batch and for each epoch.

### ToDo: Complete the `train` function

Complete the `train` function and start training.

In [3]:
def train(model, dataloader, criterion, optimizer, scheduler=None, 
          num_epochs=200, grad_clip=None, print_freq=1, 
          save_name='test', device=device):
    
    model.train()
    
    for epoch in range(num_epochs):
        batch_time, data_time, losses = AverageMeter(), AverageMeter(), AverageMeter()
        start = time.time()
    
        for i, (images, boxes, labels, _) in enumerate(dataloader):
            data_time.update(time.time()-start)

            images = images.to(device)
            boxes = [b.to(device) for b in boxes]
            labels = [l.to(device) for l in labels]

            # forward pass
            pred_locs, pred_scores = model(images)
            loss = criterion(pred_locs, pred_scores, boxes, labels)
            # Skip the parameter update for batches whose loss is abnormally large (> 100).
            if loss > 100:
                continue
            # backward pass
            optimizer.zero_grad()
            loss.backward()

            if grad_clip is not None:
                clip_gradient(optimizer, grad_clip)

            optimizer.step()

            losses.update(loss.item(), images.size(0))
            batch_time.update(time.time() - start)

            start = time.time()

            # Print status
            if i % print_freq == 0:
                print('Epoch: [{0}][{1}/{2}]\t'
                      'Batch Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                      'Data Time {data_time.val:.3f} ({data_time.avg:.3f})\t'
                      'Loss {loss.val:.4f} ({loss.avg:.4f})\t'.format(epoch, i, len(dataloader),
                                                                      batch_time=batch_time,
                                                                      data_time=data_time, loss=losses))
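        # Step the learning-rate scheduler once per epoch.
        if scheduler is not None:
            scheduler.step()

    # After the final epoch, save the weights; the filename includes the last recorded batch loss.
    torch.save(model.state_dict(), './{}_{}.pth'.format(save_name, round(losses.val, 3)))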
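
`AverageMeter` is imported from `materials.utils`, and its source is not shown in this notebook. Judging only from how `train` uses it (`update(value, n)`, `.val`, `.avg`), it presumably behaves roughly like the minimal sketch below. The class name `AverageMeterSketch` and its internals are assumptions, not the actual implementation.

```python
class AverageMeterSketch:
    """Tracks the most recent value and the running weighted average (assumed behavior)."""

    def __init__(self):
        self.val = 0.0    # most recent value passed to update()
        self.sum = 0.0    # weighted sum of all values seen so far
        self.count = 0    # total weight (e.g. number of samples)
        self.avg = 0.0    # running average = sum / count

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

losses = AverageMeterSketch()
losses.update(2.0, 8)   # first batch: loss 2.0 over 8 images
losses.update(4.0, 8)   # second batch: loss 4.0 over 8 images
print(losses.val, losses.avg)  # 4.0 3.0
```

This is why each log line prints two numbers per metric: the value for the current batch, followed by the running average in parentheses.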


In [4]:
import warnings
warnings.filterwarnings("ignore")

# Run training.
train(model, dataloader, criterion, optimizer, scheduler, 
      num_epochs=1, grad_clip=None, print_freq=50, save_name='test')

Epoch: [0][0/2069]	Batch Time 0.943 (0.943)	Data Time 0.613 (0.613)	Loss 22.7096 (22.7096)	
Epoch: [0][50/2069]	Batch Time 0.113 (0.094)	Data Time 0.050 (0.024)	Loss 13.7948 (15.6236)	
Epoch: [0][100/2069]	Batch Time 0.073 (0.085)	Data Time 0.011 (0.018)	Loss 11.9240 (13.8904)	
Epoch: [0][150/2069]	Batch Time 0.066 (0.082)	Data Time 0.001 (0.016)	Loss 7.7615 (12.8175)	
Epoch: [0][200/2069]	Batch Time 0.088 (0.081)	Data Time 0.028 (0.015)	Loss 10.8786 (12.3814)	
Epoch: [0][250/2069]	Batch Time 0.095 (0.081)	Data Time 0.029 (0.015)	Loss 10.4246 (11.9725)	
Epoch: [0][300/2069]	Batch Time 0.093 (0.080)	Data Time 0.027 (0.014)	Loss 10.7510 (11.6937)	
Epoch: [0][350/2069]	Batch Time 0.089 (0.080)	Data Time 0.025 (0.014)	Loss 9.9810 (11.4010)	
Epoch: [0][400/2069]	Batch Time 0.080 (0.079)	Data Time 0.013 (0.014)	Loss 8.0719 (11.0602)	
Epoch: [0][450/2069]	Batch Time 0.069 (0.078)	Data Time 0.001 (0.013)	Loss 7.0713 (10.8552)	
Epoch: [0][500/2069]	Batch Time 0.072 (0.078)	Data Time 0.008 (0.01

---------
### Think About It

- Try different learning rates and observe the early phase of training (< 4 epochs). What trend do you see?
- What does it mean to clip the gradient?
------------
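
As a starting point for the second question, the sketch below contrasts two common forms of clipping on a plain list of gradient values: clamping each element to `[-c, c]`, and rescaling the whole vector so that its L2 norm does not exceed `c`. Which form `clip_gradient` in `materials.utils` applies is not shown in this notebook, so neither function should be read as its implementation. Both helper names are hypothetical.

```python
import math

def clip_by_value(grads, c):
    """Clamp every gradient component to the range [-c, c]."""
    return [max(-c, min(c, g)) for g in grads]

def clip_by_norm(grads, c):
    """Rescale the gradient vector so that its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]

grads = [3.0, -4.0, 0.5]
print(clip_by_value(grads, 1.0))
print(clip_by_norm(grads, 1.0))
```

Value clipping can change the direction of the update, because large components are cut while small ones are left alone. Norm clipping keeps the direction and only shortens the step.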