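## Saving and Loading Models  
If you use cloud notebooks, you have probably had a session disconnect on you at least once.  
If the trained weights were not saved when that happens, hours of training are lost.   
In this section we learn how to save a model during training and how to load one before training starts.  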

## Quiz (Easy)  
0) Create a run_cnn2 file and refactor the existing code into it.  
1) Using the argparser covered earlier, add the config_path, save_path, and pre_trained arguments.  
2) Create a weights folder in the parent folder.   
3) The default for save_path is './weights' and the default for config_path is './configs'.  
4) pre_trained has type bool and a default value of False.  
 

In [4]:
%%writefile go.py

import yaml
import os
import argparse

parser = argparse.ArgumentParser(description = 'quiz')
parser.add_argument('-c','--config_path', type=str, default='configs/',help='config path')
parser.add_argument('--save_path', type=str, default='weights/', help='save path')
parser.add_argument('--pretrain', type=bool, default=False, help='pretrain or not')
parser.add_argument('--model_name', type=str, default="CNN", help='model name')

## 1) Implementation
# 1. Print args.
# 2. Use config_path from args to load the YAML file into the config variable.
# 3. Print config.
args = parser.parse_args()

print(args)



with open(args.config_path) as f:
    config = yaml.load(f,Loader=yaml.FullLoader)

print(config)

Overwriting go.py


In [5]:
!python3 go.py -c 'configs/cnn.yaml'

Namespace(config_path='configs/cnn.yaml', model_name='CNN', pretrain=False, save_path='weights/')
{'batch_size': 16, 'learning_rate': 0.001, 'epochs': 1, 'kernel_size': 2, 'stride': 2}
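
One detail worth checking when an argparse argument is declared with `type=bool`: argparse passes the raw command-line string to `bool()`, and every non-empty string is truthy. The sketch below compares that with `action='store_true'`; the argument names `--pretrain_typed` and `--pretrain_flag` are hypothetical and used only for this illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="bool flag sketch")
    # type=bool calls bool() on the raw string, and any non-empty string is truthy.
    parser.add_argument("--pretrain_typed", type=bool, default=False)
    # store_true sets the value to True only when the flag is present.
    parser.add_argument("--pretrain_flag", action="store_true")
    return parser


parser = build_parser()
typed = parser.parse_args(["--pretrain_typed", "False"])
flag_off = parser.parse_args([])
flag_on = parser.parse_args(["--pretrain_flag"])

print(typed.pretrain_typed)    # True, although the user typed "False"
print(flag_off.pretrain_flag)  # False
print(flag_on.pretrain_flag)   # True
```

With a `store_true` flag, loading pre-trained weights would be requested by adding the bare flag to the command line, with no value after it.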


## Quiz (Easy)  
To implement loading and saving the model, the train and test code has to be modified.  
Where in the code below should the new lines go?  

In [None]:
def train(epoch, model, loss_func, train_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_func(y_pred, y)
        loss.backward()
        optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {loss.item():.6f}')
            
def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    # Gradients are not needed for evaluation.
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            y_pred = model(x)
            test_loss += loss_func(y_pred, y).item()
            pred = y_pred.argmax(dim=1, keepdim=True)
            # eq() compares element-wise; sum() counts the correct predictions.
            correct_count += pred.eq(y.view_as(pred)).sum().item()

    # Assuming loss_func returns a per-batch mean (the default for nn.CrossEntropyLoss),
    # average over the number of batches rather than the number of samples.
    test_loss /= len(test_loader)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')

## Save, Load  
Saving and loading a model uses torch.save(), torch.load(), and the model's own load_state_dict() method.  
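
As a minimal sketch of how these three calls fit together, the example below saves the weights of a small model to a temporary file and loads them into a fresh instance. `TinyNet` and the file name `tiny.pth` are hypothetical stand-ins for the notebook's `CNN` class and weight path.

```python
import os
import tempfile

import torch
import torch.nn as nn


# TinyNet is a hypothetical stand-in for the CNN class used in this notebook.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


source = TinyNet()
with tempfile.TemporaryDirectory() as tmp_dir:
    weight_path = os.path.join(tmp_dir, "tiny.pth")
    # Save only the parameters, not the whole Python object.
    torch.save(source.state_dict(), weight_path)

    # A fresh instance has its own random weights until the
    # saved state_dict is loaded into it.
    restored = TinyNet()
    state_dict = torch.load(weight_path, map_location="cpu")
    result = restored.load_state_dict(state_dict)

# Compare every tensor of the two models by name.
same = all(
    torch.equal(source.state_dict()[name], restored.state_dict()[name])
    for name in source.state_dict()
)
print(result)
print(same)
```

Because only the state_dict is stored, the model class has to be constructed with the same architecture before `load_state_dict()` is called.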

In [41]:
from models.cnn import CNN
cnn = CNN(C=1,W=28,H=28,K=3,S=2)

# state_dict() returns every learnable parameter (and buffer) of the model, keyed by name.

print(cnn.state_dict())

13
6
2
OrderedDict([('conv1.weight', tensor([[[[-0.0959,  0.0448, -0.0365],
          [ 0.1765, -0.1323,  0.2271],
          [ 0.1100, -0.1851, -0.2927]]],


        [[[ 0.0091, -0.2706, -0.0501],
          [ 0.2494, -0.1672,  0.1997],
          [ 0.0156, -0.0465, -0.2946]]],


        [[[ 0.3318, -0.1790, -0.3124],
          [ 0.2739, -0.2633,  0.2141],
          [ 0.0663, -0.1285, -0.1256]]],


        [[[ 0.0034, -0.2547,  0.1879],
          [-0.0933,  0.0309,  0.2450],
          [-0.1583, -0.2097,  0.2522]]],


        [[[-0.2095,  0.2651, -0.3029],
          [ 0.2819, -0.1838, -0.2278],
          [-0.0594, -0.0848, -0.1089]]],


        [[[-0.2096, -0.3184, -0.2092],
          [-0.0919,  0.3043, -0.2828],
          [ 0.3160,  0.3102, -0.0383]]],


        [[[ 0.3063, -0.1473, -0.0669],
          [-0.1923,  0.2627, -0.0993],
          [ 0.2861, -0.0781, -0.1297]]],


        [[[-0.3105, -0.2577,  0.0644],
          [ 0.2340, -0.0420,  0.1735],
          [ 0.0144,  0.1251,  0.2789]]

In [50]:
%%writefile go.py

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np


from models.cnn import CNN
from dataset.MNIST_LOADER import make_loader
import argparse
import yaml
from torch.utils.tensorboard import SummaryWriter



parser = argparse.ArgumentParser(description = 'quiz')
parser.add_argument('-c','--config_path', type=str, default='configs/',help='config path')
parser.add_argument('--save_path', type=str, default='weights/', help='save path')
parser.add_argument('--pretrain', type=bool, default=False, help='pretrain or not')
parser.add_argument('--model_name', type=str, default="CNN", help='model name')

## 1) Implementation
# 1. Print args.
# 2. Use config_path from args to load the YAML file into the config variable.
# 3. Print config.

args = parser.parse_args()

train_loader, valid_loader, test_loader, shape = make_loader(16)
C = shape[0]
W = shape[1]
H = shape[2]


device = torch.device('cuda') if torch.cuda.is_available() else torch.device("cpu")
# device = torch.device('cpu')

cnn = CNN(C=C, W=W, H=H, K=3, S=2) 
cnn = cnn.to(device)
ce_loss = nn.CrossEntropyLoss()

# Open the config file with a with statement.

with open(args.config_path) as f:
    config = yaml.load(f,Loader=yaml.FullLoader)
    print(type(config))

# Hyperparameters
batch_size = config['batch_size']
learning_rate = config['learning_rate']
epochs = config['epochs']
kernel_size = config['kernel_size']
stride = config['stride']


# The config was already loaded above, so it is only printed here.
print(config)

pre_trained = args.pretrain
save_path = args.save_path
model_name = args.model_name
model = cnn

if pre_trained:
    model_dict = torch.load(save_path+model_name)
    model.load_state_dict(model_dict)

writer = SummaryWriter('runs/cnn/')


def train(epoch, model, loss_func, train_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        loss = loss_func(y_pred, y)
        loss.backward()
        optimizer.step()
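        # The step counts batches across epochs, so it uses the number of batches per epoch.
        writer.add_scalar("train/loss", loss.item(), epoch * len(train_loader) + batch_index)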
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {loss.item():.6f}')
            # Save to the same path that the pre_trained branch loads from.
            torch.save(model.state_dict(), save_path + model_name)
            

def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    # Gradients are not needed for evaluation.
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            y_pred = model(x)
            test_loss += loss_func(y_pred, y).item()
            pred = y_pred.argmax(dim=1, keepdim=True)
            # eq() compares element-wise; sum() counts the correct predictions.
            correct_count += pred.eq(y.view_as(pred)).sum().item()

    # nn.CrossEntropyLoss returns a per-batch mean by default,
    # so average over the number of batches rather than the number of samples.
    test_loss /= len(test_loader)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')


optimizer = optim.Adam(cnn.parameters(), lr=learning_rate)

for epoch in range(epochs):
    train(epoch, cnn, ce_loss, train_loader, optimizer)


test(cnn, ce_loss, test_loader)

# Flush any pending TensorBoard events to disk.
writer.close()

Overwriting go.py


In [51]:
!python3 go.py -c 'configs/cnn.yaml'

channel: 1, width: 28, height: 28
13
6
2
<class 'dict'>
{'batch_size': 16, 'learning_rate': 0.001, 'epochs': 1, 'kernel_size': 2, 'stride': 2}
2021-11-19 17:23:36.801076: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-19 17:23:36.801884: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
  return F.log_softmax(x)
Train Epoch: 1 | Batch Status: 0/60000             (0% | Loss: 2.404390
Train Epoch: 1 | Batch Status: 1600/60000             (3% | Loss: 1.469152
Train Epoch: 1 | Batch Status: 3200/60000             (5% | Loss: 1.192706
Train Epoch: 1 | Batch Status: 4800/60000             (8% | Loss: 1.292445
Train Epoch: 1 | Batch Status: 6400/60000             (11% | Loss: 1.463641
Train Epoch: 1 | Batch Status: 8000/60000             (13% | Loss: 1.2

In [44]:
# Load the saved weights
# torch.load()
import torch
import os

save_path = 'weights/'
model_name = 'CNN'
state_dict = torch.load(save_path+model_name)



In [43]:
cnn.load_state_dict(state_dict)
# cnn.conv1.weight.shape

<All keys matched successfully>

## Loading the Model
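
To resume training after a disconnect, rather than only reuse the weights, the optimizer state and the epoch counter are often stored alongside the model's state_dict. The following is a minimal sketch of that idea, not the notebook's own method: the helpers `save_checkpoint` and `load_checkpoint`, the dictionary keys, and the small `nn.Linear` model are all assumptions made for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(path, model, optimizer, epoch):
    # Bundle everything needed to resume into one dictionary.
    torch.save({
        "epoch": epoch,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer, device="cpu"):
    # map_location lets a checkpoint saved on a GPU be opened on a CPU-only machine.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    # Training resumes at the epoch after the one that was saved.
    return checkpoint["epoch"] + 1


model = nn.Linear(4, 2)
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One dummy step so the optimizer has state worth saving.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")
    save_checkpoint(path, model, optimizer, epoch=2)

    new_model = nn.Linear(4, 2)
    new_optimizer = optim.Adam(new_model.parameters(), lr=0.001)
    start_epoch = load_checkpoint(path, new_model, new_optimizer)

print(start_epoch)  # 3
```

The returned value can serve as the starting point of the epoch loop, for example `for epoch in range(start_epoch, epochs)`.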

## Tensorboard  
TensorBoard visualizes the loss and other metrics recorded during training, so you can check whether training is  
going well and how the model performs on the test set.   

In [30]:
!pip install tensorboard



In [31]:
from torch.utils.tensorboard import SummaryWriter

In [None]:
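# Define the writer
writer = SummaryWriter('runs/cnn/')

# writer.add_scalar records a scalar such as the loss or the accuracy.
# Its arguments are a "group/name" tag, the value to record, and the iteration (step) index:
# writer.add_scalar("group/name", value, step)
# e.g. group = train or valid, name = loss or acc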


writer.close()

First, create a runs folder, and inside it a cnn folder.  

## Quiz (Normal)  
On which lines of the train and test functions should add_scalar be inserted?  

In [None]:
if pre_trained:
    model_dict = torch.load(save_path+model_name)
    model.load_state_dict(model_dict)

def train(epoch, model, loss_func, train_loader, valid_loader, optimizer):
    model.train()
    for batch_index, (x, y) in enumerate(train_loader):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x)
        train_loss = loss_func(y_pred, y)
        train_loss.backward()
        optimizer.step()
        if batch_index % 100 == 0:
            print(f'Train Epoch: {epoch+1} | Batch Status: {batch_index*len(x)}/{len(train_loader.dataset)} \
            ({100. * batch_index * batch_size / len(train_loader.dataset):.0f}% | Loss: {train_loss.item():.6f}')
            torch.save(model.state_dict(), save_path + model_name)

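    # Validation pass: switch to eval mode and skip gradient tracking.
    model.eval()
    with torch.no_grad():
        for batch_index, (x, y) in enumerate(valid_loader):
            x, y = x.to(device), y.to(device)
            y_pred = model(x)
            val_loss = loss_func(y_pred, y)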
        
def test(model, loss_func, test_loader):
    model.eval()
    test_loss = 0
    correct_count = 0
    # Gradients are not needed for evaluation.
    with torch.no_grad():
        for x, y in test_loader:
            x, y = x.to(device), y.to(device)
            y_pred = model(x)
            test_loss += loss_func(y_pred, y).item()
            pred = y_pred.argmax(dim=1, keepdim=True)
            # eq() compares element-wise; sum() counts the correct predictions.
            correct_count += pred.eq(y.view_as(pred)).sum().item()

    # Assuming loss_func returns a per-batch mean (the default for nn.CrossEntropyLoss),
    # average over the number of batches rather than the number of samples.
    test_loss /= len(test_loader)
    print(f'=======================\n Test set: Average loss: {test_loss:.4f}, Accuracy: {correct_count/len(test_loader.dataset):.3}')
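
When add_scalar is called once per batch, its step argument works best as a counter that keeps increasing across epochs. The sketch below shows one way to compute such a counter and why multiplying the epoch by the batch size instead can make steps from different epochs overlap. `global_step` is a hypothetical helper, and the dataset size and batch size are illustrative values.

```python
def global_step(epoch, batch_index, batches_per_epoch):
    # Number of batches processed before this one, counted over all epochs.
    return epoch * batches_per_epoch + batch_index


# Illustrative numbers only: 60000 samples with a batch size of 16.
batch_size = 16
batches_per_epoch = 60000 // batch_size

# Step of batch 16 in epoch 0 versus batch 0 in epoch 1.
good_a = global_step(0, 16, batches_per_epoch)
good_b = global_step(1, 0, batches_per_epoch)

# Multiplying by the batch size instead maps both batches to the same step.
bad_a = 0 * batch_size + 16
bad_b = 1 * batch_size + 0

print(good_a, good_b)  # 16 3750
print(bad_a, bad_b)    # 16 16
```

Inside a training loop, `len(train_loader)` gives the number of batches per epoch, so it can be passed as `batches_per_epoch`.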

In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

from torch.utils.data import DataLoader, Dataset 
from torchvision import datasets, transforms

# Hyperparameters
batch_size = 32
learning_rate = 0.001
epochs = 5
kernel_size = 3
stride = 2
pre_trained = False

In [8]:
save_path = './weights/'
config_path = './configs/'
model_name = 'cnn.pth'

In [9]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
cnn = CNN(C=1, W=28, H=28, K=3, S=2) 
cnn = cnn.to(device)
ce_loss = nn.CrossEntropyLoss()
optimizer = optim.Adam(cnn.parameters(), lr=0.001)
writer = SummaryWriter('runs/cnn/')

13
6
2


In [54]:
!tensorboard --logdir "runs/cnn"


2021-11-19 17:32:38.503560: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-19 17:32:38.503638: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-19 17:32:42.035206: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-11-19 17:32:42.035384: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2021-11-19 17:32:42.035465: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (Gift-of-Mia): /proc/driver/nvidia/version does not exist

NOTE: Using experimental fast data loading logic. To disable, pa