## Model Description
: The model is designed around a Vision Transformer (ViT) for image classification.

: Because the dataset is small, particular care was taken to prevent overfitting.

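## Why This Model
: ViT was chosen because it has proven its performance by reaching SOTA results in image classification. In comparison experiments against NFNet (which recently reached SOTA), ResNet, EfficientNet, and others, the Vision Transformer model performed best.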

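## Techniques to Prevent Overfitting
#### 1. K-Fold (5) Cross-Validation
    : Because the given dataset is small, performance estimates on held-out data are less reliable; 5-fold cross-validation was used to address this.
#### 2. Data Augmentation
    : The test set contains images that are flipped or rotated by 90 degrees, and the dataset is small, so augmentation was used to prevent overfitting.
#### 3. Ensemble (Using Cross-Validation)
    : The data is split into 5 folds and 5 models are trained, one per fold. The test set is passed through each model, and the 5 outputs are averaged to produce the final classification.
#### 4. Test Time Augmentation (TTA)
    : Augmentation is also applied to the test set: each test image is augmented into 10 views, and the predictions are averaged to classify the image.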
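
The fold ensemble (item 3) and TTA (item 4) both come down to averaging class-probability arrays. The sketch below illustrates this with NumPy on random stand-in data. The helper `fake_probs`, the array shapes, and the image count are assumptions made for illustration and are not the notebook's actual model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_views, n_images, n_classes = 5, 10, 4, 7

def fake_probs(shape):
    """Hypothetical stand-in for model outputs: random rows that sum to 1."""
    scores = rng.random(shape)
    return scores / scores.sum(axis=-1, keepdims=True)

# One probability row per (fold model, augmented view, test image).
probs = fake_probs((n_folds, n_views, n_images, n_classes))

# TTA: average over the augmented views of each image, per fold model.
tta_probs = probs.mean(axis=1)           # (n_folds, n_images, n_classes)

# Ensemble: average the fold models' TTA-averaged probabilities.
ensemble_probs = tta_probs.mean(axis=0)  # (n_images, n_classes)

# Final label: the class with the highest averaged probability.
labels = ensemble_probs.argmax(axis=1)
print(ensemble_probs.shape, labels.shape)
```

Averaging probabilities rather than hard labels lets a confident model outweigh uncertain ones, which is one common reason to prefer it over majority voting.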

In [1]:
import torch
from torch.optim import Adam
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
import os
import ttach as tta
import pandas as pd
from sklearn.model_selection import StratifiedKFold
import cv2
import numpy as np
from albumentations import Compose, RandomRotate90, Resize, Normalize, HorizontalFlip, VerticalFlip, RandomCrop
from albumentations.pytorch import ToTensorV2
from glob import glob
import timm
import torchsummary

In [2]:
class CFG:
    model_name = 'vit_base_patch16_224'
    dataset_path = './inputs/train/train/*'
    test_dataset_path = './inputs/test/test/0/*'
    model_save_path = './output2/' + model_name + '/'
    saved_model_path = './output2/vit_base_patch16_224/*'
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    print_freq = 20
    num_workers = 0
    img_size = 224
    epochs = 10
    lr = 1e-4
    batch_size = 110
    weight_decay = 0
    max_grad_norm = 5
    dropout = 0.5
    seed = 42
    n_fold = 5
    trn_fold = [0, 1, 2, 3, 4]
    train = True
    hidden_node = 512
    output_node = 7

In [3]:
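class DevModel(nn.Module):
    def __init__(self, CFG=None, pretrained=False):
        super(DevModel, self).__init__()
        # ViT backbone from timm (the attribute is named `cnn`, but it is a transformer)
        self.cnn = timm.create_model(CFG.model_name, pretrained=pretrained)
        # 768 is the embedding width of vit_base_patch16_224
        self.classification = nn.Linear(768, CFG.output_node)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        # dim=1 normalizes over the class axis; omitting dim triggers a deprecation warning
        self.softmax = nn.Softmax(dim=1)
        print("Load Model BackBorn")

    def forward(self, x):
        # Assumes forward_features returns a pooled (batch, 768) feature, as in the
        # timm version used here; newer releases may return the full token sequence.
        features = self.cnn.forward_features(x)
        features = self.classification(features)
        # Note: nn.CrossEntropyLoss applies log-softmax itself, so training on these
        # probabilities applies softmax twice; it is kept so outputs match the logs below.
        features = self.softmax(features)

        return features

    def predict(self, x):
        x = self.forward(x)
        x = torch.max(x, dim=1)[1]  # index of the most probable class
        return x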
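
One thing to keep in mind when reading the training logs: `forward` returns softmax probabilities, and `nn.CrossEntropyLoss` applies log-softmax to its input again. Because probabilities lie in [0, 1], the loss cannot drop below roughly 1.165 for 7 classes, however accurate the model is. The pure-Python sketch below reproduces that floor. `cross_entropy` is a hypothetical helper that mirrors the per-sample loss formula and is not a PyTorch call.

```python
import math

def cross_entropy(scores, target):
    """-log(softmax(scores)[target]), the per-sample cross-entropy formula."""
    peak = max(scores)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum_exp - scores[target]

num_classes = 7

# Best case when the loss receives probabilities: a perfect one-hot row.
one_hot = [1.0] + [0.0] * (num_classes - 1)
floor_on_probs = cross_entropy(one_hot, 0)

# The same confident prediction expressed as raw logits: the loss approaches zero.
confident_logits = [20.0] + [0.0] * (num_classes - 1)
loss_on_logits = cross_entropy(confident_logits, 0)

print(f"floor on probabilities: {floor_on_probs:.4f}")
print(f"loss on logits:         {loss_on_logits:.2e}")
```

If retraining is an option, a common arrangement is to return raw logits from `forward` during training and apply softmax only at inference time, for example when averaging fold predictions.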

In [4]:
def torch_seed(seed=42):
    torch.cuda.manual_seed(seed)
    torch.manual_seed(seed)


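def load_dataset(file_path):
    x = []
    y = []
    # One sub-directory per class; sort so each class gets a reproducible label index
    paths = sorted(glob(file_path))
    print(paths)

    for e, path in enumerate(paths):
        # os.path.join builds the per-class pattern portably across operating systems
        file = glob(os.path.join(path, '*'))
        for f in file:
            x.append(f)
            y.append(e)

    x = np.array(x)
    y = np.array(y)
    print("Load DataSet")

    return x, y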


In [5]:
def k_fold(x_data, y_data, n_fold, seed):
    train_dataset = []
    valid_dataset = []

    skf = StratifiedKFold(n_splits=n_fold, shuffle=True, random_state=seed)
    for train_index, valid_index in skf.split(x_data, y_data):
        # print(train_index, valid_index)
        x_train, x_valid = x_data[train_index], x_data[valid_index]
        y_train, y_valid = y_data[train_index], y_data[valid_index]
        train_dataset.append((x_train, y_train))
        valid_dataset.append((x_valid, y_valid))
        # print(x_train, x_valid)
        # print(y_train, y_valid)
        # print()
    print("K Fold")
    return train_dataset, valid_dataset

In [6]:
class TrainDataset(Dataset):
    def __init__(self, file_path, labels, transform=None):
        super(TrainDataset, self).__init__()
        self.file_path = file_path
        self.labels = [np.int32(i) for i in labels]
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, item):
        image = cv2.imread(self.file_path[item])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)

        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']

        return image, self.labels[item]


class ValidDataset(Dataset):
    def __init__(self, file_path, labels, transform=None):
        super(ValidDataset, self).__init__()
        self.file_path = file_path
        self.labels = [np.int32(i) for i in labels]
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, item):
        image = cv2.imread(self.file_path[item])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)

        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']

        return image, self.labels[item]


class TestDataset(Dataset):
    def __init__(self, file_path, transform=None):
        super(TestDataset, self).__init__()
        self.file_path = file_path
        self.transform = transform

    def __len__(self):
        return len(self.file_path)

    def __getitem__(self, item):
        image = cv2.imread(self.file_path[item])
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)

        if self.transform:
            augmented = self.transform(image=image)
            image = augmented['image']

        return image

In [7]:
def get_transform(*, data, img_size):
    if data == 'train':
        return Compose([
            Resize(260, 260),
            RandomRotate90(p=.5),
            RandomCrop(img_size, img_size),
            HorizontalFlip(p=.5),
            VerticalFlip(p=.5),
            Normalize(
                # mean=[0.485, 0.456, 0.406],
                # std=[0.229, 0.224, 0.225]
                mean=[0.5, 0.5, 0.5],
                std=[0.5, 0.5, 0.5]
            ),
            ToTensorV2()  # numpy -> torch tensor
        ])

    elif data == 'valid' or data == 'test':
        return Compose([
            # RandomRotate90(p=.5),
            Resize(img_size, img_size),
            Normalize(
                # mean=[0.485, 0.456, 0.406],
                # std=[0.229, 0.224, 0.225]
                mean=[0.5, 0.5, 0.5],
                std=[0.5, 0.5, 0.5]
            ),
            ToTensorV2()
        ])
    

def calc_accuracy(X, Y):
    """Compute batch accuracy: the fraction of rows in X whose argmax equals the label in Y."""
    max_vals, max_indices = torch.max(X, 1)
    train_acc = (max_indices == Y).sum().data.cpu().numpy() / max_indices.size()[0]
    return train_acc
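
With `mean` and `std` both set to 0.5, the `Normalize` step in `get_transform` maps 8-bit pixel values to roughly [-1, 1]. The NumPy sketch below writes out that arithmetic, assuming albumentations' default `max_pixel_value=255.0`. `normalize` is a hypothetical re-implementation for illustration, not the library function.

```python
import numpy as np

def normalize(image, mean=0.5, std=0.5, max_pixel_value=255.0):
    """(pixel - mean * max_pixel_value) / (std * max_pixel_value)."""
    return (image - mean * max_pixel_value) / (std * max_pixel_value)

# Darkest, mid-gray, and brightest pixel values of an 8-bit image.
pixels = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scaled = normalize(pixels)
print(scaled)  # [-1.  0.  1.]
```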

In [8]:
def train(train_dataLoader, dev_model, optimizer, loss_func, device, epoch, CFG):
    dev_model.train()
    train_loss = 0.0
    train_acc = 0.0
    
    for batch, (inputs, labels) in enumerate(train_dataLoader):
        inputs = inputs.to(device)
        labels = labels.long().to(device)
        output = dev_model.forward(inputs)

        loss = loss_func(output, labels)
        # loss = loss_func(output, torch.max(labels, 1)[1])

        train_loss += loss.item()
        train_acc += calc_accuracy(output, labels)

        loss.backward()

        grad = torch.nn.utils.clip_grad_norm_(dev_model.parameters(), CFG.max_grad_norm)

        optimizer.step()
        optimizer.zero_grad()

        if batch % CFG.print_freq == 1:
            print('| Epoch : {:3d} | batch : {:3d} | loss : {:.4f} | acc : {:.4f} |'.format(
                epoch + 1, batch, (train_loss / (batch + 1)), (train_acc / (batch + 1))
            ))

    return dev_model, train_loss / len(train_dataLoader), train_acc / len(train_dataLoader)


def valid(valid_dataLoader, dev_model, loss_func, device, epoch, CFG):
    dev_model.eval()
    valid_loss = 0.0
    valid_acc = 0.0

    with torch.no_grad():
        for batch, (inputs, labels) in enumerate(valid_dataLoader):
            inputs = inputs.to(device)
            labels = labels.long().to(device)
            output = dev_model.forward(inputs)

            loss = loss_func(output, labels)
            # loss = loss_func(output, torch.max(labels, 1)[1])

            valid_loss += loss.item()
            valid_acc += calc_accuracy(output, labels)

            if batch % CFG.print_freq == 1:
                print('| Epoch : {:3d} | batch : {:3d} | loss : {:.4f} | acc : {:.4f} |'.format(
                    epoch + 1, batch, (valid_loss / (batch + 1)), (valid_acc / (batch + 1))
                ))

    return dev_model, valid_loss / len(valid_dataLoader), valid_acc / len(valid_dataLoader)


In [9]:
def main_loop(CFG):
    torch_seed(CFG.seed)

    # Create the output directory, including any missing parent directories
    os.makedirs(CFG.model_save_path, exist_ok=True)

    # Load Dataset
    x, y = load_dataset(CFG.dataset_path)

    # K Fold
    train_fold, valid_fold = k_fold(x, y, CFG.n_fold, CFG.seed)

    # train, valid Fold
    for fold_num, (trainSet, validSet) in enumerate(zip(train_fold, valid_fold)):
        # Train only the folds listed in CFG.trn_fold
        if fold_num not in CFG.trn_fold:
            continue

        print("=" * 80)
        # create model
        dev_model = DevModel(CFG, pretrained=True).to(CFG.device)

        x_train, y_train = trainSet[0], trainSet[1]
        x_valid, y_valid = validSet[0], validSet[1]
        print(f"train.shape : {x_train.shape}, valid.shape : {x_valid.shape}\n")
        print(f"total batch : {len(x_train) / CFG.batch_size}")
        print(f"fold_num : {fold_num + 1} / {CFG.n_fold}")

        train_dataset = TrainDataset(x_train, y_train, transform=get_transform(data='train', img_size=CFG.img_size))
        valid_dataset = ValidDataset(x_valid, y_valid, transform=get_transform(data='valid', img_size=CFG.img_size))
        train_dataLoader = DataLoader(train_dataset, batch_size=CFG.batch_size, shuffle=True,
                                      num_workers=CFG.num_workers)
        valid_dataLoader = DataLoader(valid_dataset, batch_size=CFG.batch_size, shuffle=False,
                                      num_workers=CFG.num_workers)

        optimizer = Adam(dev_model.parameters(), lr=CFG.lr, weight_decay=CFG.weight_decay)
        # scheduler =
        loss_func = nn.CrossEntropyLoss()

        for epoch in range(CFG.epochs):
            print(f'epoch : {epoch + 1} / {CFG.epochs}')

            print(f'{"=" * 30} Train {"=" * 30}')
            dev_model, avg_train_loss, avg_train_acc = train(train_dataLoader, dev_model, optimizer, loss_func,
                                                             CFG.device, epoch, CFG)
            print("{} Epoch train loss = {:.4f} acc = {:.4f}\n".format((epoch + 1), avg_train_loss, avg_train_acc))

            print(f'{"=" * 30} Valid {"=" * 30}')
            dev_model, avg_valid_loss, avg_valid_acc = valid(valid_dataLoader, dev_model, loss_func, CFG.device, epoch,
                                                             CFG)
            print("{} Epoch valid loss = {:.4f} acc = {:.4f}".format((epoch + 1), avg_valid_loss, avg_valid_acc))
            print()

        torch.save(dev_model, CFG.model_save_path + f'{CFG.model_name}_fold{fold_num + 1}_model.pth')
        print(f"Save Model fold{fold_num + 1}\n")
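
The `total batch` value printed in `main_loop` is a plain division, so it appears as a fraction in the logs. Assuming the `DataLoader` default of `drop_last=False`, the number of iterations per epoch is the ceiling of that value. A quick check with the sizes reported in this notebook's output (1358 training images in a fold and 350 test images, at batch size 110):

```python
import math

batch_size = 110

def batches_per_epoch(num_samples, batch_size):
    """Iterations a loader yields when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(1358, batch_size)
test_batches = batches_per_epoch(350, batch_size)
print(train_batches, test_batches)  # 13 4
```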

In [10]:
main_loop(CFG)

['./inputs/train/train\\dog', './inputs/train/train\\elephant', './inputs/train/train\\giraffe', './inputs/train/train\\guitar', './inputs/train/train\\horse', './inputs/train/train\\house', './inputs/train/train\\person']
Load DataSet
K Fold
Load Model BackBorn
train.shape : (1358,), valid.shape : (340,)

total batch : 12.345454545454546
fold_num : 1 / 5
epoch : 1 / 10


| Epoch :   1 | batch :   1 | loss : 1.9153 | acc : 0.2682 |
1 Epoch train loss = 1.6603 acc = 0.5840

| Epoch :   1 | batch :   1 | loss : 1.4910 | acc : 0.6955 |
1 Epoch valid loss = 1.3484 acc = 0.8455

epoch : 2 / 10
| Epoch :   2 | batch :   1 | loss : 1.3910 | acc : 0.8045 |
2 Epoch train loss = 1.3474 acc = 0.8526

| Epoch :   2 | batch :   1 | loss : 1.2526 | acc : 0.9455 |
2 Epoch valid loss = 1.2317 acc = 0.9659

epoch : 3 / 10
| Epoch :   3 | batch :   1 | loss : 1.2363 | acc : 0.9545 |
3 Epoch train loss = 1.2431 acc = 0.9367

| Epoch :   3 | batch :   1 | loss : 1.2281 | acc : 0.9455 |
3 Epoch valid loss = 1.2165 acc = 0.9636

epoch : 4 / 10
| Epoch :   4 | batch :   1 | loss : 1.2037 | acc : 0.9773 |
4 Epoch train loss = 1.2173 acc = 0.9527

| Epoch :   4 | batch :   1 | loss : 1.2012 | acc : 0.9727 |
4 Epoch valid loss = 1.2309 acc = 0.9477

epoch : 5 / 10
| Epoch :   5 | batch :   1 | loss : 1.2111 | acc : 0.9545 |
5 Epoch train loss = 1.2081 acc = 0.9623

| Epoch :   5

| Epoch :   3 | batch :   1 | loss : 1.2785 | acc : 0.9273 |
3 Epoch train loss = 1.2530 acc = 0.9365

| Epoch :   3 | batch :   1 | loss : 1.2180 | acc : 0.9636 |
3 Epoch valid loss = 1.2555 acc = 0.9182

epoch : 4 / 10
| Epoch :   4 | batch :   1 | loss : 1.2336 | acc : 0.9409 |
4 Epoch train loss = 1.2253 acc = 0.9532

| Epoch :   4 | batch :   1 | loss : 1.2558 | acc : 0.9091 |
4 Epoch valid loss = 1.2689 acc = 0.9136

epoch : 5 / 10
| Epoch :   5 | batch :   1 | loss : 1.2126 | acc : 0.9591 |
5 Epoch train loss = 1.2098 acc = 0.9601

| Epoch :   5 | batch :   1 | loss : 1.2272 | acc : 0.9455 |
5 Epoch valid loss = 1.2040 acc = 0.9682

epoch : 6 / 10
| Epoch :   6 | batch :   1 | loss : 1.1950 | acc : 0.9773 |
6 Epoch train loss = 1.1987 acc = 0.9692

| Epoch :   6 | batch :   1 | loss : 1.2179 | acc : 0.9591 |
6 Epoch valid loss = 1.1957 acc = 0.9773

epoch : 7 / 10
| Epoch :   7 | batch :   1 | loss : 1.1815 | acc : 0.9818 |
7 Epoch train loss = 1.1959 acc = 0.9722

| Epoch :   7

| Epoch :   5 | batch :   1 | loss : 1.1863 | acc : 0.9773 |
5 Epoch train loss = 1.1940 acc = 0.9748

| Epoch :   5 | batch :   1 | loss : 1.2160 | acc : 0.9591 |
5 Epoch valid loss = 1.2030 acc = 0.9773

epoch : 6 / 10
| Epoch :   6 | batch :   1 | loss : 1.1784 | acc : 0.9864 |
6 Epoch train loss = 1.1922 acc = 0.9738

| Epoch :   6 | batch :   1 | loss : 1.2159 | acc : 0.9591 |
6 Epoch valid loss = 1.2006 acc = 0.9773

epoch : 7 / 10
| Epoch :   7 | batch :   1 | loss : 1.1877 | acc : 0.9818 |
7 Epoch train loss = 1.1833 acc = 0.9839

| Epoch :   7 | batch :   1 | loss : 1.2223 | acc : 0.9409 |
7 Epoch valid loss = 1.2157 acc = 0.9404

epoch : 8 / 10
| Epoch :   8 | batch :   1 | loss : 1.1932 | acc : 0.9727 |
8 Epoch train loss = 1.1897 acc = 0.9777

| Epoch :   8 | batch :   1 | loss : 1.2265 | acc : 0.9455 |
8 Epoch valid loss = 1.2238 acc = 0.9404

epoch : 9 / 10
| Epoch :   9 | batch :   1 | loss : 1.1905 | acc : 0.9818 |
9 Epoch train loss = 1.1858 acc = 0.9832

| Epoch :   9

In [19]:
def load_test_dataset(file_path):
    x = np.array([path for path in glob(file_path)])
    return x

In [20]:
def tta_test(test_dataLoader, dev_model, device, CFG):
    dev_model.eval()
    predictions = []

    with torch.no_grad():
        for batch, inputs in enumerate(test_dataLoader):
            inputs = inputs.to(device)

            prediction = dev_model.forward(inputs)
            predictions.extend(prediction.data.cpu().numpy())

            if batch % CFG.print_freq == 1:
                print('| Test batch : {:3d} |'.format(batch))

    return np.array(predictions)

In [23]:
def tta_test_loop(CFG):
    # Load Models
    model_list = []
    models_path = glob(CFG.saved_model_path)
    print("model paths")
    for i in models_path:
        print(i)

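    for path in models_path:
        # map_location lets checkpoints saved on a GPU load on a CPU-only machine
        dev_model = torch.load(path, map_location=CFG.device)
        # ten_crop_transform: horizontal flip x five crops = 10 views per image
        dev_model = tta.ClassificationTTAWrapper(dev_model.to(CFG.device),
                                                 tta.aliases.ten_crop_transform(CFG.img_size, CFG.img_size),
                                                 merge_mode='mean')
        model_list.append(dev_model)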
    print("Load Models")

    # Load Test Dataset
    x_test = load_test_dataset(CFG.test_dataset_path)
    test_dataset = TestDataset(x_test, transform=get_transform(data='test', img_size=CFG.img_size))
    test_dataLoader = DataLoader(test_dataset, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers)

    print(f"test.shape : {x_test.shape}\n")
    print(f"total batch : {len(x_test) / CFG.batch_size}")

    predictions = []
    for model_num, model in enumerate(model_list):
        print(f'Model {model_num + 1}')
        prediction = tta_test(test_dataLoader, model, CFG.device, CFG)
        print(prediction.shape)
        predictions.append(prediction)

    # Ensemble: average the per-fold probability matrices, shape (n_test, n_classes)
    pred = np.mean(predictions, axis=0)
    print(pred)

    # Predicted class index for each test image
    answer = np.argmax(pred, axis=1).tolist()
    print(answer)

    submission = pd.read_csv('./inputs/test_answer_sample_.csv')
    submission['value'] = answer
    submission.to_csv('./output2/submission.csv', index=False)
    print(submission)

In [24]:
tta_test_loop(CFG)

model paths
./output2/vit_base_patch16_224\vit_base_patch16_224_fold1_model.pth
./output2/vit_base_patch16_224\vit_base_patch16_224_fold2_model.pth
./output2/vit_base_patch16_224\vit_base_patch16_224_fold3_model.pth
./output2/vit_base_patch16_224\vit_base_patch16_224_fold4_model.pth
./output2/vit_base_patch16_224\vit_base_patch16_224_fold5_model.pth
Load Models
test.shape : (350,)

total batch : 3.1818181818181817
Model 1


| Test batch :   1 |
(350, 7)
Model 2
| Test batch :   1 |
(350, 7)
Model 3
| Test batch :   1 |
(350, 7)
Model 4
| Test batch :   1 |
(350, 7)
Model 5
| Test batch :   1 |
(350, 7)
[[1.56645239e-02 2.04436310e-04 8.05255120e-01 ... 4.49721672e-03
  5.37445530e-04 1.73396765e-01]
 [1.49383458e-04 1.31620850e-04 1.06106001e-04 ... 1.23183691e-04
  1.02047197e-04 1.20681246e-04]
 [3.77318231e-02 1.19618943e-02 4.68780664e-04 ... 2.22932029e-03
  7.31876500e-04 3.56435147e-04]
 ...
 [1.06883264e-04 1.91615612e-04 6.11903984e-04 ... 1.18428718e-04
  4.72767513e-04 1.04982245e-04]
 [2.90978854e-04 1.91580755e-04 1.58546942e-04 ... 3.10497727e-04
  9.96809685e-01 1.12790081e-03]
 [1.72657115e-04 9.99392903e-01 1.37287961e-04 ... 1.11551471e-04
  4.50471085e-05 6.22059194e-05]]
[2, 3, 3, 3, 3, 3, 4, 4, 3, 1, 6, 2, 0, 2, 3, 1, 2, 0, 6, 3, 3, 5, 2, 3, 0, 5, 1, 2, 0, 5, 1, 5, 6, 2, 0, 5, 5, 4, 2, 1, 4, 0, 2, 3, 1, 3, 0, 5, 5, 2, 6, 5, 4, 1, 5, 0, 4, 5, 1, 1, 5, 0, 6, 1, 1, 2, 4, 1, 1, 3, 0, 3, 0