# **차량 공유업체의 차량 파손 여부 분류하기**

## 0.미션

* 1) 미션1 : Data Preprocessing
    - **과제 수행 목표**
        - 본인의 구글 드라이브에 모델링 수행을 위해 적절한 폴더 및 파일로 **일관성 있게 정리**해야 합니다.
        - 제공된 데이터 : Car_Images.zip
            * Car_Images : 차량의 정상/파손 이미지 무작위 수집

* 2) 미션2 : CNN 모델링
    - **과제 수행 목표**
        - Tensorflow Keras를 이용하여 모델을 3개 이상 생성하세요.
            - 모델 구조와 파라미터는 자유롭게 구성하세요.
            - 단, 세부 목차에서 명시한 부분은 지켜주세요.

* 3) 미션3 : Data Argumentation & Transfer Learning
    - **과제 수행 목표**
        - 성능 개선을 위해 다음의 두가지를 시도하세요.
            * Data Augmentation을 적용하세요.(Image Generator)
            * Transfer Learning(VGG16)


## 1.환경설정 

### (1) 데이터셋 폴더 생성
- **세부요구사항**
    - C드라이브에 Datasets라는 폴더를 만드세요.
        - 구글드라이브를 사용하는경우 드라이브 첫 화면에 Datasets 라는 폴더를 만드세요. ('/content/drive/MyDrive/Datasets/')
    - 해당 폴더 안에 Car_Images.zip 파일을 넣으세요.

* 구글 Colab을 이용하는 경우

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### (2) 데이터셋 불러오기 
- **세부요구사항**
    - Car_Images.zip 파일을 C:/Datasets/ 경로에 압축 해제합니다.
    - zipfile 모듈을 이용하거나 다른 방식을 사용해도 됩니다.
        - 참고 자료 : [zipfile document](https://docs.python.org/3/library/zipfile.html#zipfile-objects)
    - 폴더구조(로컬)
        * C:/Datasets/ : 압축파일
        * C:/Datasets/Car_Images_train/ : 압축 해제한 이미지 저장소
    - 폴더구조(구글드라이브브)
        * /content/drive/MyDrive/Datasets/ : 압축파일
        * /content/drive/MyDrive/Datasets/Car_Images_train/ : 압축 해제한 이미지 저장소
    - 압축을 해제하면 다음과 같은 두 하위 폴더가 생성됩니다.
        * normal, abnormal : 각 폴더에는 이미지들이 있습니다.
        * 이후 단계에서 해당 경로로 부터 validation, test 셋을 추출하게 됩니다.
        

In [2]:
import zipfile

In [3]:
# 압축파일 경로
# 구글 드라이브인 경우 경로에 맞게 지정하세요.
# dataset_path  = '/content/drive/MyDrive/Datasets/'
dataset_path = '/content/drive/MyDrive/Datasets/'

file_path = dataset_path + 'Car_Images.zip'

In [None]:
# 압축 해제

data = zipfile.ZipFile(file_path)
data.extractall('/content/drive/MyDrive/my_data/Car_Images_train/')

### (3) 이미지 저장을 위한 폴더 생성
- **세부요구사항**
    - train, validation, test 을 위해 각각 하위 폴더 normal과 abnormal를 준비합니다.
        - train
            * 정상 이미지 저장소 : C:/Datasets/Car_Images_train/normal/ 
                * 구글드라이브 :   /content/drive/MyDrive/Datasets/Car_Images_train/normal/
            * 파손 이미지 저장소 : C:/Datasets/Car_Images_train/abnormal/
                * 구글드라이브 : /content/drive/MyDrive/Datasets/Car_Images_train/abnormal/
        - val, test 역시 동일한 구조로 생성합니다.
    - 직접 탐색기에서 폴더를 생성할 수도 있고, os 모듈을 이용하여 코드로 작성할 수도 있습니다.
        - 참고 자료 : [os document](https://docs.python.org/3/library/os.html)

In [4]:
# 각각 경로 지정
tr_n_path = '/content/drive/MyDrive/my_data/Car_Images_train/normal/'
tr_ab_path = '/content/drive/MyDrive/my_data/Car_Images_train/abnormal/'

val_path_n = '/content/drive/MyDrive/my_data/Car_Images_val/normal/'
val_path_an = '/content/drive/MyDrive/my_data/Car_Images_val/abnormal/'

test_path_n = '/content/drive/MyDrive/my_data/Car_Images_test/normal/'
test_path_an = '/content/drive/MyDrive/my_data/Car_Images_test/abnormal/'

In [None]:
# train 폴더는 압축을 해제하면서 이미 생성 되어 있습니다.

# test 폴더 만들기 os.mkdir()

# validation 폴더 만들기

import os

pass_list = [val_path_n, val_path_an, test_path_n, test_path_an]

for path in pass_list:
    os.makedirs(path, exist_ok=True)

## 2.데이터 전처리

### (1) 데이터 분할 : Training set | Validation set | Test set 생성
- **세부요구사항**
    - Training set, Validation set, Test set을 만듭니다.
        * size
            * test : 전체에서 20%를 추출합니다.
            * validation : test를 떼어낸 나머지에서 다시 20%를 추출합니다.
        * 데이터는 랜덤하게 추출해야 합니다.
            - random, shutil 모듈을 이용하여 랜덤하게 추출할 수 있습니다.
                - [random document](https://docs.python.org/3/library/random.html) | [shutil document](https://docs.python.org/3/library/shutil.html)
            * 해당 모듈 이외에 자신이 잘 알고 있는 방법을 사용해도 됩니다.
---

#### 1) test, validation 크기를 지정

In [5]:
import random, shutil

In [None]:
# 전체 이미지 갯수를 확인합니다.
len(os.listdir(tr_n_path)) , len(os.listdir(tr_ab_path))

(302, 303)

In [None]:
# test 사이즈 : 전체 이미지의 20%
te_data_num = [round(len(os.listdir(tr_n_path))*0.2), round(len(os.listdir(tr_ab_path))*0.2)]
print(te_data_num)

# validation 사이즈 : test를 제외한 나머지 중에서 20%
val_data_num = [ round((len(os.listdir(tr_n_path))-te_data_num[0])*0.2) , round((len(os.listdir(tr_n_path))-te_data_num[1])*0.2) ]
print(val_data_num)

# train 사이즈
train_data_num = [len(os.listdir(tr_n_path)) - te_data_num[0] - val_data_num[0],
                  len(os.listdir(tr_ab_path))- te_data_num[1] - val_data_num[1]]

[60, 61]
[48, 48]


#### 2) test 셋 추출

In [6]:
import shutil

In [None]:
print(len(os.listdir(tr_n_path)), len(os.listdir(tr_ab_path)))
print(len(os.listdir(test_path_n)), len(os.listdir(test_path_an)))

302 303
0 0


In [None]:
files = os.listdir(tr_n_path)
random.seed(2023)
random.shuffle(files)
print(files[0])

# new_path = test_path_n
print('test_n 옮김: ', te_data_num[0])

for file in files[:te_data_num[0]]:
    shutil.move(tr_n_path + file,test_path_n + file )
    # print('{} has been mobed to new folder!'.format(file))

DALLíñE 2023-03-11 14.32.58 - part of a car.png
test_n 옮김:  60


In [None]:
files = os.listdir(tr_ab_path)
random.seed(2023)
random.shuffle(files)
print(files[0])

print('test_ab 옮김: ', te_data_num[1])

for file in files[:te_data_num[1]]:
    shutil.move(tr_ab_path + file,test_path_an + file )
    # print('{} has been mobed to new folder!'.format(file))

DALLíñE 2023-03-11 15.08.05 - dents of a car.png
test_ab 옮김:  61


In [None]:
# 추출 후 이미지 갯수 확인

print(len(os.listdir(tr_n_path)), len(os.listdir(tr_ab_path)))
print(len(os.listdir(test_path_n)), len(os.listdir(test_path_an)))

242 242
60 61


#### 3) validation 셋 추출

In [None]:
files = os.listdir(tr_n_path)
random.seed(2023)
random.shuffle(files)
print(files[0])

print('val_n 옮김: ', val_data_num[0])

for file in files[:val_data_num[0]]:
    shutil.move(tr_n_path + file,val_path_n + file )
    # print('{} has been mobed to new folder!'.format(file))

DALLíñE 2023-03-10 23.55.59 - a part of car without blemish.png
test_n 옮김:  48


In [None]:
files = os.listdir(tr_ab_path)
random.seed(2023)
random.shuffle(files)
print(files[0])

# new_path = test_path_n
print('val_n 옮김: ', val_data_num[1])

for file in files[:val_data_num[1]]:
    shutil.move(tr_ab_path + file,val_path_an + file )
    # print('{} has been mobed to new folder!'.format(file))

DALLíñE 2023-03-11 01.30.43 - a little bit scratched car.png
test_n 옮김:  48


In [None]:
# 추출 후 이미지 갯수 확인

print(len(os.listdir(tr_n_path)), len(os.listdir(tr_ab_path)))
print(len(os.listdir(val_path_n)), len(os.listdir(val_path_an)))

194 194
48 48


### (2) 데이터 복사 및 이동
- **세부요구사항**
    - 분할된 데이터를 복사 이동합니다.
        - 새로운 폴더에 저장하는 데이터로 "3.모델링I"에서 사용합니다.
        - 기존 폴더는 "4.모델링II > (1) Data Augmentation"에서 사용합니다.
    - Training set | Validation set | Test set의 데이터를 **새로운 폴더**에 복사하세요.
        - 새로운 폴더 명
            * copy_images/trainset
            * copy_images/validset
            * copy_images/testset
        - 새로운 폴더에는 normal, abnormal 파일 모두를 복사합니다. 
            * 파일을 구분하기 위해 abnormal 파일들은 파일명 앞에 접두사 'ab_'를 붙입시다.
        - os, shutil 모듈을 활용하세요.

#### 1) abnormal 파일 복사

* 복사하기 : shutil.copytree()

In [None]:
copy_path = '/content/drive/MyDrive/my_data/copy_images/'

shutil.copytree(tr_ab_path, copy_path+'trainset')
shutil.copytree(test_path_an, copy_path+'testset')
shutil.copytree(val_path_an, copy_path+'validset')


print(len(os.listdir(copy_path+'trainset')))
print(len(os.listdir(copy_path+'validset')))
print(len(os.listdir(copy_path+'testset')))

194
61
48


* abnormal 이미지 이름의 접두어 "ab_" 붙이기 : os.rename

In [None]:
def changeName(path, cName):
    for filename in os.listdir(path):
        # print(path+filename, '=>', path+str(cName)+filename)
        os.rename(path+filename, path+str(cName)+filename)
 
changeName(copy_path+'trainset/','ab_')
changeName(copy_path+'validset/','ab_')
changeName(copy_path+'testset/','ab_')

print(os.listdir(copy_path+'trainset')[0])
print(os.listdir(copy_path+'validset')[0])
print(os.listdir(copy_path+'testset')[0])

ab_DALLíñE 2023-03-10 18.51.24 - scratched car.png
ab_DALLíñE 2023-03-10 18.51.26 - scratched car.png
ab_DALLíñE 2023-03-10 18.51.32 - scratched car.png


#### 2) normal 파일 복사

In [None]:
def copy_file(path, status):
    copy_path = '/content/drive/MyDrive/my_data/copy_images/'
    files = os.listdir(path)

    for file in files:
        shutil.copy(path + file, copy_path + status + file )

copy_file(tr_n_path, 'trainset/')
copy_file(test_path_n, 'testset/')
copy_file(val_path_n, 'validset/')

print(len(os.listdir(copy_path+'trainset')))
print(len(os.listdir(copy_path+'validset')))
print(len(os.listdir(copy_path+'testset')))

print(os.listdir(copy_path+'trainset')[-1])
print(os.listdir(copy_path+'validset')[-1])
print(os.listdir(copy_path+'testset')[-1])

388
121
96
DALLíñE 2023-03-11 17.09.48 - a part of a car.png
DALLíñE 2023-03-11 14.45.14 - photo of part of a car.png
DALLíñE 2023-03-11 14.41.37 - photo of part of a car.png


* 데이터 갯수 조회

In [None]:
print(len(os.listdir(dataset_path+'copy_images/trainset/')))
print(len(os.listdir(dataset_path+'copy_images/validset/')))
print(len(os.listdir(dataset_path+'copy_images/testset/')))

388
121
96


## 3.모델링 I
* **세부요구사항**
    * 모델링을 위한 데이터 구조 만들기
        * x : 이미지를 array로 변환합니다.
        * y : 이미지 갯수만큼 normal - 0, abnormal - 1 로 array를 만듭니다.
    * 모델을 최소 3개 이상 만들고 성능을 비교합니다.
        * 모델 학습 과정에 알맞은 보조 지표를 사용하세요.
        * 전처리 과정에서 생성한 Validation set을 적절하게 사용하세요.
        * Early Stopping을 반드시 사용하세요.
            * 최적의 가중치를 모델에 적용하세요.

In [11]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### (1) X : image to array
- **세부요구사항**
    * 모델링을 위해서는 np.array 형태로 데이터셋을 만들어야 합니다.
    * Training set / Validation set / Test set의 X는 이미지 형태로 되어있습니다. 
    * 이미지 파일을 불러와 train, valid, test 각각 array 형태로 변환해 봅시다.
        * 각 폴더로 부터 이미지 목록을 만들고
        * 이미지 한장씩 적절한 크기로 로딩하여 (keras.utils.load_img)
            * 이미지가 너무 크면 학습시간이 많이 걸리고, 메모리 부족현상이 발생될 수 있습니다.
            * 이미지 크기를 280 * 280 * 3 이내의 크기를 설정하여 로딩하시오.
            * array로 변환 (keras.utils.img_to_array, np.expand_dims)
        * 데이터셋에 추가합니다.(데이터셋도 array)

#### 1) 이미지 목록 만들기
* train, validation, test 폴더로 부터 이미지 목록을 생성합니다.

In [8]:
import os

In [10]:
# 압축파일 경로
# 구글 드라이브인 경우 경로에 맞게 지정하세요.
dataset_path  = '/content/drive/MyDrive/my_data/'
# dataset_path = 'C:/Datasets/'

In [None]:
# 이미지 목록 저장
img_train_list = os.listdir(dataset_path+'copy_images/trainset/')
img_valid_list = os.listdir(dataset_path+'copy_images/validset/')
img_test_list = os.listdir(dataset_path+'copy_images/testset/')

(os.listdir(tr_n_path)[0])

'DALLíñE 2023-03-10 18.50.11 - photo of a part of car.png'

In [9]:
# 메모리, 처리시간을 위해서 이미지 크기 조정
img_size = 280 ## 사이즈 조정 가능
# img_size = 224

#### 2) 이미지들을 배열 데이터셋으로 만들기

In [None]:
from keras.utils import load_img, img_to_array


def to_array(img_path):
    x = []
    files = os.listdir(img_path)
    for file in files:
        img = load_img(img_path + file, target_size=(img_size, img_size))
        # print(type(img))
        img_tensor = img_to_array(img)
        # print(type(img_tensor))
        # print(img)
        x.append(img_tensor)

    x_np = np.array(x)
    
    return x_np


In [None]:
train_x = to_array(dataset_path+'copy_images/trainset/')
print(train_x.shape)

(388, 280, 280, 3)


In [None]:
val_x = to_array(dataset_path+'copy_images/validset/')
print(val_x.shape)

(96, 280, 280, 3)


In [None]:
test_x = to_array(dataset_path+'copy_images/testset/')
print(test_x.shape)

(121, 280, 280, 3)


### (2) y : 클래스 만들기
- **세부요구사항**
    - Training set / Validation set / Test set의 y를 생성합니다.
        - 각각 normal, abnormal 데이터의 갯수를 다시 확인하고
        - normal을 0, abnormal을 1로 지정합니다.

In [None]:
# 데이터 갯수 확인
print( len(img_train_list) )
print( len([val for val in img_train_list if val.startswith('ab_')]) )
print('---')
print( len(img_valid_list) )
print( len([val for val in img_valid_list if val.startswith('ab_')]) )
print('---')
print( len(img_test_list) )
print( len([val for val in img_test_list if val.startswith('ab_')]) )

388
194
---
96
48
---
121
61


* y_train, y_valid, y_test 만들기
    * normal, abnormal 데이터의 갯수를 다시 확인하고 normal을 0, abnormal을 1로 지정합니다.

In [None]:
def get_y(dlist):
    y = []
    for val in dlist:
        if val.startswith('ab_'):
            y.append(1)
        else:
            y.append(0)
    y_np = np.array(y)

    return y_np

In [None]:
train_y = get_y(img_train_list)
train_y.shape

(388,)

In [None]:
val_y = get_y(img_valid_list)
val_y.shape

(96,)

In [None]:
test_y = get_y(img_test_list)
test_y.shape

(121,)

In [None]:
print(f'max: {train_x.max()}, min: {train_x.min()}')

max: 255.0, min: 0.0


In [None]:
# train_x.shape, train_y.shape, test_x.shape, test_y.shape
mean_x = train_x.mean()
std_x = train_x.std()

mean_x, std_x

(127.13476, 65.64915)

In [None]:
max_x = train_x.max()
min_x = train_x.min()

In [None]:
train_x_s = (train_x - mean_x) / std_x
val_x_s = (val_x - mean_x) / std_x
test_x_s = (test_x - mean_x) / std_x

In [None]:
test_x_minmax = (test_x - min_x) / (max_x - min_x)

In [None]:
print(f'max: {test_x_minmax.max()}, min: {test_x_minmax.min()}')

max: 1.0, min: 0.0


In [None]:
train_x_s.mean(), train_x_s.std()

(-1.8170322e-06, 0.9999961)

### (3) 모델1
- **세부요구사항**
    - Conv2D, MaxPooling2D, Flatten, Dense 레이어들을 이용하여 모델을 설계
    - 학습시 validation_data로 validation set을 사용하시오.
    - 반드시 Early Stopping 적용
    - 평가시, confusion matrix, accuracy, recall, precision, f1 score 등을 이용하시오.

#### 1) 구조 설계

In [None]:
print(train_x_s.shape, val_x_s.shape, test_x_s.shape)
print(train_y.shape, val_y.shape, test_y.shape)

(388, 224, 224, 3) (96, 224, 224, 3) (121, 224, 224, 3)
(388,) (96,) (121,)


In [None]:
# 1. session_clear
keras.backend.clear_session()

# 2. sequential model 선언
model = keras.models.Sequential()

# 3. layer 하나씩 쌓기
# input layer
model.add(keras.layers.Input(shape=(280, 280, 3)))
# Convolution filter
model.add(keras.layers.Conv2D(filters=64,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=64,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# BatchNormalization
model.add( keras.layers.BatchNormalization())
# Max Pooling
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

model.add( keras.layers.Dropout(0.25))

# Convolution filter
model.add(keras.layers.Conv2D(filters=32,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())

# Convolution filter
model.add(keras.layers.Conv2D(filters=64,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())

# Max Pooling
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))
model.add( keras.layers.Dropout(0.25))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())
# Max Pooling
model.add( keras.layers.Dropout(0.25))

# Flatten
model.add( keras.layers.Flatten())
model.add( keras.layers.global())

# Dense
model.add( keras.layers.Dense(1, activation='sigmoid'))

# 4. compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')

# 5. summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 280, 280, 64)      1792      
                                                                 
 conv2d_1 (Conv2D)           (None, 280, 280, 64)      36928     
                                                                 
 batch_normalization (BatchN  (None, 280, 280, 64)     256       
 ormalization)                                                   
                                                                 
 max_pooling2d (MaxPooling2D  (None, 140, 140, 64)     0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 140, 140, 64)      0         
                                                                 
 conv2d_2 (Conv2D)           (None, 140, 140, 32)      1

#### 2) 학습
* EarlyStopping 설정하고 학습시키기

In [None]:
es = EarlyStopping(monitor = 'val_loss',
                   min_delta = 0,
                   patience = 5,
                   verbose = 1,
                   restore_best_weights = True)

In [None]:
hist = model.fit(train_x, train_y, validation_data=(val_x, val_y),
                 batch_size=32, epochs=1000, callbacks=[es], verbose=1)

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 7: early stopping


#### 3) test set으로 예측하고 평가하기
* 평가는 confusion_matrix, classification_report 활용

In [None]:
y_pred = model.predict(test_x)



In [None]:
print(test_y[:10])
print(y_pred[:10])

[1 1 1 1 1 1 1 1 1 1]
[[3.8222346e-23]
 [3.3873271e-19]
 [5.2595750e-21]
 [1.4843350e-21]
 [3.8292855e-11]
 [1.1446760e-05]
 [1.0392097e-21]
 [9.5180787e-11]
 [1.0000000e+00]
 [2.4906730e-02]]


In [None]:
performance_test = model.evaluate(test_x, test_y, batch_size=100)



In [None]:
print('Test Loss : {:.6f},  Test Accuracy : {:.3f}%'.format(performance_test[0], performance_test[1]*100))

Test Loss : 13.899045,  Test Accuracy : 48.958%


In [None]:
preds_1d = y_pred.flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
pred_class

array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1])

In [None]:
test_y

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0])

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[43  5]
 [44  4]]
              precision    recall  f1-score   support

           0       0.49      0.90      0.64        48
           1       0.44      0.08      0.14        48

    accuracy                           0.49        96
   macro avg       0.47      0.49      0.39        96
weighted avg       0.47      0.49      0.39        96



### (4) 모델2
- **세부요구사항**
    - Conv2D, MaxPooling2D, Flatten, Dense 레이어들을 이용하여 모델을 설계
    - 학습시 validation_data로 validation set을 사용하시오.
    - 반드시 Early Stopping 적용
    - 평가시, confusion matrix, accuracy, recall, precision, f1 score 등을 이용하시오.

#### 1) 구조 설계

In [None]:
print(train_x_s.shape, val_x_s.shape, test_x_s.shape)
print(train_y.shape, val_y.shape, test_y.shape)

(388, 280, 280, 3) (121, 280, 280, 3) (96, 280, 280, 3)
(388,) (121,) (96,)


In [None]:
# 1. session_clear
keras.backend.clear_session()

# 2. sequential model 선언
model = keras.models.Sequential()

# 3. layer 하나씩 쌓기
# input layer
model.add(keras.layers.Input(shape=(280, 280, 3)))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# BatchNormalization
model.add( keras.layers.BatchNormalization())
# Max Pooling
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

model.add( keras.layers.Dropout(0.05))

# Convolution filter
model.add(keras.layers.Conv2D(filters=256,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())

# Convolution filter
model.add(keras.layers.Conv2D(filters=256,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())

# Max Pooling
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))
model.add( keras.layers.Dropout(0.1))

# Convolution filter
model.add(keras.layers.Conv2D(filters=64,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# BatchNormalization
model.add( keras.layers.BatchNormalization())
# Max Pooling
model.add( keras.layers.Dropout(0.25))

# Flatten
model.add( keras.layers.Flatten())
# model.add( keras.layers.global())

# Dense
model.add( keras.layers.Dense(1, activation='sigmoid'))

# 4. compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')

# 5. summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 280, 280, 128)     3584      
                                                                 
 conv2d_1 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 batch_normalization (BatchN  (None, 280, 280, 128)    512       
 ormalization)                                                   
                                                                 
 max_pooling2d (MaxPooling2D  (None, 140, 140, 128)    0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 140, 140, 128)     0         
                                                                 
 conv2d_2 (Conv2D)           (None, 140, 140, 256)     2

#### 2) 학습
* EarlyStopping 설정하고 학습시키기

In [None]:
es = EarlyStopping(monitor = 'val_loss',
                   min_delta = 0,
                   patience = 7,
                   verbose = 1,
                   restore_best_weights = True)

In [None]:
hist = model.fit(train_x, train_y, validation_data=(val_x, val_y),
                 batch_size=32, epochs=1000, callbacks=[es], verbose=1)

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 10/1000
Epoch 11/1000
Epoch 12/1000
Epoch 13/1000
Epoch 14/1000
Epoch 15/1000
Epoch 16/1000
Epoch 17/1000
Epoch 18/1000
Epoch 19/1000
Epoch 20/1000
Epoch 21/1000
Epoch 22/1000
Epoch 23/1000
Epoch 24/1000
Epoch 24: early stopping


#### 3) test set으로 예측하고 평가하기
* 평가는 confusion_matrix, classification_report 활용

In [None]:
test_y.shape

(96,)

In [None]:
y_pred = model.predict(test_x)



In [None]:
performance_score = model.evaluate(test_x, test_y, batch_size=32)



In [None]:
preds_1d = y_pred.flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

In [None]:
print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[36 12]
 [ 3 45]]
              precision    recall  f1-score   support

           0       0.92      0.75      0.83        48
           1       0.79      0.94      0.86        48

    accuracy                           0.84        96
   macro avg       0.86      0.84      0.84        96
weighted avg       0.86      0.84      0.84        96



In [None]:

# 1. session_clear
keras.backend.clear_session()

# 2. sequential model 선언
model = keras.models.Sequential()

# 3. layer 하나씩 쌓기
# input layer
model.add(keras.layers.Input(shape=(280, 280, 3)))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))
# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

# Flatten
model.add( keras.layers.Flatten())
# model.add( keras.layers.global())

# Dense
model.add( keras.layers.Dense(128))
model.add( keras.layers.Dense(128))
model.add( keras.layers.Dense(1, activation='sigmoid'))

# 4. compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')

# 5. summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 280, 280, 128)     3584      
                                                                 
 conv2d_1 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 conv2d_2 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 conv2d_3 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 conv2d_4 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 conv2d_5 (Conv2D)           (None, 280, 280, 128)     147584    
                                                                 
 conv2d_6 (Conv2D)           (None, 280, 280, 128)     1

In [None]:
y_pred = model.predict(test_x)



In [None]:
performance_score = model.evaluate(test_x, test_y, batch_size=32)



In [None]:
preds_1d = y_pred.flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[ 1 47]
 [ 0 48]]
              precision    recall  f1-score   support

           0       1.00      0.02      0.04        48
           1       0.51      1.00      0.67        48

    accuracy                           0.51        96
   macro avg       0.75      0.51      0.36        96
weighted avg       0.75      0.51      0.36        96



### (5) 모델3
- **세부요구사항**
    - Conv2D, MaxPooling2D, Flatten, Dense 레이어들을 이용하여 모델을 설계
    - 학습시 validation_data로 validation set을 사용하시오.
    - 반드시 Early Stopping 적용
    - 평가시, confusion matrix, accuracy, recall, precision, f1 score 등을 이용하시오.

#### 1) 구조 설계

In [None]:
# VGGNet

# 1. session_clear
keras.backend.clear_session()

# 2. sequential model 선언
model = keras.models.Sequential()

# 3. layer 하나씩 쌓기
# input layer
model.add(keras.layers.Input(shape=(280, 280, 3)))

# Convolution filter
model.add(keras.layers.Conv2D(filters=64, 
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=64,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# BatchNormalization
model.add( keras.layers.BatchNormalization())
# Max Pooling
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

# Convolution filter
model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=128,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# BatchNormalization
model.add( keras.layers.BatchNormalization())
# MaxPool
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

model.add(keras.layers.Conv2D(filters=256,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=256,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=256,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# MaxPool
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

# MaxPool
model.add( keras.layers.MaxPool2D(pool_size=(2,2)))
# 여기서 사이즈기 35*35 가 되는데 다음 maxpooling 시 어떻게 처리되는 거지?

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add(keras.layers.Conv2D(filters=512,
                              kernel_size=(3, 3),
                              padding='same',
                              strides=(1,1),
                              activation='relu'))

model.add( keras.layers.MaxPool2D(pool_size=(2,2)))

# Flatten
# model.add( keras.layers.Flatten())
# model.add( keras.layers.global())
model.add(tf.keras.layers.GlobalAveragePooling2D())

# Dense
# model.add( keras.layers.Dense(4096))
# model.add( keras.layers.Dense(4096))
model.add( keras.layers.Dense(1, activation='sigmoid'))

# 4. compile
model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')

# 5. summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 280, 280, 64)      1792      
                                                                 
 conv2d_1 (Conv2D)           (None, 280, 280, 64)      36928     
                                                                 
 batch_normalization (BatchN  (None, 280, 280, 64)     256       
 ormalization)                                                   
                                                                 
 max_pooling2d (MaxPooling2D  (None, 140, 140, 64)     0         
 )                                                               
                                                                 
 conv2d_2 (Conv2D)           (None, 140, 140, 128)     73856     
                                                                 
 conv2d_3 (Conv2D)           (None, 140, 140, 128)     1

#### 2) 학습
* EarlyStopping 설정하고 학습시키기

In [None]:
es = EarlyStopping(monitor = 'val_loss',
                   min_delta = 0,
                   patience = 8,
                   verbose = 1,
                   restore_best_weights = True)

In [None]:
es2 = EarlyStopping(monitor = 'val_accuracy',
                   min_delta = 0,
                   patience = 8,
                   verbose = 1,
                   restore_best_weights = True)

In [None]:
hist = model.fit(train_x, train_y, validation_data=(val_x, val_y),
                 batch_size=32, epochs=1000, callbacks=[es2], verbose=1)

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 10/1000
Epoch 11/1000
Epoch 12/1000
Epoch 13/1000
Epoch 14/1000
Epoch 15/1000
Epoch 16/1000
Epoch 17/1000
Epoch 17: early stopping


#### 3) test set으로 예측하고 평가하기
* 평가는 confusion_matrix, classification_report 활용

In [None]:
y_pred = model.predict(test_x)



In [None]:
performance_test = model.evaluate(test_x, test_y, batch_size=32)



In [None]:
print('Test Loss : {:.6f},  Test Accuracy : {:.3f}%'.format(performance_test[0], performance_test[1]*100))

Test Loss : 0.402522,  Test Accuracy : 84.298%


In [None]:
preds_1d = y_pred.flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[54  6]
 [13 48]]
              precision    recall  f1-score   support

           0       0.81      0.90      0.85        60
           1       0.89      0.79      0.83        61

    accuracy                           0.84       121
   macro avg       0.85      0.84      0.84       121
weighted avg       0.85      0.84      0.84       121



scaling data

In [None]:
# scaling data
hist = model.fit(train_x_s, train_y, validation_data=(val_x_s, val_y),
                 batch_size=32, epochs=1000, callbacks=[es], verbose=1)

Epoch 1/1000
Epoch 2/1000
Epoch 3/1000
Epoch 4/1000
Epoch 5/1000
Epoch 6/1000
Epoch 7/1000
Epoch 8/1000
Epoch 9/1000
Epoch 9: early stopping


In [None]:
# scaling data
y_pred_s = model.predict(test_x_s)



In [None]:
# scaling data
performance_test_s = model.evaluate(test_x_s, test_y, batch_size=32)



In [None]:
# scaling data
print('Test Loss : {:.6f},  Test Accuracy : {:.3f}%'.format(performance_test_s[0], performance_test_s[1]*100))

Test Loss : 0.708392,  Test Accuracy : 52.066%


In [None]:
# scaling
preds_1d_s = y_pred_s.flatten() # 차원 펴주기
pred_class_s = np.where(preds_1d_s > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
# scaling
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(test_y, pred_class_s))
print(classification_report(test_y, pred_class_s))

[[44  4]
 [ 9 39]]
              precision    recall  f1-score   support

           0       0.83      0.92      0.87        48
           1       0.91      0.81      0.86        48

    accuracy                           0.86        96
   macro avg       0.87      0.86      0.86        96
weighted avg       0.87      0.86      0.86        96



In [12]:
from tensorflow.keras import datasets, layers, models, losses, Model

In [None]:
# googLeNet

def inception(x, filters_1x1, filters_3x3_reduce, filters_3x3, filters_5x5_reduce, filters_5x5, filters_pool):
    path1 = layers.Conv2D(filters_1x1, (1, 1), padding='same',    activation='relu')(x)
    path2 = layers.Conv2D(filters_3x3_reduce, (1, 1), padding='same', activation='relu')(x)
    path2 = layers.Conv2D(filters_3x3, (1, 1), padding='same', activation='relu')(path2)
    path3 = layers.Conv2D(filters_5x5_reduce, (1, 1), padding='same', activation='relu')(x)
    path3 = layers.Conv2D(filters_5x5, (1, 1), padding='same', activation='relu')(path3)
    path4 = layers.MaxPool2D((3, 3), strides=(1, 1), padding='same')(x)
    path4 = layers.Conv2D(filters_pool, (1, 1), padding='same', activation='relu')(path4)
    
    return tf.concat([path1, path2, path3, path4], axis=3)

In [None]:
inp = layers.Input(shape=(280, 280, 3))
input_tensor = layers.experimental.preprocessing.Resizing(224, 224, interpolation="bilinear", input_shape=train_x.shape[1:])(inp)
x = layers.Conv2D(64, 7, strides=2, padding='same', activation='relu')(input_tensor)
x = layers.MaxPooling2D(3, strides=2)(x)
x = layers.Conv2D(64, 1, strides=1, padding='same', activation='relu')(x)
x = layers.Conv2D(192, 3, strides=1, padding='same', activation='relu')(x)
x = layers.MaxPooling2D(3, strides=2)(x)
x = inception(x, filters_1x1=64, filters_3x3_reduce=96, filters_3x3=128, filters_5x5_reduce=16, filters_5x5=32, filters_pool=32)
x = inception(x, filters_1x1=128, filters_3x3_reduce=128, filters_3x3=192, filters_5x5_reduce=32, filters_5x5=96, filters_pool=64)
x = layers.MaxPooling2D(3, strides=2)(x)
x = inception(x, filters_1x1=192, filters_3x3_reduce=96, filters_3x3=208, filters_5x5_reduce=16, filters_5x5=48, filters_pool=64)
aux1 = layers.AveragePooling2D((5, 5), strides=3)(x)
aux1 =layers.Conv2D(128, 1, padding='same', activation='relu')(aux1)
aux1 = layers.Flatten()(aux1)
aux1 = layers.Dense(1024, activation='relu')(aux1)
aux1 = layers.Dropout(0.7)(aux1)
aux1 = layers.Dense(1, activation='sigmoid')(aux1)
x = inception(x, filters_1x1=160, filters_3x3_reduce=112, filters_3x3=224, filters_5x5_reduce=24, filters_5x5=64, filters_pool=64)
x = inception(x, filters_1x1=128, filters_3x3_reduce=128, filters_3x3=256, filters_5x5_reduce=24, filters_5x5=64, filters_pool=64)
x = inception(x, filters_1x1=112, filters_3x3_reduce=144, filters_3x3=288, filters_5x5_reduce=32, filters_5x5=64, filters_pool=64)
aux2 = layers.AveragePooling2D((5, 5), strides=3)(x)
aux2 =layers.Conv2D(128, 1, padding='same', activation='relu')(aux2)
aux2 = layers.Flatten()(aux2)
aux2 = layers.Dense(1024, activation='relu')(aux2)
aux2 = layers.Dropout(0.7)(aux2) 
aux2 = layers.Dense(1, activation='sigmoid')(aux2)
x = inception(x, filters_1x1=256, filters_3x3_reduce=160, filters_3x3=320, filters_5x5_reduce=32, filters_5x5=128, filters_pool=128)
x = layers.MaxPooling2D(3, strides=2)(x)
x = inception(x, filters_1x1=256, filters_3x3_reduce=160, filters_3x3=320, filters_5x5_reduce=32, filters_5x5=128, filters_pool=128)
x = inception(x, filters_1x1=384, filters_3x3_reduce=192, filters_3x3=384, filters_5x5_reduce=48, filters_5x5=128, filters_pool=128)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.4)(x)
out = layers.Dense(1, activation='sigmoid')(x)

In [None]:
model_googLeNet = Model(inputs = inp, outputs = [out, aux1, aux2])

In [None]:
# model.compile(loss='binary_crossentropy', metrics=['accuracy'], optimizer='adam')

model_googLeNet.compile(optimizer='adam', 
              loss=['binary_crossentropy', 'binary_crossentropy', 'binary_crossentropy'],
              loss_weights=[1, 0.3, 0.3],
              metrics=['accuracy'])

model_googLeNet.summary()


Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 280, 280, 3  0           []                               
                                )]                                                                
                                                                                                  
 resizing (Resizing)            (None, 224, 224, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv2d (Conv2D)                (None, 112, 112, 64  9472        ['resizing[0][0]']               
                                )                                                                 
                                                                                            

In [None]:
es = EarlyStopping(monitor = 'val_loss',
                   min_delta = 0,
                   patience = 8,
                   verbose = 1,
                   restore_best_weights = True)

In [None]:
# scaling data
# hist = model.fit(train_x_s, train_y, validation_data=(val_x_s, val_y),
#                  batch_size=32, epochs=1000, callbacks=[es], verbose=1)

history = model.fit(train_x, [train_y, train_y, train_y], validation_data=(val_x, [val_y, val_y, val_y]), batch_size=32, epochs=40, callbacks=[es], verbose=1)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


In [None]:
y_pred = model.predict(test_x)



In [None]:
performance_test = model.evaluate(test_x, test_y, batch_size=32)



In [None]:
performance_test

[0.5483642220497131,
 0.2988683879375458,
 0.4557562470436096,
 0.3758964538574219,
 0.8677685856819153,
 0.8264462947845459,
 0.8429751992225647]

In [None]:
print('Test Loss : {:.6f},  Test Accuracy : {:.3f}%'.format(performance_test[0], performance_test[1]*100))

Test Loss : 0.548364,  Test Accuracy : 29.887%


In [None]:
len(y_pred)

3

In [None]:
preds_1d = y_pred[0].flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[50 10]
 [ 6 55]]
              precision    recall  f1-score   support

           0       0.89      0.83      0.86        60
           1       0.85      0.90      0.87        61

    accuracy                           0.87       121
   macro avg       0.87      0.87      0.87       121
weighted avg       0.87      0.87      0.87       121



## 4.모델링 II
* **세부요구사항**
    - 성능을 높이기 위해서 다음의 두가지를 시도해 봅시다.
        - Data Augmentation을 통해 데이터를 증가 시킵니다.
            - ImageDataGenerator를 사용합니다.
        - 사전 학습된 모델(Transfer Learning)을 가져다 사용해 봅시다.
            - VGG16(이미지넷)을 사용해 봅시다.

### (1) Data Augmentation
- **세부요구사항**
    * 모델 학습에 이용할 이미지 데이터를 증강시키세요.
    * Keras의 ImageDataGenerator를 이용
        - [ImageDataGenerator document](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)

    * image generator를 이용하여 학습
        * 모델 구조는 이미 생성한 1,2,3 중 하나를 선택하여 학습


In [13]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [17]:
img_size = 280 ## 사이즈 조정 가능
dataset_path = '/content/drive/MyDrive/Datasets/'
train_path = dataset_path+'Car_Image_train/'
valid_path = dataset_path+'Car_Image_val/'
test_path = dataset_path+'Car_Image_test/'

#### 1) ImageGenerator 생성
* ImageDataGenerator 함수 사용
    * 주요 옵션
        * rotation_range: 무작위 회전을 적용할 각도 범위
        * zoom_range: 무작위 줌을 적용할 범위 [1-zoom_range, 1+zoom_range]
        * horizontal_flip: 무작위 좌우반전을 적용할지 여부
        * vertical_flip: 무작위 상하반전을 적용할지 여부
        * rescale: 텐서의 모든 값을 rescale 값으로 나누어줌 (이 경우에는 255로 나누어서 0~1사이의 값으로 변경)

In [15]:
train_datagen = ImageDataGenerator(
    rotation_range = 20,
    zoom_range = 0.1,
    width_shift_range = 0.1,
    height_shift_range = 0.1,
    horizontal_flip = True,
    vertical_flip = True,
    rescale = 1/255.
)


valid_datagen = ImageDataGenerator(
    rescale = 1/255.
)

####
test_datagen = ImageDataGenerator(
    rescale = 1/255
)

#### 2) 경로로 부터 이미지 불러 올 준비
* .flow_from_directory 이용
    * 디렉토리에서 이미지를 가져와서 데이터 증강을 적용하고 batch 단위로 제공하는 generator를 생성합니다.
    * 이미지를 불러올 때 target_size로 크기를 맞추고, 
    * class_mode로 이진 분류(binary)를 수행하도록 지정합니다.


In [18]:
train_generator = train_datagen.flow_from_directory(directory=train_path, target_size=(280, 280), class_mode='binary', batch_size=32, shuffle=True)

valid_generator = valid_datagen.flow_from_directory(directory=valid_path, target_size=(280, 280), class_mode='binary', batch_size=32, shuffle=True)

test_generator = test_datagen.flow_from_directory(directory=test_path, target_size=(280, 280), class_mode='binary', batch_size=32, shuffle=True)

Found 388 images belonging to 2 classes.
Found 96 images belonging to 2 classes.
Found 121 images belonging to 2 classes.


#### 3) 학습
- **세부요구사항**
    - Conv2D, MaxPooling2D, Flatten, Dense 레이어들을 이용하여 모델을 설계
    - 학습시 train_generator 이용. 
    - validation_data = valid_generator 지정
    - Early Stopping 적용
    - 평가시, confusion matrix, accuracy, recall, precision, f1 score 등을 이용하시오.

* 구조 설계

In [19]:
from tensorflow.keras.applications import Xception
from tensorflow.keras.layers import Input, GlobalAveragePooling2D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

In [20]:
def create_model(verbose=False):
    input_tensor = Input(shape=(280, 280, 3))
    pretrained_model = Xception(input_tensor=input_tensor, include_top=False, weights='imagenet')
    pretrained_output = pretrained_model.output

    # customize Classifier layer
    x = GlobalAveragePooling2D()(pretrained_output)
    x = Dense(units=2, activation='relu')(x)
    output = Dense(units=1, activation='sigmoid')(x)

    model = Model(inputs=input_tensor, outputs=output)
    if verbose:
        model.summary()
    return model

In [21]:
# 모델 정의
model_Xception = create_model(verbose=False)
# 모델 compile
model_Xception.compile(optimizer=Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/xception/xception_weights_tf_dim_ordering_tf_kernels_notop.h5


* 학습
    * EarlyStopping 설정하기
    * 학습 데이터에 train_generator, validation_data=valid_generator 사용

In [22]:
# es
es = EarlyStopping(monitor='val_loss',
                   min_delta=0,
                   patience=5,
                   verbose=1,
                   restore_best_weights=True)

In [23]:
# 모델 학습(fit)
train_hist = model_Xception.fit(train_generator, validation_data=valid_generator,
                                batch_size=32, epochs=500, callbacks=[es], verbose=1)

Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500
Epoch 20/500
Epoch 20: early stopping


In [25]:
# model_googLeNet
# 모델 학습(fit)
train_hist = model_googLeNet.fit(train_generator, validation_data=valid_generator,
                                batch_size=32, epochs=500, callbacks=[es], verbose=1)

NameError: ignored

#### 4) 성능 평가
* 평가는 confusion_matrix, classification_report 활용

In [26]:
y_pred_Xception = model_Xception.predict(test_x)



In [None]:
y_pred_Xception.shape

(121, 1)

In [27]:
performance_test = model_Xception.evaluate(test_generator)



In [28]:
y_pred = model_Xception.predict(x_test)

NameError: ignored

In [None]:
print('Test Loss : {:.6f},  Test Accuracy : {:.3f}%'.format(performance_test[0], performance_test[1]*100))

Test Loss : 0.099654,  Test Accuracy : 98.347%


In [None]:
performance_test

[4.364559650421143, 0.11570248007774353]

In [None]:
preds_1d = y_pred_Xception.flatten() # 차원 펴주기
pred_class = np.where(preds_1d > 0.5, 1 , 0) #0.5보다크면 2, 작으면 1

In [None]:
pred_class

array([0])

In [None]:
from sklearn.metrics import confusion_matrix, classification_report

print(confusion_matrix(test_y, pred_class))
print(classification_report(test_y, pred_class))

[[ 0 60]
 [47 14]]
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        60
           1       0.19      0.23      0.21        61

    accuracy                           0.12       121
   macro avg       0.09      0.11      0.10       121
weighted avg       0.10      0.12      0.10       121



In [None]:
# googLeNet 결과
y_pred_goog = model_googLeNet.predict(test_x_minmax)



In [None]:
y_pred_goog[0].shape

(121, 1)

In [None]:
model_Xception.evaluate(test_generator)



[0.5708385705947876, 0.8842975497245789]

In [None]:
model_googLeNet.evaluate(test_generator)



[0.5901301503181458,
 0.36965054273605347,
 0.3731386661529541,
 0.36179351806640625,
 0.8099173307418823,
 0.8264462947845459,
 0.8264462947845459]

In [None]:
performance_test = model_googLeNet.evaluate(test_x_minmax, test_y, batch_size=32)



In [None]:
performance_test

[3.6194119453430176,
 2.232811212539673,
 2.087596893310547,
 2.534405469894409,
 0.1900826394557953,
 0.1735537201166153,
 0.1735537201166153]

### (2) Transfer Learning
- **세부요구사항**
    * VGG16 모델은 1000개의 클래스를 분류하는 데 사용된 ImageNet 데이터셋을 기반으로 사전 학습된 가중치를 가지고 있습니다. 
        * 따라서 이 모델은 이미지 분류 문제에 대한 높은 성능을 보입니다.
        * 이 모델은 보통 전이학습(transfer learning)에서 기본적으로 사용되며, 특히 대규모 데이터셋이 없을 때는 기본 모델로 사용되어 fine-tuning을 수행합니다.
    * VGG16 함수로 부터 base_model 저장


In [None]:
from tensorflow.keras.applications import VGG16

#### 1) VGG16 불러와서 저장하기
* include_top=False로 설정하여 분류기를 제외하고 미리 학습된 가중치 imagenet을 로드합니다.
* .trainable을 True로 설정하여 모델의 모든 레이어들이 fine-tuning에 대해 업데이트되도록 합니다.


In [None]:
base_model = VGG16(                 )




#### 2) VGG16과 연결한 구조 설계
* VGG16을 불러와서 Flatten, Dense 등으로 레이어 연결하기

#### 3) 학습
- **세부요구사항**
    - 모델 학습 과정에 알맞은 보조 지표를 사용하세요.
    - 데이터
        * Image Generator를 연결하거나
        * 기존 train, validation 셋을 이용해도 됩니다.
        - Early Stopping을 반드시 사용하세요.
        - 최적의 가중치를 모델에 적용하세요.

#### 4) 성능 평가