

```
# Cats vs Dogs Classification

* A classification model built with a convolutional neural network (CNN)
* Data preprocessing with tensorflow-datasets

In [1]:
import tensorflow_datasets as tfds
import tensorflow as tf

from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint

## Load Dataset

Load the data with **tensorflow-datasets**.

* [Cats vs Dogs dataset documentation](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs?hl=ko)

* [tensorflow-datasets](https://www.tensorflow.org/datasets/splits?hl=ko)

In [2]:
dataset_name = 'cats_vs_dogs'

# Use only the first 80% of the data
train_dataset = tfds.load(name=dataset_name, split='train[:80%]')

# Use only the last 20% of the data
valid_dataset = tfds.load(name=dataset_name, split='train[80%:]')

Downloading and preparing dataset cats_vs_dogs/4.0.0 (download: 786.68 MiB, generated: Unknown size, total: 786.68 MiB) to /root/tensorflow_datasets/cats_vs_dogs/4.0.0...

Shuffling and writing examples to /root/tensorflow_datasets/cats_vs_dogs/4.0.0.incomplete1BW4WU/cats_vs_dogs-train.tfrecord

Dataset cats_vs_dogs downloaded and prepared to /root/tensorflow_datasets/cats_vs_dogs/4.0.0. Subsequent calls will reuse this data.


1. Normalize the images
2. Resize the images to (224 x 224)
3. Split each example into image (x) and label (y)

In [3]:
train_dataset

<PrefetchDataset element_spec={'image': TensorSpec(shape=(None, None, 3), dtype=tf.uint8, name=None), 'image/filename': TensorSpec(shape=(), dtype=tf.string, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}>

In [4]:
for data in train_dataset.take(3):
  x = data['image'] / 255
  y = data['label']
  x = tf.image.resize(x, size=(224, 224))
  print(x.shape)
  print(y)

(224, 224, 3)
tf.Tensor(1, shape=(), dtype=int64)
(224, 224, 3)
tf.Tensor(1, shape=(), dtype=int64)
(224, 224, 3)
tf.Tensor(1, shape=(), dtype=int64)


The label is a scalar (`1`), so it is not one-hot encoded; a one-hot encoded label would be a vector such as `[0, 1]`.
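
# A minimal sketch in plain Python (not part of the original notebook) of the
# difference between an integer label and a one-hot label.
# Assumptions: the helper name `to_one_hot` is hypothetical, and num_classes=2
# assumes cat and dog are the only two classes.
def to_one_hot(label, num_classes=2):
    """Return a list with 1 at index `label` and 0 everywhere else."""
    return [1 if i == label else 0 for i in range(num_classes)]

integer_label = 1                          # what the dataset yields: a scalar class index
one_hot_label = to_one_hot(integer_label)  # the equivalent one-hot vector
print(integer_label, one_hot_label)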

In [7]:
def preprocess(data):
    # Define the x (image) and y (label) data
    x = data['image']
    y = data['label']
    # Normalize the image to the [0, 1] range
    x = x / 255
    # Resize to (224, 224)
    x = tf.image.resize(x, size=(224, 224))
    # Return x and y
    return x, y

**Map** the preprocessing function over the dataset and **set the batch size**.

In [6]:
batch_size=32

In [8]:
train_data = train_dataset.map(preprocess).batch(batch_size)
valid_data = valid_dataset.map(preprocess).batch(batch_size)

## Model Definition (Sequential)

Modeling

1. `input_shape` is (height, width, color_channel); for cats vs dogs it is (224, 224, 3).
2. Stack more layers, with more output filters in the later layers.
3. Apply `activation='relu'` to the Dense layers.
4. For classification, the number of outputs in the last layer must **equal** the number of classes to predict.


In [9]:
model = Sequential([
    Conv2D(64, (3, 3), input_shape=(224, 224, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Conv2D(256, (3, 3), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dropout(0.5),
    Dense(512, activation='relu'),
    Dense(128, activation='relu'),
    Dense(2, activation='softmax'),
])

In [10]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 222, 222, 64)      1792      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 111, 111, 64)     0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 109, 109, 64)      36928     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 54, 54, 64)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 52, 52, 128)       73856     
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 26, 26, 128)      0
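
# The output shapes and parameter counts listed in the summary can be checked
# by hand. A rough sketch in plain Python, assuming the Keras defaults this
# model relies on: 'valid' padding with stride 1 for Conv2D, and a 2x2 pool
# with stride 2 for MaxPooling2D. The helper names are illustrative and are
# not part of Keras.
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 224, 3
shapes, params = [], []
for filters in [64, 64, 128, 128, 256]:
    params.append(conv_params(channels, filters))
    size = conv_out(size)
    shapes.append((size, size, filters))   # shape after Conv2D
    size = pool_out(size)
    shapes.append((size, size, filters))   # shape after MaxPooling2D
    channels = filters

flatten_units = size * size * channels     # number of inputs to the first Dense layer
print(shapes[:6])   # compare with the Output Shape column above
print(params[:3])   # compare with the Param # column above
print(flatten_units)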

## Compilation (compile)

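1. Use 'adam' as the `optimizer`; it is a widely used default that usually trains well without much tuning.
2. Set the `loss`:
  * Output layer activation is `sigmoid`: `binary_crossentropy`
  * Output layer activation is `softmax`:
    * One-hot encoded labels: `categorical_crossentropy`
    * Integer labels (not one-hot encoded): `sparse_categorical_crossentropy`
3. Setting `metrics` to 'acc' or 'accuracy' lets you monitor accuracy during training.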
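
# A small sketch in plain Python (no TensorFlow) of why the two softmax losses
# are interchangeable once the label format matches: both reduce to
# -log(probability assigned to the true class).
# Assumptions: the function bodies are simplified re-implementations for
# illustration (the Keras versions also clip probabilities and average over a
# batch), and `probs` is a made-up softmax output.
import math

def categorical_crossentropy(one_hot, probs):
    """Loss for a one-hot label: -sum(t * log(p))."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_crossentropy(label, probs):
    """Loss for an integer label: -log(p[label])."""
    return -math.log(probs[label])

probs = [0.2, 0.8]                                       # hypothetical prediction
loss_sparse = sparse_categorical_crossentropy(1, probs)  # integer label 1
loss_onehot = categorical_crossentropy([0, 1], probs)    # same label, one-hot
print(loss_sparse, loss_onehot)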

model.compile()

In [11]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])

## ModelCheckpoint: Creating a Checkpoint

Set up ModelCheckpoint to save the best model so far, judged by `val_loss`, at the end of each epoch.
* `checkpoint_path` sets the file name the model is saved to.
* Declare `ModelCheckpoint` and set appropriate option values.

In [12]:
checkpoint_path = "my_checkpoint.ckpt"
checkpoint = ModelCheckpoint(filepath=checkpoint_path, 
                             save_weights_only=True, 
                             save_best_only=True, 
                             monitor='val_loss', 
                             verbose=1)
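
# A rough sketch in plain Python of the decision that save_best_only=True with
# monitor='val_loss' makes at the end of each epoch: save only when the
# monitored value beats the best value seen so far.
# Assumptions: this is an illustration, not the Keras implementation; the
# function name `epochs_that_save` and the val_loss values are hypothetical.
def epochs_that_save(val_losses):
    """Return the 1-based epochs that would trigger a save, and the best loss."""
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:        # lower val_loss counts as an improvement
            best = loss
            saved.append(epoch)
    return saved, best

val_losses = [0.69, 0.66, 0.55, 0.60, 0.50]   # made-up values for illustration
saved_epochs, best_loss = epochs_that_save(val_losses)
print(saved_epochs, best_loss)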

## Training (fit)

In [13]:
model.fit(train_data,
          validation_data=(valid_data),
          epochs=20,
          callbacks=[checkpoint],
          )

Epoch 1/20
Epoch 1: val_loss improved from inf to 0.69120, saving model to my_checkpoint.ckpt
Epoch 2/20
Epoch 2: val_loss improved from 0.69120 to 0.65981, saving model to my_checkpoint.ckpt
Epoch 3/20
Epoch 3: val_loss improved from 0.65981 to 0.54826, saving model to my_checkpoint.ckpt
Epoch 4/20
Epoch 4: val_loss improved from 0.54826 to 0.46697, saving model to my_checkpoint.ckpt
Epoch 5/20
Epoch 5: val_loss improved from 0.46697 to 0.39888, saving model to my_checkpoint.ckpt
Epoch 6/20
Epoch 6: val_loss did not improve from 0.39888
Epoch 7/20
Epoch 7: val_loss improved from 0.39888 to 0.32936, saving model to my_checkpoint.ckpt
Epoch 8/20
Epoch 8: val_loss improved from 0.32936 to 0.32132, saving model to my_checkpoint.ckpt
Epoch 9/20
Epoch 9: val_loss did not improve from 0.32132
Epoch 10/20
Epoch 10: val_loss improved from 0.32132 to 0.31856, saving model to my_checkpoint.ckpt
Epoch 11/20
Epoch 11: val_loss improved from 0.31856 to 0.31508, saving model to my_checkpoint.ckpt
Ep

<keras.callbacks.History at 0x7fecb7282f10>

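## Load Weights After Training (ModelCheckpoint)

After training finishes, be sure to call `load_weights`; otherwise the model keeps the weights from the final epoch rather than those of the best checkpoint.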

In [14]:
# Pass the file name the checkpoint was saved under.
model.load_weights(checkpoint_path)

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7fecb71bdcd0>