
# Ch8. Introduction to deep learning for computer vision

1. Introduction to convnets

2. Training a convnet from scratch on a small dataset

3. Leveraging a pretrained model

## 8.1. Introduction to convnets

Stack of Conv2D and MaxPooling2D layers

In [None]:
# Instantiating a small convnet

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))                                     # 28x28x1 input, matching the MNIST image size
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)     # Conv2D
x = layers.MaxPooling2D(pool_size=2)(x)                                     # MaxPooling2D
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)                                                     # Flatten all the information
outputs = layers.Dense(10, activation="softmax")(x)                         # connect Dense layer

model = keras.Model(inputs=inputs, outputs=outputs)                         # making model by functional API

In [None]:
model.summary()

In [None]:
# Training the convnet on MNIST images

from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)) # Conv2D layers require an explicit channel dimension.
# The convnet operates on the original 2D shape of each image rather than on a flattened vector.
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255

model.compile(optimizer="rmsprop",
    loss="sparse_categorical_crossentropy", # multi class classification
    metrics=["accuracy"])

model.fit(train_images, train_labels, epochs=5, batch_size=64)

In [None]:
# Evaluating the convnet

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")

### The convolution operation

* Dense layers learn global patterns in their input feature space, whereas convolution layers learn local patterns

* The patterns they learn are translation invariant

* They can learn spatial hierarchies of patterns

* Convolution preserves the spatial relationship between pixels by learning image features from small squares of input data (the size of the square is the filter size)

* Convolution: multiply the filter elementwise with an input patch and sum the products

* Ex) a 3x3 kernel (a 3x3x1 filter) acts on a 5x6 input image with stride 1 and no padding, and outputs a 3x4 feature map.

* A fully connected layer with the same input and output sizes would need 30 (= 5x6) x 12 (= 3x4) = 360 unshared weights (input size x output size)

* 9 vs 360 weights, so the convolution filter is far more parameter-efficient.

Convolving an MxNx3 image with a 3x3x3 filter produces one feature map by taking dot products between the filter and 3x3x3 pieces of the image.

The depth of a filter is determined by the depth of its input feature map.
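The 5x6 example above can be checked numerically. The following is a minimal NumPy sketch, not the Keras implementation: `conv2d_valid` is a hypothetical helper, and the image and kernel values are made up for illustration. Like Keras `Conv2D`, it slides the kernel without flipping it.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(30, dtype="float64").reshape(5, 6)  # toy 5x6 "image" (made-up values)
kernel = np.ones((3, 3))                              # toy 3x3 filter (made-up values)
feature_map = conv2d_valid(image, kernel)

conv_weights = kernel.size                      # weights shared across all positions
dense_weights = image.size * feature_map.size   # one weight per (input, output) pair
print(feature_map.shape, conv_weights, dense_weights)
```

The printed shape is the 3x4 feature map, and the two counts are the 9 shared weights versus the 360 unshared weights from the bullets above.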

### Why convolution?

* Fully Connected -> 1000x1000 images, 10000 hidden nodes, 10^10 parameters
* Convolution     -> 1000x1000 images, 10x10 filter size, 100 filters, 10^4 parameters

* If you are dealing with an image dataset, it is highly recommended to use convolution layers in the model.
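The two parameter counts above follow from simple arithmetic. This sketch assumes a single-channel image and ignores bias terms, neither of which is stated in the notes.

```python
# Fully connected: every pixel is connected to every hidden node.
image_pixels = 1000 * 1000
hidden_nodes = 10_000
fc_params = image_pixels * hidden_nodes

# Convolution: each filter has 10x10 weights, shared across all image positions.
filter_weights = 10 * 10
num_filters = 100
conv_params = filter_weights * num_filters

print(f"fully connected: {fc_params:.0e} parameters")
print(f"convolution:     {conv_params:.0e} parameters")
print(f"ratio:           {fc_params // conv_params:,}x")
```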



### How does a convolution filter work?

Different values of the filter matrix produce different feature maps for the same input image.

A CNN learns the values of its filters during training

The more filters, the more features are extracted

### Feature map


The size of a feature map is controlled by 4 parameters:

1. filter size
2. depth (the number of filters)
3. stride
4. zero-padding
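How filter size, stride, and zero-padding determine the spatial size of a feature map can be sketched with the standard output-size formula. `conv_output_size` is a hypothetical helper written for this note, and the layer list is copied by hand from the small MNIST convnet defined earlier. The depth of each feature map is simply the layer's number of filters.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one spatial axis: floor((n + 2*padding - kernel_size) / stride) + 1."""
    return (n + 2 * padding - kernel_size) // stride + 1

# (layer description, kernel or pool size, stride) for the small MNIST convnet
layers_spec = [
    ("Conv2D(32, 3)", 3, 1),
    ("MaxPooling2D(2)", 2, 2),
    ("Conv2D(64, 3)", 3, 1),
    ("MaxPooling2D(2)", 2, 2),
    ("Conv2D(128, 3)", 3, 1),
]

size = 28
trace = []
for name, kernel_size, stride in layers_spec:
    size = conv_output_size(size, kernel_size, stride)
    trace.append(size)
    print(f"{name:16s} -> {size} x {size}")

# With one pixel of zero-padding, a 3x3 convolution keeps the input size
same_size = conv_output_size(28, kernel_size=3, padding=1)
```

The trace should agree with the output shapes reported by `model.summary()` for that model.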

### The max pooling operation


Role of max pooling: to aggressively downsample feature maps

Downsampling is done with a hardcoded max tensor operation rather than a learned transformation

We need the features from the last convolution layer to contain information about the totality of the input

Without max pooling, the final feature map of the model below has 22 × 22 × 128 = 61,952 total coefficients per sample

This is far too large for such a small model and would result in intense overfitting
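A minimal NumPy sketch of what 2x2 max pooling does to a single-channel feature map, followed by the coefficient counts with and without pooling. `max_pool_2x2` is a hypothetical helper and the input values are made up; Keras `MaxPooling2D` works on batched, multi-channel tensors.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width) array; an odd trailing row/column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    # Group the pixels into 2x2 blocks and keep the maximum of each block
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]])
pooled = max_pool_2x2(fm)
print(pooled)

# Final feature-map size per sample for the 28x28 MNIST input
coeffs_without_pooling = 22 * 22 * 128   # three 3x3 convolutions: 28 -> 26 -> 24 -> 22
coeffs_with_pooling = 3 * 3 * 128        # with two 2x2 poolings: 28 -> 26 -> 13 -> 11 -> 5 -> 3
print(coeffs_without_pooling, coeffs_with_pooling)
```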

In [None]:
# Without max pooling
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model_no_max_pool = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model_no_max_pool.summary()
# Too many parameters for the size of the model.

In [None]:
# No max pooling, but strides=2 in the first convolution
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, strides=2, activation="relu")(inputs) # strides set to 2
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model_no_max_pool = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model_no_max_pool.summary()
# The parameter count drops a lot, but max pooling gives better results.
# For classification, max pooling is generally used more often than strides.
# Empirically, max pooling works better than average pooling in most cases.

## 8.2. Training a convnet from scratch on a small dataset

Downloading a Kaggle dataset in Google Colaboratory

Access to the API is restricted to Kaggle users, so you need to authenticate yourself.

The kaggle package will look for your login credentials in a JSON file located at ~/.kaggle/kaggle.json

First, you need to create a Kaggle API key and download it to your local machine:
Login --> My Account --> Account settings --> API --> click the Create New API Token button

Second, go to your Colab notebook, and upload the API key's JSON file to your Colab session by running the following code in a notebook cell:

### Loading the data

In [None]:
from google.colab import files
files.upload()

In [None]:
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [None]:
!kaggle competitions download -c dogs-vs-cats

In [None]:
import os
os.listdir()

In [None]:
!unzip -qq dogs-vs-cats.zip

In [None]:
os.listdir()

In [None]:
!unzip -qq train.zip

In [None]:
os.listdir()

In [None]:
os.listdir('train')

### Copying images to training, validation, and test directories

Preprocessing step: split the unorganized image files into train, validation, and test subsets containing 1,000, 500, and 1,000 images per class, respectively.

In [None]:
import os, shutil, pathlib

original_dir = pathlib.Path("train")
# Directory where the original dataset was uncompressed
new_base_dir = pathlib.Path("cats_vs_dogs_small")
# Directory where the smaller dataset will be stored

def make_subset(subset_name, start_index, end_index):
    for category in ("cat", "dog"):
        dir = new_base_dir / subset_name / category
        os.makedirs(dir)
        # Create the new directory, e.g. cats_vs_dogs_small/train/dog
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        # Build the file names to copy
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=dir / fname)
            # src: source path, dst: destination path

make_subset("train", start_index=0, end_index=1000)
# Training set: the first 1,000 images of each category
make_subset("validation", start_index=1000, end_index=1500)
# Validation set: the next 500 images of each category
make_subset("test", start_index=1500, end_index=2500)
# Test set: the next 1,000 images of each category

In [None]:
os.listdir(new_base_dir)

In [None]:
# Same as the cell above
os.listdir('cats_vs_dogs_small')

In [None]:
os.listdir('cats_vs_dogs_small/test')

In [None]:
os.listdir('cats_vs_dogs_small/test/dog')
# Contains the dog files with indices 1500 to 2499

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(180, 180, 3))
# 180x180 RGB image
x = layers.Rescaling(1./255)(inputs)
# Rescale pixel values from the [0, 255] range to [0, 1]
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
# Binary classification, so the output activation is sigmoid
model = keras.Model(inputs=inputs, outputs=outputs)

In [None]:
model.summary()

# Height and width keep shrinking while the depth (number of channels) keeps growing.

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

### Data preprocessing

1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Resize them to a shared size (we’ll use 180 × 180).
5. Pack them into batches (we’ll use batches of 32 images).

In [None]:
# Using image_dataset_from_directory to read images

from tensorflow.keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)

### Example

#### Understanding TensorFlow Dataset objects



TensorFlow makes available the tf.data API to create efficient input pipelines

The Dataset class handles many key features that would otherwise be cumbersome to implement yourself, in particular asynchronous data prefetching

The Dataset class also exposes a functional-style API for modifying datasets

In [None]:
import numpy as np
import tensorflow as tf
random_numbers = np.random.normal(size=(1000, 16))
dataset = tf.data.Dataset.from_tensor_slices(random_numbers)
# The from_tensor_slices() class method creates a Dataset from a NumPy array

In [None]:
# Yielding single samples

for i, element in enumerate(dataset):
    print(element.shape)
    if i >= 2:
        break

In [None]:
# We can use .batch() method to batch the data

batched_dataset = dataset.batch(32)
for i, element in enumerate(batched_dataset):
    print(element.shape)
    if i >= 2:
        break

#### Range of useful dataset methods

* .shuffle(buffer_size) : Shuffles elements within a buffer
* .prefetch(buffer_size) : Prefetches a buffer of elements in GPU memory to achieve better device utilization
* .map(callable) : Applies an arbitrary transformation to each element of the dataset
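To make "shuffles elements within a buffer" concrete, here is a pure-Python sketch of the idea. It only illustrates the behavior and is not how tf.data is implemented; `buffered_shuffle` and its `seed` argument are hypothetical names.

```python
import random

def buffered_shuffle(elements, buffer_size, seed=None):
    """Yield the elements in a shuffled order while holding at most `buffer_size` of them in memory."""
    rng = random.Random(seed)
    buffer = []
    for element in elements:
        if len(buffer) < buffer_size:
            # Fill the buffer first
            buffer.append(element)
        else:
            # Emit a random buffered element and put the new one in its slot
            index = rng.randrange(buffer_size)
            yield buffer[index]
            buffer[index] = element
    # Drain what is left once the input is exhausted
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=0))
unshuffled = list(buffered_shuffle(range(5), buffer_size=1))
print(shuffled)
print(unshuffled)
```

A small buffer only mixes nearby elements: the first element emitted always comes from the first `buffer_size` inputs, and a buffer of size 1 does not shuffle at all. This is why the buffer size is usually chosen to be large relative to the dataset.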

In [None]:
reshaped_dataset = dataset.map(lambda x: tf.reshape(x, (4, 4)))
for i, element in enumerate(reshaped_dataset):
    print(element.shape)
    if i >= 2:
        break

In [None]:
reshaped_dataset = dataset.map(lambda x: tf.reshape(x, (4, 4))).batch(32)
for i, element in enumerate(reshaped_dataset):
    print(element.shape)
    if i >= 2:
        break

### Back to the original problem

In [None]:
# Displaying the shapes of the data and labels yielded by the Dataset

for data_batch, labels_batch in train_dataset:
    print("data batch shape:", data_batch.shape)
    print("labels batch shape:", labels_batch.shape)
    break

In [None]:
# Fitting the model using a Dataset

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
# Displaying curves of loss and accuracy during training

import matplotlib.pyplot as plt

accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

In [None]:
# Evaluating the model on the test set
# With only 2,000 training samples, the model is expected to overfit.

test_model = keras.models.load_model("convnet_from_scratch.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

### Using data augmentation to prevent overfitting

* **Data augmentation** takes the approach of generating more training data from existing training samples by **augmenting the samples via a number of random transformations** that yield believable-looking images

* In Keras, this can be done by adding a number of data augmentation layers at the start of your model.

In [None]:
# Data augmentation can be added to a model as a stack of layers like this.

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

**RandomFlip**("horizontal")
applies horizontal flipping to a random 50% of the images that go through it

**RandomRotation**(0.1)
rotates the input images by a random value in the range [-10%, +10%] (fractions of a full circle, i.e. [-36°, +36°])

**RandomZoom**(0.2)
zooms in or out of the image by a random factor in the range [-20%, +20%]
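As a small illustration of these arguments, the sketch below converts the rotation factor into degrees and applies a horizontal flip to a toy array with NumPy. It does not call the Keras layers; the array values are made up.

```python
import numpy as np

# RandomRotation(0.1): the factor is a fraction of a full circle (360 degrees)
rotation_factor = 0.1
max_degrees = rotation_factor * 360
print(f"rotation range: [-{max_degrees:.0f}, +{max_degrees:.0f}] degrees")

# A horizontal flip reverses the width axis of a (height, width, channels) image
image = np.arange(6).reshape(1, 6, 1)   # toy image: 1 row, 6 columns, 1 channel
flipped = image[:, ::-1, :]
print(image[0, :, 0], "->", flipped[0, :, 0])
```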

In [None]:
plt.figure(figsize=(10, 10))
for images, _ in train_dataset.take(1):
# We can use .take(N) to only sample N batches from the dataset. This is equivalent to inserting a break in the loop after the Nth batch
    for i in range(9):
        augmented_images = data_augmentation(images)
        # apply the augmentation
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(augmented_images[0].numpy().astype("uint8"))
        # Display the first image in the output batch.
        # For each of the 9 iterations, this is a different augmentation of the same image
        plt.axis("off")

# Augmentation effectively enlarges the dataset, which helps prevent overfitting.

### Defining a new convnet

In [None]:
# New convnet includes Image augmentation and dropout

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs) # augmentation
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x) # dropout
# Applying dropout to convolution layers is not a good idea.
# Standard Dropout is generally not used on convolution layers.
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

In [None]:
# Training the regularized convnet

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch_with_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
# Evaluating the model on the test set

test_model = keras.models.load_model(
    "convnet_from_scratch_with_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")
# The result is much better than without dropout and augmentation.

## 8.3. Leveraging a pretrained model

* A common and highly effective approach to deep learning on small image datasets is to use a pretrained model

* A **pretrained network** is a saved network that was previously trained on a large dataset

* Motivations:

    Training and tuning a neural network from scratch requires lots of data, time, and resources

    Adapting a pretrained network is a cheaper and faster alternative that exploits its generalization properties

1. Take a top-performing pretrained network (the convolutional base)
2. If we have a small amount of data

    Freeze the whole network + add a new softmax layer for cats and dogs

    Only the new softmax layer for cats and dogs is trained.

3. If we have more data

    Freeze part of the network + add a new softmax layer for cats and dogs

    Part of the top-performing pretrained network is trained as well.

* Image classification models (all pretrained on the ImageNet dataset) that are available as part of Keras: Xception, Inception V3, ResNet50, VGG16, VGG19, MobileNet

* More are available from TensorFlow Hub

In [None]:
# Instantiating the VGG16 convolutional base

conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False, # Exclude the classifier part and load only the convolutional base.
    input_shape=(180, 180, 3))

In [None]:
conv_base.summary()

### Fast feature extraction without data augmentation

We’ll start by extracting features as NumPy arrays by calling the predict() method of the conv_base model on our training, validation, and test datasets
In [None]:
# Extracting the VGG16 features and corresponding labels

def get_features_and_labels(dataset):
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images)
        # Apply the input preprocessing that the pretrained VGG16 network expects
        features = conv_base.predict(preprocessed_images)
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)

train_features, train_labels =  get_features_and_labels(train_dataset)
val_features, val_labels =  get_features_and_labels(validation_dataset)
test_features, test_labels =  get_features_and_labels(test_dataset)

In [None]:
train_features.shape

In [None]:
# Defining and training the densely connected classifier
# add last layer
# training is very fast because we only have to deal with two dense layers

inputs = keras.Input(shape=(5, 5, 512))
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
      filepath="feature_extraction.keras",
      save_best_only=True,
      monitor="val_loss")
]
history = model.fit(
    train_features, train_labels,
    epochs=20,
    validation_data=(val_features, val_labels),
    callbacks=callbacks)

In [None]:
import matplotlib.pyplot as plt
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

# The results are good even though only two dense layers were trained.

### Feature extraction with data augmentation

Create a new model that chains together:

1) a data augmentation stage

2) the frozen convolutional base

3) a dense classifier

In [None]:
# Instantiating and freezing the VGG16 convolutional base

conv_base  = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False) # only get convolutional base part
conv_base.trainable = False # conv_base is already well trained, so its weights are frozen.

Printing the list of trainable weights before and after freezing

In [None]:
conv_base.trainable = True
print("This is the number of trainable weights "
      "before freezing the conv base:", len(conv_base.trainable_weights))

In [None]:
conv_base.trainable = False
print("This is the number of trainable weights "
      "after freezing the conv base:", len(conv_base.trainable_weights))

In [None]:
# Adding a data augmentation stage and a classifier to the convolutional base

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
    ]
)

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs) # apply data augmentation
x = keras.applications.vgg16.preprocess_input(x) # apply input value scaling
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

In [None]:
callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="feature_extraction_with_data_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
# Evaluating the model on the test set

test_model = keras.models.load_model(
    "feature_extraction_with_data_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

# The result is slightly better than before.

### Fine tuning a pretrained model

Fine-tuning consists of unfreezing a few of the top layers of a frozen model base used for feature extraction, and jointly training both the newly added part of the model and these top layers

Here, the last convolution block is unfrozen and trained together with the classifier.

#### Steps

1. Add your custom network on top of an already trained base network
2. Freeze the base network
3. Train the part you added
4. Unfreeze some layers in the base network
5. Jointly train both these layers and the part you added
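Step 4 in this notebook uses the slice `conv_base.layers[:-4]`. The sketch below shows which layers that slice freezes, using a hand-written list of layer names instead of the real model. The block and pooling names follow the VGG16 naming scheme; the name of the input layer is a placeholder, since it varies between Keras versions.

```python
# Layer names of the VGG16 convolutional base (include_top=False), written out by hand
vgg16_layer_names = [
    "input",  # placeholder name for the input layer
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

# Mimic: conv_base.trainable = True, then freeze every layer except the last four
trainable = {name: True for name in vgg16_layer_names}
for name in vgg16_layer_names[:-4]:
    trainable[name] = False

unfrozen = [name for name in vgg16_layer_names if trainable[name]]
print(unfrozen)
```

Only the three convolutions of the last block carry weights; the pooling layer has none, so in effect the slice unfreezes the last convolution block.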

In [None]:
# Freezing all layers until the fourth from the last

conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False

In [None]:
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
              # we use smaller lr
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="fine_tuning.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
model = keras.models.load_model("fine_tuning.keras")
test_loss, test_acc = model.evaluate(test_dataset)
print(f"Test accuracy: {test_acc:.3f}")

# Many times it will improve the results

1. Convnets are the best type of machine learning models for computer vision
2. On a small dataset, overfitting will be the main issue. Data augmentation is a powerful way to fight overfitting
3. It’s easy to reuse an existing convnet on a new dataset via transfer learning
4. As a complement to feature extraction, you can use fine-tuning

# Ch9. Advanced deep learning for computer vision

1. Three essential computer vision tasks
2. An image segmentation example
3. Modern convnet architecture patterns
4. Interpreting what convnets learn

## 9.1. Three essential computer vision tasks

1. **Image classification**: assign one or more labels to an image
2. **Image segmentation**: the goal is to “segment” or “partition” an image into different areas, with each area usually representing a category
3. **Object detection**: the goal is to draw rectangles (called bounding boxes) around objects of interest in an image, and associate each rectangle with a class
## 9.2. Image segmentation example

Image segmentation with deep learning is about using a model to assign a class to each pixel in an image (such as “background” and “foreground,” or “road,” “car,” and “sidewalk”)

* **Semantic segmentation**, where each pixel is independently classified into a
semantic category

* **Instance segmentation**, which seeks not only to classify image pixels by
category, but also to parse out individual object instances

### Oxford-IIIT Pets dataset

Contains 7,390 pictures of various breeds of cats and dogs, together with foreground-background segmentation masks

A **segmentation mask** is the image segmentation equivalent of a label: it’s an image the same size as the input image, with a single color channel where each integer value corresponds to a class: 1 (foreground), 2 (background), and 3 (contour)
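A toy NumPy sketch of how such a mask is handled in this notebook: the values 1, 2, 3 are shifted to the labels 0, 1, 2 for training and scaled by 127 for display. The 3x4 `trimap` array is made up for illustration.

```python
import numpy as np

# Made-up 3x4 trimap: 1 = foreground, 2 = background, 3 = contour
trimap = np.array([[2, 2, 3, 1],
                   [2, 3, 1, 1],
                   [2, 3, 1, 1]], dtype="uint8")

labels = trimap - 1        # class indices 0, 1, 2, as expected by sparse_categorical_crossentropy
display = labels * 127     # gray levels 0, 127, 254 for plotting

# Number of pixels per class
counts = np.bincount(labels.ravel(), minlength=3)
print(labels)
print(display)
print(counts)
```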

In [None]:
# download data

!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz
!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz
!tar -xf images.tar.gz
!tar -xf annotations.tar.gz

# !wget : download a file from the web
# !tar -xf : extract the archive

In [None]:
# List the files in the current directory

!ls

In [None]:
# List the files in the current directory

import os
os.listdir()

In [None]:
os.listdir('images')

In [None]:
fnms1 = os.listdir('images')
len(fnms1)

In [None]:
os.listdir('annotations')
# annotations: the labels (notes) attached to each image

In [None]:
!cat annotations/README

In [None]:
os.listdir('annotations/trimaps/')

In [None]:
fnms2 = os.listdir('annotations/trimaps/')
len(fnms2)
# Larger than len(fnms1): the directory contains extra duplicate files besides the masks

In [None]:
import os

input_dir = "images/"
target_dir = "annotations/trimaps/"

input_img_paths = sorted(
    [os.path.join(input_dir, fname)     # build the full path
     for fname in os.listdir(input_dir) # for every file name in input_dir
     if fname.endswith(".jpg")])        # that ends with .jpg

target_paths = sorted(
    [os.path.join(target_dir, fname)
     for fname in os.listdir(target_dir)
     if fname.endswith(".png") and not fname.startswith(".")]) # skip the hidden duplicate files

In [None]:
input_img_paths[:5]

In [None]:
target_paths[:5]

In [None]:
len(input_img_paths)

In [None]:
len(target_paths)
# The duplicate files were filtered out successfully

In [None]:
# Display the 10th image

import matplotlib.pyplot as plt
from tensorflow.keras.utils import load_img, img_to_array

plt.axis("off")
plt.imshow(load_img(input_img_paths[9]))

In [None]:
# annotation

def display_target(target_array):
    normalized_array = (target_array.astype("uint8") - 1) * 127
    plt.axis("off")
    plt.imshow(normalized_array[:, :, 0])

img = img_to_array(load_img(target_paths[9], color_mode="grayscale"))
display_target(img)

In [None]:
# Load our inputs and targets into two NumPy arrays

import numpy as np
import random

img_size = (200, 200)
# resize everything
num_imgs = len(input_img_paths)
# total number of samples in the data

random.Random(1337).shuffle(input_img_paths)
random.Random(1337).shuffle(target_paths)
# Both shuffles use the same seed (1337), so the inputs and targets stay in the same order.

def path_to_input_image(path):
    return img_to_array(load_img(path, target_size=img_size))

def path_to_target(path):
    img = img_to_array(
        load_img(path, target_size=img_size, color_mode="grayscale"))
    img = img.astype("uint8") - 1
    return img

input_imgs = np.zeros((num_imgs,) + img_size + (3,), dtype="float32")
# (num_imgs,) is (7390,), img_size is (200, 200) as set above, and (3,) is for the RGB channels,
# so the resulting shape is (7390, 200, 200, 3)
targets = np.zeros((num_imgs,) + img_size + (1,), dtype="uint8")
# (7390, 200, 200, 1)
# The single channel holds one integer label per pixel: 0, 1, or 2 (the mask values 1, 2, 3 minus 1)
for i in range(num_imgs):
    input_imgs[i] = path_to_input_image(input_img_paths[i])
    targets[i] = path_to_target(target_paths[i])

# Reserve 1,000 samples for validation
num_val_samples = 1000

# split the data into training and validation
train_input_imgs = input_imgs[:-num_val_samples]
train_targets = targets[:-num_val_samples]
val_input_imgs = input_imgs[-num_val_samples:]
val_targets = targets[-num_val_samples:]

In [None]:
input_imgs.shape

In [None]:
targets.shape

In [None]:
# modeling

from tensorflow import keras
from tensorflow.keras import layers

def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,)) # (200, 200, 3)
    x = layers.Rescaling(1./255)(inputs) # rescale

    x = layers.Conv2D(64, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2D(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2D(256, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(256, 3, activation="relu", padding="same")(x)
    # Downsample with strides instead of max pooling

    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(256, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(128, 3, activation="relu", padding="same", strides=2)(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(64, 3, activation="relu", padding="same", strides=2)(x)

    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)

    model = keras.Model(inputs, outputs)
    return model

In [None]:
model = get_model(img_size=img_size, num_classes=3)
model.summary()

#### The first half

The first half of the model closely resembles the kind of convnet you’d use for image classification

It encodes the images into smaller feature maps that contain spatial information about the original image

We downsample by adding strides rather than using max pooling because we care a lot about the spatial location of information: **max pooling destroys location information**, whereas strided convolutions retain it

#### The second half

The second half of the model is a stack of Conv2DTranspose layers, which apply a kind of inverse of the transformations in the first half

They are transformations going in the opposite direction of convolutions
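The way the two halves mirror each other can be traced with simple size arithmetic. This sketch assumes `padding="same"` as in `get_model`: a strided convolution then outputs ceil(n / stride), and a strided transposed convolution outputs n * stride. The helper names are hypothetical.

```python
import math

def conv_same(n, stride=1):
    """Spatial size after a Conv2D with padding="same"."""
    return math.ceil(n / stride)

def conv_transpose_same(n, stride=1):
    """Spatial size after a Conv2DTranspose with padding="same"."""
    return n * stride

encoder_strides = [2, 1, 2, 1, 2, 1]   # the six Conv2D layers in get_model
decoder_strides = [1, 2, 1, 2, 1, 2]   # the six Conv2DTranspose layers in get_model

def trace_sizes(size):
    """Return the list of spatial sizes from the input to the decoder output."""
    sizes = [size]
    for stride in encoder_strides:
        size = conv_same(size, stride)
        sizes.append(size)
    for stride in decoder_strides:
        size = conv_transpose_same(size, stride)
        sizes.append(size)
    return sizes

sizes_200 = trace_sizes(200)
sizes_180 = trace_sizes(180)
print(sizes_200)   # the output size equals the input size
print(sizes_180)   # the output size no longer matches the input
```

With a 200x200 input the encoder shrinks the maps to 25x25 and the decoder restores 200x200, so the predicted mask lines up with the target. An input size that is not divisible by 8, such as 180, would come back at a different size.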

### Upsampling

Motivation: we need a transformation going in the opposite direction of convolutions

* Generating images involves upsampling from low resolution to high resolution

* Decoding layer of a convolutional autoencoder

Neural network upsampling methods: transposed convolution, fractionally strided convolution

### Transposed convolution

* It goes backward through a convolution operation: it keeps a similar positional connectivity but forms a one-to-many relationship

* We can express a convolution operation using a convolution matrix, which is nothing but a rearranged kernel matrix

* We similarly express a transposed convolution using a transposed convolution matrix, whose layout is the transposed shape but whose actual weight values do not have to come from the original convolution matrix
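The convolution-matrix view can be made concrete with a small NumPy sketch. It assumes a 4x4 input, a 3x3 kernel, stride 1, and no padding; `convolution_matrix` is a hypothetical helper and the numbers are made up. For simplicity the transposed step reuses the same weights, whereas a `Conv2DTranspose` layer learns its own.

```python
import numpy as np

def convolution_matrix(kernel, input_shape):
    """Rearrange `kernel` into a matrix C so that C @ x.ravel() equals the stride-1, unpadded convolution of x."""
    kh, kw = kernel.shape
    h, w = input_shape
    out_h, out_w = h - kh + 1, w - kw + 1
    C = np.zeros((out_h * out_w, h * w))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kh):
                for b in range(kw):
                    # Output pixel (i, j) reads input pixel (i + a, j + b) with weight kernel[a, b]
                    C[i * out_w + j, (i + a) * w + (j + b)] = kernel[a, b]
    return C

kernel = np.arange(1, 10, dtype="float64").reshape(3, 3)
x = np.arange(16, dtype="float64").reshape(4, 4)

C = convolution_matrix(kernel, x.shape)     # shape (4, 16)
y = (C @ x.ravel()).reshape(2, 2)           # convolution: 16 values -> 4 values

# Direct sliding-window computation for comparison
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kernel) for j in range(2)]
                   for i in range(2)])

up = (C.T @ y.ravel()).reshape(4, 4)        # transposed convolution: 4 values -> 16 values
print(C.shape, np.allclose(y, direct), up.shape)
```

Each row of `C` maps many inputs to one output (many-to-one), so each row of `C.T` spreads one value over many outputs, which is the one-to-many relationship mentioned above.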

In [None]:
# compile and fit

model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
# If the targets were one-hot encoded, categorical_crossentropy could be used as the loss instead.
# Here the targets take the integer values 0, 1, 2, so sparse_categorical_crossentropy is used.

callbacks = [
    keras.callbacks.ModelCheckpoint("oxford_segmentation.keras",
                                    save_best_only=True)
]

history = model.fit(train_input_imgs, train_targets,
                    epochs=50,
                    callbacks=callbacks,
                    batch_size=64,
                    validation_data=(val_input_imgs, val_targets))

In [None]:
epochs = range(1, len(history.history["loss"]) + 1)
loss = history.history["loss"]
val_loss = history.history["val_loss"]
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()

Reload our best performing model according to the validation loss,
and demonstrate how to use it to predict a segmentation mask

In [None]:
from tensorflow.keras.utils import array_to_img

model = keras.models.load_model("oxford_segmentation.keras")

i = 4
test_image = val_input_imgs[i]
plt.axis("off")
plt.imshow(array_to_img(test_image))

mask = model.predict(np.expand_dims(test_image, 0))[0]

def display_mask(pred):
    mask = np.argmax(pred, axis=-1)
    mask *= 127
    plt.axis("off")
    plt.imshow(mask)

display_mask(mask)

## 9.3. Modern convnet architecture patterns

A good model architecture is one that reduces the size of the search space or otherwise makes it easier to converge to a good point of the search space

Model architecture is more an art than a science. Experienced machine learning engineers are able to intuitively cobble together high-performing models on their first try, while beginners often struggle to create a model that trains at all

You’ll develop your own intuition throughout this book

In the following sections, we’ll review a few essential convnet architecture best practices: **residual connections, batch normalization, and separable convolutions**

We will apply them to our cat vs. dog classification problem

### Residual connections