---
**Chapter 08**
# **Introduction to deep learning for computer vision**
---


In [None]:
# System Libraries
import importlib
import numpy as np
import sys
import os

sys.path.append("../")

# TensorFlow Libraries
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"
from tensorflow import keras
import tensorflow as tf


# User Libraries
from modules import Chapter_01
from modules import Chapter_08
from modules import Common

# Reload Libraries
importlib.reload(Chapter_01)
importlib.reload(Chapter_08)
importlib.reload(Common)

# Check GPU
print(tf.config.list_physical_devices("GPU"))


# Module variables
batch_size = 2048
epochs = 1

---
# **Convolution theory**
---


### <ins />**Convolution layers**
  - Dense layers learn global patterns that involve the whole input feature map
  - Conv layers learn local patterns found in small patches of the input feature map
  - A convolutional classification network has two parts:
     - Convolution base: Conv2D + MaxPooling2D
     - Classifier: Dense
  - Convolution base:
    - Lower (earlier) layers extract local, highly generic feature maps (edges, colors, textures)
    - Upper (later) layers extract more abstract, task-specific feature maps
  - Translation-invariant patterns
    - After learning a pattern in one location, a conv layer can recognize it anywhere in the image
  - Spatial hierarchies of patterns
    - Lower layers learn small local patterns
    - Upper layers learn larger patterns built from the features of the lower layers
  - For problems where object location matters, dense-layer features are largely useless because dense layers discard spatial information
  - A dense layer is also called a fully connected layer
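
The local-versus-global distinction also shows up in the number of weights each layer type needs. The sketch below counts them for a 28x28x1 input and 32 units/filters (values borrowed from the examples later in this notebook); the helper names `dense_params` and `conv2d_params` are illustrative and not part of Keras.

```python
def dense_params(input_size, units):
    """Weights + biases of a Dense layer: every input value connects to every unit."""
    return input_size * units + units


def conv2d_params(kernel_size, in_channels, filters):
    """Weights + biases of a Conv2D layer: one small kernel per filter, shared by all patches."""
    return kernel_size * kernel_size * in_channels * filters + filters


# Flattened 28x28x1 image into Dense(32) vs. the same image into Conv2D(32, kernel_size=3)
print("Dense :", dense_params(28 * 28 * 1, 32))
print("Conv2D:", conv2d_params(3, 1, 32))
```

Because the same 3x3 kernel is reused at every position, the conv layer needs far fewer weights than a dense layer reading the whole image.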

### <ins />**Feature map**
  - Height, Width, Channels (H, W, D)
  - Feature map size: HxW
  - Feature map depth: D
  - For larger images, add more layers:
    - To increase feature map depth
    - To decrease feature map size
    - To increase model capacity
  - In a typical convnet:
    - Feature map depth increases progressively (e.g. 32 -> 64 -> 128)
    - Feature map size decreases progressively
  
### <ins />**Input feature map (3D)**
  - E.g. a 28x28x1 input image (height x width x channels)

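### <ins />**Filter**
  - One learned pattern detector; the number of filters sets the depth of the output feature map
  - E.g. 32, 64

### <ins />**Kernel (2D)**
  - 3x3 matrix of learned weights, aka convolution kernel / structuring element
  - The same kernel is applied to every input patch

### <ins />**Input patch (2D)**
  - 3x3 patch of the input feature map (spanning all input channels)

### <ins />**Response map (1D)**
  - Vector with one value per filter: the tensor product of the kernel and one input patch

### <ins />**Output feature map (3D)**
  - All response vectors, placed back at their spatial positions to form a height x width x filters map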
  
### <ins />**Convolution operation (sliding window)**
  1. Slide the kernel window over the input feature map
  2. At each position, extract the input patch
  3. Take the tensor product of the kernel and the input patch to get a response vector
  4. Combine all response vectors (one per patch) to get the output feature map
  5. The output map is smaller than the input map because of the border effect and, if used, strides
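
The sliding-window steps above can be sketched in plain NumPy. This is a deliberately naive, loop-based illustration (stride 1, no padding), not how Keras implements `Conv2D`; the function name `conv2d_valid` and the random test data are assumptions made for the example. As in deep learning frameworks, the kernel is not flipped.

```python
import numpy as np


def conv2d_valid(feature_map, kernel):
    """Naive stride-1 convolution without padding.

    feature_map: (height, width, in_depth)
    kernel:      (k, k, in_depth, out_depth)
    returns:     (height - k + 1, width - k + 1, out_depth)
    """
    height, width, _ = feature_map.shape
    k, _, _, out_depth = kernel.shape
    out = np.zeros((height - k + 1, width - k + 1, out_depth))
    for i in range(height - k + 1):  # 1. slide the window over the input
        for j in range(width - k + 1):
            patch = feature_map[i:i + k, j:j + k, :]  # 2. extract the input patch
            # 3. contract patch and kernel into a vector of length out_depth
            out[i, j, :] = np.tensordot(patch, kernel, axes=([0, 1, 2], [0, 1, 2]))
    return out  # 4. the vectors, kept at their positions, form the output map


rng = np.random.default_rng(0)
image = rng.random((28, 28, 1))
kernel = rng.random((3, 3, 1, 32))
print(conv2d_valid(image, kernel).shape)  # (26, 26, 32)
```

The 28x28 input shrinks to 26x26 because a 3x3 window can only be centered on 26 positions per axis, which is the border effect from step 5.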

### <ins />**Padding**
  - Counters the border effect, i.e. keeps the output map the same size as the input map
  - Valid padding
    - No padding applied (default)
    - Output map size < input map size
  - Same padding
    - Padding applied
    - Output map size = input map size (with stride 1)
    - Padding size depends on kernel size, not on input feature map size
    - Padding for 3x3 kernel:
      - 1 row on top, 1 row on bottom
      - 1 column on left, 1 column on right
    - Padding for 5x5 kernel:
      - 2 rows on top, 2 rows on bottom
      - 2 columns on left, 2 columns on right
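
A small NumPy check of the padding amounts listed above. It assumes an odd kernel size and stride 1; `pad_same` is a hypothetical helper written for this note, not a Keras function.

```python
import numpy as np


def pad_same(feature_map, kernel_size):
    """Zero-pad height and width so a stride-1 convolution keeps the map size (odd kernels only)."""
    pad = (kernel_size - 1) // 2  # 3x3 -> 1, 5x5 -> 2
    return np.pad(feature_map, ((pad, pad), (pad, pad), (0, 0)))


image = np.ones((28, 28, 1))
for k in (3, 5):
    padded = pad_same(image, k)
    out_size = padded.shape[0] - k + 1  # size after an unpadded convolution on the padded map
    print("kernel", k, "padded shape", padded.shape, "output size", out_size)
```

In both cases the output size comes back to 28, the original input size.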

### <ins />**Stride**
  - Distance between two consecutive convolution windows
  - Rarely used in classification convnets (max pooling is used instead)
  - Downsampling mechanism
  - Stride=1 (default)
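
The effect of the stride on the output size can be written as a one-line formula. The sketch assumes no padding; `conv_output_size` is an illustrative name.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output height/width of an unpadded convolution."""
    return (input_size - kernel_size) // stride + 1


print(conv_output_size(28, 3, stride=1))  # 26
print(conv_output_size(28, 3, stride=2))  # 13
```

With stride 2 the window lands on every other position, so the map is roughly halved in each dimension.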

### <ins />**Maxpooling**
  - Slides a 2x2 window with stride 2 over each channel and outputs the maximum value in the window
  - No learned weights: a hardcoded max operation replaces the tensor product used by convolution
  - Used in classification convnets
  - Downsampling mechanism

### <ins />**Convolution vs Maxpooling vs Avgpooling**
  - Maxpooling:
    - Window=2x2 (pool size)
    - Stride=2
    - **Outputs the max value of each input patch (no learned weights)**
    - Feature map size is reduced to **size / 2** (rounded down)
  - Avgpooling:
    - Window=2x2 (pool size)
    - Stride=2
    - **Outputs the average value of each input patch (no learned weights)**
    - Feature map size is reduced to **size / 2** (rounded down)
  - Convolution:
    - Kernel=3x3 (kernel size)
    - Stride=1
    - **Applies the same learned weights to every input patch**
    - Feature map size is reduced to **size - 2**
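
A minimal NumPy sketch of 2x2 pooling with stride 2, to contrast the two pooling operations. It is a simplified stand-in for the Keras pooling layers; `pool2d` is a hypothetical helper, and odd trailing rows or columns are dropped, which matches the rounding down of size / 2.

```python
import numpy as np


def pool2d(feature_map, mode="max"):
    """2x2 pooling with stride 2 on a (height, width, depth) map."""
    h, w, d = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Group pixels into non-overlapping 2x2 windows: (h2, 2, w2, 2, depth)
    windows = feature_map[: h2 * 2, : w2 * 2, :].reshape(h2, 2, w2, 2, d)
    if mode == "max":
        return windows.max(axis=(1, 3))
    return windows.mean(axis=(1, 3))


x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(pool2d(x, "max")[..., 0])  # [[ 5.  7.] [13. 15.]]
print(pool2d(x, "avg")[..., 0])  # [[ 2.5  4.5] [10.5 12.5]]
```

Neither operation has weights to learn; only the reduction applied to each window differs.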

### <ins />**Why downsample (stride / maxpooling / avgpooling)**
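  - Reduce the number of coefficients, and with it overfitting (flattened output of the three-layer convnet on a 28x28 input)
    - Without maxpooling = 22x22x128 = 61952 coefficients
    - With maxpooling = 3x3x128 = 1152 coefficients
  - Increase channel-to-height/width ratio (output of the second Conv2D layer)
    - Without maxpooling = 24x24x64
    - With maxpooling = 11x11x64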
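
The figures above can be reproduced with plain arithmetic. The sketch assumes the same stack used in the use cases further down (28x28 input, three unpadded 3x3 Conv2D layers with 32, 64 and 128 filters, and 2x2 max pooling after every conv except the last); `trace_shapes` is an illustrative helper.

```python
def trace_shapes(input_size, filters, use_pooling):
    """Shape of the feature map after each unpadded 3x3 Conv2D layer."""
    size = input_size
    shapes = []
    for index, depth in enumerate(filters):
        size = size - 2  # 3x3 kernel, no padding
        shapes.append((size, size, depth))
        is_last = index == len(filters) - 1
        if use_pooling and not is_last:
            size = size // 2  # 2x2 max pooling with stride 2
    return shapes


for use_pooling in (False, True):
    shapes = trace_shapes(28, [32, 64, 128], use_pooling)
    height, width, depth = shapes[-1]
    print("pooling:", use_pooling, shapes, "flattened:", height * width * depth)
```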

### <ins />**Data Augmentation**
   - Remixes the information already available in the training images
   - No new information is created, so overfitting is reduced but not eliminated
   - To further reduce overfitting, use dropout
   - Augmentation and dropout layers have no effect during inference
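
The training-versus-inference behavior can be illustrated with two simplified NumPy stand-ins. These are not the Keras `RandomFlip` and `Dropout` implementations; the function names and the explicit `training` flag are assumptions made for this sketch.

```python
import numpy as np


def random_flip(images, training, rng):
    """Horizontally flip each (height, width, channels) image with probability 0.5, only while training."""
    if not training:
        return images
    flip = rng.random(len(images)) < 0.5
    return np.where(flip[:, None, None, None], images[:, :, ::-1, :], images)


def dropout(x, rate, training, rng):
    """Inverted dropout: zero a random fraction of values and rescale the rest, only while training."""
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
batch = rng.random((8, 4, 4, 3))
print(np.array_equal(random_flip(batch, training=False, rng=rng), batch))  # True
print(np.array_equal(dropout(batch, 0.5, training=False, rng=rng), batch))  # True
```

A flipped image contains exactly the same pixel values as the original, only rearranged, which is why augmentation remixes information without creating any.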

### <ins />**Transfer learning**
  - Models trained on a large dataset (with different classes) serve as a generic model of the visual world
  - E.g. a model trained on ImageNet (mostly animals and everyday objects) used as the base model for bottle detection
  - Deep learning with convnets is effective on small datasets thanks to transfer learning
  - Types of transfer learning: 
    - Feature extraction
    - Fine tuning

### <ins />**Feature extraction**
  - Take the convolution base of another model and add a new fully connected (dense) classifier on top
  - Reusing the dense classifier of the other model **should be avoided**
  - New dataset has similar classes:
    - Use the entire convolution base (without the original dense classifier)
  - New dataset has very different classes:
    - Use only the lower (earlier) layers of the convolution base, which hold the more generic features
  - Types of feature extraction:
    - Conv base not part of the trained model:
      - Use conv base to extract features once before training
      - Feed conv base output to the dense classifier during training
      - Data augmentation layers cannot be used
      - Faster
    - Conv base part of the trained model (frozen):
      - Data augmentation layers can be used
      - Slower

### <ins />**Fine tuning**
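  - In fine tuning, a few upper layers of the conv base are unfrozen and trained together with the new classifier; the remaining layers stay frozen
  - In feature extraction, all layers of the conv base are frozen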
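
The difference comes down to which layers of the pretrained base receive weight updates. The sketch below uses plain Python lists to show what a `[:-4]` slice selects on the VGG16 convolution base. The layer names are the ones Keras normally assigns to VGG16 without its top; the name of the input layer varies between Keras versions, so `"input"` is a placeholder.

```python
# Layers of the VGG16 convolution base (include_top=False), in order.
vgg16_layers = [
    "input",  # placeholder name, see note above
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

# Feature extraction: every layer of the base is frozen.
feature_extraction_frozen = vgg16_layers[:]

# Fine tuning: freeze everything except the last four layers (the last convolution block).
fine_tuning_frozen = vgg16_layers[:-4]
fine_tuning_trainable = vgg16_layers[-4:]

print(len(feature_extraction_frozen), "layers frozen for feature extraction")
print(len(fine_tuning_frozen), "layers frozen for fine tuning")
print("fine-tuned:", fine_tuning_trainable)
```

Of the four unfrozen layers, only the three convolution layers carry weights; the pooling layer has none.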

### <ins />**Maxpooling and padding use cases**

##### **Case-1: No padding. No maxpooling**

In [None]:
inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(filters=32, kernel_size=3, activation=tf.nn.relu)(inputs)
x = keras.layers.Conv2D(filters=64, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=10, activation=tf.nn.softmax)(x)
keras.Model(inputs=inputs, outputs=outputs).summary()

##### **Case-2: With padding. No maxpooling**

In [None]:
inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation=tf.nn.relu)(inputs)
x = keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation=tf.nn.relu)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=10, activation=tf.nn.softmax)(x)
keras.Model(inputs=inputs, outputs=outputs).summary()

##### **Case-3: No padding. With maxpooling (recommended)**

In [None]:
inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(filters=32, kernel_size=3, activation=tf.nn.relu)(inputs)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=10, activation=tf.nn.softmax)(x)
keras.Model(inputs=inputs, outputs=outputs).summary()

##### **Case-4: Model with padding, with maxpooling**

In [None]:
inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(filters=32, kernel_size=3, padding="same", activation=tf.nn.relu)(inputs)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, padding="same", activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=10, activation=tf.nn.softmax)(x)
keras.Model(inputs=inputs, outputs=outputs).summary()

---
# **TensorFlow Dataset API**
---

- Efficient input batch pipeline
- Asynchronous data prefetching (fetch new batch while previous batch is being handled by model)
- normal vs uniform distributions?

In [None]:
dataset = [x for x in range(0, 20)]
dataset = tf.data.Dataset.from_tensor_slices(dataset)
# -----------------------------------------
# Shuffle
# -----------------------------------------
dataset = dataset.shuffle(len(dataset))
# -----------------------------------------
# Batch
# -----------------------------------------
dataset = dataset.batch(5)
for batch in dataset:
    print("Original: ", batch)
# -----------------------------------------
# Reduce
# -----------------------------------------
reduced = dataset.reduce(initial_state=0, reduce_func=lambda x, y: x + y)
print("Reduced: ", reduced)
# -----------------------------------------
# Map
# -----------------------------------------
dataset = dataset.map(lambda x: x * 0)
for batch in dataset:
    print("Mapped: ", batch)

---
# **Simple convolution network**
---

### <ins />**Dataset**

In [None]:
(x_train, y_train), (x_test, y_test) = Chapter_08.dataset()

### <ins />**Model**

In [None]:
inputs = keras.Input(shape=(28, 28, 1))
x = keras.layers.Rescaling(1.0 / 255)(inputs)
x = keras.layers.Conv2D(filters=32, kernel_size=3, activation=tf.nn.relu)(x)  # Feed the rescaled tensor, not the raw inputs
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=10, activation=tf.nn.softmax)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model = Chapter_01.compile(model=model)

### <ins />**Train**

In [None]:
history = Chapter_01.train(
    x=x_train, y=y_train, model=model, epochs=epochs, batch_size=batch_size, callbacks=Common.callbacks()
)
Chapter_01.evaluate(x=x_test, y=y_test, model=model)
Common.plot(data=[history], labels=["Convnet"])

---
# **Kaggle cats-vs-dogs dataset**
---

### <ins />**Dataset**

In [None]:
new_base_dir, train_dataset, val_dataset, test_dataset = Chapter_08.dataset_batches()
model_dir = "../resources/models/cats_dogs/"

---
# **Training from scratch without augmentation**
---

### <ins />**Model**

In [None]:
inputs = keras.Input(shape=(180, 180, 3))
x = keras.layers.Rescaling(1.0 / 255)(inputs)
x = keras.layers.Conv2D(filters=32, kernel_size=3, activation=tf.nn.relu)(x)  # Feed the rescaled tensor, not the raw inputs
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=256, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPooling2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=256, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
outputs = keras.layers.Dense(units=1, activation=tf.nn.sigmoid)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model = Chapter_01.compile(model=model)

### <ins />**Train**

In [None]:
callbacks = Common.callbacks(model_dir=model_dir + "01_train_scratch.keras")
history = Chapter_01.train_batch(train_dataset, val_dataset, model, callbacks, batch_size=batch_size, epochs=epochs)
Common.plot(data=[history], labels=["Scratch"], start_index=1)
Chapter_01.evaluate_batch(test_dataset, model)

---
# **Training from scratch with augmentation**
---

### <ins />**Augmentation**

In [None]:
def get_augmentation_layers():
    return keras.Sequential(
        [
            keras.layers.RandomFlip("horizontal"),
            keras.layers.RandomRotation(0.1),
            keras.layers.RandomZoom(0.2),
        ]
    )

### <ins />**Model**

In [None]:
inputs = keras.Input(shape=(180, 180, 3))
x = get_augmentation_layers()(inputs)  # New
x = keras.layers.Rescaling(1.0 / 255)(x)
x = keras.layers.Conv2D(filters=32, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=64, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=128, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=256, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.MaxPool2D(pool_size=2)(x)
x = keras.layers.Conv2D(filters=256, kernel_size=3, activation=tf.nn.relu)(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(rate=0.5)(x)  # New
outputs = keras.layers.Dense(units=1, activation=tf.nn.sigmoid)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model = Chapter_01.compile(model=model)

### <ins />**Train**

In [None]:
# Use a separate checkpoint file so the model trained without augmentation is not overwritten
callbacks = Common.callbacks(model_dir=model_dir + "02_train_augmentation.keras")
history = Chapter_01.train_batch(train_dataset, val_dataset, model, callbacks, batch_size=batch_size, epochs=epochs)
Common.plot(data=[history], labels=["Augmentation"], start_index=1)
Chapter_01.evaluate_batch(test_dataset, model)

---
# **Training using feature extraction without augmentation**
---

1. Pretrained convolution base is used for feature extraction
2. Extracted features are used as input to Dense classifier

##### **Convolution base**

In [None]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3),
)
print("Weights: ", len(conv_base.trainable_weights))

##### **Features**

In [None]:
def extract_features(conv_base, dataset):
    features = []
    labels = []
    for batch_images, batch_labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(batch_images)
        batch_features = conv_base.predict(preprocessed_images, verbose=False)
        features.append(batch_features)
        labels.append(batch_labels)
    return np.concatenate(features), np.concatenate(labels)


train_features, train_labels = extract_features(conv_base, train_dataset)
val_features, val_labels = extract_features(conv_base, val_dataset)
test_features, test_labels = extract_features(conv_base, test_dataset)

##### **Model**

In [None]:
inputs = keras.Input(shape=(5, 5, 512))
x = keras.layers.Flatten()(inputs)
x = keras.layers.Dense(units=256, activation=None)(x)  # New
x = keras.layers.Dropout(rate=0.5)(x)
outputs = keras.layers.Dense(units=1, activation=tf.nn.sigmoid)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
model = Chapter_01.compile(model=model)

##### **Train**

In [None]:
callbacks = Common.callbacks(model_dir=model_dir + "03_train_feature_extraction.keras")
history = Chapter_01.train_val(
    x=train_features,
    y=train_labels,
    x_val=val_features,
    y_val=val_labels,
    model=model,
    callbacks=callbacks,
    epochs=epochs,
    batch_size=batch_size,
)
# Plot history
Common.plot(data=[history], labels=["Feature extraction 1"])

---
# **Training using feature extraction with augmentation**
---

##### **Convolution base**

In [None]:
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
)
# Freeze all layers so that the weights learned on ImageNet
# are not overwritten while the new classifier is trained
conv_base.trainable = False
print("Trainable weights: ", len(conv_base.trainable_weights))

##### **Model**

In [None]:
inputs = keras.Input(shape=(180, 180, 3))
x = get_augmentation_layers()(inputs)
x = keras.applications.vgg16.preprocess_input(x)  # New
x = conv_base(x)  # New
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(units=256, activation=None)(x)
x = keras.layers.Dropout(rate=0.5)(x)
outputs = keras.layers.Dense(units=1, activation=tf.nn.sigmoid)(x)


def get_model():
    model = keras.Model(inputs=inputs, outputs=outputs)
    model = Chapter_01.compile(model=model)
    return model


model = get_model()

##### **Train**

In [None]:
callbacks = Common.callbacks(model_dir=model_dir + "04_train_feature_extraction.keras")
history = Chapter_01.train_batch(train_dataset, val_dataset, model, callbacks, batch_size=10000, epochs=epochs)
Common.plot(data=[history], labels=["Feature extraction 2"])

---
# **Training using fine tuning**
---

##### **Convolution base**

In [None]:
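# Reuse the conv_base that is already wired into `outputs` in the previous section:
# a freshly created VGG16 would not be part of the model returned by get_model()
conv_base.trainable = True
# Fine-tune only the last convolution block: freeze every layer except the last four
for layer in conv_base.layers[:-4]:
    layer.trainable = False
print("Trainable weights: ", len(conv_base.trainable_weights))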

##### **Model**

In [None]:
model = get_model()

##### **Train**

In [None]:
# Use a separate checkpoint file and plot label for the fine-tuning run
callbacks = Common.callbacks(model_dir=model_dir + "05_train_fine_tuning.keras")
history = Chapter_01.train_batch(train_dataset, val_dataset, model, callbacks, batch_size=batch_size, epochs=epochs)
Common.plot(data=[history], labels=["Fine tuning"])